ANNC User Manual

1 Introduction

Accelerated Neural Network Compiler (ANNC) speeds up neural network computation. It improves inference performance for recommendation models and foundation models by optimizing computation graphs, fusing and integrating high-performance operators, and generating efficient code. ANNC also works with popular open-source inference frameworks such as TensorFlow Serving.

2 Installing and Building ANNC

2.1 Out-of-the-Box Installation

On openEuler 24.03-LTS-SP2 (the latest release), you can install the ANNC package directly:

shell

yum install -y ANNC

To use features from other ANNC versions, or to install ANNC on other operating systems, build ANNC yourself using one of the following methods.

2.2 Build and Installation Using RPM (Recommended)

As the root user, install rpmbuild and rpmdevtools:

bash

# Install rpmbuild (provided by rpm-build) and the DNF plugins (dnf-plugins-core)
yum install dnf-plugins-core rpm-build
# Install rpmdevtools
yum install rpmdevtools

Create the rpmbuild directory tree in the root user's home directory (/root):

bash

rpmdev-setuptree
# Check the automatically generated directory structure
ls ~/rpmbuild/
# Expected output:
# BUILD  BUILDROOT  RPMS  SOURCES  SPECS  SRPMS

Clone the openEuler-24.03-LTS-SP2 branch of the ANNC package repository, then copy the source archives, patches, and spec file into the rpmbuild directories:

shell
```
git clone -b openEuler-24.03-LTS-SP2 https://atomgit.com/src-openeuler/ANNC.git
cp ANNC/*.tar.gz* ~/rpmbuild/SOURCES/
cp ANNC/*.patch ~/rpmbuild/SOURCES/
cp ANNC/ANNC.spec ~/rpmbuild/SPECS/
```

Install the build dependencies, then build and install the ANNC RPM package:

bash

# Install ANNC dependencies
yum-builddep ~/rpmbuild/SPECS/ANNC.spec
# Build the ANNC binary and source RPM packages
# If check-rpaths errors are reported, add QA_RPATHS=0x0002 before rpmbuild as follows
# QA_RPATHS=0x0002 rpmbuild -ba ~/rpmbuild/SPECS/ANNC.spec
rpmbuild -ba ~/rpmbuild/SPECS/ANNC.spec
# Install the RPM package
cd ~/rpmbuild/RPMS/<arch>
rpm -ivh ANNC-<version>-<release>.<arch>.rpm
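The <arch>, <version>, and <release> placeholders above depend on the build machine and on the spec file. As a minimal sketch, you can let the shell locate the built package instead of typing its name. It assumes the default ~/rpmbuild tree created by rpmdev-setuptree, that uname -m matches the RPM architecture directory (as it does for x86_64 and aarch64), and that the main package file is named ANNC-<version>-<release>.<arch>.rpm with a version that starts with a digit. find_annc_rpm is a hypothetical helper, not part of ANNC.

```bash
#!/usr/bin/env bash
# find_annc_rpm (hypothetical helper): print the newest ANNC binary RPM in an
# rpmbuild tree. Assumes rpmbuild's default layout <topdir>/RPMS/<arch>/ and
# package files named ANNC-<version>-<release>.<arch>.rpm.
find_annc_rpm() {
  local topdir="${1:-$HOME/rpmbuild}"
  local arch="${2:-$(uname -m)}"    # for example x86_64 or aarch64
  local dir="$topdir/RPMS/$arch"
  local newest="" f
  # ANNC-[0-9]* requires a digit right after "ANNC-", so subpackages whose
  # names continue with a letter are not matched.
  for f in "$dir"/ANNC-[0-9]*.rpm; do
    [ -e "$f" ] || continue         # the glob matched nothing
    if [ -z "$newest" ] || [ "$f" -nt "$newest" ]; then
      newest="$f"
    fi
  done
  if [ -z "$newest" ]; then
    echo "no ANNC RPM found in $dir" >&2
    return 1
  fi
  printf '%s\n' "$newest"
}

# Usage:
#   rpm_file=$(find_annc_rpm) && rpm -ivh "$rpm_file"
```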

Note: If installation fails because of file conflicts with older RPMs already installed on your system, resolve them with one of the following methods:

bash

# Method 1: Install the new version forcibly
rpm -ivh ANNC-<version>-<release>.<arch>.rpm --force
# Method 2: Upgrade the installed package
rpm -Uvh ANNC-<version>-<release>.<arch>.rpm

2.3 Build and Installation Using Source Code

Obtain ANNC source code from https://atomgit.com/openeuler/ANNC.

Install the following dependencies:

shell

yum install -y gcc gcc-c++ bzip2 python3-devel python3-numpy python3-setuptools python3-wheel libstdc++-static java-11-openjdk java-11-openjdk-devel make

Download the Bazel 6.5.0 distribution archive from https://releases.bazel.build/6.5.0/release/bazel-6.5.0-dist.zip, build Bazel from source, and add it to PATH:

bash

unzip bazel-6.5.0-dist.zip -d bazel-6.5.0
cd bazel-6.5.0
env EXTRA_BAZEL_ARGS="--tool_java_runtime_version=local_jdk" bash ./compile.sh

export PATH=/path/to/bazel-6.5.0/output:$PATH
bazel --version
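The ANNC build below relies on the Bazel 6.5.0 built here, so a build script may want to stop early if a different bazel is found first on PATH. The following is a minimal sketch. It assumes that bazel --version prints the version as the second field of its first line (for example, bazel 6.5.0), and it ignores any suffix that a binary bootstrapped from the dist archive may append. require_bazel_version is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# require_bazel_version (hypothetical helper): succeed only if the given
# "bazel --version" output reports the expected release. Only the leading
# digits and dots of the second field are compared, so a suffix appended to
# the version string does not cause a mismatch.
require_bazel_version() {
  local expected="$1" output="$2" ver
  ver=$(printf '%s\n' "$output" | awk 'NR == 1 { print $2 }')
  ver="${ver%%[!0-9.]*}"           # strip everything after the numeric part
  if [ "$ver" != "$expected" ]; then
    echo "expected Bazel $expected, found '${ver:-unknown}'" >&2
    return 1
  fi
}

# Usage:
#   require_bazel_version 6.5.0 "$(bazel --version)" || exit 1
```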

Clone the ANNC source code, build libannc.so, and install the library, the header file, and the Python package:

bash

git clone https://atomgit.com/openeuler/ANNC.git

cd ANNC

bazel --output_user_root=./output build -c opt \
    --copt="-DANNC_ENABLE_GRAPH_OPT" \
    --copt="-DANNC_ENABLE_OPENBLAS" \
    annc/service/cpu:libannc.so

cp bazel-bin/annc/service/cpu/libannc.so /usr/lib64
mkdir -p /usr/include/annc
cp annc/service/cpu/kdnn_rewriter.h /usr/include/annc
cd python
python3 setup.py bdist_wheel
python3 -m pip install dist/*.whl
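To confirm the manual installation, you can check that the two files copied above are in place. A minimal sketch follows. check_annc_install is a hypothetical helper, and its optional root argument (default /) is an assumption added so that the same check can run against a staging directory. It does not check the Python wheel, because the name of the package produced by setup.py is not given here.

```bash
#!/usr/bin/env bash
# check_annc_install (hypothetical helper): confirm that the library and header
# copied by the commands above exist. The optional argument is a root prefix
# (default "/"), so the check also works against a staging directory.
check_annc_install() {
  local root="${1:-/}"
  root="${root%/}"                  # avoid a double slash when root is "/"
  local missing=0 f
  for f in /usr/lib64/libannc.so /usr/include/annc/kdnn_rewriter.h; do
    if [ ! -f "$root$f" ]; then
      echo "missing: $root$f" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage:
#   check_annc_install && echo "ANNC library and header are installed"
```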

3 Usage Process

Note:

Deploy TensorFlow Serving (tf-serving) in advance. The ANNC optimization extensions are integrated into tf-serving through compilation options and code patches, as described in the following sections.

3.1 Graph Fusion with Hand-written Operators

Download the baseline models:

bash

git clone https://atomgit.com/openeuler/sra_benchmark.git

Obtain the following target recommendation models from the baseline model library: DeepFM, DFFM, DLRM, and W&D.

Run the following commands to perform graph fusion:

bash

# Install dependencies

python3 -m pip install tensorflow==2.15.1

# Convert the model (DeepFM is used as an example)

annc-opt -I /path/to/model_DeepFM/1730800001/1 -O deepfm_new/1 dnn_sparse linear_sparse
# Copy the original variables into the converted model directory
cp -r /path/to/model_DeepFM/1730800001/1/variables deepfm_new/1

annc-opt writes a new model file, saved_model.pbtxt, to the output directory deepfm_new/1. Search this file for KPFusedSparseEmbedding to confirm that the fused graph operator was generated.
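This search can be scripted, for example to check several converted models in a row. A minimal sketch, assuming the output layout described above (saved_model.pbtxt directly inside deepfm_new/1); check_fused_model is a hypothetical helper. Note that grep -c counts matching lines, not individual occurrences.

```bash
#!/usr/bin/env bash
# check_fused_model (hypothetical helper): count the lines of saved_model.pbtxt
# in the given annc-opt output directory that mention KPFusedSparseEmbedding.
# Returns 0 if at least one is found, 1 if none, 2 if the file is missing.
check_fused_model() {
  local pbtxt="$1/saved_model.pbtxt"
  if [ ! -f "$pbtxt" ]; then
    echo "not found: $pbtxt" >&2
    return 2
  fi
  local count
  count=$(grep -c 'KPFusedSparseEmbedding' "$pbtxt")   # prints 0 when nothing matches
  echo "lines mentioning KPFusedSparseEmbedding: $count"
  [ "$count" -gt 0 ]
}

# Usage:
#   check_fused_model deepfm_new/1
```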

bash

# Go to the tf-serving directory and create a custom operator folder

cd /path/to/serving
mkdir tensorflow_serving/custom_ops

# Copy the ANNC operator source files to the directory

cp /usr/include/annc/fused*.cc tensorflow_serving/custom_ops/

Create the operator build file tensorflow_serving/custom_ops/BUILD and write the following content to the file:

python

package(
    default_visibility = [
        "//visibility:public",
    ],
    licenses = ["notice"],
)

cc_library(
    name = "recom_embedding_ops",
    srcs = [
        "fused_sparse_embedding.cc",
        "fused_linear_embedding_with_hash_bucket.cc",
        "fused_dnn_embedding_with_hash_bucket.cc",
    ],
    alwayslink = 1,
    deps = [
        "@org_tensorflow//tensorflow/core:framework",
    ],
)
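Bazel stops with an error if a file listed in srcs is missing, so it can help to check the custom_ops directory before rebuilding. A minimal sketch, assuming it is run from the tf-serving source root and that the file names are exactly those listed in the BUILD file above; check_custom_op_srcs is a hypothetical helper.

```bash
#!/usr/bin/env bash
# check_custom_op_srcs (hypothetical helper): verify that the BUILD file and
# every source listed in the srcs attribute of recom_embedding_ops exist.
check_custom_op_srcs() {
  local dir="${1:-tensorflow_serving/custom_ops}"
  local missing=0 f
  for f in BUILD \
           fused_sparse_embedding.cc \
           fused_linear_embedding_with_hash_bucket.cc \
           fused_dnn_embedding_with_hash_bucket.cc; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $dir/$f" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage (from the tf-serving directory):
#   check_custom_op_srcs || echo "copy the missing files before rebuilding"
```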

Open tensorflow_serving/model_servers/BUILD, search for SUPPORTED_TENSORFLOW_OPS, and add the following entry to that list to register the operator:

python

"//tensorflow_serving/custom_ops:recom_embedding_ops",

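You can quickly confirm that the entry was added. This is a minimal sketch, run from the tf-serving source root. It only confirms that the quoted label appears somewhere in the file, not that it sits inside the SUPPORTED_TENSORFLOW_OPS list, so inspect the file if in doubt. check_op_registered is a hypothetical helper.

```bash
#!/usr/bin/env bash
# check_op_registered (hypothetical helper): succeed if the quoted custom op
# label appears in the model_servers BUILD file. It does not verify that the
# label is inside SUPPORTED_TENSORFLOW_OPS.
check_op_registered() {
  local build_file="${1:-tensorflow_serving/model_servers/BUILD}"
  local label='//tensorflow_serving/custom_ops:recom_embedding_ops'
  if grep -qF "\"$label\"" "$build_file"; then
    echo "found $label in $build_file"
  else
    echo "not found in $build_file: $label" >&2
    return 1
  fi
}

# Usage (from the tf-serving directory):
#   check_op_registered
```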
Rebuild tf-serving with the following command. If the build succeeds, the operators are registered.

bash

bazel --output_user_root=./output build -c opt --distdir=./proxy \
   tensorflow_serving/model_servers:tensorflow_model_server
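Bazel places the built server under the bazel-bin convenience symlink in the workspace. A minimal sketch that confirms the binary exists and is executable, assuming it is run from the tf-serving source root; check_model_server_binary is a hypothetical helper. The same check applies after the rebuilds in the following sections.

```bash
#!/usr/bin/env bash
# check_model_server_binary (hypothetical helper): print the path of the built
# tensorflow_model_server if it exists and is executable, otherwise fail.
check_model_server_binary() {
  local root="${1:-.}"
  local bin="$root/bazel-bin/tensorflow_serving/model_servers/tensorflow_model_server"
  if [ -x "$bin" ]; then
    printf '%s\n' "$bin"
  else
    echo "not built or not executable: $bin" >&2
    return 1
  fi
}

# Usage (from the tf-serving directory):
#   check_model_server_binary
```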

3.2 Automatic Graph Fusion and Operator Fusion

Go to the tf-serving directory and rebuild tf-serving with ANNC enabled, using the following options:

bash

bazel --output_user_root=./output build -c opt --distdir=./proxy \
   --copt=-DANNC_ENABLED_KDNN \
   tensorflow_serving/model_servers:tensorflow_model_server

Start tf-serving with the target model and check that the service starts properly.
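One way to do this is to start the server in the background and poll the TensorFlow Serving REST model-status endpoint (GET /v1/models/<model name>) until a version reports the state AVAILABLE. The following is a minimal sketch, not an ANNC tool. It assumes it is run from the tf-serving source root, that curl is installed, that ports 8500 (gRPC) and 8501 (REST) are free, and that the model base path is an absolute path containing numbered version directories, such as the deepfm_new directory produced in Section 3.1. start_and_wait and is_available are hypothetical helper names, and the 60-second timeout is arbitrary.

```bash
#!/usr/bin/env bash
# is_available (hypothetical helper): read a model-status JSON response on
# stdin and succeed if any version reports "state": "AVAILABLE".
is_available() {
  grep -Eq '"state"[[:space:]]*:[[:space:]]*"AVAILABLE"'
}

# start_and_wait (hypothetical helper): MODEL_NAME MODEL_BASE_PATH [REST_PORT]
# Starts tensorflow_model_server in the background, then polls the REST
# model-status endpoint once per second for up to 60 seconds.
start_and_wait() {
  local name="$1" base="$2" rest_port="${3:-8501}" i
  ./bazel-bin/tensorflow_serving/model_servers/tensorflow_model_server \
    --port=8500 --rest_api_port="$rest_port" \
    --model_name="$name" --model_base_path="$base" &
  echo "tensorflow_model_server started with PID $!"
  for i in $(seq 1 60); do
    if curl -s "http://localhost:$rest_port/v1/models/$name" | is_available; then
      echo "model $name is AVAILABLE"
      return 0
    fi
    sleep 1
  done
  echo "model $name did not become AVAILABLE within 60 seconds" >&2
  return 1
}

# Usage (the model name "deepfm" is illustrative):
#   start_and_wait deepfm /path/to/deepfm_new
```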

3.3 Operator Memory and Layout Optimization

Go to the tf-serving directory, apply the related patches, and rebuild tf-serving.

bash

# Apply the related patches using the scripts provided in the ANNC source directory

export ANNC_PATH=/path/to/ANNC
cp $ANNC_PATH/tfserver/llvm/llvm.sh /path/to/serving/output/{id}/external/llvm-raw
cp $ANNC_PATH/tfserver/xla/xla.sh /path/to/serving/output/{id}/external/org_tensorflow/third_party/xla

cd /path/to/serving/output/{id}/external/llvm-raw
bash ./llvm.sh
cd /path/to/serving/output/{id}/external/org_tensorflow/third_party/xla
bash ./xla.sh

# Recompile

bazel --output_user_root=./output build -c opt --distdir=./proxy \
   --copt=-DANNC_ENABLED_KDNN \
   tensorflow_serving/model_servers:tensorflow_model_server
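In the commands above, {id} stands for the Bazel output base of the tf-serving workspace: a hash-named directory under ./output that exists only after a previous tf-serving build has fetched the external repositories. Running bazel --output_user_root=./output info output_base from the tf-serving directory prints this path. Alternatively, the following minimal sketch searches the output root for the directory that contains external/org_tensorflow. find_output_base is a hypothetical helper, and it assumes that only one workspace has been built under that root.

```bash
#!/usr/bin/env bash
# find_output_base (hypothetical helper): print the Bazel output base (the
# {id} directory) under an output_user_root, identified as the single
# subdirectory that contains external/org_tensorflow.
# With Bazel available, this prints the same directory as an absolute path:
#   bazel --output_user_root=./output info output_base
find_output_base() {
  local user_root="${1:-./output}"
  local found="" d
  for d in "$user_root"/*/; do
    d="${d%/}"
    [ -d "$d/external/org_tensorflow" ] || continue
    if [ -n "$found" ]; then
      echo "more than one output base under $user_root" >&2
      return 1
    fi
    found="$d"
  done
  if [ -z "$found" ]; then
    echo "no output base with external/org_tensorflow under $user_root" >&2
    return 1
  fi
  printf '%s\n' "$found"
}

# Usage (from the tf-serving directory):
#   OUTPUT_BASE=$(find_output_base ./output) || exit 1
#   cp "$ANNC_PATH/tfserver/llvm/llvm.sh" "$OUTPUT_BASE/external/llvm-raw"
```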

Before starting tf-serving, set the XLA_FLAGS environment variable to enable the optimization features:

bash

export XLA_FLAGS="--xla_cpu_use_xla_runtime=true --xla_cpu_enable_concat_optimization=true --xla_cpu_enable_output_tensor_reuse=true --xla_cpu_enable_mlir_tiling_and_fusion=true"
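The export above replaces any XLA_FLAGS value already set in the shell. If other XLA flags are in use, a small helper can append the ANNC-related flags instead. The following is a minimal sketch; add_xla_flags is a hypothetical helper, and it assumes flags are separated by spaces. A flag whose name is already present keeps its existing value. In either case, XLA_FLAGS must be set in the environment of the shell that starts tf-serving.

```bash
#!/usr/bin/env bash
# add_xla_flags (hypothetical helper): append each --name=value argument to
# XLA_FLAGS unless a flag with the same name is already present.
add_xla_flags() {
  local flag name
  for flag in "$@"; do
    name="${flag%%=*}"                          # e.g. --xla_cpu_use_xla_runtime
    case " ${XLA_FLAGS:-} " in
      *" $name="* | *" $name "*) continue ;;    # already set: keep existing value
    esac
    XLA_FLAGS="${XLA_FLAGS:+$XLA_FLAGS }$flag"
  done
  export XLA_FLAGS
}

# Usage:
#   add_xla_flags --xla_cpu_use_xla_runtime=true \
#     --xla_cpu_enable_concat_optimization=true \
#     --xla_cpu_enable_output_tensor_reuse=true \
#     --xla_cpu_enable_mlir_tiling_and_fusion=true
```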

3.4 Operator Selection and Operator Library Integration

Go to the tf-serving directory, apply the TensorFlow patch, and rebuild tf-serving.

bash

cd ./output/{id}/external/org_tensorflow
patch -p1 < /usr/include/tensorflow.patch

# Rebuild with the compilation options for MatMul operator offloading and operator library integration

cd /path/to/serving
bazel --output_user_root=./output build -c opt --distdir=./proxy \
   --copt=-DANNC_ENABLED_KDNN \
   --copt=-DDISABLE_TF_MATMUL_FUSION \
   tensorflow_serving/model_servers:tensorflow_model_server
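When the build is repeated, running patch -p1 again on an already patched tree typically reports a reversed or previously applied patch and may prompt for input. The following minimal sketch applies a patch only if it has not been applied yet, testing first with a dry run. apply_patch_once is a hypothetical helper; it assumes GNU patch, and -f is passed so that the dry runs never prompt.

```bash
#!/usr/bin/env bash
# apply_patch_once (hypothetical helper): DIR PATCH_FILE
# Applies PATCH_FILE with -p1 inside DIR unless it is already applied.
# -f (--force) suppresses patch's interactive questions; --dry-run changes no files.
apply_patch_once() {
  local dir="$1" patch_file="$2"
  case "$patch_file" in
    /*) ;;
    *) patch_file="$PWD/$patch_file" ;;        # resolve before changing directory
  esac
  # If the patch can be reversed cleanly, it has already been applied.
  if (cd "$dir" && patch -p1 -R -f -s --dry-run < "$patch_file") >/dev/null 2>&1; then
    echo "already applied: $patch_file"
    return 0
  fi
  # Apply only if a forward dry run succeeds, so a failed attempt changes nothing.
  if (cd "$dir" && patch -p1 -f -s --dry-run < "$patch_file") >/dev/null 2>&1; then
    (cd "$dir" && patch -p1 -s < "$patch_file")
  else
    echo "patch does not apply cleanly in $dir: $patch_file" >&2
    return 1
  fi
}

# Usage (from the tf-serving directory):
#   apply_patch_once ./output/{id}/external/org_tensorflow /usr/include/tensorflow.patch
```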
