ANNC User Manual

1 Introduction

Accelerated Neural Network Compiler (ANNC) speeds up neural network computing. It improves model inference for recommendation systems and foundation models by optimizing computation graphs, fusing and integrating high-performance operators, and generating efficient code. In addition, ANNC works with popular open-source inference frameworks.

2 Installing and Building ANNC

2.1 Out-of-the-Box Installation (via EUR)

bash

wget https://eur.openeuler.openatom.cn/results/lesleyzheng1103/ANNC/openeuler-22.03_LTS_SP4-aarch64/00110327-ANNC/ANNC-0.0.2-3.aarch64.rpm

# Install the package to the / directory
rpm -ivh ANNC-0.0.2-3.aarch64.rpm

2.2 Build and Installation Using RPM (Recommended)

Run the following commands as the root user to install rpmbuild and rpmdevtools:

bash

# Install rpmbuild
yum install dnf-plugins-core rpm-build
# Install rpmdevtools
yum install rpmdevtools

Create the rpmbuild directory tree in the /root directory.

bash

rpmdev-setuptree
# Check the automatically generated directory structure
ls ~/rpmbuild/
BUILD  BUILDROOT  RPMS  SOURCES  SPECS  SRPMS
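
Before copying any sources, it can help to confirm that the tree is complete. The function below is a minimal sketch: check_rpmbuild_tree is a hypothetical helper name, not part of rpmdevtools, and it only tests for the six directories shown in the listing above.

```shell
# Hypothetical helper: verify that an rpmbuild tree contains the six
# directories shown in the listing above.
# Usage: check_rpmbuild_tree [TOP_DIR]   (TOP_DIR defaults to ~/rpmbuild)
check_rpmbuild_tree() {
    top="${1:-$HOME/rpmbuild}"
    missing=0
    for d in BUILD BUILDROOT RPMS SOURCES SPECS SRPMS; do
        if [ ! -d "$top/$d" ]; then
            echo "missing: $top/$d" >&2
            missing=1
        fi
    done
    # Return 0 when every directory exists, 1 otherwise.
    return "$missing"
}
```

Calling check_rpmbuild_tree with no argument checks ~/rpmbuild and prints each missing directory to standard error.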

Clone the master branch of the source repository, then copy the source archives, patches, and spec file into the corresponding rpmbuild directories.

```shell
git clone -b master https://gitee.com/src-openeuler/ANNC.git
cp ANNC/*.tar.gz* ~/rpmbuild/SOURCES/
cp ANNC/*.patch ~/rpmbuild/SOURCES/
cp ANNC/ANNC.spec ~/rpmbuild/SPECS/
```

Generate the RPM package of ANNC through the following steps:

bash

# Install ANNC dependencies
yum-builddep ~/rpmbuild/SPECS/ANNC.spec
# Build the ANNC RPM package
# If check-rpaths errors are reported, add QA_RPATHS=0x0002 before rpmbuild as follows
# QA_RPATHS=0x0002 rpmbuild -ba ~/rpmbuild/SPECS/ANNC.spec
rpmbuild -ba ~/rpmbuild/SPECS/ANNC.spec
# Install the RPM package
cd ~/rpmbuild/RPMS/<arch>
rpm -ivh ANNC-<version>-<release>.<arch>.rpm
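
The placeholders <arch>, <version>, and <release> depend on the build host and the spec file. As a rough sketch, the built packages can be listed instead of guessing the file name. find_annc_rpms is a hypothetical helper name, and it lists every file matching ANNC-*.rpm, which may include subpackages if the spec file produces any.

```shell
# Hypothetical helper: list ANNC binary RPMs under an rpmbuild RPMS directory.
# Usage: find_annc_rpms [RPMS_DIR]   (RPMS_DIR defaults to ~/rpmbuild/RPMS)
find_annc_rpms() {
    rpms_dir="${1:-$HOME/rpmbuild/RPMS}"
    # rpmbuild writes binary packages into one subdirectory per architecture.
    find "$rpms_dir" -type f -name 'ANNC-*.rpm' | sort
}
```

Running find_annc_rpms with no argument prints the candidate file names to substitute into the rpm -ivh command above.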

Note: If file conflicts arise from older RPMs already installed on your system, address them with the following methods:

bash

# Method 1: Install the new version forcibly
rpm -ivh ANNC-<version>-<release>.<arch>.rpm --force
# Method 2: Update the installation package
rpm -Uvh ANNC-<version>-<release>.<arch>.rpm

2.3 Build and Installation Using Source Code

Obtain ANNC source code from https://gitee.com/openeuler/ANNC.

Install the following dependencies:

shell

yum install -y gcc gcc-c++ bzip2 python3-devel python3-numpy python3-setuptools python3-wheel libstdc++-static java-11-openjdk java-11-openjdk-devel make

Download bazel-6.5.0 from https://releases.bazel.build/6.5.0/release/bazel-6.5.0-dist.zip, and install Bazel.

bash

unzip bazel-6.5.0-dist.zip -d bazel-6.5.0
cd bazel-6.5.0
env EXTRA_BAZEL_ARGS="--tool_java_runtime_version=local_jdk" bash ./compile.sh

export PATH=/path/to/bazel-6.5.0/output:$PATH
bazel --version
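
Since the build expects Bazel 6.5.0, the reported version can be compared against that number in a script. The sketch below assumes the output of bazel --version starts with the word bazel followed by a dotted version number. Some builds may append a suffix to the version, so the function keeps only the leading numeric part. bazel_version_of is a hypothetical helper name.

```shell
# Hypothetical helper: extract the dotted version number from a line such as
# "bazel 6.5.0". Any trailing suffix after the number is dropped.
# Prints nothing if the line does not look like Bazel version output.
bazel_version_of() {
    echo "$1" | sed -n 's/^bazel \([0-9][0-9.]*[0-9]\).*/\1/p'
}
```

For example, [ "$(bazel_version_of "$(bazel --version)")" = "6.5.0" ] succeeds only when the Bazel found on PATH reports version 6.5.0.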

Prepare XNNPACK.

bash

git clone https://gitee.com/openeuler/ANNC.git
export ANNC="your_path_to_ANNC"

cd $ANNC/annc/service/cpu/xla/libs
bash xnnpack.sh

cd $ANNC/annc/service/cpu/xla/libs/XNNPACK/build
cp libXNNPACK.so /usr/lib64
export XNNPACK_BASE="$ANNC/annc/service/cpu/xla/libs"
export XNNPACK_DIR="$XNNPACK_BASE/XNNPACK"

CPLUS_INCLUDE_PATH+="$ANNC/annc/service/cpu/xla/:"
CPLUS_INCLUDE_PATH+="$ANNC/annc/service/:"
CPLUS_INCLUDE_PATH+="$XNNPACK_DIR/:"
CPLUS_INCLUDE_PATH+="$XNNPACK_DIR/include/:"
CPLUS_INCLUDE_PATH+="$XNNPACK_DIR/src/:"
CPLUS_INCLUDE_PATH+="$XNNPACK_DIR/build/pthreadpool-source/include/:"
export CPLUS_INCLUDE_PATH
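
A mistyped ANNC path or an incomplete XNNPACK build leaves entries in CPLUS_INCLUDE_PATH that point nowhere, which tends to surface later as missing-header errors. The following sketch reports such entries. check_path_list is a hypothetical helper name. It runs in a subshell so that changing IFS does not affect the calling shell.

```shell
# Hypothetical helper: report entries of a colon-separated path list that are
# not existing directories. Empty entries are skipped.
# The ( ... ) body runs in a subshell, so IFS and set -f stay local.
check_path_list() (
    set -f          # do not expand glob characters in entries
    IFS=:
    status=0
    for entry in $1; do
        [ -n "$entry" ] || continue
        if [ ! -d "$entry" ]; then
            echo "not a directory: $entry" >&2
            status=1
        fi
    done
    exit "$status"
)
```

Calling check_path_list "$CPLUS_INCLUDE_PATH" returns a non-zero status if any entry is missing and names it on standard error.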

Build and install ANNC from the source tree cloned in the previous step.

bash

cd $ANNC

bash build.sh

cp bazel-bin/annc/service/cpu/libannc.so /usr/lib64
mkdir -p /usr/include/annc
cp annc/service/cpu/kdnn_rewriter.h /usr/include/annc
cd python
python3 setup.py bdist_wheel
python3 -m pip install dist/*.whl

3 Usage Process

NOTE

Deploy TensorFlow Serving (tf-serving) in advance. The ANNC optimization extensions are then integrated into it through compilation options and code patches, as described in the following sections.

3.1 Graph Fusion with Hand-written Operators

Download a baseline model.

bash

git clone https://gitee.com/openeuler/sra_benchmark.git

Obtain the following target recommendation models from the baseline model library: DeepFM, DFFM, DLRM, and W&D.

Run the following commands to perform graph fusion:

bash

# Install dependencies

python3 -m pip install tensorflow==2.15.1

# Convert the model (DeepFM is used as an example)

annc-opt -I /path/to/model_DeepFM/1730800001/1 -O deepfm_new/1 dnn_sparse linear_sparse
cp -r /path/to/model_DeepFM/1730800001/1/variables deepfm_new/1

After executing the commands above, the output directory deepfm_new/1 should contain the newly generated model file saved_model.pbtxt. Search for KPFusedSparseEmbedding to ensure that the graph fusion operator has been correctly generated.
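
The search described above can be done with grep. This is a minimal sketch: count_fused_ops is a hypothetical helper name, and it counts the lines that mention the operator name, without assuming anything else about the layout of the file.

```shell
# Hypothetical helper: count the lines in a text-format model file that
# mention the fused operator. Prints 0 if the operator is absent.
count_fused_ops() {
    grep -c 'KPFusedSparseEmbedding' "$1"
}
```

For example, count_fused_ops deepfm_new/1/saved_model.pbtxt prints a non-zero number if the fusion operator was generated.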

bash

# Go to the tf-serving directory and create a custom operator folder

cd /path/to/serving
mkdir tensorflow_serving/custom_ops

# Copy the ANNC operator to the directory

cp /usr/include/annc/fused*.cc tensorflow_serving/custom_ops/

Create the operator build file tensorflow_serving/custom_ops/BUILD and write the following content to the file:

ini

package(
    default_visibility = [
        "//visibility:public",
    ],
    licenses = ["notice"],
)

cc_library(
    name = "recom_embedding_ops",
    srcs = [
        "fused_sparse_embedding.cc",
        "fused_linear_embedding_with_hash_bucket.cc",
        "fused_dnn_embedding_with_hash_bucket.cc",
    ],
    alwayslink = 1,
    deps = [
        "@org_tensorflow//tensorflow/core:framework",
    ],
)

bash

# Open tensorflow_serving/model_servers/BUILD, search for SUPPORTED_TENSORFLOW_OPS, and add the following content to register the operator

"//tensorflow_serving/custom_ops:recom_embedding_ops"

After adding the entry, run the following command to rebuild tf-serving. Once the build succeeds, the operator is registered in the server binary.

bash

bazel --output_user_root=./output build -c opt --distdir=./proxy \
   --define tflite_with_xnnpack=false \
   tensorflow_serving/model_servers:tensorflow_model_server
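
If the rebuilt server does not recognize the fused operators, one thing worth checking is whether the dependency was actually saved in tensorflow_serving/model_servers/BUILD. The sketch below performs only that text check. has_custom_op_dep is a hypothetical helper name.

```shell
# Hypothetical helper: succeed if the given BUILD file mentions the
# custom operator target added in the registration step.
has_custom_op_dep() {
    grep -qF '//tensorflow_serving/custom_ops:recom_embedding_ops' "$1"
}
```

For example, has_custom_op_dep tensorflow_serving/model_servers/BUILD || echo "entry not found" reports a missing entry.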

3.2 Enabling Operator Optimization and Graph Optimization

Use the patch script to apply the ANNC patches to the TensorFlow and XLA sources of the built server, then rebuild:

bash

export TF_PATH="$HOME/serving/output/XXX/external/org_tensorflow"
export XLA_PATH="$HOME/serving/output/XXX/external/org_tensorflow/third_party/xla"

# ANNC installed using method 1
cd /usr/include/annc/tfserver/xla

# Modify the first two lines of xla2.sh as follows
TF_PATCH_PATH="$ANNC" 
PATH_OF_PATCHES="$ANNC/xla"
export ANNC_PATH=/usr/include/annc
bash xla2.sh

# ANNC installed using method 2
cd $ANNC/install/tfserver/xla
export ANNC_PATH=$ANNC
bash xla2.sh

# Recompile
bazel --output_user_root=./output build -c opt --distdir=./proxy \
   --define tflite_with_xnnpack=false \
   tensorflow_serving/model_servers:tensorflow_model_server
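
The XXX component in TF_PATH and XLA_PATH stands for a directory that Bazel creates under the directory passed to --output_user_root, and its name differs between workspaces. Rather than typing it by hand, the directory can be located with find. This is a sketch that assumes the layout implied by the paths above, namely output/XXX/external/org_tensorflow. find_org_tensorflow is a hypothetical helper name.

```shell
# Hypothetical helper: print the first external/org_tensorflow directory found
# under a Bazel output user root.
# Usage: find_org_tensorflow [OUTPUT_USER_ROOT]   (defaults to ./output)
find_org_tensorflow() {
    root="${1:-./output}"
    # Expected layout: <root>/<generated name>/external/org_tensorflow
    find "$root" -maxdepth 3 -type d -path '*/external/org_tensorflow' | head -n 1
}
```

With this helper, TF_PATH could be set with export TF_PATH="$(find_org_tensorflow "$HOME/serving/output")", and XLA_PATH derived from it as "$TF_PATH/third_party/xla".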

3.3 Graph Optimization

Set environment variables and enable the optimization feature.

bash

export 'TF_XLA_FLAGS=--tf_xla_auto_jit=2 --tf_xla_cpu_global_jit --tf_xla_min_cluster_size=16'
export OMP_NUM_THREADS=1
export PORT=7004 # Port number
ANNC_FLAGS="--graph-opt" ENABLE_BISHENG_GRAPH_OPT="" ./bazel-bin/tensorflow_serving/model_servers/tensorflow_model_server \
    --port=$PORT --rest_api_port=7005 \
    --model_base_path=/path/to/model_Boss/ \
    --model_name=deepfm \
    --tensorflow_intra_op_parallelism=1 --tensorflow_inter_op_parallelism=-1 \
    --xla_cpu_compilation_enabled=true

3.4 Operator Optimization

Set the ANNC_FLAGS environment variable to enable MatMul offloading and use OpenBLAS optimizations, then start tf-serving with the target model.

bash

export 'TF_XLA_FLAGS=--tf_xla_auto_jit=2 --tf_xla_cpu_global_jit --tf_xla_min_cluster_size=16'
export OMP_NUM_THREADS=1
export PORT=7004 # Port number
ANNC_FLAGS="--gemm-opt" XLA_FLAGS="--xla_cpu_enable_xnnpack=true" ./bazel-bin/tensorflow_serving/model_servers/tensorflow_model_server \
    --port=$PORT --rest_api_port=7005 \
    --model_base_path=/path/to/model_DeepFM/1730800001/ \
    --model_name=deepfm \
    --tensorflow_intra_op_parallelism=1 --tensorflow_inter_op_parallelism=-1 \
    --xla_cpu_compilation_enabled=true
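
Once the server is running, the port given by --rest_api_port can be used to confirm that the model has loaded. TensorFlow Serving reports model status over REST at the path /v1/models/<model name>. The sketch below only builds that URL. model_status_url is a hypothetical helper name, and its defaults (localhost, 7005, deepfm) are taken from the command above.

```shell
# Hypothetical helper: build the TensorFlow Serving model status URL.
# Usage: model_status_url [HOST] [REST_PORT] [MODEL_NAME]
model_status_url() {
    host="${1:-localhost}"
    rest_port="${2:-7005}"
    model="${3:-deepfm}"
    printf 'http://%s:%s/v1/models/%s\n' "$host" "$rest_port" "$model"
}
```

For example, curl -s "$(model_status_url)" queries the status of the deepfm model on the local server.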

3.5 Remapper (TensorFlow) Graph Fusion Optimization

This feature is developed based on the native TensorFlow framework and invokes the ANNC graph optimizer in the Remapper optimizer to perform graph fusion optimization.

Step 1: Download TensorFlow 2.15

bash

git clone https://gitee.com/mirrors/tensorflow.git -b v2.15.0

Step 2: Apply the ANNC patch and fusion operators

bash

export ANNC="your_path_to_ANNC"

cd tensorflow
patch -p1 < $ANNC/annc/tensorflow/tf_annc_optimizer.patch
cp -r $ANNC/annc/tensorflow/graph_optimizer ./tensorflow/core/grappler/optimizers/
cp $ANNC/annc/tensorflow/kernels/* ./tensorflow/core/kernels/
cp $ANNC/annc/tensorflow/ops/* ./tensorflow/core/ops/
cp $ANNC/annc/tensorflow/api_def/* ./tensorflow/core/api_def/python_api/
cp $ANNC/annc/tensorflow/api_def/* ./tensorflow/core/api_def/base_api/

Step 3: Compile TensorFlow

bash

bazel build --config=v2 --config=xla --config=noaws --distdir=./proxy //tensorflow:tensorflow_cc

cd ./bazel-bin/tensorflow

ln -s libtensorflow_framework.so.2.15.0 libtensorflow_framework.so.2
ln -s libtensorflow_framework.so.2 libtensorflow_framework.so
ln -s libtensorflow_cc.so.2.15.0 libtensorflow_cc.so.2
ln -s libtensorflow_cc.so.2 libtensorflow_cc.so

# Configure environment variables
export LD_LIBRARY_PATH=/path_to_tensorflow/bazel-bin/tensorflow
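
The four ln -s commands above fail if the links already exist, for example after a rebuild. A repeatable variant is sketched below. link_tf_libs is a hypothetical helper name, and it assumes the library file names follow the pattern shown above (lib name, then .so, then the full version).

```shell
# Hypothetical helper: create or refresh the versioned symlink chain
#   <lib>.so -> <lib>.so.<major> -> <lib>.so.<version>
# Usage: link_tf_libs DIR VERSION     e.g. link_tf_libs . 2.15.0
link_tf_libs() {
    dir="$1"
    ver="$2"
    major="${ver%%.*}"      # 2.15.0 -> 2
    for lib in libtensorflow_framework libtensorflow_cc; do
        # -f replaces an existing link; -n avoids following an existing link
        ln -sfn "$lib.so.$ver" "$dir/$lib.so.$major"
        ln -sfn "$lib.so.$major" "$dir/$lib.so"
    done
}
```

Running link_tf_libs ./bazel-bin/tensorflow 2.15.0 from the TensorFlow source directory produces the same links as the commands above and can be repeated safely.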

Step 4: Enable graph fusion optimization

bash

# When using TensorFlow for inference, enable the optimization option
export ANNC_FUASED_ALL=1
