Version: 24.03 LTS SP3

ANNC User Manual

1 Introduction

Accelerated Neural Network Compiler (ANNC) speeds up neural network computing. It improves model inference for recommendation systems and foundation models by optimizing computation graphs, fusing and integrating high-performance operators, and generating efficient code. In addition, ANNC works with popular open-source inference frameworks.

2 Installing and Building ANNC

2.1 Out-of-the-Box Installation (via EUR)

bash
wget https://eur.openeuler.openatom.cn/results/lesleyzheng1103/ANNC/openeuler-22.03_LTS_SP4-aarch64/00110327-ANNC/ANNC-0.0.2-3.aarch64.rpm

#  Install the package to the / directory
rpm -ivh ANNC-0.0.2-3.aarch64.rpm
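
The package above is built for aarch64, so it can be worth comparing the package architecture with the host before running rpm. The following is a minimal sketch, assuming the conventional `<name>-<version>-<release>.<arch>.rpm` file naming; the helper name `rpm_arch` is hypothetical and not part of ANNC.

```shell
# Hypothetical helper: extract the architecture field from an RPM file name
# of the conventional form <name>-<version>-<release>.<arch>.rpm
rpm_arch() {
    base=${1##*/}                 # drop any leading directory
    base=${base%.rpm}             # drop the .rpm suffix
    printf '%s\n' "${base##*.}"   # keep the text after the last dot
}

pkg=ANNC-0.0.2-3.aarch64.rpm
if [ "$(rpm_arch "$pkg")" = "$(uname -m)" ]; then
    echo "architecture matches the host: $(uname -m)"
else
    echo "package is built for $(rpm_arch "$pkg"), host is $(uname -m)"
fi
```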

2.2 Build and Installation Using an RPM Package

  1. As the root user, install rpmbuild and rpmdevtools:

    bash
    # Install rpmbuild
    yum install dnf-plugins-core rpm-build
    # Install rpmdevtools
    yum install rpmdevtools
  2. Create the rpmbuild folder in the /root directory.

    bash
    rpmdev-setuptree
    # Check the automatically generated directory structure
    ls ~/rpmbuild/
    BUILD  BUILDROOT  RPMS  SOURCES  SPECS  SRPMS
  3. Run git clone -b master https://gitee.com/src-openeuler/ANNC.git to pull the master branch of the target repository, then copy the source files into the corresponding rpmbuild folders.

    shell
    cp ANNC/*.tar.gz* ~/rpmbuild/SOURCES
    cp ANNC/*.patch ~/rpmbuild/SOURCES/
    cp ANNC/ANNC.spec ~/rpmbuild/SPECS/
  4. Build and install the ANNC RPM package:

    bash
    # Install ANNC dependencies
    yum-builddep ~/rpmbuild/SPECS/ANNC.spec
    # Build the ANNC RPM package
    # If check-rpaths errors are reported, add QA_RPATHS=0x0002 before rpmbuild as follows
    # QA_RPATHS=0x0002 rpmbuild -ba ~/rpmbuild/SPECS/ANNC.spec
    rpmbuild -ba ~/rpmbuild/SPECS/ANNC.spec
    # Install the RPM package
    cd ~/rpmbuild/RPMS/<arch>
    rpm -ivh ANNC-<version>-<release>.<arch>.rpm

    Note: If older RPMs already installed on your system cause file conflicts, resolve them with either of the following methods:

    bash
    # Method 1: Install the new version forcibly
    rpm -ivh ANNC-<version>-<release>.<arch>.rpm --force
    # Method 2: Update the installation package
    rpm -Uvh ANNC-<version>-<release>.<arch>.rpm
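
The choice between a plain install and an update can also be made automatically. The sketch below follows Method 2 (update) whenever a package is already installed. The helper name `rpm_flags_for` is hypothetical, and the sketch assumes the installed package is named ANNC.

```shell
# Hypothetical helper: print the rpm flags to use, depending on whether
# a package named $1 is already installed (rpm -q exits 0 if it is).
rpm_flags_for() {
    if rpm -q "$1" >/dev/null 2>&1; then
        echo "-Uvh"   # an older version exists: update it
    else
        echo "-ivh"   # nothing installed yet: plain install
    fi
}

# Usage (placeholders as in the commands above):
# rpm $(rpm_flags_for ANNC) ANNC-<version>-<release>.<arch>.rpm
```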

2.3 Build and Installation Using Source Code

Obtain ANNC source code from https://gitee.com/openeuler/ANNC.

Make sure that the following dependencies are installed:

shell
yum install -y gcc gcc-c++ bzip2 python3-devel python3-numpy python3-setuptools python3-wheel libstdc++-static java-11-openjdk java-11-openjdk-devel make

Download bazel-6.5.0 from https://releases.bazel.build/6.5.0/release/bazel-6.5.0-dist.zip, and install Bazel.

bash
unzip bazel-6.5.0-dist.zip -d bazel-6.5.0
cd bazel-6.5.0
env EXTRA_BAZEL_ARGS="--tool_java_runtime_version=local_jdk" bash ./compile.sh

export PATH=/path/to/bazel-6.5.0/output:$PATH
bazel --version
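
Because the later build steps expect Bazel 6.5.0, a script can check the reported version before continuing. This is a minimal sketch with a hypothetical helper name, `bazel_version_is`. It matches on a prefix because a locally bootstrapped binary may append a suffix to the version string, so it would also accept a longer version that merely starts with the expected one.

```shell
# Hypothetical helper: succeed if the given `bazel --version` output
# starts with "bazel <expected version>"
bazel_version_is() {
    # $1: expected version, $2: output of `bazel --version`
    case "$2" in
        "bazel $1"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage:
# bazel_version_is 6.5.0 "$(bazel --version)" || echo "unexpected Bazel version" >&2
```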

Prepare XNNPACK.

bash
git clone https://gitee.com/openeuler/ANNC.git
export ANNC="your_path_to_ANNC"

cd $ANNC/annc/service/cpu/xla/libs
bash xnnpack.sh

cd $ANNC/annc/service/cpu/xla/libs/XNNPACK/build
cp libXNNPACK.so /usr/lib64
export XNNPACK_BASE="$ANNC/annc/service/cpu/xla/libs"
export XNNPACK_DIR="$XNNPACK_BASE/XNNPACK"

CPLUS_INCLUDE_PATH+="$ANNC/annc/service/cpu/xla/:"
CPLUS_INCLUDE_PATH+="$ANNC/annc/service/:"
CPLUS_INCLUDE_PATH+="$XNNPACK_DIR/:"
CPLUS_INCLUDE_PATH+="$XNNPACK_DIR/include/:"
CPLUS_INCLUDE_PATH+="$XNNPACK_DIR/src/:"
CPLUS_INCLUDE_PATH+="$XNNPACK_DIR/build/pthreadpool-source/include/:"
export CPLUS_INCLUDE_PATH
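
A mistyped entry in CPLUS_INCLUDE_PATH usually only surfaces later as a missing-header error, so it can help to list the entries that do not exist before building. The following is a minimal sketch; the helper name `missing_dirs` is hypothetical.

```shell
# Hypothetical helper: print every entry of a colon-separated path list
# that is not an existing directory. Empty entries are skipped.
missing_dirs() (
    set -f          # no glob expansion while splitting
    IFS=':'
    for dir in $1; do
        [ -z "$dir" ] && continue
        [ -d "$dir" ] || printf '%s\n' "$dir"
    done
)

# Usage: report include directories that do not exist
missing_dirs "${CPLUS_INCLUDE_PATH:-}"
```

The function body runs in a subshell, so the changes to IFS and globbing do not leak into the calling shell.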

With the ANNC source code in place, build and install ANNC.

bash
cd $ANNC

bash build.sh

cp bazel-bin/annc/service/cpu/libannc.so /usr/lib64
mkdir -p /usr/include/annc
cp annc/service/cpu/kdnn_rewriter.h /usr/include/annc
cd python
python3 setup.py bdist_wheel
python3 -m pip install dist/*.whl

3 Usage Process

NOTE

Deploy TensorFlow Serving (tf-serving) in advance, and integrate the ANNC optimization extension kit into it through compilation options and code patches.

3.1 Graph Fusion with Hand-written Operators

Download a baseline model.

bash
git clone https://gitee.com/openeuler/sra_benchmark.git

Obtain the following target recommendation models from the baseline model library: DeepFM, DFFM, DLRM, and W&D.

Run the following commands to perform graph fusion:

bash
# Install dependencies

python3 -m pip install tensorflow==2.15.1

# Convert the model (DeepFM is used as an example)

annc-opt -I /path/to/model_DeepFM/1730800001/1 -O deepfm_new/1 dnn_sparse linear_sparse
cp -r /path/to/model_DeepFM/1730800001/1/variables deepfm_new/1

After executing the commands above, the output directory deepfm_new/1 should contain the newly generated model file saved_model.pbtxt. Search for KPFusedSparseEmbedding to ensure that the graph fusion operator has been correctly generated.
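
The search described above can be done with grep. This is a minimal sketch; the helper name `has_fused_op` is hypothetical, and the path in the usage line is the output directory from the example command.

```shell
# Hypothetical helper: succeed if the exported graph mentions the fused operator
has_fused_op() {
    # $1: path to saved_model.pbtxt
    grep -q 'KPFusedSparseEmbedding' "$1"
}

# Usage:
# has_fused_op deepfm_new/1/saved_model.pbtxt && echo "fusion operator found"
```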

Register the open-source operator library provided by ANNC with tf-serving.

bash
# Go to the tf-serving directory and create a custom operator folder

cd /path/to/serving
mkdir tensorflow_serving/custom_ops

# Copy the ANNC operator to the directory

cp /usr/include/annc/fused*.cc tensorflow_serving/custom_ops/

Create the operator build file tensorflow_serving/custom_ops/BUILD and write the following content to the file:

ini
package(
    default_visibility = [
        "//visibility:public",
    ],
    licenses = ["notice"],
)

cc_library(
    name = "recom_embedding_ops",
    srcs = [
        "fused_sparse_embedding.cc",
        "fused_linear_embedding_with_hash_bucket.cc",
        "fused_dnn_embedding_with_hash_bucket.cc",
    ],
    alwayslink = 1,
    deps = [
        "@org_tensorflow//tensorflow/core:framework",
    ],
)

bash
# Open tensorflow_serving/model_servers/BUILD, search for SUPPORTED_TENSORFLOW_OPS, and add the following content to register the operator

"//tensorflow_serving/custom_ops:recom_embedding_ops"

After adding the operator library, run the following command to rebuild tf-serving. A successful build indicates that the operators have been registered.

bash
bazel --output_user_root=./output build -c opt --distdir=./proxy \
   --define tflite_with_xnnpack=false \
   tensorflow_serving/model_servers:tensorflow_model_server

3.2 Enabling Operator Optimization and Graph Optimization

Use the patch script to apply the ANNC patches to the TensorFlow and XLA source trees of the built server:

bash
export TF_PATH="$HOME/serving/output/XXX/external/org_tensorflow"
export XLA_PATH="$HOME/serving/output/XXX/external/org_tensorflow/third_party/xla"

# Method 1: ANNC installed from an RPM package
cd /usr/include/annc/tfserver/xla

# Modify the first two lines of xla2.sh as follows
TF_PATCH_PATH="$ANNC" 
PATH_OF_PATCHES="$ANNC/xla"
export ANNC_PATH=/usr/include/annc
bash xla2.sh

# Method 2: ANNC built and installed from source code
cd $ANNC/install/tfserver/xla
export ANNC_PATH=$ANNC
bash xla2.sh

# Recompile
bazel --output_user_root=./output build -c opt --distdir=./proxy \
   --define tflite_with_xnnpack=false \
   tensorflow_serving/model_servers:tensorflow_model_server

3.3 Graph Optimization

Set environment variables and enable the optimization feature.

bash
export 'TF_XLA_FLAGS=--tf_xla_auto_jit=2 --tf_xla_cpu_global_jit --tf_xla_min_cluster_size=16'
export OMP_NUM_THREADS=1
export PORT=7004 # Port number
ANNC_FLAGS="--graph-opt" ENABLE_BISHENG_GRAPH_OPT="" ./bazel-bin/tensorflow_serving/model_servers/tensorflow_model_server \
    --port=$PORT --rest_api_port=7005 \
    --model_base_path=/path/to/model_Boss/ \
    --model_name=deepfm \
    --tensorflow_intra_op_parallelism=1 --tensorflow_inter_op_parallelism=-1 \
    --xla_cpu_compilation_enabled=true

3.4 Operator Optimization

Configure the environment variable ANNC_FLAGS to enable MatMul offloading and leverage OpenBLAS optimizations. Then start TF-Serving and specify the target model.

bash
export 'TF_XLA_FLAGS=--tf_xla_auto_jit=2 --tf_xla_cpu_global_jit --tf_xla_min_cluster_size=16'
export OMP_NUM_THREADS=1
export PORT=7004 # Port number
ANNC_FLAGS="--gemm-opt"  XLA_FLAGS="--xla_cpu_enable_xnnpack=true" ./bazel-bin/tensorflow_serving/model_servers/tensorflow_model_server \
    --port=$PORT --rest_api_port=7005 \
    --model_base_path=/path/to/model_DeepFM/1730800001/ \
    --model_name=deepfm \
    --tensorflow_intra_op_parallelism=1 --tensorflow_inter_op_parallelism=-1 \
    --xla_cpu_compilation_enabled=true
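
Once the server is running, the model status endpoint of the TensorFlow Serving REST API can confirm that the model has loaded. The sketch below only builds the URL; the helper name `model_status_url` is hypothetical, and the host `localhost` in the usage line is an assumption, while the REST port and model name come from the command above.

```shell
# Hypothetical helper: build the TensorFlow Serving model-status URL
model_status_url() {
    # $1: host, $2: REST API port, $3: model name
    printf 'http://%s:%s/v1/models/%s\n' "$1" "$2" "$3"
}

# Usage, with the REST port and model name from the command above:
# curl -s "$(model_status_url localhost 7005 deepfm)"
```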

3.5 Remapper (TensorFlow) Graph Fusion Optimization

This feature builds on the native TensorFlow framework: the Remapper optimizer invokes the ANNC graph optimizer to perform graph fusion.

Step 1: Download TensorFlow 2.15

bash
git clone https://gitee.com/mirrors/tensorflow.git -b v2.15.0

Step 2: Apply the ANNC patch and fusion operators

bash
export ANNC="your_path_to_ANNC"

cd tensorflow
patch -p1 < $ANNC/annc/tensorflow/tf_annc_optimizer.patch
cp -r $ANNC/annc/tensorflow/graph_optimizer ./tensorflow/core/grappler/optimizers/
cp $ANNC/annc/tensorflow/kernels/* ./tensorflow/core/kernels/
cp $ANNC/annc/tensorflow/ops/* ./tensorflow/core/ops/
cp $ANNC/annc/tensorflow/api_def/* ./tensorflow/core/api_def/python_api/
cp $ANNC/annc/tensorflow/api_def/* ./tensorflow/core/api_def/base_api/

Step 3: Compile TensorFlow

bash
bazel build --config=v2 --config=xla --config=noaws --distdir=./proxy //tensorflow:tensorflow_cc

cd ./bazel-bin/tensorflow

ln -s libtensorflow_framework.so.2.15.0 libtensorflow_framework.so.2
ln -s libtensorflow_framework.so.2 libtensorflow_framework.so
ln -s libtensorflow_cc.so.2.15.0 libtensorflow_cc.so.2
ln -s libtensorflow_cc.so.2 libtensorflow_cc.so

# Configure environment variables
export LD_LIBRARY_PATH=/path_to_tensorflow/bazel-bin/tensorflow

Step 4: Enable graph fusion optimization

bash
# Enable the optimization option before running TensorFlow inference
export ANNC_FUASED_ALL=1