AI4C User Guide

1 Introduction to AI4C

AI4C is an AI-assisted compiler suite: a framework that enables compilers to integrate machine learning-driven compilation optimization.

2 Software Architecture

This framework consists of the following modules. The automatic compilation tuning tool depends on the Python environment.

The inference engine for AI-assisted compilation optimization drives the compiler to use the results obtained from AI model inference in the optimization pass to implement compilation optimization.
- Currently, the AI-enabled optimization pass in GCC is implemented in the form of compiler plugins and is decoupled from the main version of the compiler.
The automatic compilation tuning tool uses the external tuning tool (OpenTuner) to drive the compiler to perform multi-layer automatic compilation tuning. Currently, the GCC and LLVM compilers are supported.
- Option tuning tool, which is used to optimize application-level compilation options.
- Compilation tuning tool, which is implemented based on Autotuner and can implement fine-grained and coarse-grained compilation tuning.
  - Fine-grained tuning: tunes key optimization parameters in the optimization pass, for example, the number of times that a loop is unrolled (unroll count).
  - Coarse-grained tuning: tunes function-level compilation options.

Future planning:

[ ] Integrate the LLVM compilation optimization model of ACPO and extract the related code of ACPO LLVM into a plugin to decouple it from the main version of LLVM.
[ ] Enable the AI4C framework to support inference of more open-source machine learning frameworks (PyTorch-LibTorch and TensorFlow-LiteRT).
[ ] Provide more AI-assisted compilation optimization models and corresponding compiler plugins.
[ ] Integrate new search algorithms (based on white-box information) and optimize the parameter search space (tuning of hotspot functions).
[ ] Support tuning of JDK compilation parameters.

3 Installation and Build of AI4C

3.1 Directly Installing AI4C

If you are using the latest openEuler system (24.03-LTS-SP2) and only want to use the existing features of AI4C, you can directly install the AI4C package.

shell

yum install -y AI4C

To use the AI4C features of another version, or to install AI4C on another OS, you need to rebuild AI4C using one of the following methods.

3.2 (Recommended) RPM Package Build and Installation Process

As the root user, install rpmbuild and rpmdevtools:

bash

# Install rpmbuild:
yum install dnf-plugins-core rpm-build
# Install rpmdevtools:
yum install rpmdevtools

Generate the rpmbuild folder in the /root directory:

bash

rpmdev-setuptree
# Check the automatically generated directory structure:
ls ~/rpmbuild/
BUILD  BUILDROOT  RPMS  SOURCES  SPECS  SRPMS

Run the git clone https://atomgit.com/src-openeuler/AI4C.git command to pull code from the openEuler-24.03-LTS-SP2 branch of the target repository and save the target files to the corresponding folder in rpmbuild.
shell
```
cp AI4C/AI4C-v<version>-alpha.tar.gz ~/rpmbuild/SOURCES/
cp AI4C/*.patch ~/rpmbuild/SOURCES/
cp AI4C/AI4C.spec ~/rpmbuild/SPECS/
```
You can perform the following steps to generate the RPM package of AI4C:
shell
```
# Install the dependencies required by AI4C.
yum-builddep ~/rpmbuild/SPECS/AI4C.spec
# Build the AI4C RPM package.
# If check-rpaths errors are reported, add QA_RPATHS=0x0002 before rpmbuild as follows:
# QA_RPATHS=0x0002 rpmbuild -ba ~/rpmbuild/SPECS/AI4C.spec
rpmbuild -ba ~/rpmbuild/SPECS/AI4C.spec
# Install the RPM package.
cd ~/rpmbuild/RPMS/<arch>
rpm -ivh AI4C-<version>-<release>.<arch>.rpm
```
Note: If file conflicts arise from older RPMs already installed on your system, address them with the following methods:
shell
```
# Method 1: Install the new version forcibly.
rpm -ivh AI4C-<version>-<release>.<arch>.rpm --force
# Method 2: Update the installation package.
rpm -Uvh AI4C-<version>-<release>.<arch>.rpm
```
After the installation is complete, the following files are installed in the system:
- /usr/bin/ai4c-*: wrapper of the AI-enabled compiler and automatic tuning tool
- /usr/lib64/libonnxruntime.so: dynamic library of the ONNX Runtime inference framework
- /usr/lib64/AI4C/*.onnx: AI-assisted compilation optimization model (in ONNX format)
- /usr/lib64/python<version>/site-packages/ai4c/lib/*.so:
  - Dynamic library of the inference engine for AI-assisted compilation optimization
  - Dynamic library of the compiler plugin for AI-assisted compilation optimization and compiler tuning
- /usr/lib64/python<version>/site-packages/ai4c/autotuner/*: files related to the coarse-grained and fine-grained tuning tools
- /usr/lib64/python<version>/site-packages/ai4c/optimizer/*: files related to AI-assisted compilation optimization
- /usr/lib64/python<version>/site-packages/ai4c/option_tuner/*: files related to application-level compilation option tuning

3.3 Source Code Build and Installation Process

The source code address of AI4C is https://atomgit.com/openeuler/AI4C.

3.3.1 Installing the ONNX Runtime Dependency

Solution 1:

Download version 1.16.3 from GitHub and decompress the .tgz file of the corresponding architecture. For example, in the AArch64 architecture, download onnxruntime-linux-aarch64-1.16.3.tgz.

Address: https://github.com/microsoft/onnxruntime/releases/tag/v1.16.3

Note: After the .tgz file is decompressed, the dynamic library libonnxruntime.so is stored in the lib directory. To build the AI4C framework, you need to rename the lib directory to lib64. Otherwise, the build may fail with an error indicating that -lonnxruntime cannot be found.
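
The rename described in the note can be scripted. The following is a minimal sketch, assuming you pass in the directory to which the archive was decompressed. The function name `ensure_lib64` is hypothetical and is not part of AI4C or ONNX Runtime.

```python
from pathlib import Path


def ensure_lib64(onnxruntime_root: str) -> Path:
    """Rename <root>/lib to <root>/lib64 if needed and return the lib64 path."""
    root = Path(onnxruntime_root)
    lib, lib64 = root / "lib", root / "lib64"
    if lib64.is_dir():
        return lib64  # layout already matches what the AI4C build expects
    if not lib.is_dir():
        raise FileNotFoundError(f"neither {lib} nor {lib64} exists")
    lib.rename(lib64)
    return lib64
```

Calling the function a second time is harmless, because it returns early when lib64 already exists.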

Solution 2:

Ensure that the following ONNX Runtime dependency packages have been installed:

shell

yum install -y cmake make gcc gcc-c++ abseil-cpp-devel boost-devel bzip2 python3-devel python3-numpy python3-setuptools python3-pip

Use CMake to install ONNX Runtime.

shell

cd path/to/your/AI4C/third_party/onnxruntime
cmake \
    -DCMAKE_INSTALL_PREFIX=path/to/your/onnxruntime \
    -Donnxruntime_BUILD_SHARED_LIB=ON \
    -Donnxruntime_BUILD_UNIT_TESTS=ON \
    -Donnxruntime_INSTALL_UNIT_TESTS=OFF \
    -Donnxruntime_BUILD_BENCHMARKS=OFF \
    -Donnxruntime_USE_FULL_PROTOBUF=ON \
    -DPYTHON_VERSION="$(python3 -c 'import sys; print("%d.%d" % sys.version_info[:2])')" \
    -Donnxruntime_ENABLE_CPUINFO=ON \
    -Donnxruntime_DISABLE_ABSEIL=ON \
    -Donnxruntime_USE_NEURAL_SPEED=OFF \
    -Donnxruntime_ENABLE_PYTHON=OFF \
    -DCMAKE_BUILD_TYPE=Release \
    -S cmake
make -j "$(nproc)" && make install

3.3.2 Installing Other Build Dependencies of AI4C

Check that the following dependencies have been installed:

shell

yum install -y python3-wheel openssl openssl-devel yaml-cpp yaml-cpp-devel gcc-plugin-devel libstdc++-static

3.3.3 Building the AI4C Framework

shell

cd path/to/your/AI4C/python
python3 setup.py bdist_wheel                       \
    -Donnxruntime_ROOTDIR=path/to/your/onnxruntime \
    -DCMAKE_BUILD_TYPE=Release                     \
    -DCMAKE_CXX_COMPILER=path/to/your/g++          \
    -DCMAKE_C_COMPILER=path/to/your/gcc
pip3 install dist/ai4c-<version>-<python_version>-<python_version>-<os>_<arch>.whl --force-reinstall --no-deps

After the installation is complete, the following files exist in the system:

- path/to/your/pythonbin/ai4c-*: wrapper of the AI-enabled compiler and auto-tuning tool
- path/to/your/onnxruntime/lib64/libonnxruntime.so: dynamic library of the ONNX Runtime inference framework
- path/to/your/AI4C/models/*.onnx: AI-assisted compilation optimization model (in ONNX format)
- path/to/your/pythonlib/ai4c/lib/*.so:
  - Dynamic library of the inference engine for AI-assisted compilation optimization
  - Dynamic library of the compiler plugin for AI-assisted compilation optimization and compilation tuning
- path/to/your/pythonlib/ai4c/autotuner/*: files related to coarse-grained and fine-grained tuning tools
- path/to/your/pythonlib/ai4c/optimizer/*: files related to AI-assisted compilation optimization
- path/to/your/pythonlib/ai4c/option_tuner/*: files related to application-level compilation option tuning

Notes:

path/to/your/pythonbin: After the installation is complete, you can run the which ai4c-gcc command to view the path of the bin directory.
path/to/your/pythonlib: After the installation is complete, you can run the pip show ai4c command to view the path of the lib directory, namely, Location in the command output.
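
If you prefer to resolve both paths from a script, the lookups in the notes above can be approximated with the Python standard library. This is a sketch only: the helper names are hypothetical, and it assumes that the `ai4c` package is importable by the same Python interpreter that runs the script.

```python
import importlib.util
import shutil
from pathlib import Path
from typing import Optional


def find_pythonbin(tool: str = "ai4c-gcc") -> Optional[Path]:
    """Directory holding the wrapper, like `which ai4c-gcc`; None if not on PATH."""
    found = shutil.which(tool)
    return Path(found).parent if found else None


def find_pythonlib(package: str = "ai4c") -> Optional[Path]:
    """Directory that contains the package directory (the pip 'Location' value)."""
    spec = importlib.util.find_spec(package)
    if spec is None or spec.origin is None:
        return None  # package is not installed for this interpreter
    # origin is <pythonlib>/<package>/__init__.py, so go up two levels.
    return Path(spec.origin).parent.parent
```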

4 Usage Process

4.1 AI-Assisted Compilation Optimization

The current AI-assisted compilation optimization module consists of three parts:

- ONNX model, which is the trained model for AI-assisted compilation optimization.
- Compiler plugin (only GCC is supported currently), which is used to run the ONNX model inference and obtain tuning parameters.
- AI4Compiler framework, which provides the ONNX inference engine and GCC optimization compile commands.

You can train an AI model in advance using an open-source machine learning framework and export the model in ONNX format. In addition, provide a compiler plugin for the AI model. The plugin must contain at least three modules that have the following functions:

- Extracts the compiler input features required by the AI model.
- Drives the inference engine to call the AI model to perform inference.
- Labels the data structure of the inference result returned to the compiler.

In the following test cases, you only need to add three plugin-related options to the command that compiles the target binary: the plugin path, the path of the AI model used by the plugin, and the inference engine path. With these options, the compiler applies the model's AI-assisted optimization during compilation.

shell

# If onnxruntime is installed in a non-system folder, set the environment variable.
# export LD_LIBRARY_PATH=path/to/your/onnxruntime/lib64/:$LD_LIBRARY_PATH

gcc_compiler=path/to/your/gcc
infer_engine_path=$(ai4c-gcc --inference-engine)
model_path=path/to/your/model.onnx
plugin_path=path/to/your/<model_plugin>.so

$gcc_compiler test.c -O2 -o test                            \
    -fplugin=$plugin_path                                   \
    -fplugin-arg-<model_plugin>-model=$model_path           \
    -fplugin-arg-<model_plugin>-engine=$infer_engine_path

Currently, the supported plugins are stored in the same directory as $(ai4c-gcc --inference-engine), and the supported models are stored in path/to/your/AI4C/models.
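
When the compile command is assembled by a build script, the three options can be derived from the three paths. The sketch below relies on the GCC convention that a plugin is named after the base name of its shared object without the `.so` suffix. The function name `plugin_flags` is hypothetical, and the `model` and `engine` keys follow the generic example above.

```python
from pathlib import Path
from typing import List


def plugin_flags(plugin_path: str, model_path: str, engine_path: str) -> List[str]:
    """Return the three GCC options that enable one AI-assisted plugin."""
    # GCC identifies a plugin by the .so base name without its extension.
    name = Path(plugin_path).name
    if name.endswith(".so"):
        name = name[: -len(".so")]
    return [
        f"-fplugin={plugin_path}",
        f"-fplugin-arg-{name}-model={model_path}",
        f"-fplugin-arg-{name}-engine={engine_path}",
    ]
```

The returned list can be appended to an argument list such as `["gcc", "test.c", "-O2", "-o", "test"]` before it is passed to `subprocess.run`.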

Notes:

- The compiler used to build the AI model plugin must be the same as the compiler used to build the target application. Otherwise, compilation errors will occur due to inconsistent compiler versions.
- Currently, AI4C supports only AI-assisted compilation optimization passes that are implemented as plugins in the cc1 phase of the GCC compiler.

For details about the compiler plugin development and usage processes, see the AI-assisted compilation optimization manual and test cases.

The following provides two examples of using AI-assisted compilation optimization models in different compilation phases. The loop unrolling and function inlining model is located in the cc1 compilation optimization phase, and the AI model adaptation and inference are implemented in the form of GCC plugins. The BOLT sampling basic block precision correction model is located in the BOLT post-link optimization phase, and the model adaptation layer is located in the LLVM-BOLT repository.

4.1.1 Loop Unrolling and Function Inlining Model

The compilation optimization options corresponding to the loop unrolling and function inlining model are as follows:

| Option Name | Description |
| --- | --- |
| `-fplugin` | Specifies the absolute path of the loop unrolling and function inlining plugin (`-fplugin=/path/to/<ipa_inline_unroll_plugin>.so`). |
| `-fplugin-arg-<ipa_inline_unroll_plugin>-engine` | Specifies the absolute path of the inference engine used for ONNX model inference (`-fplugin-arg-<ipa_inline_unroll_plugin>-engine=/path/to/inference_engine.so`), which must be enabled together with `-fplugin`. You can obtain the path of `/path/to/inference_engine.so` using `ai4c-gcc --inference-engine`. |
| `-fplugin-arg-<ipa_inline_unroll_plugin>-inline_model` | Specifies the absolute path of the ONNX model for function inlining (`-fplugin-arg-<ipa_inline_unroll_plugin>-inline_model=/path/to/inline_model.onnx`), which must be enabled together with `-fplugin` and `-fplugin-arg-<ipa_inline_unroll_plugin>-engine`. |
| `-fplugin-arg-<ipa_inline_unroll_plugin>-unroll_model` | Specifies the absolute path of the ONNX model for loop unrolling (`-fplugin-arg-<ipa_inline_unroll_plugin>-unroll_model=/path/to/unroll_model.onnx`), which must be enabled together with `-fplugin` and `-fplugin-arg-<ipa_inline_unroll_plugin>-engine`. |

You can enable multiple AI-assisted compilation and optimization models in a GCC plugin at the same time. For example:

shell

gxx_compiler=path/to/your/g++
infer_engine_path=$(ai4c-gcc --inference-engine)
inline_model_path=path/to/your/inline_model.onnx
unroll_model_path=path/to/your/unroll_model.onnx
plugin_path=path/to/your/<ipa_inline_unroll_plugin>.so

$gxx_compiler test.cc -O3 -o test -funroll-loops                           \
    -fplugin=$plugin_path                                                  \
    -fplugin-arg-<ipa_inline_unroll_plugin>-engine=$infer_engine_path        \
    -fplugin-arg-<ipa_inline_unroll_plugin>-inline_model=$inline_model_path  \
    -fplugin-arg-<ipa_inline_unroll_plugin>-unroll_model=$unroll_model_path

4.1.2 Basic Block Accuracy Correction Model for BOLT Sampling

The BOLT optimization options corresponding to the basic block accuracy correction model for BOLT sampling are as follows:

| Option Name | Description |
| --- | --- |
| `-block-correction` | Enables the AI-based CFG BB count optimization. This option must be enabled together with the `-model-path` option to specify the ONNX model. |
| `-model-path` | Specifies the absolute path of the ONNX model (`-model-path=/path/to/model.onnx`). This option must be enabled together with the `-block-correction` option. |
| `-annotate-threshold` | Confidence threshold of the model prediction result. The default value is 0.95. |

The custom optimization options in BOLT can be enabled by using the GCC -fbolt-option option. For example:

shell

g++ -fbolt-use=<gcov_file> -fbolt-target=<bin_file> -fbolt-option="-block-correction -model-path=path/to/your/block_correction_model.onnx"

4.2 Fine-Grained Tuning

Here, we use the fine-grained tuning of the loop unrolling optimization pass in GCC as an example to describe the usage process of the tuning tool.

The current fine-grained tuning module consists of two parts:

- Tuning configuration file (.ini) of the application: processes the compilation and execution of the application.
- Search space configuration file (YAML): configures the parameter search space in the Autotuner phase, which can be used to replace the default parameter search space.

The current fine-grained tuning is implemented based on Autotuner.

- In the generate phase of the compiler, a group of tunable compilation data structures and tunable coefficient sets are generated and saved in opp/*.yaml.
- Based on the provided compilation parameter search space (search_space.yaml) and tunable data structures, the Autotuner generates a group of tuning coefficients for each tunable data structure using the tuning algorithm, and saves the coefficients in input.yaml.
- In the autotune phase of the compiler, the tuning coefficients are applied to the corresponding data structures based on the hash value of each data structure in input.yaml.

Before enabling fine-grained tuning, install the following dependencies:

shell

yum install -y BiSheng-Autotuner bisheng-opentuner

In the following test case, we will tune the loop unrolling parameters of CoreMark. First, we will prepare the tuning configuration file coremark_sample.ini for CoreMark. The user needs to:

- Provide the application path and the commands for building and running the application.
- Add the dynamic library for fine-grained tuning, -fplugin=%(PluginPath)s/rtl_unroll_autotune_plugin_gcc12.so, to the basic compile command.
  - In the generate and autotune phases, add the corresponding input file of -fplugin-arg-rtl_unroll_autotune_plugin_gcc12-<stage>.
- You can customize the paths of the configuration file for tuning structures (./opp/*.yaml) and the input file generated by the autotuner (input.yaml).

ini

[DEFAULT] # optional
# PluginPath = /path/to/gcc-plugins

[Environment Setting]  # optional
# prepend a list of paths into the PATH in order.
# PATH = /path/to/bin
# you can also set other environment variables here too

[Compiling Setting] # required
# NOTE: ConfigFilePath is set to the path to the current config file automatically by default.
CompileDir = /path/to/coremark
LLVMInputFile = %(CompileDir)s/input.yaml

# OppDir and OppCompileCommand are optional, 
# do not have to specify this if not using auto_run sub-command
OppDir = autotune_datadir/opp

CompilerCXX = /path/to/bin/gcc
BaseCommand = %(CompilerCXX)s -I. -I./posix -DFLAGS_STR=\""  -lrt"\" \
                -DPERFORMANCE_RUN=1 -DITERATIONS=10000 -g            \
                core_list_join.c  core_main.c core_matrix.c          \
                core_state.c core_util.c posix/core_portme.c         \
                -funroll-loops -O2 -o coremark                       \
                -fplugin=%(PluginPath)s/rtl_unroll_autotune_plugin_gcc12.so

# auto-tuning
CompileCommand = %(BaseCommand)s \
    -fplugin-arg-rtl_unroll_autotune_plugin_gcc12-autotune=%(LLVMInputFile)s

RunDir = %(CompileDir)s
RunCommand = ./coremark 0x0 0x0 0x66 100000 # run 100000 iterations for coremark

# generate
OppCompileCommand = %(BaseCommand)s \
    -fplugin-arg-rtl_unroll_autotune_plugin_gcc12-generate=%(OppDir)s
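
The `%(Name)s` references in this file use the same syntax as the basic interpolation of Python's `configparser` module. The following sketch shows how such references expand. It assumes that the tuning tool resolves them in the same way, which this guide does not state explicitly. The sample values and the helper name `expand_setting` are illustrative only.

```python
from configparser import ConfigParser

# Shortened, illustrative configuration; the paths are placeholders.
SAMPLE_INI = """
[DEFAULT]
PluginPath = /opt/gcc-plugins

[Compiling Setting]
CompileDir = /tmp/coremark
LLVMInputFile = %(CompileDir)s/input.yaml
BaseCommand = gcc -O2 -o coremark -fplugin=%(PluginPath)s/rtl_unroll_autotune_plugin_gcc12.so
CompileCommand = %(BaseCommand)s -fplugin-arg-rtl_unroll_autotune_plugin_gcc12-autotune=%(LLVMInputFile)s
"""


def expand_setting(ini_text: str, section: str, key: str) -> str:
    """Return one setting with every %(Name)s reference expanded."""
    parser = ConfigParser()  # basic interpolation is the default
    parser.read_string(ini_text)
    return parser.get(section, key)
```

Values defined in `[DEFAULT]`, such as `PluginPath`, are visible from every other section, which is why `BaseCommand` can refer to it.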

Second, we can prepare an additional parameter search space file search_space.yaml to customize the parameter space. For example, the default search space for the loop unrolling coefficient in the dynamic library is $\{0, 2^{0} = 1, 2^{1} = 2, \ldots, 2^{6} = 64\}$. You can adjust the search space to $\{0, 2^{0} = 1, 2^{1} = 2, \ldots, 2^{5} = 32\}$.

yaml

CodeRegion:
   CodeRegionType: loop
   Pass: loop2_unroll
   Args:
     UnrollCount:
       Value: [0, 1, 2, 4, 8, 16, 32]
       Type: enum
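
The `Value` list above is zero followed by the powers of two up to a chosen exponent. A small helper can generate it for any upper bound. The function name `unroll_count_values` is hypothetical.

```python
from typing import List


def unroll_count_values(max_exponent: int) -> List[int]:
    """Return [0, 2^0, 2^1, ..., 2^max_exponent]."""
    if max_exponent < 0:
        raise ValueError("max_exponent must be non-negative")
    return [0] + [2 ** k for k in range(max_exponent + 1)]
```

`unroll_count_values(5)` reproduces the reduced search space, and `unroll_count_values(6)` reproduces the default one that ends at 64.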

Finally, we place the coremark, coremark_sample.ini, and search_space.yaml files in the same folder and run the following script:

shell

ai4c-autotune autorun coremark_sample.ini \
  -scf search_space.yaml --stage-order loop \
  --time-after-convergence=100

The time-after-convergence parameter is a time limit in seconds: tuning is terminated if no new optimal configuration is found within this time after the historical optimal value was obtained.
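
The stopping rule can be illustrated with a small simulation. This is a sketch of the idea only, not the Autotuner implementation: it assumes that each measurement arrives as an (elapsed seconds, running time) pair and that a lower running time is better. The function name is hypothetical.

```python
from typing import Iterable, Optional, Tuple


def tune_until_converged(
    samples: Iterable[Tuple[float, float]], time_after_convergence: float
) -> Tuple[Optional[float], int]:
    """Consume samples until the best value has not improved for the given time.

    Returns (best running time, number of samples consumed).
    """
    best: Optional[float] = None
    best_found_at = 0.0
    consumed = 0
    for elapsed, runtime in samples:
        if best is not None and elapsed - best_found_at >= time_after_convergence:
            break  # no improvement for long enough, so stop early
        consumed += 1
        if best is None or runtime < best:
            best, best_found_at = runtime, elapsed
    return best, consumed
```

With a limit of 100 seconds, a best value found at 30 seconds ends the search at the first sample taken at or after 130 seconds, even if a later sample would have been better.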

After the tuning is complete, the optimal configuration is saved in the loop.yaml file. You can run the compile command in the autotune phase and modify the input file (i.e., -fplugin-arg-rtl_unroll_autotune_plugin_gcc12-autotune=loop.yaml) of the autotune option to reproduce the performance value of the tuning combination.

You can obtain the historical tuning configuration file (autotune_config.csv) and performance data file (autotune_data.csv) in the following ways:

shell

ai4c-autotune dump -c coremark/input.yaml \
    --database=opentuner.db/localhost.localdomain.db -o autotune

Notes:

By default, the program running time is used as the performance value.

For details, see the Fine-Grained Tuning User Guide and the test case at https://atomgit.com/openeuler/AI4C/tree/master/python/test/autotuner/loop_unroll.

For details about fine-grained tuning of the LLVM compiler, see the tutorial in the Autotuner repository.

4.3 Function-Level Coarse-Grained Tuning

The current function-level coarse-grained tuning module consists of three parts:

- Tuning configuration file (.ini) of the application: processes the compilation and execution of the application.
- Search space configuration file (YAML): configures the option search space in the Autotuner phase, which can be used to replace the default search space.
- Compilation option set file (YAML): a preset compilation option search space. The default file is located in path/to/your/python<version>/site-packages/ai4c/autotuner/yaml/coarse_options.yaml.

The current function-level coarse-grained tuning is implemented based on Autotuner. It allows each function to be compiled and optimized with a different combination of compilation options. The tuning principle is the same as that of fine-grained tuning. Because many compilation options can be tuned for each function, the option space can be pruned in advance.

Before enabling function-level coarse-grained tuning, you need to install the following dependencies:

shell

yum install -y BiSheng-Autotuner bisheng-opentuner

The process of using coarse-grained tuning is similar to that of fine-grained tuning. In the following test case, we will tune the compilation option parameters of each function in test_coarse_tuning.cc. First, we will prepare the tuning configuration file test_coarse_tuning.ini for test_coarse_tuning.cc. The user needs to

- Provide the application path and the commands for compiling and running the application.
- Add the coarse-grained tuning dynamic library -fplugin=%(PluginPath)s/coarse_option_tuning_plugin_gcc12.so and the compilation option set file -fplugin-arg-coarse_option_tuning_plugin_gcc12-yaml=<YAML_FILE> to the basic compile command.
  - In the generate and autotune phases, add the corresponding input files of -fplugin-arg-coarse_option_tuning_plugin_gcc12-<stage>.
- You can customize the paths of the configuration file for tuning the structure (./opp/*.yaml) and the input file generated by the autotuner (input.yaml).

ini

[DEFAULT] # optional
# TuningYAMLFile = /path/to/coarse_option_tuning_yaml_config_file

[Environment Setting]  # optional

[Compiling Setting] # required
CompileDir = ./autotune_datadir
LLVMInputFile = %(CompileDir)s/input.yaml

OppDir = opp

Compiler = g++
BaseCommand = %(Compiler)s ../test_coarse_tuning.cc -O2 -o test_coarse_tuning \
    -fplugin=%(PluginPath)s/coarse_option_tuning_plugin_gcc12.so \
    -fplugin-arg-coarse_option_tuning_plugin_gcc12-yaml=%(TuningYAMLFile)s

# auto-tuning
CompileCommand = %(BaseCommand)s \
    -fplugin-arg-coarse_option_tuning_plugin_gcc12-autotune=input.yaml

RunDir = %(CompileDir)s
RunCommand = ./test_coarse_tuning 3

# generate
OppCompileCommand = %(BaseCommand)s \
    -fplugin-arg-coarse_option_tuning_plugin_gcc12-generate=%(OppDir)s

Second, we can prepare an additional parameter search space file search_space.yaml to customize the parameter space. For example, in the following file, we limit the search space to tuning of prefetch-related options.

yaml

CodeRegion:
  CodeRegionType: function
  Pass: coarse_option_generate
  Args:
    flag_prefetch_loop_arrays:
      Type: bool
    param_prefetch_latency:
      Min: 100
      Max: 2000
      Type: int
    param_simultaneous_prefetches:
      Min: 1
      Max: 80
      Type: int
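
To see why pruning matters, you can count how many configurations a search space allows for a single function. The sketch below mirrors the `Args` entries above as a Python dictionary. It assumes that `Min` and `Max` are inclusive bounds, which this guide does not state, and the function name is hypothetical.

```python
from typing import Any, Mapping

# Mirror of the Args section in search_space.yaml above.
PREFETCH_ARGS = {
    "flag_prefetch_loop_arrays": {"Type": "bool"},
    "param_prefetch_latency": {"Type": "int", "Min": 100, "Max": 2000},
    "param_simultaneous_prefetches": {"Type": "int", "Min": 1, "Max": 80},
}


def count_configurations(args: Mapping[str, Mapping[str, Any]]) -> int:
    """Number of distinct option combinations for one code region."""
    total = 1
    for name, spec in args.items():
        kind = spec["Type"]
        if kind == "bool":
            total *= 2
        elif kind == "int":
            total *= spec["Max"] - spec["Min"] + 1  # assumes inclusive bounds
        elif kind == "enum":
            total *= len(spec["Value"])
        else:
            raise ValueError(f"unsupported type for {name}: {kind}")
    return total
```

Even this three-option space yields 2 x 1901 x 80 = 304160 combinations per function, so limiting the options and ranges keeps the search tractable.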

Finally, we place test_coarse_tuning.cc, test_coarse_tuning.ini, and search_space.yaml in the same folder and run the following script:

shell

ai4c-autotune autorun test_coarse_tuning.ini \
    -scf search_space.yaml \
    --stage-order function \
    --time-after-convergence=10

The time-after-convergence parameter indicates the number of seconds after which the tuning is terminated early if no new optimal configuration is found.

After the tuning is complete, the optimal tuning configuration is saved in the function.yaml file. You can invoke the compile command in the autotune phase again and modify the input file (i.e., -fplugin-arg-coarse_option_tuning_plugin_gcc12-autotune=function.yaml) of the autotune option to reproduce the performance value of the tuning combination.

Notes:

- Currently, the program running time is used as the performance value by default.
- Dumping the historical data stored in the database is not supported in coarse-grained tuning.
- The current coarse-grained tuning can be used with GCC 12.3.1. For other compiler versions, some compilation options may not be supported. You can comment out the compilation options that are not recognized by the compiler in path/to/your/AI4C/aiframe/include/option_utils.h.

For details, see the test case at https://atomgit.com/openeuler/AI4C/tree/master/python/test/autotuner/coarse_tuning.

For details about how to perform coarse-grained tuning on the LLVM compiler, see the Autotuner repository.

4.4 Application-Level Option Tuning

The current application-level option tuning module consists of three parts:

- The compilation and running script (shell) of the application: handles the compilation, execution, and performance data collection of the application, and substitutes the next generated group of options into the compile command.
- The configuration file (YAML) of the search space for compilation options and dynamic library options: configures the search space for option tuning, including the switch options (compilation optimization/dynamic library), compilation parameters, and enumeration options.
- The configuration file (YAML) of performance values: configures the weights of multiple performance items and the target optimization direction (maximum or minimum value). The configuration must be consistent with the number and sequence of performance values obtained in the performance data collection process.

The application-level option tuning tool continuously collects the performance data of the application, updates the performance model, and generates a new compilation option combination with a high expected benefit. The compilation and running script substitutes the new combination into the compile command, builds a new binary file, and performs the next round of running. Tuning is repeated in this way to obtain the historical optimal performance value.

Before enabling application-level tuning, install the following dependencies:

shell

pip install xgboost scikit-learn
yum install -y time

The following example will use different compilation option combinations to build and tune test.cc for three rounds. The compilation and running script of the application is as follows:

shell

# ---------- run_test.sh ---------- #
parent_dir=$1                                               # path for intermediate tuning files
config=$(cat ${parent_dir}/tuning/config.txt)               # current compiler configuration file
performance_file="${parent_dir}/tuning/performance.txt"     # current performance data file

measure_raw_file="time.txt"

compiler=g++
compile_command="${compiler} test.cc -O2 -o test_opt_tuner"
eval "${compile_command} ${config}"                         # program compilation, appending tuning options

run_command="time -p -o ${measure_raw_file} ./test_opt_tuner 3"
eval "${run_command}"                                       # program execution

info_collect_command="grep real ${measure_raw_file} | awk '{printf \"1 1 %s\", \$2}' > ${performance_file}"
eval "${info_collect_command}"                              # program performance collection

# ---------- run_option_tuner.sh ---------- #
ai4c-option-tune --test_limit 3 --runfile run_test.sh
    # --optionfile path/to/your/python<version>/site-packages/ai4c/option_tuner/input/options.yaml \
    # --libfile path/to/your/python<version>/site-packages/ai4c/option_tuner/input/options_lib.yaml \
    # --measurefile path/to/your/python<version>/site-packages/ai4c/option_tuner/input/config_measure.yaml
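
The performance collection step in run_test.sh extracts the `real` value from the output of `time -p` and writes it after two constant fields. The same extraction can be sketched in Python, for example when the measurement needs further processing. The function name `performance_line` is hypothetical, and the `1 1` prefix is copied from the awk command above.

```python
def performance_line(time_output: str) -> str:
    """Build the performance line from `time -p` output ("real", "user", "sys" rows)."""
    for line in time_output.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "real":
            # Same layout as: awk '{printf "1 1 %s", $2}'
            return f"1 1 {fields[1]}"
    raise ValueError("no 'real' line found in the time output")
```

Unlike `grep real`, this sketch matches only lines whose first field is exactly `real`.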

The default options and performance value configuration file are stored in the following path: path/to/your/python<version>/site-packages/ai4c/option_tuner/input/*.yaml

You can modify the configuration files of compilation options and dynamic library options as required. The related keywords are as follows:

- required_*: mandatory tuning item, which will be retained in the tuning process.
- bool_*: optional compilation optimization switch.
- interval_*: optional compilation parameter (value option, data range).
- enum_*: optional compilation parameter (enumerated option).

Example:

yaml

required_config:
- -O2
bool_config:
- -funroll-loops
interval_config:
- name: --param max-inline-insns-auto
  default: 15
  min: 10
  max: 190
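
To make the roles of the keywords concrete, the sketch below turns one candidate configuration into an option string of the kind that run_test.sh appends to the compile command. How ai4c-option-tune actually renders the options is not described in this guide, so the `name=value` form (which GCC accepts for `--param`) and the function name `render_options` are assumptions.

```python
from typing import Any, Iterable, List, Mapping


def render_options(
    required: Iterable[str],
    enabled_switches: Iterable[str],
    interval_specs: Iterable[Mapping[str, Any]],
    interval_values: Mapping[str, int],
) -> str:
    """Join required options, enabled switches, and in-range parameter values."""
    parts: List[str] = list(required) + list(enabled_switches)
    for spec in interval_specs:
        # Fall back to the default when no value was chosen for this parameter.
        value = interval_values.get(spec["name"], spec["default"])
        if not spec["min"] <= value <= spec["max"]:
            raise ValueError(f"{spec['name']}={value} is outside the configured range")
        parts.append(f"{spec['name']}={value}")
    return " ".join(parts)
```

For the example file above, enabling `-funroll-loops` and choosing 42 for the interval parameter gives `-O2 -funroll-loops --param max-inline-insns-auto=42`.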

You can modify the performance value configuration file as required. The related keywords are as follows:

- weight: performance value weight
- optim: target optimization direction (maximum or minimum value)

Example:

yaml

config_measure:
- name: throughput
  weight: 1
  optim: maximize
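
One way to picture how weights and optimization directions combine several performance values into a single score is a signed weighted sum. The formula below is an illustrative assumption, not the scoring used by ai4c-option-tune. The keyword `minimize` is also assumed, because only `maximize` appears in the example above.

```python
from typing import Any, Mapping, Sequence


def combined_score(values: Sequence[float], measures: Sequence[Mapping[str, Any]]) -> float:
    """Signed weighted sum: higher is better for every item after the sign flip."""
    if len(values) != len(measures):
        raise ValueError("values and measures must match in number and order")
    score = 0.0
    for value, measure in zip(values, measures):
        if measure["optim"] == "maximize":
            sign = 1.0
        elif measure["optim"] == "minimize":  # assumed keyword
            sign = -1.0
        else:
            raise ValueError(f"unknown direction: {measure['optim']}")
        score += sign * measure["weight"] * value
    return score
```

The length check reflects the requirement that the configuration match the number and sequence of the collected performance values.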

After the tuning is complete, the historical and optimal tuning data is saved in ${parent_dir}/tuning/train.csv and ${parent_dir}/tuning/result.txt.

For details, see the test case at https://atomgit.com/openeuler/AI4C/tree/master/python/test/option_tuner.
