AI4C User Manual

1 Introduction to AI4C

AI4C is an AI-assisted compiler suite: a framework that lets compilers integrate machine-learning-driven compilation optimizations.

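2 Software Architecture

The framework consists of the following modules. The automatic compilation tuning tools require a Python environment.

- AI-assisted compilation optimization inference engine: lets the compiler use the results of AI model inference inside an optimization pass to drive compilation optimization.
  - The AI-enabled optimization passes in GCC are currently implemented mostly as compiler plugins, decoupled from the main compiler release.
- Automatic compilation tuning tools: an external tuning tool (OpenTuner) drives the compiler through automatic tuning at several levels of granularity. GCC and LLVM are currently supported.
  - Option tuner: tunes compilation options at the application level.
  - Compilation tuner: built on Autotuner; supports fine-grained and coarse-grained tuning.
    - Fine-grained tuning: tunes key parameters inside an optimization pass, for example the loop unroll count.
    - Coarse-grained tuning: tunes compilation options at the function level.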

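Roadmap:

- [ ] Integrate the ACPO compilation optimization models for LLVM, and extract the ACPO LLVM-side code into a plugin that is decoupled from the main LLVM release.
- [ ] Support inference with more open-source machine learning frameworks in the AI4Compiler framework (PyTorch via LibTorch, TensorFlow via LiteRT).
- [ ] Provide more AI-assisted compilation optimization models and the corresponding compiler plugins.
- [ ] Integrate new search algorithms (based on white-box information) and optimize the parameter search space (hot-function tuning).
- [ ] Support tuning of JDK compilation parameters.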

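3 Installing and Building AI4C

3.1 Installing AI4C Directly

If you are running the latest openEuler release (24.03-LTS-SP1) and only need the existing AI4C features, you can install the AI4C package directly.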

shell

yum install -y AI4C

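To use AI4C features from another version, or to install AI4C on another OS release, rebuild AI4C by following the steps below.

3.2 Building and Installing from RPM (Recommended)

As root, install rpmbuild and rpmdevtools: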

bash

# Install rpmbuild
yum install dnf-plugins-core rpm-build
# Install rpmdevtools
yum install rpmdevtools

Generate the rpmbuild directory tree under the home directory /root:

bash

rpmdev-setuptree
# Check the generated directory layout
ls ~/rpmbuild/
BUILD  BUILDROOT  RPMS  SOURCES  SPECS  SRPMS

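Clone the openEuler-24.03-LTS-SP1 branch of the target repository with `git clone -b openEuler-24.03-LTS-SP1 https://gitee.com/src-openeuler/AI4C.git`, then copy the required files into the matching rpmbuild directories: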
shell
```
cp AI4C/AI4C-v%{version}-alpha.tar.gz ~/rpmbuild/SOURCES/
cp AI4C/*.patch ~/rpmbuild/SOURCES/
cp AI4C/AI4C.spec ~/rpmbuild/SPECS/
```
Build the AI4C RPM package with the following steps:
shell
```
# Install the build dependencies of AI4C
yum-builddep ~/rpmbuild/SPECS/AI4C.spec
# Build the AI4C RPM package
# If check-rpaths errors occur, prefix rpmbuild with QA_RPATHS=0x0002, for example:
# QA_RPATHS=0x0002 rpmbuild -ba ~/rpmbuild/SPECS/AI4C.spec
rpmbuild -ba ~/rpmbuild/SPECS/AI4C.spec
# Install the RPM package
cd ~/rpmbuild/RPMS/<arch>
rpm -ivh AI4C-<version>-<release>.<arch>.rpm
```
Note: if files conflict because an older version of the RPM package is already installed, resolve the conflict in one of the following ways:
shell
```
# Option 1: force-install the new version
rpm -ivh AI4C-<version>-<release>.<arch>.rpm --force
# Option 2: upgrade the installed package
rpm -Uvh AI4C-<version>-<release>.<arch>.rpm
```
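After installation, the following files are present on the system:
- /usr/bin/ai4c-*: wrappers for the AI-enabled compiler and the automatic tuning tools
- /usr/lib64/libonnxruntime.so: ONNX Runtime inference framework shared library
- /usr/lib64/AI4C/*.onnx: AI-assisted compilation optimization models (ONNX format)
- /usr/lib64/python<version>/site-packages/ai4c/lib/*.so:
  - shared library of the AI-assisted compilation optimization inference engine
  - compiler plugin shared libraries for AI-assisted compilation optimization and compilation tuning
- /usr/lib64/python<version>/site-packages/ai4c/autotuner/*: files for the coarse-grained and fine-grained tuning tools
- /usr/lib64/python<version>/site-packages/ai4c/optimizer/*: files for AI-assisted compilation optimization
- /usr/lib64/python<version>/site-packages/ai4c/option_tuner/*: files for application-level compilation option tuning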

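3.3 Building and Installing from Source

AI4C source repository: https://gitee.com/openeuler/AI4C

3.3.1 Installing the ONNX Runtime Dependency

Option 1:

Download version 1.16.3 from GitHub and extract the tgz file for your architecture. For example, on aarch64, download onnxruntime-linux-aarch64-1.16.3.tgz.

URL: https://github.com/microsoft/onnxruntime/releases/tag/v1.16.3

Note: after the tgz file is extracted, the libonnxruntime.so shared library is in the lib directory. To build the AI4C framework, rename the lib directory to lib64; otherwise the build may fail because the path for -lonnxruntime cannot be found.

Option 2:

Make sure the following onnxruntime dependencies are installed: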

shell

yum install -y cmake make gcc gcc-c++ abseil-cpp-devel boost-devel bzip2 python3-devel python3-numpy python3-setuptools python3-pip

Build and install onnxruntime with cmake:

shell

cd path/to/your/AI4C/third_party/onnxruntime
cmake \
    -DCMAKE_INSTALL_PREFIX=path/to/your/onnxruntime \
    -Donnxruntime_BUILD_SHARED_LIB=ON \
    -Donnxruntime_BUILD_UNIT_TESTS=ON \
    -Donnxruntime_INSTALL_UNIT_TESTS=OFF \
    -Donnxruntime_BUILD_BENCHMARKS=OFF \
    -Donnxruntime_USE_FULL_PROTOBUF=ON \
    -DPYTHON_VERSION=%{python3_version} \
    -Donnxruntime_ENABLE_CPUINFO=ON \
    -Donnxruntime_DISABLE_ABSEIL=ON \
    -Donnxruntime_USE_NEURAL_SPEED=OFF \
    -Donnxruntime_ENABLE_PYTHON=OFF \
    -DCMAKE_BUILD_TYPE=Release \
    -S cmake
make -j %{max_jobs} && make install

3.3.2 Installing the Other AI4C Build Dependencies

Make sure the following packages are installed:

shell

yum install -y python3-wheel openssl openssl-devel yaml-cpp yaml-cpp-devel gcc-plugin-devel libstdc++-static

3.3.3 Building the AI4C Framework

shell

cd path/to/your/AI4C/python
python3 setup.py bdist_wheel                       \
    -Donnxruntime_ROOTDIR=path/to/your/onnxruntime \
    -DCMAKE_BUILD_TYPE=Release                     \
    -DCMAKE_CXX_COMPILER=path/to/your/g++          \
    -DCMAKE_C_COMPILER=path/to/your/gcc
pip3 install dist/ai4c-<version>-<python_version>-<python_version>-<os>_<arch>.whl --force-reinstall --no-deps

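After installation, the following files are present on the system:

- path/to/your/pythonbin/ai4c-*: wrappers for the AI-enabled compiler and the automatic tuning tools
- path/to/your/onnxruntime/lib64/libonnxruntime.so: ONNX Runtime inference framework shared library
- path/to/your/AI4C/models/*.onnx: AI-assisted compilation optimization models (ONNX format)
- path/to/your/pythonlib/ai4c/lib/*.so:
  - shared library of the AI-assisted compilation optimization inference engine
  - compiler plugin shared libraries for AI-assisted compilation optimization and compilation tuning
- path/to/your/pythonlib/ai4c/autotuner/*: files for the coarse-grained and fine-grained tuning tools
- path/to/your/pythonlib/ai4c/optimizer/*: files for AI-assisted compilation optimization
- path/to/your/pythonlib/ai4c/option_tuner/*: files for application-level compilation option tuning

Notes:

- path/to/your/pythonbin: after installation, run `which ai4c-gcc` to find the bin path.
- path/to/your/pythonlib: after installation, check the Location field printed by `pip show ai4c` to find the lib path.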
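
The two lookups in the notes above can also be scripted. The following is a minimal sketch using only the Python standard library; the helper names `find_wrapper_dir` and `find_package_dir` are hypothetical and are not part of AI4C. It assumes the wrapper is named `ai4c-gcc` and the installed distribution is named `ai4c`, as the notes suggest.

```python
import shutil
from importlib import metadata
from pathlib import Path
from typing import Optional


def find_wrapper_dir(tool: str = "ai4c-gcc") -> Optional[Path]:
    """Return the directory that contains `tool`, like `which ai4c-gcc`."""
    found = shutil.which(tool)
    return Path(found).parent if found else None


def find_package_dir(dist_name: str = "ai4c") -> Optional[Path]:
    """Return the install location of `dist_name`, like the Location
    field printed by `pip show ai4c`."""
    try:
        dist = metadata.distribution(dist_name)
    except metadata.PackageNotFoundError:
        return None
    return Path(dist.locate_file(""))


if __name__ == "__main__":
    print("pythonbin:", find_wrapper_dir())
    print("pythonlib:", find_package_dir())
```

Both helpers return `None` when AI4C is not installed, so the script can double as a quick installation check.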

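4 Usage

4.1 AI-Assisted Compilation Optimization

The AI-assisted compilation optimization module currently takes three main inputs:

- ONNX model: the trained model that assists compilation optimization.
- Compiler plugin (only GCC is supported at present): runs ONNX model inference and obtains the optimization parameters.
- AI4Compiler framework: provides the ONNX inference engine and the GCC optimization compile commands.

First train an AI model with an open-source machine learning framework and export it to ONNX format. Then provide a matching compiler plugin for that model. The plugin contains at least three modules:

- Extract the compiler input features that the AI model needs.
- Drive the inference engine to run inference with the AI model.
- Annotate the inference results back onto the compiler's data structures.

In the test case below, enabling the AI-assisted optimization model at compile time only requires adding three plugin-related options to each command that compiles the target binary: the plugin path, the path of the plugin's AI model, and the inference engine path.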

shell

# If onnxruntime is installed outside the system directories, remember to set the environment variable
# export LD_LIBRARY_PATH=path/to/your/onnxruntime/lib64/:$LD_LIBRARY_PATH

gcc_compiler=path/to/your/gcc
infer_engine_path=$(ai4c-gcc --inference-engine)
model_path=path/to/your/model.onnx
plugin_path=path/to/your/<model_plugin>.so

$gcc_compiler test.c -O2 -o test                            \
    -fplugin=$plugin_path                                   \
    -fplugin-arg-<model_plugin>-model=$model_path           \
    -fplugin-arg-<model_plugin>-engine=$infer_engine_path

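The plugins currently supported are located in the same directory as the path returned by `ai4c-gcc --inference-engine`, and the supported models are located in path/to/your/AI4C/models.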
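
As an illustration of how the three plugin options fit together, here is a minimal Python sketch that assembles them for one plugin. The function name `plugin_flags` and the example paths are hypothetical. The sketch relies on the GCC convention that the plugin name in `-fplugin-arg-<name>-<key>` is the base name of the shared object without the `.so` suffix, and on the `model` and `engine` keys shown in the command above.

```python
from pathlib import Path
from typing import List


def plugin_flags(plugin_path: str, model_path: str, engine_path: str) -> List[str]:
    """Build the three plugin-related GCC options for one AI model plugin."""
    # GCC derives the plugin name used in -fplugin-arg-<name>-<key> from the
    # shared object's base name without the .so suffix.
    name = Path(plugin_path).stem
    return [
        f"-fplugin={plugin_path}",
        f"-fplugin-arg-{name}-model={model_path}",
        f"-fplugin-arg-{name}-engine={engine_path}",
    ]


# Hypothetical paths, for illustration only.
flags = plugin_flags("/opt/plugins/my_plugin.so",
                     "/opt/models/my_model.onnx",
                     "/opt/lib/inference_engine.so")
cmd = ["gcc", "test.c", "-O2", "-o", "test", *flags]
print(" ".join(cmd))
```

Deriving the name from the path keeps the `-fplugin` and `-fplugin-arg-*` options consistent when the plugin file is renamed.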

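Notes:

- The compiler used to build an AI model's plugin must be the same compiler used to build the target application; otherwise the compiler version mismatch causes compilation errors.
- AI4C currently supports the plugin form only for AI-assisted optimization passes implemented in the cc1 stage of GCC.

For the detailed compiler plugin development and usage workflow, refer to the AI-assisted compilation optimization manual and the test cases.

The following two examples use AI-assisted optimization models at different compilation stages. The loop unrolling and function inlining models work in the cc1 optimization stage and use a GCC plugin for model adaptation and inference. The BOLT sampling basic-block accuracy correction model works in the BOLT post-link optimization stage, and its model adaptation layer lives in the LLVM-BOLT repository.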

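4.1.1 Loop Unrolling and Function Inlining Models

The compilation options for the loop unrolling and function inlining models are as follows:

| Option | Description |
| --- | --- |
| `-fplugin` | Absolute path of the loop unrolling and function inlining plugin (`-fplugin=/path/to/<ipa_inline_unroll_plugin>.so`). |
| `-fplugin-arg-<ipa_inline_unroll_plugin>-engine` | Absolute path of the inference engine for the ONNX models (`-fplugin-arg-<ipa_inline_unroll_plugin>-engine=/path/to/inference_engine.so`). Must be used together with `-fplugin`. The path `/path/to/inference_engine.so` can be obtained with `ai4c-gcc --inference-engine`. |
| `-fplugin-arg-<ipa_inline_unroll_plugin>-inline_model` | Absolute path of the function inlining ONNX model (`-fplugin-arg-<ipa_inline_unroll_plugin>-inline_model=/path/to/inline_model.onnx`). Must be used together with `-fplugin` and `-fplugin-arg-<ipa_inline_unroll_plugin>-engine`. |
| `-fplugin-arg-<ipa_inline_unroll_plugin>-unroll_model` | Absolute path of the loop unrolling ONNX model (`-fplugin-arg-<ipa_inline_unroll_plugin>-unroll_model=/path/to/unroll_model.onnx`). Must be used together with `-fplugin` and `-fplugin-arg-<ipa_inline_unroll_plugin>-engine`. |

Several AI-assisted optimization models in one GCC plugin can be enabled at the same time, for example: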

shell

gxx_compiler=path/to/your/g++
infer_engine_path=$(ai4c-gcc --inference-engine)
inline_model_path=path/to/your/inline_model.onnx
unroll_model_path=path/to/your/unroll_model.onnx
plugin_path=path/to/your/<ipa_inline_unroll_plugin>.so

$gxx_compiler test.cc -O3 -o test -funroll-loops                           \
    -fplugin=$plugin_path                                                  \
    -fplugin-arg-<ipa_inline_unroll_plugin>-engine=$infer_engine_path        \
    -fplugin-arg-<ipa_inline_unroll_plugin>-inline_model=$inline_model_path  \
    -fplugin-arg-<ipa_inline_unroll_plugin>-unroll_model=$unroll_model_path

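4.1.2 BOLT Sampling Basic-Block Accuracy Correction Model

The BOLT options for the sampling basic-block accuracy correction model are as follows:

| Option | Description |
| --- | --- |
| `-block-correction` | Enables AI-based correction of CFG basic-block (BB) counts. Must be used together with `-model-path`, which specifies the ONNX model. |
| `-model-path` | Absolute path of the ONNX model (`-model-path=/path/to/model.onnx`). Must be used together with `-block-correction`. |
| `-annotate-threshold` | Confidence threshold for using the model's predictions. The default is 0.95. |

Custom BOLT optimization options can be enabled through GCC's `-fbolt-option`, for example: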

shell

g++ -fbolt-use=<gcov_file> -fbolt-target=<bin_file> -fbolt-option=\"-block-correction -model-path=path/to/your/block_correction_model.onnx\"

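4.2 Fine-Grained Tuning

This section uses fine-grained tuning of the loop unrolling pass in GCC to walk through the tuning tool workflow.

The fine-grained tuning module currently takes two inputs:

- Application tuning configuration file (.ini): describes how the application is compiled and run.
- Search space configuration file (YAML): the option search space used in the Autotuner stage; it can replace the default search space.

Fine-grained tuning is currently built on Autotuner:

- In the compiler's generate stage, a set of tunable compilation data structures and tunable coefficients is produced and saved in opp/*.yaml.
- Based on the additionally supplied search space (search_space.yaml) and the tunable data structures, Autotuner's tuning algorithm produces the next set of coefficients for each tunable data structure and saves them in input.yaml.
- In the compiler's autotune stage, the coefficients are annotated onto the matching data structures according to the hash values of the data structures in input.yaml, which completes the tuning step.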
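
The three stages above can be pictured with a toy Python sketch. Everything here is an assumption made for illustration: the dictionaries only stand in for opp/*.yaml and input.yaml and do not reproduce the real Autotuner file formats, the hash strings are made up, and random choice is a placeholder for the actual tuning algorithm.

```python
import random
from typing import Dict, List

# Stage 1 (generate): hypothetical stand-in for opp/*.yaml. Each tunable
# code region is identified by a hash.
opportunities: Dict[str, dict] = {
    "a1b2": {"CodeRegionType": "loop", "Pass": "loop2_unroll"},
    "c3d4": {"CodeRegionType": "loop", "Pass": "loop2_unroll"},
}
# Hypothetical search space for the UnrollCount coefficient.
search_space: List[int] = [0, 1, 2, 4, 8, 16, 32]


def propose(opps: Dict[str, dict], space: List[int], rng: random.Random) -> Dict[str, dict]:
    """Stage 2: pick the next coefficient for every tunable region.
    Random choice is a placeholder for the real tuning algorithm."""
    return {h: {"UnrollCount": rng.choice(space)} for h in opps}


def annotate(opps: Dict[str, dict], config: Dict[str, dict]) -> Dict[str, dict]:
    """Stage 3 (autotune): attach each coefficient to the region whose
    hash matches."""
    return {h: {**region, **config[h]} for h, region in opps.items() if h in config}


rng = random.Random(0)
next_input = propose(opportunities, search_space, rng)  # plays the role of input.yaml
tuned = annotate(opportunities, next_input)
for region_hash, region in tuned.items():
    print(region_hash, region)
```

Keying both structures by hash is what allows a coefficient chosen outside the compiler to find its way back to the right loop on the next compilation.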

Before starting fine-grained tuning, install the following dependencies:

shell

yum install -y BiSheng-Autotuner bisheng-opentuner

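The following test case tunes the loop unrolling parameters of CoreMark. First, prepare the CoreMark tuning configuration file coremark_sample.ini. You need to:

- Provide the application path and the commands that compile and run the application.
- Add the fine-grained tuning plugin to the base compile command: `-fplugin=%(PluginPath)s/rtl_unroll_autotune_plugin_gcc12.so`.
  - In the generate and autotune stages, add the corresponding input file with `-fplugin-arg-rtl_unroll_autotune_plugin_gcc12-<stage>`.
- Optionally customize the path of the tunable-structure configuration files (./opp/*.yaml), the path of the compiler input file generated by Autotuner (input.yaml), and so on.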

ini

[DEFAULT] # optional
# PluginPath = /path/to/gcc-plugins

[Environment Setting]  # optional
# prepend a list of paths into the PATH in order.
# PATH = /path/to/bin
# you can also set other environment variables here

[Compiling Setting] # required
# NOTE: ConfigFilePath is set to the path to the current config file automatically by default.
CompileDir = /path/to/coremark
LLVMInputFile = %(CompileDir)s/input.yaml

# OppDir and OppCompileCommand are optional, 
# do not have to specify this if not using auto_run sub-command
OppDir = autotune_datadir/opp

CompilerCXX = /path/to/bin/gcc
BaseCommand = %(CompilerCXX)s -I. -I./posix -DFLAGS_STR=\""  -lrt"\" \
                -DPERFORMANCE_RUN=1 -DITERATIONS=10000 -g            \
                core_list_join.c  core_main.c core_matrix.c          \
                core_state.c core_util.c posix/core_portme.c         \
                -funroll-loops -O2 -o coremark                       \
                -fplugin=%(PluginPath)s/rtl_unroll_autotune_plugin_gcc12.so

# auto-tuning
CompileCommand = %(BaseCommand)s \
    -fplugin-arg-rtl_unroll_autotune_plugin_gcc12-autotune=%(LLVMInputFile)s

RunDir = %(CompileDir)s
RunCommand = ./coremark 0x0 0x0 0x66 100000 # run 100000 iterations for coremark

# generate
OppCompileCommand = %(BaseCommand)s \
    -fplugin-arg-rtl_unroll_autotune_plugin_gcc12-generate=%(OppDir)s

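Next, you can prepare an additional search space file, search_space.yaml, to narrow the parameter space. For example, the plugin's default search space for the loop unroll factor can be restricted to the values listed in the following file.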

yaml

CodeRegion:
   CodeRegionType: loop
   Pass: loop2_unroll
   Args:
     UnrollCount:
       Value: [0, 1, 2, 4, 8, 16, 32]
       Type: enum

Finally, place coremark, coremark_sample.ini, and search_space.yaml in the same directory and run the following script:

shell

ai4c-autotune autorun coremark_sample.ini \
  -scf search_space.yaml --stage-order loop \
  --time-after-convergence=100

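The time-after-convergence parameter sets how many seconds the tuner keeps searching after the last best result: if no better configuration is found within that time, tuning stops early.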
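
The stopping rule can be sketched in Python as follows. This is an illustrative reading of the parameter, not the Autotuner implementation: the `tune` function, its arguments, and the simulated run times are all hypothetical, and the clock is injectable only so that the example runs deterministically.

```python
import time
from typing import Callable, Iterable, Optional, Tuple


def tune(candidates: Iterable[int],
         measure: Callable[[int], float],
         time_after_convergence: float,
         clock: Callable[[], float] = time.monotonic) -> Tuple[Optional[int], float]:
    """Evaluate candidates until no new best appears for the given seconds."""
    best_cfg, best_time = None, float("inf")
    last_improvement = clock()
    for cfg in candidates:
        if clock() - last_improvement > time_after_convergence:
            break                        # converged: stop early
        elapsed = measure(cfg)           # lower run time is better
        if elapsed < best_time:
            best_cfg, best_time = cfg, elapsed
            last_improvement = clock()   # restart the convergence timer
    return best_cfg, best_time


# Simulated session: every clock() call advances a fake clock by 30 seconds,
# and run times (in seconds) per unroll count are made up.
ticks = iter(range(0, 10_000, 30))
fake_clock = lambda: next(ticks)
run_times = {0: 9.0, 1: 8.0, 2: 8.5, 4: 8.7, 8: 8.9, 16: 9.1, 32: 9.3}
best = tune(run_times, run_times.__getitem__,
            time_after_convergence=100, clock=fake_clock)
print(best)
```

In this simulated session the search stops before the last candidates are tried, because more than 100 simulated seconds pass without an improvement.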

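After tuning finishes, the best configuration is saved in loop.yaml. To reproduce its performance, rerun the autotune-stage compile command with the autotune option pointing at that file (i.e., `-fplugin-arg-rtl_unroll_autotune_plugin_gcc12-autotune=loop.yaml`).

The historical tuning configuration file (autotune_config.csv) and the performance data file (autotune_data.csv) can be exported as follows: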

shell

ai4c-autotune dump -c coremark/input.yaml \
    --database=opentuner.db/localhost.localdomain.db -o autotune

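Note:

- Program run time is currently the default performance metric.

For details, see the fine-grained tuning manual and this test case: https://gitee.com/openeuler/AI4C/tree/master/python/test/autotuner/loop_unroll

For fine-grained tuning with the LLVM compiler, see the tutorial in the Autotuner repository.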

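4.3 Function-Level Coarse-Grained Tuning

The function-level coarse-grained tuning module currently takes three inputs:

- Application tuning configuration file (.ini): describes how the application is compiled and run.
- Search space configuration file (YAML): the option search space used in the Autotuner stage; it can replace the default search space.
- Full compilation option set file (YAML): the predefined full search space of compilation options. The default file is path/to/your/python<version>/site-packages/ai4c/autotuner/yaml/coarse_options.yaml.

Function-level coarse-grained tuning is currently built on Autotuner and lets each function be compiled with a different combination of compilation options. Its tuning principle is the same as that of fine-grained tuning. Because each function has many tunable compilation options, the option space can be pruned in advance.

Before starting function-level coarse-grained tuning, install the following dependencies: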

shell

yum install -y BiSheng-Autotuner bisheng-opentuner

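The coarse-grained tuning workflow is largely the same as the fine-grained one. The following test case tunes the compilation options of each function in test_coarse_tuning.cc. First, prepare the tuning configuration file test_coarse_tuning.ini. You need to:

- Provide the application path and the commands that compile and run the application.
- Add the coarse-grained tuning plugin `-fplugin=%(PluginPath)s/coarse_option_tuning_plugin_gcc12.so` and the full compilation option set file `-fplugin-arg-coarse_option_tuning_plugin_gcc12-yaml=<YAML_FILE>` to the base compile command.
  - In the generate and autotune stages, add the corresponding input file with `-fplugin-arg-coarse_option_tuning_plugin_gcc12-<stage>`.
- Optionally customize the path of the tunable-structure configuration files (./opp/*.yaml), the path of the compiler input file generated by Autotuner (input.yaml), and so on.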

ini

[DEFAULT] # optional
# TuningYAMLFile = /path/to/coarse_option_tuning_yaml_config_file

[Environment Setting]  # optional

[Compiling Setting] # required
CompileDir = ./autotune_datadir
LLVMInputFile = %(CompileDir)s/input.yaml

OppDir = opp

Compiler = g++
BaseCommand = %(Compiler)s ../test_coarse_tuning.cc -O2 -o test_coarse_tuning \
    -fplugin=%(PluginPath)s/coarse_option_tuning_plugin_gcc12.so \
    -fplugin-arg-coarse_option_tuning_plugin_gcc12-yaml=%(TuningYAMLFile)s

# auto-tuning
CompileCommand = %(BaseCommand)s \
    -fplugin-arg-coarse_option_tuning_plugin_gcc12-autotune=input.yaml

RunDir = %(CompileDir)s
RunCommand = ./test_coarse_tuning 3

# generate
OppCompileCommand = %(BaseCommand)s \
    -fplugin-arg-coarse_option_tuning_plugin_gcc12-generate=%(OppDir)s

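Next, you can prepare an additional search space file, search_space.yaml, to customize the parameter space. For example, the following file restricts the search to prefetch-related options.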

yaml

CodeRegion:
  CodeRegionType: function
  Pass: coarse_option_generate
  Args:
    flag_prefetch_loop_arrays:
      Type: bool
    param_prefetch_latency:
      Min: 100
      Max: 2000
      Type: int
    param_simultaneous_prefetches:
      Min: 1
      Max: 80
      Type: int

Finally, place test_coarse_tuning.cc, test_coarse_tuning.ini, and search_space.yaml in the same directory and run the following script:

shell

ai4c-autotune autorun test_coarse_tuning.ini \
    -scf search_space.yaml \
    --stage-order function \
    --time-after-convergence=10

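The time-after-convergence parameter sets how many seconds the tuner keeps searching after the last best result: if no better configuration is found within that time, tuning stops early.

After tuning finishes, the best configuration is saved in function.yaml. To reproduce its performance, rerun the autotune-stage compile command with the autotune option pointing at that file (i.e., `-fplugin-arg-coarse_option_tuning_plugin_gcc12-autotune=function.yaml`).

Notes:

- Program run time is currently the default performance metric.
- Coarse-grained tuning does not yet support dumping the historical data stored in the database.
- Coarse-grained tuning is currently designed to work with the current GCC version (12.3.1). With other compiler versions, some compilation options may be unsupported; options that the compiler does not recognize can be commented out in path/to/your/AI4C/aiframe/include/option_utils.h.

For details, see this test case: https://gitee.com/openeuler/AI4C/tree/master/python/test/autotuner/coarse_tuning

For coarse-grained tuning with the LLVM compiler, see the tutorial in the Autotuner repository.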

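4.4 Application-Level Option Tuning

The application-level option tuning module currently takes three main inputs:

- Application compile-and-run script (shell): handles the compile flow (substituting the next generated option set into the compile script), the run flow, and the performance data collection flow.
- Search space configuration files for compilation options and dynamic library options (YAML): define the search space for option tuning. On/off options (compiler optimizations or dynamic libraries), compilation parameters, and enumerated options can be configured.
- Performance metric configuration file (YAML): sets the weight of each performance metric and its optimization direction (maximize or minimize). The entries must match the number and order of the values produced by the performance data collection flow.

The option tuner repeatedly collects the application's performance data, updates its performance model, and generates a new combination of compilation options that the model expects to be profitable. The compile-and-run script substitutes the new combination into the compile command, builds a new binary, and runs the next round. Tuning repeats in this way to obtain the best performance observed so far.

Before starting application-level option tuning, install the following dependencies: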

shell

pip install xgboost scikit-learn
yum install -y time

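The following example builds and tunes test.cc for 3 rounds with different combinations of compilation options. The compile-and-run scripts are as follows: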

shell

# ---------- run_test.sh ---------- #
parent_dir=$1                                               # path for intermediate tuning files
config=$(cat ${parent_dir}/tuning/config.txt)               # current compiler configuration file
performance_file="${parent_dir}/tuning/performance.txt"     # current performance data file

measure_raw_file="time.txt"

compiler=g++
compile_command="${compiler} test.cc -O2 -o test_opt_tuner"
eval "${compile_command} ${config}"                         # program compilation, appending tuning options

run_command="time -p -o ${measure_raw_file} ./test_opt_tuner 3"
eval "${run_command}"                                       # program execution

info_collect_command="grep real ${measure_raw_file} | awk '{printf \"1 1 %s\", \$2}' > ${performance_file}"
eval "${info_collect_command}"                              # program performance collection

# ---------- run_option_tuner.sh ---------- #
ai4c-option-tune --test_limit 3 --runfile run_test.sh
    # --optionfile path/to/your/python<version>/site-packages/ai4c/option_tuner/input/options.yaml \
    # --libfile path/to/your/python<version>/site-packages/ai4c/option_tuner/input/options_lib.yaml \
    # --measurefile path/to/your/python<version>/site-packages/ai4c/option_tuner/input/config_measure.yaml
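
For readers less familiar with awk, the performance collection step of run_test.sh can be mirrored in Python. This is only a sketch of what the `grep real | awk '{printf "1 1 %s", $2}'` pipeline does to the output of `time -p`; the function name `collect_performance` is hypothetical, and the two leading `1 1` fields are copied verbatim from the script without interpreting them. Unlike the pipeline, the sketch stops at the first matching line.

```python
def collect_performance(raw_text: str) -> str:
    """Mirror `grep real | awk '{printf "1 1 %s", $2}'` on `time -p` output."""
    for line in raw_text.splitlines():
        if "real" in line:                 # grep real
            fields = line.split()          # awk splits on whitespace
            return f"1 1 {fields[1]}"      # $2 is the elapsed time in seconds
    raise ValueError("no 'real' line found in timing output")


# `time -p` prints three lines: real, user, and sys, each followed by seconds.
sample = "real 0.52\nuser 0.50\nsys 0.01\n"
print(collect_performance(sample))
```

The returned string is what the script writes to `${parent_dir}/tuning/performance.txt` for the tuner to read.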

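The default option and performance metric configuration files are located at: path/to/your/python<version>/site-packages/ai4c/option_tuner/input/*.yaml

You can modify the compilation option and dynamic library option configuration files as needed. The relevant keys are:

- required_*: mandatory options that are always kept during tuning
- bool_*: optional on/off compiler optimization options
- interval_*: optional compilation parameters (numeric options with a value range)
- enum_*: optional compilation parameters (enumerated options)

For example: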

yaml

required_config:
- -O2
bool_config:
- -funroll-loops
interval_config:
- name: --param max-inline-insns-auto
  default: 15
  min: 10
  max: 190
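
To make the key categories concrete, the following Python sketch draws one candidate option combination from a configuration with the same content as the YAML example. It is not the option tuner's algorithm: the real tool proposes candidates from its performance model, whereas this sketch samples at random as a stand-in. The function name `sample_options` is hypothetical, and rendering an interval option as `name=value` is an assumption based on GCC's `--param name=value` syntax.

```python
import random
from typing import List

# Same content as the YAML example, written as a Python dict.
config = {
    "required_config": ["-O2"],
    "bool_config": ["-funroll-loops"],
    "interval_config": [
        {"name": "--param max-inline-insns-auto", "default": 15, "min": 10, "max": 190},
    ],
}


def sample_options(cfg: dict, rng: random.Random) -> List[str]:
    """Draw one candidate option combination from the search space."""
    opts = list(cfg.get("required_config", []))        # always kept
    for flag in cfg.get("bool_config", []):
        if rng.random() < 0.5:                         # switch on or off
            opts.append(flag)
    for item in cfg.get("interval_config", []):
        value = rng.randint(item["min"], item["max"])  # inclusive range
        opts.append(f'{item["name"]}={value}')
    return opts


candidate = sample_options(config, random.Random(0))
print(" ".join(candidate))
```

A string built this way is the kind of value that run_test.sh reads from `${parent_dir}/tuning/config.txt` and appends to its compile command.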

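You can modify the performance metric configuration file as needed. The relevant keys are:

- weight: weight of the performance metric
- optim: optimization direction (maximize or minimize)

For example: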

yaml

config_measure:
- name: throughput
  weight: 1
  optim: maximize
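
One plausible way to read `weight` and `optim` is as a weighted sum in which minimized metrics count negatively. The sketch below illustrates that reading only; how the option tuner actually combines metrics is not described here. The second metric `latency` and the value `minimize` are hypothetical additions for the example.

```python
from typing import List, Sequence

config_measure = [
    {"name": "throughput", "weight": 1, "optim": "maximize"},
    {"name": "latency", "weight": 0.5, "optim": "minimize"},  # hypothetical second metric
]


def combined_score(values: Sequence[float], measures: List[dict]) -> float:
    """Fold several metrics into one number where larger is better."""
    if len(values) != len(measures):
        # The collected values must match config_measure in count and order.
        raise ValueError("metric count must match config_measure entries")
    score = 0.0
    for value, m in zip(values, measures):
        sign = 1.0 if m["optim"] == "maximize" else -1.0
        score += sign * m["weight"] * value
    return score


print(combined_score([120.0, 3.0], config_measure))
```

The length check reflects the requirement that the configured metrics correspond one-to-one, and in the same order, to the values written by the performance collection step.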

After tuning finishes, the historical and best tuning data are kept in ${parent_dir}/tuning/train.csv and ${parent_dir}/tuning/result.txt.

For details, see this test case: https://gitee.com/openeuler/AI4C/tree/master/python/test/option_tuner
