BiSheng-Autotuner User Guide
Introduction to BiSheng-Autotuner
BiSheng-Autotuner is a command-line tool built on BiSheng-opentuner. It works with compilers that support tuning, such as LLVM for openEuler and GCC for openEuler. It generates the search space, manipulates parameters, and drives the entire tuning process.
BiSheng-opentuner is an open-source framework for building domain-specific, multi-objective program autotuners.
This document describes the automatic tuning compilation process based on LLVM for openEuler. For automatic tuning based on GCC for openEuler, see AI4C Usage Process.
BiSheng-Autotuner Tuning Process
The tuning process (as shown in Figure 1) consists of two phases: initial compilation and tuning.

Figure 1 BiSheng-Autotuner tuning process
Initial Compilation
The initial compilation phase occurs before the tuning process begins. BiSheng-Autotuner first instructs the compiler to compile the target program code. During compilation, the compiler generates YAML files that list all tunable structures, telling developers which structures (such as modules, functions, and loops) in the target program can be tuned. For example, loop unrolling is one of the most common compiler optimizations. It replicates the loop body multiple times to give the instruction scheduler more room and to reduce the overhead of loop branch instructions. If the unroll factor is used as a tuning parameter, the compiler records every loop that can be unrolled as a tunable structure in a YAML file.
Tuning Process
After the tunable structures are successfully generated, the tuning process starts.
BiSheng-Autotuner first reads the YAML files that describe the tunable structures and generates the corresponding search space. This includes defining the specific parameters and their ranges for each tunable code structure.
In the tuning process, the autotuner explores a parameter combination based on the specified search algorithm, and generates a compilation configuration file in YAML format. This file is then used by the compiler to compile the target program code and generate a binary file.
Finally, BiSheng-Autotuner runs the compiled binary according to developer-defined methods and collects performance information as feedback.
After a certain number of iterations, BiSheng-Autotuner identifies the final optimal configuration, generates the optimal compilation configuration file, and stores the file in YAML format.
Using BiSheng-Autotuner
Environment Requirements
Mandatory:
- OS: openEuler 24.03 LTS series, openEuler 25.03, or later
- Architecture: AArch64 or x86_64
- Python 3.11.x
- SQLite 3.0
Optional:
- LibYAML: recommended, because it improves the file parsing performance of BiSheng-Autotuner.
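The requirements above can be checked before installation. The following is a minimal sketch and not part of BiSheng-Autotuner: the function names python_is_3_11 and check_environment are hypothetical, and the check only inspects the output of uname -m and python3 --version.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when a version string such as
# "Python 3.11.6" names a 3.11.x release.
python_is_3_11() {
    case "$1" in
        "Python 3.11."*) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical helper: print a short report on the architecture and on
# the Python interpreter found on PATH.
check_environment() {
    arch=$(uname -m)
    case "$arch" in
        aarch64|x86_64) echo "architecture: $arch (supported)" ;;
        *) echo "architecture: $arch (not listed as supported)" ;;
    esac
    if python_is_3_11 "$(python3 --version 2>&1)"; then
        echo "python: 3.11.x found"
    else
        echo "python: 3.11.x not found on PATH"
    fi
}
```

Calling check_environment prints one line per check. The sketch does not verify the OS release or the SQLite version.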
Obtaining BiSheng-Autotuner
On the latest openEuler release, you can install the BiSheng-Autotuner and clang software packages directly.
yum install -y BiSheng-Autotuner
yum install -y clang
To build BiSheng-Autotuner from source code, perform the following steps:
1. Install BiSheng-opentuner.
yum install -y BiSheng-opentuner
2. Clone and install BiSheng-Autotuner.
cd BiSheng-Autotuner
./dev_install.sh
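After either installation method, it can help to confirm that the tools used in the rest of this guide are on PATH. The following is a minimal sketch; missing_tools is a hypothetical helper name and relies only on the standard command -v builtin.

```shell
#!/bin/sh
# Hypothetical helper: print every command from the argument list that is
# not on PATH, and return non-zero if any is missing.
missing_tools() {
    status=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            status=1
        fi
    done
    return "$status"
}
```

For example, missing_tools clang llvm-autotune returns success only when both commands can be found.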
Running BiSheng-Autotuner
This section uses CoreMark as an example to describe how to run automatic tuning. The CoreMark source code is available on GitHub. For more details about how to use llvm-autotune, see the Help section. The following example script tunes CoreMark for 20 iterations:
export AUTOTUNE_DATADIR=/tmp/autotuner_data/
CompileCommand="clang -O2 -o coremark core_list_join.c core_main.c core_matrix.c core_state.c core_util.c posix/core_portme.c -DPERFORMANCE_RUN=1 -DITERATIONS=300000 -I. -Iposix -g -DFLAGS_STR=\"\""
$CompileCommand -fautotune-generate;
llvm-autotune minimize;
for i in $(seq 20)
do
$CompileCommand -fautotune ;
time=`{ /usr/bin/time -p ./coremark 0x0 0x0 0x66 300000; } 2>&1 | grep "real" | awk '{print $2}'`;
echo "iteration: " $i "cost time:" $time;
llvm-autotune feedback $time;
done
llvm-autotune finalize;
The following provides step-by-step instructions:
1. Configure environment variables
Use the environment variable AUTOTUNE_DATADIR to specify the directory for storing tuning-related data. The specified directory must be empty.
export AUTOTUNE_DATADIR=/tmp/autotuner_data/
2. Initial compilation
Add the compiler option -fautotune-generate to compile the program and generate the tunable code structures.
cd examples/coremark/
clang -O2 -o coremark core_list_join.c core_main.c core_matrix.c core_state.c core_util.c posix/core_portme.c -DPERFORMANCE_RUN=1 -DITERATIONS=300000 -I. -Iposix -g -DFLAGS_STR=\"\" -fautotune-generate
Warning
Apply this option only to the hotspot code files that require focused tuning. If it is applied to too many files (more than 500), a large number of tunable code structure files is generated. This can make the initialization in step 3 slow (up to several minutes) and can lead to an excessively large search space, less effective tuning results, and a longer convergence time.
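Because the data directory must be empty before the initial compilation, a small guard can catch a stale directory early. The following is a minimal sketch; prepare_datadir is a hypothetical helper name, not a BiSheng-Autotuner command.

```shell
#!/bin/sh
# Hypothetical helper: create the data directory if needed, refuse to
# continue when it already contains files, and export AUTOTUNE_DATADIR.
prepare_datadir() {
    dir=$1
    mkdir -p "$dir" || return 1
    # ls -A lists all entries except "." and "..", so any output means
    # the directory is not empty.
    if [ -n "$(ls -A "$dir")" ]; then
        echo "error: $dir is not empty" >&2
        return 1
    fi
    AUTOTUNE_DATADIR=$dir
    export AUTOTUNE_DATADIR
}
```

Running prepare_datadir /tmp/autotuner_data/ before the clang command with -fautotune-generate covers the directory setup in one call.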
3. Tuning initialization
Run the llvm-autotune command to initialize the tuning task. This step generates the initial compilation configuration for the next compilation stage.
llvm-autotune minimize
minimize specifies the tuning objective, aiming to minimize the target metric (e.g., program execution time). Alternatively, maximize can be used to maximize the target metric (e.g., program throughput).
4. Tuning compilation
Add the BiSheng compiler option -fautotune to read the current AUTOTUNE_DATADIR configuration and perform compilation.
clang -O2 -o coremark core_list_join.c core_main.c core_matrix.c core_state.c core_util.c posix/core_portme.c -DPERFORMANCE_RUN=1 -DITERATIONS=300000 -I. -Iposix -g -DFLAGS_STR=\"\" -fautotune
5. Performance feedback
Run the program and collect performance metrics based on your requirements. Then, provide feedback using llvm-autotune feedback. To use the CoreMark execution time as the tuning metric, use the following method. The external /usr/bin/time is used, as in the example script, because the bash keyword time reports on the whole pipeline and its output would not pass through grep.
/usr/bin/time -p ./coremark 0x0 0x0 0x66 300000 2>&1 1>/dev/null | grep real | awk '{print $2}'
# Returns the actual execution time: 31.09
llvm-autotune feedback 31.09
Warning
Before running llvm-autotune feedback, check that the compilation in step 4 succeeded and that the compiled program runs correctly. If any compilation or runtime issue occurs, enter the worst-case value corresponding to the tuning objective. For example, if the tuning objective is minimize, enter llvm-autotune feedback 9999. If the tuning objective is maximize, enter 0 or -9999. Incorrect performance feedback may affect the final tuning results.
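The worst-case rule in the warning above can be expressed as a small function. The following is a minimal sketch; feedback_value is a hypothetical helper name, and the fallback values 9999 and 0 are the ones suggested in this guide.

```shell
#!/bin/sh
# Hypothetical helper: choose the value to pass to llvm-autotune feedback.
#   $1 = tuning objective (minimize or maximize)
#   $2 = exit status of the compile-and-run step
#   $3 = measured metric, may be empty
feedback_value() {
    objective=$1
    status=$2
    measured=$3
    # Accept the measurement only if it consists of digits and dots.
    case "$measured" in
        ''|*[!0-9.]*) valid=no ;;
        *) valid=yes ;;
    esac
    if [ "$status" -eq 0 ] && [ "$valid" = yes ]; then
        echo "$measured"
    elif [ "$objective" = maximize ]; then
        echo 0        # worst case when larger is better
    else
        echo 9999     # worst case when smaller is better
    fi
}
```

In a tuning loop, the result would be passed on as llvm-autotune feedback "$(feedback_value minimize "$status" "$time")", where status and time are shell variables set by the preceding compile and run commands.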
6. Tuning iteration
Repeat steps 4 and 5 for the specified number of iterations.
7. End tuning
After multiple iterations, end the tuning process and save the optimal configuration file. The configuration file is stored in the directory specified by the environment variable AUTOTUNE_DATADIR.
llvm-autotune finalize
8. Final compilation
Use the optimal configuration file obtained in step 7 to perform the final compilation. If the environment variables remain unchanged, you can directly use the -fautotune option:
clang -O2 -o coremark core_list_join.c core_main.c core_matrix.c core_state.c core_util.c posix/core_portme.c -DPERFORMANCE_RUN=1 -DITERATIONS=300000 -I. -Iposix -g -DFLAGS_STR=\"\" -fautotune
Alternatively, use -mllvm -auto-tuning-input= to point directly to the configuration file.
clang -O2 -o coremark core_list_join.c core_main.c core_matrix.c core_state.c core_util.c posix/core_portme.c -DPERFORMANCE_RUN=1 -DITERATIONS=300000 -I. -Iposix -g -DFLAGS_STR=\"\" -mllvm -auto-tuning-input=/tmp/autotuner_data/config.yaml
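When the configuration file is passed explicitly, a quick existence check avoids a final build that silently lacks the tuned configuration. The following is a minimal sketch; final_autotune_flags is a hypothetical helper name, and it assumes the optimal configuration is named config.yaml, as in the example path above.

```shell
#!/bin/sh
# Hypothetical helper: print the compiler flags that point at the optimal
# configuration in the given data directory.
# Assumption: the file is named config.yaml, as in the example path.
final_autotune_flags() {
    config=${1%/}/config.yaml   # drop one trailing slash, if present
    if [ -f "$config" ]; then
        echo "-mllvm -auto-tuning-input=$config"
    else
        echo "error: $config not found; run llvm-autotune finalize first" >&2
        return 1
    fi
}
```

The output is meant to be appended, unquoted, to the clang command line, so this sketch does not handle directory names that contain spaces.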
Help
The execution format of llvm-autotune is as follows:
llvm-autotune [-h] {minimize,maximize,feedback,dump,finalize}
Optional commands:
- minimize: Initializes tuning and generates the initial compiler configuration file, aiming to minimize the target metric (e.g., execution time).
- maximize: Initializes tuning and generates the initial compiler configuration file, aiming to maximize the target metric (e.g., throughput).
- feedback: Submits performance tuning results and generates a new compiler configuration.
- dump: Generates the current optimal configuration without terminating the tuning process (feedback can continue to be applied).
- finalize: Terminates the tuning process and generates the optimal compiler configuration (no further feedback is allowed).
Compiler-related Options
llvm-autotune must be used in conjunction with the LLVM compiler options -fautotune-generate and -fautotune.
-fautotune-generate:
- Generates a list of tunable code structures in the autotune_datadir directory. The default directory can be overridden by the environment variable AUTOTUNE_DATADIR.
- As the first step of the tuning preparation process, it is typically used before running the llvm-autotune minimize/maximize command.
- This option can also be assigned a value to change the tuning granularity. Available values include: Other, Function, Loop, CallSite, MachineBasicBlock, Switch, LLVMParam, and ProgramParam, where LLVMParam and ProgramParam correspond to coarse-grained tuning. For example, -fautotune-generate=Loop enables tunable code structures only for loops, and each loop will be assigned different parameter values during tuning. Other indicates the global scope, where the generated tunable code structures correspond to each compilation unit (code file).
- By default, -fautotune-generate is equivalent to -fautotune-generate=Function,Loop,CallSite. The default value is generally recommended.
- To enable option tuning (LLVMParam and ProgramParam), you need to specify an extended search space for llvm-autotune. The default search space does not contain preset tuning options.
llvm-autotune minimize --search-space /usr/lib64/python<version>/site-packages/autotuner/search_space_config/extended_search_space.yaml
The site-packages directory can be found using the pip show autotuner command.
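The granularity list and the search-space path can both be assembled by small helpers. The following is a minimal sketch; autotune_generate_flag and extended_search_space_path are hypothetical names. The first accepts only the granularity values listed above, and the second assumes the extended search space sits under the Location line reported by pip show autotuner, at the relative path used in the command above.

```shell
#!/bin/sh
# Hypothetical helper: build an -fautotune-generate flag from a list of
# granularities, rejecting names this guide does not list.
autotune_generate_flag() {
    list=
    for name in "$@"; do
        case "$name" in
            Other|Function|Loop|CallSite|MachineBasicBlock|Switch|LLVMParam|ProgramParam) ;;
            *) echo "error: unknown granularity: $name" >&2; return 1 ;;
        esac
        list=${list:+$list,}$name   # join names with commas
    done
    # With no arguments, print the bare flag (the compiler default).
    echo "-fautotune-generate${list:+=$list}"
}

# Hypothetical helper: read `pip show autotuner` output on stdin and print
# the extended search space path under the reported Location.
extended_search_space_path() {
    awk -F': ' '$1 == "Location" {
        print $2 "/autotuner/search_space_config/extended_search_space.yaml"
    }'
}
```

For example, pip show autotuner | extended_search_space_path prints a path that can be passed to llvm-autotune minimize --search-space, provided the package layout matches the assumption above.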
-fautotune:
- Uses the compiler configuration in autotune_datadir to perform tuning compilation. The default directory can be overridden by the environment variable AUTOTUNE_DATADIR.
- It is typically used during the tuning iteration process, after running the llvm-autotune minimize/maximize/feedback commands.
Licensed under the MulanPSL2