GCC Base Performance Optimization Guide

Overview

Optimizing the base performance of the compiler is crucial to improving the development efficiency, runtime performance, and maintainability of applications. It is an important research direction in computer science and a key step in software development. Building on general compilation optimization capabilities, GCC for openEuler enhances mid-end and back-end performance optimization technologies, including instruction optimization, vectorization enhancement, prefetch enhancement, and data flow analysis enhancement.

Installation and Deployment

Software Requirements

OS: openEuler 22.03 LTS SP3

Hardware Requirements

AArch64 architecture

Software Installation

Install GCC and related components as required. For example, install GCC:

yum install gcc

Usage

CRC Optimization

Description

GCC identifies cyclic redundancy check (CRC) code and generates efficient hardware instructions for it.

Usage

Add the -floop-crc option during compilation.

Note: -floop-crc must be used together with -O3 -march=armv8.1-a.
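
To illustrate the kind of source code involved, the following is a minimal sketch of a table-driven CRC-32 loop. The function names (crc32_init_table, crc32_update) and the file name in the comment are hypothetical, and whether -floop-crc recognizes this exact loop shape is an assumption rather than something this guide states.

```c
/* crc32_sketch.c (hypothetical file name).
 * Possible build line, using only the options named in this section:
 *   gcc -O3 -march=armv8.1-a -floop-crc -c crc32_sketch.c
 */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static uint32_t crc_table[256];
static int crc_table_ready = 0;

/* Build the lookup table for the reflected CRC-32 polynomial 0xEDB88320. */
static void crc32_init_table(void)
{
    for (uint32_t n = 0; n < 256; n++) {
        uint32_t c = n;
        for (int k = 0; k < 8; k++)
            c = (c & 1u) ? (0xEDB88320u ^ (c >> 1)) : (c >> 1);
        crc_table[n] = c;
    }
    crc_table_ready = 1;
}

/* Byte-at-a-time CRC-32. The loop body is the classic table-lookup step. */
uint32_t crc32_update(uint32_t crc, const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    if (!crc_table_ready)
        crc32_init_table();

    crc = ~crc;
    for (size_t i = 0; i < len; i++)
        crc = crc_table[(crc ^ buf[i]) & 0xFFu] ^ (crc >> 8);
    return ~crc;
}
```

Calling crc32_update(0, data, len) returns the CRC-32 of the buffer. Comparing the generated assembly with and without -floop-crc is one way to check whether the loop was actually converted.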

IF-conversion Enhancement

Description

IF-conversion is enhanced to use more registers to reduce conflicts.

Usage

This enhancement is part of the IF-conversion optimization of the Register Transfer Language (RTL). Enable the enhancement by using the following options.

-fifcvt-allow-complicated-cmps

--param=ifcvt-allow-register-renaming=[0,1,2] This value controls the optimization scope. The default value is 0.

Note: This enhancement requires the -O2 optimization level and must be used together with --param=max-rtl-if-conversion-unpredictable-cost=48 and --param=max-rtl-if-conversion-predictable-cost=48.
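
As a rough illustration, the sketch below shows a branch that assigns two values on each path, which is the general kind of code RTL if-conversion can turn into conditional selects. The function name order_pairs and the file name are hypothetical, and it is an assumption that this particular branch benefits from the enhancement described here.

```c
/* pairs_sketch.c (hypothetical file name).
 * Possible build line, using only the options named in this section:
 *   gcc -O2 -fifcvt-allow-complicated-cmps \
 *       --param=ifcvt-allow-register-renaming=1 \
 *       --param=max-rtl-if-conversion-unpredictable-cost=48 \
 *       --param=max-rtl-if-conversion-predictable-cost=48 \
 *       -c pairs_sketch.c
 */
#include <assert.h>
#include <stddef.h>

/* Sort each adjacent pair (v[0],v[1]), (v[2],v[3]), ... in place.
 * A trailing unpaired element is left untouched. */
void order_pairs(int *v, size_t n)
{
    assert(v != NULL || n == 0);
    for (size_t i = 0; i + 1 < n; i += 2) {
        int a = v[i];
        int b = v[i + 1];
        int lo, hi;

        /* Both paths write lo and hi, so a branch-free version needs
         * extra registers to hold the selected values. */
        if (a < b) {
            lo = a;
            hi = b;
        } else {
            lo = b;
            hi = a;
        }
        v[i] = lo;
        v[i + 1] = hi;
    }
}
```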

Multiplication Optimization

Description

Arm instructions are combined to convert low-order multiplications into high-order multiplication instructions.

Usage

Use the -fuaddsub-overflow-match-all and -fif-conversion-gimple options.

Note: This optimization requires the -O3 or higher optimization level and must be used together with the -ftree-fold-phiopt option.
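
This section does not show the source pattern it targets. One shape that plausibly fits the description is a wide multiplication assembled from 32-bit partial products with manual carry detection, sketched below. The function name mulhi_u64 and the file name are hypothetical, and it is an assumption that these options recognize this exact code.

```c
/* mulhi_sketch.c (hypothetical file name).
 * Possible build line, using only the options named in this section:
 *   gcc -O3 -fuaddsub-overflow-match-all -fif-conversion-gimple \
 *       -ftree-fold-phiopt -c mulhi_sketch.c
 */
#include <stdint.h>

/* Return the upper 64 bits of the 128-bit product a * b, computed
 * from four 32x32->64 partial products. */
uint64_t mulhi_u64(uint64_t a, uint64_t b)
{
    uint64_t a_lo = (uint32_t)a, a_hi = a >> 32;
    uint64_t b_lo = (uint32_t)b, b_hi = b >> 32;

    uint64_t lo_lo = a_lo * b_lo;
    uint64_t hi_lo = a_hi * b_lo;
    uint64_t lo_hi = a_lo * b_hi;
    uint64_t hi_hi = a_hi * b_hi;

    /* Sum of the two middle products; detect unsigned wraparound. */
    uint64_t cross = hi_lo + lo_hi;
    uint64_t carry = 0;
    if (cross < hi_lo)
        carry = (uint64_t)1 << 32;   /* wrapped 2^64, scaled by 2^32 */

    /* Low 64 bits of the product; a wraparound carries into the high half. */
    uint64_t low = lo_lo + (cross << 32);
    if (low < lo_lo)
        carry += 1;

    return hi_hi + (cross >> 32) + carry;
}
```

The two comparisons after the additions are the manual overflow checks. On AArch64 the same result is available from a single high-multiply instruction, which is why code of this shape is a natural candidate for instruction combining.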

CMLT Instruction Generation

Description

CMLT instructions are generated for some elementary arithmetic operations to reduce the number of instructions.

Usage

Use the -mcmlt-arith option.

Note: This optimization requires the -O3 or higher optimization level.

Vectorization Enhancement

Description

Redundant instructions generated during vectorization are identified and simplified, and shorter arrays can be vectorized.

Usage

Use --param=tree-forwprop-perm=1 and --param=vect-alias-flexible-segment-len=1. The default values are 0.

Note: This optimization requires the -O3 or higher optimization level.

maxmin and UZP1/UZP2 Instruction Optimization

Description

The maxmin and UZP1/UZP2 instructions are optimized to reduce the total number of instructions and improve performance.

Usage

Use the -fconvert-minmax option. UZP1/UZP2 instruction optimization is enabled by default at the -O3 or higher optimization level.

Note: This optimization requires the -O3 or higher optimization level.
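
For illustration, the sketch below shows a compare-and-assign reduction, a common way of writing a maximum without calling a max helper. The function name array_max and the file name are hypothetical, and it is an assumption that -fconvert-minmax treats this loop as a max operation.

```c
/* max_sketch.c (hypothetical file name).
 * Possible build line, using only the options named in this section:
 *   gcc -O3 -fconvert-minmax -c max_sketch.c
 */
#include <assert.h>
#include <stddef.h>

/* Return the largest element of v[0..n). Requires n > 0. */
int array_max(const int *v, size_t n)
{
    assert(v != NULL && n > 0);
    int best = v[0];
    for (size_t i = 1; i < n; i++) {
        /* Conditional update that is semantically best = max(best, v[i]). */
        if (v[i] > best)
            best = v[i];
    }
    return best;
}
```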

LDP and STP Optimization

Description

Each poorly performing LDP or STP instruction is split into two LDR or two STR instructions, respectively.

Usage

Use the -fsplit-ldp-stp option. Use --param=param-ldp-dependency-search-range=[1,32] to control the search range. The default value is 16.

Note: This optimization requires the -O1 or higher optimization level.

AES Instruction Optimization

Description

AES algorithm code is identified and accelerated using hardware instructions.

Usage

Use the -fcrypto-accel-aes option.

Note: This optimization requires the -O3 or higher optimization level.

Indirect Call Optimization

Description

Indirect calls in programs are identified and analyzed to convert them into direct calls.

Usage

Use the -ficp and -ficp-speculatively options.

Note: This optimization must be used together with -O2 -flto -flto-partition=one.
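
As a small illustration of the code shape involved, the sketch below calls a function through a pointer that only ever refers to a few known targets. All names (binop_fn, fold, sum_of, product_of) and the file names are hypothetical, and it is an assumption that the options above promote this particular call.

```c
/* fold_sketch.c (hypothetical file name).
 * Possible build line, using only the options named in this section
 * (main.c stands for the rest of the program):
 *   gcc -O2 -flto -flto-partition=one -ficp -ficp-speculatively \
 *       main.c fold_sketch.c -o app
 */
#include <assert.h>
#include <stddef.h>

typedef long (*binop_fn)(long, long);

static long add_op(long a, long b) { return a + b; }
static long mul_op(long a, long b) { return a * b; }

/* Fold v[0..n) with op, starting from init.
 * The call through op is an indirect call. */
long fold(binop_fn op, long init, const long *v, size_t n)
{
    assert(op != NULL && (v != NULL || n == 0));
    long acc = init;
    for (size_t i = 0; i < n; i++)
        acc = op(acc, v[i]);
    return acc;
}

/* The only two targets ever passed to fold in this sketch. */
long sum_of(const long *v, size_t n)     { return fold(add_op, 0, v, n); }
long product_of(const long *v, size_t n) { return fold(mul_op, 1, v, n); }
```

Because the set of possible targets is visible across the whole program only at link time, the requirement for -flto in the note above is consistent with this kind of analysis.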

IPA-prefetch

Description

Indirect memory accesses in a loop are identified and prefetch instructions are inserted for them, reducing the latency of those accesses.

Usage

Use the -fipa-prefetch and -fipa-ic options.

Note: This optimization must be used together with -O3 -flto.
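
An indirect memory access is one whose address depends on a value loaded in the same iteration. The sketch below shows the simplest form, a gather-style sum. The function name gather_sum and the file names are hypothetical, and it is an assumption that the pass inserts a prefetch for this exact loop.

```c
/* gather_sketch.c (hypothetical file name).
 * Possible build line, using only the options named in this section
 * (main.c stands for the rest of the program):
 *   gcc -O3 -flto -fipa-prefetch -fipa-ic main.c gather_sketch.c -o app
 */
#include <assert.h>
#include <stddef.h>

/* Sum table[index[i]] for i in [0, n).
 * Every index[i] must be a valid position in table. */
double gather_sum(const double *table, const int *index, size_t n)
{
    assert((table != NULL && index != NULL) || n == 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        /* index[i] is loaded first, then used to form the address of
         * the second load; hardware prefetchers tend to miss this. */
        sum += table[index[i]];
    }
    return sum;
}
```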

LLC-prefetch

Description

GCC for openEuler analyzes the main execution paths of a program, performs memory reuse analysis on the loops on those paths, calculates and ranks the hottest data, and inserts prefetch instructions that allocate the data into the last-level cache (LLC) in advance, reducing LLC misses.

Usage

Use the -fllc-allocate option. The -O2 or higher optimization level is required.

Other related interfaces:

Option	Default Value	Description
--param=mem-access-ratio=[0,100]	20	Ratio of the number of memory accesses in a loop to the number of instructions.
--param=mem-access-num=unsigned	3	Number of memory accesses in a loop.
--param=outer-loop-nums=[1,10]	1	Maximum number of outer loop layers that can be unrolled.
--param=filter-kernels=[0,1]	1	Whether to perform path series filtering on loops.
--param=branch-prob-threshold=[50,100]	80	Probability threshold for a branch to be considered highly probable.
--param=prefetch-offset=[1,999999]	1024	Prefetch offset distance. Generally, the value is a power of 2.
--param=issue-topn=unsigned	1	Number of prefetch instructions.
--param=force-issue=[0,1]	0	Whether to perform forcible prefetch, that is, the static mode.
--param=llc-capacity-per-core=[0,999999]	114	Average LLC capacity allocated to each core in multi-branch prefetch mode.
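
To give a sense of the loops this analysis looks at, the sketch below repeatedly sweeps over two arrays, so the same data is reused across iterations of the outer loop. The function name axpy_sweeps and the file name are hypothetical, and it is an assumption that -fllc-allocate selects this loop for prefetching; the guide does not state which loop shapes qualify.

```c
/* axpy_sketch.c (hypothetical file name).
 * Possible build line, using only the options named in this section:
 *   gcc -O2 -fllc-allocate -c axpy_sketch.c
 */
#include <assert.h>
#include <stddef.h>

/* Apply y[i] += a * x[i] to all n elements, repeated sweeps times.
 * Each sweep streams through both arrays again, so arrays larger
 * than the cache are re-fetched from memory on every sweep. */
void axpy_sweeps(double *y, const double *x, double a, size_t n, int sweeps)
{
    assert((y != NULL && x != NULL) || n == 0);
    for (int s = 0; s < sweeps; s++) {
        for (size_t i = 0; i < n; i++)
            y[i] += a * x[i];
    }
}
```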
