      GCC Base Performance Optimization Guide

      Overview

      Optimizing the base performance of the compiler is crucial to the development efficiency, runtime performance, and maintainability of applications. It is an important research direction in computer science and a key step in software development. Building on general compilation optimization capabilities, GCC for openEuler enhances mid-end and back-end optimizations, including instruction optimization, vectorization, prefetching, and data flow analysis.

      Installation and Deployment

      Software Requirements

      OS: openEuler 22.03 LTS SP3

      Hardware Requirements

      AArch64 architecture

      Software Installation

      Install GCC and related components as required. For example, install GCC:

      yum install gcc
      

      Usage

      CRC Optimization

      Description

      Code that computes a cyclic redundancy check (CRC) is identified so that efficient hardware CRC instructions can be generated for it.

      Usage

      Add the -floop-crc option during compilation.

      Note: -floop-crc must be used together with -O3 -march=armv8.1-a.
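To make the target of this optimization concrete, the following is a minimal sketch of a software CRC-32 routine. The function name crc32_bitwise and the choice of a bitwise loop with the reflected polynomial 0xEDB88320 are assumptions made for illustration; this guide does not state which loop shapes -floop-crc recognizes, so treat the code as a candidate to experiment with, not as a guaranteed match.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 over a byte buffer (reflected polynomial 0xEDB88320).
 * Hypothetical example: whether -floop-crc matches this exact loop
 * shape is an assumption, not something the guide confirms. */
uint32_t crc32_bitwise(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;

    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            /* mask is all ones when the low bit is set, else zero. */
            uint32_t mask = -(crc & 1u);
            crc = (crc >> 1) ^ (0xEDB88320u & mask);
        }
    }
    return ~crc;
}
```

Assuming the routine is saved in a file named crc.c (a hypothetical name), it could be compiled with gcc -O3 -march=armv8.1-a -floop-crc -c crc.c, and the generated assembly inspected for CRC32 instructions.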

      IF-conversion Enhancement

      Description

      IF-conversion is enhanced to use more registers to reduce conflicts.

      Usage

      This enhancement is part of the IF-conversion optimization of the Register Transfer Language (RTL). Enable the enhancement by using the following options.

      -fifcvt-allow-complicated-cmps

      --param=ifcvt-allow-register-renaming=[0,1,2] The value controls the optimization scope. The default value is 0.

      Note: This enhancement requires the -O2 optimization level and must be used together with --param=max-rtl-if-conversion-unpredictable-cost=48 and --param=max-rtl-if-conversion-predictable-cost=48.
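As a rough illustration of what IF-conversion works on, the sketch below shows a branch whose condition is a non-trivial comparison and whose two arms have no side effects. The function name select_scaled is hypothetical, and it is an assumption that this particular shape benefits from the enhancement; the general idea is that such a branch can be turned into straight-line code with a conditional select.

```c
/* A branch with a compound comparison and two side-effect-free arms.
 * Hypothetical example: IF-conversion may replace the branch with a
 * conditional select, which needs extra registers to hold both results. */
int select_scaled(int a, int b, int x, int y)
{
    int result;

    if (a * 3 < b + 7) {
        result = x + y;
    } else {
        result = x - y;
    }
    return result;
}
```

Comparing the assembly produced with and without the options listed above is one way to check whether the branch was actually converted.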

      Multiplication Optimization

      Description

      Arm instructions are combined to convert low-order multiplications into high-order multiplication instructions.

      Usage

      Use the -fuaddsub-overflow-match-all and -fif-conversion-gimple options.

      Note: This optimization requires the -O3 or higher optimization level and must be used together with the -ftree-fold-phiopt option.
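One plausible reading of the description above is that a 64-bit "multiply high" written out by hand with 32-bit partial products is recognized and replaced with a single high-order multiplication instruction. The sketch below shows such a hand-written routine. The function name mul_high_u64 is hypothetical, and it is an assumption that this is the pattern the options target.

```c
#include <stdint.h>

/* Upper 64 bits of a 64x64-bit unsigned product, built from four
 * 32x32-bit partial products. Hypothetical candidate pattern only. */
uint64_t mul_high_u64(uint64_t a, uint64_t b)
{
    uint64_t a_lo = (uint32_t)a, a_hi = a >> 32;
    uint64_t b_lo = (uint32_t)b, b_hi = b >> 32;

    uint64_t lo_lo = a_lo * b_lo;
    uint64_t hi_lo = a_hi * b_lo;
    uint64_t lo_hi = a_lo * b_hi;
    uint64_t hi_hi = a_hi * b_hi;

    /* Sum of the middle terms; this cannot overflow 64 bits. */
    uint64_t cross = (lo_lo >> 32) + (uint32_t)hi_lo + lo_hi;

    return hi_hi + (hi_lo >> 32) + (cross >> 32);
}
```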

      CMLT Instruction Generation

      Description

      CMLT instructions are generated for some elementary arithmetic operations to reduce the number of instructions.

      Usage

      Use the -mcmlt-arith option.

      Note: This optimization requires the -O3 or higher optimization level.

      Vectorization Enhancement

      Description

      Redundant instructions generated during vectorization are identified and simplified, and shorter arrays can be vectorized.

      Usage

      Use --param=tree-forwprop-perm=1 and --param=vect-alias-flexible-segment-len=1. The default values are 0.

      Note: This optimization requires the -O3 or higher optimization level.

      maxmin and UZP1/UZP2 Instruction Optimization

      Description

      The maxmin and UZP1/UZP2 instructions are optimized to reduce the total instructions and improve performance.

      Usage

      Use the -fconvert-minmax option. UZP1/UZP2 instruction optimization is enabled by default at a level higher than -O3.

      Note: This optimization requires the -O3 or higher optimization level.
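For the max/min part of this optimization, a typical candidate is a comparison followed by a conditional assignment. The sketch below is a minimal example; the function name array_max is hypothetical, and it is an assumption that -fconvert-minmax rewrites this exact form into a max operation.

```c
#include <stddef.h>

/* Largest element of a non-empty array, written as compare-and-assign.
 * Hypothetical candidate: the body of the loop is equivalent to
 * best = max(best, v[i]). */
int array_max(const int *v, size_t n)
{
    int best = v[0];

    for (size_t i = 1; i < n; i++) {
        if (v[i] > best) {
            best = v[i];
        }
    }
    return best;
}
```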

      LDP and STP Optimization

      Description

      LDP and STP instructions that perform poorly are split: each such LDP becomes two LDR instructions, and each such STP becomes two STR instructions.

      Usage

      Use the -fsplit-ldp-stp option. Use --param=param-ldp-dependency-search-range=[1,32] to control the search range. The default value is 16.

      Note: This optimization requires the -O1 or higher optimization level.

      AES Instruction Optimization

      Description

      Code that implements the AES algorithm is identified and accelerated using hardware instructions.

      Usage

      Use the -fcrypto-accel-aes option.

      Note: This optimization requires the -O3 or higher optimization level.

      Indirect Call Optimization

      Description

      Indirect calls in programs are identified and analyzed to convert them into direct calls.

      Usage

      Use the -ficp and -ficp-speculatively options.

      Note: This optimization must be used together with -O2 -flto -flto-partition=one.
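The sketch below shows the kind of indirect call this optimization is concerned with: a call through a function pointer whose possible targets are visible in the program. All names (binary_op, apply_all, and so on) are hypothetical, and it is an assumption that this specific case is promoted to a direct call.

```c
#include <stddef.h>

typedef int (*binary_op)(int, int);

static int add(int a, int b) { return a + b; }
static int mul(int a, int b) { return a * b; }

/* The call through op is indirect. With whole-program information,
 * the compiler may be able to see that op is only ever add or mul. */
int apply_all(binary_op op, const int *v, size_t n, int init)
{
    int acc = init;

    for (size_t i = 0; i < n; i++) {
        acc = op(acc, v[i]);
    }
    return acc;
}

int sum_values(const int *v, size_t n)
{
    return apply_all(add, v, n, 0);
}

int product_values(const int *v, size_t n)
{
    return apply_all(mul, v, n, 1);
}
```

Because the analysis needs to see all candidate targets, the requirement to combine the options with -flto -flto-partition=one fits this kind of code, where callers and callees may live in different source files.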

      IPA-prefetch

      Description

      Indirect memory accesses in a loop are identified to insert a prefetch instruction, thereby reducing the delay of indirect memory accesses.

      Usage

      Use the -fipa-prefetch and -fipa-ic options.

      Note: This optimization must be used together with -O3 -flto.
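An indirect memory access in a loop is one where the address of a load depends on a value loaded earlier in the same iteration, so a hardware prefetcher has difficulty predicting it. The sketch below is a minimal example of that shape; the function name gather_sum is hypothetical, and it is an assumption that the pass inserts a prefetch for this exact loop.

```c
#include <stddef.h>

/* Sums values[index[i]] for each i. The load from values is indirect:
 * its address is known only after index[i] has been loaded. */
long gather_sum(const long *values, const int *index, size_t n)
{
    long total = 0;

    for (size_t i = 0; i < n; i++) {
        total += values[index[i]];
    }
    return total;
}
```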

      LLC-prefetch

      Description

      GCC for openEuler analyzes the main execution paths of a program, performs memory reuse analysis on the loops on those paths, ranks the hottest data, and inserts prefetch instructions that bring this data into the last-level cache (LLC) in advance, reducing LLC misses.

      Usage

      Use the -fllc-allocate option. The -O2 or higher optimization level is required.

      Other related interfaces:

      Option                                      Default Value  Description
      --param=mem-access-ratio=[0,100]            20             Ratio of the number of memory accesses in a loop to the number of instructions.
      --param=mem-access-num=unsigned             3              Number of memory accesses in a loop.
      --param=outer-loop-nums=[1,10]              1              Maximum number of outer loop layers that can be unrolled.
      --param=filter-kernels=[0,1]                1              Whether to perform path series filtering on loops.
      --param=branch-prob-threshold=[50,100]      80             Probability threshold for a branch to be considered highly probable.
      --param=prefetch-offset=[1,999999]          1024           Prefetch offset distance. Generally, the value is a power of 2.
      --param=issue-topn=unsigned                 1              Number of prefetch instructions.
      --param=force-issue=[0,1]                   0              Whether to perform forcible prefetch, that is, the static mode.
      --param=llc-capacity-per-core=[0,999999]    114            Average LLC capacity allocated to each core in multi-branch prefetch mode.
