Key Features

AI

  • OS for AI: openEuler offers an efficient development and runtime environment that containerizes software stacks of AI platforms with out-of-the-box availability. It also provides various AI frameworks to facilitate AI development.

    • openEuler supports TensorFlow, PyTorch, and MindSpore frameworks and software development kits (SDKs) of major computing architectures, such as Compute Architecture for Neural Networks (CANN) and Compute Unified Architecture (CUDA), to make it easy to develop and run AI applications.
    • Environment setup is further simplified by containerizing software stacks. openEuler provides three types of container images:
      • SDK images: Use openEuler as the base image and install the SDK of a computing architecture, for example, Ascend CANN and NVIDIA CUDA.
      • AI framework images: Use an SDK image as the base and install AI framework software, such as PyTorch and TensorFlow. You can use an AI framework image to quickly build a distributed AI framework, such as Ray.
      • Model application images: Provide a complete set of toolchains and model applications.

    For details, see the openEuler AI Container Image User Guide.

  • AI for OS: AI is making openEuler smarter. openEuler Copilot System is an intelligent Q&A platform developed on foundation models and openEuler data. It is designed to streamline code generation, troubleshooting, and O&M.

    • Intelligent Q&A: openEuler Copilot System is accessible via web or shell.
    1. Workflow scheduling:

      • Atomic agent operations: Multiple agent operations can be combined into a multi-step workflow that is internally ordered and associated, and is executed as an inseparable atomic operation.
      • Real-time data processing: Data generated in each step of the workflow can be processed immediately and then transferred to the next step.
      • Intelligent interaction: When openEuler Copilot System receives a vague or complex user instruction, it proactively asks the user to clarify and provide more details.
    2. Task recommendation:

      • Intelligent response: openEuler Copilot System can analyze the semantic information entered in text format, determining the user's intent and selecting the most matched workflow.
      • Intelligent guidance: openEuler Copilot System comprehensively analyzes the execution status, function requirements, and associated tasks of the current workflow, and provides next-step operation suggestions based on users' personal preferences and historical patterns.
    3. Retrieval-augmented generation (RAG):

      • RAG used in openEuler Copilot System efficiently handles diverse document formats and content scenarios, improving the Q&A service experience while minimizing additional system load.
    4. Corpus governance:

      • Corpus governance is a core RAG capability. It imports corpora into the knowledge base in a supported format, using fragment relationship extraction, fragment derivative construction, and optical character recognition (OCR) to increase the retrieval hit rate.

      For details, see the openEuler Copilot System Intelligent Q&A Service User Guide.

    • Intelligent tuning: openEuler Copilot System provides an intelligent shell entry point. Through it, you can interact with openEuler Copilot System in natural language and perform heuristic tuning operations such as performance data collection, system performance analysis, and system performance tuning.
    • Intelligent diagnosis:
    1. Inspection: The Inspection Agent checks for abnormalities of designated IP addresses and provides an abnormality list that contains associated container IDs and abnormal metrics (such as CPU and memory).
    2. Demarcation: The Demarcation Agent analyzes and demarcates a specified abnormality contained in the inspection result and outputs the top 3 metrics of the root cause.
    3. Location: The Detection Agent performs profiling location analysis on the root cause, and provides useful hotspot information such as the stack, system time, and performance metrics related to the root cause.
    • Intelligent container images: openEuler Copilot System can invoke environment resources through natural language, assist in pulling container images for local physical resources, and establish a development environment suitable for debugging on existing compute devices. The system supports three types of containers, whose images have been released on Docker Hub. You can manually pull and run these container images.
    1. SDK layer: encapsulates only the component libraries that enable AI hardware resources, such as CUDA and CANN.
    2. SDKs + training/inference frameworks: accommodates TensorFlow, PyTorch, and other frameworks (for example, tensorflow2.15.0-cuda12.2.0 and pytorch2.1.0.a1-cann7.0.RC1) in addition to the SDK layer.
    3. SDKs + training/inference frameworks + LLMs: encapsulates several models (for example, llama2-7b and chatglm2-13b) based on the second type of containers.
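
The layering described above shows up in the two example tags, each of which pairs a framework version with an SDK version. The following Python sketch splits such a tag into its parts. The tag layout (framework, version, hyphen, SDK, version) is inferred only from the two examples quoted above and is an assumption, not a documented naming rule; parse_image_tag is a hypothetical helper.

```python
import re
from typing import NamedTuple


class ImageTag(NamedTuple):
    framework: str
    framework_version: str
    sdk: str
    sdk_version: str


# ASSUMED layout: <framework><version>-<sdk><version>, inferred from the
# two example tags in the text; real tags may follow other patterns.
_TAG_RE = re.compile(r"^([a-z]+)(\d[\w.]*)-([a-z]+)(\d[\w.]*)$")


def parse_image_tag(tag: str) -> ImageTag:
    """Split a framework-on-SDK image tag into its four components."""
    match = _TAG_RE.match(tag)
    if match is None:
        raise ValueError(f"unrecognized tag layout: {tag!r}")
    return ImageTag(*match.groups())


if __name__ == "__main__":
    for example in ("tensorflow2.15.0-cuda12.2.0", "pytorch2.1.0.a1-cann7.0.RC1"):
        print(parse_image_tag(example))
```

Keeping the SDK and framework versions separate in this way mirrors the image layering: a framework image is only meaningful together with the SDK image it was built on.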

openEuler Embedded

openEuler Embedded provides a closed-loop framework of the kind often found in operational technology (OT) applications such as manufacturing and robotics, with innovations that help optimize its embedded system software stack and ecosystem. openEuler 24.03 LTS SP1 Embedded is designed for embedded applications, offering significant progress in southbound and northbound ecosystems, technical features, infrastructure, and implementation over previous generations. openEuler Embedded will work alongside community partners, users, and developers to broaden its support for emerging chip architectures, including Loongson, and a wider range of hardware. The platform will strengthen offerings in industrial middleware, embedded AI, edge computing, and simulation systems, delivering a robust solution for embedded system software.

  • Southbound ecosystem: openEuler Embedded Linux supports mainstream processor architectures like AArch64, x86_64, AArch32, and RISC-V, and will extend support to LoongArch in the future. openEuler 24.03 and later versions have a rich southbound ecosystem and support chips from Raspberry Pi, HiSilicon, Rockchip, Renesas, TI, Phytium, StarFive, and Allwinner. In openEuler 24.03 LTS SP1 Embedded, Kunpeng 920 is also supported.
  • Embedded virtualization base: openEuler Embedded uses an elastic virtualization base that enables multiple OSs to run on a system-on-a-chip (SoC). The base incorporates a series of technologies including bare metal, embedded virtualization, lightweight containers, LibOS, trusted execution environment (TEE), and heterogeneous deployment.
  • MICA deployment framework: The MICA deployment framework is a unified environment that masks the differences between the technologies that make up the embedded elastic virtualization base. It uses the multi-core capability of the hardware to run a general-purpose Linux OS alongside a dedicated real-time operating system (RTOS), making full use of the strengths of each OS.
  • Northbound ecosystem:
    • Northbound software packages: Over 600 common embedded software packages can be built using openEuler.
    • Soft real-time kernel: This capability helps respond to soft real-time interrupts within microseconds.
    • DSoftBus: The distributed soft bus system (DSoftBus) of openEuler Embedded integrates the DSoftBus and point-to-point authentication module of OpenHarmony. It implements interconnection between openEuler-based embedded devices and OpenHarmony-based devices as well as between openEuler-based embedded devices.
    • Embedded containers and edges: With iSula containers, openEuler and other OS containers can be deployed on embedded devices to simplify application porting and deployment. Embedded container images can be compressed to 5 MB, and can be easily deployed into the OS on another container.
  • UniProton: UniProton is an RTOS that features ultra-low latency and flexible MICA deployments. It is suited for industrial control because it supports both microcontroller units and multi-core CPUs. UniProton provides the following capabilities:
    • Supports processor architectures such as Cortex-M, AArch64, x86_64, and riscv64, and platforms including M4, RK3568, RK3588, x86_64, Hi3093, Raspberry Pi 4B, Kunpeng 920, Ascend 310, and Allwinner D1s.
    • Connects with openEuler Embedded Linux on Raspberry Pi 4B, Hi3093, RK3588, and x86_64 devices in bare metal mode.
    • Can be debugged using GDB on openEuler Embedded Linux.

DevStation

DevStation is the first openEuler developer workstation to come pre-installed with VS Code. Designed to streamline software coding, compilation, build, release, and deployment, DevStation integrates the oeDeploy tool to simplify the deployment of the AI and cloud-native software stacks, and uses oeDevPlugin to pull code repositories and the AI4C-powered compiler for compilation. Later versions will include epkg, a new openEuler package manager that enables developers to seamlessly manage and switch between multiple software versions across different environments.

  • Developer-friendly integrated environment: supports multiple programming languages and comes with pre-installed development tools and IDEs such as VS Code to streamline front-end, back-end, and full-stack development.
  • Package management and automated deployment: features easy-to-use package management tools for one-click installation and updates. It integrates container technologies such as Docker and iSula, streamlining application containerization and deployment. epkg supports multi-version deployment, reducing the complexity of setting up development environments.
  • GUI-based programming: realizes intuitive programming.
  • AI development support: comes with pre-installed AI frameworks like TensorFlow and PyTorch, optimized for hardware accelerators such as GPUs and NPUs. It offers a complete environment for AI model development and training, along with openEuler Copilot System for AI-assisted OS troubleshooting.
  • Debugging and testing: provides built-in debugging tools, such as GDB, CUnit, GTest, and perf, and test and tuning tools for fast debugging and automated testing.
  • Versioning and collaboration: integrates the Git versioning tool and remote collaboration tools such as Slack, Mattermost, and GitLab, promoting team development and remote collaboration.
  • Security and compliance checks: provides tools for security scanning and code compliance checks, enabling developers to address vulnerabilities and issues early in the development process.

epkg

epkg is a new software package manager that supports the installation and use of non-service software packages. It solves version compatibility issues so that users can install and run software of different versions on the same OS by using simple commands to create, enable, and switch between environments.

  • Version compatibility: enables multiple versions of the same software package to run on the same node without version conflicts.
  • Environment management: allows users to create, enable, and switch between environments. Users can use channels of different environments to use software packages of different versions. When an environment is switched to another, the software package version is also changed.
  • Installation by common users: allows common users to install software packages, create environments, and manage their environment images, reducing security risks associated with software package installation.
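
The environment behavior described above can be pictured with a toy model in Python. This is a conceptual sketch only: it is not the epkg implementation or its command-line interface, it omits channels, and every class, method, package, and version name in it is hypothetical.

```python
from typing import Dict, Optional


class EnvironmentManager:
    """Toy model of per-environment package versions (not epkg itself)."""

    def __init__(self) -> None:
        # environment name -> {package name -> version}
        self._envs: Dict[str, Dict[str, str]] = {}
        self._active: Optional[str] = None

    def create(self, name: str) -> None:
        if name in self._envs:
            raise ValueError(f"environment already exists: {name}")
        self._envs[name] = {}

    def install(self, env: str, package: str, version: str) -> None:
        # Each environment keeps its own version, so installs never conflict.
        self._envs[env][package] = version

    def switch(self, name: str) -> None:
        if name not in self._envs:
            raise KeyError(name)
        self._active = name

    def version_of(self, package: str) -> Optional[str]:
        """Return the version visible in the currently active environment."""
        if self._active is None:
            return None
        return self._envs[self._active].get(package)


if __name__ == "__main__":
    mgr = EnvironmentManager()
    for env, version in (("env-a", "1.0"), ("env-b", "2.0")):
        mgr.create(env)
        mgr.install(env, "examplepkg", version)
    mgr.switch("env-a")
    print(mgr.version_of("examplepkg"))
    mgr.switch("env-b")
    print(mgr.version_of("examplepkg"))
```

The point of the model is that the visible version is a function of the active environment, which is why two versions of one package can coexist on the same node.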

openEuler GCC Toolset 14

To enable new compute features and make the most of hardware features, openEuler GCC Toolset 14 offers a secondary (minor) GCC 14 toolchain that is newer than the default (major) GCC of openEuler 24.03 LTS SP1, so users can select the most appropriate compilation environment. With openEuler GCC Toolset 14, users can easily switch between GCC versions to use new hardware features.

To decouple from the default major GCC and prevent conflicts between the dependency libraries of the minor and major GCC versions, the software package of the openEuler GCC Toolset 14 toolchain is named starting with gcc-toolset-14-, followed by the name of the original GCC software package.

For version management, this solution introduces the SCL tool that provides an enable script in the /opt/openEuler/gcc-toolset-14 directory to register the environment variables of openEuler GCC Toolset 14 with SCL. Then, users can use the tool to start a new Bash shell that uses the environment variables of the minor version configured in the enable script. In this way, it is easy to switch between the major and minor GCC versions.
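
The naming rule and the environment switch can be sketched in Python as follows. The prefix and the /opt/openEuler/gcc-toolset-14 directory come from the text above; the root/usr/bin subdirectory and both helper functions are assumptions made for illustration, and the real enable script may set additional variables.

```python
import posixpath
from typing import Dict, Mapping

TOOLSET_PREFIX = "gcc-toolset-14-"               # naming rule from the text
TOOLSET_ROOT = "/opt/openEuler/gcc-toolset-14"   # directory from the text
ASSUMED_BIN_SUBDIR = "root/usr/bin"              # ASSUMPTION: layout not stated in the text


def toolset_package_name(original: str) -> str:
    """Map an original GCC package name to its toolset package name."""
    return TOOLSET_PREFIX + original


def env_with_toolset(base_env: Mapping[str, str]) -> Dict[str, str]:
    """Return a copy of base_env whose PATH prefers the toolset binaries.

    This mimics, in a simplified way, what starting a new shell with the
    minor-version environment variables achieves; the caller's environment
    is left untouched, so the major GCC stays the default elsewhere.
    """
    env = dict(base_env)
    bin_dir = posixpath.join(TOOLSET_ROOT, ASSUMED_BIN_SUBDIR)
    old_path = env.get("PATH", "")
    env["PATH"] = bin_dir + (":" + old_path if old_path else "")
    return env


if __name__ == "__main__":
    print(toolset_package_name("gcc"))
    print(env_with_toolset({"PATH": "/usr/bin"})["PATH"])
```

Because the modified environment lives only in the copy, leaving the new shell (or discarding the copy) returns to the major GCC without any cleanup.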

Kernel Innovations

openEuler 24.03 LTS SP1 runs on Linux kernel 6.6 and inherits the competitive advantages of community versions and innovative features released in the openEuler community.

  • Folio-based memory management: Linux memory management is based on folios instead of pages. A folio consists of one or more pages and is declared in struct folio. Folio-based memory management operates on one or more complete pages, rather than on PAGE_SIZE bytes. This alleviates compound page conversion and tail page misoperations, while decreasing the number of least recently used (LRU) linked lists and optimizing memory reclamation. It allocates more contiguous memory on a per-operation basis to reduce the number of page faults and mitigate memory fragmentation. Folio-based management accelerates large I/Os and improves throughput, and large folios consisting of anonymous pages or file pages are available. For AArch64 systems, a contiguous bit (16 contiguous page table entries are cached in a single entry within a translation lookaside buffer, or TLB) is provided to reduce system TLB misses and improve system performance. In openEuler 24.03 LTS SP1, multi-size transparent hugepage (mTHP) allocation by anonymous shmem and mTHP lazyfreeing are available. The memory subsystem supports large folios, with a new sysfs control interface for allocating mTHPs by page cache and a system-level switch for feature toggling.

  • Multipath TCP (MPTCP): MPTCP is introduced to let applications use multiple network paths for parallel data transmission, compared with single-path transmission over TCP. This design improves network hardware resource utilization and intelligently allocates traffic to different transmission paths, thereby relieving network congestion and improving throughput. MPTCP features the following performance highlights:

    • Selects the optimal path after evaluating indicators such as latency and bandwidth.
    • Ensures hitless network switchover and uninterrupted data transmission when switching between networks.
    • Uses multiple channels where data packets are distributed to implement parallel transmission, increasing network bandwidth.

    In a lab environment, the Rsync file transfer tool using MPTCP v1 shows a marked improvement in transfer efficiency. Specifically, a 1.3 GB file is transferred in 14.35 s (down from 114.83 s), and the average transfer speed increases from 11.08 MB/s to 88.25 MB/s. In simulations of path failure caused by unexpected faults during transmission, MPTCP seamlessly switches data to other available channels, ensuring transmission continuity and data integrity.

    In openEuler 24.03 LTS SP1, the MPTCP-related features of Linux mainline kernel 6.9 have been fully backported and optimized.

  • Large folio for ext4 file systems: The IOzone performance can be improved by 80%, and the writeback process of the iomap framework supports batch block mapping. Blocks can be requested in batches in default ext4, optimizing ext4 performance in various benchmarks. For ext4 buffer I/O and page cache writeback operations, the buffer_head framework is replaced with the iomap framework that adds large folio support for ext4. In version 24.03 LTS SP1, the performance of small buffered I/Os (≤ 4 KB) is optimized when the block size is smaller than the folio size, typically seeing a 20% performance increase.

  • xcall and xint: The Linux kernel is becoming increasingly complex, and system calls, especially the simple ones, can be a performance bottleneck. System calls on the AArch64 platform share the same exception entry point, which includes redundant processes such as security checks. Common ways to reduce system call overhead include service offloading and batch processing, but both require service adaptation. xcall provides a solution that does not require service code modification. It streamlines system calls by optimizing their processing logic, trading off some maintenance and security capabilities to reduce overhead.

    To unify interrupt handling, the kernel integrates all interrupt handling processes into its general interrupt handling framework. As the kernel evolves, the general interrupt handling framework has gradually accumulated many security hardening and maintenance features that are not closely related to interrupt handling, increasing latency unpredictability. xint simplifies interrupt handling to reduce the latency and system overhead.

  • CacheFiles failover: In on-demand mode of CacheFiles, if the daemon breaks down or is killed, subsequent read and mount requests return -EIO. The mount points can be used only after the daemon is restarted and the mount operations are performed again. For public cloud services, such I/O errors will be passed to cloud service users, which may impact job execution and endanger the overall system stability. The CacheFiles failover feature renders it unnecessary to remount the mount points upon daemon crashes. It requires only the daemon to restart, ensuring that these events are invisible to users.

  • Programmable scheduling: Interfaces of programmable scheduling are available in the user space as an extension of the Completely Fair Scheduler (CFS) algorithm. Users can customize scheduling policies based on service scenarios to bypass the original CFS. This new feature allows programmable core selection, load balancing, task selection, and task preemption, and supports labelling and kfuncs.

  • SMC-D with loopback-ism: Shared memory communication over DMA (SMC-D) is a kernel network protocol stack that is compatible with socket interfaces and accelerates TCP communication transparently over shared memory. Initially, SMC-D was exclusive to IBM S/390 architecture systems. The introduction of the loopback-ism virtual device enabled broader adoption by simulating Internal Shared Memory (ISM). This innovation transformed SMC-D into a universal kernel mechanism compatible with non-S/390 architectures as well. SMC-D with loopback-ism applies to scenarios where TCP is used for inter-process or inter-container communication in the OS. It accelerates communication by bypassing the kernel TCP/IP stack. Together with smc-tools, users can preload dynamic libraries through LD_PRELOAD to replace the TCP stack without adapting applications. Feedback from the community shows that, compared with the native TCP, SMC-D with loopback-ism can improve the network throughput by more than 40%.

  • IMA RoT framework: The Linux Integrity Measurement Architecture (IMA) subsystem mainly uses the trusted platform module (TPM) as the root of trust (RoT) device to provide integrity proof for the measurement list, and code of the subsystem is tightly coupled with TPM operations. However, workloads like confidential computing require IMA to use new RoT devices, such as virtCCA supported by openEuler. This IMA RoT framework implements an abstraction layer between the IMA subsystem and RoT devices. It simplifies the adaptation of RoT devices to the IMA subsystem and facilitates the configuration and operation of various RoT devices by users and the IMA subsystem.

  • Protection against script viruses: Typical ransomware variants are script files (such as JSP files), which can run through the interpreter and bypass security technologies in the kernel to launch attacks. However, the IMA technology used by the kernel to defend against intrusion mainly targets virus files of the Executable and Linkable Format (ELF). This new feature enables IMA to check script files that are indirectly executed in the system. It uses the execveat() system call with newly added flags to check the execution permission of scripts, and calls the IMA interface during the check to measure script integrity. In test and verification, the script interpreter proactively invokes execveat() with the AT_CHECK flag to check the execute permission (including the IMA check) of scripts, which are executed only after they pass the permission check.

  • Halt polling: This feature enables guest vCPUs to poll at the idle state to avoid sending an inter-processor interrupt (IPI) when performing a wakeup. This optimization reduces the interrupt sending and handling overhead. In addition, the VM does not exit during polling, further reducing the VM entry/exit overhead. This feature also reduces the inter-process communication latency and improves VM performance.

  • CAQM for the kernel TCP/IP stack: Constrained active queue management (CAQM) is an algorithm for network congestion control. It is mainly used on compute nodes in data centers that use TCP to transmit data and network switch nodes on transmission paths. The network switch nodes calculate idle bandwidth and optimal bandwidth allocation, and the compute node protocol stack works with the switches to achieve "zero-queue" congestion control effect and ultra-low transmission latency in high-concurrency scenarios.

    The CAQM algorithm adds congestion control flags to the Ethernet link layer to dynamically adjust the queue length and improve the utilization of network resources. In latency-sensitive general-computing scenarios in data centers, this feature greatly reduces latency and packet loss, improving user experience.

    In typical data center scenarios, this algorithm outperforms the classic Cubic algorithm in two key metrics: (1) 92.4% lower transmission latency, and (2) 99.97% TCP transmission bandwidth utilization with 90% lower switch queue buffer usage.

    The CAQM algorithm requires collaboration between compute node servers and switches. Therefore, intermediate switches must support the CAQM protocol (for protocol header identification, congestion control field adjustment, and so on). This algorithm is controlled through the kernel compilation macro (CONFIG_ETH_CAQM) and is disabled by default. To use this algorithm, users need to enable this macro and recompile the kernel.
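
Because CAQM is gated by a compile-time option, it can be useful to check whether a given kernel configuration enables it before testing. The Python sketch below scans kernel configuration text for CONFIG_ETH_CAQM. The option name comes from the text above; the helper name is hypothetical, and where the configuration file lives on a given system (for example, a file under /boot) is an assumption that varies by installation.

```python
def kconfig_enabled(config_text: str, option: str = "CONFIG_ETH_CAQM") -> bool:
    """Return True if `option` is set to 'y' in kernel configuration text.

    Kernel .config files use lines such as 'CONFIG_FOO=y' for enabled
    options and '# CONFIG_FOO is not set' for disabled ones.
    """
    for raw_line in config_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # comments include the "is not set" form
        key, sep, value = line.partition("=")
        if sep and key == option:
            return value == "y"
    return False


if __name__ == "__main__":
    sample = "CONFIG_NET=y\n# CONFIG_ETH_CAQM is not set\n"
    print(kconfig_enabled(sample))
```

A result of False for the default configuration matches the statement that the algorithm is disabled by default; after enabling the option and recompiling, the same check should report True.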

NestOS

NestOS is a community cloud OS that uses nestos-assembler for quick integration and build. It runs rpm-ostree and Ignition tools over a dual rootfs and atomic update design, and enables easy cluster setup in large-scale containerized environments. Compatible with Kubernetes and OpenStack, NestOS also reduces container overheads.

  • Out-of-the-box availability: integrates popular container engines such as iSulad, Docker, and Podman to provide lightweight and tailored OSs for the cloud.
  • Easy configuration: uses the Ignition utility to install and configure a large number of cluster nodes with a single configuration.
  • Secure management: runs rpm-ostree to manage software packages and works with the openEuler software package source to ensure secure and stable atomic updates.
  • Hitless node updating: uses Zincati to provide automatic node updates and reboot without interrupting services.
  • Dual rootfs: uses a dual rootfs for active/standby switchover to ensure integrity and security while the system is running.

SysCare

SysCare is system-level hotfix software that provides security patches and hot fixing for OSs. It can fix system errors without restarting hosts. SysCare combines kernel-mode and user-mode hot patching to take over system repair, freeing up valuable time for users to focus on core services. Looking ahead, SysCare will introduce live OS update capabilities, further improving O&M efficiency.

  • Hot patch creation: Hot patch RPM packages can be generated by providing the paths of the source RPM package, debuginfo RPM package, and patches to be applied, eliminating the need to modify the software source code.
  • Patch lifecycle management: SysCare simplifies patch lifecycle management, offering a complete and user-friendly solution. With a single command, users can efficiently manage hot patches, saving time and effort. SysCare leverages the RPM system to build hot patches with complete dependencies. This allows for easy integration into the software repository and simplifies distribution, installation, update, and uninstallation of hot patches.
  • Kernel-mode and user-mode hot patch integration: By utilizing the upatch and kpatch technologies, SysCare delivers seamless hot fixing for the entire software stack from applications and dynamic libraries to the kernel, eliminating the need for disruptive downtime.
  • New features: SysCare preserves and restores accepted hot patches after reboot, maintaining their original activation order.

NRI Plugin Support of iSula

Node Resource Interface (NRI) is a public interface for controlling node resources and provides a generic framework for pluggable extensions for CRI-compatible container runtimes. It offers a fundamental mechanism for extensions to track container states and make limited modifications to their configurations. This allows users to insert custom logic into OCI-compatible runtimes, enabling controlled changes to containers or performing additional operations outside the scope of OCI at certain points in the container lifecycle. The newly added support for NRI plugins of iSulad reduces costs for container resource management and maintenance, eliminates scheduling delays, and ensures information consistency in Kubernetes environments.

An NRI plugin establishes a connection with iSulad through the NRI runtime service started within the isula-rust-extension component. This connection enables the plugin to subscribe to both pod and container lifecycle events.

  • For pods, subscribable events include creation, stopping, and removal.
  • For containers, subscribable events include creation, post-creation, starting, post-start, updating, post-update, stopping, and removal.

Upon receiving a CRI request from Kubernetes, iSulad relays this request to all NRI plugins subscribed to the corresponding lifecycle events. The request received by an NRI plugin includes metadata and resource information for the relevant pod or container. The plugin can then adjust the resource configuration of the pod or container as required. Finally, the NRI plugin communicates the updated configuration to iSulad, which in turn relays it to the container runtime, making the updated configuration effective.
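
The subscribe-and-relay flow described above can be sketched in Python. This is a simplified in-process model and not the NRI protocol or the iSulad implementation: the class, the event strings, the resource field name, and the example plugin are all hypothetical.

```python
from typing import Callable, Dict, List

# Container lifecycle events listed above; the string names are hypothetical.
CONTAINER_EVENTS = (
    "create", "post-create", "start", "post-start",
    "update", "post-update", "stop", "remove",
)

# A handler receives container metadata and resources, and returns the
# resource fields it wants to change (an empty dict means "no change").
Handler = Callable[[dict], dict]


class RuntimeSketch:
    """Hypothetical stand-in for the runtime side of the relay."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Handler]] = {e: [] for e in CONTAINER_EVENTS}

    def subscribe(self, event: str, handler: Handler) -> None:
        if event not in self._subscribers:
            raise ValueError(f"unknown event: {event}")
        self._subscribers[event].append(handler)

    def relay(self, event: str, container: dict) -> dict:
        """Relay one request to every subscribed plugin and merge adjustments."""
        resources = dict(container.get("resources", {}))
        for handler in self._subscribers[event]:
            view = {**container, "resources": dict(resources)}
            resources.update(handler(view))
        return {**container, "resources": resources}


def cap_memory(container: dict) -> dict:
    """Example plugin: cap the memory limit at 512 MiB (arbitrary value)."""
    cap = 512 * 1024 * 1024
    limit = container["resources"].get("memory_limit_bytes", 0)
    return {"memory_limit_bytes": cap} if limit == 0 or limit > cap else {}
```

Only plugins subscribed to the event in question see the request, and the merged result is what would be handed on to the container runtime in the flow described above.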

oeAware with Enhanced Information Collection and Tuning Plugins

oeAware is a framework that provides low-load collection, sensing, and tuning upon detecting defined system behaviors on openEuler. The framework divides the tuning process into three layers: collection, sensing, and tuning. The layers are linked through subscription and developed as plugins, overcoming the limitations of traditional tuning techniques that run independently and are statically enabled or disabled.

Every oeAware plugin is a dynamic library that uses oeAware interfaces. Each plugin comprises multiple instances, each of which contains several topics and delivers collection or sensing results to other plugins or external applications for tuning and analysis purposes. The framework consists of the following components:

  • The SDK enables subscription to plugin topics, with a callback function handling data from oeAware. This allows external applications to create tailored functionalities, such as cross-cluster information collection or local node analysis.
  • The performance monitoring unit (PMU) information collection plugin gathers performance records from the system PMU.
  • The Docker information collection plugin retrieves specific parameter details about the Docker environment.
  • The system information collection plugin captures kernel parameters, thread details, and resource information (CPU, memory, I/O, network) from the current environment.
  • The thread sensing plugin monitors key information about threads.
  • The evaluation plugin examines system NUMA and network information during service operations, suggesting optimal tuning methods.
  • The system tuning plugins comprise stealtask for enhanced CPU tuning, smc_tune which leverages shared memory communication in the kernel space to boost network throughput and reduce latency, and xcall_tune which bypasses non-essential code paths to minimize system call processing overhead.
  • The Docker tuning plugin addresses CPU performance issues during sudden load spikes by utilizing the CPU burst feature.
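
The way the three layers are linked through topic subscription and callbacks can be illustrated with a small Python model. It is a conceptual sketch, not the oeAware SDK: the TopicBus class, the topic names cpu_usage and load_spike, and the threshold are hypothetical.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, List

Callback = Callable[[Any], None]


class TopicBus:
    """Minimal publish/subscribe hub standing in for the framework."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Callback]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callback) -> None:
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, data: Any) -> None:
        for callback in list(self._subscribers[topic]):
            callback(data)


def build_pipeline(bus: TopicBus, threshold: float, actions: List[str]) -> None:
    """Wire a sensing step and a tuning step onto the bus."""

    def sense(cpu_usage: float) -> None:
        # Sensing layer: turn raw samples into an event only when needed.
        if cpu_usage > threshold:
            bus.publish("load_spike", cpu_usage)

    def tune(cpu_usage: float) -> None:
        # Tuning layer: record the action that a tuning plugin would take.
        actions.append(f"tune for cpu_usage={cpu_usage:.2f}")

    bus.subscribe("cpu_usage", sense)
    bus.subscribe("load_spike", tune)


if __name__ == "__main__":
    bus, actions = TopicBus(), []
    build_pipeline(bus, threshold=0.9, actions=actions)
    for sample in (0.50, 0.95):  # the collection layer publishes samples
        bus.publish("cpu_usage", sample)
    print(actions)
```

Because the tuning step runs only when the sensing step publishes an event, tuning is enabled dynamically by observed behavior rather than being statically switched on or off.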

KubeOS

KubeOS is a lightweight and secure OS designed for cloud-native environments. It simplifies O&M by providing unified tools for Kubernetes-based systems. KubeOS is specifically designed for running containers. It features a read-only root directory, includes only the essential components for the container runtime, and utilizes dm-verity security hardening. This minimizes vulnerabilities and attack surfaces while improving resource utilization and boot speed. KubeOS can leverage native Kubernetes declarative APIs to unify the upgrade, configuration, and maintenance of worker node OSs within a cluster. This approach simplifies cloud-native operations, addresses challenges associated with OS version fragmentation across cluster nodes, and provides a unified solution for OS management.

KubeOS introduces enhanced configuration capabilities, image customization features, and root file system (rootfs) integrity protection using dm-verity, as shown in the figure. Through a centralized management platform, KubeOS enables unified configuration of cluster parameters in the limits.conf file and cluster components like containerd and kubelet.

KubeOS provides comprehensive system image customization options, allowing users to configure systemd services, GRUB passwords, system drive partitions, users and user groups, files, scripts, and persist partition directories.

For static integrity protection, KubeOS activates dm-verity during VM image creation to ensure rootfs integrity. It also supports upgrades and configuration when dm-verity is enabled.

IMA

The IMA is an open source Linux technology widely used for file integrity protection in real-world applications. It performs integrity checks on system programs to prevent tampering and ensure only authenticated (signed or HMAC-verified) files are executed via an allowlist.

Applications running on Linux can be categorized into two types:

  • Binary executables: Programs in the ELF format can be directly executed by exec or mmap system calls. Through hook functions in exec and mmap system calls, the IMA triggers measurement or verification processes, ensuring integrity protection.
  • Interpreter-based applications: Programs indirectly executed via interpreters, such as scripts run by Bash, Python, and Lua interpreters and Java programs executed by the JVM.

The current IMA fails to protect interpreter-based applications, as these applications are typically loaded and parsed by interpreters through read system calls. The IMA cannot differentiate them from other mutable files, such as configuration or temporary files. As a result, enabling the IMA for read system calls inadvertently includes these mutable files in the protection scope. Mutable files lack pre-generated measurement baselines or verification credentials, causing integrity check failures.

To address this limitation, openEuler enhances the IMA feature to significantly improve integrity protection for interpreter-based applications.
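
The limitation described above can be made concrete with a toy appraisal check. This sketch only models the problem (a mutable file has no baseline, so a read-time check rejects it); it does not show how openEuler's enhancement works, which the text does not specify. The function names and file paths are hypothetical.

```python
import hashlib


def digest(content: bytes) -> str:
    return hashlib.sha256(content).hexdigest()


def appraise(path: str, content: bytes, allowlist: dict[str, str]) -> bool:
    """Allow a file only if its digest matches the baseline recorded for its path."""
    return allowlist.get(path) == digest(content)


script = b"print('hello')\n"
allowlist = {"/usr/bin/tool.py": digest(script)}

script_ok = appraise("/usr/bin/tool.py", script, allowlist)   # True: baseline exists and matches
config_ok = appraise("/etc/tool.conf", b"x=1\n", allowlist)   # False: mutable file has no baseline
```

If every read were appraised this way, the configuration file would fail the check even though it was never meant to be protected.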

Heterogeneous RoT Support

Common attack methods often target the authenticity and integrity of information systems. Hardware RoTs have become a standard method for protecting critical system components, enabling the system to measure and verify integrity. When tampering or counterfeiting is detected, the system triggers alerts or blocks the activity.

The prevailing protection approach uses the trusted platform module (TPM) as the RoT, combined with the integrity measurement software stack to establish a system trust chain that ensures system authenticity and integrity. openEuler supports integrity measurement features such as measured boot, IMA measurement for files, and dynamic integrity measurement (DIM) for memory.

The RoT framework used by openEuler 24.03 LTS SP1 offers a unified measurement interface to the upper-layer integrity protection software stack. Deployed within the kernel integrity subsystem, the framework supports multiple RoT drivers and expands integrity measurement beyond the TPM to include heterogeneous RoTs.
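
A rough way to picture a unified measurement interface is a dispatcher that forwards each measurement to every registered RoT, where each RoT extends its register as H(old value || measurement), in the style of a TPM PCR extend. The classes below are hypothetical illustrations, not the kernel integrity subsystem's actual interface, and the fan-out to all registered RoTs is an assumption.

```python
import hashlib


class SoftwareRoT:
    """Toy RoT holding one measurement register, extended TPM-style."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.register = bytes(32)  # starts as all zeros

    def extend(self, measurement: bytes) -> None:
        # New value = H(old value || measurement), so the order of events matters.
        self.register = hashlib.sha256(self.register + measurement).digest()


class MeasurementFramework:
    """Hypothetical unified interface that forwards each measurement to every registered RoT."""

    def __init__(self) -> None:
        self.rots: list[SoftwareRoT] = []

    def register(self, rot: SoftwareRoT) -> None:
        self.rots.append(rot)

    def measure(self, data: bytes) -> None:
        digest = hashlib.sha256(data).digest()
        for rot in self.rots:
            rot.extend(digest)
```

Because each extend folds in the previous register value, replaying the same measurements in a different order yields a different final value.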

Enhanced secGear

The unified remote attestation framework of secGear addresses the key components related to remote attestation in confidential computing, abstracting away the differences between different Trusted Execution Environments (TEEs). It provides two components: attestation agent and attestation service. The agent is integrated by users to obtain attestation reports and connect to the attestation service. The service can be deployed independently and supports the verification of iTrustee and virtCCA remote attestation reports.

The unified remote attestation framework focuses on confidential computing functionalities, while service deployment and operation capabilities are provided by third-party deployment services. The key features of the unified remote attestation framework are as follows:

  • Report verification plugin framework: Supports runtime compatibility with attestation report verification for different TEE platforms, such as iTrustee, virtCCA, and CCA. It also supports the extension of new TEE report verification plugins.
  • Certificate baseline management: Supports the management of baseline values of Trusted Computing Bases (TCB) and Trusted Applications (TA) as well as public key certificates for different TEE types. Centralized deployment on the server ensures transparency for users.
  • Policy management: Provides default policies for ease of use and customizable policies for flexibility.
  • Identity token: Issues identity tokens for different TEEs, endorsed by a third party for mutual authentication between different TEE types.
  • Attestation agent: Supports connection to attestation services/peer-to-peer attestation, compatible with TEE report retrieval and identity token verification. It is easy to integrate, allowing users to focus on their service logic.

Two modes are supported depending on the usage scenario: peer-to-peer verification and attestation service verification.

Attestation service verification process:

  1. The user (regular node or TEE) initiates a challenge to the TEE platform.
  2. The TEE platform obtains the TEE attestation report through the attestation agent and returns it to the user.
  3. The user-side attestation agent forwards the report to the remote attestation service.
  4. The remote attestation service verifies the report and returns an identity token in a unified format endorsed by a third party.
  5. The attestation agent verifies the identity token and parses the attestation report verification result.

Peer-to-peer verification process (without the attestation service):

  1. The user initiates a challenge to the TEE platform, which then returns the attestation report to the user.
  2. The user uses a local peer-to-peer TEE verification plugin to verify the report.
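
The attestation service verification process above can be sketched as a challenge-response exchange. This is a toy model under loud assumptions: HMAC with shared demo keys stands in for the hardware-backed report signature and for the service's token signature, and every function name and field is hypothetical rather than part of the secGear API.

```python
import hashlib
import hmac
import json
import secrets

TEE_KEY = b"demo-tee-key"          # stands in for the TEE's hardware-backed report key
SERVICE_KEY = b"demo-service-key"  # stands in for the attestation service's signing key


def _mac(key: bytes, payload: bytes) -> str:
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def tee_get_report(nonce: str, measurement: str) -> dict:
    """Step 2: the TEE side produces a report bound to the challenger's nonce."""
    body = json.dumps({"nonce": nonce, "measurement": measurement}, sort_keys=True)
    return {"body": body, "mac": _mac(TEE_KEY, body.encode())}


def service_verify(report: dict, baseline: str) -> dict:
    """Step 4: the service checks the report and issues an identity token."""
    authentic = hmac.compare_digest(report["mac"], _mac(TEE_KEY, report["body"].encode()))
    claims = json.loads(report["body"])
    result = authentic and claims["measurement"] == baseline
    token_body = json.dumps({"nonce": claims["nonce"], "result": result}, sort_keys=True)
    return {"body": token_body, "mac": _mac(SERVICE_KEY, token_body.encode())}


def agent_check_token(token: dict, nonce: str) -> bool:
    """Step 5: the agent verifies the token and reads the verification result."""
    if not hmac.compare_digest(token["mac"], _mac(SERVICE_KEY, token["body"].encode())):
        return False
    claims = json.loads(token["body"])
    return claims["nonce"] == nonce and claims["result"] is True


nonce = secrets.token_hex(16)                       # step 1: fresh challenge
report = tee_get_report(nonce, "abc123")            # step 2
token = service_verify(report, baseline="abc123")   # steps 3 and 4
trusted = agent_check_token(token, nonce)           # step 5
```

Binding the nonce into both the report and the token is what prevents an old report or token from being replayed against a new challenge.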

gala-anteater for Minute-Level Container Interference Detection

gala-anteater is an AI-powered exception detection platform for gray faults in the OS. Integrating various exception detection algorithms, it achieves system-level fault detection and reporting through automated model pre-training, online incremental learning, and model updates.

In high-density online container deployment scenarios, the presence of disorderly resource contention can lead to inter-container interference. gala-anteater enables minute-level identification of interference sources (CPU or I/O), aiding O&M personnel in swiftly tracing and resolving issues to ensure service QoS.

gala-anteater leverages a combination of offline and online learning techniques to facilitate offline model training and online updates, ultimately enabling real-time online exception detection.

  • Offline: Initially, historical KPI datasets undergo preprocessing and feature selection to generate a training set. This set is then used to train and optimize an unsupervised neural network model (such as a variational autoencoder). Finally, a manually labeled test set aids in selecting the optimal model.

  • Online: The trained model is deployed online, where it undergoes further training and parameter tuning using real-time data. This continuously refined model then performs real-time exception detection within the online environment.

A-Ops Integration with authHub for Unified User Authentication

authHub is a unified user authentication platform built on the OAuth 2.0 protocol. A-Ops can now utilize authHub to manage application registration, enabling seamless user authentication across multiple platforms.

Core functionalities of authHub include:

  • Application management: Deployed applications can be registered with and configured on authHub to provide access to their features.

  • User authentication: Applications managed by authHub support single sign-on (SSO) and single sign-out processes.

Minute-level Demarcation and Location of Microservice Performance Problems (TCP, I/Os, and Scheduling)

gala-gopher, gala-spider, and gala-anteater implement a topological root cause analysis approach to facilitate fault detection and root cause location in large-scale clusters. In openEuler 24.03 LTS SP1, fine-grained capabilities have been introduced for cloud-native environments based on Layer 7 protocols like HTTPS, PostgreSQL, and MySQL, enabling O&M teams to quickly locate the source of faults, thus enhancing system stability and reliability. The following features are available:

  • Metric collection: gala-gopher uses eBPF to collect and report network and I/O metrics.
  • Cluster topology: gala-spider receives data reported by gala-gopher and constructs a container- and process-level call relationship topology.
  • Fault detection: gala-anteater classifies the reported metrics based on the fault detection model to determine whether an exception has occurred in the system.
  • Root cause location: gala-anteater locates the root cause node of the exception based on node exception and topology information.
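
As a rough illustration of combining node exceptions with topology, the sketch below treats an abnormal node as a root cause candidate when none of the nodes it calls is abnormal. This simple rule is an assumption made for illustration and is not gala-anteater's actual algorithm; the service names are hypothetical.

```python
def locate_root_causes(topology: dict[str, list[str]], abnormal: set[str]) -> list[str]:
    """Return abnormal nodes that have no abnormal callee (direct dependency)."""
    return sorted(
        node for node in abnormal
        if not any(callee in abnormal for callee in topology.get(node, []))
    )


# Caller -> callees, as a call relationship topology might be represented.
topology = {
    "gateway": ["order"],
    "order": ["mysql", "cache"],
    "mysql": [],
    "cache": [],
}
candidates = locate_root_causes(topology, {"gateway", "order", "mysql"})  # ["mysql"]
```

Here the exceptions on gateway and order are explained by their abnormal downstream dependency, so only mysql is reported.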

utsudo

sudo is one of the commonly used utilities for Unix-like and Linux OSs. It enables users to run specific commands with the privileges of the super user. utsudo is developed to address issues of security and reliability common in sudo.

utsudo uses Rust to reconstruct sudo to deliver more efficient, secure, and flexible privilege escalation. It includes modules such as the common utility, overall framework, and function plugins.

Basic features

  • Access control: Limits the commands that can be executed by users, and specifies the required authentication method.
  • Audit log: Records and traces all commands and tasks executed by each user.
  • Temporary privilege escalation: Allows common users to temporarily escalate to a super user for executing privileged commands or tasks.
  • Flexible configuration: Allows users to set arguments such as command aliases, environment variables, and execution parameters to meet system requirements.

Enhanced features

utsudo 0.0.2 comes with openEuler 24.03 LTS SP1 to provide the following features:

  • Privilege escalation: Escalates the privilege of a process run by a common user to the root privilege.
  • Plugin loading: Parses plugin configuration files and dynamically loads plugin libraries.

utshell

utshell is a new shell that introduces new features and inherits the usability of Bash. It enables interaction through command lines, such as responding to user operations to execute commands and providing feedback, and can execute automated scripts to facilitate O&M.

Basic features

  • Command execution: Runs commands on the user's machine and returns their results.
  • Batch processing: Automates task execution using scripts.
  • Job control: Executes, manages, and controls multiple user commands as background jobs concurrently.
  • Historical records: Records the commands entered by users.
  • Command aliases: Allows users to create aliases for commands to customize their operations.

Enhanced features

utshell 0.5 runs on openEuler 24.03 LTS SP1 to perform the following operations:

  • Parses shell scripts.
  • Runs third-party commands.

GCC for openEuler

The baseline version of GCC for openEuler has been upgraded from open source GCC 10.3 to GCC 12.3, supporting features such as automatic feedback-directed optimization (AutoFDO), software and hardware collaboration, memory optimization, Scalable Vector Extension (SVE), and vectorized math libraries.

  • The default language of GCC for openEuler has been upgraded from C14/C++14 to C17/C++17, enabling GCC for openEuler to support more hardware features like Armv9-A and x86 AVX512-FP16.
  • GCC for openEuler supports structure optimization and instruction selection optimization, leveraging Arm hardware features to improve system running efficiency. In the benchmark tests such as SPEC CPU 2017, GCC for openEuler has proven to deliver higher performance than GCC 10.3 of the upstream community.
  • It also uses AutoFDO to improve the performance of MySQL databases at the application layer.

Feature Description

  • SVE vectorization: Significantly improves program running performance for Arm-based machines that support SVE instructions.
  • Memory layout: Rearranges the structure members so that frequently accessed members are placed in continuous memory locations, boosting the cache hit ratio and enhancing program performance.
  • SLP transpose optimization: Improves the analysis of loops with consecutive memory reads during loop splitting, and adds analysis to transpose grouped stores in the superword level parallelism (SLP) vectorization stage.
  • Redundant member elimination: Eliminates structure members that are never read and deletes redundant write statements, which in turn reduces the memory footprint of the structure and alleviates subsequent bandwidth pressure, while improving performance.
  • Array comparison: Implements parallel comparison of array elements to improve execution efficiency.
  • Arm instruction optimization: Simplifies the pipeline of ccmp instructions for a wide range of deployments.
  • IF statement optimization: Splits and optimizes the IF statement block to improve constant propagation within a program.
  • SLP vectorization: Enhances SLP to cover more vectorization scenarios and improve performance.
  • AutoFDO: Uses perf to collect and parse program information and optimizes feedback across the compilation and binary phases, boosting mainstream applications such as MySQL databases.

Gazelle

Gazelle is a high-performance user-mode protocol stack. It directly reads and writes NIC packets in user mode based on the Data Plane Development Kit (DPDK), transmits the packets through shared hugepage memory, and uses the LwIP protocol stack, thereby greatly improving the network I/O throughput of applications and accelerating the network for databases. With Gazelle, high performance and universality can be achieved at the same time. In openEuler 24.03 LTS SP1, Gazelle introduces eXpress Data Path (XDP) deployment mode for container scenarios and TPC-C support for the openGauss database, further enhancing the user-mode protocol stack.

  • High performance (ultra-lightweight): High-performance lightweight protocol stack capabilities are implemented based on DPDK and LwIP.
  • Ultimate performance: A highly linearizable concurrent protocol stack is implemented based on technologies such as regional hugepage splitting, dynamic core binding, and full-path zero-copy.
  • Hardware acceleration: TCP Segmentation Offload (TSO), checksum (CSUM) offload, Generic Receive Offload (GRO), and other offload technologies streamline the vertical acceleration of hardware and software.
  • Universality (POSIX compatibility): Full compatibility with POSIX APIs eliminates the need to modify applications. The recvfrom and sendto interfaces of UDP are supported.
  • General networking model: Adaptive scheduling of the networking model is implemented based on mechanisms such as fd router and wake-up proxy. The UDP multi-node multicast model meets the requirements of any network application scenario.
  • Usability (plug-and-play): LD_PRELOAD enables zero-cost deployment by removing the requirement for service adaptation.
  • Easy O&M (O&M tool): Complete O&M methods, such as traffic statistics, metric logs, and CLI commands, are provided.

New Features

  • Gazelle now supports XDP deployment on L2-mode NICs using IPVLAN.
  • Interrupt mode is introduced, which reduces LStack CPU usage in no-traffic or low-traffic scenarios.
  • Networking in ping-pong mode is optimized to improve packet transmission during ping-pong operations.
  • Support for single-node and single-active/standby TPC-C testing is added for the openGauss database.

virtCCA

virtCCA-based confidential VMs, built on the Secure EL2 (S-EL2) of Kunpeng 920, allow regular VM software stacks to run securely within TEEs.

Based on the standard interface of the Arm Confidential Compute Architecture (CCA), openEuler builds a TEE virtualization management module upon the TrustZone firmware. This module supports memory isolation between confidential VMs, context and lifecycle management, and page table management, and enables seamless application migration to TEEs.

  • Device passthrough

Device passthrough uses the PCIe protection controller (PCIPC) embedded in the PCIe root complex of Kunpeng 920. A selector is added to the PCIe bus to regulate communication between the processor and peripherals. Operating through the system memory management unit (SMMU), this selector controls both inbound and outbound traffic, safeguarding the entire data link.

Device passthrough boasts excellent security isolation and performance enhancements for PCIe devices:

  • Security isolation

    The TEE manages device access permissions, preventing host software from accessing TEE devices.

  • High performance

    Confidential device passthrough avoids the data-plane performance loss incurred by traditional encryption and decryption solutions.

  • Ease of use

    Compatibility with existing open source OSs ensures that the kernel driver does not need to be modified.

  • ShangMi (SM) algorithm acceleration

    Hardware-based acceleration for SM algorithms runs on the UADK user-mode accelerator framework to enhance SM algorithm performance and enable algorithm offloading within confidential VMs. It is powered by the Kunpeng processor and utilizes the Kunpeng Accelerator Engine (KAE) features in the TEE.

Hygon CSV3

CSV3 marks the third generation of Hygon's secure virtualization technology. With security features that far exceed its predecessors, CSV3 realizes multi-level data protection that implements VM data isolation within the CPU. This isolation prevents the host OS from accessing VM memory or modifying nested page tables, thereby ensuring strong data integrity and confidentiality.

Secure memory isolation unit

The secure memory isolation unit forms the hardware basis for VM data integrity. A specialized hardware component within the CPU, the unit is positioned on the system bus path between the CPU and the memory controller to retrieve secure memory information for CSV3 VMs, such as physical memory addresses, VM IDs, and associated permissions. It validates all memory access requests from the CPU before granting permission, ensuring only verified requests receive access clearance.

When a CSV3 VM reads from or writes to memory, the page table translation unit converts the guest physical address (GPA) into a host physical address (HPA), and sends a memory access request (including read/write commands), the HPA, and the requester VM ID to the address bus.

If the secure memory isolation unit detects an incorrect VM ID during memory reads, it returns data in a fixed pattern. During memory writes, such an invalid request is discarded.
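
The read and write behavior described above can be modeled in a few lines. This is a toy software model of a hardware mechanism, covering only pages that belong to CSV3 VMs; the class name, method names, and the concrete fixed pattern are hypothetical (the text only states that "a fixed pattern" is returned).

```python
FIXED_PATTERN = b"\xff" * 8  # illustrative value only


class SecureMemoryIsolationUnit:
    def __init__(self) -> None:
        self.owner: dict[int, int] = {}     # HPA page -> owning VM ID
        self.memory: dict[int, bytes] = {}  # HPA page -> contents

    def assign(self, hpa: int, vm_id: int) -> None:
        self.owner[hpa] = vm_id

    def write(self, hpa: int, vm_id: int, data: bytes) -> bool:
        if self.owner.get(hpa) != vm_id:
            return False  # invalid write request is discarded
        self.memory[hpa] = data
        return True

    def read(self, hpa: int, vm_id: int) -> bytes:
        if self.owner.get(hpa) != vm_id:
            return FIXED_PATTERN  # wrong VM ID: real contents are never exposed
        return self.memory.get(hpa, bytes(8))
```

A request carrying the wrong VM ID therefore neither learns nor changes the contents of the page.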

Secure processor

The secure memory isolation unit is the central hardware component in CSV3, safeguarding the integrity of VM memory. Its configuration must remain entirely secure and immune to modification by the host OS.

Hygon's secure processor is built into the SoC and independent from the CPU. It acts as the RoT of the SoC. Upon power-on, the processor verifies firmware integrity using an embedded signature verification key before the firmware is loaded and executed. Equipped with dedicated hardware resources within an isolated environment, the secure processor represents the highest level of security within the SoC, governing the overall security of the SoC. It maintains exclusive control over the secure memory isolation unit throughout the VM lifecycle, thus preventing the host OS from accessing or modifying unit configurations.

During VM startup, the host OS sends a request to the secure processor, which initializes the secure memory isolation unit. The secure processor firmware updates the secure memory isolation unit, and when the VM terminates, the secure processor clears all configurations of the unit. The secure processor validates all configuration requests from the host and rejects any that are invalid.

Protection of the virtual machine control block (VMCB)

The VMCB contains crucial control data, including the VM ID, VM page table base address, and VM register page base address. If the host OS could modify this sensitive data, it could tamper with VM memory data.

The secure processor creates the VMCB and places it under the protection scope of the secure memory isolation unit, which prevents the host OS from altering the contents of the VMCB.

To improve compatibility with the host OS software, CSV3 uses real and shadow VMCB pages. The host OS constructs the shadow VMCB page, populates it with control data, and sends it to the secure processor. The secure processor then generates the real VMCB page, copying non-critical control data from the shadow VMCB page and adding essential control information independently. The VM launches and operates using the real VMCB page, thus blocking any host OS attempts to tamper with the VMCB.

openHiTLS

openHiTLS provides a lightweight, customizable cryptographic solution designed for diverse industries including cloud computing, big data, AI, and finance. The platform combines advanced algorithms with exceptional performance while maintaining robust security and reliability. Its flexible architecture ensures seamless compatibility across multiple programming languages. Through community collaboration and ecosystem development, openHiTLS accelerates the adoption of cryptographic security standards across various industries, while fostering a security-focused open source ecosystem centered around openEuler in order to provide users with a safer and more reliable digital environment.

Support for mainstream cryptographic protocols and algorithms

Mainstream international and Chinese cryptographic algorithms and protocols are supported. You can select appropriate cryptographic algorithms and protocols based on scenario requirements.

  • Chinese cryptographic algorithms: SM2, SM3, and SM4
  • International mainstream algorithms: AES, RSA, (EC)DSA, (EC)DH, SHA-3, and HMAC
  • GB/T 38636-2020 TLCP, that is, the dual-certificate protocol
  • TLS 1.2, TLS 1.3, and DTLS 1.2

Open architecture for cryptography applications in all scenarios

By leveraging an open architecture and implementing technological innovations in applications across the entire industry chain, a one-stop, full-scenario solution is provided for diverse industries.

  • Flexible southbound and northbound interfaces: Unified northbound interfaces enable quick access for industry applications. Southbound devices widely deployed in various service systems are abstracted to streamline device utilization.
  • Multi-language compatibility: The foreign function interfaces (FFIs) ensure multi-language compatibility, enabling one cipher suite to be used across various programming languages.
  • Wide applicability: Cryptographic technology applied across various scenarios in the entire industry chain ensures high performance, security, and reliability of openHiTLS in different scenarios.

Hierarchical decoupling and on-demand tailoring to create a lightweight cipher suite

Hierarchical decoupling enables cryptographic algorithm software to achieve ultimate cost efficiency.

  • Hierarchical decoupling: TLS, certificates, and algorithm functions are decoupled for on-demand combination, with layered optimization for algorithm abstraction, scheduling, and algorithm primitives.
  • Advanced abstract interfaces: These interfaces prevent external interface changes caused by algorithm tailoring while reducing software footprint.
  • Ultimate cost efficiency: Automatic management of feature dependencies and on-demand tailoring to trim down to the minimal implementation of PBKDF2 + AES (20 KB binary file size, 1 KB of heap memory, and 256 bytes of stack memory) are supported.

Agile architecture for cryptographic algorithms, addressing post-quantum migration

An innovative agile architecture for cryptographic algorithms enables rapid application migration and fast evolution of advanced algorithms.

  • Unified northbound interfaces: Standardized and extensible algorithm interfaces remain unchanged when algorithms are switched, so upper-layer applications do not need extensive adaptation to new interfaces.
  • Plugin-based management of algorithms: The algorithm provider layer supports dynamic algorithm runtime loading.
  • Configurable algorithms: Algorithm information can be obtained from configuration files, avoiding hard coding of algorithm identifiers.
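
The agility pattern in the list above (a stable interface, pluggable providers, and algorithm names taken from configuration) can be sketched generically. The example uses Python's standard `hashlib` rather than openHiTLS itself; the registry, the function names, and the JSON configuration format are all hypothetical.

```python
import hashlib
import json
from typing import Callable

# Hypothetical provider registry: algorithm name -> implementation.
PROVIDERS: dict[str, Callable[[bytes], bytes]] = {}


def register_provider(name: str, impl: Callable[[bytes], bytes]) -> None:
    """Plug in an algorithm at run time without touching callers."""
    PROVIDERS[name] = impl


def digest_from_config(config_text: str, data: bytes) -> bytes:
    """Pick the digest algorithm named in the configuration instead of hard-coding it."""
    name = json.loads(config_text)["digest"]
    return PROVIDERS[name](data)


register_provider("sha256", lambda d: hashlib.sha256(d).digest())
register_provider("sha3_256", lambda d: hashlib.sha3_256(d).digest())

value = digest_from_config('{"digest": "sha3_256"}', b"abc")
```

Switching to a different algorithm then means registering a provider and editing the configuration, with no change to the calling code.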

Fail-slow Detection for Rapid Identification of Slow Nodes in AI Cluster Training

During the training process of AI clusters, performance degradation is inevitable, with numerous and complex causes. Existing solutions rely on log analysis after performance degradation occurs. However, it can take 3 to 4 days from log collection to root cause diagnosis and issue resolution on the live network. To address these pain points, an online slow node detection solution is offered. This solution allows for real-time monitoring of key system metrics and uses model- and data-driven algorithms to analyze the observed data and pinpoint the location of slow or degraded nodes. This facilitates system self-healing and fault rectification by O&M personnel.

Grouped metric comparison helps detect slow nodes/cards in AI cluster training scenarios. This technology is implemented through gala-anteater and includes new components such as a configuration file, an algorithm library, and slow node comparison based on both time and space. The output includes the exception occurrence time, abnormal metrics, and IP addresses of slow nodes/cards. The technology improves system stability and reliability. The following features are provided:

  • Configuration file: Contains the types of metrics to be observed, configuration parameters for the metric algorithms, and data interfaces, which are used to initialize the slow node detection algorithms.
  • Algorithm library: Includes common time series exception detection algorithms, such as Streaming Peaks-over-Threshold (SPOT), k-sigma, abnormal node clustering, and similarity measurement.
  • Data: Includes metric data, job topology data, and communicator data. Metric data indicates the time series of metrics, job topology data indicates the node information used in training jobs, and communicator data indicates the node connection relationships (including data parallelism, tensor parallelism, and pipeline parallelism).
  • Grouped metric comparison: Supports spatial filtering of abnormal nodes and temporal exception filtering of a single node. Spatial filtering identifies abnormal nodes based on the exception clustering algorithm, while temporal exception filtering determines whether a node is abnormal based on the historical data of the node.
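
Grouped metric comparison can be sketched as two filters applied in sequence. The temporal filter below is a plain k-sigma test against a node's own history. The spatial filter uses deviation from the group median as a simple stand-in for the exception clustering algorithm named above, which is an assumption made for brevity. Function names, thresholds, and sample data are hypothetical.

```python
import statistics


def temporal_outlier(history: list[float], current: float, k: float = 3.0) -> bool:
    """k-sigma test: is the current value more than k standard deviations from the node's history?"""
    mean = statistics.fmean(history)
    sigma = statistics.pstdev(history)
    return abs(current - mean) > k * sigma


def spatial_outliers(group: dict[str, float], tolerance: float = 0.2) -> list[str]:
    """Flag nodes whose metric deviates from the group median by more than a relative tolerance."""
    median = statistics.median(group.values())
    return sorted(n for n, v in group.items() if abs(v - median) > tolerance * abs(median))


def slow_nodes(histories: dict[str, list[float]], current: dict[str, float]) -> list[str]:
    """Keep only nodes that are abnormal both across the group and against their own past."""
    return [n for n in spatial_outliers(current) if temporal_outlier(histories[n], current[n])]


# Per-step time in seconds for three nodes of one training job (made-up numbers).
histories = {
    "n1": [1.0, 1.1, 0.9, 1.0],
    "n2": [1.0, 1.0, 1.1, 0.9],
    "n3": [1.0, 0.9, 1.1, 1.0],
}
current = {"n1": 1.0, "n2": 1.05, "n3": 2.0}
detected = slow_nodes(histories, current)  # ["n3"]
```

Requiring both filters to agree reduces false alarms from nodes that are always slightly different from their peers or that fluctuate together with the whole group.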

Rubik with Enhanced Collaboration for Hybrid Deployments

Cloud data centers face a widespread challenge of low resource utilization, typically below 20%. Improving this utilization has become a critical technical priority. Hybrid deployment—deploying workloads with different priorities together—offers an effective solution for increasing resource utilization. While hybrid deployment substantially improves cluster efficiency, it introduces resource contention that can impact the quality of service (QoS) of critical workloads. The key challenge lies in maintaining workload QoS while achieving higher resource utilization.

Rubik, openEuler's container hybrid deployment engine, implements an adaptive system for single-node computing optimization and QoS assurance. The engine maximizes node resource utilization without compromising the performance of critical workloads.

  • Cache and memory bandwidth control: Limits the LLC and memory bandwidth of low-priority VMs. Currently, only static allocation is supported.
  • CPU interference control: Supports CPU time slice preemption in microseconds, simultaneous multithreading (SMT) interference isolation, and anti-priority-inversion.
  • Memory resource preemption: Terminates offline services first during node out-of-memory (OOM) events to protect online service quality.
  • Memcg asynchronous memory reclamation: Limits the total memory used by hybrid offline applications, and dynamically compresses the memory used by offline services when the online memory utilization increases.
  • quotaBurst traffic control: When the CPU traffic of key online services is limited, the limit can be exceeded for a short period of time to ensure the quality of online services.
  • Enhanced observation of PSI: Collects pressure information at the cgroup v1 level, identifies and quantifies service interruption risks caused by resource contention, and improves hardware resource utilization.
  • iocost service I/O weight control: Manages drive I/O rates for offline services to prevent bandwidth contention with online services.
  • Cycles per instruction (CPI) monitoring: Tracks node pressure via CPI metrics to guide offline service eviction decisions.

New features:

  • Node CPU/memory management: Monitors resource levels and evicts offline services.

Enhanced CFGO

The continuous growth in code volume has made front-end bound execution a common issue in processors, which impacts program performance. Feedback-directed optimization techniques in compilers can effectively solve this issue.

Continuous Feature Guided Optimization (CFGO) in GCC for openEuler and BiSheng Compiler refers to continuous feedback-directed optimization for multimodal files (source code and binaries) and the full lifecycle (compilation, linking, post-linking, runtime, OS, and libraries). The following techniques are included:

  • Code layout optimization: Techniques such as basic block reordering, function rearrangement, and hot/cold separation are used to optimize the binary layout of the target program, improving I-cache and I-TLB hit rates.
  • Advanced compiler optimization: Techniques such as inlining, loop unrolling, vectorization, and indirect call promotion enable the compiler to make more accurate optimization decisions.

CFGO comprises CFGO-PGO, CFGO-CSPGO, and CFGO-BOLT. Enabling these sub-features in sequence helps mitigate front-end bound execution and improve program runtime performance. To further enhance the optimization, you are advised to add the -flto=auto compilation option during CFGO-PGO and CFGO-CSPGO processes.

  • CFGO-PGO

    Unlike conventional profile-guided optimization (PGO), CFGO-PGO uses AI for Compiler (AI4C) to enhance certain optimizations, including inlining, constant propagation, and devirtualization, to further improve performance.

  • CFGO-CSPGO

    The profile in conventional PGO is context-insensitive, which may result in suboptimal optimization. By adding an additional CFGO-CSPGO instrumentation phase after PGO, runtime information from the inlined program is collected. This provides more accurate execution data for compiler optimizations such as code layout and register optimizations, leading to enhanced performance.

  • CFGO-BOLT

    CFGO-BOLT adds optimizations such as software instrumentation for the AArch64 architecture and inlining optimization on top of the baseline version, driving further performance gains.

AI4C

AI4C is an AI-assisted compiler optimization suite. It is a software framework that leverages AI technologies to optimize compiler options and key decisions during optimization passes. It aims to address two major challenges in compilers:

  • Difficult performance improvement: Traditional compiler optimizations have long development cycles, and new optimization techniques are often incompatible with existing optimization processes, making it challenging to achieve the expected performance gains.
  • Low tuning efficiency: When changes in hardware architecture or software application scenarios occur, significant human effort is required to adjust the cost model for compiler optimizations based on the new workloads, resulting in prolonged tuning times.

The AI4C framework provides two main modules: Autotuner for automatic tuning of compiler options and AI-enabled compiler-driven program optimization (ACPO).

The software follows a three-layer architecture, as shown in the following figure. The AI4C framework at the upper layer drives the optimization process of a compiler at the middle layer. The Adaptor module of the compiler invokes the AI model library and model inference engine at the lower layer, using optimization feature data and hardware architecture parameters as inputs to run model inference. This results in the optimal setting for key parameters during compilation, thereby achieving compiler optimization.

  • Autotuner

    The Autotuner of AI4C is developed based on OpenTuner (Ansel et al. 2015). It uses a plugin to drive the compiler to collect tuning parameters, adjusts key decision-making parameters (such as loop unrolling factors) using search algorithms, injects modifications into the compilation process via the plugin, and runs the compiled binary to gather feedback factors for iterative automatic tuning.

    • It integrates a series of search algorithms, with dynamic algorithm selection and shared search progress.
    • It supports user-configured YAML for custom search spaces and the extension of underlying search algorithms.
    • It enables fine-grained code block tuning and coarse-grained automatic tuning of compiler options.
    • It achieved performance gains ranging from 3% to 5% on benchmarks such as CoreMark, Dhrystone, and cBench.
  • ACPO

    ACPO provides a comprehensive set of tools, libraries, and algorithms, which replace or enhance heuristic tuning decision-making algorithms in compilers and provide easy-to-use interfaces for compiler engineers to use AI models. During compiler optimization, plugins are used to extract structured input data from optimization passes as model input features. The getAdvice function runs the pre-trained model to obtain decision coefficients, and the compiler uses the model's decision results to replace specific heuristic decisions, thereby achieving better performance.

    • It decouples compilers from AI models and inference engines, helping algorithm developers focus on AI model development while reducing the costs of model application. It is compatible with multiple compilers, models, AI inference frameworks, and other mainstream products, offering hot update capabilities for AI models.
    • It has been applied to optimization phases and processes such as function inlining based on interprocedural analysis (IPA) and loop unrolling during register transfer language (RTL) generation, where it has delivered significant gains.
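
The Autotuner feedback loop described above (propose parameters, compile, run, measure, repeat) can be illustrated with a minimal sketch. This is not the AI4C or OpenTuner API. The search space, the parameter names, and the synthetic measure function are all assumptions made for illustration, and the random choice between two search techniques is a much simpler stand-in for real dynamic algorithm selection.

```python
import random

# Hypothetical search space, standing in for a user-configured YAML file.
SEARCH_SPACE = {
    "unroll_factor": [1, 2, 4, 8, 16],
    "inline_threshold": [50, 100, 225, 500],
}


def measure(config):
    """Stand-in for 'compile with these parameters, run the binary, time it'.

    A synthetic cost keeps the sketch runnable; lower is better.
    """
    return (abs(config["unroll_factor"] - 4)
            + abs(config["inline_threshold"] - 225) / 100)


def random_search(space, rng):
    """Propose a configuration uniformly at random."""
    return {name: rng.choice(values) for name, values in space.items()}


def mutate_best(space, rng, best):
    """Propose a neighbour of the best configuration by resampling one parameter."""
    candidate = dict(best)
    name = rng.choice(sorted(space))
    candidate[name] = rng.choice(space[name])
    return candidate


def autotune(space, iterations=200, seed=0):
    """Iteratively search the space and return (best_config, best_cost)."""
    rng = random.Random(seed)
    best = random_search(space, rng)
    best_cost = measure(best)
    for _ in range(iterations):
        # Both techniques read and update the same best-so-far result,
        # a simple form of shared search progress.
        if rng.random() < 0.5:
            candidate = random_search(space, rng)
        else:
            candidate = mutate_best(space, rng, best)
        cost = measure(candidate)
        if cost < best_cost:
            best, best_cost = candidate, cost
    return best, best_cost


if __name__ == "__main__":
    print(autotune(SEARCH_SPACE))
```

In a real tuning run, measure would be replaced by compiling with the candidate parameters and timing the resulting binary, which is the expensive step that the search algorithms try to call as few times as possible.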

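The ACPO flow described above, in which a pass extracts features, asks a model for advice, and uses the answer in place of a heuristic, might look roughly like the following sketch. Only the name getAdvice comes from the text; the feature fields, the ThresholdModel class, the get_advice signature, and the fallback to the heuristic when no model is loaded are assumptions made for illustration, not the ACPO interfaces.

```python
from dataclasses import dataclass


@dataclass
class InlineFeatures:
    """Hypothetical features an inlining pass might extract for one call site."""
    callee_instructions: int
    call_site_in_loop: bool
    callee_call_count: int


def heuristic_should_inline(features):
    """Stand-in for a hand-written compiler heuristic."""
    return features.callee_instructions <= 30


class ThresholdModel:
    """Stand-in for a pre-trained model served by an inference engine."""

    def predict(self, features):
        score = 0.0
        score += 1.0 if features.call_site_in_loop else 0.0
        score += 1.0 if features.callee_call_count == 1 else 0.0
        score -= features.callee_instructions / 100.0
        return score


def get_advice(model, features):
    """Return the model's inline decision, or None if no model is loaded."""
    if model is None:
        return None
    return model.predict(features) > 0.0


def should_inline(features, model=None):
    """Use the model's advice when available, otherwise the heuristic."""
    advice = get_advice(model, features)
    return heuristic_should_inline(features) if advice is None else advice


if __name__ == "__main__":
    site = InlineFeatures(callee_instructions=80, call_site_in_loop=True,
                          callee_call_count=1)
    print(should_inline(site))                    # heuristic decision
    print(should_inline(site, ThresholdModel()))  # model-driven decision
```

Because the pass only depends on get_advice, the model behind it can be swapped without touching the pass, which is the decoupling the text describes.
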
SM Digital Signatures for RPMs

According to relevant security standards in China, certain application scenarios require the use of Chinese cryptographic algorithms to ensure the authenticity and integrity of critical executable program sources. openEuler currently uses RPM for software package management, with package signatures based on the OpenPGP standard. openEuler 24.03 LTS SP1 incorporates the SM2 signing algorithm and SM3 digest algorithm into the RPM mechanism.

Based on the RPM component and the GnuPG2 signing tool that it invokes, this feature enables Chinese cryptographic algorithms within the existing OpenPGP signature system. To generate a signature, run the gpg command to create an SM2 signing private key and a certificate, and then run the rpmsign command to add a digital signature based on the SM2 and SM3 algorithms to a specified RPM package. To verify a signature, run the rpm command to import the verification certificate and then check the digital signature of the RPM package, which validates the package's authenticity and integrity.
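
The signing and verification steps can be sketched as a sequence of commands. The Python helper below only builds the command lines and does not run them. It uses the standard rpmsign --addsign, rpm --import, and rpm --checksig options together with the standard _gpg_name macro. The key name and file names are hypothetical, and the gpg key generation step and the selection of SM3 as the digest algorithm are deliberately left out because the options for them are not given here.

```python
import shlex


def sign_command(rpm_path, key_name):
    """Build an rpmsign invocation that signs rpm_path with the named GnuPG key."""
    return ["rpmsign", "--define", f"_gpg_name {key_name}", "--addsign", rpm_path]


def import_command(cert_path):
    """Build the rpm invocation that imports a verification certificate."""
    return ["rpm", "--import", cert_path]


def verify_command(rpm_path):
    """Build the rpm invocation that checks package digests and signatures."""
    return ["rpm", "--checksig", rpm_path]


def workflow(rpm_path, key_name, cert_path):
    """Return the commands in the order they would be run."""
    return [
        sign_command(rpm_path, key_name),
        import_command(cert_path),
        verify_command(rpm_path),
    ]


if __name__ == "__main__":
    # Hypothetical package, key, and certificate names.
    for argv in workflow("example-1.0-1.aarch64.rpm", "sm2-signer",
                         "RPM-GPG-KEY-sm2"):
        print(shlex.join(argv))
```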

oneAPI

The Unified Acceleration (UXL) Foundation is promoting an open, unified standard accelerator software ecosystem. oneAPI, as the initial project, aims to provide a cross-industry, open, and unified standard programming model that delivers a consistent development experience across heterogeneous hardware, including CPUs, GPUs, FPGAs, and specialized accelerators. The oneAPI standard extends existing developer programming models to empower cross-architecture programming using a parallel programming language (Data Parallel C++), a group of accelerator software libraries, and a foundational hardware abstraction interface (Intel® oneAPI Level Zero). This approach supports a wide range of accelerator hardware and processor platforms. To ensure compatibility and improve development efficiency, the oneAPI standard provides various open, cross-platform, and easy-to-use developer software suites.

To fully support oneAPI, the Intel® oneAPI Base Toolkit (Base Kit) and runtime container images are integrated into openEuler 24.03 LTS. Since openEuler 24.03 LTS SP1, openEuler has natively supported the adaptation and integration of oneAPI foundational libraries, including the dependencies required for oneAPI runtimes, Intel's graphics acceleration compiler, OpenCL™ Runtimes, and the oneAPI Level Zero API that is compatible with platforms such as x86_64 and AArch64. To comprehensively support Data Parallel C++ and API-based programming on accelerator libraries, various oneAPI software packages have been adapted and validated on openEuler. You can add the official DNF/Yum repositories of oneAPI to openEuler to install and update all required runtime dependencies, developer tools, and debugging utilities.

OpenVINO

OpenVINO is an open source AI toolkit and runtime library that enhances deep learning models from major frameworks. It streamlines the deployment and inference of AI workloads across processors and accelerators on Intel and other platforms including Arm. Beginning with openEuler 24.03 LTS SP1, native OpenVINO integration is provided, granting full access to OpenVINO computing capabilities on openEuler.

OpenVINO converts and optimizes models trained in popular frameworks to run efficiently on diverse hardware in local, edge, or cloud environments. Popular frameworks include TensorFlow, PyTorch, ONNX, and PaddlePaddle.

KAE

Kunpeng Accelerator Engine (KAE) is an acceleration solution based on the Kunpeng 920 processor. It contains the KAE encryption and decryption module and the KAEzip compression and decompression module, which accelerate SSL and TLS applications and data compression, reduce processor usage, and boost processor efficiency. In addition, the application layer of KAE hides the internal implementation details, allowing users to quickly migrate services through the standard interfaces of OpenSSL and zlib.

The KAE encryption and decryption module implements the RSA, SM3, SM4, DH, MD5, and AES algorithms. It provides high-performance symmetric and asymmetric encryption and decryption based on the lossless user-mode driver framework. It is compatible with OpenSSL 1.1.1x and supports both synchronous and asynchronous mechanisms.

KAEzip is the compression and decompression module of KAE. It implements the Deflate algorithm and works with the lossless user-mode driver framework to provide high-performance gzip or zlib APIs. KAE can be used to improve application performance in different scenarios. For example, in distributed storage scenarios, KAEzip accelerates data compression and decompression through the zlib library.
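
Because KAEzip is exposed through the standard zlib interface, application code written against zlib does not need to change. The sketch below is ordinary Python zlib usage, shown only to illustrate that point. Whether these calls are actually offloaded depends on the process loading a KAEzip-enabled zlib library, which is an assumption about the deployment and not something this code can control or detect.

```python
import zlib


def compress_block(payload: bytes, level: int = 6) -> bytes:
    """Compress a data block with Deflate through the standard zlib API."""
    return zlib.compress(payload, level)


def decompress_block(blob: bytes) -> bytes:
    """Restore the original data block through the standard zlib API."""
    return zlib.decompress(blob)


if __name__ == "__main__":
    # Hypothetical storage block with highly repetitive content.
    block = b"openEuler storage block " * 1024
    packed = compress_block(block)
    assert decompress_block(packed) == block
    print(f"original={len(block)} bytes, compressed={len(packed)} bytes")
```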