Key Features

Scenario-specific AI Innovations

AI is redefining OSs by powering intelligent development, deployment, and O&M. openEuler supports general-purpose architectures like Arm, x86, and RISC-V, and next-gen AI processors like NVIDIA and Ascend. Further, openEuler is equipped with extensive AI capabilities that have made it a preferred choice for diversified computing power.

OS for AI

sysHAX

The sysHAX large language model (LLM) heterogeneous acceleration runtime enhances model inference performance in single-server, multi-xPU setups by optimizing Kunpeng + xPU (GPU/NPU) resource synergy.

Heterogeneous converged scheduling: When GPUs are fully loaded, the prefill phase of an inference request can be executed on GPUs while the decode phase is handled by CPUs.
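The placement rule above can be sketched as follows. This is only an illustration, not sysHAX code: the `PhasePlacement` class, its enums, and the `gpuFullyLoaded` flag are hypothetical names, and keeping decode on the GPU when the GPU is not fully loaded is an assumption that the text does not state.

```java
/** Hypothetical sketch of prefill/decode placement; not the sysHAX API. */
final class PhasePlacement {
    enum Phase { PREFILL, DECODE }
    enum Device { GPU, CPU }

    /**
     * Prefill runs on the GPU. Decode is offloaded to the CPU only when the
     * GPUs are fully loaded (assumption: otherwise it stays on the GPU).
     */
    static Device place(Phase phase, boolean gpuFullyLoaded) {
        if (phase == Phase.DECODE && gpuFullyLoaded) {
            return Device.CPU;
        }
        return Device.GPU;
    }

    public static void main(String[] args) {
        System.out.println(place(Phase.PREFILL, true));  // GPU
        System.out.println(place(Phase.DECODE, true));   // CPU
        System.out.println(place(Phase.DECODE, false));  // GPU
    }
}
```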

sysHAX is used to optimize Transformer models including DeepSeek, Qwen, Baichuan, and Llama. It fits into the following application scenario:

Data centers: sysHAX assigns inference tasks to CPUs to fully utilize CPU resources and increase the concurrency and throughput of LLMs.

GMEM

In the post-Moore era, there have been breakthroughs in GPUs, TPUs, FPGAs, and other dedicated heterogeneous accelerators. Similar to CPUs, these devices increase computing speeds by storing data in local memory (such as LPDDR SDRAM or HBM), but this design makes memory systems more complicated. Modern memory systems have the following defects:

Memory management is split between CPUs and accelerators. Explicit data migration makes it difficult to balance the usability and performance of accelerators' memory.
The high bandwidth memory (HBM) available on accelerators is often insufficient for foundation models. Manual swapping is only feasible in limited scenarios and typically results in significant performance degradation.
A large number of invalid data migrations occur in search & recommendation and big data scenarios, and no efficient memory pooling solution is available.

Heterogeneous Memory Management (HMM) is a Linux feature, but it is plagued by poor programmability, performance, and portability, and it relies heavily on manual tuning. As such, it is unfavored by most OS communities, which has fueled demand for an efficient solution for heterogeneous accelerators. Generalized Memory Management (GMEM) is one new option. It offers a centralized management mechanism for heterogeneous memory connections, and its APIs are compatible with native Linux APIs while featuring high usability, performance, and portability. After an accelerator calls GMEM APIs to connect its memory to the unified address space, the accelerator automatically obtains the programming optimization capability for heterogeneous memory and does not need to implement a memory management framework again. This greatly reduces development and maintenance costs. Developers use a unified set of APIs to allocate and release memory, achieving heterogeneous memory programming without manual memory migrations. If the HBM of an accelerator is insufficient, GMEM can use the CPU memory as the accelerator cache to transparently over-allocate the HBM without manual swapping. GMEM also offers an efficient memory pooling solution: a shared memory pool eliminates the need for duplicate migrations.

AI for OS

AI makes openEuler more intelligent. openEuler Intelligence is an AI-powered Q&A platform built on openEuler data. It enables workflow orchestration through semantic interfaces and agent building through MCP. Additionally, it integrates some system services to further improve the intelligence of openEuler.

Intelligent Q&A

The openEuler Intelligence system is accessible via web or shell.

Intelligent planning, scheduling, and recommendation
- Intelligent planning: The agent applications of openEuler Intelligence can plan steps in real time based on user input and available tools, continuing until the user's objective is achieved or the maximum number of steps is reached.
- Intelligent scheduling: openEuler Intelligence allows users to define multiple workflows within a workflow application. When a query request is made, openEuler Intelligence automatically extracts the relevant parameters and selects the most suitable workflow to execute the query task.
- Intelligent recommendation: Based on users' query requests and workflow execution results, openEuler Intelligence recommends workflows that may be useful in future tasks, increasing the likelihood of task completion and making applications easier to use.
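The stop condition described for intelligent planning (continue until the objective is achieved or the maximum number of steps is reached) can be sketched as a bounded loop. This is a minimal illustration, not openEuler Intelligence code: `BoundedPlanner`, `nextStep`, and `goalReached` are hypothetical names, and a real agent would call a model and tools where the toy lambdas are used here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;

/** Hypothetical sketch of a bounded plan-and-act loop. */
final class BoundedPlanner {
    /**
     * Asks nextStep for one more step given the history so far, and stops
     * when goalReached accepts the history or maxSteps steps have been taken.
     */
    static List<String> run(Function<List<String>, String> nextStep,
                            Predicate<List<String>> goalReached,
                            int maxSteps) {
        List<String> history = new ArrayList<>();
        while (history.size() < maxSteps && !goalReached.test(history)) {
            history.add(nextStep.apply(history));
        }
        return history;
    }

    public static void main(String[] args) {
        // Toy planner: steps are named by index and the goal needs three steps.
        List<String> steps = run(h -> "step-" + h.size(), h -> h.size() >= 3, 10);
        System.out.println(steps); // [step-0, step-1, step-2]
    }
}
```

Checking the step limit before the goal test keeps the loop bounded even if the goal predicate never succeeds.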
Workflows
- Semantic interfaces: A semantic interface contains natural language comments. openEuler Intelligence supports two methods for registering semantic interfaces.
- Workflow orchestration and invocation: openEuler Intelligence enables users to visually connect built-in semantic interfaces and user-registered interfaces to create workflows. Users can debug these workflows, and then release and use them as applications. When workflows are debugged and executed, intermediate results are displayed to help lower debugging costs and enhance the overall user experience.
Agent applications
- MCP registration, installation, and activation: MCP is a mainstream AI-related protocol. It uses SDKs to encapsulate complex and diverse services with natural semantic information, allowing AI to easily invoke tools and services reconstructed based on MCP.
- Agent building and use: openEuler Intelligence allows building agents based on MCP and various foundation models. These agents can decompose a user's objective into phased tasks using the configured model information and the user-provided objective. MCP tools are then used to complete each task until the user's objective is achieved.
RAG: Retrieval-augmented generation (RAG) extends LLM long-term memory while significantly reducing training overhead. openEuler Intelligence optimizes the end-to-end RAG pipeline, from pre-processing and knowledge indexing to advanced retrieval algorithms and post-processing.
Corpus governance: Corpus governance is one of the basic RAG capabilities in the openEuler Intelligence system. It imports corpuses into the knowledge base in a supported format using context location extraction, text summarization, and OCR, increasing the retrieval hit rate.
Automated testing: Automated testing is a core RAG capability in openEuler Intelligence. It detects shortcomings in the knowledge base and retrieval augmentation algorithms through automated dataset generation and evaluation.

Intelligent Tuning

The openEuler Intelligence system provides an intelligent shell entry. Through this entry, you can interact with openEuler Intelligence in natural language and perform heuristic tuning operations such as performance profiling, system performance analysis, and system performance tuning.

Intelligent Diagnosis

Inspection: The Inspection Agent checks for abnormalities of designated IP addresses and provides an abnormality list that contains associated container IDs and abnormal metrics (such as CPU and memory).
Demarcation: The Demarcation Agent analyzes and demarcates a specified abnormality contained in the inspection results and outputs the top three metrics of the root cause.
Location: The Detection Agent performs profiling location analysis on the root cause, and provides useful hotspot information such as the stack, system time, and performance metrics related to the root cause.

AI Cluster Slow-Node Demarcation

Performance degradation during AI cluster training is inevitable and often results from a wide range of complex factors. Existing solutions rely on log analysis after performance degradation occurs. However, it can take 3 to 4 days from log collection to root cause diagnosis and issue resolution on the live network. To address these pain points, an online slow node detection solution is offered. This solution allows for real-time monitoring of key system metrics and uses model- and data-driven algorithms to analyze the observed data and pinpoint slow or degraded nodes. This facilitates system self-healing and fault rectification by O&M personnel.

Grouped metric comparison helps detect slow nodes and cards in AI cluster training. This technology is built on Systrace and includes a configuration file, an algorithm library, and a slow node analysis mechanism based on both time and space dimensions. It outputs the exception timestamp, abnormal metrics, and IP addresses of slow nodes and cards. This technology enhances overall system stability and reliability.

Configuration file: Contains the types of metrics to be observed, configuration parameters for the metric algorithms, and data interfaces, which are used to initialize the slow node detection algorithms.
Algorithm library: Includes common time series exception detection algorithms, such as Streaming Peaks-over-Threshold (SPOT), k-sigma, abnormal node clustering, and similarity measurement.
Data: Metric data collected from each node is represented by a time sequence.
Grouped metric comparison: Supports spatial filtering of abnormal nodes and temporal exception filtering of a single node. Spatial filtering identifies abnormal nodes based on the exception clustering algorithm, while temporal exception filtering determines whether a node is abnormal based on the historical data of the node.
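The two filtering dimensions can be illustrated with a small sketch. It is not the Systrace-based implementation: the class and method names are hypothetical, the metric is assumed to be a per-step time where larger means slower, and a simple median comparison stands in for the exception clustering algorithm named above. Only the temporal check uses k-sigma, which is one of the listed algorithms.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of temporal and spatial slow-node filtering. */
final class SlowNodeSketch {
    static double mean(double[] xs) {
        double sum = 0;
        for (double x : xs) sum += x;
        return sum / xs.length;
    }

    static double stddev(double[] xs) {
        double m = mean(xs), sq = 0;
        for (double x : xs) sq += (x - m) * (x - m);
        return Math.sqrt(sq / xs.length); // population standard deviation
    }

    /** Temporal check (k-sigma): is the latest sample far above the node's own history? */
    static boolean temporalAnomaly(double[] history, double latest, double k) {
        return latest > mean(history) + k * stddev(history);
    }

    /** Spatial check: nodes whose latest value exceeds the group's (upper) median by a relative margin. */
    static List<String> spatialOutliers(Map<String, Double> latestByNode, double margin) {
        double[] values = latestByNode.values().stream()
                .mapToDouble(Double::doubleValue).sorted().toArray();
        double median = values[values.length / 2];
        List<String> outliers = new ArrayList<>();
        // TreeMap gives a stable, sorted order of node addresses in the result.
        for (Map.Entry<String, Double> e : new TreeMap<>(latestByNode).entrySet()) {
            if (e.getValue() > median * (1 + margin)) outliers.add(e.getKey());
        }
        return outliers;
    }

    public static void main(String[] args) {
        double[] history = {100, 102, 98, 101, 99};
        System.out.println(temporalAnomaly(history, 130, 3.0)); // true
        Map<String, Double> latest = Map.of(
                "10.0.0.1", 100.0, "10.0.0.2", 101.0, "10.0.0.3", 150.0, "10.0.0.4", 99.0);
        System.out.println(spatialOutliers(latest, 0.2)); // [10.0.0.3]
    }
}
```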

Intelligent Container Images

The openEuler Intelligence system can invoke environment resources through natural language, assist in pulling container images for local physical resources, and establish a development environment suitable for debugging on existing compute devices. This system supports three types of containers, and container images have been released on Docker Hub. Users can manually pull and run these container images.

SDK layer: encapsulates only the component libraries that enable AI hardware resources, such as CUDA and CANN.
SDKs + training/inference frameworks: accommodates TensorFlow, PyTorch, and other frameworks (for example, tensorflow2.15.0-cuda12.2.0 and pytorch2.1.0.a1-cann7.0.RC1) in addition to the SDK layer.
SDKs + training/inference frameworks + LLMs: encapsulates several models (for example, llama2-7b and chatglm2-13b) based on the second type of containers.

Embedded

openEuler 25.09 is suited for embedded applications, offering significant progress in southbound and northbound ecosystems, technical features, infrastructure, and implementation over previous generations. openEuler Embedded provides a closed loop framework often found in operational technology (OT) applications such as manufacturing and robotics, whereby innovations help optimize its embedded system software stack and ecosystem. openEuler Embedded enhances its software package ecosystem by incorporating the oeBridge feature, which supports online software installation from an openEuler mirror site. When building Yocto images, oeBridge can be used to install openEuler RPM packages for easy image customization. openEuler Embedded also supports the oeDeploy feature for quick deployment of AI and cloud-native software stacks. Kernel support in openEuler is enhanced by optimizing the meta-openEuler kernel configuration and the oeAware real-time tuning feature. These updates help control interference and improve real-time system responsiveness.

Southbound ecosystem: openEuler Embedded Linux supports mainstream processor architectures like AArch64, x86_64, AArch32, and RISC-V, and will extend support to LoongArch in the future. openEuler 24.03 and later versions have a rich southbound ecosystem and support chips from Raspberry Pi, HiSilicon, Rockchip, Renesas, TI, Phytium, StarFive, and Allwinner.
Embedded virtualization base: openEuler Embedded uses an elastic virtualization base that enables multiple OSs to run on a system-on-a-chip (SoC). The base incorporates a series of technologies including bare metal, embedded virtualization, lightweight containers, LibOS, trusted execution environment (TEE), and heterogeneous deployment.
MICA deployment framework: The MICA deployment framework is a unified environment that masks the differences between the technologies that make up the embedded elastic virtualization base. It uses the multi-core capability of the hardware to run a general-purpose Linux OS alongside a dedicated real-time operating system (RTOS), making full use of the strengths of each OS.
Northbound ecosystem
- Over 700 common embedded software packages can be built using openEuler.
- Soft real-time kernel helps respond to soft real-time interrupts within microseconds.
- The distributed soft bus system (DSoftBus) of openEuler Embedded integrates the DSoftBus and point-to-point authentication module of OpenHarmony. It implements interconnection between openEuler-based embedded devices and OpenHarmony-based devices as well as between openEuler-based embedded devices.
- With iSula containers, openEuler and other OS containers can be deployed on embedded devices to simplify application porting and deployment. Embedded container images can be compressed to 5 MB, and can be easily deployed into the OS on another container.
UniProton: An RTOS that features ultra-low latency and flexible MICA deployments. It is suited for industrial control because it supports both microcontroller units and multi-core CPUs. UniProton provides the following capabilities:
- Compatible with processor architectures like Cortex-M, AArch64, x86_64, and riscv64, and supports M4, RK3568, RK3588, x86_64, Hi3093, Raspberry Pi 4B, Kunpeng 920, Ascend 310, and Allwinner D1s.
- Connects with openEuler Embedded Linux on Raspberry Pi 4B, Hi3093, RK3588, and x86_64 devices in bare metal mode.
- Can be debugged using GDB on openEuler Embedded Linux.

Kernel Innovations

openEuler 25.09 runs on Linux kernel 6.6 and inherits the competitive advantages of community versions and innovative features released in the openEuler community.

Filesystem in Userspace (FUSE) pass-through: FUSE is widely used in distributed storage and AI applications. In pass-through scenarios, FUSE skips additional processing for read and write I/Os. It only records metadata and forwards the I/O requests to the back-end file system. As a result, FUSE processing turns into the main bottleneck for I/O performance. The FUSE pass-through feature is designed to eliminate the overhead caused by context switches, wakeups, and data copying on the data plane when FUSE directly interfaces with the back-end file system. It allows applications to directly send read and write I/Os to the back-end file system within the kernel. In lab environments, FUSE pass-through has demonstrated notable performance gains. Specifically, fio tests show that read and write performance more than doubles for sizes between 4 KB and 1 MB. FUSE pass-through has also passed fault injection and stability tests, and is available for use as needed.
Enhanced Memory System Resource Partitioning and Monitoring (MPAM) features: An improved quality of service (QoS) feature is introduced to optimize memory bandwidth and L3 cache control. In hybrid deployment scenarios, shared resources can be allocated based on the upper limit, lower limit, or priority-based policy. The new I/O QoS management feature collaborates with the system memory management unit (SMMU) to isolate I/O bandwidth traffic across hardware peripherals and heterogeneous accelerators. It supports monitoring by iommu_group, providing a new approach to I/O QoS management in heterogeneous environments. In addition, the L2 cache isolation feature enables monitoring of L2C usage and bandwidth traffic, offering core-level optimization and performance analysis in hybrid deployment scenarios. These MPAM features deliver significant performance improvements in test scenarios. In hybrid deployments, the interference rate of SPECjbb as an online service drops from 25.5% to below 5%.
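How upper limits, lower limits, and priorities can interact when a shared resource is divided is sketched below. This is an illustrative policy only, not the MPAM hardware behavior or a kernel interface: `BandwidthSketch`, `Group`, and the percentage units are hypothetical, and the sketch assumes that the lower limits together do not exceed the total.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of sharing bandwidth under lower/upper limits and priorities. */
final class BandwidthSketch {
    static final class Group {
        final String name;
        final double lower, upper, demand; // e.g. percent of total bandwidth
        final int priority;                // larger value = more important

        Group(String name, double lower, double upper, int priority, double demand) {
            this.name = name; this.lower = lower; this.upper = upper;
            this.priority = priority; this.demand = demand;
        }
    }

    static Map<String, Double> allocate(List<Group> groups, double total) {
        Map<String, Double> grant = new LinkedHashMap<>();
        double remaining = total;
        // Pass 1: every group receives its guaranteed lower limit.
        for (Group g : groups) {
            grant.put(g.name, g.lower);
            remaining -= g.lower;
        }
        // Pass 2: distribute the rest by descending priority, capped by demand and upper limit.
        List<Group> byPriority = new ArrayList<>(groups);
        byPriority.sort(Comparator.comparingInt((Group g) -> g.priority).reversed());
        for (Group g : byPriority) {
            double want = Math.min(g.demand, g.upper) - g.lower;
            double extra = Math.max(0, Math.min(want, remaining));
            grant.merge(g.name, extra, Double::sum);
            remaining -= extra;
        }
        return grant;
    }

    public static void main(String[] args) {
        List<Group> groups = List.of(
                new Group("online", 20, 80, 2, 70),
                new Group("offline", 10, 100, 1, 90));
        System.out.println(allocate(groups, 100)); // {online=70.0, offline=30.0}
    }
}
```

In this toy run the online group gets everything it asks for, and the offline group keeps its guaranteed floor plus whatever is left over.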

High-Density Many-Core Container Isolation

Server chips have evolved from multi-core to many-core architectures (typically exceeding 256 cores), posing new challenges to OSs. To boost rack-level computing density and reduce data center TCO, many-core servers have become the mainstream in the Internet industry. As cloud technologies and service scales advance, containerized deployment has also become the dominant model. Against this backdrop, serialization and synchronization overheads hinder system scalability, while interference and low resource utilization become increasingly prominent. These scalability issues in container deployments arise from contention for shared hardware and software resources.

Lightweight virtualization is used to partition resources by NUMA domain and enforce container-level resource isolation within each domain. This approach minimizes performance interference caused by hardware and software resource contention and enhances the container deployment scalability.

VM memory QoS control: When multiple tenants' VMs are deployed on the same physical host, memory-intensive VMs may consume a large portion of the available memory bandwidth. This can lead to resource contention, preventing other VMs from obtaining sufficient bandwidth to meet their performance requirements. As a result, the overall system QoS may be degraded. Leveraging the MPAM feature of the Kunpeng processor and the OS-level resource control mechanisms, the system can perform fine-grained monitoring and dynamic control of memory bandwidth usage for up to 30 VMs. This capability enables configuration of upper and lower limits, as well as priority policies, to establish a memory bandwidth isolation and assurance framework in a multi-tenant environment.
Memory bandwidth upper limit: A maximum memory bandwidth threshold can be configured for each VM to prevent any single VM from consuming excessive bandwidth resources, thereby avoiding performance interference with other tenant VMs.
Memory bandwidth lower limit: A minimum bandwidth guarantee can be configured to maintain a baseline allocation even for lightly loaded VMs, enabling dynamic resource optimization and efficient bandwidth utilization.
Priority-based scheduling policy: The memory bandwidth priority of each VM can be configured based on service importance, ensuring stable bandwidth allocation for critical workloads and improving the availability and service quality of high-priority VMs.
NUMA affinity of virtual devices: PCI devices possess NUMA affinity and can be directly accessed on the host. The OS scheduling mechanism optimizes task placement based on device affinity to minimize performance loss resulting from cross-NUMA access to PCI devices. In VM passthrough scenarios, PCI devices do not expose their NUMA node affinity to guest VMs. This feature expands the PCI device topology of VMs based on the PCI Expander Bridge (PXB). Within a VM, it displays the NUMA node where each virtual device is located. This helps the system optimize scheduling and allows users to deploy service applications according to the NUMA nodes of their virtual devices, thereby reducing performance loss caused by cross-NUMA resource access and improving the overall performance of VM service applications.
CPU scheduling by domain: CPUs are divided into domains by cluster for container deployment. Each container operates within an independent scheduling domain. This design isolates interference between containers, reduces cross-cluster cache synchronizations, and eases contention for hardware resources like cache and NUMA memory. It improves performance by more than 10% in high-concurrency Redis scenarios.
Interference isolation in file system block allocation: Optimizations to the group lock and s_md_lock in the EXT4 block allocation and release processes enhance the scalability of EXT4 block allocation. When the target block group is occupied, allocation can switch to an idle block group to reduce CPU overhead caused by multiple containers competing for the same group. Leveraging multiple EXT4 block groups helps ease group lock contention. Apart from that, the global target of the streaming allocation is split to multiple targets, so that the contention for the global lock s_md_lock is alleviated and the file data is more aggregated. In a 64-container concurrency scenario, the OPS increases by over 5 times in mixed block allocation and release workloads and by over 10 times in single-block allocation workloads.
Efficient slab reclamation: In this openEuler release, the read and write locks used for slab memory reclamation are replaced with the read-copy-update (RCU) lock-free mechanism. Memory reclamation across different slabs operates independently, significantly improving reclamation efficiency. In multi-container concurrency scenarios, system calls are accelerated.
TCP hash interference isolation: In high-concurrency scenarios, lock contention in the tcp_hashinfo bhash and ehash tables and frequent ehash calculations lead to reduced bandwidth and increased latency. The spin locks of the tcp_hashinfo bhash and ehash can be replaced with read-copy-update (RCU), and the ehash calculation method can be changed to lport increment. These changes reduce the query time and the number of calculations, and ease lock contention in the TCP connection hash.
Enhanced control group (cgroup) isolation: Original atomic operations are replaced with percpu counters to avoid parent node contention across namespaces and eliminate rlimit count interference between containers. This mechanism addresses the linearity issue in the will-it-scale/signal1 test case and triples concurrent throughput performance in a 64-container deployment. Memory cgroups are released in batches to avoid contention for the same parent node's counter caused by frequent small memory releases, enhancing memory count scalability. In the tlb-flush2 test case, this improves throughput by 1.5 times with 64 containers. Leveraging eBPF's programmable kernel capabilities, a host information isolation and filtering approach is provided to present container-specific resource views. Compared with the peer LXCFS solution, this openEuler solution avoids the overhead of switching between the kernel mode and user mode, and eliminates the performance and reliability bottlenecks associated with the LXCFS process. It doubles the resource view throughput in a single container and achieves a 10-fold increase in a 64-container deployment.
Interference monitoring: Interference falls into three categories by result: instruction execution failure, instruction slowdown, and increased instruction execution. Interference is monitored from the kernel perspective, with statistics collected on each typical category during runtime. The system supports online monitoring of schedule latency, throttling, softirq, hardirq, spin lock, mutual exclusion (mutex), and simultaneous multi-threading (SMT) interference while incurring less than 5% performance overhead.
Kunpeng memory and cache QoS control: The memory bandwidth traffic and cache usage at each level can be configured based on the upper limit, lower limit, or priority-based policy. Each thread is assigned a specific isolation policy to suit specific service requirements. The usage of shared resources is monitored in real time at both service and thread levels, and reported to the control policy to enable feedback-driven control. In addition, the MPAM feature and the SMMU combine to enhance peripherals' I/O QoS. They support bandwidth isolation for peripherals and heterogeneous accelerators and allow for resource monitoring at the device level.
Dynamic QoS policy configuration: This openEuler release provides a cluster-level MPAM QoS management plugin. Using the QoS interfaces provided by MPAM, the plugin automatically assigns priorities to nodes based on user-defined policies and sets MPAM QoS priorities for offline tasks according to user configurations. This ensures efficient resource utilization in hybrid deployment scenarios. When online services are busy, the last-level cache (LLC) and memory bandwidth allocated to offline tasks are automatically preempted. When online services are idle, these resources are released to enhance offline service performance.
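The enhanced cgroup isolation item above replaces contended atomic operations with percpu counters. The kernel change itself is in C and is not shown in this document. As a user-space analogy only, Java's `java.util.concurrent.atomic.LongAdder` applies a similar idea: updates are spread over several internal cells instead of one hot word, and the cells are combined only when the total is read. The `StripedCounterSketch` class is a hypothetical name.

```java
import java.util.concurrent.atomic.LongAdder;

/** User-space analogy for per-CPU counters, using the JDK's striped LongAdder. */
final class StripedCounterSketch {
    /** Runs `threads` workers that each increment one shared counter `perThread` times. */
    static long countWith(int threads, int perThread) {
        LongAdder counter = new LongAdder();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    counter.increment(); // contended updates are spread across cells
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return counter.sum(); // cells are summed only when the total is needed
    }

    public static void main(String[] args) {
        System.out.println(countWith(4, 10_000)); // 40000
    }
}
```

The trade-off matches the kernel case: writes become cheap under contention, while reading an exact total costs a pass over all cells.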

LLVM for openEuler Compiler

LLVM for openEuler introduces the following compilation features in openEuler 25.09, optimizing the running efficiency of database and big data applications to unlock the ultimate software performance.

ICP optimization: Indirect call promotion (ICP) optimization converts indirect function calls into direct function calls based on feedback information. This increases potential inlining opportunities and reduces function call overhead.
Intelligent hash-based prefetch optimization: This feature identifies multi-layer indirect nested memory access scenarios in applications, automatically calculates actual memory access addresses, and inserts data prefetch instructions, thereby reducing the probability of data cache misses.
Adaptive memory copy optimization: By recognizing the characteristics of the source and destination pointers during memory copy, this feature adds runtime checks for memory copy method specialization (such as generating memset and memmove intrinsics).
Fast access to dynamic libraries: Conventional dynamic library function calls require jumping through the procedure linkage table (PLT), leading to extra memory access and jump instructions. This is optimized by directly calling functions with their addresses in the global offset table (GOT), eliminating PLT jump overhead.
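The shape of the ICP transformation can be shown at source level. LLVM performs it on compiler IR using profile data, so the Java below is only a hand-written analogy: `IcpSketch`, `twice`, and `HOT_TARGET` are hypothetical names, and the hot target is assumed to have been identified by profiling.

```java
import java.util.function.IntUnaryOperator;

/** Hand-written analogy of indirect call promotion (ICP). */
final class IcpSketch {
    static int twice(int x) {
        return 2 * x;
    }

    /** The callee that profiling is assumed to have identified as hot. */
    static final IntUnaryOperator HOT_TARGET = IcpSketch::twice;

    /** Before: every call goes through the function object (an indirect call). */
    static int callIndirect(IntUnaryOperator f, int x) {
        return f.applyAsInt(x);
    }

    /** After: guard on the hot target and call it directly; fall back otherwise. */
    static int callPromoted(IntUnaryOperator f, int x) {
        if (f == HOT_TARGET) {
            return twice(x); // direct call, now a candidate for inlining
        }
        return f.applyAsInt(x);
    }

    public static void main(String[] args) {
        System.out.println(callPromoted(HOT_TARGET, 21)); // 42
        System.out.println(callPromoted(v -> v + 1, 21)); // 22
    }
}
```

The guard preserves behavior for every other callee, which is why the promotion is safe even when the profile is not fully representative.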

oeDeploy Enhancements

oeDeploy is a lightweight software deployment tool that accelerates environment setup across single-node and distributed systems with unmatched efficiency.

Multi-scenario support and quick software deployment: oeDeploy facilitates quick deployment for both single-node applications and clustered software environments. It now includes quick deployment capabilities for Kubernetes environments with multiple master nodes. It also extends support for community toolchains like openEuler Intelligence and DevKit Pipeline, as well as popular retrieval-augmented generation (RAG) software such as RAGFlow, AnythingLLM, and Dify.
Flexible plugin management and excellent deployment experience: oeDeploy provides an extensible plugin architecture for flexible management of diverse deployment capabilities, empowering developers to quickly release custom deployment plugins. It now supports plugin source management, enabling one-click plugin updates and one-click plugin initialization. While oeDeploy currently offers a streamlined CLI, a GUI and plugin store will soon launch, promising an even more efficient software deployment experience with less code.
Efficient deployment and intelligent development: oeDeploy introduces MCP servers, offering an out-of-the-box experience within DevStation. It leverages LLM inference to deploy software with natural language, boosting deployment efficiency by 2x. It can also convert user documents into executable oeDeploy plugins, increasing development efficiency by 5x.

Go for openEuler Compiler

Continuous Feature Guided Optimization (CFGO)
While ensuring the functional integrity of the program, this technique collects the program's runtime profile to make more accurate optimization decisions, resulting in a refined program with better performance. Based on the principle of program locality, it arranges hot instructions closely to improve cache/translation lookaside buffer (TLB) hits, effectively reducing front-end bottlenecks.
Arm atomic instruction optimization
In certain service scenarios, the Go runtime incurs significant overhead when invoking compare-and-swap (CAS) operations and load/store (LD/ST) instructions. Adopting an Arm-affinity instruction sequence delivers significant performance gains.
Runtime garbage collection (GC) optimization
This optimization involves the insertion of software prefetch instructions based on identified program characteristics. Parameters governing the GC coroutine overhead are extracted as runtime parameters supporting dynamic adjustment according to varying service characteristics.
Kunpeng Accelerator Engine (KAE) for hardware-based acceleration
The compression/decompression logic of Go's Gzip compression library is modified to enable underlying hardware-based acceleration.

Heap Resizing by BiSheng JDK

In modern containerized deployments, most users' container environments support resource scale-up. However, OpenJDK has a significant limitation: its maximum heap size can only be configured at startup. This means it does not support online resizing, preventing Java applications from immediately using additional memory made available after a container scales up. Applications currently require a restart to reset their maximum heap. To address this limitation, BiSheng JDK introduces online heap memory resizing for the Garbage-First garbage collector (G1GC). This allows users to dynamically update the Java heap memory limit while an application is running, eliminating the need for a JVM restart.

This capability is crucial for Internet and other container-based deployments that require Java application heap memory to resize online in response to container resource scale-up.
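An application can observe its current heap ceiling with the standard `Runtime.maxMemory()` call. The probe below is a minimal sketch and uses only that standard API. The `HeapLimitProbe` class is a hypothetical name, and the sketch merely assumes that a limit changed by online resizing would become visible through this call. The BiSheng JDK mechanism that triggers a resize is not shown, because this document does not describe it.

```java
/** Hypothetical probe that reports the JVM's maximum heap size. */
final class HeapLimitProbe {
    /** Maximum amount of memory the JVM will attempt to use, in MiB. */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    public static void main(String[] args) throws InterruptedException {
        long last = -1;
        // Poll a few times and print the ceiling whenever it differs from the last reading.
        for (int i = 0; i < 3; i++) {
            long now = maxHeapMiB();
            if (now != last) {
                System.out.println("max heap: " + now + " MiB");
                last = now;
            }
            Thread.sleep(500);
        }
    }
}
```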

UDF Automatic Native Framework

The UDF automatic native framework addresses the inefficient JVM execution often seen in open-source big data systems. It automatically converts Java user-defined functions (UDFs) into C/C++ native UDFs, significantly boosting big data processing performance through efficient memory management and hardware affinity. Essentially, the framework implements a seamless, automatic Java UDF native acceleration mechanism. It comprises the UDF parser, UDF IR optimizer, UDF code generator, and UDF code compiler modules.

The UDF parser automatically converts the bytecode of a service JAR package into Intermediate Representation (IR) code and extracts UDF code by identifying its specific features. The UDF IR optimizer optimizes the UDF IR from dimensions such as automatic memory object management and hardware-affinity acceleration. The UDF code generator translates the optimized UDF IR into native code. The UDF code compiler compiles the UDF native code into native binaries online. Finally, these UDF native binaries are deployed to big data execution nodes. The native execution engine of the big data system dynamically loads and executes the native binaries. This improves the big data processing performance.

De-optimization Observability in BiSheng JDK 17

The JDK Flight Recorder (JFR) Streaming API of JDK 17 is a key feature that enables JFR to move from post-event static analysis to real-time monitoring.

In conventional JFR usage, a user must start recording, stop recording, dump contents into a .jfr file, and finally use Java Mission Control (JMC) tools for offline analysis. This is a post-event analysis mode, which is effective for troubleshooting problems that have occurred.

The Streaming API introduces a new mode, which allows a Java application to continuously subscribe to and consume JFR event streams from the JVM in real time without interrupting JFR recording or generating a complete .jfr file.

Java

import java.time.Duration;
import jdk.jfr.consumer.RecordingStream;

// 1. Create a recording stream.
RecordingStream rs = new RecordingStream();
// 2. Enable events of interest and configure related settings.
rs.enable("jdk.GCPhasePause").withPeriod(Duration.ofSeconds(1));
rs.enable("jdk.Deoptimization").withPeriod(Duration.ofSeconds(1));
// 3. Subscribe to an event and set the corresponding event handler (callback function).
rs.onEvent("jdk.GCPhasePause", event -> {
    // Read the following fields from the event.
    Duration duration = event.getDuration("duration");
    String name = event.getString("name"); // For example, "GC Pause".
});
// 4. Start the stream (non-blocking call).
rs.startAsync();

Shell

jcmd JFR.start delay=-1 filename=xxx.jfr

CFGO Enhancements in GCC for openEuler

The continuous growth in code volume has made front-end bound execution a common issue in processors, which impacts program performance. Feedback-directed optimization techniques in compilers can effectively solve this issue. Continuous Feature Guided Optimization (CFGO) in GCC for openEuler refers to continuous feedback-directed optimization for multimodal files (source code and binaries) and the full lifecycle (compilation, linking, post-linking, runtime, OS, and libraries). The following techniques are included:

Code layout optimization: Techniques such as basic block reordering, function rearrangement, and hot/cold separation are used to optimize the binary layout of the target program, improving I-cache and I-TLB hit rates.
Advanced compiler optimization: Profile feedback guides optimizations such as inlining, loop unrolling, vectorization, and indirect call promotion, enabling the compiler to make more accurate optimization decisions.
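
To make the idea of hot/cold separation concrete, the toy model below orders functions by profiled call count and splits them into a hot group and a cold group. It is only a sketch of the concept, not how GCC for openEuler implements layout; the LayoutSketch class, the function names, and the threshold are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Toy model of profile-driven function layout; all names are illustrative.
class LayoutSketch {
    // Hot functions are placed together first; cold ones are moved out of the way.
    record Layout(List<String> hot, List<String> cold) {}

    static Layout split(Map<String, Long> callCounts, long hotThreshold) {
        List<Map.Entry<String, Long>> entries = new ArrayList<>(callCounts.entrySet());
        // Hottest first; ties are broken by name so the result is deterministic.
        entries.sort((a, b) -> {
            int byCount = Long.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        List<String> hot = new ArrayList<>();
        List<String> cold = new ArrayList<>();
        for (Map.Entry<String, Long> e : entries) {
            (e.getValue() >= hotThreshold ? hot : cold).add(e.getKey());
        }
        return new Layout(hot, cold);
    }
}
```

Grouping frequently executed code this way is what lets fewer cache lines and pages cover the hot path, which is the effect the I-cache and I-TLB hit rate improvements rely on.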

CFGO comprises CFGO-PGO, CFGO-CSPGO, and CFGO-BOLT. Enabling these sub-features in sequence helps mitigate front-end bound execution and improve program runtime performance. To further enhance the optimization, add the -flto=auto compilation option during the CFGO-PGO and CFGO-CSPGO processes.

CFGO-PGO: Unlike conventional profile-guided optimization (PGO), CFGO-PGO uses AI for Compiler (AI4C) to enhance certain optimizations, including inlining, constant propagation, and devirtualization, to further improve performance.
CFGO-CSPGO: The profile in conventional PGO is context-insensitive, which may result in suboptimal optimization. CFGO-CSPGO adds an instrumentation phase after PGO to collect runtime information from the inlined program. This provides more accurate execution data for compiler optimizations such as code layout and register optimizations, leading to enhanced performance.
CFGO-BOLT: CFGO-BOLT adds optimizations such as software instrumentation for the AArch64 architecture and inlining optimization on top of the baseline version, driving further performance gains.

DevStation Enhancements

DevStation is an intelligent developer workstation built on openEuler, designed for geeks and innovators. It provides an out-of-the-box, efficient, and secure development environment that streamlines the entire workflow from deployment and coding to compilation, building, and releasing. By integrating a one-click runtime environment with a full-stack development toolchain, it enables a seamless transition from system boot to code execution. The new MCP AI engine allows for quick invocation of community toolchains, offering a significant leap in efficiency from infrastructure setup to application development, all without complex installation.

Developer-friendly integrated environment: Pre-installed with a wide range of development tools and IDEs like VSCodium, this distribution supports multiple programming languages to meet the needs of front-end, back-end, and full-stack developers.
Native community tool ecosystem: New tools like oeDeploy (a one-click deployment tool), epkg (an extended package manager), DevKit (a development toolchain), and openEuler Intelligence provide full-lifecycle support from environment configuration to code deployment. oeDevPlugin and oeGitExt are VSCodium plugins designed for the openEuler community. They provide visual management for issues and pull requests (PRs), facilitating code repository cloning, PR submission, and real-time task status synchronization. openEuler Intelligence supports natural language for generating code snippets, creating API references, and explaining Linux commands.
GUI-based programming environment: DevStation integrates graphical programming tools to streamline coding for beginners while offering powerful visual programming capabilities for veterans. It also comes pre-installed with productivity tools like Thunderbird.
MCP-based intelligent application ecosystem: DevStation deeply integrates the MCP framework to build a complete intelligent toolchain ecosystem. It includes pre-installed MCP servers like oeGitExt and rpm-builder, which provide capabilities for community task management and RPM packaging. It wraps conventional development tools like Git and RPM builders using MCP, offering a natural language interaction interface.
Enhanced system deployment and compatibility: DevStation offers extensive hardware support, especially seamless compatibility with mainstream laptop and PC hardware (such as touchpads, Wi-Fi, and Bluetooth), and a restructured kernel build script (kernel-extra-modules) to ensure a smooth bare metal deployment experience. It also supports flexible deployment options, including Live CD (instant run without installation), bare metal installation, and VM deployment.
New installation tool: heolleo is a modern client tool designed specifically to simplify the DevStation installation process. Built with a modular design, it easily supports feature expansion across various hardware architectures (like x86 and Arm), file systems, and bootloaders (like GRUB). It offers flexible installation modes, supporting system file acquisition from both local ISO images and network sources (HTTP/FTP).
- Local ISO installation: heolleo provides a local ISO installation mode for users who need high stability and speed, or who deploy in offline or restricted environments. By leveraging existing system image files, it delivers a fast, reliable, and completely offline installation experience with automated partition setup.
- Network installation: heolleo's network installation mode aligns with modern system deployment trends. It eliminates the need for manual image downloads by allowing users to obtain the latest system files directly from Internet servers, which is the most convenient way to get the latest DevStation version.

DevStore

DevStore is the application store for the openEuler desktop version, acting as a developer-centric software distribution platform. It supports the search and rapid deployment of MCP servers and oeDeploy plugins. DevStore is provided out-of-the-box on the DevStation platform.

Rapid installation of MCP servers: Leveraging the openEuler community's extensive software ecosystem, DevStore packages the software dependencies required for MCP server operations as standard RPM files. Using built-in service management tools, DevStore quickly deploys MCP servers in agent applications. It automatically resolves software dependency and MCP configuration issues for users. Currently, DevStore supports the deployment of over 80 MCP servers.
Quick plugin deployment: DevStore utilizes the oeDeploy tool to enable the rapid deployment of mainstream software, substantially reducing the setup time. The supported software categories include AI software (like Kubernetes, KubeRay, PyTorch, TensorFlow, and DeepSeek), toolchains (like EulerMaker and openEuler Intelligence), and RAG tools (like RAGFlow, Dify, and AnythingLLM).

CCA

Arm Confidential Compute Architecture (CCA) uses the following core components, working in synergy, to construct a realm, which is an isolated and protected execution space. The realm is completely segregated from the normal world in terms of code execution and data access.

Realm: A realm is the core abstraction of CCA. It is a new execution environment that runs parallel to the normal world and secure world (TrustZone). It is hardware-isolated and designed to host sensitive code and data. It is independent of the host OS and hypervisor, which can manage a realm but cannot access the content within.
Dynamic management: The hypervisor can dynamically create realms and allocate memory and CPU resources to them as required. However, once a realm is initialized, the hypervisor hands over control to the realm management monitor (RMM), a protected secure virtualization module, and can no longer access the data within the realm.
Memory management: CCA extends the system memory management unit (MMU) to identify and isolate realm memory. Any access attempt from outside the realm (including from the hypervisor) is blocked by the hardware, ensuring data confidentiality.
Remote attestation: Each CCA-enabled processor has a unique, hardware-based identity. When a realm starts, it can generate an attestation token that is cryptographically signed by the hardware. Users can obtain this token, verify its signature, and check the component measurements to ensure that their workloads are running in a genuine, unaltered Arm CCA environment.
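
The "verify its signature" step in remote attestation can be illustrated with standard Java cryptography. This is a sketch under loud assumptions: a software-generated EC key pair stands in for the hardware identity, the token is an arbitrary byte string, and the class name TokenSignatureSketch is hypothetical. A real CCA token's format and key provisioning are defined by the hardware platform and are not modeled here.

```java
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.Signature;
import java.security.spec.ECGenParameterSpec;

// Hypothetical stand-in: a software key plays the role of the hardware identity.
class TokenSignatureSketch {
    static KeyPair newKeyPair() {
        try {
            KeyPairGenerator generator = KeyPairGenerator.getInstance("EC");
            generator.initialize(new ECGenParameterSpec("secp256r1"));
            return generator.generateKeyPair();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // What the attesting side does: sign the token bytes.
    static byte[] sign(PrivateKey key, byte[] token) {
        try {
            Signature signer = Signature.getInstance("SHA256withECDSA");
            signer.initSign(key);
            signer.update(token);
            return signer.sign();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // What the relying party does: check the signature before trusting any measurement.
    static boolean verify(PublicKey key, byte[] token, byte[] signature) {
        try {
            Signature verifier = Signature.getInstance("SHA256withECDSA");
            verifier.initVerify(key);
            verifier.update(token);
            return verifier.verify(signature);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Only after the signature check succeeds does it make sense to compare the component measurements in the token against expected values.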

secGear Enhancements

The new architecture for virtCCA cVMs introduces a platform token. When the attestation agent requests data from a cVM, the resulting virtCCA token now includes a platform token. The attestation service has been updated to accommodate this change:

The root and intermediate certificate algorithms have been upgraded from RSA to ECCP521.
The remote attestation process is enhanced with the addition of platform token signature verification and cVM token public key validation, completing the attestation chain.
The system now allows using policies to verify the version and hash of the virtCCA platform software components.
The attestation report now outputs the verification result of the platform software components based on policies.
The report also explicitly outputs whether the current virtCCA supports the platform token.

secGear retains the support for the legacy virtCCA remote attestation process. This is determined by whether the data sent from the attestation agent to the attestation service includes the platform token. If the token is absent, the system uses the legacy virtCCA remote attestation (certificate) process. In this case, the attestation policy cannot verify the related components, and the report will output vcca.is_platform as False, indicating that the current virtCCA implementation does not support the platform token.
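
The fallback rule described above can be summarized in a few lines. This is a minimal sketch, not secGear code: only the vcca.is_platform report key comes from the text, while the Evidence record, the flow labels, and the component_policy_checked key are hypothetical names used for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Illustrative only; not the secGear implementation.
class VirtCcaFlowSketch {
    // Hypothetical evidence shape: the cVM token is always present, the platform token is optional.
    record Evidence(byte[] cvmToken, Optional<byte[]> platformToken) {}

    static Map<String, Object> report(Evidence evidence) {
        boolean hasPlatformToken = evidence.platformToken().isPresent();
        Map<String, Object> report = new LinkedHashMap<>();
        // Key taken from the text: False means the legacy (certificate) process was used.
        report.put("vcca.is_platform", hasPlatformToken);
        // Hypothetical keys: which process ran, and whether component policies could be applied.
        report.put("flow", hasPlatformToken ? "platform-token" : "legacy-certificate");
        report.put("component_policy_checked", hasPlatformToken);
        return report;
    }
}
```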

virtCCA Enhancements

The current virtCCA architecture has this constraint: it only supports a boot mode where the kernel and rootfs are mounted separately. However, in mainstream cloud platforms, VMs typically rely on a GRUB bootloader. This requires integrating the Unified Extensible Firmware Interface (UEFI) firmware (like EDK2), kernel, and initial RAM file system (initramfs) into a single image, such as a QCOW2 file. The enhanced virtCCA addresses this by providing the following functions:

Single image encapsulation
- Unified boot stack: virtCCA integrates the EDK2 firmware, GRUB bootloader, kernel, and initramfs into a QCOW2 image, creating a complete boot stack.
- Streamlined boot process: GRUB uses a configuration file (grub.cfg) to locate the kernel path, which requires the kernel and initramfs to reside on the same file system, for example, ext4 or XFS.
Secure boot chain
- Secure boot: EDK2 verifies the digital signatures of GRUB and the kernel, ensuring that the boot components have not been tampered with.
- Hardware resource collaboration: virtCCA leverages UEFI runtime services to enumerate hardware devices, providing a virtualized resource pool for VM monitors like KVM.
Cloud native optimization
- Snapshot and expansion: virtCCA supports snapshot cloning and dynamic rootfs expansion (depending on cloud-init in initramfs).

oeAware Enhancements

Every oeAware plugin is a dynamic library that uses oeAware interfaces. Each plugin comprises multiple instances, each of which contains several topics and delivers collection or sensing results to other plugins or external applications for tuning and analysis purposes.

The SDK enables subscription to plugin topics, with a callback function handling data from oeAware. This allows external applications to create tailored functionalities, such as cross-node information collection or local node analysis.
The performance monitoring unit (PMU) information collection plugin gathers performance records from the system PMU.
The Docker information collection plugin retrieves specific parameter details about Docker containers in the environment.
The system information collection plugin captures kernel parameters, thread details, and resource information (CPUs, memory, I/Os, network) from the current environment, as well as the TCP network affinity between local processes.
The thread sensing plugin monitors key information about threads.
The evaluation plugin examines NUMA and network information during service operations, suggesting optimal tuning methods.
The system tuning plugins comprise the following:
- stealtask: enhances CPU tuning.
- smc_tune (SMC-D): leverages shared memory communication in the kernel space to boost network throughput and reduce latency.
- xcall_tune: offers code paths that bypass non-critical processes to minimize system call processing overhead.
- realtime_tune: enables deep isolation and automatic real-time performance optimization.
- net_hard_irq_tune: dynamically modifies network interrupt affinity to improve network service performance.
- NUMA-aware scheduling: enhances NUMA scheduling performance.
The Docker tuning plugins comprise the following:
- cpu_burst: addresses CPU performance issues during sudden load spikes.
- Traffic-based container scheduling: adjusts the CPU affinity of container services based on service loads to improve QoS.
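
The topic subscription model described for the SDK (subscribe to a plugin topic, receive data through a callback) follows a familiar publish/subscribe shape. The sketch below shows only that shape. It is not the oeAware SDK, and the TopicBusSketch class and the topic names used with it are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Illustrative publish/subscribe registry; not the oeAware SDK API.
class TopicBusSketch {
    private final Map<String, List<Consumer<String>>> subscribers = new HashMap<>();

    // An external application registers a callback for one topic.
    void subscribe(String topic, Consumer<String> callback) {
        subscribers.computeIfAbsent(topic, t -> new ArrayList<>()).add(callback);
    }

    // A collection plugin delivers a result; returns how many callbacks were invoked.
    int publish(String topic, String data) {
        List<Consumer<String>> callbacks = subscribers.getOrDefault(topic, List.of());
        callbacks.forEach(callback -> callback.accept(data));
        return callbacks.size();
    }
}
```

Keying callbacks by topic means a subscriber only receives the data it asked for, which is what allows separate applications to build cross-node collection or local analysis on the same plugins.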

Constraints

smc_tune: SMC acceleration must be enabled before the server-client connection is established. This plugin is most effective in scenarios with numerous persistent connections.
Docker tuning: This plugin is not compatible with Kubernetes containers.
xcall_tune: The FAST_SYSCALL kernel configuration option must be activated.
realtime_tune: This plugin must be used together with the Preempt-RT kernel.
net_hard_irq_tune: This plugin applies only to network communication over TCP.

vKAE Passthrough Live Migration

The Kunpeng Accelerator Engine (KAE) is a hardware acceleration solution based on the new Kunpeng 920 processor model, featuring HPRE, SEC, and ZIP components for encryption, decryption, compression, and decompression. This enables KAE to significantly reduce processor overhead and boost efficiency. KAE passthrough live migration addresses the scenario where VMs are configured with KAE passthrough devices, offering enhanced operational flexibility and continuous service availability.

SMMU dirty page tracking is a key technology for efficient and reliable live migration of passthrough devices. In the Arm architecture, a purely software-based approach to dirty page tracking incurs significant performance overhead. Hardware Translation Table Update (HTTU) solves this by allowing the hardware to automatically update the SMMU page table status. During a write operation, the write permission bit of the corresponding page table entry is automatically set. During a live migration, the write permission bit of the page table is scanned to collect statistics on dirty pages.
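
The scan step can be pictured with a simplified model: hardware marks an entry on write, and the migration loop periodically collects the marked pages and clears the marks for the next round. The sketch below assumes a flat array of entries and a single hypothetical bit position (WRITE_BIT). Real SMMU page tables are multi-level and their bit layout is defined by the architecture.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of dirty page collection; the bit position is hypothetical.
class DirtyScanSketch {
    static final long WRITE_BIT = 1L << 7;

    // Stands in for the hardware updating the entry when a device writes to the page.
    static void hardwareWrite(long[] pageTable, int page) {
        pageTable[page] |= WRITE_BIT;
    }

    // One migration round: collect the pages written since the last scan, then clear the marks.
    static List<Integer> collectDirty(long[] pageTable) {
        List<Integer> dirty = new ArrayList<>();
        for (int page = 0; page < pageTable.length; page++) {
            if ((pageTable[page] & WRITE_BIT) != 0) {
                dirty.add(page);
                pageTable[page] &= ~WRITE_BIT;
            }
        }
        return dirty;
    }
}
```

Because the hardware sets the mark, the software side pays only for the scan rather than for intercepting every write.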

Global Trust Authority for Remote Attestation

The Global Trust Authority (GTA) remote attestation component adopts a client-server architecture, supporting remote attestation of TPM/vTPM, virtCCA, and IMA.

The server provides the remote attestation service framework, which is compatible with trusted computing and confidential computing. It supports the addition, deletion, modification, and query of certificates and policies, quote verification, random number generation, and JWT token generation.
The client collects local TPM evidence and interacts with the server to verify quotes.

This component provides a range of security and usability capabilities. Its security features include database integrity protection, data link encryption, anti-replay, SQL injection prevention, user isolation, and key rotation. Both the passport and background-check attestation models are supported. The client supports multiple verification modes, such as scheduled reporting and challenge response. Both the client and server can be deployed using RPM packages or within Docker containers.
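
Challenge response and anti-replay are commonly built on single-use nonces: the server issues a random value, the client binds it into its evidence, and the server accepts each value at most once. The sketch below shows that general technique only. It is not GTA code, the NonceSketch class is a hypothetical name, and the class is not thread-safe.

```java
import java.security.SecureRandom;
import java.util.Base64;
import java.util.HashSet;
import java.util.Set;

// Generic single-use nonce tracking; illustrative, not the GTA implementation.
class NonceSketch {
    private final SecureRandom random = new SecureRandom();
    private final Set<String> outstanding = new HashSet<>();

    // Server side: issue a fresh random challenge and remember it.
    String issueChallenge() {
        byte[] bytes = new byte[32];
        random.nextBytes(bytes);
        String nonce = Base64.getEncoder().encodeToString(bytes);
        outstanding.add(nonce);
        return nonce;
    }

    // Server side: accept a nonce once; replayed or unknown values are rejected.
    boolean accept(String nonceInEvidence) {
        return outstanding.remove(nonceInEvidence);
    }
}
```

A production service would also expire unanswered challenges so that the outstanding set cannot grow without bound.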

Kuasar Confidential Container

Kuasar has expanded its capabilities to include confidential container support while maintaining its existing secure container functionality. This support can be enabled through iSulad runtime configuration. The current Kuasar confidential container implementation leverages the iSulad+Kuasar solution to accelerate boot times and reduce memory overhead. First, the Sandbox API eliminates the need to create a separate pause container during container creation, saving the time spent on preparing the pause container image snapshot. Second, the 1:N management model allows the Sandboxer process to be persistent. This avoids the cold-start time of the Shim process, accelerating container boot and bringing memory benefits proportional to the number of pods. Finally, Kuasar is implemented in Rust, which provides memory safety without a garbage collector and contributes to overall memory efficiency compared with Go.

Key functions include:

Native integration with the iSulad container engine preserves Kubernetes ecosystem compatibility.
Hardware-level protection via Kunpeng virtCCA technology ensures confidential workloads are deployed in trusted environments.
The secGear remote attestation framework, which complies with the Remote ATtestation procedureS (RATS) architecture (RFC 9334), allows containers running in a confidential computing environment to prove their trustworthiness to external trusted services.
Container images can be pulled and decrypted in confidential containers to protect their confidentiality and integrity.

Native .NET Development Capabilities

Mono is a complete, cross-platform development tool set and runtime environment compatible with Microsoft's .NET Framework, allowing developers to use the C# language and the .NET Framework class libraries. .NET is an open-source, cross-platform developer platform for building various applications. It provides a unified ecosystem encompassing programming languages, a runtime environment, a vast code library, and a rich set of development tools. openEuler offers native development capabilities for .NET Framework, Mono, and .NET applications.

openEuler supports MonoDevelop and its dependencies: MonoDevelop is a powerful, open-source Integrated Development Environment (IDE) tailored for .NET developers. By configuring the Mono runtime, it supports the development, debugging, code management, compilation, building, and integrated testing of .NET Framework applications on openEuler.
MonoDevelop supports quick deployment: Developers can easily deploy and uninstall MonoDevelop on the oeDeploy platform with a few clicks.
openEuler supports the .NET SDK and its dependencies: The .NET SDK and its dependencies have been adapted and introduced to openEuler, currently supporting up to .NET 9. This enables the development of modern .NET applications directly on openEuler.
.NET SDK supports quick deployment: Developers can deploy and uninstall the .NET SDK on the oeDeploy platform with a few clicks.

Raspberry Pi

As typical open-source hardware products, Raspberry Pi 4B and Raspberry Pi 5 support multiple Linux distributions such as Raspberry Pi OS, Ubuntu, and openEuler. They offer extensive peripherals, powerful video encoding and decoding capabilities, and LAN on motherboard (LOM), and can be used as independent computer systems.

Raspberry Pi 4B and Raspberry Pi 5 are widely used in many fields:

Education: learning programming languages such as Python, and electronic experiments using the peripheral interfaces
Multimedia and entertainment: media center or game console
IoT and smart home: sensor nodes or smart home hubs for environment monitoring, automation control, and edge computing
Server and network applications: home servers, lightweight web services, and containerized applications
DIY projects: robot control, 3D printing management, and drone flight control
Research and development: AI experiments and prototype validation in embedded development
Industrial automation: device monitoring, human-machine interfaces (HMIs), and machine vision
