
      Imperceptible Virtualization Management Plane Offload

      1 Imperceptible DPU Offload for libvirtd

      1.1 Introduction

      libvirtd is the virtualization management plane. DPU offload for libvirtd means running libvirtd on a separate machine (the DPU) rather than on the machine where the VMs run (the host).

      qtfs is used to mount the host directories related to running VMs onto the DPU so that libvirtd can access them and prepare the environment required for running KVM. A dedicated rootfs (/another_rootfs) is created to mount the remote /proc and /sys directories required for running libvirtd.

      In addition, rexec is used to start and delete VMs, allowing libvirtd and the VMs it manages to run on different machines for remote VM control.

      1.2 Components

      1.2.1 rexec

      rexec is a C-based tool that allows you to remotely execute commands on a peer server. rexec consists of the rexec client and rexec server. The server runs as a daemon. The client binary connects to the server through a Unix domain socket (UDS) using the udsproxyd service. The server daemon then starts the specified program on the server. During libvirt virtualization offload, libvirtd is offloaded to the DPU. When libvirtd needs to start a QEMU process on the host, it calls the rexec client to remotely start the process.

      2 Environment Requirements

      Physical machine OS: openEuler 22.03 LTS or later

      libvirt version: 6.9.0

      QEMU version: 6.2.0

      Prepare the files:

      1. On the DPU and host, download 0001-libvirt_6.9.0_1201_offload.patch, 0003-fix-get-affinity.patch, and 0004-qmp-port-manage.patch.

      2. On the DPU and host, clone the dpu-utilities repository and build it in the qtfs/rexec directory:

        git clone https://gitee.com/openeuler/dpu-utilities.git
        cd dpu-utilities/qtfs/rexec/
        make
        yes | cp ./rexec* /usr/bin/
        
      3. Download the libvirt-6.9.0.tar.xz software package and the qtfs.ko kernel module file to the DPU.

      4. Install QEMU on the host:

        yum install qemu
        

      3 Operation Guide

      Note:

      1. rexec_server must be started on both the host and DPU. You can specify [DPU_IP_address]:[DPU_rexec_port_number] to remotely operate a binary file on the DPU from the host, and vice versa.
      2. When creating a VM, the DPU uses rexec_server on the host to start qemu-kvm.

      3.1 Starting rexec_server

      3.1.1 Copying Binary Files

      Copy rexec_server to the DPU and host.

      cp rexec_server /usr/bin/
      chmod +x rexec_server
      

      3.1.2 Configuring the rexec_server Service

      rexec_server can be configured as a systemd service for convenience.

      1. Create a rexec.service file in the /usr/lib/systemd/system/ directory on the DPU and host. The file content is as follows. Change the port number as required:

        [Unit]
        Description=Rexec_server Service
        After=network.target
        
        [Service]
        Type=simple
        Environment=CMD_NET_ADDR=tcp://0.0.0.0:<port_number>
        ExecStart=/usr/bin/rexec_server
        ExecReload=/bin/kill -s HUP $MAINPID
        KillMode=process
        
        [Install]
        WantedBy=multi-user.target
        
      2. Reload systemd and start the rexec_server service by running the following commands:

      systemctl daemon-reload
      systemctl enable --now rexec
      

      3.1.3 rexec Usage Example

      Once the rexec_server service is configured, you can invoke binary files on the host from the DPU. To do this, copy the rexec binary file to /usr/bin and then run the following command:

      CMD_NET_ADDR=tcp://<host_ip>:<host_rexec_server_port> rexec [command_to_be_executed]
      

      For example, to run ls on the host (assuming that the host IP address is 192.168.1.1 and the rexec_server port number is 6666) from the DPU, run the following command:

      CMD_NET_ADDR=tcp://192.168.1.1:6666 rexec /usr/bin/ls
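If you call rexec frequently, the environment-variable prefix can be wrapped in a small helper function. A minimal sketch (the `rrun` name and the REXEC_HOST/REXEC_PORT variables are illustrative, not part of the tool):

```shell
# Hypothetical helper: forwards a command to the peer rexec_server.
# REXEC_HOST and REXEC_PORT must be set by the caller.
rrun() {
    CMD_NET_ADDR="tcp://${REXEC_HOST:?}:${REXEC_PORT:?}" rexec "$@"
}

# Example (equivalent to the command above):
#   REXEC_HOST=192.168.1.1 REXEC_PORT=6666 rrun /usr/bin/ls
```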
      

      Note:

      If you do not want to start rexec_server as a systemd service, run the following command to manually start rexec_server:

      CMD_NET_ADDR=tcp://0.0.0.0:<port_number> rexec_server

      3.2 Preparing the Rootfs for Running libvirtd

      Note: Perform this step only on the DPU.

      Assume that the rootfs is /another_rootfs (you can change the name as required). Prepare the rootfs by following the instructions in 3.2.1 or 3.2.2 (the latter is recommended). After the rootfs is prepared, you can install software packages in /another_rootfs by referring to 3.5.1.

      3.2.1 Copying the Root Directory

      In most cases, the simplest approach is to copy the required directories of the current root file system to /another_rootfs.

      Run the following commands to perform the copy operations:

      mkdir /another_rootfs
      cp -r /usr /another_rootfs
      cp -r /sbin /another_rootfs
      cp -r /bin /another_rootfs
      cp -r /lib64 /another_rootfs
      cp -r /lib /another_rootfs
      mkdir /another_rootfs/boot
      mkdir /another_rootfs/dev
      mkdir /another_rootfs/etc
      mkdir /another_rootfs/home
      mkdir /another_rootfs/mnt
      mkdir /another_rootfs/opt
      mkdir /another_rootfs/proc
      mkdir /another_rootfs/root
      mkdir /another_rootfs/run
      mkdir /another_rootfs/var
      mkdir /another_rootfs/sys
      mkdir /another_rootfs/local_proc
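The per-directory mkdir commands above can be condensed into a loop. A sketch (the function name is illustrative; the cp commands for /usr, /sbin, /bin, /lib, and /lib64 remain as shown above):

```shell
# Create the empty directory skeleton of the new rootfs.
make_rootfs_skeleton() {  # $1 = target rootfs path
    for d in boot dev etc home mnt opt proc root run var sys local_proc; do
        mkdir -p "$1/$d"
    done
}

# make_rootfs_skeleton /another_rootfs
```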
      

      3.2.2 Using the Official QCOW2 Image of openEuler

      If the current root directory is not clean enough to copy, you can instead use a QCOW2 image provided by the openEuler community to prepare a fresh rootfs.

      3.2.2.1 Installing Tools

      Use yum to install xz, kpartx, and qemu-img.

      yum install xz kpartx qemu-img
      
      3.2.2.2 Downloading the QCOW2 Image

      Download the openEuler 22.03 LTS image for x86 VMs or the openEuler 22.03 LTS image for ARM64 VMs from the openEuler website.

      3.2.2.3 Decompressing the QCOW2 Image

      Decompress the downloaded package using the xz -d command to obtain an openEuler-22.03-LTS-<arch>.qcow2 file. The following uses the x86 image as an example.

      xz -d openEuler-22.03-LTS-x86_64.qcow2.xz
      
      3.2.2.4 Mounting the QCOW2 Image and Copying Files
      1. Run the modprobe nbd max_part=<any_number> command to load the nbd module.
      2. Run qemu-nbd -c /dev/nbd0 <VM_image_path> to attach the image.
      3. Create a mount point, for example /random_dir.
      4. Run mount /dev/nbd0p2 /random_dir to mount the root partition.
      5. Copy files.
      mkdir /another_rootfs
      cp -r /random_dir/* /another_rootfs/
      

      The contents of the VM image have now been copied to /another_rootfs.

      3.2.2.5 Unmounting the QCOW2 Image

      After the rootfs is prepared, run the following commands to unmount the QCOW2 image:

      umount /random_dir
      qemu-nbd -d /dev/nbd0
      

      3.3 Starting qtfs_server on the Host

      Create the folder required by the virtualization management plane, insert qtfs_server.ko, and start the engine process.

      You can run the following script to perform these operations. If an error occurs during the execution, you may need to convert the format of the script using dos2unix (the same applies to all the following scripts). In the last two lines, replace the paths of qtfs_server.ko and engine with the actual paths.

      #!/bin/bash
      mkdir /var/lib/libvirt
      
      insmod <ko_path>/qtfs_server.ko qtfs_server_ip=0.0.0.0 qtfs_log_level=INFO # Replace with the actual path.
      <engine_path>/engine 4096 16 # Replace with the actual path.
      

      3.4 Deploying the udsproxyd Service

      3.4.1 Introduction

      udsproxyd is a cross-host UDS proxy service that must be deployed on both the host and the DPU. The udsproxyd components on the host and DPU are peers. Together they provide seamless UDS communication between the host and DPU: if two processes can communicate through UDS on the same host, they can communicate the same way when split between the host and DPU. The process code does not need to be modified; the client process only needs to run with the LD_PRELOAD=libudsproxy.so environment variable.

      3.4.2 Deploying udsproxyd

      Build udsproxyd in the dpu-utilities project:

      cd qtfs/ipc
      make && make install
      

      The engine service on the qtfs server already incorporates the udsproxyd feature, so you do not need to start udsproxyd manually when the qtfs server is deployed. However, you must start udsproxyd on the client by running the following command:

      nohup /usr/bin/udsproxyd <thread num> <addr> <port> <peer addr> <peer port> 2>&1 &
      

      Parameters:

      thread num: number of threads. Currently, only one thread is supported.
      addr: IP address of the local end.
      port: port used on the local end.
      peer addr: IP address of the udsproxyd peer.
      peer port: port used on the udsproxyd peer.
      

      Example:

      nohup /usr/bin/udsproxyd 1 192.168.10.10 12121 192.168.10.11 12121 2>&1 &

      If the qtfs engine service is not started, you can start udsproxyd on the server to test udsproxyd separately. Run the following command:

      nohup /usr/bin/udsproxyd 1 192.168.10.11 12121 192.168.10.10 12121 2>&1 &
      

      Then, copy libudsproxy.so, which the libvirtd service will load, to the /usr/lib64 directory of the new rootfs (that is, /another_rootfs/usr/lib64).

      3.5 Mounting the Dependent Directories on the Host to the DPU

      3.5.1 Installing the Software Package

      3.5.1.1 Installing to the Root Directory
      1. Install libvirt-client in /another_rootfs.
      yum install libvirt-client
      
      3.5.1.2 Configuring the another_rootfs Environment
      1. Install libvirtd in /another_rootfs.

        cd /another_rootfs
        tar -xf <path_to>/libvirt-6.9.0.tar.xz # Replace <path_to> with the actual path to the libvirt-6.9.0.tar.xz package.
        cd libvirt-6.9.0
        patch -p1 < <patch_path>/0001-libvirt_6.9.0_1201_offload.patch # Replace <patch_path> with the actual path to the libvirt patches; apply them in order.
        patch -p1 < <patch_path>/0003-fix-get-affinity.patch
        patch -p1 < <patch_path>/0004-qmp-port-manage.patch
        chroot /another_rootfs
        yum groupinstall "Development tools" -y
        yum install -y vim meson qemu qemu-img strace edk2-aarch64 tar
        
        yum install -y rpcgen python3-docutils glib2-devel gnutls-devel libxml2-devel libpciaccess-devel libtirpc-devel yajl-devel systemd-devel dmidecode glusterfs-api numactl
        
        cd /libvirt-6.9.0
        
        CFLAGS='-Wno-error=format -Wno-error=int-conversion -Wno-error=implicit-function-declaration -Wno-error=nested-externs -Wno-error=declaration-after-statement -Wno-error=unused-result -Wno-error=missing-prototypes -Wno-error=int-conversion -Wno-error=unused-parameter -Wno-error=unused-variable -Wno-error=pointer-sign -Wno-error=discarded-qualifiers -Wno-error=unused-function' meson build --prefix=/usr -Ddriver_remote=enabled -Ddriver_network=enabled -Ddriver_qemu=enabled -Dtests=disabled -Ddocs=enabled -Ddriver_libxl=disabled -Ddriver_esx=disabled -Dsecdriver_selinux=disabled -Dselinux=disabled
        
        ninja -C build install
        exit
        
      2. Copy rexec to /another_rootfs/usr/bin and grant it the execute permission.

        cp rexec /another_rootfs/usr/bin
        chmod +x /another_rootfs/usr/bin/rexec
        
      3. In /another_rootfs, run the following script to create /usr/bin/qemu-kvm and /usr/libexec/qemu-kvm. Before running the script, replace <host_ip> and <rexec_server_port> with the host IP address and rexec_server port number on the host, respectively.

        chroot /another_rootfs
        touch /usr/bin/qemu-kvm
        touch /usr/libexec/qemu-kvm
        cat > /usr/bin/qemu-kvm <<EOF
        #!/bin/bash
        host=<host_ip>
        port=<rexec_server_port>
        CMD_NET_ADDR=tcp://\$host:\$port exec /usr/bin/rexec /usr/bin/qemu-kvm \$*
        EOF
        cat > /usr/libexec/qemu-kvm <<EOF
        #!/bin/bash
        host=<host_ip>
        port=<rexec_server_port>
        CMD_NET_ADDR=tcp://\$host:\$port exec /usr/bin/rexec /usr/bin/qemu-kvm \$*
        EOF
        chmod +x /usr/libexec/qemu-kvm
        chmod +x /usr/bin/qemu-kvm
        exit
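Since the two wrapper files above are identical except for the target path, they can be produced by one helper. A sketch (the function name is illustrative; note that the escaped \$host and \$port must stay literal in the generated file, while the host IP and port are substituted at generation time):

```shell
# Generate a qemu-kvm forwarding wrapper at the given path.
gen_qemu_wrapper() {  # $1 = wrapper path, $2 = host IP, $3 = rexec_server port
    cat > "$1" <<EOF
#!/bin/bash
host=$2
port=$3
CMD_NET_ADDR=tcp://\$host:\$port exec /usr/bin/rexec /usr/bin/qemu-kvm \$*
EOF
    chmod +x "$1"
}

# gen_qemu_wrapper /usr/bin/qemu-kvm <host_ip> <rexec_server_port>
```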
        
      3.5.2 Mounting Directories

      Run the following script on the DPU to mount the host directories required by libvirtd to the DPU.

      Ensure that the remote directories that will be mounted in the following script (prepare.sh) exist on both the host and DPU.
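A minimal pre-check sketch can catch missing mount points before running the script (the function name and the example directory list are illustrative; adjust the list to match your prepare.sh):

```shell
# Report any expected mount-point directories that do not exist.
check_mountpoints() {  # $@ = directories that must exist
    missing=0
    for d in "$@"; do
        if [ ! -d "$d" ]; then
            echo "missing: $d"
            missing=1
        fi
    done
    return "$missing"
}

# check_mountpoints /another_rootfs/proc /another_rootfs/sys /home/VMs
```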

      #!/bin/bash
      insmod <qtfs.ko_path>/qtfs.ko qtfs_server_ip=<server_ip_address> qtfs_log_level=INFO # Change <qtfs.ko_path> and <server_ip_address>.
      
      systemctl stop libvirtd
       
      mkdir -p /var/run/rexec/pids
      cat >/var/run/rexec/qmpport << EOF
      <qmp_port_number>
      EOF
      cat > /var/run/rexec/hostaddr <<EOF
      <server_ip_address>
      EOF
      cat > /var/run/rexec/rexecport << EOF
      <rexec_port_number>
      EOF
      
      rm -f `find /var/run/libvirt/ -name "*.pid"`
      rm -f /var/run/libvirtd.pid
      
      if [ ! -d "/another_rootfs/local_proc" ]; then
          mkdir -p /another_rootfs/local_proc
      fi
      # The mount points for the local /proc and /sys must exist.
      mkdir -p /another_rootfs/local/proc /another_rootfs/local/sys
      mount -t proc proc /another_rootfs/local_proc/
      mount -t proc proc /another_rootfs/local/proc
      mount -t sysfs sysfs /another_rootfs/local/sys
      mount --bind /var/run/ /another_rootfs/var/run/
      mount --bind /var/lib/ /another_rootfs/var/lib/
      mount --bind /var/cache/ /another_rootfs/var/cache
      mount --bind /etc /another_rootfs/etc
      
      mkdir -p /another_rootfs/home/VMs/
      mount -t qtfs /home/VMs/ /another_rootfs/home/VMs/
      
      mount -t qtfs /var/lib/libvirt /another_rootfs/var/lib/libvirt
      
      mount -t devtmpfs devtmpfs /another_rootfs/dev/
      mount -t hugetlbfs hugetlbfs /another_rootfs/dev/hugepages/
      mount -t mqueue mqueue /another_rootfs/dev/mqueue/
      mount -t tmpfs tmpfs /another_rootfs/dev/shm
      
      mount -t sysfs sysfs /another_rootfs/sys
      mkdir -p /another_rootfs/sys/fs/cgroup
      mount -t tmpfs tmpfs /another_rootfs/sys/fs/cgroup
      list="perf_event freezer files net_cls,net_prio hugetlb pids rdma cpu,cpuacct memory devices blkio cpuset"
      for i in $list
      do
              echo $i
              mkdir -p /another_rootfs/sys/fs/cgroup/$i
              mount -t cgroup cgroup -o rw,nosuid,nodev,noexec,relatime,$i /another_rootfs/sys/fs/cgroup/$i
      done
      
      # common system dir
      mount -t qtfs -o proc /proc /another_rootfs/proc
      echo "proc"
      
      mount -t qtfs /sys /another_rootfs/sys
      echo "cgroup"
      mount -t qtfs /dev/pts /another_rootfs/dev/pts
      mount -t qtfs /dev/vfio /another_rootfs/dev/vfio
      

      3.6 Starting libvirtd

      On the DPU, open a session and change the root directory to /another_rootfs.

      chroot /another_rootfs
      

      Run the following commands to start virtlogd and libvirtd:

      #!/bin/bash
      virtlogd -d
      libvirtd -d
      

      Because /var/run/ is bound to /another_rootfs/var/run/, you can use virsh from the normal rootfs to access libvirtd and manage VMs.

      4 Environment Restoration

      To unmount the related directories and remove the qtfs module, run the following commands (replace /another_rootfs if you used a different rootfs path):

      #!/bin/bash
      
      umount /another_rootfs/dev/hugepages
      umount /another_rootfs/etc
      umount /another_rootfs/home/VMs
      umount /another_rootfs/local_proc
      umount /another_rootfs/var/lib/libvirt
      umount /another_rootfs/var/lib
      umount /another_rootfs/*
      umount /another_rootfs/dev/pts
      umount /another_rootfs/dev/mqueue
      umount /another_rootfs/dev/shm
      umount /another_rootfs/dev/vfio
      umount /another_rootfs/dev
      umount /another_rootfs/sys/fs/cgroup/*
      umount /another_rootfs/sys/fs/cgroup
      umount /another_rootfs/sys
      
      rmmod qtfs
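The fixed command list above can also be derived dynamically: listing every mount under the rootfs prefix and unmounting in reverse mount order avoids missing nested mounts. A sketch (the function name is illustrative; pass /proc/mounts in real use):

```shell
# Print the mount points under a rootfs prefix in reverse (last-mounted-first)
# order, suitable for piping to umount.
mounts_under() {  # $1 = rootfs prefix, $2 = mounts file (e.g. /proc/mounts)
    awk -v r="$1" '$2 ~ "^"r {print $2}' "$2" | tac
}

# Real usage (requires root):
#   mounts_under /another_rootfs /proc/mounts | xargs -r -n1 umount
#   rmmod qtfs
```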
      
