Imperceptible Virtualization Management Plane Offload

1 Imperceptible DPU Offload for libvirtd

1.1 Introduction

libvirtd is the virtualization management plane daemon. DPU offload for libvirtd means running libvirtd on a separate machine (the DPU) rather than on the machine where the VMs run (the host).

qtfs mounts the host directories that VMs need at run time onto the DPU so that libvirtd can access them and prepare the environment required for running KVM. A dedicated rootfs (/another_rootfs) is created on the DPU, and the remote /proc and /sys directories required for running libvirtd are mounted in it.

In addition, rexec is used to start and delete VMs, allowing libvirtd and VMs to be separated on different machines for remote VM control.

1.2 Components

1.2.1 rexec

rexec is a Go-based tool that allows you to remotely execute commands on a peer server.

1.2.2 Changes to libvirt

2 Environment Requirements

Physical machine OS: openEuler 22.03 LTS or later

libvirt version: 6.9.0

QEMU version: 6.2.0

Prepare the files:

  1. On the DPU and host, download 0001-libvirt_6.9.0_1201_offload.patch, 0003-fix-get-affinity.patch, and 0004-qmp-port-manage.patch.

  2. On the DPU and host, clone the dpu-utilities repository, build rexec in the qtfs/rexec directory, and copy the resulting binaries to /usr/bin:

    git clone https://gitee.com/openeuler/dpu-utilities.git
    cd dpu-utilities/qtfs/rexec/
    make
    yes | cp ./rexec* /usr/bin/
    
  3. Download the libvirt-6.9.0.tar.xz software package and the qtfs.ko kernel module to the DPU.

  4. Install QEMU on the host:

    yum install qemu
    

3 Operation Guide

Note:

  1. rexec_server must be started on both the host and DPU. By specifying [DPU_IP_address]:[DPU_rexec_port_number], you can run a binary on the DPU from the host, and vice versa.
  2. When creating a VM, the DPU uses rexec_server on the host to start qemu-kvm.

3.1 Starting rexec_server

3.1.1 Copying Binary Files

Copy rexec_server to the DPU and host.

cp rexec_server /usr/bin/
chmod +x /usr/bin/rexec_server

3.1.2 Configuring the rexec_server Service

rexec_server can be configured as a systemd service for convenience.

  1. Create a rexec.service file in the /usr/lib/systemd/system/ directory on the DPU and host. The file content is as follows. Change the port number as required:

    [Unit]
    Description=Rexec_server Service
    After=network.target
    
    [Service]
    Type=simple
    Environment=CMD_NET_ADDR=tcp://0.0.0.0:<port_number>
    ExecStart=/usr/bin/rexec_server
    ExecReload=/bin/kill -s HUP $MAINPID
    KillMode=process
    
    [Install]
    WantedBy=multi-user.target
    
  2. Reload the systemd configuration, then enable and start the rexec service:

systemctl daemon-reload
systemctl enable --now rexec
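
After the service starts, it can be useful to confirm that something is actually listening on the configured port. The following is a minimal sketch, not part of dpu-utilities: port_listening is a hypothetical helper that reads the output of ss -ltn on standard input and assumes the usual column layout, in which the local address is the fourth column.

```shell
#!/bin/bash
# port_listening PORT
# Reads `ss -ltn` output on stdin and succeeds if a socket in the LISTEN
# state is bound to PORT. Hypothetical helper for checking rexec_server.
port_listening() {
    awk -v port="$1" '
        $1 == "LISTEN" {
            n = split($4, parts, ":")   # column 4 is <local address>:<port>
            if (parts[n] == port) found = 1
        }
        END { exit (found ? 0 : 1) }
    '
}
```

For example, with the port number 6666 used later in this guide: ss -ltn | port_listening 6666 && echo "rexec_server is listening".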

3.1.3 rexec Operation Example

Once the rexec_server service is configured, you can invoke binary files on the host from the DPU. To do this, copy the rexec binary file to /usr/bin and then run the following command:

CMD_NET_ADDR=tcp://<host_ip>:<host_rexec_server_port> rexec [command_to_be_executed]

For example, to run ls on the host (assuming that the host IP address is 192.168.1.1 and the rexec_server port number is 6666) from the DPU, run the following command:

CMD_NET_ADDR=tcp://192.168.1.1:6666 rexec /usr/bin/ls
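
If the address is assembled in scripts, a small wrapper can catch a mistyped port before rexec is called. This is only an illustrative sketch: rexec_addr is a hypothetical function name, and it does nothing more than validate its arguments and print them in the tcp://<host>:<port> form shown above.

```shell
#!/bin/bash
# rexec_addr HOST PORT
# Prints a CMD_NET_ADDR value in the tcp://<host>:<port> form.
# Hypothetical helper; it only validates and formats its arguments.
rexec_addr() {
    local host="$1" port="$2"
    if [ -z "$host" ]; then
        echo "rexec_addr: host is empty" >&2
        return 1
    fi
    case $port in
        ''|*[!0-9]*)
            echo "rexec_addr: port must be a number: '$port'" >&2
            return 1
            ;;
    esac
    if [ "$port" -lt 1 ] || [ "$port" -gt 65535 ]; then
        echo "rexec_addr: port out of range: $port" >&2
        return 1
    fi
    printf 'tcp://%s:%s\n' "$host" "$port"
}
```

With this helper, the example above becomes: CMD_NET_ADDR=$(rexec_addr 192.168.1.1 6666) rexec /usr/bin/ls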

Note:

If you do not want to start rexec_server as a systemd service, run the following command to manually start rexec_server:

CMD_NET_ADDR=tcp://0.0.0.0:<port_number> rexec_server

3.2 Preparing the Rootfs for Running libvirtd

Note: Perform this step only on the DPU.

Assume that the rootfs is /another_rootfs (you can change the name as required). Prepare the rootfs by following the instructions in 3.2.1 or 3.2.2 (the latter is recommended). After the rootfs is prepared, you can install software packages in /another_rootfs by referring to 3.4.1.

3.2.1 Copying the Root Directory

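In most cases, it is sufficient to copy the system directories (/usr, /sbin, /bin, /lib64, and /lib) from the root directory to /another_rootfs and to create the remaining top-level directories empty.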

Run the following commands to perform the copy operations:

mkdir /another_rootfs
cp -r /usr /another_rootfs
cp -r /sbin /another_rootfs
cp -r /bin /another_rootfs
cp -r /lib64 /another_rootfs
cp -r /lib /another_rootfs
mkdir /another_rootfs/boot
mkdir /another_rootfs/dev
mkdir /another_rootfs/etc
mkdir /another_rootfs/home
mkdir /another_rootfs/mnt
mkdir /another_rootfs/opt
mkdir /another_rootfs/proc
mkdir /another_rootfs/root
mkdir /another_rootfs/run
mkdir /another_rootfs/var
mkdir /another_rootfs/sys
mkdir /another_rootfs/local_proc
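
The sequence of mkdir commands can also be written as a loop. The sketch below is one possible form, with make_rootfs_dirs as a hypothetical function name; it uses mkdir -p so that rerunning it on a partly prepared rootfs does not fail.

```shell
#!/bin/bash
# make_rootfs_dirs ROOT
# Creates the empty top-level directories of the new rootfs under ROOT.
# mkdir -p succeeds when a directory already exists, so the function
# can be rerun safely.
make_rootfs_dirs() {
    local root="$1" dir
    for dir in boot dev etc home mnt opt proc root run var sys local_proc; do
        mkdir -p "$root/$dir" || return 1
    done
}
```

Usage: make_rootfs_dirs /another_rootfs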

3.2.2 Using the Official QCOW2 Image of openEuler

If the root directory is not completely clean, you can use a QCOW2 image provided by the openEuler community to prepare a new rootfs.

3.2.2.1 Installing Tools

Use Yum to install xz, kpartx, and qemu-img.

yum install xz kpartx qemu-img

3.2.2.2 Downloading the QCOW2 Image

Download the openEuler 22.03 LTS image for x86 VMs or the openEuler 22.03 LTS image for ARM64 VMs from the openEuler website.

3.2.2.3 Decompressing the QCOW2 Image

Decompress the downloaded package using the xz -d command to obtain an openEuler-22.03-LTS-<arch>.qcow2 file. The following uses the x86 image as an example.

xz -d openEuler-22.03-LTS-x86_64.qcow2.xz

3.2.2.4 Mounting the QCOW2 Image and Copying Files

  1. Run the modprobe nbd maxpart=<any_number> command to load the nbd module.
  2. Run qemu-nbd -c /dev/nbd0 <VM_image_path> to attach the image to /dev/nbd0.
  3. Create a directory to use as the mount point, for example /random_dir.
  4. Run mount /dev/nbd0p2 /random_dir.
  5. Copy the files:

    mkdir /another_rootfs
    cp -r /random_dir/* /another_rootfs/

The contents of the VM image have now been copied to /another_rootfs.
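
Note that cp -r with a /random_dir/* glob skips hidden entries at the top level of the image and does not preserve file ownership or timestamps. Whether that matters depends on the image. As an alternative sketch (copy_rootfs is a hypothetical name, and this assumes GNU cp), the copy can be done with cp -a:

```shell
#!/bin/bash
# copy_rootfs SRC DST
# Copies everything under SRC into DST, including hidden top-level entries,
# and keeps ownership, modes, timestamps, and symbolic links (cp -a).
copy_rootfs() {
    local src="$1" dst="$2"
    if [ ! -d "$src" ]; then
        echo "copy_rootfs: $src is not a directory" >&2
        return 1
    fi
    mkdir -p "$dst" && cp -a "$src"/. "$dst"/
}
```

Usage: copy_rootfs /random_dir /another_rootfs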

3.2.2.5 Unmounting the QCOW2 File

After the rootfs is prepared, run the following commands to unmount the QCOW2 file:

umount /random_dir
qemu-nbd -d /dev/nbd0

3.3 Starting qtfs_server on the Host

Create the folder required by the virtualization management plane, insert qtfs_server.ko, and start the engine process.

You can run the following script to perform these operations. If an error occurs during the execution, you may need to convert the format of the script using dos2unix (the same applies to all the following scripts). In the last two lines, replace the paths of qtfs_server.ko and engine with the actual paths.

#!/bin/bash
mkdir -p /var/lib/libvirt

insmod <ko_path>/qtfs_server.ko qtfs_server_ip=0.0.0.0 qtfs_log_level=INFO # Replace with the actual path.
<engine_path>/engine 4096 16 # Replace with the actual path.
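
To confirm that the module was inserted, the list of loaded modules can be checked. The following sketch uses a hypothetical helper, module_loaded, and assumes that the module registers under the name qtfs_server (the file name without the .ko suffix).

```shell
#!/bin/bash
# module_loaded NAME [MODULES_FILE]
# Succeeds if NAME appears in the first column of MODULES_FILE
# (default: /proc/modules), that is, if the kernel module is loaded.
module_loaded() {
    local name="$1" file="${2:-/proc/modules}"
    awk -v name="$name" '$1 == name { found = 1 } END { exit (found ? 0 : 1) }' "$file"
}
```

Usage: module_loaded qtfs_server || echo "qtfs_server.ko is not loaded" >&2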

3.4 Mounting the Dependent Directories on the Host to the DPU

3.4.1 Installing the Software Package

3.4.2.1 Installing to the Root Directory

  1. Install libvirt-client in /another_rootfs.

    yum install libvirt-client

3.4.2.2 Configuring the another_rootfs Environment

  1. Install libvirtd in /another_rootfs.

    cd /another_rootfs
    tar -xf <path_to>/libvirt-6.9.0.tar.xz # Replace <path_to> with the actual path to libvirt-6.9.0.tar.xz.
    cd libvirt-6.9.0
    # Apply the libvirt patches in sequence. Replace <path_to> with the actual path to the patches.
    patch -p1 < <path_to>/0001-libvirt_6.9.0_1201_offload.patch
    patch -p1 < <path_to>/0003-fix-get-affinity.patch
    patch -p1 < <path_to>/0004-qmp-port-manage.patch
    chroot /another_rootfs
    yum groupinstall "Development tools" -y
    yum install -y vim meson qemu qemu-img strace edk2-aarch64 tar

    yum install -y rpcgen python3-docutils glib2-devel gnutls-devel libxml2-devel libpciaccess-devel libtirpc-devel yajl-devel systemd-devel dmidecode glusterfs-api numactl

    cd /libvirt-6.9.0
    
    CFLAGS='-Wno-error=format -Wno-error=int-conversion -Wno-error=implicit-function-declaration -Wno-error=nested-externs -Wno-error=declaration-after-statement -Wno-error=unused-result -Wno-error=missing-prototypes -Wno-error=int-conversion -Wno-error=unused-parameter -Wno-error=unused-variable -Wno-error=pointer-sign -Wno-error=discarded-qualifiers -Wno-error=unused-function' meson build --prefix=/usr -Ddriver_remote=enabled -Ddriver_network=enabled -Ddriver_qemu=enabled -Dtests=disabled -Ddocs=enabled -Ddriver_libxl=disabled -Ddriver_esx=disabled -Dsecdriver_selinux=disabled -Dselinux=disabled
    
    ninja -C build install
    exit
    
  2. Copy rexec to /another_rootfs/usr/bin and grant it the execute permission.

    cp rexec /another_rootfs/usr/bin
    chmod +x /another_rootfs/usr/bin/rexec
    
  3. In /another_rootfs, run the following script to create /usr/bin/qemu-kvm and /usr/libexec/qemu-kvm. Before running the script, replace <host_ip> and <rexec_server_port> with the host IP address and rexec_server port number on the host, respectively.

    chroot /another_rootfs
    cat > /usr/bin/qemu-kvm <<EOF
    #!/bin/bash
    host=<host_ip>
    port=<rexec_server_port>
    CMD_NET_ADDR=tcp://\$host:\$port exec /usr/bin/rexec /usr/bin/qemu-kvm "\$@"
    EOF
    cat > /usr/libexec/qemu-kvm <<EOF
    #!/bin/bash
    host=<host_ip>
    port=<rexec_server_port>
    CMD_NET_ADDR=tcp://\$host:\$port exec /usr/bin/rexec /usr/bin/qemu-kvm "\$@"
    EOF
    chmod +x /usr/libexec/qemu-kvm
    chmod +x /usr/bin/qemu-kvm
    exit
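
Because the two wrapper scripts are identical, they can also be generated by a single function. The sketch below is one possible form; write_qemu_wrapper is a hypothetical name. The generated wrapper forwards its arguments with "$@" so that arguments containing spaces reach qemu-kvm on the host unchanged.

```shell
#!/bin/bash
# write_qemu_wrapper HOST PORT DEST
# Writes a wrapper script to DEST that runs /usr/bin/qemu-kvm on HOST
# through rexec, then marks the wrapper executable. Hypothetical helper.
write_qemu_wrapper() {
    local host="$1" port="$2" dest="$3"
    cat > "$dest" <<EOF
#!/bin/bash
# Forward every argument unchanged to qemu-kvm on the host.
CMD_NET_ADDR=tcp://$host:$port exec /usr/bin/rexec /usr/bin/qemu-kvm "\$@"
EOF
    chmod +x "$dest"
}
```

Inside the chroot, with the example address used earlier in this guide, the function would be called once per wrapper: write_qemu_wrapper 192.168.1.1 6666 /usr/bin/qemu-kvm, and again with /usr/libexec/qemu-kvm as the destination.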
    
3.4.2.3 Mounting Directories

Run the following script on the DPU to mount the host directories required by libvirtd to the DPU.

Ensure that the remote directories that will be mounted in the following script (prepare.sh) exist on both the host and DPU.

#!/bin/bash
insmod <qtfs.ko_path>/qtfs.ko qtfs_server_ip=<server_ip_address> qtfs_log_level=INFO # Change <qtfs.ko_path> and <server_ip_address>.

systemctl stop libvirtd
 
mkdir -p /var/run/rexec/pids
cat >/var/run/rexec/qmpport << EOF
<qmp_port_number>
EOF
cat > /var/run/rexec/hostaddr <<EOF
<server_ip_address>
EOF
cat > /var/run/rexec/rexecport << EOF
<rexec_port_number>
EOF

rm -f `find /var/run/libvirt/ -name "*.pid"`
rm -f /var/run/libvirtd.pid

# Create the mount points for the local (DPU) proc and sysfs.
mkdir -p /another_rootfs/local_proc
mkdir -p /another_rootfs/local/proc /another_rootfs/local/sys
mount -t proc proc /another_rootfs/local_proc/
mount -t proc proc /another_rootfs/local/proc
mount -t sysfs sysfs /another_rootfs/local/sys
mount --bind /var/run/ /another_rootfs/var/run/
mount --bind /var/lib/ /another_rootfs/var/lib/
mount --bind /var/cache/ /another_rootfs/var/cache
mount --bind /etc /another_rootfs/etc

mkdir -p /another_rootfs/home/VMs/
mount -t qtfs /home/VMs/ /another_rootfs/home/VMs/

mount -t qtfs /var/lib/libvirt /another_rootfs/var/lib/libvirt

mount -t devtmpfs devtmpfs /another_rootfs/dev/
mount -t hugetlbfs hugetlbfs /another_rootfs/dev/hugepages/
mount -t mqueue mqueue /another_rootfs/dev/mqueue/
mount -t tmpfs tmpfs /another_rootfs/dev/shm

mount -t sysfs sysfs /another_rootfs/sys
mkdir -p /another_rootfs/sys/fs/cgroup
mount -t tmpfs tmpfs /another_rootfs/sys/fs/cgroup
list="perf_event freezer files net_cls,net_prio hugetlb pids rdma cpu,cpuacct memory devices blkio cpuset"
for i in $list
do
        echo $i
        mkdir -p /another_rootfs/sys/fs/cgroup/$i
        mount -t cgroup cgroup -o rw,nosuid,nodev,noexec,relatime,$i /another_rootfs/sys/fs/cgroup/$i
done

# common system dir
mount -t qtfs -o proc /proc /another_rootfs/proc
echo "proc"

mount -t qtfs /sys /another_rootfs/sys
echo "cgroup"
mount -t qtfs /dev/pts /another_rootfs/dev/pts
mount -t qtfs /dev/vfio /another_rootfs/dev/vfio
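
After prepare.sh finishes, it is worth checking that every qtfs mount is in place, because a failed mount leaves libvirtd looking at local DPU directories instead of the host's. The following is a minimal sketch with a hypothetical helper, missing_mounts. It assumes that qtfs mounts are listed in /proc/mounts with the file system type qtfs.

```shell
#!/bin/bash
# missing_mounts MOUNTS_FILE FSTYPE PATH...
# Prints each PATH that is not listed in MOUNTS_FILE (/proc/mounts format:
# device, mount point, type, options) with file system type FSTYPE.
# Succeeds only if nothing is missing.
missing_mounts() {
    local file="$1" fstype="$2" path status=0
    shift 2
    for path in "$@"; do
        if ! awk -v p="$path" -v t="$fstype" \
                '$2 == p && $3 == t { found = 1 } END { exit (found ? 0 : 1) }' "$file"; then
            echo "$path"
            status=1
        fi
    done
    return "$status"
}
```

For example: missing_mounts /proc/mounts qtfs /another_rootfs/proc /another_rootfs/sys /another_rootfs/home/VMs /another_rootfs/var/lib/libvirt /another_rootfs/dev/pts /another_rootfs/dev/vfio. Paths are given without a trailing slash, as they appear in /proc/mounts.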

3.5 Starting libvirtd

On the DPU, open a session and change the root directory to /another_rootfs.

chroot /another_rootfs

Run the following commands to start virtlogd and libvirtd:

#!/bin/bash
virtlogd -d
libvirtd -d

Because /var/run/ has been bind-mounted to /another_rootfs/var/run/, you can use virsh from the ordinary rootfs to access libvirtd and manage VMs.
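
libvirtd -d returns before the daemon is necessarily ready to accept connections, so a script that calls virsh immediately afterwards may fail. A minimal sketch of a wait loop follows; wait_for_path is a hypothetical helper, and the socket path in the usage note is an assumption based on the default system socket location of libvirtd.

```shell
#!/bin/bash
# wait_for_path PATH [TIMEOUT_SECONDS]
# Polls once per second until PATH exists. Fails if PATH has not appeared
# after TIMEOUT_SECONDS (default: 10). Hypothetical helper.
wait_for_path() {
    local path="$1" timeout="${2:-10}" waited=0
    while [ ! -e "$path" ]; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

Usage, from the ordinary rootfs: wait_for_path /var/run/libvirt/libvirt-sock 10 && virsh list --all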

4 Environment Restoration

To unmount related directories, run the following commands:

#!/bin/bash

umount /another_rootfs/dev/hugepages
umount /another_rootfs/etc
umount /another_rootfs/home/VMs
umount /another_rootfs/local_proc
umount /another_rootfs/var/lib/libvirt
umount /another_rootfs/var/lib
umount /another_rootfs/*
umount /another_rootfs/dev/pts
umount /another_rootfs/dev/mqueue
umount /another_rootfs/dev/shm
umount /another_rootfs/dev/vfio
umount /another_rootfs/dev
rmmod qtfs

umount /another_rootfs/sys/fs/cgroup/*
umount /another_rootfs/sys/fs/cgroup
umount /another_rootfs/sys
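
To check whether anything is still mounted under the rootfs after the script has run, the mount table can be filtered by path. The sketch below uses a hypothetical helper, mounts_under. It prints the matching mount points in reverse order of their appearance in /proc/mounts, which is normally most recent first and therefore a reasonable order for unmounting nested mounts. Mount points containing spaces are not handled.

```shell
#!/bin/bash
# mounts_under ROOT [MOUNTS_FILE]
# Prints the mount points at or below ROOT listed in MOUNTS_FILE
# (default: /proc/mounts), last listed first.
mounts_under() {
    local root="${1%/}" file="${2:-/proc/mounts}"
    awk -v root="$root" '
        $2 == root || index($2, root "/") == 1 { points[++n] = $2 }
        END { for (i = n; i >= 1; i--) print points[i] }
    ' "$file"
}
```

Usage: mounts_under /another_rootfs. An empty result means that no mounts remain under /another_rootfs.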
