Imperceptible Virtualization Management Plane Offload
1 Imperceptible DPU Offload for libvirtd
1.1 Introduction
libvirtd is the virtualization management plane daemon. DPU offload for libvirtd means running libvirtd on a separate machine (the DPU) rather than on the machine where the VMs run (the host).
qtfs is used to mount the host directories related to running VMs onto the DPU so that libvirtd can access them and prepare the environment required for running KVM. A dedicated rootfs (/another_rootfs) is created to mount the remote /proc and /sys directories required for running libvirtd.
In addition, rexec is used to start and delete VMs, allowing libvirtd and VMs to be separated on different machines for remote VM control.
1.2 Components
1.2.1 rexec
rexec is a C-based tool for remotely executing commands on a peer server. rexec consists of the rexec client and the rexec server. The server runs as a daemon. The client binary connects to the server over a Unix domain socket (UDS) through the udsproxyd service, and the server daemon then starts the specified program on its machine. During libvirt virtualization offload, libvirtd runs on the DPU. When libvirtd needs to start a QEMU process on the host, it calls the rexec client to start the process remotely.
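The invocation flow can be sketched as a small shell wrapper. Here, run_remote and its DRY_RUN switch are hypothetical names for illustration; only the CMD_NET_ADDR variable and the rexec binary come from this guide:

```shell
#!/bin/bash
# Sketch of how a caller composes an rexec invocation.
# run_remote and DRY_RUN are illustrative; CMD_NET_ADDR and rexec are
# the pieces described in this guide.
run_remote() {
    local addr=$1 port=$2
    shift 2
    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Print the command that would be executed, instead of running it.
        echo "CMD_NET_ADDR=tcp://${addr}:${port} rexec $*"
    else
        CMD_NET_ADDR="tcp://${addr}:${port}" rexec "$@"
    fi
}

# Dry run: show how libvirtd would start qemu-kvm on the host.
DRY_RUN=1 run_remote 192.168.1.1 6666 /usr/bin/qemu-kvm -name vm1
```

The dry-run mode lets the composition be checked without a running rexec_server.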
2 Environment Requirements
Physical machine OS: openEuler 22.03 LTS or later
libvirt version: 6.9.0
QEMU version: 6.2.0
Prepare the files:
On the DPU and host, download 0001-libvirt_6.9.0_1201_offload.patch, 0003-fix-get-affinity.patch, and 0004-qmp-port-manage.patch.
On the DPU and host, clone the dpu-utilities repository and perform build in the qtfs/rexec directory:
git clone https://gitee.com/openeuler/dpu-utilities.git
cd dpu-utilities/qtfs/rexec/
make
yes | cp ./rexec* /usr/bin/
Download the libvirt-6.9.0.tar.xz software package and the qtfs.ko kernel module file to the DPU:
Install QEMU on the host:
yum install qemu
3 Operation Guide
Note:
- rexec_server must be started on both the host and DPU. You can specify [DPU_IP_address]:[DPU_rexec_port_number] to remotely operate a binary file on the DPU from the host, and vice versa.
- When creating a VM, the DPU uses rexec_server on the host to start qemu-kvm.
3.1 Starting rexec_server
3.1.1 Copying Binary Files
Copy rexec_server to the DPU and host.
cp rexec_server /usr/bin/
chmod +x rexec_server
3.1.2 Configuring the rexec_server Service
rexec_server can be configured as a systemd service for convenience.
Create a rexec.service file in the /usr/lib/systemd/system/ directory on the DPU and host. The file content is as follows. Change the port number as required:
[Unit]
Description=Rexec_server Service
After=network.target

[Service]
Type=simple
Environment=CMD_NET_ADDR=tcp://0.0.0.0:<port_number>
ExecStart=/usr/bin/rexec_server
ExecReload=/bin/kill -s HUP $MAINPID
KillMode=process

[Install]
WantedBy=multi-user.target
Run the following commands to load the unit file and start the rexec_server service:
systemctl daemon-reload
systemctl enable --now rexec
3.1.3 rexec Usage Example
Once the rexec_server service is configured, you can invoke binary files on the host from the DPU. To do this, copy the rexec binary file to /usr/bin and then run the following command:
CMD_NET_ADDR=tcp://<host_ip>:<host_rexec_server_port> rexec [command_to_be_executed]
For example, to run ls on the host (assuming that the host IP address is 192.168.1.1 and the rexec_server port number is 6666) from the DPU, run the following command:
CMD_NET_ADDR=tcp://192.168.1.1:6666 rexec /usr/bin/ls
Note:
If you do not want to start rexec_server as a systemd service, run the following command to manually start rexec_server:
CMD_NET_ADDR=tcp://0.0.0.0:<port_number> rexec_server
3.2 Preparing the Rootfs for Running libvirtd
Note: Perform this step only on the DPU.
Assume that the rootfs is /another_rootfs (you can change the name as required). Prepare the rootfs by following the instructions in 3.2.1 or 3.2.2 (the latter is recommended). After the rootfs is prepared, you can install software packages in /another_rootfs by referring to 3.5.1.
3.2.1 Copying the Root Directory
In most cases, you only need to copy the root directory to /another_rootfs.
Run the following commands to perform the copy operations:
mkdir /another_rootfs
cp -r /usr /another_rootfs
cp -r /sbin /another_rootfs
cp -r /bin /another_rootfs
cp -r /lib64 /another_rootfs
cp -r /lib /another_rootfs
mkdir /another_rootfs/boot
mkdir /another_rootfs/dev
mkdir /another_rootfs/etc
mkdir /another_rootfs/home
mkdir /another_rootfs/mnt
mkdir /another_rootfs/opt
mkdir /another_rootfs/proc
mkdir /another_rootfs/root
mkdir /another_rootfs/run
mkdir /another_rootfs/var
mkdir /another_rootfs/sys
mkdir /another_rootfs/local_proc
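The copy and mkdir steps above can be consolidated into one function. make_rootfs is a hypothetical helper name; the directory lists mirror the commands in this section:

```shell
#!/bin/bash
# Sketch consolidating the steps above. make_rootfs is illustrative:
# the first argument is the rootfs path, the remaining arguments are the
# top-level directories to copy from / (e.g. usr sbin bin lib64 lib).
make_rootfs() {
    local root=$1
    shift
    mkdir -p "$root"
    for d in "$@"; do
        cp -a "/$d" "$root/"
    done
    # Empty mount points and working directories used later by prepare.sh.
    for d in boot dev etc home mnt opt proc root run var sys local_proc; do
        mkdir -p "$root/$d"
    done
}

# Typical invocation on the DPU:
# make_rootfs /another_rootfs usr sbin bin lib64 lib
```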
3.2.2 Using the Official QCOW2 Image of openEuler
If the host root directory is not clean enough to copy directly, you can use a QCOW2 image provided by the openEuler community to prepare a fresh rootfs.
3.2.2.1 Installing Tools
Use Yum to install xz, kpartx, and qemu-img.
yum install xz kpartx qemu-img
3.2.2.2 Downloading the QCOW2 Image
Download the openEuler 22.03 LTS image for x86 VMs or the openEuler 22.03 LTS image for ARM64 VMs from the openEuler website.
3.2.2.3 Decompressing the QCOW2 Image
Decompress the downloaded package using the xz -d command to obtain an openEuler-22.03-LTS-<arch>.qcow2 file. The following uses the x86_64 image as an example.
xz -d openEuler-22.03-LTS-x86_64.qcow2.xz
3.2.2.4 Mounting the QCOW2 Image and Copying Files
- Run modprobe nbd maxpart=<any_number> to load the nbd module.
- Run qemu-nbd -c /dev/nbd0 <VM_image_path>.
- Create an arbitrary folder, for example, /random_dir.
- Run mount /dev/nbd0p2 /random_dir.
- Copy the files:
mkdir /another_rootfs
cp -r /random_dir/* /another_rootfs/
The contents of the VM image are now available in /another_rootfs.
3.2.2.5 Unmounting the QCOW2 Image
After the rootfs is prepared, run the following commands to unmount the QCOW2 image:
umount /random_dir
qemu-nbd -d /dev/nbd0
3.3 Starting qtfs_server on the Host
Create the folder required by the virtualization management plane, insert qtfs_server.ko, and start the engine process.
You can run the following script to perform these operations. If an error occurs during the execution, you may need to convert the format of the script using dos2unix (the same applies to all the following scripts). In the last two lines, replace the paths of qtfs_server.ko and engine with the actual paths.
#!/bin/bash
mkdir /var/lib/libvirt
insmod <ko_path>/qtfs_server.ko qtfs_server_ip=0.0.0.0 qtfs_log_level=INFO # Replace with the actual path.
<engine_path>/engine 4096 16 # Replace with the actual path.
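A slightly defensive variant of the script above can verify its inputs before loading the module. start_qtfs_server and the argument checks are hypothetical; the insmod and engine invocations mirror the script above:

```shell
#!/bin/bash
# Defensive sketch of the server-side setup. start_qtfs_server is an
# illustrative name; the insmod and engine lines match the script above.
start_qtfs_server() {
    local ko=$1 engine=$2 ip=${3:-0.0.0.0}
    if [ ! -f "$ko" ]; then
        echo "qtfs_server.ko not found: $ko" >&2
        return 1
    fi
    if [ ! -x "$engine" ]; then
        echo "engine binary not found or not executable: $engine" >&2
        return 1
    fi
    mkdir -p /var/lib/libvirt
    insmod "$ko" qtfs_server_ip="$ip" qtfs_log_level=INFO
    "$engine" 4096 16
}

# Typical invocation on the host:
# start_qtfs_server /path/to/qtfs_server.ko /path/to/engine 0.0.0.0
```

Failing fast on a missing module or engine path avoids a half-initialized server.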
3.4 Deploying the udsproxyd Service
3.4.1 Introduction
udsproxyd is a cross-host UDS proxy service, which needs to be deployed on both the host and DPU. The udsproxyd components on the host and DPU are peers. They provide seamless UDS communication between the host and DPU: if two processes can communicate with each other through UDS on the same host, they can do the same across the host and DPU. The process code does not need to be modified, except that the client process must run with the LD_PRELOAD=libudsproxy.so environment variable.
3.4.2 Deploying udsproxyd
Build udsproxyd in the dpu-utilities project:
cd qtfs/ipc
make && make install
The engine service on the qtfs server has incorporated the udsproxyd feature. You do not need to manually start udsproxyd if the qtfs server is deployed. However, you need to start udsproxyd on the client by running the following command:
nohup /usr/bin/udsproxyd <thread num> <addr> <port> <peer addr> <peer port> 2>&1 &
Parameters:
thread num: number of threads. Currently, only one thread is supported.
addr: IP address of the host.
port: port used on the host.
peer addr: IP address of the udsproxyd peer.
peer port: port used on the udsproxyd peer.
Example:
nohup /usr/bin/udsproxyd 1 192.168.10.10 12121 192.168.10.11 12121 2>&1 &
If the qtfs engine service is not started, you can start udsproxyd on the server to test udsproxyd separately. Run the following command:
nohup /usr/bin/udsproxyd 1 192.168.10.11 12121 192.168.10.10 12121 2>&1 &
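Because the local and peer arguments are mirror images on the two machines, both command lines can be derived from one pair of addresses. uds_cmd is a hypothetical helper name; the argument order (threads, local addr/port, peer addr/port) follows the parameter list above:

```shell
#!/bin/bash
# Sketch: build the udsproxyd command line for either end of the link.
# uds_cmd is illustrative; the argument order follows the parameter
# list above (thread num, addr, port, peer addr, peer port).
uds_cmd() {
    local local_addr=$1 local_port=$2 peer_addr=$3 peer_port=$4
    echo "/usr/bin/udsproxyd 1 $local_addr $local_port $peer_addr $peer_port"
}

# The two ends simply swap the address pairs:
uds_cmd 192.168.10.10 12121 192.168.10.11 12121   # one end
uds_cmd 192.168.10.11 12121 192.168.10.10 12121   # the other end
```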
Then, copy libudsproxy.so, which the libvirtd service will use, to the /usr/lib64 directory under the new root directory of libvirt (/another_rootfs).
3.5 Mounting the Dependent Directories on the Host to the DPU
3.5.1 Installing the Software Package
3.5.1.1 Installing to the Root Directory
Install libvirt-client in /another_rootfs:
yum install libvirt-client
3.5.1.2 Configuring the another_rootfs Environment
Build and install libvirt in /another_rootfs:
cd /another_rootfs
tar -xf <path_to>/libvirt-6.9.0.tar.xz # Replace <path_to> with the actual path to the package.
cd libvirt-6.9.0
# Replace the patch paths with the actual paths and apply the patches in sequence.
patch -p1 < 0001-libvirt_6.9.0_1201_offload.patch
patch -p1 < 0003-fix-get-affinity.patch
patch -p1 < 0004-qmp-port-manage.patch
chroot /another_rootfs
yum groupinstall "Development tools" -y
yum install -y vim meson qemu qemu-img strace edk2-aarch64 tar
yum install -y rpcgen python3-docutils glib2-devel gnutls-devel libxml2-devel libpciaccess-devel libtirpc-devel yajl-devel systemd-devel dmidecode glusterfs-api numactl
cd /libvirt-6.9.0
CFLAGS='-Wno-error=format -Wno-error=int-conversion -Wno-error=implicit-function-declaration -Wno-error=nested-externs -Wno-error=declaration-after-statement -Wno-error=unused-result -Wno-error=missing-prototypes -Wno-error=unused-parameter -Wno-error=unused-variable -Wno-error=pointer-sign -Wno-error=discarded-qualifiers -Wno-error=unused-function' meson build --prefix=/usr -Ddriver_remote=enabled -Ddriver_network=enabled -Ddriver_qemu=enabled -Dtests=disabled -Ddocs=enabled -Ddriver_libxl=disabled -Ddriver_esx=disabled -Dsecdriver_selinux=disabled -Dselinux=disabled
ninja -C build install
exit
Copy rexec to /another_rootfs/usr/bin and grant it the execute permission.
cp rexec /another_rootfs/usr/bin
chmod +x /another_rootfs/usr/bin/rexec
In /another_rootfs, run the following script to create /usr/bin/qemu-kvm and /usr/libexec/qemu-kvm. Before running the script, replace <host_ip> and <rexec_server_port> with the host IP address and rexec_server port number on the host, respectively.
chroot /another_rootfs
touch /usr/bin/qemu-kvm
touch /usr/libexec/qemu-kvm
cat > /usr/bin/qemu-kvm <<EOF
#!/bin/bash
host=<host_ip>
port=<rexec_server_port>
CMD_NET_ADDR=tcp://\$host:\$port exec /usr/bin/rexec /usr/bin/qemu-kvm \$*
EOF
cat > /usr/libexec/qemu-kvm <<EOF
#!/bin/bash
host=<host_ip>
port=<rexec_server_port>
CMD_NET_ADDR=tcp://\$host:\$port exec /usr/bin/rexec /usr/bin/qemu-kvm \$*
EOF
chmod +x /usr/libexec/qemu-kvm
chmod +x /usr/bin/qemu-kvm
exit
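The two wrapper files differ only in their path, so a function can generate both and avoid copy/paste drift. gen_qemu_wrapper is a hypothetical helper; the generated body matches the heredocs above:

```shell
#!/bin/bash
# Sketch: generate an rexec-forwarding qemu-kvm wrapper at a given path.
# gen_qemu_wrapper is illustrative; the generated script body matches
# the heredocs above.
gen_qemu_wrapper() {
    local path=$1 host=$2 port=$3
    mkdir -p "$(dirname "$path")"
    cat > "$path" <<EOF
#!/bin/bash
host=$host
port=$port
CMD_NET_ADDR=tcp://\$host:\$port exec /usr/bin/rexec /usr/bin/qemu-kvm \$*
EOF
    chmod +x "$path"
}

# Inside the chroot, both wrappers come from the same function:
# gen_qemu_wrapper /usr/bin/qemu-kvm <host_ip> <rexec_server_port>
# gen_qemu_wrapper /usr/libexec/qemu-kvm <host_ip> <rexec_server_port>
```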
3.5.1.3 Mounting Directories
Run the following script on the DPU to mount the host directories required by libvirtd to the DPU.
Ensure that the remote directories that will be mounted in the following script (prepare.sh) exist on both the host and DPU.
#!/bin/bash
insmod <qtfs.ko_path>/qtfs.ko qtfs_server_ip=<server_ip_address> qtfs_log_level=INFO # Change <qtfs.ko_path> and <server_ip_address>.
systemctl stop libvirtd
mkdir -p /var/run/rexec/pids
cat >/var/run/rexec/qmpport << EOF
<qmp_port_number>
EOF
cat > /var/run/rexec/hostaddr <<EOF
<server_ip_address>
EOF
cat > /var/run/rexec/rexecport << EOF
<rexec_port_number>
EOF
rm -f `find /var/run/libvirt/ -name "*.pid"`
rm -f /var/run/libvirtd.pid
if [ ! -d "/another_rootfs/local_proc" ]; then
mkdir -p /another_rootfs/local_proc
fi
if [ ! -d "/another_rootfs/local" ]; then
mkdir -p /another_rootfs/local
fi
mkdir -p /another_rootfs/local/proc /another_rootfs/local/sys
mount -t proc proc /another_rootfs/local_proc/
mount -t proc proc /another_rootfs/local/proc
mount -t sysfs sysfs /another_rootfs/local/sys
mount --bind /var/run/ /another_rootfs/var/run/
mount --bind /var/lib/ /another_rootfs/var/lib/
mount --bind /var/cache/ /another_rootfs/var/cache
mount --bind /etc /another_rootfs/etc
mkdir -p /another_rootfs/home/VMs/
mount -t qtfs /home/VMs/ /another_rootfs/home/VMs/
mount -t qtfs /var/lib/libvirt /another_rootfs/var/lib/libvirt
mount -t devtmpfs devtmpfs /another_rootfs/dev/
mount -t hugetlbfs hugetlbfs /another_rootfs/dev/hugepages/
mount -t mqueue mqueue /another_rootfs/dev/mqueue/
mount -t tmpfs tmpfs /another_rootfs/dev/shm
mount -t sysfs sysfs /another_rootfs/sys
mkdir -p /another_rootfs/sys/fs/cgroup
mount -t tmpfs tmpfs /another_rootfs/sys/fs/cgroup
list="perf_event freezer files net_cls,net_prio hugetlb pids rdma cpu,cpuacct memory devices blkio cpuset"
for i in $list
do
echo $i
mkdir -p /another_rootfs/sys/fs/cgroup/$i
mount -t cgroup cgroup -o rw,nosuid,nodev,noexec,relatime,$i /another_rootfs/sys/fs/cgroup/$i
done
# common system dir
mount -t qtfs -o proc /proc /another_rootfs/proc
echo "proc"
mount -t qtfs /sys /another_rootfs/sys
echo "sys"
mount -t qtfs /dev/pts /another_rootfs/dev/pts
mount -t qtfs /dev/vfio /another_rootfs/dev/vfio
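After running prepare.sh you can sanity-check that the expected paths are actually mount points. check_mounted is a hypothetical helper that only reads /proc/mounts and makes no changes:

```shell
#!/bin/bash
# Sketch: verify that a path is a mount point by consulting /proc/mounts.
# check_mounted is an illustrative name; it is read-only.
check_mounted() {
    awk -v m="$1" '$2 == m { found = 1 } END { exit !found }' /proc/mounts
}

# Example check after running prepare.sh:
# for d in /another_rootfs/proc /another_rootfs/sys /another_rootfs/home/VMs; do
#     check_mounted "$d" || echo "not mounted: $d"
# done
```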
3.6 Starting libvirtd
On the DPU, open a session and change the root directory to /another_rootfs.
chroot /another_rootfs
Run the following commands to start virtlogd and libvirtd:
#!/bin/bash
virtlogd -d
libvirtd -d
Because /var/run/ has been bound to /another_rootfs/var/run/, you can run virsh from the normal rootfs to access libvirtd and manage the VMs.
4 Environment Restoration
To unmount related directories, run the following commands:
#!/bin/bash
umount /another_rootfs/dev/hugepages
umount /another_rootfs/dev/mqueue
umount /another_rootfs/dev/shm
umount /another_rootfs/dev/pts
umount /another_rootfs/dev/vfio
umount /another_rootfs/dev
umount /another_rootfs/etc
umount /another_rootfs/home/VMs
umount /another_rootfs/local_proc
umount /another_rootfs/var/lib/libvirt
umount /another_rootfs/var/lib
umount /another_rootfs/sys/fs/cgroup/*
umount /another_rootfs/sys/fs/cgroup
umount /another_rootfs/sys
umount /another_rootfs/proc
umount /another_rootfs/*
rmmod qtfs
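Unmount order matters: children must be unmounted before their parents, and all qtfs mounts must be gone before rmmod qtfs. A helper can generate the umount commands deepest-first from a list of mount points; gen_umounts is a hypothetical name:

```shell
#!/bin/bash
# Sketch: read mount points (one per line) on stdin and print umount
# commands with children before parents. gen_umounts is illustrative;
# a reverse lexicographic sort places a path before any of its prefixes,
# so the deepest mounts are unmounted first.
gen_umounts() {
    local prefix=$1
    grep "^$prefix" | sort -r | sed 's/^/umount /'
}

# Example against the live system (run rmmod qtfs only after all
# qtfs mounts are gone):
# awk '{print $2}' /proc/mounts | gen_umounts /another_rootfs | sh
# rmmod qtfs
```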