1 Imperceptible Container Management Plane Offload
1.1 Overview
The container management plane refers to container management tools such as dockerd, containerd, and iSulad. Container management plane offload means offloading these management tools from the HOST where the containers run to the DPU.
qtfs is used to mount the directories on the HOST that are related to container operation to the DPU, so that the container management plane can access them and prepare the environment required for running containers. In addition, the remote proc and sys file systems need to be mounted. To avoid affecting the current system, you can create a dedicated rootfs (referred to as /another_rootfs) as the running environment of dockerd and containerd.
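For illustration, a single remote mount looks roughly as follows. The directory path is only an example; the actual mounts are performed by the prepare.sh script described in section 4.3.
# Conceptual sketch only: expose a HOST directory at the same path inside the dedicated
# rootfs on the DPU (assumes the qtfs client module is already loaded, see section 2.1).
mkdir -p /another_rootfs/var/lib/docker/containers
mount -t qtfs /var/lib/docker/containers /another_rootfs/var/lib/docker/containers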
The rexec command is used to start or delete containers, allowing the management plane and containers to be separated on different machines for remote management. You can use either of the following modes to verify the offload.
1.1.1 Test Mode
Prepare two physical machines or VMs that can communicate with each other.
One physical machine functions as the DPU, and the other functions as the host. In this document, DPU and HOST refer to the two physical machines.
NOTE: In the test mode, network ports are exposed without connection authentication, which is risky and should be used only for internal tests and verification. Do not use this mode in the production environment. In the production environment, use closed communication to prevent external connections, such as the vsock mode.
1.1.2 vsock Mode
The DPU and HOST are required and must be able to provide vsock communication through virtio.
This document describes only the test mode usage. If vsock communication is supported in the test environment (virtual environment or DPU-HOST environment that supports vsock), the following test procedure is applicable, except that you need to change the IP addresses to the vsock CIDs (TEST_MODE is not required for binary file compilation).
2 Environment Setup
2.1 qtfs File System Deployment
For details, see qtfs.
NOTE: If the test mode is used, set QTFS_TEST_MODE to 1 when compiling the .ko files on the qtfs client and server. If the vsock mode is used, you do not need to set QTFS_TEST_MODE.
To establish a qtfs connection, you need to disable the firewall between the DPU and HOST, or open related network ports on the firewall.
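For example, with firewalld you can open the required ports instead of disabling the firewall entirely. The port below is only a placeholder taken from the udsproxyd example later in this document; replace or extend it with the ports actually used by your qtfs and udsproxyd deployment.
# Open a communication port (example: 12121/tcp) and reload the firewall rules
firewall-cmd --permanent --add-port=12121/tcp
firewall-cmd --reload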
2.2 udsproxyd Service Deployment
2.2.1 Overview
udsproxyd is a cross-host Unix domain socket (UDS) proxy service and needs to be deployed on both the HOST and DPU. The udsproxyd components are in a peer relationship. Their respective processes on the HOST and DPU can communicate with each other transparently using standard UDSs. That is, if two processes can communicate with each other through UDSs on the same host, they can also communicate between the HOST and DPU without any code modification. As a cross-host Unix socket service, udsproxyd can be used either by running the application with LD_PRELOAD=libudsproxy.so or by configuring the udsconnect allowlist in advance. The methods for configuring the allowlist are described later.
2.2.2 Deploying udsproxyd
Compile udsproxyd in the dpu-utilities project.
cd qtfs/ipc
make -j UDS_TEST_MODE=1 && make install
NOTE: If the vsock mode is used, you do not need to set UDS_TEST_MODE during compilation.
The latest engine service on the qtfs server has integrated the udsproxyd capability. Therefore, you do not need to start the udsproxyd service on the server. Start the udsproxyd service on the client.
nohup /usr/bin/udsproxyd <thread num> <addr> <port> <peer addr> <peer port> 2>&1 &
Parameters:
- thread num: number of threads. Currently, only one thread is supported.
- addr: IP address of the local machine. If the vsock communication mode is used, the value is the CID.
- port: port used on the local machine.
- peer addr: IP address of the udsproxyd peer. If the vsock communication mode is used, the value is the CID.
- peer port: port used on the udsproxyd peer.
Example:
nohup /usr/bin/udsproxyd 1 192.168.10.10 12121 192.168.10.11 12121 2>&1 &
If the qtfs engine service is not started and you want to test udsproxyd separately, start udsproxyd on the server.
nohup /usr/bin/udsproxyd 1 192.168.10.11 12121 192.168.10.10 12121 2>&1 &
2.2.3 Using udsproxyd
2.2.3.1 Using the udsproxyd Service Independently
When starting the client process of the Unix socket application that uses the UDS service, add the LD_PRELOAD=libudsproxy.so environment variable to intercept the connect API of glibc for UDS interconnection. Alternatively, run the qtcfg command to configure the udsconnect allowlist to instruct the system to take over UDS connections in specified directories.
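For example, assuming libudsproxy.so has been installed to a directory covered by the dynamic linker search path and my_uds_client is a hypothetical UDS client program, the interception is enabled as follows:
# Hypothetical client binary; its connect() calls to UDSs are redirected through udsproxyd
LD_PRELOAD=libudsproxy.so ./my_uds_client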
2.2.3.2 Using the udsproxyd Service Transparently
Configure the allowlist of the UDS service for qtfs. The path of the socket file bound by the Unix socket server needs to be added to the allowlist. You can use either of the following methods:
- Load the allowlist by using the qtcfg utility. First, compile the utility in qtfs/qtinfo.
Run the following command on the qtfs client:
make role=client
make install
Run the following command on the qtfs server:
make role=server
make install
After qtcfg is installed automatically, run qtcfg to configure the allowlist. Assume that /var/lib/docker needs to be added to the allowlist:
qtcfg -w udsconnect -x /var/lib/docker
Query the allowlist:
qtcfg -w udsconnect -z
Delete an allowlist entry:
qtcfg -w udsconnect -y 0
The parameter is the index number listed when you query the allowlist.
- Add an allowlist entry through the configuration file. The configuration file needs to be set before the qtfs or qtfs_server kernel module is loaded. The allowlist is loaded when the kernel modules are initialized.
Add the following content to the /etc/qtfs/whitelist file.
[Udsconnect]
/var/lib/docker
NOTE: The allowlist prevents irrelevant Unix sockets from establishing remote connections, which would cause errors or waste resources. You are advised to set the allowlist as precisely as possible. For example, in this document, /var/lib/docker is set for the container scenario. It would be risky to directly add /var/lib, /var, or the root directory.
2.3 rexec Service Deployment
2.3.1 Overview
rexec is a remote execution component developed using the C language. It consists of the rexec client and rexec server. The server is a daemon process, and the client is a binary file. After being started, the client establishes a UDS connection with the server using the udsproxyd service, and the server daemon process starts a specified program on the server machine. During container management plane offload, dockerd is offloaded to the DPU. When dockerd needs to start a service container process on the HOST, the rexec client is invoked to remotely start the process.
2.3.2 Deploying rexec
2.3.2.1 Configuring the Environment Variables and Allowlist
Configure the rexec server allowlist on the host. Put the whitelist file in the /etc/rexec directory, and change the file permission to read-only.
chmod 400 /etc/rexec/whitelist
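The exact format of the allowlist file is defined by rexec. As an assumption for illustration only, if each line is the absolute path of a binary that rexec_server may execute, the file for the scenario in this document might look as follows (verify the format against the rexec documentation):
# /etc/rexec/whitelist (assumed format: one permitted binary path per line)
/usr/bin/containerd-shim
/usr/bin/kill
/usr/sbin/modprobe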
After downloading the dpu-utilities code, go to the qtfs/rexec directory and run make && make install to install all binary files required by rexec (rexec and rexec_server) to the /usr/bin directory.
Before starting the rexec_server service on the server, check whether the /var/run/rexec directory exists. If not, create it.
mkdir /var/run/rexec
The underlying communication of the rexec service uses Unix sockets. Therefore, cross-host communication between rexec and rexec_server depends on the udsproxyd service, and the /var/run/rexec directory needs to be added to the udsconnect allowlist.
qtcfg -w udsconnect -x /var/run/rexec
2.3.2.2 Starting the Service
You can start the rexec_server service on the server in either of the following ways.
- Method 1: Configure rexec as a systemd service.
Add the rexec.service file to /usr/lib/systemd/system (a minimal example unit is sketched after this list). Then, use systemctl to manage the rexec service.
Start the service for the first time:
systemctl daemon-reload
systemctl enable --now rexec
Restart the service:
systemctl stop rexec
systemctl start rexec
- Method 2: Manually start the service in the background.
nohup /usr/bin/rexec_server 2>&1 &
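For Method 1, if the rexec.service file from the repository is not at hand, the following minimal unit is a sketch that can serve as a starting point; the options shown are assumptions, not the project's official service file.
# /usr/lib/systemd/system/rexec.service (minimal sketch, adjust as needed)
[Unit]
Description=rexec remote execution server
After=network.target

[Service]
ExecStart=/usr/bin/rexec_server
Restart=on-failure

[Install]
WantedBy=multi-user.target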
3 Changes to Management Plane Components
3.1 dockerd
The changes to dockerd are based on version 18.09.
For details about the changes to Docker, see the patch file in this directory.
3.2 containerd
The changes to containerd are based on containerd-1.2-rc.1.
For details about the changes to containerd, see the patch file in this directory.
4 Container Management Plane Offload Guide
NOTE:
- Start rexec_server on both the HOST and DPU.
- rexec_server on the HOST is used to start containerd-shim by using rexec when the DPU creates a container.
- rexec_server on the DPU is used to execute the calls that containerd-shim makes to dockerd and containerd.
4.1 Preparing the Rootfs for Running dockerd and containerd
Note: Perform this step only on the DPU.
In the following document, the rootfs is called /another_rootfs (the directory name can be changed as required).
4.1.1 Using the Official openEuler QCOW2 Image
You are advised to use the QCOW2 image provided by openEuler to prepare the new rootfs.
4.1.1.1 Installing the Tools
Use yum to install xz, kpartx, and qemu-img.
yum install xz kpartx qemu-img
4.1.1.2 Downloading the QCOW2 Image
Download the openEuler 22.03 LTS VM image for x86 or openEuler 22.03 LTS VM image for Arm64 from the openEuler official website.
4.1.1.3 Decompressing the QCOW2 Image
Run xz -d to decompress the package and obtain the openEuler-22.03-LTS-<arch>.qcow2 file. The following uses the x86 image as an example.
xz -d openEuler-22.03-LTS-x86_64.qcow2.xz
4.1.1.4 Mounting the QCOW2 Image and Copying Files
- Load the nbd module and connect the image to an nbd device:
modprobe nbd maxpart=<any number>
qemu-nbd -c /dev/nbd0 <VM image path>
- Create a folder, for example, /random_dir, and mount the image partition to it:
mount /dev/nbd0p2 /random_dir
- Copy the files.
mkdir /another_rootfs
cp -r /random_dir/* /another_rootfs/
The contents of the VM image are now available in /another_rootfs.
4.1.1.5 Unmounting QCOW2
After the rootfs is prepared, run the following commands to unmount the QCOW2 image:
umount /random_dir
qemu-nbd -d /dev/nbd0
4.1.2 Installing Software in /another_rootfs
- Copy /etc/resolv.conf from the root directory to /another_rootfs/etc/resolv.conf.
- Remove the files in /another_rootfs/etc/yum.repos.d and copy the files in /etc/yum.repos.d/ to /another_rootfs/etc/yum.repos.d.
- Run yum install <software package> --installroot=/another_rootfs to install a software package. For example:
yum install --installroot=/another_rootfs iptables
4.2 Starting qtfs_server on the HOST
Copy rexec, containerd-shim, runc, and engine to the /usr/bin directory on the HOST, and pay attention to their permissions. rexec and engine have been provided. Compile the Docker binary files based on the patches described in 3 Changes to Management Plane Components.
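For example (the source paths are placeholders for wherever you built or obtained the binaries):
# {YOUR_PATH} is a placeholder for the build or download directory
cp {YOUR_PATH}/rexec {YOUR_PATH}/containerd-shim {YOUR_PATH}/runc {YOUR_PATH}/engine /usr/bin/
chmod +x /usr/bin/rexec /usr/bin/containerd-shim /usr/bin/runc /usr/bin/engine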
4.2.1 Inserting the qtfs_server Module
Create a folder required by the container management plane, insert qtfs_server.ko, and start the engine process.
You can run the provided script to perform this operation. If an error occurs during execution, try using dos2unix to convert the format of the script (the same applies to all the following scripts). Replace the module and binary paths in the script with the actual paths of your qtfs installation.
In addition, create the /usr/bin/dockerd and /usr/bin/containerd scripts on the HOST for executing the rexec command.
/usr/bin/dockerd:
#!/bin/bash
rexec /usr/bin/dockerd $*
/usr/bin/containerd:
#!/bin/bash
rexec /usr/bin/containerd $*
After the two scripts are created, run the chmod command to grant execute permission on them.
chmod +x /usr/bin/containerd
chmod +x /usr/bin/dockerd
4.3 Mounting the Dependency Directories on the HOST to the DPU
4.3.1 Installing the Software Packages
4.3.1.1 Installing in the Root Directory
In the DPU root directory (not /another_rootfs), install iptables, libtool, libcgroup, and tar using yum.
yum install iptables libtool libcgroup tar
You can also download all the required packages and install them with the rpm command. The iptables and libtool packages and their dependencies are: iptables, libtool, emacs, autoconf, automake, libtool-ltdl, m4, tar, and libcgroup.
After downloading the preceding software packages, run the following command:
rpm -ivh iptables-1.8.7-5.oe2203.x86_64.rpm libtool-2.4.6-34.oe2203.x86_64.rpm emacs-27.2-3.oe2203.x86_64.rpm autoconf-2.71-2.oe2203.noarch.rpm automake-1.16.5-3.oe2203.noarch.rpm libtool-ltdl-2.4.6-34.oe2203.x86_64.rpm m4-1.4.19-2.oe2203.x86_64.rpm tar-1.34-1.oe2203.x86_64.rpm libcgroup-0.42.2-1.oe2203.x86_64.rpm
4.3.1.2 Configuring the /another_rootfs Environment
Install iptables in /another_rootfs, which is mandatory for dockerd startup.
Run yum install <software package> --installroot=/another_rootfs to install it. Then copy rexec to /another_rootfs/usr/bin and grant execute permission.
cp rexec /another_rootfs/usr/bin
chmod +x /another_rootfs/usr/bin/rexec
In addition, copy the containerd and dockerd binaries compiled from the community source code with the preceding patches to /another_rootfs/usr/bin, and copy docker to /usr/bin.
cp {YOUR_PATH}/dockerd /another_rootfs/usr/bin
cp {YOUR_PATH}/containerd /another_rootfs/usr/bin
cp {YOUR_PATH}/docker /usr/bin
Delete /another_rootfs/usr/sbin/modprobe from /another_rootfs.
rm -f /another_rootfs/usr/sbin/modprobe
Create the following scripts in /another_rootfs:
/another_rootfs/usr/local/bin/containerd-shim:
#!/bin/bash
/usr/bin/rexec /usr/bin/containerd-shim $*
/another_rootfs/usr/bin/remote_kill:
#!/bin/bash
/usr/bin/rexec /usr/bin/kill $*
/another_rootfs/usr/sbin/modprobe:
#!/bin/bash
/usr/bin/rexec /usr/sbin/modprobe $*
After the creation is complete, grant execute permission to them.
chmod +x /another_rootfs/usr/local/bin/containerd-shim
chmod +x /another_rootfs/usr/bin/remote_kill
chmod +x /another_rootfs/usr/sbin/modprobe
4.3.2 Mounting Directories
Run the prepare.sh script on the DPU to mount the HOST directories required by dockerd and containerd to the DPU.
In addition, ensure that the remote directories mounted by the script exist on both the HOST and DPU.
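As a rough illustration of what such a script does, each required HOST directory is created on the DPU side and then mounted through qtfs. The directory list below is an assumption for illustration only; the prepare.sh shipped with the project is authoritative.
# Sketch only: mount a set of HOST directories needed by dockerd/containerd onto the DPU.
for dir in /var/lib/docker/containers /var/lib/docker/image /run/containerd; do
    mkdir -p /another_rootfs${dir}
    mount -t qtfs ${dir} /another_rootfs${dir}
done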
4.4 Starting dockerd and containerd
On the DPU, open two sessions and chroot to /another_rootfs, the rootfs prepared for running dockerd and containerd.
chroot /another_rootfs
Run the following commands in the two sessions to start containerd and then dockerd:
containerd:
#!/bin/bash
SHIM_HOST=${YOUR_SERVER_IP} containerd --config /var/run/docker/containerd/containerd.toml --address /var/run/containerd/containerd.sock
dockerd:
#!/bin/bash
# this needs to be done only once
/usr/bin/rexec mount -t qtfs /var/lib/docker/overlay2 /another_rootfs/var/lib/docker/overlay2/
SHIM_HOST=${YOUR_SERVER_IP} /usr/bin/dockerd --containerd /var/run/containerd/containerd.sock -s overlay2 --iptables=false --debug 2>&1 | tee docker.log
Because /var/run/ and /another_rootfs/var/run/ have been bind-mounted, you can access the docker.sock interface through Docker in a normal rootfs to manage containers.
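For example, from a regular (non-chroot) shell on the DPU, you can check that the offloaded management plane responds:
# Verify that the docker CLI reaches the offloaded dockerd through docker.sock
docker info
docker ps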
5 Environment Restoration
To restore the environment, delete the existing containers, stop the containerd and dockerd processes, and then run the following commands to unmount the related directories:
for i in `lsof | grep v1.linux | awk '{print $2}'`
do
kill -9 $i
done
mount | grep qtfs | awk '{print $3}' | xargs umount
mount | grep another_rootfs | awk '{print $3}' | xargs umount
sleep 1
umount /another_rootfs/etc
umount /another_rootfs/sys
pkill udsproxyd
rmmod qtfs