Imperceptible Container Management Plane Offload Deployment Guide
NOTE:
In this user guide, modifications are performed to the container management plane components and the rexec tool of a specific version. You can modify other versions based on the actual execution environment. The patch provided in this document is for verification only and is not for commercial use.
NOTE:
The communication between shared file systems is implemented through the network. You can perform a simulated offload using two physical machines or VMs connected through the network.
Before the verification, you are advised to set up a working Kubernetes cluster and container running environment, and then offload the management plane processes of a single node. You can use a network-connected physical machine or VM as an emulated DPU.
Introduction
The container management plane refers to container management tools such as Kubernetes, dockerd, containerd, and isulad. Container management plane offload moves these tools from the host where the containers run to another host, namely the DPU, which is a set of hardware with an independent running environment.
By mounting directories related to container running on the host to the DPU through qtfs, the container management plane tool running on the DPU can access these directories and prepare the running environment for the containers running on the host. To remotely mount the special file systems such as proc and sys, a dedicated rootfs is created as the running environment of Kubernetes and dockerd (referred to as /another_rootfs).
In addition, rexec is used to start and delete containers so that the container management plane and containers can run on two different hosts for remote container management.
Related Component Patches
rexec
rexec is a remote execution tool written in the Go language based on the rexec example tool of Docker/libchan. rexec is used to remotely invoke binary files. For ease of use, capabilities such as transferring environment variables and monitoring the exit of original processes are added to rexec.
To use the rexec tool, run the CMD_NET_ADDR=tcp://0.0.0.0:<port_number> rexec_server command on the server to start the rexec service process, and then run the CMD_NET_ADDR=tcp://<server_IP_address>:<port_number> rexec [command] command on the client. This instructs rexec_server to execute the command.
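The address format used by both commands can be sketched with a small helper. This is only an illustration: the function name rexec_addr, the address 192.0.2.10, and port 7777 are placeholders that do not come from the rexec tool itself.

```shell
#!/bin/bash
# Hypothetical helper (not part of rexec): build the CMD_NET_ADDR value
# that rexec and rexec_server read from the environment.
# Usage: rexec_addr <host> <port>
rexec_addr() {
    local host=$1 port=$2
    # Reject empty, non-numeric, or overly long port strings.
    case $port in
        ''|*[!0-9]*|??????*) echo "invalid port: $port" >&2; return 1 ;;
    esac
    # Reject ports outside 1..65535.
    if [ "$port" -lt 1 ] || [ "$port" -gt 65535 ]; then
        echo "invalid port: $port" >&2
        return 1
    fi
    printf 'tcp://%s:%s\n' "$host" "$port"
}

# Server side (listen on all interfaces; 7777 is a placeholder port):
#   CMD_NET_ADDR=$(rexec_addr 0.0.0.0 7777) rexec_server
# Client side (ask the server at placeholder address 192.0.2.10 to run "ls /"):
#   CMD_NET_ADDR=$(rexec_addr 192.0.2.10 7777) rexec ls /
```

Validating the port before starting either process makes a mistyped address fail early instead of surfacing later as a connection error.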
dockerd
The changes to dockerd are based on version 18.09.
In containerd, the part that invokes libnetwork-setkey through hook is commented out. This does not affect container startup. In addition, to ensure the normal use of docker load, an error in the mount function in mounter_linux.go is commented out.
In the running environment of the container management plane, /proc is mounted to the proc file system on the server, and the local proc file system is mounted to /local_proc. In dockerd and containerd, /proc is changed to /local_proc for accessing /proc/self/xxx, /proc/getpid()/xxx, or related file systems.
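The path substitution described above can be illustrated as follows. This is a sketch of the mapping only; the actual change is made in the Go sources of dockerd and containerd, and the function name to_local_proc is hypothetical.

```shell
#!/bin/bash
# Hypothetical illustration of the /proc -> /local_proc substitution.
# Paths outside /proc are returned unchanged.
to_local_proc() {
    local path=$1
    case $path in
        /proc)   printf '/local_proc\n' ;;
        /proc/*) printf '/local_proc/%s\n' "${path#/proc/}" ;;
        *)       printf '%s\n' "$path" ;;
    esac
}

# Example: the process's own entries must be read from the local proc mount.
#   to_local_proc /proc/self/exe
```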
containerd
The changes to containerd are based on containerd-1.2-rc.1.
When obtaining mounting information, /proc/self/mountinfo can obtain only the local mounting information of dockerd but cannot obtain that on the server. Therefore, /proc/self/mountinfo is changed to /proc/1/mountinfo to obtain the mounting information on the server by obtaining the mounting information of process 1 on the server.
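To see what a given mountinfo file reports, a small parser along the following lines may help. It is a sketch: the function name list_mounts is hypothetical, and it relies only on the standard mountinfo layout (mount point in field 5, file system type right after the "-" separator).

```shell
#!/bin/bash
# Hypothetical helper: print "<mount point> <fs type>" for each entry of a
# mountinfo-format file. Defaults to /proc/1/mountinfo.
list_mounts() {
    local mountinfo=${1:-/proc/1/mountinfo}
    # Fields 1-6 are fixed; optional fields start at 7 and end at "-".
    awk '{
        for (i = 7; i <= NF; i++) {
            if ($i == "-") { print $5, $(i + 1); break }
        }
    }' "$mountinfo"
}

# In the offloaded environment, comparing these two shows the difference
# between the server view and the local view:
#   list_mounts /proc/1/mountinfo
#   list_mounts /local_proc/self/mountinfo
```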
In containerd-shim, the Unix socket that communicates with containerd is changed to TCP. containerd obtains the IP address of the running environment of containerd-shim, that is, the IP address of the server, through the SHIM_HOST environment variable. The hash value of shim is used to generate a port number, which is used as the communication port to start containerd-shim.
In addition, the original method of sending signals to containerd-shim is changed to the method of remotely invoking the kill command to send signals to shim, ensuring that Docker can correctly kill containers.
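The idea of deriving a port number from a hash of the shim can be sketched as below. The guide does not state which hash or port range the patch uses, so both are assumptions here: the sketch uses the CRC printed by cksum and maps it into 20000..59999. The function name shim_port is hypothetical.

```shell
#!/bin/bash
# Hypothetical sketch: derive a TCP port from a shim identifier.
# Assumptions (not taken from the patch): cksum CRC as the hash,
# and a port range of 20000..59999.
shim_port() {
    local id=$1 crc
    crc=$(printf '%s' "$id" | cksum | cut -d ' ' -f 1)
    echo $(( 20000 + crc % 40000 ))
}

# The same identifier always yields the same port, so both ends can
# compute it independently:
#   shim_port "<shim_identifier>"
```

Any scheme of this kind can map two different identifiers to the same port, so a real implementation needs a way to detect or avoid collisions.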
Kubernetes
kubelet is not modified. The first attempt to configure the container QoS manager may fail. This error does not affect the subsequent pod startup process.
Container Management Plane Offload Operation Guide
Start rexec_server on both the server and client. rexec_server on the server handles the rexec requests that start containerd-shim. rexec_server on the client handles the calls that containerd-shim makes to dockerd and containerd.
Server
Create a folder required by the container management plane, insert qtfs_server.ko, and start the engine process.
In addition, you need to create the rexec script /usr/bin/dockerd on the server.
#!/bin/bash
CMD_NET_ADDR=tcp://<client_IP_address>:<rexec_port_number> rexec /usr/bin/dockerd "$@"
Client
Prepare a rootfs as the running environment of dockerd and containerd. Use the following script to mount the server directories required by dockerd and containerd to the client. Ensure that the remote directories mounted in the script exist on both the server and client.
#!/bin/bash
mkdir -p /another_rootfs/var/run/docker/containerd
iptables -t nat -N DOCKER
echo "---------insmod qtfs ko----------"
insmod /YOUR/QTFS/PATH/qtfs.ko qtfs_server_ip=<server_IP_address> qtfs_log_level=INFO
# The proc file system in the chroot environment is replaced by the proc shared file system of the DPU. The actual proc file system of the local host needs to be mounted to /local_proc.
mount -t proc proc /another_rootfs/local_proc/
# Bind the chroot internal environment to the external environment to facilitate configuration and running.
mount --bind /var/run/ /another_rootfs/var/run/
mount --bind /var/lib/ /another_rootfs/var/lib/
mount --bind /etc /another_rootfs/etc
mkdir -p /another_rootfs/var/lib/isulad
# Create and mount the dev, sys, and cgroup file systems in the chroot environment.
mount -t devtmpfs devtmpfs /another_rootfs/dev/
mount -t sysfs sysfs /another_rootfs/sys
mkdir -p /another_rootfs/sys/fs/cgroup
mount -t tmpfs tmpfs /another_rootfs/sys/fs/cgroup
list="perf_event freezer files net_cls,net_prio hugetlb pids rdma cpu,cpuacct memory devices blkio cpuset"
for i in $list
do
echo $i
mkdir -p /another_rootfs/sys/fs/cgroup/$i
mount -t cgroup cgroup -o rw,nosuid,nodev,noexec,relatime,$i /another_rootfs/sys/fs/cgroup/$i
done
## common system dir
mount -t qtfs -o proc /proc /another_rootfs/proc
echo "proc"
mount -t qtfs /sys /another_rootfs/sys
echo "cgroup"
# Mount the shared directory required by the container management plane.
mount -t qtfs /var/lib/docker/containers /another_rootfs/var/lib/docker/containers
mount -t qtfs /var/lib/docker/containerd /another_rootfs/var/lib/docker/containerd
mount -t qtfs /var/lib/docker/overlay2 /another_rootfs/var/lib/docker/overlay2
mount -t qtfs /var/lib/docker/image /another_rootfs/var/lib/docker/image
mount -t qtfs /var/lib/docker/tmp /another_rootfs/var/lib/docker/tmp
mkdir -p /another_rootfs/run/containerd/io.containerd.runtime.v1.linux/
mount -t qtfs /run/containerd/io.containerd.runtime.v1.linux/ /another_rootfs/run/containerd/io.containerd.runtime.v1.linux/
mkdir -p /another_rootfs/var/run/docker/containerd
mount -t qtfs /var/run/docker/containerd /another_rootfs/var/run/docker/containerd
mount -t qtfs /var/lib/kubelet/pods /another_rootfs/var/lib/kubelet/pods
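After the mount script has run, it can be useful to confirm that the shared directories are really served by qtfs. The following sketch assumes that qtfs mounts are listed with file system type qtfs in mountinfo (consistent with the mount -t qtfs commands above, but not verified here). The function name missing_qtfs_mounts is hypothetical.

```shell
#!/bin/bash
# Hypothetical check: print every given path that is NOT listed as a
# qtfs mount in the given mountinfo-format file.
# Usage: missing_qtfs_mounts <mountinfo_file> <path>...
missing_qtfs_mounts() {
    local mountinfo=$1 path
    shift
    for path in "$@"; do
        # Field 5 is the mount point; the fs type follows the "-" separator.
        awk -v want="${path%/}" '
            {
                for (i = 7; i <= NF; i++) {
                    if ($i == "-") {
                        if ($5 == want && $(i + 1) == "qtfs") found = 1
                        break
                    }
                }
            }
            END { exit (found ? 0 : 1) }
        ' "$mountinfo" || printf '%s\n' "$path"
    done
}

# Run on the client, outside the chroot, against the local mount table:
#   missing_qtfs_mounts /proc/self/mountinfo \
#       /another_rootfs/proc \
#       /another_rootfs/var/lib/docker/containers \
#       /another_rootfs/var/lib/kubelet/pods
```

An empty output means every listed path was found as a qtfs mount.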
In /another_rootfs, create the following scripts to support cross-host operations:
- /another_rootfs/usr/local/bin/containerd-shim
#!/bin/bash
CMD_NET_ADDR=tcp://<server_IP_address>:<rexec_port_number> /usr/bin/rexec /usr/bin/containerd-shim "$@"
- /another_rootfs/usr/local/bin/remote_kill
#!/bin/bash
CMD_NET_ADDR=tcp://<server_IP_address>:<rexec_port_number> /usr/bin/rexec /usr/bin/kill "$@"
- /another_rootfs/usr/sbin/modprobe
#!/bin/bash
CMD_NET_ADDR=tcp://<server_IP_address>:<rexec_port_number> /usr/sbin/rexec_placeholder_do_not_use
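Because the three wrapper scripts share one shape, they can be generated rather than written by hand. The following is a sketch under stated assumptions: the function name write_rexec_wrapper is hypothetical, and the generated wrapper forwards its arguments with "$@" so that arguments containing spaces are passed through intact.

```shell
#!/bin/bash
# Hypothetical generator for rexec wrapper scripts.
# Usage: write_rexec_wrapper <wrapper_path> <tcp_address> <remote_binary>
write_rexec_wrapper() {
    local wrapper=$1 addr=$2 binary=$3
    mkdir -p "$(dirname "$wrapper")"
    # \$@ is escaped so that it is expanded when the wrapper runs,
    # not while the wrapper is being generated.
    cat > "$wrapper" <<EOF
#!/bin/bash
CMD_NET_ADDR=$addr /usr/bin/rexec $binary "\$@"
EOF
    chmod 755 "$wrapper"
}

# Example with placeholder address and port:
#   addr=tcp://192.0.2.10:7777
#   write_rexec_wrapper /another_rootfs/usr/local/bin/containerd-shim "$addr" /usr/bin/containerd-shim
#   write_rexec_wrapper /another_rootfs/usr/local/bin/remote_kill "$addr" /usr/bin/kill
#   write_rexec_wrapper /another_rootfs/usr/sbin/modprobe "$addr" /usr/sbin/modprobe
```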
After changing the root directories of dockerd and containerd to the required rootfs, run the following commands to start dockerd and containerd:
- containerd
#!/bin/bash
SHIM_HOST=<server_IP_address> containerd --config /var/run/docker/containerd/containerd.toml --address /var/run/containerd/containerd.sock
- dockerd
#!/bin/bash
SHIM_HOST=<server_IP_address> CMD_NET_ADDR=tcp://<server_IP_address>:<rexec_port_number> /usr/bin/dockerd --containerd /var/run/containerd/containerd.sock
- kubelet
Use the original parameters to start kubelet in the chroot environment.
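Since dockerd depends on both SHIM_HOST and CMD_NET_ADDR being set correctly, a preflight check such as the following may catch a malformed environment before the daemon starts. It is a sketch only; the function name check_offload_env is hypothetical, and the check covers nothing beyond the two variables named in this guide.

```shell
#!/bin/bash
# Hypothetical preflight check: verify the environment variables that the
# dockerd start command in this guide relies on.
check_offload_env() {
    local status=0
    if [ -z "${SHIM_HOST:-}" ]; then
        echo "SHIM_HOST is not set" >&2
        status=1
    fi
    case ${CMD_NET_ADDR:-} in
        tcp://*:*) ;;
        *)
            echo "CMD_NET_ADDR must look like tcp://<address>:<port>" >&2
            status=1
            ;;
    esac
    return "$status"
}

# Example (inside the chroot, placeholder address and port):
#   export SHIM_HOST=192.0.2.10 CMD_NET_ADDR=tcp://192.0.2.10:7777
#   check_offload_env && /usr/bin/dockerd --containerd /var/run/containerd/containerd.sock
```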
Because /var/run/ is bound to /another_rootfs/var/run/, you can use the docker client in the regular rootfs to access the docker.sock interface for container management.
The container management plane is now offloaded to the DPU. You can run docker commands to create and delete containers, or use kubectl on the current node to schedule and destroy pods. The actual container service process runs on the host.
NOTE:
This guide describes only the container management plane offload. The offload of container network and data volumes requires additional offload capabilities, which are not included. You can perform cross-node startup of containers that are not configured with network and storage by referring to this guide.