Gazelle User Guide
Overview
Gazelle is a high-performance user-mode protocol stack. Based on the Data Plane Development Kit (DPDK), it reads and writes NIC packets directly in user mode, transmits the packets through shared hugepage memory, and uses the LwIP protocol stack. Gazelle greatly improves the network I/O throughput of applications and accelerates the network for databases such as MySQL and Redis.
- High performance: Zero-copy and lock-free packet processing that can be flexibly scaled out and scheduled adaptively.
- Universality: Compatible with POSIX without modification, and applicable to different types of applications.
In the single-process scenario where the NIC supports multiple queues, use liblstack.so only to shorten the packet path. In other scenarios, use the ltran process to distribute packets to each thread.
Installation
Configure the Yum source of openEuler and run the yum command to install Gazelle.
yum install dpdk
yum install libconfig
yum install numactl
yum install libboundscheck
yum install libpcap
yum install gazelle
NOTE: The version of dpdk must be 21.11-2 or later.
How to Use
To configure the operating environment and use Gazelle to accelerate applications, perform the following steps:
1. Installing the .ko File as the root User
Install the .ko files based on the site requirements to enable the virtual network ports and bind NICs to the user-mode driver. To enable the virtual network port function, use rte_kni.ko.
modprobe rte_kni carrier="on"
Configure NetworkManager not to manage the KNI NIC.
[root@localhost ~]# cat /etc/NetworkManager/conf.d/99-unmanaged-devices.conf
[keyfile]
unmanaged-devices=interface-name:kni
[root@localhost ~]# systemctl reload NetworkManager
Bind the NIC from the kernel driver to the user-mode driver. Choose one of the following .ko files based on the site requirements.
#If the IOMMU is available
modprobe vfio-pci
#If the IOMMU is not available and the VFIO supports the no-IOMMU mode
modprobe vfio enable_unsafe_noiommu_mode=1
modprobe vfio-pci
#Other cases
modprobe igb_uio
NOTE: You can check whether the IOMMU is enabled based on the BIOS configuration.
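The driver choice above can be sketched as a small shell helper. This is a minimal sketch and not part of Gazelle: the function name pick_dpdk_driver and the label vfio-pci-noiommu are hypothetical. It assumes the usual Linux sysfs layout, in which a populated /sys/kernel/iommu_groups directory indicates an active IOMMU, and /sys/module/vfio/parameters/enable_unsafe_noiommu_mode exists once a vfio module with no-IOMMU support is loaded.

```shell
#!/bin/sh
# Hypothetical helper: suggest which user-mode driver to load.
# $1 is the sysfs root (defaults to /sys); overriding it allows dry runs.
pick_dpdk_driver() {
    sysfs_root="${1:-/sys}"
    groups_dir="$sysfs_root/kernel/iommu_groups"
    noiommu_param="$sysfs_root/module/vfio/parameters/enable_unsafe_noiommu_mode"

    # A populated iommu_groups directory means the IOMMU is active.
    if [ -d "$groups_dir" ] && [ -n "$(ls -A "$groups_dir" 2>/dev/null)" ]; then
        echo "vfio-pci"
    # The parameter file is visible only after vfio has been loaded and
    # only if the module was built with no-IOMMU support.
    elif [ -e "$noiommu_param" ]; then
        # Label only: the module to load is still vfio-pci, preceded by
        # "modprobe vfio enable_unsafe_noiommu_mode=1".
        echo "vfio-pci-noiommu"
    else
        echo "igb_uio"
    fi
}
```

Running pick_dpdk_driver without arguments inspects the live system. Because the no-IOMMU check depends on the vfio module already being loaded, treat an igb_uio result as a hint rather than a definitive answer.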
2. Binding the NIC Using DPDK
Bind the NIC to the driver selected in Step 1 to provide an interface for the user-mode NIC driver to access the NIC resources.
#Using vfio-pci
dpdk-devbind -b vfio-pci enp3s0
#Using igb_uio
dpdk-devbind -b igb_uio enp3s0
3. Configuring Memory Huge Pages
Gazelle uses hugepage memory to improve efficiency. With root permissions, you can reserve memory huge pages of any size supported by the system. Each memory huge page requires a file descriptor. If the memory is large, you are advised to use 1 GB huge pages to avoid occupying too many file descriptors. Select a page size based on the site requirements and configure sufficient memory huge pages. Run the following commands to configure huge pages:
#Configuring 1024 2 MB huge pages on node0. The total memory is 2 GB.
echo 1024 > /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages
#Configuring 5 1 GB huge pages on node0. The total memory is 5 GB.
echo 5 > /sys/devices/system/node/node0/hugepages/hugepages-1048576kB/nr_hugepages
NOTE: Run the cat command to query the actual number of reserved pages. If contiguous memory is insufficient, the number may be less than expected.
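The arithmetic in the comments above, and the read-back check suggested in the note, can be sketched as follows. The function names hugepage_total_mb and check_hugepages are hypothetical helpers, not Gazelle commands.

```shell
#!/bin/sh
# Total reserved memory in MB for <count> pages of <size_kb> kB each.
# $1: page count, $2: page size in kB (2048 or 1048576 in this guide).
hugepage_total_mb() {
    echo $(( $1 * $2 / 1024 ))
}

# Compare the requested page count with the value the kernel reports.
# $1: requested count, $2: path to an nr_hugepages file.
check_hugepages() {
    actual=$(cat "$2") || return 2
    if [ "$actual" -lt "$1" ]; then
        echo "only $actual of $1 huge pages reserved"
        return 1
    fi
    echo "ok: $actual huge pages reserved"
}
```

For example, check_hugepages 1024 /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages reports whether all 1024 pages requested on node0 were actually reserved.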
4. Mounting Memory Huge Pages
Create two directories for the lstack and ltran processes to access the memory huge pages. Run the following commands:
mkdir -p /mnt/hugepages-ltran
mkdir -p /mnt/hugepages-lstack
chmod -R 700 /mnt/hugepages-ltran
chmod -R 700 /mnt/hugepages-lstack
mount -t hugetlbfs nodev /mnt/hugepages-ltran -o pagesize=2M
mount -t hugetlbfs nodev /mnt/hugepages-lstack -o pagesize=2M
NOTE: The huge pages mounted to /mnt/hugepages-ltran and /mnt/hugepages-lstack must use the same page size.
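One way to verify the same-page-size requirement is to read the pagesize option back from the mount table. The sketch below assumes that hugetlbfs mounts appear in /proc/mounts with a pagesize=... option, which is the usual Linux behavior. The function names are hypothetical.

```shell
#!/bin/sh
# Print the pagesize option of a hugetlbfs mount point.
# $1: mount point, $2: mount table (defaults to /proc/mounts).
hugetlbfs_pagesize() {
    awk -v mp="$1" '
        $2 == mp && $3 == "hugetlbfs" {
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++)
                if (opts[i] ~ /^pagesize=/) {
                    sub(/^pagesize=/, "", opts[i])
                    size = opts[i]
                }
        }
        END { if (size != "") print size; else exit 1 }
    ' "${2:-/proc/mounts}"
}

# Succeed only if both mount points exist and report the same page size.
# $1, $2: mount points, $3: mount table (defaults to /proc/mounts).
same_hugepage_size() {
    a=$(hugetlbfs_pagesize "$1" "${3:-/proc/mounts}") || return 1
    b=$(hugetlbfs_pagesize "$2" "${3:-/proc/mounts}") || return 1
    [ "$a" = "$b" ]
}
```

A typical call is same_hugepage_size /mnt/hugepages-ltran /mnt/hugepages-lstack, which returns a non-zero status if either directory is not mounted or the page sizes differ.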
5. Enabling Gazelle for an Application
Enable Gazelle for an application using either of the following methods as required.
- Recompile the application and replace the sockets interface.
#Add the Makefile of Gazelle to the application makefile.
-include /etc/gazelle/lstack.Makefile
#Add the LSTACK_LIBS variable when compiling the source code.
gcc test.c -o test ${LSTACK_LIBS}
- Use the LD_PRELOAD environment variable to load the Gazelle library. Use the GAZELLE_BIND_PROCNAME environment variable to specify the process name, and LD_PRELOAD to specify the Gazelle library path.
GAZELLE_BIND_PROCNAME=test LD_PRELOAD=/usr/lib64/liblstack.so ./test
6. Configuring Gazelle
- The lstack.conf file is used to specify the startup parameters of lstack. The default path is /etc/gazelle/lstack.conf. The parameters in the configuration file are as follows:
Options | Value | Remarks |
---|---|---|
dpdk_args | --socket-mem (mandatory) --huge-dir (mandatory) --proc-type (mandatory) --legacy-mem --map-perfect -d | DPDK initialization parameter. For details, see the DPDK description. --map-perfect is an extended feature. It is used to prevent the DPDK from occupying excessive address space and ensure that extra address space is available for lstack. The -d option is used to load the specified .so library file. |
listen_shadow | 0/1 | Whether to use the shadow file descriptor for listening. This function is enabled when there is a single listen thread and multiple protocol stack threads. |
use_ltran | 0/1 | Whether to use ltran. |
num_cpus | "0,2,4 ..." | IDs of the CPUs bound to the lstack threads. The number of IDs is the number of lstack threads (less than or equal to the number of NIC queues). You can select CPUs by NUMA nodes. |
low_power_mode | 0/1 | Whether to enable the low-power mode. This parameter is not supported currently. |
kni_switch | 0/1 | Whether to enable the rte_kni module. The default value is 0. This module can be enabled only when ltran is not used. |
unix_prefix | "string" | Prefix string of the Unix socket file used for communication between Gazelle processes. By default, this parameter is left blank. The value must be the same as the value of unix_prefix in ltran.conf of the ltran process that participates in communication, or the value of the -u option for gazellectl. The value cannot contain special characters and can contain a maximum of 128 characters. |
host_addr | "192.168.xx.xx" | IP address of the protocol stack, which is also the IP address of the application. |
mask_addr | "255.255.xx.xx" | Subnet mask. |
gateway_addr | "192.168.xx.1" | Gateway address. |
devices | "aa:bb:cc:dd:ee:ff" | MAC address for NIC communication. The value must be the same as that of bond_macs in the ltran.conf file. |
app_bind_numa | 0/1 | Whether to bind the epoll and poll threads of an application to the NUMA node where the protocol stack is located. The default value is 1, indicating that the threads are bound. |
send_connect_number | 4 | Number of connections for sending packets in each protocol stack loop. The value is a positive integer. |
read_connect_number | 4 | Number of connections for receiving packets in each protocol stack loop. The value is a positive integer. |
rpc_number | 4 | Number of RPC messages processed in each protocol stack loop. The value is a positive integer. |
nic_read_num | 128 | Number of data packets read from the NIC in each protocol stack cycle. The value is a positive integer. |
mbuf_pool_size | 1024000 | Size of the mbuf address pool applied for during initialization. Set this parameter based on the NIC configuration. The value must be a positive integer less than 5120000 and not too small, otherwise the startup fails. |
lstack.conf example:
dpdk_args=["--socket-mem", "2048,0,0,0", "--huge-dir", "/mnt/hugepages-lstack", "--proc-type", "primary", "--legacy-mem", "--map-perfect"]
use_ltran=1
kni_switch=0
low_power_mode=0
num_cpus="2,22"
host_addr="192.168.1.10"
mask_addr="255.255.255.0"
gateway_addr="192.168.1.1"
devices="aa:bb:cc:dd:ee:ff"
send_connect_number=4
read_connect_number=4
rpc_number=4
nic_read_num=128
mbuf_pool_size=1024000
- The ltran.conf file is used to specify ltran startup parameters. The default path is /etc/gazelle/ltran.conf. To enable ltran, set use_ltran=1 in the lstack.conf file. The configuration parameters are as follows:
Options | Value | Remarks |
---|---|---|
forward_kit | "dpdk" | Specified transceiver module of an NIC. This field is reserved and is not used currently. |
forward_kit_args | -l --socket-mem (mandatory) --huge-dir (mandatory) --proc-type (mandatory) --legacy-mem (mandatory) --map-perfect (mandatory) -d | DPDK initialization parameter. For details, see the DPDK description. --map-perfect is an extended feature. It is used to prevent the DPDK from occupying excessive address space and ensure that extra address space is available for lstack. The -d option is used to load the specified .so library file. |
kni_switch | 0/1 | Whether to enable the rte_kni module. The default value is 0. |
unix_prefix | "string" | Prefix string of the Unix socket file used for communication between Gazelle processes. By default, this parameter is left blank. The value must be the same as the value of unix_prefix in lstack.conf of the lstack process that participates in communication, or the value of the -u option for gazellectl. |
dispatch_max_clients | n | Maximum number of clients supported by ltran. The total number of lstack protocol stack threads cannot exceed 32. |
dispatch_subnet | 192.168.xx.xx | Subnet address, that is, the subnet segment of the IP addresses that can be identified by ltran. The value is an example. Set the subnet based on the site requirements. |
dispatch_subnet_length | n | Length of the subnet that can be identified by ltran. For example, if the value of length is 4, the value ranges from 192.168.1.1 to 192.168.1.16. |
bond_mode | n | Bond mode. Currently, only Active Backup (Mode 1) is supported. The value is 1. |
bond_miimon | n | Bond link monitoring time. The unit is millisecond. The value ranges from 1 to 2^64 - 1 - (1000 x 1000). |
bond_ports | "0x01" | DPDK NIC to be used. The value 0x01 indicates the first NIC. |
bond_macs | "aa:bb:cc:dd:ee:ff" | MAC address of the bound NIC, which must be the same as the MAC address of the KNI. |
bond_mtu | n | Maximum transmission unit. The default and maximum value is 1500. The minimum value is 68. |
ltran.conf example:
forward_kit_args="-l 0,1 --socket-mem 1024,0,0,0 --huge-dir /mnt/hugepages-ltran --proc-type primary --legacy-mem --map-perfect --syslog daemon"
forward_kit="dpdk"
kni_switch=0
dispatch_max_clients=30
dispatch_subnet="192.168.1.0"
dispatch_subnet_length=8
bond_mode=1
bond_mtu=1500
bond_miimon=100
bond_macs="aa:bb:cc:dd:ee:ff"
bond_ports="0x1"
tcp_conn_scan_interval=10
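Several constraints in the two tables span both files: devices must match bond_macs, unix_prefix must match on both sides, and mbuf_pool_size must be less than 5120000. The following sketch checks them for the simple key=value layout used in the examples above. It is not a full parser of the configuration syntax, the function names are hypothetical, and the MAC comparison is a plain case-sensitive string comparison.

```shell
#!/bin/sh
# Print the value of a top-level key=value setting, without quotes.
# $1: key, $2: configuration file.
conf_value() {
    awk -F= -v key="$1" '
        { k = $1; gsub(/[ \t]/, "", k) }
        k == key {
            v = substr($0, index($0, "=") + 1)
            gsub(/^[ \t"]+|[ \t"]+$/, "", v)
            print v
            exit
        }
    ' "$2"
}

# Number of lstack threads implied by num_cpus (one thread per CPU ID).
lstack_thread_count() {
    conf_value num_cpus "$1" | awk -F, '{ print NF }'
}

# Check the cross-file constraints. $1: lstack.conf, $2: ltran.conf.
check_gazelle_confs() {
    lstack_conf="$1"; ltran_conf="$2"; rc=0
    if [ "$(conf_value devices "$lstack_conf")" != "$(conf_value bond_macs "$ltran_conf")" ]; then
        echo "devices and bond_macs differ"; rc=1
    fi
    if [ "$(conf_value unix_prefix "$lstack_conf")" != "$(conf_value unix_prefix "$ltran_conf")" ]; then
        echo "unix_prefix values differ"; rc=1
    fi
    pool=$(conf_value mbuf_pool_size "$lstack_conf")
    if [ -n "$pool" ] && [ "$pool" -ge 5120000 ]; then
        echo "mbuf_pool_size must be less than 5120000"; rc=1
    fi
    return $rc
}
```

For example, check_gazelle_confs /etc/gazelle/lstack.conf /etc/gazelle/ltran.conf prints one line per violated constraint and returns a non-zero status if any is found. The result of lstack_thread_count can be compared with the number of NIC queues by hand.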
7. Starting an Application
- Start the ltran process.
If there is only one process and the NIC supports multiple queues, the NIC multi-queue is used to distribute packets to each thread. You do not need to start the ltran process. Set the value of use_ltran in the lstack.conf file to 0.
If you do not use --config-file to specify a configuration file when starting ltran, the default configuration file path /etc/gazelle/ltran.conf is used.
ltran --config-file ./ltran.conf
- Start the application. If the environment variable LSTACK_CONF_PATH is not used to specify the configuration file before the application is started, the default configuration file path /etc/gazelle/lstack.conf is used.
export LSTACK_CONF_PATH=./lstack.conf
LD_PRELOAD=/usr/lib64/liblstack.so GAZELLE_BIND_PROCNAME=redis-server redis-server redis.conf
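The configuration lookup and preload steps above can be wrapped in a small launcher. This is a sketch under stated assumptions: the function names are hypothetical, the library path is the one used in this guide, and the process name passed in GAZELLE_BIND_PROCNAME is assumed to be the base name of the program being started.

```shell
#!/bin/sh
# Print the lstack configuration file an application would read:
# LSTACK_CONF_PATH if set, otherwise the default path.
lstack_conf_path() {
    echo "${LSTACK_CONF_PATH:-/etc/gazelle/lstack.conf}"
}

# Hypothetical wrapper: start a program with the Gazelle library preloaded.
# Usage: run_with_gazelle <program> [arguments...]
run_with_gazelle() {
    conf=$(lstack_conf_path)
    # Fail early if the configuration file cannot be read.
    [ -r "$conf" ] || { echo "cannot read $conf" >&2; return 1; }
    GAZELLE_BIND_PROCNAME=$(basename "$1") \
        LD_PRELOAD=/usr/lib64/liblstack.so "$@"
}
```

With these helpers, run_with_gazelle redis-server redis.conf is intended to behave like the redis-server command above. Export LSTACK_CONF_PATH before calling it so that the started program sees the same path that the wrapper checked.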
8. APIs
Gazelle wraps the POSIX interfaces of the application. The code of the application does not need to be modified.
9. Commissioning Commands
- If the ltran mode is not used, the gazellectl ltran xxx and gazellectl lstack show {ip | pid} -r commands are not supported.
Usage: gazellectl [-h | help]
or: gazellectl ltran {quit | show | set} [LTRAN_OPTIONS] [time] [-u UNIX_PREFIX]
or: gazellectl lstack {show | set} {ip | pid} [LSTACK_OPTIONS] [time] [-u UNIX_PREFIX]
quit ltran process exit
where LTRAN_OPTIONS :=
show ltran all statistics
-r, rate show ltran statistics per second
-i, instance show ltran instance register info
-b, burst show ltran NIC packet len per second
-l, latency show ltran latency
set:
loglevel {error | info | debug} set ltran loglevel
where LSTACK_OPTIONS :=
show lstack all statistics
-r, rate show lstack statistics per second
-s, snmp show lstack snmp
-c, connect show lstack connect
-l, latency show lstack latency
set:
loglevel {error | info | debug} set lstack loglevel
lowpower {0 | 1} set lowpower enable
[time] measure latency time default 1S
The -u option specifies the prefix of the Unix socket for communication between Gazelle processes. The value of this parameter must be the same as that of unix_prefix in the ltran.conf or lstack.conf file.
Packet Capturing Tool
The NIC used by Gazelle is managed by DPDK. Therefore, tcpdump cannot capture Gazelle packets. As a substitute, Gazelle uses gazelle-pdump provided in the dpdk-tools software package as the packet capturing tool. gazelle-pdump uses the multi-process mode of DPDK to share memory with the lstack or ltran process. In ltran mode, gazelle-pdump can capture only ltran packets that directly communicate with the NIC. Using tcpdump packet filtering, gazelle-pdump can capture only the packets of a specific lstack process.
10. Precautions
Location of the DPDK Configuration File
For the root user, the configuration file is stored in the /var/run/dpdk directory after the DPDK is started. For a non-root user, the path of the DPDK configuration file is determined by the environment variable XDG_RUNTIME_DIR.
- If XDG_RUNTIME_DIR is not set, the DPDK configuration file is stored in /tmp/dpdk.
- If XDG_RUNTIME_DIR is set, the DPDK configuration file is stored in the path specified by XDG_RUNTIME_DIR.
- Note that XDG_RUNTIME_DIR is set by default on some servers.
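The lookup rules above can be written out as a short function. The name dpdk_config_dir is hypothetical, and the function only mirrors the three rules as stated here. Depending on the DPDK version, the files may be placed in a dpdk subdirectory beneath the XDG_RUNTIME_DIR path, so check there as well.

```shell
#!/bin/sh
# Print the directory expected to hold the DPDK configuration file.
# $1: numeric user ID, $2: value of XDG_RUNTIME_DIR (may be empty).
dpdk_config_dir() {
    if [ "$1" -eq 0 ]; then
        # root user
        echo "/var/run/dpdk"
    elif [ -n "$2" ]; then
        # non-root user with XDG_RUNTIME_DIR set
        echo "$2"
    else
        # non-root user without XDG_RUNTIME_DIR
        echo "/tmp/dpdk"
    fi
}
```

To query the current session, run dpdk_config_dir "$(id -u)" "${XDG_RUNTIME_DIR:-}".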
Impact on Gazelle Performance by the Retbleed Vulnerability Patch
- The patch to fix the Retbleed vulnerability is merged in kernel 5.10.0-60.57.0.85. This patch impacts the performance of Gazelle in x86 environments. You can add retbleed=off mitigations=off to the boot parameters to disable the patch and prevent the performance impact based on your security requirements. By default, the patch is enabled for security.
- In the test scenario, 1024 KB of data is sent from kernel space to user space through ltran. The performance decreases from 17,000 Mb/s to 5,000 Mb/s.
- openEuler 22.03 LTS and its SP versions (kernel version 5.10.0-60.57.0.85 or later) are affected.
- For details, see https://gitee.com/openeuler/kernel/pulls/110.
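To see whether the mitigation is active on a given host, the kernel command line and the vulnerability entry in sysfs can be inspected. The sketch below assumes that the kernel exposes /sys/devices/system/cpu/vulnerabilities/retbleed, as kernels carrying the Retbleed patch normally do. The function name retbleed_state is hypothetical.

```shell
#!/bin/sh
# Report whether the Retbleed mitigation appears to be active.
# $1: sysfs status file, $2: kernel command line file (both overridable).
retbleed_state() {
    status_file="${1:-/sys/devices/system/cpu/vulnerabilities/retbleed}"
    cmdline_file="${2:-/proc/cmdline}"
    # Either boot parameter named above switches the mitigation off.
    if grep -qw -e 'retbleed=off' -e 'mitigations=off' "$cmdline_file" 2>/dev/null; then
        echo "disabled on the kernel command line"
    elif [ -r "$status_file" ]; then
        echo "kernel reports: $(cat "$status_file")"
    else
        echo "unknown (no retbleed entry in sysfs)"
    fi
}
```

Calling retbleed_state without arguments inspects the running system. Whether to disable the mitigation remains a security decision for the site.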
Restrictions
Restrictions of Gazelle are as follows:
Function Restrictions
- Blocking accept() or connect() is not supported.
- A maximum of 1500 TCP connections are supported.
- Currently, only TCP, UDP, IGMPv2, ICMP, ARP, and IPv4 are supported.
- When a peer end pings Gazelle, the specified packet length must be less than or equal to 14,000 bytes.
- Transparent huge pages are not supported.
- ltran does not support the hybrid bonding of multiple types of NICs.
- The active/standby mode (bond1 mode) of ltran supports active/standby switchover only when a fault occurs at the link layer (for example, the network cable is disconnected), but does not support active/standby switchover when a fault occurs at the physical layer (for example, the NIC is powered off or removed).
- VM NICs do not support multiple queues.
- KNI must be enabled with UDP unless the NIC driver (such as mlx5) supports user mode and kernel mode at the same time.
Operation Restrictions
- By default, the command lines and configuration files provided by Gazelle require root permissions. Privilege escalation and changing of file owner are required for non-root users.
- To bind the NIC from user-mode driver back to the kernel driver, you must exit Gazelle first.
- Memory huge pages cannot be remounted to subdirectories created in the mount point.
- The minimum huge page memory required by ltran is 1 GB.
- The minimum hugepage memory of each application instance protocol stack thread is 800 MB.
- Gazelle supports only 64-bit OSs.
- The -march=native option is used when building the x86 version of Gazelle to optimize Gazelle based on the CPU instruction set of the build environment (Intel® Xeon® Gold 5118 CPU @ 2.30GHz). Therefore, the CPU of the operating environment must support the SSE4.2, AVX, AVX2, and AVX-512 instruction set extensions.
- The maximum number of IP fragments is 10 (the maximum ping packet length is 14,790 bytes). TCP does not use IP fragments.
- You are advised to set the rp_filter parameter of the NIC to 1 using the sysctl command. Otherwise, the Gazelle protocol stack may not be used as expected. Instead, the kernel protocol stack is used.
- If ltran is not used, the KNI cannot be configured to be used only for local communication. In addition, you need to configure NetworkManager not to manage the KNI network adapter before starting Gazelle.
- The IP address and MAC address of the virtual KNI must be the same as those in the lstack.conf file.
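The instruction set requirement for x86 hosts in the list above can be checked against /proc/cpuinfo before deployment. In this sketch the function name missing_cpu_flags is hypothetical, and the avx512f flag (AVX-512 Foundation) is used as a stand-in for AVX-512 support in general, which is an assumption. A build made with -march=native may rely on further AVX-512 subsets.

```shell
#!/bin/sh
# List the required x86 CPU flags that are missing, one per line.
# $1: cpuinfo file (defaults to /proc/cpuinfo). Prints nothing if all are present.
missing_cpu_flags() {
    # Take the flag list of the first CPU entry.
    flags=$(grep -m 1 '^flags' "${1:-/proc/cpuinfo}" | cut -d: -f2)
    for f in sse4_2 avx avx2 avx512f; do
        case " $flags " in
            *" $f "*) ;;               # flag present
            *) printf '%s\n' "$f" ;;   # flag missing
        esac
    done
}
```

An empty result from missing_cpu_flags suggests that the host meets the listed requirement. Any printed flag names an extension that the CPU does not report.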
Precautions
You need to evaluate the use of Gazelle based on application scenarios.
Shared Memory
- Current situation: The memory huge pages are mounted to the /mnt/hugepages-lstack directory. During process initialization, files are created in the /mnt/hugepages-lstack directory. Each file corresponds to a huge page and is mapped using mmap. After receiving the registration information of lstack, ltran also maps the files in that directory using mmap, based on the huge page memory configuration in the registration information, to implement shared huge page memory. The same procedure applies to the files in the /mnt/hugepages-ltran directory.
- Current mitigation measures: The huge page file permission is 600, so only the owner can access the files. The default owner is the root user, and other users can be configured. Huge page files are locked by DPDK and cannot be directly written or mapped.
- Caution: Malicious processes belonging to the same user can imitate the DPDK implementation logic to share huge page memory through the huge page files and perform write operations that corrupt the huge page memory, causing the Gazelle program to crash. It is recommended that all processes of a user belong to the same trust domain.
Traffic Limit
Gazelle does not limit the traffic. Users can send packets at the maximum NIC line rate to the network, which may congest the network.
Process Spoofing
If two lstack processes A and B are legitimately registered with ltran, A can impersonate B to send spoofing messages to ltran and modify the ltran forwarding control information. As a result, the communication of B becomes abnormal, and information leakage occurs when packets for B are sent to A. Ensure that all lstack processes are trusted.