Rubik Configuration Description
Rubik is written in Go and compiled into a static executable to minimize coupling with the host system.
Commands
Rubik supports only one option, -v, which prints version information. The following is an example of the version query output:
$ ./rubik -v
Version: 2.0.0
Release: 3.oe2203sp2
Go Version: go1.18.8
Git Commit: bcaace8
Built: 2023-03-30
OS/Arch: linux/amd64
Configuration
When the Rubik binary is executed, Rubik parses the configuration file /var/lib/rubik/config.json.
To avoid confusion, specifying a custom configuration file path is currently not supported. When Rubik runs as a DaemonSet in a Kubernetes cluster, configure it by modifying the ConfigMap in the hack/rubik-daemonset.yaml file.
The configuration file is in JSON format and keys are in lower camel case.
An example configuration file is as follows:
{
  "agent": {
    "logDriver": "stdio",
    "logDir": "/var/log/rubik",
    "logSize": 2048,
    "logLevel": "info",
    "cgroupRoot": "/sys/fs/cgroup",
    "enabledFeatures": [
      "preemption",
      "dynCache",
      "ioLimit",
      "ioCost",
      "quotaBurst",
      "quotaTurbo",
      "psi"
    ]
  },
  "preemption": {
    "resource": [
      "cpu",
      "memory"
    ]
  },
  "quotaTurbo": {
    "highWaterMark": 50,
    "syncInterval": 100
  },
  "dynCache": {
    "defaultLimitMode": "static",
    "adjustInterval": 1000,
    "perfDuration": 1000,
    "l3Percent": {
      "low": 20,
      "mid": 30,
      "high": 50
    },
    "memBandPercent": {
      "low": 10,
      "mid": 30,
      "high": 50
    }
  },
  "ioCost": [
    {
      "nodeName": "k8s-single",
      "config": [
        {
          "dev": "sdb",
          "enable": true,
          "model": "linear",
          "param": {
            "rbps": 10000000,
            "rseqiops": 10000000,
            "rrandiops": 10000000,
            "wbps": 10000000,
            "wseqiops": 10000000,
            "wrandiops": 10000000
          }
        }
      ]
    }
  ],
  "psi": {
    "interval": 10,
    "resource": [
      "cpu",
      "memory",
      "io"
    ],
    "avg10Threshold": 5.0
  }
}
Rubik configuration items are divided into common items and feature items. Common items are under the agent section and apply globally. Feature items apply to the individual features that are enabled through the enabledFeatures field under agent.
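The split between common items and feature items can be illustrated with a minimal Go sketch. It is not Rubik's actual loader: the AgentConfig type and the parseConfig function are hypothetical names, and the sketch assumes that keys omitted from the file keep the default values listed in the agent table. Only the JSON keys are taken from this document.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// AgentConfig mirrors the keys of the "agent" section.
// The type and field names are hypothetical; only the JSON keys come from the document.
type AgentConfig struct {
	LogDriver       string   `json:"logDriver"`
	LogDir          string   `json:"logDir"`
	LogSize         int      `json:"logSize"`
	LogLevel        string   `json:"logLevel"`
	CgroupRoot      string   `json:"cgroupRoot"`
	EnabledFeatures []string `json:"enabledFeatures"`
}

// parseConfig separates the common "agent" section from the feature sections.
// Feature sections are returned undecoded, keyed by their top-level name.
func parseConfig(data []byte) (AgentConfig, map[string]json.RawMessage, error) {
	// Assumption: omitted keys fall back to the documented defaults.
	agent := AgentConfig{
		LogDriver:  "stdio",
		LogDir:     "/var/log/rubik",
		LogSize:    1024,
		LogLevel:   "info",
		CgroupRoot: "/sys/fs/cgroup",
	}
	sections := map[string]json.RawMessage{}
	if err := json.Unmarshal(data, &sections); err != nil {
		return agent, nil, err
	}
	if raw, ok := sections["agent"]; ok {
		// Unmarshal only overwrites the fields that are present in the JSON.
		if err := json.Unmarshal(raw, &agent); err != nil {
			return agent, nil, err
		}
		delete(sections, "agent")
	}
	return agent, sections, nil
}

func main() {
	data := []byte(`{
		"agent": {"logLevel": "debug", "enabledFeatures": ["quotaTurbo", "psi"]},
		"quotaTurbo": {"highWaterMark": 50, "syncInterval": 100}
	}`)
	agent, sections, err := parseConfig(data)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println("log level:", agent.LogLevel, "log size:", agent.LogSize)
	for _, name := range agent.EnabledFeatures {
		_, ok := sections[name]
		fmt.Printf("feature %s: has its own section = %v\n", name, ok)
	}
}
```

Keeping the feature sections as raw JSON lets each feature decode its own section later, which matches the way the document describes one section per feature.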
agent
The agent section stores common configuration items related to Rubik running, such as log configurations and cgroup mount points.
Key[=Default Value] | Type | Description | Example Value |
---|---|---|---|
logDriver=stdio | string | Log driver, which can be standard I/O or a file | stdio, file |
logDir=/var/log/rubik | string | Log directory | Any readable and writable directory |
logSize=1024 | int | Total size of logs in MB when logDriver=file | [10, $2^{20}$] |
logLevel=info | string | Log level | debug, info, warn, error |
cgroupRoot=/sys/fs/cgroup | string | Mount point of the system cgroup | Mount point of the system cgroup |
enabledFeatures=[] | string array | List of Rubik features to be enabled | Rubik features. See Feature Introduction for details. |
preemption
The preemption field stores configuration items of the absolute preemption feature, including CPU and memory preemption. You can configure this field to use either or both of CPU and memory preemption.
Key[=Default Value] | Type | Description | Example Value |
---|---|---|---|
resource=[] | string array | Resource types for which preemption is enabled | cpu, memory |
dynCache
The dynCache field stores configuration items related to pod memory bandwidth and last-level cache (LLC) limits. l3Percent sets the watermark of each LLC level, and memBandPercent sets the watermark of each memory bandwidth level. Both are expressed as percentages.
Key[=Default Value] | Type | Description | Example Value |
---|---|---|---|
defaultLimitMode=static | string | dynCache control mode | static, dynamic |
adjustInterval=1000 | int | Interval for dynCache control, in milliseconds | [10, 10000] |
perfDuration=1000 | int | perf execution duration for dynCache, in milliseconds | [10, 10000] |
l3Percent | map | Watermarks of each L3 cache level of dynCache, as percentages | |
.low=20 | int | Watermark of the low L3 cache level | [10, 100] |
.mid=30 | int | Watermark of the middle L3 cache level | [low, 100] |
.high=50 | int | Watermark of the high L3 cache level | [mid, 100] |
memBandPercent | map | Watermarks of each memory bandwidth level of dynCache, as percentages | |
.low=10 | int | Watermark of the low memory bandwidth level | [10, 100] |
.mid=30 | int | Watermark of the middle memory bandwidth level | [low, 100] |
.high=50 | int | Watermark of the high memory bandwidth level | [mid, 100] |
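
The value ranges in the table imply an ordering constraint: low must be at least 10, and mid and high must not fall below the level before them. The following Go sketch checks that constraint for either l3Percent or memBandPercent. The Watermarks type and its validate method are hypothetical names used for illustration, not part of Rubik.

```go
package main

import "fmt"

// Watermarks holds the low, mid, and high levels of l3Percent or memBandPercent.
// Hypothetical type; only the JSON keys come from the document.
type Watermarks struct {
	Low  int `json:"low"`
	Mid  int `json:"mid"`
	High int `json:"high"`
}

// validate enforces the documented ranges:
// low in [10, 100], mid in [low, 100], high in [mid, 100].
func (w Watermarks) validate() error {
	if w.Low < 10 || w.Low > 100 {
		return fmt.Errorf("low=%d is outside [10, 100]", w.Low)
	}
	if w.Mid < w.Low || w.Mid > 100 {
		return fmt.Errorf("mid=%d is outside [%d, 100]", w.Mid, w.Low)
	}
	if w.High < w.Mid || w.High > 100 {
		return fmt.Errorf("high=%d is outside [%d, 100]", w.High, w.Mid)
	}
	return nil
}

func main() {
	defaults := Watermarks{Low: 20, Mid: 30, High: 50} // documented l3Percent defaults
	fmt.Println("defaults:", defaults.validate())

	inverted := Watermarks{Low: 40, Mid: 30, High: 50} // mid below low
	fmt.Println("inverted:", inverted.validate())
}
```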
quotaTurbo
The quotaTurbo field stores configuration items of the user-mode elastic traffic limiting feature.
Key[=Default Value] | Type | Description | Example Value |
---|---|---|---|
highWaterMark=60 | int | High watermark of CPU load | [0, alarmWaterMark) |
alarmWaterMark=80 | int | Alarm watermark of CPU load | (highWaterMark, 100] |
syncInterval=100 | int | Interval for triggering container quota updates, in milliseconds | [100, 10000] |
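
The example configuration file sets highWaterMark and syncInterval but omits alarmWaterMark. The Go sketch below shows one way such a section could be loaded, assuming that an omitted key keeps the default from the table, and then checked against the documented ranges. QuotaTurboConfig and loadQuotaTurbo are hypothetical names, not Rubik's API.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// QuotaTurboConfig mirrors the keys of the "quotaTurbo" section (hypothetical type).
type QuotaTurboConfig struct {
	HighWaterMark  int `json:"highWaterMark"`
	AlarmWaterMark int `json:"alarmWaterMark"`
	SyncInterval   int `json:"syncInterval"`
}

// loadQuotaTurbo decodes the section over the documented defaults and
// validates the result against the documented value ranges.
func loadQuotaTurbo(raw []byte) (QuotaTurboConfig, error) {
	// Assumption: keys missing from the file keep these defaults.
	cfg := QuotaTurboConfig{HighWaterMark: 60, AlarmWaterMark: 80, SyncInterval: 100}
	if err := json.Unmarshal(raw, &cfg); err != nil {
		return cfg, err
	}
	if cfg.AlarmWaterMark > 100 {
		return cfg, fmt.Errorf("alarmWaterMark=%d exceeds 100", cfg.AlarmWaterMark)
	}
	// highWaterMark must lie in [0, alarmWaterMark).
	if cfg.HighWaterMark < 0 || cfg.HighWaterMark >= cfg.AlarmWaterMark {
		return cfg, fmt.Errorf("highWaterMark=%d is outside [0, %d)", cfg.HighWaterMark, cfg.AlarmWaterMark)
	}
	if cfg.SyncInterval < 100 || cfg.SyncInterval > 10000 {
		return cfg, fmt.Errorf("syncInterval=%d is outside [100, 10000]", cfg.SyncInterval)
	}
	return cfg, nil
}

func main() {
	cfg, err := loadQuotaTurbo([]byte(`{"highWaterMark": 50, "syncInterval": 100}`))
	fmt.Printf("%+v err=%v\n", cfg, err)

	// highWaterMark above the default alarmWaterMark of 80 is rejected.
	_, err = loadQuotaTurbo([]byte(`{"highWaterMark": 90}`))
	fmt.Println("err:", err)
}
```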
ioCost
The ioCost field stores configuration items of the iocost-based I/O weight control feature. The field is an array in which each element pairs a node name (nodeName) with an array of device configurations (config) for that node.
Key | Type | Description | Example Value |
---|---|---|---|
nodeName | string | Node name | Kubernetes cluster node name |
config | array | Block device configurations for the node | / |
Each element of config describes one block device and supports the following parameters:
Key[=Default Value] | Type | Description | Example Value |
---|---|---|---|
dev | string | Physical block device name | / |
model | string | iocost model | linear |
param | / | Device parameters specific to the model | / |
For the linear model, the param field supports the following parameters:
Key[=Default Value] | Type | Description | Example Value |
---|---|---|---|
rbps | int64 | Maximum read bandwidth, in bytes per second | (0, $2^{63}$) |
rseqiops | int64 | Maximum sequential read IOPS | (0, $2^{63}$) |
rrandiops | int64 | Maximum random read IOPS | (0, $2^{63}$) |
wbps | int64 | Maximum write bandwidth, in bytes per second | (0, $2^{63}$) |
wseqiops | int64 | Maximum sequential write IOPS | (0, $2^{63}$) |
wrandiops | int64 | Maximum random write IOPS | (0, $2^{63}$) |
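
These six parameters have the same names as the linear model parameters of the Linux kernel's io.cost.model interface in cgroup v2, which takes one line per device in the form "MAJ:MIN ctrl=user model=linear rbps=...". The Go sketch below builds such a line from a param object. Whether Rubik formats the line in exactly this way is an assumption; LinearParam, validate, and modelLine are hypothetical names, and the device number 8:16 is only an illustrative placeholder.

```go
package main

import "fmt"

// LinearParam mirrors the keys of the "param" object for the linear model (hypothetical type).
type LinearParam struct {
	Rbps      int64 `json:"rbps"`
	Rseqiops  int64 `json:"rseqiops"`
	Rrandiops int64 `json:"rrandiops"`
	Wbps      int64 `json:"wbps"`
	Wseqiops  int64 `json:"wseqiops"`
	Wrandiops int64 `json:"wrandiops"`
}

// validate checks the documented range (0, 2^63). The upper bound is
// guaranteed by the int64 type, so only positivity needs to be tested.
func (p LinearParam) validate() error {
	values := map[string]int64{
		"rbps": p.Rbps, "rseqiops": p.Rseqiops, "rrandiops": p.Rrandiops,
		"wbps": p.Wbps, "wseqiops": p.Wseqiops, "wrandiops": p.Wrandiops,
	}
	for name, v := range values {
		if v <= 0 {
			return fmt.Errorf("%s=%d must be greater than 0", name, v)
		}
	}
	return nil
}

// modelLine formats the parameters in the syntax accepted by io.cost.model.
// devNo is the device's "major:minor" number, supplied by the caller.
func modelLine(devNo string, p LinearParam) string {
	return fmt.Sprintf(
		"%s ctrl=user model=linear rbps=%d rseqiops=%d rrandiops=%d wbps=%d wseqiops=%d wrandiops=%d",
		devNo, p.Rbps, p.Rseqiops, p.Rrandiops, p.Wbps, p.Wseqiops, p.Wrandiops)
}

func main() {
	p := LinearParam{
		Rbps: 10000000, Rseqiops: 10000000, Rrandiops: 10000000,
		Wbps: 10000000, Wseqiops: 10000000, Wrandiops: 10000000,
	}
	if err := p.validate(); err != nil {
		fmt.Println("invalid param:", err)
		return
	}
	fmt.Println(modelLine("8:16", p)) // 8:16 is a placeholder device number
}
```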
psi
The psi field stores configuration items of the PSI-based interference detection feature. This feature can monitor CPU, memory, and I/O resources. You can configure this field to monitor the PSI of any or all of these resources.
Key[=Default Value] | Type | Description | Example Value |
---|---|---|---|
interval=10 | int | Interval for PSI monitoring, in seconds | [10, 30] |
resource=[] | string array | Resource types to be monitored | cpu, memory, io |
avg10Threshold=5.0 | float | Average percentage of time that a job is blocked over 10 seconds. If this threshold is reached, offline services are evicted. | [5.0, 100] |
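
For context, the kernel exposes PSI data in cgroup v2 files such as cpu.pressure, memory.pressure, and io.pressure, where each line has the form "some avg10=0.00 avg60=0.00 avg300=0.00 total=0". The Go sketch below extracts avg10 from such content and compares it with avg10Threshold. It is an illustration only: the function names are hypothetical, and the choice of the "some" line is an assumption, since this document does not say which line Rubik reads.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// avg10 returns the avg10 value from the pressure-file line that starts
// with kind ("some" or "full").
func avg10(content, kind string) (float64, error) {
	for _, line := range strings.Split(content, "\n") {
		fields := strings.Fields(line)
		if len(fields) == 0 || fields[0] != kind {
			continue
		}
		for _, f := range fields[1:] {
			if strings.HasPrefix(f, "avg10=") {
				return strconv.ParseFloat(strings.TrimPrefix(f, "avg10="), 64)
			}
		}
	}
	return 0, fmt.Errorf("no %q line with an avg10 field found", kind)
}

// reachesThreshold reports whether the "some" avg10 value has reached the threshold.
// Assumption: the "some" line is the one compared against avg10Threshold.
func reachesThreshold(content string, threshold float64) (bool, error) {
	v, err := avg10(content, "some")
	if err != nil {
		return false, err
	}
	return v >= threshold, nil
}

func main() {
	// Sample content in the kernel's pressure-file format; the numbers are made up.
	sample := "some avg10=6.25 avg60=1.00 avg300=0.10 total=12345\n" +
		"full avg10=0.50 avg60=0.00 avg300=0.00 total=100\n"
	hit, err := reachesThreshold(sample, 5.0)
	fmt.Println("threshold reached:", hit, "err:", err)
}
```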