Deploying aops-agent
1. Environment Requirements
One host running on openEuler 20.03 or later
2. Configuration Environment Deployment
2.1 Disabling the Firewall
systemctl stop firewalld
systemctl disable firewalld
systemctl status firewalld
2.2 Deploying aops-agent
Run `yum install aops-agent` to install aops-agent from the Yum source.
Modify the configuration file. Change the value of ip in the agent section to the IP address of the local host.
```shell
vim /etc/aops/agent.conf
```
The following uses 192.168.1.47 as an example.
```ini
[agent]
;IP address and port number bound when the aops-agent is started.
ip=192.168.1.47
port=12000

[gopher]
;Default path of the gala-gopher configuration file. If you need to change the path, ensure that the file path is correct.
config_path=/opt/gala-gopher/gala-gopher.conf

;aops-agent log collection configuration
[log]
;Level of the logs to be collected, which can be set to DEBUG, INFO, WARNING, ERROR, or CRITICAL
log_level=INFO
;Location for storing collected logs
log_dir=/var/log/aops
;Maximum size of a log file
max_bytes=31457280
;Number of backup logs
backup_count=40
```
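Before starting the service, it can help to confirm that the edited file still parses and that the bind address, port, and log level are sane. The following is a minimal sketch using Python's standard configparser module. The helper name `check_agent_conf` and the inline sample text are illustrative assumptions and are not part of aops-agent.

```python
import configparser
import ipaddress

# Sample text mirroring the example above; on a real host, read
# /etc/aops/agent.conf instead.
SAMPLE_CONF = """\
[agent]
;IP address and port number bound when the aops-agent is started.
ip=192.168.1.47
port=12000

[log]
log_level=INFO
max_bytes=31457280
backup_count=40
"""

LOG_LEVELS = {"DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"}


def check_agent_conf(text):
    """Parse agent.conf-style text and return (ip, port, log_level).

    Raises ValueError if the IP address, port, or log level is invalid.
    """
    parser = configparser.ConfigParser()
    parser.read_string(text)  # lines starting with ';' are treated as comments
    ip = parser.get("agent", "ip")
    ipaddress.ip_address(ip)  # ValueError if this is not a valid IP address
    port = parser.getint("agent", "port")
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    log_level = parser.get("log", "log_level")
    if log_level not in LOG_LEVELS:
        raise ValueError(f"unknown log_level: {log_level}")
    return ip, port, log_level


if __name__ == "__main__":
    print(check_agent_conf(SAMPLE_CONF))
```

To check the real file, pass the contents of /etc/aops/agent.conf to the function instead of the sample text.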
Run `systemctl start aops-agent` to start the service.
2.3 Registering with aops-manager
To identify users and prevent APIs from being invoked arbitrarily, aops-agent uses tokens to authenticate users, which reduces the load on the deployed hosts.
For security purposes, the token is obtained through active registration. Before registration, prepare the information to be registered on aops-agent, then run the `register` command to register the information with aops-manager. No database is configured for aops-agent. After the registration succeeds, the token is automatically saved to the specified file and the registration result is displayed on the screen. In addition, the local host information is saved to the aops-manager database for subsequent management.
Prepare the register.json file.
Prepare the information required for registration on aops-agent and save the information in JSON format. The data structure is as follows:
```json
{
    // Name of the login user
    "web_username": "admin",
    // User password
    "web_password": "changeme",
    // Host name
    "host_name": "host1",
    // Name of the group to which the host belongs
    "host_group_name": "group1",
    // IP address of the host where aops-manager is running
    "manager_ip": "192.168.1.23",
    // Whether to register as a management host
    "management": false,
    // External port for running aops-manager
    "manager_port": "11111",
    // Port for running aops-agent
    "agent_port": "12000"
}
```
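Standard JSON has no comment syntax, so one cautious way to produce register.json is to generate it without the explanatory comments. The sketch below does that with Python's json module. The function names are hypothetical, and treating every field shown above as required is an assumption, not a documented rule of aops-agent.

```python
import json

# Fields taken from the data structure above. Treating all of them as
# required is an assumption made for this sketch.
REQUIRED_KEYS = {
    "web_username", "web_password", "host_name", "host_group_name",
    "manager_ip", "management", "manager_port", "agent_port",
}


def build_register_info(**fields):
    """Return the registration dict, raising ValueError if a field is missing."""
    missing = REQUIRED_KEYS - fields.keys()
    if missing:
        raise ValueError("missing fields: " + ", ".join(sorted(missing)))
    return dict(fields)


def dump_register_info(info):
    """Serialize the registration dict as comment-free JSON text."""
    return json.dumps(info, indent=4)


if __name__ == "__main__":
    info = build_register_info(
        web_username="admin",
        web_password="changeme",
        host_name="host1",
        host_group_name="group1",
        manager_ip="192.168.1.23",
        management=False,
        manager_port="11111",  # ports are strings in the example above
        agent_port="12000",
    )
    print(dump_register_info(info))
```

Redirecting the printed text to register.json gives a file that any standard JSON parser accepts.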
Note: Ensure that aops-manager is running on the target host, for example, 192.168.1.23, and the registered host group exists.
Run `aops_agent register -f register.json`. The registration result is displayed on the screen. If the registration is successful, the token string is saved to a specified file. If the registration fails, locate the fault based on the message and the log content (/var/log/aops/aops.log).
The following is an example of the registration result:
- Registration succeeded.
```shell
[root@localhost ~]# aops_agent register -f register.json
Agent Register Success
```
- Registration failed. The following example shows the result when aops-manager has failed to start.
```shell
[root@localhost ~]# aops_agent register -f register.json
Agent Register Fail
[root@localhost ~]#
```
- Log content
```shell
2022-09-05 16:11:52,576 ERROR command_manage/register/331: HTTPConnectionPool(host='192.168.1.23', port=11111): Max retries exceeded with url: /manage/host/add (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7ff0504ce4f0>: Failed to establish a new connection: [Errno 111] Connection refused'))
[root@localhost ~]#
```
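The `Connection refused` error in the log means that nothing accepted the TCP connection on the manager address and port. A quick way to check this before registering again is to try opening a TCP connection. The following sketch uses only the Python standard library; the function name `manager_reachable` is hypothetical, and the address and port are the example values from register.json.

```python
import socket


def manager_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable host, and timeouts.
        return False


if __name__ == "__main__":
    # Example values from register.json; replace them with your own.
    print(manager_reachable("192.168.1.23", 11111))
```

A result of False only shows that the port is not accepting connections. It does not say why, so the aops-manager service status and logs still need to be checked on the manager host.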
3. Plug-in Support
3.1 gala-gopher
3.1.1 Introduction
gala-gopher is a low-load probe framework based on eBPF. It can be used to monitor the CPU, memory, and network status of hosts and collect data. You can configure the collection status of existing probes based on service requirements.
3.1.2 Deployment
- Run `yum install gala-gopher` to install gala-gopher from the Yum source.
- Enable probes based on service requirements. You can view information about probes in /opt/gala-gopher/gala-gopher.conf.
- Run `systemctl start gala-gopher` to start the gala-gopher service.
3.1.3 Others
For more information about gala-gopher, see https://gitee.com/openeuler/gala-gopher/blob/master/README.md.
4. API Support
4.1 List of External APIs
No. | API | Type | Description |
---|---|---|---|
1 | /v1/agent/plugin/start | POST | Starts a plug-in. |
2 | /v1/agent/plugin/stop | POST | Stops a plug-in. |
3 | /v1/agent/application/info | GET | Collects running applications in the target application collection. |
4 | /v1/agent/host/info | GET | Obtains host information. |
5 | /v1/agent/plugin/info | GET | Obtains the plug-in running information in aops-agent. |
6 | /v1/agent/file/collect | POST | Collects content of the configuration file. |
7 | /v1/agent/collect/items/change | POST | Changes the running status of plug-in collection items. |
4.1.1 /v1/agent/plugin/start
Description: Starts the plug-in that is installed but not running. Currently, only the gala-gopher plug-in is supported.
HTTP request mode: POST
Data submission mode: query
Request parameter
Parameter | Mandatory | Type | Description |
---|---|---|---|
plugin_name | True | str | Plug-in name |

Request parameter example
Parameter | Value |
---|---|
plugin_name | gala-gopher |

Response body parameters
Parameter | Type | Description |
---|---|---|
code | int | Return code |
msg | str | Information corresponding to the status code |

Response example
```json
{
    "code": 200,
    "msg": "xxxx"
}
```
4.1.2 /v1/agent/plugin/stop
Description: Stops a running plug-in. Currently, only the gala-gopher plug-in is supported.
HTTP request mode: POST
Data submission mode: query
Request parameter
Parameter | Mandatory | Type | Description |
---|---|---|---|
plugin_name | True | str | Plug-in name |

Request parameter example
Parameter | Value |
---|---|
plugin_name | gala-gopher |

Response body parameters
Parameter | Type | Description |
---|---|---|
code | int | Return code |
msg | str | Information corresponding to the status code |

Response example
```json
{
    "code": 200,
    "msg": "xxxx"
}
```
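The start and stop APIs both take plugin_name as a query parameter on a POST request. The sketch below builds such a request with Python's urllib without sending it. Several details are assumptions rather than documented behavior: the `http` scheme, the function name `build_plugin_request`, and passing the token in an `access_token` header.

```python
import urllib.parse
import urllib.request


def build_plugin_request(base_url, action, plugin_name, token):
    """Build (but do not send) a POST request to start or stop a plug-in."""
    if action not in ("start", "stop"):
        raise ValueError("action must be 'start' or 'stop'")
    # plugin_name is submitted in the query string, as described above.
    query = urllib.parse.urlencode({"plugin_name": plugin_name})
    url = f"{base_url}/v1/agent/plugin/{action}?{query}"
    # Assumption: the token obtained during registration is sent in an
    # "access_token" header. Check your deployment before relying on this.
    return urllib.request.Request(
        url, method="POST", headers={"access_token": token}
    )


if __name__ == "__main__":
    req = build_plugin_request(
        "http://192.168.1.47:12000", "start", "gala-gopher", "<token>"
    )
    print(req.get_method(), req.full_url)
```

Passing the returned object to `urllib.request.urlopen` would send the request. The response body can then be decoded with `json.loads` to read code and msg.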
4.1.3 /v1/agent/application/info
Description: Collects running applications in the target application collection. Currently, the target application collection contains MySQL, Kubernetes, Hadoop, Nginx, Docker, and gala-gopher.
HTTP request mode: GET
Data submission mode: query
Request parameter
None

Request parameter example
None

Response body parameters
Parameter | Type | Description |
---|---|---|
code | int | Return code |
msg | str | Information corresponding to the status code |
resp | dict | Response body |

- resp

Parameter | Type | Description |
---|---|---|
running | List[str] | List of the running applications |

Response example
```json
{
    "code": 200,
    "msg": "xxxx",
    "resp": {
        "running": [
            "mysql",
            "docker"
        ]
    }
}
```
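A caller usually wants to know which applications in the target collection are running and which are not. The sketch below derives that from the response body. The lowercase application names are an assumption based on the `"mysql"` and `"docker"` values in the response example, and `running_status` is a hypothetical helper name.

```python
import json

# Assumed identifiers for the target application collection; only "mysql"
# and "docker" are confirmed by the response example above.
TARGET_APPS = ("mysql", "kubernetes", "hadoop", "nginx", "docker", "gala-gopher")


def running_status(response_text):
    """Map each target application to True (running) or False."""
    body = json.loads(response_text)
    if body.get("code") != 200:
        raise RuntimeError(f"request failed: {body.get('msg')}")
    running = set(body["resp"]["running"])
    return {app: app in running for app in TARGET_APPS}


if __name__ == "__main__":
    sample = '{"code": 200, "msg": "xxxx", "resp": {"running": ["mysql", "docker"]}}'
    for app, is_running in running_status(sample).items():
        print(f"{app}: {'running' if is_running else 'not running'}")
```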
4.1.4 /v1/agent/host/info
Description: Obtains information about the host where aops-agent is installed, including the system version, BIOS version, kernel version, CPU information, and memory information.
HTTP request mode: POST
Data submission mode: application/json
Request parameter
Parameter | Mandatory | Type | Description |
---|---|---|---|
info_type | True | List[str] | List of the information to be collected. Currently, only the CPU, disk, memory, and OS are supported. |

Request parameter example
```json
["os", "cpu", "memory", "disk"]
```
Response body parameters
Parameter | Type | Description |
---|---|---|
code | int | Return code |
msg | str | Information corresponding to the status code |
resp | dict | Response body |

- resp

Parameter | Type | Description |
---|---|---|
cpu | dict | CPU information |
memory | dict | Memory information |
os | dict | OS information |
disk | List[dict] | Disk information |

- cpu

Parameter | Type | Description |
---|---|---|
architecture | str | CPU architecture |
core_count | int | Number of cores |
l1d_cache | str | L1 data cache size |
l1i_cache | str | L1 instruction cache size |
l2_cache | str | L2 cache size |
l3_cache | str | L3 cache size |
model_name | str | Model name |
vendor_id | str | Vendor ID |

- memory

Parameter | Type | Description |
---|---|---|
size | str | Total memory |
total | int | Number of DIMMs |
info | List[dict] | Information about all DIMMs |

- info

Parameter | Type | Description |
---|---|---|
size | str | Memory size |
type | str | Type |
speed | str | Speed |
manufacturer | str | Vendor |

- os

Parameter | Type | Description |
---|---|---|
bios_version | str | BIOS version |
os_version | str | OS version |
kernel | str | Kernel version |

Response example
```json
{
    "code": 200,
    "msg": "operate success",
    "resp": {
        "cpu": {
            "architecture": "aarch64",
            "core_count": "128",
            "l1d_cache": "8 MiB (128 instances)",
            "l1i_cache": "8 MiB (128 instances)",
            "l2_cache": "64 MiB (128 instances)",
            "l3_cache": "128 MiB (4 instances)",
            "model_name": "Kunpeng-920",
            "vendor_id": "HiSilicon"
        },
        "memory": {
            "info": [
                {
                    "manufacturer": "Hynix",
                    "size": "16 GB",
                    "speed": "2933 MT/s",
                    "type": "DDR4"
                },
                {
                    "manufacturer": "Hynix",
                    "size": "16 GB",
                    "speed": "2933 MT/s",
                    "type": "DDR4"
                }
            ],
            "size": "32G",
            "total": 2
        },
        "os": {
            "bios_version": "1.82",
            "kernel": "5.10.0-60.18.0.50",
            "os_version": "openEuler 22.03 LTS"
        },
        "disk": [
            {
                "capacity": "xxGB",
                "model": "xxxxxx"
            }
        ]
    }
}
```
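As a rough illustration of how a client might use this API, the sketch below builds the JSON request body from a list of information types and condenses the resp object into a one-line summary. Both function names are hypothetical, and the summary assumes that the os, cpu, and memory sections were all requested.

```python
import json

# The four information types listed in the request parameter description.
SUPPORTED_INFO_TYPES = ("os", "cpu", "memory", "disk")


def build_host_info_body(info_types):
    """Return the JSON request body (bytes) for the requested info types."""
    unknown = [t for t in info_types if t not in SUPPORTED_INFO_TYPES]
    if unknown:
        raise ValueError(f"unsupported info types: {unknown}")
    return json.dumps(list(info_types)).encode("utf-8")


def summarize_host(resp):
    """Build a one-line summary from the 'resp' dict of a full response."""
    os_info, cpu, mem = resp["os"], resp["cpu"], resp["memory"]
    return (
        f"{os_info['os_version']} | kernel {os_info['kernel']} | "
        f"{cpu['model_name']} ({cpu['architecture']}, {cpu['core_count']} cores) | "
        f"{mem['size']} in {mem['total']} DIMMs"
    )


if __name__ == "__main__":
    print(build_host_info_body(["os", "cpu", "memory", "disk"]))
    sample_resp = {
        "cpu": {"architecture": "aarch64", "core_count": "128",
                "model_name": "Kunpeng-920"},
        "memory": {"size": "32G", "total": 2},
        "os": {"kernel": "5.10.0-60.18.0.50",
               "os_version": "openEuler 22.03 LTS"},
    }
    print(summarize_host(sample_resp))
```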
4.1.5 /v1/agent/plugin/info
Description: Obtains the plug-in running status of the host. Currently, only the gala-gopher plug-in is supported.
HTTP request mode: GET
Data submission mode: query
Request parameter
None

Request parameter example
None

Response body parameters
Parameter | Type | Description |
---|---|---|
code | int | Return code |
msg | str | Information corresponding to the status code |
resp | List[dict] | Response body |

- resp

Parameter | Type | Description |
---|---|---|
plugin_name | str | Plug-in name |
collect_items | list | Running status of plug-in collection items |
is_installed | bool | Whether the plug-in is installed |
resource | List[dict] | Plug-in resource usage |
status | str | Plug-in running status |

- resource

Parameter | Type | Description |
---|---|---|
name | str | Resource name |
current_value | str | Resource usage |
limit_value | str | Resource limit |

Response example
```json
{
    "code": 200,
    "msg": "operate success",
    "resp": [
        {
            "collect_items": [
                {
                    "probe_name": "system_tcp",
                    "probe_status": "off",
                    "support_auto": false
                },
                {
                    "probe_name": "haproxy",
                    "probe_status": "auto",
                    "support_auto": true
                },
                {
                    "probe_name": "nginx",
                    "probe_status": "auto",
                    "support_auto": true
                }
            ],
            "is_installed": true,
            "plugin_name": "gala-gopher",
            "resource": [
                {
                    "current_value": "0.0%",
                    "limit_value": null,
                    "name": "cpu"
                },
                {
                    "current_value": "13 MB",
                    "limit_value": null,
                    "name": "memory"
                }
            ],
            "status": "active"
        }
    ]
}
```
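The collect_items list is easier to read when the probes are grouped by their status. The sketch below does this for one entry of the resp list; `probes_by_status` is a hypothetical helper name, and the sample data is copied from the response example.

```python
from collections import defaultdict


def probes_by_status(plugin):
    """Group probe names by probe_status for one entry of the resp list."""
    groups = defaultdict(list)
    for item in plugin["collect_items"]:
        groups[item["probe_status"]].append(item["probe_name"])
    return dict(groups)


if __name__ == "__main__":
    plugin = {
        "plugin_name": "gala-gopher",
        "collect_items": [
            {"probe_name": "system_tcp", "probe_status": "off", "support_auto": False},
            {"probe_name": "haproxy", "probe_status": "auto", "support_auto": True},
            {"probe_name": "nginx", "probe_status": "auto", "support_auto": True},
        ],
    }
    for status, names in probes_by_status(plugin).items():
        print(f"{status}: {', '.join(names)}")
```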
4.1.6 /v1/agent/file/collect
Description: Collects information such as the content, permission, and owner of the target configuration file. Currently, only text files smaller than 1 MB, without execute permission, and supporting UTF8 encoding can be read.
HTTP request mode: POST
Data submission mode: application/json
Request parameter
Parameter | Mandatory | Type | Description |
---|---|---|---|
configfile_path | True | List[str] | List of the full paths of the files to be collected |

Request parameter example
```json
[ "/home/test.conf", "/home/test.ini", "/home/test.json" ]
```
Response body parameters
Parameter | Type | Description |
---|---|---|
infos | List[dict] | File collection information |
success_files | List[str] | List of files successfully collected |
fail_files | List[str] | List of files that fail to be collected |

- infos

Parameter | Type | Description |
---|---|---|
path | str | File path |
content | str | File content |
file_attr | dict | File attributes |

- file_attr

Parameter | Type | Description |
---|---|---|
mode | str | File permission mode |
owner | str | File owner |
group | str | Group to which the file belongs |

Response example
```json
{
    "infos": [
        {
            "content": "this is a test file",
            "file_attr": {
                "group": "root",
                "mode": "0644",
                "owner": "root"
            },
            "path": "/home/test.txt"
        }
    ],
    "success_files": [
        "/home/test.txt"
    ],
    "fail_files": [
        "/home/test.txt"
    ]
}
```
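The description lists three conditions for a file to be readable: smaller than 1 MB, no execute permission, and UTF-8 text. The sketch below checks the same three conditions locally, which can help explain why a path ends up in fail_files. It is not the agent's own implementation; the helper name `collectable` and the choice of 1024 * 1024 bytes for 1 MB are assumptions.

```python
import os
import stat

MAX_SIZE = 1024 * 1024  # assumed meaning of "1 MB"


def collectable(path):
    """Return (ok, reason) for the three constraints in the description."""
    st = os.stat(path)
    if not stat.S_ISREG(st.st_mode):
        return False, "not a regular file"
    if st.st_size >= MAX_SIZE:
        return False, "1 MB or larger"
    if st.st_mode & (stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH):
        return False, "has execute permission"
    try:
        with open(path, encoding="utf-8") as handle:
            handle.read()
    except UnicodeDecodeError:
        return False, "not valid UTF-8"
    return True, "ok"


if __name__ == "__main__":
    # Example paths from the request parameter example above.
    for candidate in ("/home/test.conf", "/home/test.ini", "/home/test.json"):
        if os.path.exists(candidate):
            print(candidate, collectable(candidate))
```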
4.1.7 /v1/agent/collect/items/change
Description: Changes the collection status of the plug-in collection items. Currently, only the status of the gala-gopher collection items can be changed. For the gala-gopher collection items, see /opt/gala-gopher/gala-gopher.conf.
HTTP request mode: POST
Data submission mode: application/json
Request parameter
Parameter | Mandatory | Type | Description |
---|---|---|---|
plugin_name | True | dict | Expected modification result of the plug-in collection items |

- plugin_name

Parameter | Mandatory | Type | Description |
---|---|---|---|
collect_item | True | string | Expected modification result of the collection item |

Request parameter example
```json
{
    "gala-gopher": {
        "redis": "auto",
        "system_inode": "on",
        "tcp": "on",
        "haproxy": "auto"
    }
}
```
Response body parameters
Parameter | Type | Description |
---|---|---|
code | int | Return code |
msg | str | Information corresponding to the status code |
resp | List[dict] | Response body |

- resp

Parameter | Type | Description |
---|---|---|
plugin_name | dict | Modification result of the corresponding collection item |

- plugin_name

Parameter | Type | Description |
---|---|---|
success | List[str] | Collection items that are successfully modified |
failure | List[str] | Collection items that fail to be modified |

Response example
```json
{
    "code": 200,
    "msg": "operate success",
    "resp": {
        "gala-gopher": {
            "failure": [
                "redis"
            ],
            "success": [
                "system_inode",
                "tcp",
                "haproxy"
            ]
        }
    }
}
```
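Because a single request can change several collection items and only some of them may succeed, a client typically builds the body from a mapping and then checks the failure list. The sketch below shows one way to do that. The function names are hypothetical, and the set of accepted states ("on", "off", "auto") is inferred from the examples in this document rather than from a specification.

```python
import json

# Inferred from the request and response examples in this document.
VALID_STATES = {"on", "off", "auto"}


def build_change_body(plugin_name, items):
    """Return the JSON request body for changing collection item states."""
    bad = {name: state for name, state in items.items()
           if state not in VALID_STATES}
    if bad:
        raise ValueError(f"invalid states: {bad}")
    return json.dumps({plugin_name: items})


def failed_items(response_text):
    """Return {plugin_name: [failed items]} for plug-ins with failures."""
    body = json.loads(response_text)
    return {
        plugin: result["failure"]
        for plugin, result in body["resp"].items()
        if result["failure"]
    }


if __name__ == "__main__":
    print(build_change_body("gala-gopher", {"redis": "auto", "tcp": "on"}))
    sample = json.dumps({
        "code": 200,
        "msg": "operate success",
        "resp": {"gala-gopher": {"failure": ["redis"],
                                 "success": ["system_inode", "tcp", "haproxy"]}},
    })
    print(failed_items(sample))
```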
FAQs
If an error is reported, view the /var/log/aops/aops.log file, rectify the fault based on the error message in the log file, and restart the service.
You are advised to run aops-agent with Python 3.7 or later. Pay attention to the versions of the Python dependency libraries when installing them.
The value of access_token can be obtained from the /etc/aops/agent.conf file after the registration is complete.
To limit the CPU and memory resources of a plug-in, add MemoryHigh and CPUQuota to the Service section in the service file corresponding to the plug-in.
For example, set the memory limit of gala-gopher to 40 MB and the CPU limit to 20%.
```ini
[Unit]
Description=a-ops gala gopher service
After=network.target

[Service]
Type=exec
ExecStart=/usr/bin/gala-gopher
Restart=on-failure
RestartSec=1
RemainAfterExit=yes
;Limit the maximum memory that can be used by processes in the unit. The limit can be exceeded. However, after the limit is exceeded, the process running speed is limited, and the system reclaims the excess memory as much as possible.
;The option value can be an absolute memory size in bytes (K, M, G, or T suffix based on 1024) or a relative memory size in percentage.
MemoryHigh=40M
;Set the CPU time limit for the processes of this unit. The value must be a percentage ending with %, indicating the maximum percentage of the total time that the unit can use a single CPU.
CPUQuota=20%

[Install]
WantedBy=multi-user.target
```
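To make the two values concrete, the sketch below converts them into plain numbers following the rules in the comments above: a K, M, G, or T suffix is a power of 1024, and CPUQuota is a percentage of a single CPU. The function names are hypothetical, and the sketch handles only absolute memory sizes, not the percentage form.

```python
# Suffix multipliers based on 1024, as described in the unit file comments.
_UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3, "T": 1024 ** 4}


def memory_high_bytes(value):
    """Convert an absolute MemoryHigh value such as '40M' to bytes."""
    value = value.strip()
    if value and value[-1] in _UNITS:
        return int(value[:-1]) * _UNITS[value[-1]]
    return int(value)  # a plain number is already in bytes


def cpu_quota_fraction(value):
    """Convert a CPUQuota value such as '20%' to a fraction of one CPU."""
    value = value.strip()
    if not value.endswith("%"):
        raise ValueError("CPUQuota must end with %")
    return float(value[:-1]) / 100


if __name__ == "__main__":
    print(memory_high_bytes("40M"))   # 41943040
    print(cpu_quota_fraction("20%"))  # 0.2
```

With the example settings, gala-gopher is throttled once it uses more than 41943040 bytes of memory and can use at most one fifth of a single CPU's time.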