Deploying aops-agent

1. Environment Requirements

One host running openEuler 20.03 or later

2. Configuration Environment Deployment

2.1 Disabling the Firewall

shell
systemctl stop firewalld
systemctl disable firewalld
systemctl status firewalld

2.2 Deploying aops-agent

  1. Run yum install aops-agent to install aops-agent based on the Yum source.

  2. Modify the configuration file. In the [agent] section, set ip to the IP address of the local host.

    shell
    vim /etc/aops/agent.conf

    The following uses 192.168.1.47 as an example.

    ini
    [agent]
    ;IP address and port number bound when the aops-agent is started.
    ip=192.168.1.47
    port=12000
    
    [gopher]
    ;Default path of the gala-gopher configuration file. If you need to change the path, ensure that the file path is correct.
    config_path=/opt/gala-gopher/gala-gopher.conf
    
    ;aops-agent log collection configuration
    [log]
    ;Level of the logs to be collected, which can be set to DEBUG, INFO, WARNING, ERROR, or CRITICAL
    log_level=INFO
    ;Location for storing collected logs
    log_dir=/var/log/aops
    ;Maximum size of a log file, in bytes (31457280 bytes = 30 MB)
    max_bytes=31457280
    ;Number of backup logs
    backup_count=40
  3. Run systemctl start aops-agent to start the service.
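
Before starting the service, it can help to sanity-check the edited file. The following is a minimal sketch, not part of aops-agent: check_agent_conf is a hypothetical helper, and the checks it applies (a parseable IP address, a port in 1-65535, a known log level) are assumptions derived from the comments in the listing above.

```python
import configparser
import ipaddress

# Sample text mirroring the agent.conf listing above (gopher section omitted).
SAMPLE_CONF = """\
[agent]
;IP address and port number bound when the aops-agent is started.
ip=192.168.1.47
port=12000

[log]
log_level=INFO
log_dir=/var/log/aops
max_bytes=31457280
backup_count=40
"""

VALID_LOG_LEVELS = {"DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"}


def check_agent_conf(text):
    """Return a list of problems found in agent.conf-style text (hypothetical helper)."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    problems = []

    # The ip value must parse as an IPv4 or IPv6 address.
    ip = parser.get("agent", "ip", fallback="")
    try:
        ipaddress.ip_address(ip)
    except ValueError:
        problems.append(f"[agent] ip is not a valid IP address: {ip!r}")

    # The port must be a decimal number in the valid TCP port range.
    port = parser.get("agent", "port", fallback="")
    if not (port.isdecimal() and 1 <= int(port) <= 65535):
        problems.append(f"[agent] port is not in 1-65535: {port!r}")

    # The log level must be one of the levels named in the file comments.
    level = parser.get("log", "log_level", fallback="")
    if level not in VALID_LOG_LEVELS:
        problems.append(f"[log] log_level is not a known level: {level!r}")

    return problems


print(check_agent_conf(SAMPLE_CONF))
```

To check the real file, pass the text of /etc/aops/agent.conf to check_agent_conf instead of the sample string. An empty list means none of the assumed checks failed.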

2.3 Registering with aops-manager

To identify users and prevent arbitrary API calls, aops-agent authenticates requests with a token, which also reduces the load on the hosts where it is deployed.

For security purposes, the token is obtained through active registration. Before registering, prepare the registration information on aops-agent, then run the register command to submit it to aops-manager. No database is configured for aops-agent. After the registration succeeds, the token is automatically saved to the specified file and the registration result is displayed on the GUI. The local host information is also saved to the aops-manager database for subsequent management.

  1. Prepare the register.json file.

    Prepare the information required for registration on aops-agent and save the information in JSON format. The data structure is as follows:

    JSON
    {
        // Name of the login user
        "web_username":"admin",
        // User password
        "web_password": "changeme",
        // Host name
        "host_name": "host1",
        // Name of the group to which the host belongs
        "host_group_name": "group1",
        // IP address of the host where aops-manager is running
        "manager_ip":"192.168.1.23",
        // Whether to register as a management host
        "management":false,
        // External port for running aops-manager
        "manager_port":"11111",
        // Port for running aops-agent
        "agent_port":"12000"
    }

    Note: Ensure that aops-manager is running on the target host, for example, 192.168.1.23, and the registered host group exists.

  2. Run aops_agent register -f register.json.

  3. The registration result is displayed on the GUI. If the registration is successful, the token character string is saved to a specified file. If the registration fails, locate the fault based on the message and log content (/var/log/aops/aops.log).

    The following is an example of the registration result:

    • Registration succeeded.
    shell
    [root@localhost ~]# aops_agent register -f register.json
    Agent Register Success
    • Registration failed. The following uses the aops-manager start failure as an example.
    shell
    [root@localhost ~]# aops_agent register -f register.json
    Agent Register Fail
    [root@localhost ~]#
    • Log content
    shell
    2022-09-05 16:11:52,576 ERROR command_manage/register/331: HTTPConnectionPool(host='192.168.1.23', port=11111): Max retries exceeded with url: /manage/host/add (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7ff0504ce4f0>: Failed to establish a new connection: [Errno 111] Connection refused'))
    [root@localhost ~]#
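
The // comments in the register.json listing of step 1 explain the fields. Standard JSON has no comment syntax, so a strict parser may reject a file that keeps them. The sketch below produces comment-free JSON text with the same fields. render_register_json is a hypothetical helper, and treating every field in the listing as required is an assumption.

```python
import json

# Example values copied from the register.json listing in step 1.
registration = {
    "web_username": "admin",
    "web_password": "changeme",
    "host_name": "host1",
    "host_group_name": "group1",
    "manager_ip": "192.168.1.23",
    "management": False,
    "manager_port": "11111",
    "agent_port": "12000",
}

# ASSUMPTION: every field shown in the listing is required.
REQUIRED_KEYS = set(registration)


def render_register_json(info):
    """Serialize registration info as comment-free JSON text (hypothetical helper)."""
    missing = REQUIRED_KEYS - set(info)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    return json.dumps(info, indent=4) + "\n"


print(render_register_json(registration), end="")
```

Saving the printed text as register.json gives a file that can be passed to aops_agent register -f register.json as described in step 2.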

3. Plug-in Support

3.1 gala-gopher

3.1.1 Introduction

gala-gopher is a low-overhead probe framework based on eBPF. It monitors the CPU, memory, and network status of hosts and collects the corresponding data. You can configure the collection status of existing probes based on service requirements.

3.1.2 Deployment

  1. Run yum install gala-gopher to install gala-gopher based on the Yum source.
  2. Enable probes based on service requirements. You can view information about probes in /opt/gala-gopher/gala-gopher.conf.
  3. Run systemctl start gala-gopher to start the gala-gopher service.

3.1.3 Others

For more information about gala-gopher, see https://gitee.com/openeuler/gala-gopher/blob/master/README.md.

4. API Support

4.1 List of External APIs

| No. | API | Type | Description |
| --- | --- | --- | --- |
| 1 | /v1/agent/plugin/start | POST | Starts a plug-in. |
| 2 | /v1/agent/plugin/stop | POST | Stops a plug-in. |
| 3 | /v1/agent/application/info | GET | Collects running applications in the target application collection. |
| 4 | /v1/agent/host/info | POST | Obtains host information. |
| 5 | /v1/agent/plugin/info | GET | Obtains the plug-in running information in aops-agent. |
| 6 | /v1/agent/file/collect | POST | Collects content of the configuration file. |
| 7 | /v1/agent/collect/items/change | POST | Changes the running status of plug-in collection items. |

4.1.1 /v1/agent/plugin/start

  • Description: Starts the plug-in that is installed but not running. Currently, only the gala-gopher plug-in is supported.

  • HTTP request mode: POST

  • Data submission mode: query

  • Request parameter

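    | Parameter | Mandatory | Type | Description |
    | --- | --- | --- | --- |
    | plugin_name | True | str | Plug-in name |

  • Request parameter example

    | Parameter | Value |
    | --- | --- |
    | plugin_name | gala-gopher |

  • Response body parameters

    | Parameter | Type | Description |
    | --- | --- | --- |
    | code | int | Return code |
    | msg | str | Information corresponding to the status code |
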
  • Response example

    json
    {
        "code": 200,
        "msg": "xxxx"
    }

4.1.2 /v1/agent/plugin/stop

  • Description: Stops a running plug-in. Currently, only the gala-gopher plug-in is supported.

  • HTTP request mode: POST

  • Data submission mode: query

  • Request parameter

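    | Parameter | Mandatory | Type | Description |
    | --- | --- | --- | --- |
    | plugin_name | True | str | Plug-in name |

  • Request parameter example

    | Parameter | Value |
    | --- | --- |
    | plugin_name | gala-gopher |

  • Response body parameters

    | Parameter | Type | Description |
    | --- | --- | --- |
    | code | int | Return code |
    | msg | str | Information corresponding to the status code |
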
  • Response example

    json
    {
        "code": 200,
        "msg": "xxxx"
    }
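
The start and stop APIs differ only in the last path segment, so one helper can build both requests. The sketch below is illustrative rather than an official client: build_plugin_request is a hypothetical name, the http scheme is assumed, and passing the token in a header named access_token is an assumption based on the token description in section 2.3 and the FAQs.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_plugin_request(host, port, action, plugin_name, token):
    """Build a POST request for /v1/agent/plugin/start or /v1/agent/plugin/stop."""
    if action not in ("start", "stop"):
        raise ValueError(f"unsupported action: {action!r}")

    # "Data submission mode: query" means the parameter goes in the query string.
    query = urlencode({"plugin_name": plugin_name})
    url = f"http://{host}:{port}/v1/agent/plugin/{action}?{query}"

    # ASSUMPTION: the token is sent in a header named access_token.
    return Request(url, method="POST", headers={"access_token": token})


request = build_plugin_request("192.168.1.47", 12000, "start", "gala-gopher", "TOKEN")
print(request.get_method(), request.full_url)
```

The request can then be sent with urllib.request.urlopen(request), and the response body decoded with json.load to read the code and msg fields.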

4.1.3 /v1/agent/application/info

  • Description: Collects running applications in the target application collection. Currently, the target application collection contains MySQL, Kubernetes, Hadoop, Nginx, Docker, and gala-gopher.

  • HTTP request mode: GET

  • Data submission mode: query

  • Request parameter

    None

  • Request parameter example

    None

  • Response body parameters

    | Parameter | Type | Description |
    | --- | --- | --- |
    | code | int | Return code |
    | msg | str | Information corresponding to the status code |
    | resp | dict | Response body |

    • resp

    | Parameter | Type | Description |
    | --- | --- | --- |
    | running | List[str] | List of the running applications |

  • Response example

    json
    {
        "code": 200,
        "msg": "xxxx",
        "resp": {
            "running": [
                "mysql",
                "docker"
            ]
        }
    }

4.1.4 /v1/agent/host/info

  • Description: Obtains information about the host where aops-agent is installed, including the system version, BIOS version, kernel version, CPU information, and memory information.

  • HTTP request mode: POST

  • Data submission mode: application/json

  • Request parameter

    | Parameter | Mandatory | Type | Description |
    | --- | --- | --- | --- |
    | info_type | True | List[str] | List of the information types to collect. Currently, only cpu, disk, memory, and os are supported. |

  • Request parameter example

    json
    ["os", "cpu", "memory", "disk"]

  • Response body parameters

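    | Parameter | Type | Description |
    | --- | --- | --- |
    | code | int | Return code |
    | msg | str | Information corresponding to the status code |
    | resp | dict | Response body |

    • resp

    | Parameter | Type | Description |
    | --- | --- | --- |
    | cpu | dict | CPU information |
    | memory | dict | Memory information |
    | os | dict | OS information |
    | disk | List[dict] | Disk information |

    • cpu

    | Parameter | Type | Description |
    | --- | --- | --- |
    | architecture | str | CPU architecture |
    | core_count | int | Number of cores |
    | l1d_cache | str | L1 data cache size |
    | l1i_cache | str | L1 instruction cache size |
    | l2_cache | str | L2 cache size |
    | l3_cache | str | L3 cache size |
    | model_name | str | Model name |
    | vendor_id | str | Vendor ID |

    • memory

    | Parameter | Type | Description |
    | --- | --- | --- |
    | size | str | Total memory |
    | total | int | Number of DIMMs |
    | info | List[dict] | Information about all DIMMs |

    • info

    | Parameter | Type | Description |
    | --- | --- | --- |
    | size | str | Memory size |
    | type | str | Type |
    | speed | str | Speed |
    | manufacturer | str | Vendor |

    • os

    | Parameter | Type | Description |
    | --- | --- | --- |
    | bios_version | str | BIOS version |
    | os_version | str | OS version |
    | kernel | str | Kernel version |
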
  • Response example

    json
    {
        "code": 200,
        "msg": "operate success",
        "resp": {
            "cpu": {
                "architecture": "aarch64",
                "core_count": "128",
                "l1d_cache": "8 MiB (128 instances)",
                "l1i_cache": "8 MiB (128 instances)",
                "l2_cache": "64 MiB (128 instances)",
                "l3_cache": "128 MiB (4 instances)",
                "model_name": "Kunpeng-920",
                "vendor_id": "HiSilicon"
            },
            "memory": {
                "info": [
                    {
                        "manufacturer": "Hynix",
                        "size": "16 GB",
                        "speed": "2933 MT/s",
                        "type": "DDR4"
                    },
                    {
                        "manufacturer": "Hynix",
                        "size": "16 GB",
                        "speed": "2933 MT/s",
                        "type": "DDR4"
                    }
                ],
                "size": "32G",
                "total": 2
            },
            "os": {
                "bios_version": "1.82",
                "kernel": "5.10.0-60.18.0.50",
                "os_version": "openEuler 22.03 LTS"
            },
            "disk": [
                {
                    "capacity": "xxGB",
                    "model": "xxxxxx"
                }
            ]
        }
    }
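
A caller usually needs only a few of these fields. The sketch below condenses a response of the shape shown above into a short summary. summarize_host is a hypothetical helper, the sample is a trimmed copy of the response example, and treating code 200 as success is an assumption based on that example.

```python
import json

# Trimmed copy of the response example above.
SAMPLE_RESPONSE = json.loads("""
{
    "code": 200,
    "msg": "operate success",
    "resp": {
        "cpu": {"architecture": "aarch64", "core_count": "128", "model_name": "Kunpeng-920"},
        "memory": {"size": "32G", "total": 2},
        "os": {"kernel": "5.10.0-60.18.0.50", "os_version": "openEuler 22.03 LTS"}
    }
}
""")


def summarize_host(response):
    """Condense a /v1/agent/host/info response into a few display strings."""
    # ASSUMPTION: code 200 indicates success, as in the response example.
    if response.get("code") != 200:
        raise RuntimeError(response.get("msg", "unknown error"))

    resp = response["resp"]
    cpu = resp.get("cpu", {})
    memory = resp.get("memory", {})
    os_info = resp.get("os", {})
    return {
        "os": os_info.get("os_version"),
        "kernel": os_info.get("kernel"),
        "cpu": f'{cpu.get("model_name")} ({cpu.get("architecture")}, {cpu.get("core_count")} cores)',
        "memory": f'{memory.get("size")} in {memory.get("total")} DIMM(s)',
    }


print(summarize_host(SAMPLE_RESPONSE))
```

Sections that were not requested through info_type are simply absent from resp, which is why the helper reads each one with a default.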

4.1.5 /v1/agent/plugin/info

  • Description: Obtains the plug-in running status of the host. Currently, only the gala-gopher plug-in is supported.

  • HTTP request mode: GET

  • Data submission mode: query

  • Request parameter

    None

  • Request parameter example

    None

  • Response body parameters

    | Parameter | Type | Description |
    | --- | --- | --- |
    | code | int | Return code |
    | msg | str | Information corresponding to the status code |
    | resp | List[dict] | Response body |

    • resp

    | Parameter | Type | Description |
    | --- | --- | --- |
    | plugin_name | str | Plug-in name |
    | collect_items | list | Running status of plug-in collection items |
    | is_installed | bool | Whether the plug-in is installed |
    | resource | List[dict] | Plug-in resource usage |
    | status | str | Plug-in running status |

    • resource

    | Parameter | Type | Description |
    | --- | --- | --- |
    | name | str | Resource name |
    | current_value | str | Resource usage |
    | limit_value | str | Resource limit |

  • Response example

    json
    {
        "code": 200,
        "msg": "operate success",
        "resp": [
            {
                "collect_items": [
                    {
                        "probe_name": "system_tcp",
                        "probe_status": "off",
                        "support_auto": false
                    },
                    {
                        "probe_name": "haproxy",
                        "probe_status": "auto",
                        "support_auto": true
                    },
                    {
                        "probe_name": "nginx",
                        "probe_status": "auto",
                        "support_auto": true
                    }
                ],
                "is_installed": true,
                "plugin_name": "gala-gopher",
                "resource": [
                    {
                        "current_value": "0.0%",
                        "limit_value": null,
                        "name": "cpu"
                    },
                    {
                        "current_value": "13 MB",
                        "limit_value": null,
                        "name": "memory"
                    }
                ],
                "status": "active"
            }
        ]
    }
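
To see at a glance which probes are on, off, or automatic, the collect_items list can be grouped by probe_status. The following is a minimal sketch over a trimmed copy of the response example. probes_by_status is a hypothetical helper, not part of aops-agent.

```python
from collections import defaultdict

# Trimmed copy of the response example above, written as a Python literal.
SAMPLE_RESPONSE = {
    "code": 200,
    "msg": "operate success",
    "resp": [
        {
            "plugin_name": "gala-gopher",
            "is_installed": True,
            "status": "active",
            "collect_items": [
                {"probe_name": "system_tcp", "probe_status": "off", "support_auto": False},
                {"probe_name": "haproxy", "probe_status": "auto", "support_auto": True},
                {"probe_name": "nginx", "probe_status": "auto", "support_auto": True},
            ],
        }
    ],
}


def probes_by_status(response, plugin_name):
    """Group the probe names of one plug-in by their probe_status value."""
    for plugin in response.get("resp", []):
        if plugin.get("plugin_name") == plugin_name:
            grouped = defaultdict(list)
            for item in plugin.get("collect_items", []):
                grouped[item["probe_status"]].append(item["probe_name"])
            return dict(grouped)
    # The requested plug-in does not appear in the response.
    raise KeyError(plugin_name)


print(probes_by_status(SAMPLE_RESPONSE, "gala-gopher"))
```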

4.1.6 /v1/agent/file/collect

  • Description: Collects information such as the content, permissions, and owner of the target configuration files. Currently, a file can be read only if it is a UTF-8 encoded text file smaller than 1 MB and has no execute permission.

  • HTTP request mode: POST

  • Data submission mode: application/json

  • Request parameter

    | Parameter | Mandatory | Type | Description |
    | --- | --- | --- | --- |
    | configfile_path | True | List[str] | List of the full paths of the files to be collected |

  • Request parameter example

    json
    ["/home/test.conf", "/home/test.ini", "/home/test.json"]

  • Response body parameters

    | Parameter | Type | Description |
    | --- | --- | --- |
    | infos | List[dict] | File collection information |
    | success_files | List[str] | List of files successfully collected |
    | fail_files | List[str] | List of files that fail to be collected |

    • infos

    | Parameter | Type | Description |
    | --- | --- | --- |
    | path | str | File path |
    | content | str | File content |
    | file_attr | dict | File attributes |

    • file_attr

    | Parameter | Type | Description |
    | --- | --- | --- |
    | mode | str | File permission mode |
    | owner | str | File owner |
    | group | str | Group to which the file belongs |

  • Response example

    json
    {
        "infos": [
            {
                "content": "this is a test file",
                "file_attr": {
                    "group": "root",
                    "mode": "0644",
                    "owner": "root"
                },
                "path": "/home/test.txt"
            }
        ],
        "success_files": [
            "/home/test.txt"
        ],
        "fail_files": [
            "/home/test.txt"
        ]
    }
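
The constraints in the description (smaller than 1 MB, no execute permission, UTF-8 text) can be checked locally before a path is submitted. This is a rough sketch of such a pre-check, not the logic aops-agent itself uses. The helper names are hypothetical, 1 MB is read as 1024 * 1024 bytes, and any execute bit (owner, group, or other) is treated as execute permission. Both readings are assumptions.

```python
import os
import stat

MAX_SIZE = 1024 * 1024  # ASSUMPTION: "1 MB" is read as 1024 * 1024 bytes
EXEC_BITS = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH  # ASSUMPTION: any execute bit counts


def is_collectable(size, mode, raw):
    """Apply the three documented constraints to a file's size, mode, and bytes."""
    if size >= MAX_SIZE:
        return False  # not smaller than 1 MB
    if mode & EXEC_BITS:
        return False  # has execute permission
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False  # not valid UTF-8 text
    return True


def check_path(path):
    """Read a file's metadata and content, then apply is_collectable."""
    info = os.stat(path)
    with open(path, "rb") as handle:
        raw = handle.read(MAX_SIZE)
    return is_collectable(info.st_size, info.st_mode, raw)


print(is_collectable(19, 0o100644, b"this is a test file"))
```

Calling check_path("/home/test.conf") on the agent host would apply the same checks to a real file.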

4.1.7 /v1/agent/collect/items/change

  • Description: Changes the collection status of the plug-in collection items. Currently, only the status of the gala-gopher collection items can be changed. For the gala-gopher collection items, see /opt/gala-gopher/gala-gopher.conf.

  • HTTP request mode: POST

  • Data submission mode: application/json

  • Request parameter

    | Parameter | Mandatory | Type | Description |
    | --- | --- | --- | --- |
    | plugin_name | True | dict | Expected modification result of the plug-in collection items |

    • plugin_name

    | Parameter | Mandatory | Type | Description |
    | --- | --- | --- | --- |
    | collect_item | True | str | Expected modification result of the collection item |

  • Request parameter example

    json
    {
        "gala-gopher":{
            "redis":"auto",
            "system_inode":"on",
            "tcp":"on",
            "haproxy":"auto"
        }
    }
  • Response body parameters

    | Parameter | Type | Description |
    | --- | --- | --- |
    | code | int | Return code |
    | msg | str | Information corresponding to the status code |
    | resp | dict | Response body |

    • resp

    | Parameter | Type | Description |
    | --- | --- | --- |
    | plugin_name | dict | Modification result of the corresponding collection item |

    • plugin_name

    | Parameter | Type | Description |
    | --- | --- | --- |
    | success | List[str] | Collection items that are successfully modified |
    | failure | List[str] | Collection items that fail to be modified |

  • Response example

    json
    {
        "code": 200,
        "msg": "operate success",
        "resp": {
            "gala-gopher": {
                "failure": [
                    "redis"
                ],
                "success": [
                    "system_inode",
                    "tcp",
                    "haproxy"
                ]
            }
        }
    }
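
Because this API takes a JSON body and reports success and failure per item, a caller typically builds the body from a dictionary and then inspects the failure lists. The sketch below shows both halves under stated assumptions: the helper names are hypothetical, the http scheme is assumed, and the access_token header name is an assumption based on the token description in section 2.3 and the FAQs.

```python
import json
from urllib.request import Request


def build_change_request(host, port, changes, token):
    """Build a POST request for /v1/agent/collect/items/change with a JSON body."""
    body = json.dumps(changes).encode("utf-8")
    url = f"http://{host}:{port}/v1/agent/collect/items/change"
    headers = {
        "Content-Type": "application/json",
        "access_token": token,  # ASSUMPTION: header name is not confirmed here
    }
    return Request(url, data=body, method="POST", headers=headers)


def failed_items(response):
    """Return {plugin: [items]} for every plug-in that reports failures."""
    return {
        plugin: result["failure"]
        for plugin, result in response.get("resp", {}).items()
        if result.get("failure")
    }


# Values copied from the request and response examples above.
changes = {"gala-gopher": {"redis": "auto", "system_inode": "on", "tcp": "on", "haproxy": "auto"}}
sample_response = {
    "code": 200,
    "msg": "operate success",
    "resp": {"gala-gopher": {"failure": ["redis"], "success": ["system_inode", "tcp", "haproxy"]}},
}

request = build_change_request("192.168.1.47", 12000, changes, "TOKEN")
print(request.get_method(), request.full_url)
print(failed_items(sample_response))
```

Note that a response with code 200 can still list items under failure, so checking the return code alone is not enough.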

FAQs

  1. If an error is reported, view the /var/log/aops/aops.log file, rectify the fault based on the error message in the log file, and restart the service.

  2. You are advised to run aops-agent with Python 3.7 or later. When installing Python dependency libraries, make sure their versions are compatible.

  3. The value of access_token can be obtained from the /etc/aops/agent.conf file after the registration is complete.

  4. To limit the CPU and memory resources of a plug-in, add MemoryHigh and CPUQuota to the Service section in the service file corresponding to the plug-in.

    For example, set the memory limit of gala-gopher to 40 MB and the CPU limit to 20%.

    ini
    [Unit]
    Description=a-ops gala gopher service
    After=network.target
    
    [Service]
    Type=exec
    ExecStart=/usr/bin/gala-gopher
    Restart=on-failure
    RestartSec=1
    RemainAfterExit=yes
    ;Limit the maximum memory that can be used by processes in the unit. The limit can be exceeded. However, after the limit is exceeded, the process running speed is limited, and the system reclaims the excess memory as much as possible.
    ;The option value can be an absolute memory size in bytes (K, M, G, or T suffix based on 1024) or a relative memory size in percentage.
    MemoryHigh=40M
    ;Set the CPU time limit for the processes of this unit. The value must be a percentage ending with %, indicating the maximum percentage of the total time that the unit can use a single CPU.
    CPUQuota=20%
    
    [Install]
    WantedBy=multi-user.target
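
The comments in the unit file describe how the two values are written: an absolute memory size with a 1024-based K, M, G, or T suffix, and a CPU share as a percentage of a single CPU. The sketch below converts such values into plain numbers to make the arithmetic explicit. The function names are hypothetical and the parsing covers only the forms mentioned in those comments. It is not a reimplementation of systemd's parser.

```python
# Suffix multipliers are based on 1024, as stated in the unit file comment.
SUFFIXES = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3, "T": 1024 ** 4}


def memory_limit_to_bytes(value):
    """Convert an absolute MemoryHigh value such as "40M" to bytes."""
    value = value.strip()
    if value.endswith("%"):
        # Relative limits depend on the host's total memory, which is unknown here.
        raise ValueError("percentage limits are not handled by this sketch")
    if value and value[-1] in SUFFIXES:
        return int(value[:-1]) * SUFFIXES[value[-1]]
    return int(value)  # a bare number is already in bytes


def cpu_quota_to_fraction(value):
    """Convert a CPUQuota value such as "20%" to a fraction of one CPU."""
    value = value.strip()
    if not value.endswith("%"):
        raise ValueError("CPUQuota must be a percentage ending with %")
    return float(value[:-1]) / 100


print(memory_limit_to_bytes("40M"), cpu_quota_to_fraction("20%"))
```

With the example settings, MemoryHigh=40M corresponds to 40 × 1024 × 1024 = 41943040 bytes, and CPUQuota=20% corresponds to one fifth of a single CPU's time.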