      Deploying aops-agent

      1. Environment Requirements

      One host running openEuler 20.03 or later.

      2. Environment Configuration and Deployment

      2.1 Disabling the Firewall

      systemctl stop firewalld
      systemctl disable firewalld
      systemctl status firewalld
      

      2.2 Deploying aops-agent

      1. Run yum install aops-agent to install aops-agent from the Yum repository.

      2. Modify the configuration file. Set ip in the [agent] section to the IP address of the local host.

      vim /etc/aops/agent.conf
      

      The following uses 192.168.1.47 as an example.

      [agent]
      ;IP address and port number bound when the aops-agent is started.
      ip=192.168.1.47
      port=12000
       
      [gopher]
      ;Default path of the gala-gopher configuration file. If you need to change the path, ensure that the file path is correct.
      config_path=/opt/gala-gopher/gala-gopher.conf
       
      ;aops-agent log collection configuration
      [log]
      ;Level of the logs to be collected, which can be set to DEBUG, INFO, WARNING, ERROR, or CRITICAL
      log_level=INFO
      ;Location for storing collected logs
      log_dir=/var/log/aops
      ;Maximum size of a log file
      max_bytes=31457280
      ;Number of backup logs
      backup_count=40
      
      3. Run systemctl start aops-agent to start the service.
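
As a quick sanity check before starting the service, the configuration shown above can be parsed with Python's standard configparser module, which treats lines beginning with ; as comments by default. The sketch below is illustrative only: the helper name check_agent_conf is hypothetical, and the disk-usage estimate assumes that the log settings behave like size-based rotation (one active file plus backup_count backups), which this guide does not state.

```python
import configparser
import ipaddress

SAMPLE_CONF = """\
[agent]
;IP address and port number bound when the aops-agent is started.
ip=192.168.1.47
port=12000

[log]
log_level=INFO
log_dir=/var/log/aops
max_bytes=31457280
backup_count=40
"""

VALID_LEVELS = {"DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"}


def check_agent_conf(text):
    """Parse agent.conf-style text and return a dict of validated values."""
    parser = configparser.ConfigParser()
    parser.read_string(text)

    # ip_address raises ValueError if the address is malformed.
    ip = str(ipaddress.ip_address(parser["agent"]["ip"]))
    port = parser["agent"].getint("port")
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")

    level = parser["log"]["log_level"]
    if level not in VALID_LEVELS:
        raise ValueError(f"unknown log_level: {level}")

    max_bytes = parser["log"].getint("max_bytes")
    backup_count = parser["log"].getint("backup_count")
    # Assumption: one active log file plus backup_count rotated files.
    max_log_bytes = max_bytes * (backup_count + 1)
    return {"ip": ip, "port": port, "log_level": level, "max_log_bytes": max_log_bytes}


result = check_agent_conf(SAMPLE_CONF)
print(result)
```

To check a real host, the text of /etc/aops/agent.conf can be read from disk and passed to the same function.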

      2.3 Registering with aops-manager

      To identify users and prevent arbitrary API calls, aops-agent authenticates requests with tokens, which also reduces the load on the deployed hosts.

      For security purposes, the active registration mode is used to obtain the token. Before registering, prepare the registration information on the aops-agent host, then run the register command to submit it to aops-manager. Because aops-agent has no database of its own, the token is automatically saved to the specified file after the registration succeeds, and the registration result is displayed on the command line. The local host information is also saved to the aops-manager database for subsequent management.

      1. Prepare the register.json file.

        Prepare the information required for registration on aops-agent and save the information in JSON format. The data structure is as follows:

      {
          // Name of the login user
          "web_username": "admin",
          // User password
          "web_password": "changeme",
          // Host name
          "host_name": "host1",
          // Name of the group to which the host belongs
          "host_group_name": "group1",
          // IP address of the host where aops-manager is running
          "manager_ip": "192.168.1.23",
          // Whether to register as a management host
          "management": false,
          // External port for running aops-manager
          "manager_port": "11111",
          // Port for running aops-agent
          "agent_port": "12000"
      }
      

      Note: Ensure that aops-manager is running on the target host (192.168.1.23 in this example) and that the host group to register with already exists.

      2. Run aops_agent register -f register.json.
      3. The registration result is displayed on the command line. If the registration succeeds, the token string is saved to the specified file. If it fails, locate the fault based on the message and the log content (/var/log/aops/aops.log).

      The following is an example of the registration result:

      • Registration succeeded.
      [root@localhost ~]# aops_agent register -f register.json
      Agent Register Success
      
      • Registration failed. The following example shows a failure caused by aops-manager failing to start.
      [root@localhost ~]# aops_agent register -f register.json
      Agent Register Fail
      [root@localhost ~]#
      
      • Log content
      2022-09-05 16:11:52,576 ERROR command_manage/register/331: HTTPConnectionPool(host='192.168.1.23', port=11111): Max retries exceeded with url: /manage/host/add (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7ff0504ce4f0>: Failed to establish a new connection: [Errno 111] Connection refused'))
      [root@localhost ~]#
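
The register.json file used in the steps above can also be generated programmatically. The sketch below writes the same fields with Python's json module; the helper name write_register_json is hypothetical. Strict JSON parsers, including Python's json module, reject // comments, so the generated file contains none. Whether aops_agent itself tolerates comments in the file is not stated in this guide.

```python
import json
import os
import tempfile


def write_register_json(path, **overrides):
    """Write a register.json file with the fields used for registration."""
    info = {
        "web_username": "admin",
        "web_password": "changeme",
        "host_name": "host1",
        "host_group_name": "group1",
        "manager_ip": "192.168.1.23",
        "management": False,
        "manager_port": "11111",
        "agent_port": "12000",
    }
    unknown = set(overrides) - set(info)
    if unknown:
        raise KeyError(f"unexpected fields: {sorted(unknown)}")
    info.update(overrides)
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(info, fh, indent=4)
    return info


# Example: write to a temporary directory and read the file back.
target = os.path.join(tempfile.mkdtemp(), "register.json")
written = write_register_json(target, host_name="host2")
with open(target, encoding="utf-8") as fh:
    loaded = json.load(fh)
print(loaded["host_name"])
```

Rejecting unknown field names catches typos such as host_nmae before the file reaches the register command.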
      

      3. Plug-in Support

      3.1 gala-gopher

      3.1.1 Introduction

      gala-gopher is a low-overhead probe framework based on eBPF. It monitors the CPU, memory, and network status of hosts and collects the corresponding data. You can enable or disable existing probes based on service requirements.

      3.1.2 Deployment

      1. Run yum install gala-gopher to install gala-gopher from the Yum repository.
      2. Enable probes based on service requirements. The available probes are listed in /opt/gala-gopher/gala-gopher.conf.
      3. Run systemctl start gala-gopher to start the gala-gopher service.

      3.1.3 Others

      For more information about gala-gopher, see https://gitee.com/openeuler/gala-gopher/blob/master/README.md.

      4. API Support

      4.1 List of External APIs

      No.  API                             Type  Description
      1    /v1/agent/plugin/start          POST  Starts a plug-in.
      2    /v1/agent/plugin/stop           POST  Stops a plug-in.
      3    /v1/agent/application/info      GET   Lists the running applications in the target application set.
      4    /v1/agent/host/info             POST  Obtains host information.
      5    /v1/agent/plugin/info           GET   Obtains the plug-in running information in aops-agent.
      6    /v1/agent/file/collect          POST  Collects the content of configuration files.
      7    /v1/agent/collect/items/change  POST  Changes the running status of plug-in collection items.

      4.1.1 /v1/agent/plugin/start
      • Description: Starts a plug-in that is installed but not running. Currently, only the gala-gopher plug-in is supported.

      • HTTP request mode: POST

      • Data submission mode: query

      • Request parameter

        Parameter    Mandatory  Type  Description
        plugin_name  True       str   Plug-in name

      • Request parameter example

        Parameter    Value
        plugin_name  gala-gopher

      • Response body parameters

        Parameter  Type  Description
        code       int   Return code
        msg        str   Information corresponding to the status code

      • Response example

        {
            "code": 200,
            "msg": "xxxx"
        }
        
      4.1.2 /v1/agent/plugin/stop
      • Description: Stops a running plug-in. Currently, only the gala-gopher plug-in is supported.

      • HTTP request mode: POST

      • Data submission mode: query

      • Request parameter

        Parameter    Mandatory  Type  Description
        plugin_name  True       str   Plug-in name

      • Request parameter example

        Parameter    Value
        plugin_name  gala-gopher

      • Response body parameters

        Parameter  Type  Description
        code       int   Return code
        msg        str   Information corresponding to the status code

      • Response example

        {
            "code": 200,
            "msg": "xxxx"
        }
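
The two plug-in control endpoints differ only in the final path segment, so a client can build both requests the same way. The following sketch constructs, but does not send, such a request with the Python standard library. The helper name, the base URL (taken from the agent.conf example), and the access_token header used to carry the token are assumptions made for illustration; this guide does not specify how the token is passed.

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "http://192.168.1.47:12000"  # ip and port from the agent.conf example


def build_plugin_request(action, plugin_name, token):
    """Build a POST request for /v1/agent/plugin/start or /v1/agent/plugin/stop."""
    if action not in ("start", "stop"):
        raise ValueError("action must be 'start' or 'stop'")
    # The documented data submission mode is "query", so the
    # parameter goes into the query string rather than the body.
    query = urlencode({"plugin_name": plugin_name})
    url = f"{BASE_URL}/v1/agent/plugin/{action}?{query}"
    # Assumption: the token is passed in an "access_token" header.
    return Request(url, method="POST", headers={"access_token": token})


req = build_plugin_request("start", "gala-gopher", token="<token>")
print(req.get_method(), req.full_url)
```

Passing the returned object to urllib.request.urlopen would send it; the response body could then be decoded with json.loads and its code field checked.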
        
      4.1.3 /v1/agent/application/info
      • Description: Lists the applications in the target application set that are currently running. Currently, the target application set contains MySQL, Kubernetes, Hadoop, Nginx, Docker, and gala-gopher.

      • HTTP request mode: GET

      • Data submission mode: query

      • Request parameter

        None

      • Request parameter example

        None

      • Response body parameters

        Parameter  Type  Description
        code       int   Return code
        msg        str   Information corresponding to the status code
        resp       dict  Response body

        • resp

          Parameter  Type       Description
          running    List[str]  List of the running applications

      • Response example

        {
            "code": 200,
            "msg": "xxxx",
            "resp": {
                "running": [
                    "mysql",
                    "docker"
                ]
            }
        }
        
      4.1.4 /v1/agent/host/info
      • Description: Obtains information about the host where aops-agent is installed, including the system version, BIOS version, kernel version, CPU information, and memory information.

      • HTTP request mode: POST

      • Data submission mode: application/json

      • Request parameter

        Parameter  Mandatory  Type       Description
        info_type  True       List[str]  List of the information types to collect. Currently, only cpu, disk, memory, and os are supported.

      • Request parameter example

        ["os", "cpu", "memory", "disk"]

      • Response body parameters

        Parameter  Type  Description
        code       int   Return code
        msg        str   Information corresponding to the status code
        resp       dict  Response body

        • resp

          Parameter  Type        Description
          cpu        dict        CPU information
          memory     dict        Memory information
          os         dict        OS information
          disk       List[dict]  Disk information

        • cpu

          Parameter     Type  Description
          architecture  str   CPU architecture
          core_count    int   Number of cores
          l1d_cache     str   L1 data cache size
          l1i_cache     str   L1 instruction cache size
          l2_cache      str   L2 cache size
          l3_cache      str   L3 cache size
          model_name    str   Model name
          vendor_id     str   Vendor ID

        • memory

          Parameter  Type        Description
          size       str         Total memory
          total      int         Number of DIMMs
          info       List[dict]  Information about all DIMMs

          • info

            Parameter     Type  Description
            size          str   Memory size
            type          str   Type
            speed         str   Speed
            manufacturer  str   Vendor

        • os

          Parameter     Type  Description
          bios_version  str   BIOS version
          os_version    str   OS version
          kernel        str   Kernel version

      • Response example

        {
            "code": 200,
            "msg": "operate success",
            "resp": {
                "cpu": {
                    "architecture": "aarch64",
                    "core_count": "128",
                    "l1d_cache": "8 MiB (128 instances)",
                    "l1i_cache": "8 MiB (128 instances)",
                    "l2_cache": "64 MiB (128 instances)",
                    "l3_cache": "128 MiB (4 instances)",
                    "model_name": "Kunpeng-920",
                    "vendor_id": "HiSilicon"
                },
                "memory": {
                    "info": [
                        {
                            "manufacturer": "Hynix",
                            "size": "16 GB",
                            "speed": "2933 MT/s",
                            "type": "DDR4"
                        },
                        {
                            "manufacturer": "Hynix",
                            "size": "16 GB",
                            "speed": "2933 MT/s",
                            "type": "DDR4"
                        }
                    ],
                    "size": "32G",
                    "total": 2
                },
                "os": {
                    "bios_version": "1.82",
                    "kernel": "5.10.0-60.18.0.50",
                    "os_version": "openEuler 22.03 LTS"   
                },
                "disk": [
                    {
                        "capacity": "xxGB",
                        "model": "xxxxxx"
                    }
                ]
            }
        }
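
Because the per-DIMM sizes and the summary fields in the memory section are reported as strings, a client may want to cross-check them before relying on the totals. The sketch below is one way to do that, assuming sizes always use the "<number> GB" form seen in the response example; the function name summarize_memory is hypothetical.

```python
def summarize_memory(resp):
    """Return (dimm_count, total_gb) computed from the per-DIMM entries."""
    dimms = resp["memory"]["info"]
    total_gb = 0
    for dimm in dimms:
        value, unit = dimm["size"].split()
        if unit != "GB":
            # Assumption: only the "GB" unit from the example is handled.
            raise ValueError(f"unexpected unit: {unit}")
        total_gb += int(value)
    return len(dimms), total_gb


sample_resp = {
    "memory": {
        "info": [
            {"manufacturer": "Hynix", "size": "16 GB", "speed": "2933 MT/s", "type": "DDR4"},
            {"manufacturer": "Hynix", "size": "16 GB", "speed": "2933 MT/s", "type": "DDR4"},
        ],
        "size": "32G",
        "total": 2,
    }
}

count, total_gb = summarize_memory(sample_resp)
print(count, total_gb)
```

The computed count can be compared with the total field, and the computed sum with the size field, to detect a partially populated response.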
        
      4.1.5 /v1/agent/plugin/info
      • Description: Obtains the plug-in running status of the host. Currently, only the gala-gopher plug-in is supported.

      • HTTP request mode: GET

      • Data submission mode: query

      • Request parameter

        None

      • Request parameter example

        None

      • Response body parameters

        Parameter  Type        Description
        code       int         Return code
        msg        str         Information corresponding to the status code
        resp       List[dict]  Response body

        • resp

          Parameter      Type        Description
          plugin_name    str         Plug-in name
          collect_items  List[dict]  Running status of plug-in collection items
          is_installed   bool        Whether the plug-in is installed
          resource       List[dict]  Plug-in resource usage
          status         str         Plug-in running status

          • resource

            Parameter      Type  Description
            name           str   Resource name
            current_value  str   Resource usage
            limit_value    str   Resource limit

      • Response example

        {
            "code": 200,
            "msg": "operate success",
            "resp": [
                {
                    "collect_items": [
                        {
                            "probe_name": "system_tcp",
                            "probe_status": "off",
                            "support_auto": false
                        },
                        {
                            "probe_name": "haproxy",
                            "probe_status": "auto",
                            "support_auto": true
                        },
                        {
                            "probe_name": "nginx",
                            "probe_status": "auto",
                            "support_auto": true
                        }
                    ],
                    "is_installed": true,
                    "plugin_name": "gala-gopher",
                    "resource": [
                        {
                            "current_value": "0.0%",
                            "limit_value": null,
                            "name": "cpu"
                        },
                        {
                            "current_value": "13 MB",
                            "limit_value": null,
                            "name": "memory"
                        }
                    ],
                    "status": "active"
                }
            ]
        }
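
A response like the one above is easier to act on once the probes are grouped by status. The following sketch does this for a single entry of the resp list; the function name probes_by_status is hypothetical, and the sample data is copied from the response example.

```python
from collections import defaultdict


def probes_by_status(plugin):
    """Group probe names by their probe_status value."""
    grouped = defaultdict(list)
    for item in plugin["collect_items"]:
        grouped[item["probe_status"]].append(item["probe_name"])
    return dict(grouped)


sample_plugin = {
    "plugin_name": "gala-gopher",
    "is_installed": True,
    "status": "active",
    "collect_items": [
        {"probe_name": "system_tcp", "probe_status": "off", "support_auto": False},
        {"probe_name": "haproxy", "probe_status": "auto", "support_auto": True},
        {"probe_name": "nginx", "probe_status": "auto", "support_auto": True},
    ],
}

grouped = probes_by_status(sample_plugin)
print(grouped)
```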
        
      4.1.6 /v1/agent/file/collect
      • Description: Collects the content, permissions, and owner of the target configuration files. Currently, only UTF-8 encoded text files that are smaller than 1 MB and have no execute permission can be read.

      • HTTP request mode: POST

      • Data submission mode: application/json

      • Request parameter

        Parameter        Mandatory  Type       Description
        configfile_path  True       List[str]  List of the full paths of the files to be collected

      • Request parameter example

        ["/home/test.conf", "/home/test.ini", "/home/test.json"]

      • Response body parameters

        Parameter      Type        Description
        infos          List[dict]  File collection information
        success_files  List[str]   List of files successfully collected
        fail_files     List[str]   List of files that fail to be collected

        • infos

          Parameter  Type  Description
          path       str   File path
          content    str   File content
          file_attr  dict  File attributes

          • file_attr

            Parameter  Type  Description
            mode       str   File permission
            owner      str   File owner
            group      str   Group to which the file belongs

      • Response example

        {
            "infos": [
                {
                    "content": "this is a test file",
                    "file_attr": {
                        "group": "root",
                        "mode": "0644",
                        "owner": "root"
                    },
                    "path": "/home/test.txt"
                }
            ],
            "success_files": [
                "/home/test.txt"
            ],
            "fail_files": [
                "/home/test.txt"
            ]
        }
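
Before requesting a file, a caller can check locally whether it is likely to meet the constraints stated in the description (a text file smaller than 1 MB, no execute permission, UTF-8 encoded). The sketch below is a client-side approximation, not the agent's own check: the function name is hypothetical, and reading "1 MB" as 1024 * 1024 bytes is an assumption.

```python
import os
import stat
import tempfile

MAX_SIZE = 1024 * 1024  # assumption: "1 MB" is interpreted as 1 MiB


def is_collectable(path):
    """Return True if the file appears to meet the documented constraints."""
    st = os.stat(path)
    if not stat.S_ISREG(st.st_mode) or st.st_size >= MAX_SIZE:
        return False
    if st.st_mode & (stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH):
        return False  # at least one execute bit is set
    try:
        with open(path, encoding="utf-8") as fh:
            fh.read()
    except UnicodeDecodeError:
        return False
    return True


# Example with two temporary files: one plain text, one not valid UTF-8.
workdir = tempfile.mkdtemp()
good = os.path.join(workdir, "test.conf")
bad = os.path.join(workdir, "test.bin")
with open(good, "w", encoding="utf-8") as fh:
    fh.write("this is a test file")
with open(bad, "wb") as fh:
    fh.write(b"\xff\xfe\x00")
os.chmod(good, 0o644)
os.chmod(bad, 0o644)

print(is_collectable(good), is_collectable(bad))
```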
        
      4.1.7 /v1/agent/collect/items/change
      • Description: Changes the collection status of the plug-in collection items. Currently, only the status of the gala-gopher collection items can be changed. For the gala-gopher collection items, see /opt/gala-gopher/gala-gopher.conf.

      • HTTP request mode: POST

      • Data submission mode: application/json

      • Request parameter

        Parameter    Mandatory  Type  Description
        plugin_name  True       dict  Expected modification result of the plug-in collection items

        • plugin_name

          Parameter     Mandatory  Type  Description
          collect_item  True       str   Expected modification result of the collection item

      • Request parameter example

        {
            "gala-gopher":{
                "redis":"auto",
                "system_inode":"on",
                "tcp":"on",
                "haproxy":"auto"
            }
        } 
        
      • Response body parameters

        Parameter  Type  Description
        code       int   Return code
        msg        str   Information corresponding to the status code
        resp       dict  Response body

        • resp

          Parameter    Type  Description
          plugin_name  dict  Modification result of the corresponding collection items

          • plugin_name

            Parameter  Type       Description
            success    List[str]  Collection items that are successfully modified
            failure    List[str]  Collection items that fail to be modified

      • Response example

        {
            "code": 200,
            "msg": "operate success",
            "resp": {
                "gala-gopher": {
                    "failure": [
                        "redis"
                    ],
                    "success": [
                        "system_inode",
                        "tcp",
                        "haproxy"
                    ]
                }
            }
        }
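
In the response example, code is 200 even though one item failed, so a caller should inspect the failure list rather than rely on the return code alone. The sketch below builds a request body and extracts the failed items. Both function names are hypothetical, and the set of allowed states is an assumption based on the values that appear in the examples of this guide.

```python
import json

ALLOWED_STATES = {"on", "off", "auto"}  # assumption: the states seen in the examples


def build_change_body(plugin_name, items):
    """Return the JSON body for /v1/agent/collect/items/change."""
    bad = {name: state for name, state in items.items() if state not in ALLOWED_STATES}
    if bad:
        raise ValueError(f"unsupported states: {bad}")
    return json.dumps({plugin_name: items})


def failed_items(response, plugin_name):
    """Return the collection items the agent reported as not modified."""
    return response["resp"][plugin_name]["failure"]


body = build_change_body("gala-gopher", {"redis": "auto", "tcp": "on"})
sample_response = {
    "code": 200,
    "msg": "operate success",
    "resp": {"gala-gopher": {"failure": ["redis"], "success": ["tcp"]}},
}
print(body)
print(failed_items(sample_response, "gala-gopher"))
```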
        

      FAQs

      1. If an error is reported, view the /var/log/aops/aops.log file, rectify the fault based on the error message in the log file, and restart the service.

      2. You are advised to run aops-agent on Python 3.7 or later. When installing Python dependency libraries, check that their versions are compatible.

      3. The value of access_token can be obtained from the /etc/aops/agent.conf file after the registration is complete.

      4. To limit the CPU and memory resources of a plug-in, add MemoryHigh and CPUQuota to the [Service] section of the service file corresponding to the plug-in.

        For example, set the memory limit of gala-gopher to 40 MB and the CPU limit to 20%.

        [Unit]
        Description=a-ops gala gopher service
        After=network.target
        
        [Service]
        Type=exec
        ExecStart=/usr/bin/gala-gopher
        Restart=on-failure
        RestartSec=1
        RemainAfterExit=yes
        ;Limit on the memory that processes in the unit can use. The limit can be exceeded, but processes are then slowed down and the system reclaims the excess memory as much as possible.
        ;The value can be an absolute memory size in bytes (with an optional K, M, G, or T suffix, based on 1024) or a relative memory size expressed as a percentage.
        MemoryHigh=40M
        ;CPU time limit for the processes of this unit. The value must be a percentage ending with %, expressed relative to the total time available on a single CPU.
        CPUQuota=20%
        
        [Install]
        WantedBy=multi-user.target
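
To make the size notation concrete, the sketch below converts an absolute MemoryHigh value to bytes using the 1024-based suffixes described in the comments of the unit file. It illustrates the arithmetic only and is not systemd's own parser; the function name memory_high_to_bytes is hypothetical.

```python
SUFFIXES = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def memory_high_to_bytes(value):
    """Convert an absolute MemoryHigh value such as '40M' to bytes."""
    value = value.strip()
    if value.endswith("%"):
        # Relative values depend on the machine's total memory.
        raise ValueError("percentage values cannot be converted without total memory")
    if value and value[-1] in SUFFIXES:
        return int(value[:-1]) * SUFFIXES[value[-1]]
    return int(value)


limit_bytes = memory_high_to_bytes("40M")
print(limit_bytes)
```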
        
