      adoctor-check Exception Detection Module

      1 Description

      The adoctor-check exception detection module is a tool that calculates and analyzes various running data collected from the nodes in the cluster to check whether there are abnormal data items.

      2 Architecture

      The exception detection tool depends on the A-Ops framework and consists of the scheduler and executor modules.

      Architecture of the Exception Detection Module

      3 Software Download

      4 Operating Environment

      • Hardware configuration:
        Configuration Item  Recommended Specification
        CPU                 8 cores
        Memory              32 GB (minimum: 4 GB)
        Network bandwidth   300 Mbps
        I/O                 375 MB/s
      • Software configuration:
        Software                      Version and Specifications
        Python                        3.8 or later
        MySQL                         8.0.26
        Kafka                         2.6.0
        ZooKeeper                     3.6.2
        Elasticsearch                 7.14.0
        Prometheus                    2.20.0
        node_exporter                 1.0.1
        aops-database                 1.0.1
        aops-manager                  1.0.1
        Fluentd                       1.13.3
        fluentd-plugin-elasticsearch  5.0.5

      5 Installation

      5.1 Installing the A-Ops Framework

      The exception detection module depends on the A-Ops framework. For details about the framework installation, see the A-Ops Framework Manual.

      5.2 Installing the Dependent Components

      The exception detection module depends on messaging middleware and the data collection modules. Ensure that the ZooKeeper, Kafka, Prometheus, node_exporter, Elasticsearch, and Fluentd components have been deployed in the environment. Elasticsearch is deployed during the installation of the A-Ops framework; you need to modify its configuration so that it can receive the data collected and pushed by Fluentd on multiple hosts. You can run yum install prometheus2 to download Prometheus and yum install rubygem-fluentd to download Fluentd.
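
      Before continuing, it can help to confirm that these components are actually reachable. The following Python sketch is illustrative only: it assumes the components run on localhost and listen on their well-known default ports, so adjust the host and port values to your deployment.

      # Connectivity sanity check for the dependent components (illustrative).
      # Assumes default ports on localhost; adjust to your actual deployment.
      import socket

      DEFAULT_PORTS = {
          "ZooKeeper": 2181,
          "Kafka": 9092,
          "Prometheus": 9090,
          "node_exporter": 9100,
          "Elasticsearch": 9200,
          "Fluentd": 24224,
      }

      def check(host, port, timeout=2.0):
          """Return True if a TCP connection to host:port succeeds."""
          try:
              with socket.create_connection((host, port), timeout=timeout):
                  return True
          except OSError:
              return False

      for name, port in DEFAULT_PORTS.items():
          state = "reachable" if check("127.0.0.1", port) else "NOT reachable"
          print(f"{name:15s} port {port:5d}: {state}")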

      5.2.1 Installing Using the A-Ops Deployment Service

      You can also install adoctor-check-scheduler and adoctor-check-executor in this step.

      5.2.1.1 Editing the Task List

      Modify the deployment task list and enable the steps for the ZooKeeper, Kafka, Prometheus, node_exporter, and Fluentd components:

      ---
      step_list:
        zookeeper:
          enable: true
          continue: false
        kafka:
          enable: true
          continue: false
        prometheus:
          enable: true
          continue: false
        node_exporter:
          enable: true
          continue: false
        fluentd:
          enable: true
          continue: false
        ...
      
      5.2.1.2 Editing the Host List

      For details about the host configurations, see section 2.2.2 in the Deployment Management Manual.

      5.2.1.3 Editing the Variable List

      For details about the variable configurations, see section 2.2.2 in the Deployment Management Manual.

      5.2.1.4 Executing the Deployment Task

      For details about deployment task execution, see section 3 in the Deployment Management Manual.

      5.2.2 Manual Installation

      You can configure the openEuler repo source to install the preceding dependent components. For details about the parameter configuration and cluster deployment, see the official documents of the software.

      5.3 Installing adoctor-check

      5.3.1 Manual Installation

      • Installing using the Yum repo source.

        Configure the Yum sources openEuler22.03 and openEuler22.03:Epol in the /etc/yum.repos.d/openEuler.repo file.

        [everything] # openEuler 22.03 officially released repository
        name=openEuler22.03
        baseurl=https://repo.openeuler.org/openEuler-22.03-LTS/everything/$basearch/ 
        enabled=1
        gpgcheck=1
        gpgkey=https://repo.openeuler.org/openEuler-22.03-LTS/everything/$basearch/RPM-GPG-KEY-openEuler
        
        [Epol] # openEuler 22.03:Epol officially released repository
        name=Epol
        baseurl=https://repo.openeuler.org/openEuler-22.03-LTS/EPOL/main/$basearch/ 
        enabled=1
        gpgcheck=1
        gpgkey=https://repo.openeuler.org/openEuler-22.03-LTS/OS/$basearch/RPM-GPG-KEY-openEuler
        

        Run the following commands to download adoctor-check and its dependencies.

        yum install adoctor-check-scheduler
        yum install adoctor-check-executor
        
      • Installing using the RPM packages. Download adoctor-check-scheduler, adoctor-check-executor, and aops-utils, and then run the following command to install the modules. (x.x-x indicates the version. Replace it with the actual version number.)

        rpm -ivh adoctor-check-scheduler-x.x-x.oe1.noarch.rpm
        rpm -ivh adoctor-check-executor-x.x-x.oe1.noarch.rpm
        

      5.3.2 Installing Using the A-Ops Deployment Service

      You can install this module with the dependent components in section 5.2.1.

      5.3.2.1 Editing the Task List

      Modify the deployment task list and enable the steps for adoctor-check-scheduler and adoctor-check-executor:

      ---
      step_list:
        ...
        adoctor_check_executor:
          enable: true
          continue: false
        adoctor_check_scheduler:
          enable: true
          continue: false
        ...
      5.3.2.2 Editing the Host List

      For details about the host configuration, see section 2.2.3.8 in the Deployment Management Manual.

      5.3.2.3 Editing the Variable List

      For details about the variable configuration, see section 2.2.3.8 in the Deployment Management Manual.

      5.3.2.4 Executing the Deployment Task

      For details about deployment task execution, see section 3 in the Deployment Management Manual.

      6 Defining the Exception Detection Rules

      Exception detection rules are defined based on the data that can be collected by the system.

      6.1 Exception Detection Rule File Format

      Exception detection rules must be imported in JSON format. Different users can import different detection rules. The detection rule list consists of check items. The following is an example of a check item:

      "check_items": [
          {
              "check_item": "check_item2",
              "data_list": [
                  {
                      "name": "data1",
                      "type": "kpi",
                      "label": {
                          "cpu": "1",
                          "mode": "irq"
                       }
                  }
              ],
              "condition": "$0>1",
              "plugin": "",
              "description": "data 1"
          },
          {
              "check_item": "check_item2",
              ...
          }
      ]
      

      An exception detection rule contains the following fields:

      check_item (string): Name of the exception detection rule. The name of the exception detection rule imported by a user must be unique.
      data_list (list): List of data required for the exception detection.
      condition (string): Detection rule description. A string that describes the exception; it is the condition for determining that the data in data_list is abnormal.
      plugin (string): Name of the exception detection rule plugin. If you use a custom exception detection rule plugin, set this parameter to the plugin name. If this parameter is left empty, the built-in expression plugin is used to parse the detection rule by default.
      description (string): Description of the exception detection check item.
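
      As a quick illustration of these fields, the sketch below checks that a rule dictionary carries the required keys and types before import. The helper is hypothetical and only mirrors the field list above; adoctor-check performs its own validation on import.

      # Hypothetical pre-import validation that mirrors the field list above.
      REQUIRED_FIELDS = {
          "check_item": str,   # unique rule name
          "data_list": list,   # data required by the rule
          "condition": str,    # expression describing the anomaly
          "plugin": str,       # empty string selects the built-in expression plugin
          "description": str,
      }

      def validate_rule(rule):
          """Return a list of problems found in a single check item."""
          problems = []
          for field, ftype in REQUIRED_FIELDS.items():
              if field not in rule:
                  problems.append("missing field: " + field)
              elif not isinstance(rule[field], ftype):
                  problems.append(field + " should be " + ftype.__name__)
          return problems

      rule = {
          "check_item": "check_item1",
          "data_list": [{"name": "data1", "type": "kpi",
                         "label": {"cpu": "1", "mode": "irq"}}],
          "condition": "$0>1",
          "plugin": "",
          "description": "data 1",
      }
      print(validate_rule(rule) or "rule looks well-formed")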

      6.2 Description of the Exception Detection Data Items

      You need to describe the data required for calculation for each exception detection check item. The data is listed in data_list. The description of key fields in data_list is as follows:

      name (string): Name of the data item. The data item name is unique within the data_list.
      type (string): Type of the data. The value can be kpi (indicator data) or log (log data).
      label (dict): Data label. Key-value pairs of the data item recorded in a dictionary. You need to describe all labels of the data item. For details about the label values of KPI data, see the metrics list of node_exporter. The label field is optional for log data.
      • KPI data

        Currently, KPI data is collected by node_exporter and stored in Prometheus. The data indicators that can be collected vary with the node_exporter version; the supported data items are subject to the deployed version.

        • Data categories

        The following categories are included:

        Data Item                                     Data Category
        node_cpu*                                     System CPU metrics
        node_disk*                                    Disk I/O metrics
        node_filesystem*                              File system metrics
        node_memory*, node_zoneinfo*                  System memory usage metrics
        node_netstat*, node_network*, node_sockstat*  Network metrics
        node_load*                                    System load metrics
        node_systemd*                                 Service metrics
        node_time*                                    Time-related metrics
        process*                                      Prometheus process metrics
        go_*                                          Go environment metrics
        • Data labels

        KPI data usually carries labels that describe its characteristics. The labels are recorded as a dictionary in the label field, and a group of labels uniquely identifies one metric data record. Label dimensions may include the status, features, or environment of the monitored data. An example is as follows:

        node_cpu_seconds_total{cpu="0",mode="idle"} 
        node_cpu_seconds_total{cpu="0",mode="iowait"}
        node_cpu_seconds_total{cpu="0",mode="irq"}
        node_cpu_seconds_total{cpu="0",mode="nice"} 
        node_cpu_seconds_total{cpu="0",mode="softirq"} 
        node_cpu_seconds_total{cpu="0",mode="steal"}
        node_cpu_seconds_total{cpu="0",mode="system"} 
        node_cpu_seconds_total{cpu="0",mode="user"} 
        ...
        node_cpu_seconds_total{cpu="95",mode="softirq"} 
        node_cpu_seconds_total{cpu="95",mode="steal"
        node_cpu_seconds_total{cpu="95",mode="system"} 
        node_cpu_seconds_total{cpu="95",mode="user"}
        

        Each of the data items above has the cpu and mode labels. The cpu label indicates the CPU core to which the data belongs; there are 96 CPU cores in total. The mode label indicates the status of the CPU core; there are eight statuses, such as idle (idle), user space process usage (user), and system space process usage (system). The query sketch after this list shows how to enumerate the label combinations of a metric.

        For details about the data items supported by node_exporter, see the node_exporter official document.

      • Log data

        Log data is collected by Fluentd and stored in Elasticsearch. Currently, the history logs and dmesg logs of the system can be collected.
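
      Before writing a rule against a KPI metric, it can be useful to enumerate the label combinations that actually exist. The sketch below queries the standard Prometheus HTTP API for the metric used in the example above; it assumes Prometheus is reachable at localhost:9090, so adjust the URL to your deployment.

      # List the labeled series of a metric via the Prometheus HTTP API.
      # Assumes Prometheus at localhost:9090; adjust as needed.
      import json
      import urllib.request

      PROMETHEUS = "http://localhost:9090"
      METRIC = "node_cpu_seconds_total"

      url = PROMETHEUS + "/api/v1/query?query=" + METRIC
      with urllib.request.urlopen(url, timeout=5) as resp:
          result = json.load(resp)["data"]["result"]

      # Each entry carries the label set that uniquely identifies one series.
      for series in result[:8]:
          labels = series["metric"]
          print(labels.get("cpu"), labels.get("mode"), series["value"])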

      6.3 Conditions of the Exception Detection Rules

      6.3.1 Exception Detection Expressions

      By default, the exception detection rules use expressions to describe the collected data. A standard expression takes one of the following forms:

      <expression><operator><constant>
      <expression><operator><expression>
      

      expression is an expression that contains data variables. The data variables are described in data_list and are referenced with $i macros, where i is the index of the variable in data_list.

      For example, if data_list is as follows:

      "data_list": 
      [{
       "name": "node_cpu_frequency_min_hertz",
       "type": "kpi",
      
      },
      {
       "name": "node_cpu_guest_seconds_total",
       "type": "kpi",
       "label": {
        "cpu": "1",
      
       }
      },
      {
       "name": "node_cpu_seconds_total",
       "type": "kpi"
      }]
      

      In this case, the condition should be similar to the following:

      $0 + $1 * $2 >100
      

      The meaning of the expression is as follows:

      node_cpu_frequency_min_hertz + node_cpu_guest_seconds_total * node_cpu_seconds_total > 100
      
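
      Conceptually, the built-in expression plugin substitutes each $i macro with the current value of the i-th entry in data_list and then evaluates the resulting expression. The sketch below imitates that substitution for illustration only; it is not the plugin's actual parser, and eval is used here purely because the demo input is trusted.

      # Illustrative $i macro substitution (not the real expression parser).
      import re

      def evaluate_condition(condition, values):
          """Replace each $i macro with values[i] and evaluate the expression."""
          expr = re.sub(r"\$(\d+)", lambda m: repr(values[int(m.group(1))]), condition)
          # eval is acceptable here only because the rule text is trusted demo input.
          return bool(eval(expr))

      # $0: node_cpu_frequency_min_hertz, $1: node_cpu_guest_seconds_total,
      # $2: node_cpu_seconds_total (ordering follows data_list above)
      print(evaluate_condition("$0 + $1 * $2 >100", [90.0, 2.0, 6.0]))  # 90 + 12 = 102 > 100 -> True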

      6.3.2 Operators

      Currently, the following operators are supported:

      Priority  Operator  Definition                 Usage
      1         ()        Parentheses                (expression), function(parameter)
      2         -         Minus sign operator        -expression
                +         Plus sign operator         +expression
                !         Logical NOT                !expression
                ~         Bitwise NOT                ~expression
      3         /         Division                   expression1 / expression2
                *         Multiplication             expression1 * expression2
                %         Remainder (modulo)         expression1 % expression2
      4         +         Addition                   expression1 + expression2
                -         Subtraction                expression1 - expression2
      5         <<        Left shift                 expression1 << expression2
                >>        Right shift                expression1 >> expression2
      6         >         Greater than               expression1 > expression2
                >=        Greater than or equal to   expression1 >= expression2
                <         Less than                  expression1 < expression2
                <=        Less than or equal to      expression1 <= expression2
      7         ==        Equal to                   expression1 == expression2
                !=        Not equal to               expression1 != expression2
                in        Included by                {key: value} in {key1: value1, key2: value2, ...}; key and value can be expressions.
                notin     Not included by            {key: value} notin {key1: value1, key2: value2, ...}; key and value can be expressions.
      8         &         Bitwise AND                expression1 & expression2
      9         ^         Bitwise XOR                expression1 ^ expression2
      10        |         Bitwise OR                 expression1 | expression2
      11        &&        Logical AND                expression1 && expression2
      12        ||        Logical OR                 expression1 || expression2
      13        ?:        Conditional operator       judge_expression ? expression_true : expression_false
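
      The in and notin operators test whether a key-value pair appears in a collection. Interpreted in Python terms, the check might look like the sketch below; this is an illustration of the usage column above, not the plugin's actual implementation.

      # Illustrative semantics of the in / notin operators from the table above.
      def contains(pair, collection):
          """Return True if the single {key: value} pair appears in collection."""
          (key, value), = pair.items()
          return collection.get(key) == value

      print(contains({"mode": "irq"}, {"cpu": "1", "mode": "irq"}))      # in -> True
      print(not contains({"mode": "usr"}, {"cpu": "1", "mode": "irq"}))  # notin -> True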

      6.3.3 Functions

      You can use functions to calculate the data in the expressions. The expression format is as follows:

      # Compare the function calculation result of the specified data_name with constant:
      <data_name>.function(<parameter>)<operator><constant>
      # Compare the function calculation result of $0 with constant:
      function(<parameter>)<operator><constant>
      
      • Function parameters

        You can use time and quantity as filters in the functions.

        Function Invocation  Description
        max(30s)             Maximum value of the data in the last 30 seconds.
        max(#5)              Maximum value of the latest 5 data records.

        The unit of the time offset can be s (second), m (minute), or h (hour).

      • Function list

        count(sec|#num, pattern, operator)
          Counts the data records that meet the condition within a specified range of data.
          Parameters: sec|#num: offset in time or quantity; pattern: value to be compared; operator: logical relationship.
          Data types: float, int, str
          Examples: count(10m,1,">") returns the quantity of data records greater than 1 in the last 10 minutes; count(#15,"error","==") returns the quantity of data records equal to "error" in the latest 15 data records.

        max(sec|#num)
          Calculates the maximum value of a specified range of data.
          Parameters: sec|#num: offset in time or quantity.
          Data types: float, int
          Examples: max(#5) returns the maximum value of the latest 5 data records; max(5s) returns the maximum value of the data records in the last 5 seconds.

        min(sec|#num)
          Calculates the minimum value of a specified range of data.
          Parameters: sec|#num: offset in time or quantity.
          Data types: float, int
          Examples: min(#5) returns the minimum value of the latest 5 data records; min(5s) returns the minimum value of the data records in the last 5 seconds.

        sum(sec|#num)
          Calculates the sum of a specified range of data.
          Parameters: sec|#num: offset in time or quantity.
          Data types: float, int
          Examples: sum(#5) returns the sum of the latest 5 data records; sum(5s) returns the sum of the data records in the last 5 seconds.

        avg(sec|#num)
          Calculates the average value of a specified range of data.
          Parameters: sec|#num: offset in time or quantity.
          Data types: float, int
          Examples: avg(#5) returns the average value of the latest 5 data records; avg(5s) returns the average value of the data records in the last 5 seconds.

        keyword(pattern)
          Determines whether the current data contains the specified keyword string.
          Parameters: pattern: keyword string.
          Data types: str
          Examples: keyword("error") returns true if the data contains the keyword "error"; otherwise, false is returned.

        diff(sec|#num)
          Determines whether the current value is the same as a historical value.
          Parameters: sec|#num: offset in time or quantity.
          Data types: float, int, str
          Returns: true if the two values are the same; false if the two values are different.

        abschange(sec|#num)
          Calculates the absolute value of the difference between the current value and a historical value.
          Parameters: sec|#num: offset in time or quantity.
          Data types: float, int
          Examples: pre: 1, cur: 5 returns 4; pre: 3, cur: 1 returns 2; pre: 0, cur: -2.5 returns 2.5.

        change(sec|#num)
          Calculates the difference between the current value and a historical value.
          Parameters: sec|#num: offset in time or quantity.
          Data types: float, int
          Examples: pre: 1, cur: 5 returns 4; pre: 3, cur: 1 returns -2; pre: 0, cur: -2.5 returns -2.5.
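
      To make the window semantics concrete, the sketch below reproduces the behavior of the quantity-based (#num) variants of max, count, and change over a plain Python list. It only illustrates the descriptions above; the executor's own implementation may differ.

      # Illustrative semantics of the quantity-based (#num) window functions.
      import operator

      OPS = {">": operator.gt, "<": operator.lt, "==": operator.eq, "!=": operator.ne}

      def f_max(samples, num):
          """max(#num): maximum of the latest num records."""
          return max(samples[-num:])

      def f_count(samples, num, pattern, op):
          """count(#num, pattern, operator): records matching the condition."""
          return sum(1 for v in samples[-num:] if OPS[op](v, pattern))

      def f_change(samples, num):
          """change(#num): current value minus the value num records back."""
          return samples[-1] - samples[-1 - num]

      data = [0.2, 0.9, 1.4, 0.7, 2.5, 3.1]
      print(f_max(data, 5))            # 3.1
      print(f_count(data, 5, 1, ">"))  # 3 records greater than 1
      print(f_change(data, 1))         # 3.1 - 2.5 = 0.6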

      6.4 Custom Exception Detection Plugin

      • Customizing the exception detection plugin interface.

        The custom exception detection plugin needs to inherit the CheckRulePlugin class in adoctor_check_executor.check_rule_plugins.check_rule_plugin.

        judge_condition: Checks for anomalies. Inputs: index (coordinate in the time series data segment), data_vector (time series data segment), and main_data_name (main data). Returns true if an anomaly is found and false if no anomaly is found.
        A second interface sets the plugin manager; its input is plugin_manager (the plugin manager).
      • Configuring the plugin file.

        Add your custom exception detection rule plugin in /etc/aops/check_rule_plugin.yml.

        ---
        plugin_list:
          - plugin_name: expression_rule_plugin
            file_name: expression_rule
            module_name: ExpressionCheckRule
          - plugin_name: my_plugin    # Plugin name
            file_name: my_plugin      # Plugin file
            module_name: Myplugin     # Plugin class name
        1. Move my_plugin.py and any files it references to the check_rule_plugins directory in the adoctor_check_executor installation directory (adoctor_check_executor/check_rule_plugins).
        2. Restart adoctor-check-executor to load the custom plugin.
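
      A minimal plugin skeleton might look like the sketch below. The base-class import path follows this section, but the exact judge_condition signature and the shape of data_vector are assumptions based on the interface description above; verify them against the installed adoctor_check_executor sources before use.

      # my_plugin.py -- minimal custom rule plugin sketch (signature assumed
      # from the interface description above; verify against the sources).
      from adoctor_check_executor.check_rule_plugins.check_rule_plugin import CheckRulePlugin

      class Myplugin(CheckRulePlugin):
          """Flags an anomaly when the main data value exceeds a fixed threshold."""

          THRESHOLD = 100

          def judge_condition(self, index, data_vector, main_data_name):
              # index: coordinate in the time series data segment
              # data_vector: time series data segment (assumed list of dicts here)
              # main_data_name: name of the main data item
              value = data_vector[index].get(main_data_name)
              return value is not None and value > self.THRESHOLD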

      7 Parameter Configuration

      7.1 Configuring the adoctor-check-scheduler

      By default, the adoctor-check-scheduler configuration file is /etc/aops/check_scheduler.ini. Modify the file as required.

      vim /etc/aops/check_scheduler.ini
      
      ; Kafka producer parameters.
      [producer] 
      ; List of Kafka hosts, which is in HOST:PORT format. The default port number is 9092, and the default IP address is localhost.
      kafka_server_list = 192.168.1.1:9092
      ; Kafka API version. The default value is 0.11.5.
      api_version = 0.11.5
      ; The number of ACK messages received by the Kafka leader before the leader confirms a request is complete. The default value is 1.
      ; 0: The producer does not wait for the ACK message from the server. The message is immediately added to the socket buffer and considered sent. There is no guarantee that the server has received the message. The offset returned for each message is always -1.
      ; 1: The leader writes the record only to the local host. The broker does not wait for responses from all followers.
      ; all: The producer waits for all synchronized replications to write the record.
      acks = 1
      ; If the value is greater than 0, the client resends the message when the message fails to be sent.
      retries = 3
      ; The time to wait before the producer retries a failed request, in milliseconds.
      retry_backoff_ms = 100
      
      ; Kafka consumer parameters.
      [consumer]
      ; List of Kafka hosts, which is in HOST:PORT format. The default port number is 9092, and the default IP address is localhost.
      kafka_server_list = 192.168.1.1:9092
      ; If the value is True, the consumer periodically commits the offset in the background.
      enable_auto_commit = False
      ; Policy for resetting the offset when the OffsetOutOfRange error occurs.
      ; earliest: Move the offset to the earliest available message.
      ; latest: Move the offset to the latest message.
      auto_offset_reset = earliest
      ; The time that the consumer spends waiting if there is no data in the buffer when the consumer performs the poll operation. If the value is 0, the current available records in the buffer are returned immediately. Otherwise, null is returned.
      timeout_ms = 5
      ; Maximum number of records returned by a single poll of the consumer.
      max_records = 3
      
      ; check scheduler configuration parameters.
      [check_scheduler]
      ; check scheduler listening address.
      ip = 127.0.0.1
      ; check scheduler listening port.
      port = 11112
      ; Maximum number of task retry times when no data is available or an internal error occurs during exception detection.
      max_retry_num = 3
      ; The cooldown period before the next retry if a retry fails.
      cool_down_time = 120
      ; Maximum number of task IDs recorded in the cache of dead tasks, that is, tasks whose retry count exceeds the maximum. The cache is cleared when this limit is reached.
      max_dead_retry_task = 10000
      ; Percentage of dead tasks to be cleared. By default, 50% of the cache is deleted when the number of recorded dead task IDs exceeds 10000.
      dead_retry_task_discount = 0.5
      ; Time step of the backward detection task (following the reversed timeline) after the check scheduler is started.
      backward_task_step = 60
      ; Interval for triggering the backward tasks.
      backward_task_interval = 30
      ; Interval for triggering the forward tasks (following the timeline).
      forward_task_interval = 30
      ; Maximum interval for triggering a new forward task.
      forward_max_task_step = 86400
      
      ; uWSGI configuration parameters.
      [uwsgi]
      ; Script for starting the uWSGI.
      wsgi-file=manage.py
      ; Log of the check_scheduler uWSGI service.
      daemonize=/var/log/aops/uwsgi/check_scheduler.log
      ; Timeout interval of HTTP connections.
      http-timeout=600
      ; Server response timeout interval.
      harakiri=600
      

      Note that if the listening IP address and port of the check_scheduler service are changed, the IP address and port of check_scheduler in /etc/aops/system.ini must be changed accordingly.
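
      Because the scheduler reads a standard INI file, you can sanity-check an edited configuration with Python's configparser before restarting the service. This is a convenience sketch, not part of adoctor-check.

      # Sanity-check /etc/aops/check_scheduler.ini after editing (illustrative).
      import configparser

      config = configparser.ConfigParser()
      config.read("/etc/aops/check_scheduler.ini")

      # Confirm the sections the scheduler expects are present.
      for section in ("producer", "consumer", "check_scheduler", "uwsgi"):
          print("[{}] present: {}".format(section, config.has_section(section)))

      # This address must match the check_scheduler entry in /etc/aops/system.ini.
      print("listening on",
            config.get("check_scheduler", "ip", fallback="?"),
            config.get("check_scheduler", "port", fallback="?"))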

      7.2 Configuring the adoctor-check-executor

      7.2.1 Basic Parameter Configuration

      The default configuration file of the adoctor-check-executor is /etc/aops/check_executor.ini. Modify the file as needed:

      vim /etc/aops/check_executor.ini
      
      ; Kafka consumer parameters.
      [consumer]
      ; List of Kafka hosts, which is in HOST:PORT format. The default port number is 9092, and the default IP address is localhost.
      kafka_server_list=192.168.1.1:9092
      ; If the value is True, the consumer periodically commits the offset in the background.
      enable_auto_commit=False
      ; Policy for resetting the offset when the OffsetOutOfRange error occurs.
      ; earliest: Move to the earliest available message.
      ; latest: Move to the latest message.
      auto_offset_reset=earliest
      ; The time that the consumer spends waiting if there is no data in the buffer when the consumer performs the poll operation. If the value is 0, the current available records in the buffer are returned immediately. Otherwise, null is returned.
      timeout_ms=5
      ; Maximum number of records returned by a single poll invocation of the consumer.
      max_records=3
      
      ; Kafka producer parameters.
      [producer]
      ; List of Kafka hosts, which is in HOST:PORT format. The default port number is 9092, and the default IP address is localhost.
      kafka_server_list=192.168.1.1:9092
      ; Kafka API version. The default value is 0.11.5.
      api_version=0.11.5
      ; The number of ACK messages received by the Kafka leader before the leader confirms a request is complete. The default value is 1.
      ; 0: The producer does not wait for the ACK message from the server. The message is immediately added to the socket buffer and considered sent. There is no guarantee that the server has received the message. The offset returned for each message is always -1.
      ; 1: The leader writes the record only to the local host. The broker does not wait for responses from all followers.
      ; all: The producer waits for all synchronized replications to write the record.
      acks=1
      ; If the value is greater than 0, the client resends the message when the message fails to be sent.
      retries=3
      ; The time to wait before the producer retries a failed request, in milliseconds.
      retry_backoff_ms=100
      
      ; check executor configuration parameters.
      [executor]
      ; Relative path for loading the plugin. The relative path is referenced from the Python package of adoctor_check_executor.
      plugin_path = adoctor_check_executor.check_rule_plugins
      ; Number of consumers that perform exception detection.
      do_check_consumer_num = 2
      ; The sampling period for collecting data. It must be the same as the scrape interval in the Prometheus configuration. Otherwise, the detection result will be inaccurate.
      sample_period = 15
      

      7.2.2 Custom Exception Detection Rule Plugin Configuration

      The rule plugins for exception detection are configured in /etc/aops/check_rule_plugin.yml. By default, the built-in expression_rule_plugin is configured. You can add custom plugins by referring to the configuration in section 6.4.

      8 Exception Detection Rule Management

      8.1 Importing Exception Detection Rules

      8.1.1 Using the Web Page

      1. Prepare the JSON file of exception detection rules.
      2. Open the Rule Management page and create exception detection rules.

      Importing exception detection rules

      8.1.2 Using the CLI

      1. Log in to the server where adoctor-check-scheduler is deployed.

      2. Prepare the JSON file of exception detection rules and copy the file to the server.

      3. Run the following command:

        adoctor checkrule [--action] add [--conf] [check.json] [--access_token] [token]
        

        Parameter description:

        • action: The value add indicates the add operation.
        • conf: Indicates the rule file to be imported.
        • access_token: Access token.
      4. For example, to import the check_rule.json file, run the following command:

        adoctor checkrule --action add --conf check_rule.json --access_token "1111"
        

      8.2 Deleting Exception Detection Rules

      8.2.1 Using the Web Page

      1. Open the Rule Management page.
      2. Delete the rule from the rule list.

      Deleting an exception detection rule

      8.2.2 Using the CLI

      1. Log in to the server where adoctor-check-scheduler is deployed.

      2. Run the following command:

        adoctor checkrule [--action] delete [--check_items] [items] [--access_token] [token]
        

        Parameter description:

        • action: The value delete indicates the delete operation.
        • check_items: Indicates the check items to be deleted. Multiple check items are separated by commas (,). If this parameter is left empty, no check item is deleted.
        • access_token: Access token.
      3. For example, to delete the check item cpu_usage_overflow, run the following command:

        adoctor checkrule --action delete --check_items cpu_usage_overflow --access_token "1111"
        

      8.3 Obtaining Exception Detection Rules

      8.3.1 Using the Web Page

      1. Open the Exception Detection Rule List to view the exception detection rules.

      Obtaining exception detection rules

      8.3.2 Using the CLI

      1. Log in to the server where adoctor-check-scheduler is deployed.

      2. Run the following command:

        adoctor checkrule [--action] get [--check_items] [items] [--export] [path] [--sort] [{check_item}] [--direction] [{asc, desc}] [--page] [page] [--per_page] [per_page] [--access_token] [token]
        

        Parameter description:

        • action: The value get indicates the obtain operation.
        • check_items: Indicates the check items to be obtained. Multiple check items are separated by commas (,). If this parameter is left empty, all check items are obtained.
        • sort: Sorts the returned check items. The value can be check_item. If the value is empty, the results are not sorted.
        • direction: Specifies the order of abnormal results. The value can be asc (ascending order) or desc (descending order). The default value is asc.
        • page: Current page number.
        • per_page: Number of records on each page. The maximum value is 50.
        • access_token: Access token.
      3. For example, to export the check item cpu_usage_overflow, obtain only the first page, display 30 items on each page, and save the results to the /tmp directory, run the following command:

        adoctor checkrule --action get --check_items cpu_usage_overflow --export /tmp --page 1 --per_page 30 --access_token "1111"
        

      9 Using the Exception Detection Functions

      9.1 Executing Exception Detection

      When started, the adoctor-check-scheduler and adoctor-check-executor services begin to perform both real-time and historical exception detection. Historical exception detection walks backward along the timeline based on the configured time step.
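
      As an illustration of the backward pass, the sketch below slices history into windows of backward_task_step seconds (60 by default, see section 7.1) walking into the past. The windowing logic here is an assumption for illustration; the actual task scheduling is internal to adoctor-check-scheduler.

      # Illustrative backward windowing over the timeline.
      from datetime import datetime, timedelta

      def backward_windows(start, step_seconds, count):
          """Yield (begin, end) windows walking backward from start."""
          end = start
          for _ in range(count):
              begin = end - timedelta(seconds=step_seconds)
              yield begin, end
              end = begin

      now = datetime(2021, 9, 9, 9, 0, 0)
      for begin, end in backward_windows(now, 60, 3):
          print(begin.strftime("%H:%M:%S"), "->", end.strftime("%H:%M:%S"))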

      9.2 Obtaining Exception Detection Results

      9.2.1 Using the Web Page

      1. View the Exception Detection Records list.

        Obtaining exception detection results

      2. View the Exception Detection Result Statistics.

        Exception detection result statistics

      9.2.2 Using the CLI

      1. Log in to the server where adoctor-check-scheduler is deployed.

      2. Ensure that adoctor-cli has been installed.

      3. Run the following command:

        adoctor check [--check_items] [items] [--host_list] [list] [--start] [time1] [--end] [time2] [--sort] [{check_item, start, end}] [--direction] [{asc, desc}] [--page] [page] [--per_page] [per_page] [--access_token] [token]
        

        Parameter description:

        • check_items: Specifies the check items. Multiple check items are separated by commas (,). If this parameter is left empty, all check items are specified.
        • host_list: Specifies the list of hosts. Multiple hosts are separated by commas (,). If this parameter is left empty, all hosts are specified.
        • start: Specifies the start time of exception detection. The format is similar to 20200101-11:11:11. If this parameter is not specified, the start time is one hour before the end time by default.
        • end: Specifies the end time of exception detection. If this parameter is not specified, the current time is used by default.
        • sort: Sorts exception detection results. The value can be start, end, or check_item. If the value is empty, the results are not sorted.
        • direction: Specifies the order of abnormal results. The value can be asc (ascending order) or desc (descending order). The default value is asc.
        • page: Current page number.
        • per_page: Number of records on each page. The maximum value is 50.
        • access_token: Access token.
      4. For example, to obtain the exception detection results of check_item1 on host1 from 2021-9-8 11:24:10 to 2021-9-9 09:00:00 on the first page only, display 30 results on each page, and sort the results in ascending order by check_item, run the following command:

        adoctor check --check_items check_item1 --host_list host1 --start 20210908-11:24:10 --end 20210909-09:00:00 --sort check_item --direction asc --page 1 --per_page 30 --access_token "1111"
        

      10 Viewing and Dumping Logs

      1. adoctor-check-scheduler logs
        • Path: /var/log/aops/uwsgi/check_scheduler.log
        • Function: Records internal running logs of the code to facilitate fault locating.
        • Permission: The path permission is 755. The log file permission is 644. Common users can view the logs.
        • Debug log switch: log_level in the /etc/aops/system.ini file.
      2. adoctor-check-executor logs
        • Path: /var/log/aops/aops.log
        • Function: Records internal running logs of the code to facilitate fault locating.
        • Permission: The path permission is 755. The log file permission is 600. Common users cannot view the log contents.
        • Debug log switch: log_level in the /etc/aops/system.ini file.
