Using gala-gopher

As a data collection module, gala-gopher provides OS-level monitoring capabilities, supports dynamic probe installation and uninstallation, and integrates third-party probes in a non-intrusive manner to quickly expand the monitoring scope.

This chapter describes how to deploy and use the gala-gopher service.

Installation

Configure the yum repositories by adding the following content to a .repo file in the /etc/yum.repos.d/ directory.

ini
# openEuler 22.03-LTS-SP3 officially released repository
[oe-22.03-lts-sp3-everything]
name=oe-22.03-lts-sp3-everything
baseurl=http://repo.openeuler.org/openEuler-22.03-LTS-SP3/everything/x86_64/
enabled=1
gpgcheck=0
priority=1

# openEuler 22.03-LTS-SP3 EPOL update repository
[oe-22.03-lts-sp3-epol-update]
name=oe-22.03-lts-sp3-epol-update
baseurl=http://repo.openeuler.org/openEuler-22.03-LTS-SP3/EPOL/update/main/x86_64/
enabled=1
gpgcheck=0
priority=1

# openEuler 22.03-LTS-SP3 EPOL officially released repository
[oe-22.03-lts-sp3-epol-main]
name=oe-22.03-lts-sp3-epol-main
baseurl=http://repo.openeuler.org/openEuler-22.03-LTS-SP3/EPOL/main/x86_64/
enabled=1
gpgcheck=0
priority=1
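The repository definitions above can also be written from a script. The following is a minimal bash sketch; the function write_repo_file and the file name gala-gopher-deps.repo are hypothetical, and the three repositories are the ones listed above.

```bash
#!/usr/bin/env bash
# Sketch: write the three openEuler 22.03-LTS-SP3 repositories listed above
# into one .repo file. The file name gala-gopher-deps.repo is hypothetical.

REPO_BASE="http://repo.openeuler.org/openEuler-22.03-LTS-SP3"

# write_repo_file DIR: create DIR/gala-gopher-deps.repo and print its path.
write_repo_file() {
    local dir="$1"
    local file="$dir/gala-gopher-deps.repo"
    local id path
    # Start from an empty file so that reruns do not duplicate sections.
    : > "$file" || return 1
    # Each input line is "<repository id> <path below REPO_BASE>".
    while read -r id path; do
        printf '[%s]\nname=%s\nbaseurl=%s/%s/\nenabled=1\ngpgcheck=0\npriority=1\n\n' \
            "$id" "$id" "$REPO_BASE" "$path" >> "$file"
    done <<'EOF'
oe-22.03-lts-sp3-everything everything/x86_64
oe-22.03-lts-sp3-epol-update EPOL/update/main/x86_64
oe-22.03-lts-sp3-epol-main EPOL/main/x86_64
EOF
    echo "$file"
}
```

For example, running write_repo_file /etc/yum.repos.d as root and then yum makecache should make the repositories available to yum.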

Install gala-gopher.

bash
# yum install gala-gopher

Configuration

Configuration Description

The gala-gopher configuration file is /opt/gala-gopher/gala-gopher.conf. The items that can be modified as required are described below; items that do not need manual configuration are omitted.

  • global: Global configuration for gala-gopher
    • log_file_name: Log file name for gala-gopher
    • log_level: Log level for gala-gopher (not supported yet)
    • pin_path: Storage path for eBPF probe shared maps (default configuration recommended)
  • metric: Metrics output configuration
    • out_channel: Output channel for metrics (web_server or kafka); leave empty to disable metric output
    • kafka_topic: Kafka topic used when out_channel is kafka
  • event: Event output configuration
    • out_channel: Output channel for events (logs or kafka); leave empty to disable event output
    • kafka_topic: Kafka topic used when out_channel is kafka
  • meta: Metadata output configuration
    • out_channel: Output channel for metadata (logs or kafka); leave empty to disable metadata output
    • kafka_topic: Kafka topic used when out_channel is kafka
  • imdb: Cache configuration specifications
    • max_tables_num: Maximum number of cache tables (each meta file in /opt/gala-gopher/meta corresponds to one table)
    • max_records_num: Maximum records per table (each probe typically generates at least one observation record per cycle)
    • max_metrics_num: Maximum metrics per observation record
    • record_timeout: Record timeout, in seconds (records older than this are deleted)
  • web_server: Web server configuration
    • port: Listening port
  • kafka: Kafka output configuration
    • kafka_broker: Kafka server IP address and port
  • logs: Log output configuration
    • metric_dir: Directory for metrics logs
    • event_dir: Directory for event logs
    • meta_dir: Directory for metadata logs
    • debug_dir: Directory for runtime logs
  • probes: Native probe configuration
    • name: Probe name (must match native probe name, such as "example" for example.probe)
    • param: Startup parameters (see Startup Parameters)
    • switch: Probe status (on/off)
  • extend_probes: Third-party probe configuration
    • name: Probe name
    • command: Startup command
    • param: Startup parameters (see Startup Parameters)
    • start_check: Check used in auto mode to determine whether to start the probe
    • switch: Probe status (on, off, or auto; in auto mode, the start_check result determines whether the probe starts)
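To confirm which values are currently set, you can read individual items from the file. The following bash sketch is a hypothetical helper (conf_get), not part of gala-gopher; it assumes the simple one-item-per-line layout shown in this chapter and is not a full parser for the configuration file format.

```bash
#!/usr/bin/env bash
# Sketch: read one "key = value;" item from a named block of
# gala-gopher.conf, e.g. conf_get FILE metric out_channel.
# Assumes the layout used in this chapter (one item per line, blocks
# closed by "}"); this is not a full parser for the file format.
conf_get() {
    local file="$1" section="$2" key="$3"
    awk -v sec="$section" -v key="$key" '
        # A line such as "metric =" or "metric = {" opens the block.
        $1 == sec && $2 == "=" { in_sec = 1; next }
        # A line starting with "}" closes it.
        in_sec && /^[[:space:]]*[}]/ { in_sec = 0; next }
        in_sec && $1 == key && $2 == "=" {
            val = $0
            sub(/^[^=]*=[[:space:]]*/, "", val)   # drop "key = "
            sub(/;[[:space:]]*$/, "", val)         # drop the trailing ";"
            gsub(/"/, "", val)                     # drop the quotes
            print val
            found = 1
            exit
        }
        # Exit status 1 if the item was not found.
        END { exit !found }
    ' "$file"
}
```

For example, conf_get /opt/gala-gopher/gala-gopher.conf metric out_channel prints the metric output channel.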

Startup Parameters

Parameter  Description
-l         Enables or disables abnormal event reporting (for example, -l warn).
-t         Sampling interval, in seconds. Default: 5.
-T         Delay threshold, in ms. Default: 0.
-J         Jitter threshold, in ms. Default: 0.
-O         Offline threshold, in ms. Default: 0.
-D         Packet loss threshold, in packets. Default: 0.
-F         Filter: task filters by task_whitelist.conf; <pid> monitors only the specified process.
-P         Range of detection programs loaded by each probe (applies to tcpprobe and taskprobe).
-U         Upper resource utilization threshold, in percent. Default: 0%.
-L         Lower resource utilization threshold, in percent. Default: 0%.
-c         Client port identification for the tcp probe. Default: 0 (disabled).
-N         Process observed by the probe (ksliprobe). Default: NULL.
-p         Binary path of the target process, for example, -p /usr/local/sbin/nginx. Default: NULL.
-w         Path of the application filter file, for example, -w /opt/gala-gopher/task_whitelist.conf. Default: NULL (no filtering).
-n         NIC to which the tc eBPF program is attached, for example, -n eth0. Default: all NICs.
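Before putting a param string into the configuration file, you can check that it only uses the flags listed above. The following bash sketch (check_param is a hypothetical helper) assumes that every flag takes exactly one value, as in the examples in this chapter; if a flag can be used without a value, this check would wrongly reject it.

```bash
#!/usr/bin/env bash
# Sketch: sanity-check a probe "param" string against the flags in the
# Startup Parameters table. Assumption: every flag is followed by exactly
# one value, as in this chapter's examples (-t 5, -l warn, -P 7, ...).

KNOWN_FLAGS="-l -t -T -J -O -D -F -P -U -L -c -N -p -w -n"

check_param() {
    local -a words
    local i flag
    # Split the param string on whitespace into an array.
    read -r -a words <<< "$1"
    # Even positions hold flags, odd positions hold their values.
    for ((i = 0; i < ${#words[@]}; i += 2)); do
        flag="${words[i]}"
        case " $KNOWN_FLAGS " in
            *" $flag "*) ;;   # the flag is in the table
            *) echo "unknown flag: $flag" >&2; return 1 ;;
        esac
        if ((i + 1 >= ${#words[@]})); then
            echo "missing value for $flag" >&2
            return 1
        fi
    done
    return 0
}
```

For example, check_param "-l warn -c 1 -P 7" succeeds, while check_param "-x 1" prints an error and returns 1.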

Configuration File Example

  • Select the data output channels.

    yaml
    metric =
    {
        out_channel = "web_server";
        kafka_topic = "gala_gopher";
    };
    
    event =
    {
        out_channel = "kafka";
        kafka_topic = "gala_gopher_event";
    };
    
    meta =
    {
        out_channel = "kafka";
        kafka_topic = "gala_gopher_metadata";
    };

  • Configure Kafka and the web server.

    yaml
    web_server =
    {
        port = 8888;
    };
    
    kafka =
    {
        kafka_broker = "<Kafka server IP address>:9092";
    };

  • Select the probes to enable, for example:

    yaml
    probes =
    (
        {
            name = "system_infos";
            param = "-t 5 -w /opt/gala-gopher/task_whitelist.conf -l warn -U 80";
            switch = "on";
        }
    );
    extend_probes =
    (
        {
            name = "tcp";
            command = "/opt/gala-gopher/extend_probes/tcpprobe";
            param = "-l warn -c 1 -P 7";
            switch = "on";
        }
    );
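Switching a probe on or off only requires changing its switch item. The following bash sketch (set_probe_switch is a hypothetical helper) edits the file in place, assuming each probe block lists name before switch in the one-item-per-line layout shown above.

```bash
#!/usr/bin/env bash
# Sketch: change the "switch" of one probe in gala-gopher.conf,
# e.g. set_probe_switch FILE tcp off. Assumes each probe block lists
# "name = ...;" before "switch = ...;", as in the example above.
set_probe_switch() {
    local file="$1" name="$2" state="$3" tmp
    # auto is documented for extend_probes only.
    case "$state" in
        on|off|auto) ;;
        *) echo "invalid switch value: $state" >&2; return 1 ;;
    esac
    tmp=$(mktemp) || return 1
    awk -v name="$name" -v state="$state" '
        # Track whether the current probe block is the requested one.
        $1 == "name" && $2 == "=" { target = ($3 == "\"" name "\";") }
        # Rewrite the first switch line that follows the matching name.
        target && $1 == "switch" && $2 == "=" {
            sub(/"[^"]*"/, "\"" state "\"")
            target = 0
            changed = 1
        }
        { print }
        # Exit status 1 if no probe with that name was found.
        END { exit !changed }
    ' "$file" > "$tmp" || { rm -f "$tmp"; echo "probe not found: $name" >&2; return 1; }
    # Copy back instead of mv so the file keeps its owner and permissions.
    cat "$tmp" > "$file"
    rm -f "$tmp"
}
```

For example, set_probe_switch /opt/gala-gopher/gala-gopher.conf tcp off disables the tcp probe; restart gala-gopher.service afterwards so that the change is picked up.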

Start

After the configuration is complete, start gala-gopher.

bash
# systemctl start gala-gopher.service

Query the status of the gala-gopher service.

bash
# systemctl status gala-gopher.service

If the Active field shows active (running), the service has started successfully. Then check whether each enabled probe has started. If a probe thread does not exist, check the configuration file and the gala-gopher run log.

Note: Root permissions are required to deploy and run gala-gopher.
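The status checks above can be scripted. The following bash sketch (check_gala_gopher is a hypothetical helper) uses systemctl is-active for the service and pgrep -f for third-party probes. It assumes each probe configured in extend_probes runs as a separate process whose command line contains its command path, so native probes are not covered.

```bash
#!/usr/bin/env bash
# Sketch: check that gala-gopher.service is active and that each given
# third-party probe has a running process. Assumption: extend_probes run
# as separate processes whose command line contains their "command" path.
check_gala_gopher() {
    local probe rc=0
    if ! systemctl is-active --quiet gala-gopher.service; then
        echo "gala-gopher.service is not active" >&2
        return 1
    fi
    for probe in "$@"; do
        # pgrep -f matches the pattern against the full command line.
        if ! pgrep -f -- "$probe" > /dev/null; then
            echo "probe not running: $probe" >&2
            rc=1
        fi
    done
    return "$rc"
}
```

For example, check_gala_gopher /opt/gala-gopher/extend_probes/tcpprobe returns 0 only if the service is active and a tcpprobe process is found.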

How to Use

Deployment of External Dependent Software

As shown in the preceding figure, the green parts are the external components that gala-gopher depends on. gala-gopher outputs metric data to Prometheus, and outputs metadata and abnormal events to Kafka. gala-anteater and gala-spider, shown as gray rectangles, obtain data from Prometheus and Kafka.

Note: Obtain the installation packages of Kafka and Prometheus from the official websites.
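Prometheus collects metrics by scraping the gala-gopher web server configured in web_server. A minimal sketch of a matching scrape job follows, written as a bash function that prints the YAML; the function print_scrape_job and the job name gala-gopher are hypothetical, and the sketch assumes the default Prometheus metrics path works with the gala-gopher web server.

```bash
#!/usr/bin/env bash
# Sketch: print a Prometheus scrape job for the gala-gopher web_server
# (port 8888 in the example configuration). The default job name is
# arbitrary. The output belongs under scrape_configs: in prometheus.yml.
print_scrape_job() {
    local target="$1" job="${2:-gala-gopher}"
    cat <<EOF
  - job_name: "$job"
    static_configs:
      - targets: ["$target"]
EOF
}
```

For example, add the output of print_scrape_job <gala-gopher host>:8888 under the scrape_configs: key of prometheus.yml, then restart or reload Prometheus.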

Output Data

  • Metric

    Prometheus server has a built-in expression browser UI in which you can query metric data using PromQL. For details, see "Using the expression browser" in the official Prometheus documentation.

    For example, if you query the metric gala_gopher_tcp_link_rcv_rtt, the UI displays data similar to the following:

    text
    gala_gopher_tcp_link_rcv_rtt{client_ip="x.x.x.165",client_port="1234",hostname="openEuler",instance="x.x.x.172:8888",job="prometheus",machine_id="1fd3774xx",protocol="2",role="0",server_ip="x.x.x.172",server_port="3742",tgid="1516"} 1
  • Metadata

    You can directly consume data from the Kafka topic gala_gopher_metadata. The following is an example.

    bash
    # Command
    ./bin/kafka-console-consumer.sh --bootstrap-server x.x.x.165:9092 --topic gala_gopher_metadata
    # Sample output
    {"timestamp": 1655888408000, "meta_name": "thread", "entity_name": "thread", "version": "1.0.0", "keys": ["machine_id", "pid"], "labels": ["hostname", "tgid", "comm", "major", "minor"], "metrics": ["fork_count", "task_io_wait_time_us", "task_io_count", "task_io_time_us", "task_hang_count"]}
  • Abnormal events

    You can directly consume data from the Kafka topic gala_gopher_event. The following is an example.

    bash
    # Command
    ./bin/kafka-console-consumer.sh --bootstrap-server x.x.x.165:9092 --topic gala_gopher_event
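The outputs above can also be checked from a shell. The following bash sketch defines two hypothetical helpers: metric_values extracts the sample values of one metric from Prometheus text-format output, and consume_topic wraps the Kafka console consumer shown above with its standard --from-beginning and --max-messages options. The Kafka installation directory and the gala-gopher metrics URL in the usage note are assumptions.

```bash
#!/usr/bin/env bash
# Sketch: helpers for checking gala-gopher output from a shell.

# metric_values NAME: read Prometheus text-format lines on stdin and print
# the sample value of every series named NAME (with or without labels).
metric_values() {
    awk -v m="$1" '
        /^#/ { next }                        # skip HELP/TYPE comments
        {
            line = $0
            if (index(line, m "{") == 1) {
                sub(/^.*[}]/, "", line)      # drop the name and label set
            } else if (index(line, m " ") == 1) {
                line = substr(line, length(m) + 1)
            } else {
                next                         # a different metric
            }
            sub(/^[[:space:]]+/, "", line)
            # The value is the first field; an optional timestamp may follow.
            split(line, f, /[[:space:]]+/)
            print f[1]
        }
    '
}

# consume_topic KAFKA_HOME BROKER TOPIC [COUNT]: print COUNT records
# (default 1) from TOPIC, starting from the earliest available offset.
consume_topic() {
    local kafka_home="$1" broker="$2" topic="$3" count="${4:-1}"
    "$kafka_home/bin/kafka-console-consumer.sh" \
        --bootstrap-server "$broker" --topic "$topic" \
        --from-beginning --max-messages "$count"
}
```

For example, curl -s http://<gala-gopher host>:8888/metrics | metric_values gala_gopher_tcp_link_rcv_rtt (assuming the web server serves metrics at /metrics), or consume_topic /opt/kafka x.x.x.165:9092 gala_gopher_event 5 (assuming Kafka is installed in /opt/kafka).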