Checking the Container Health Status

Scenarios

In a production environment, bugs are inevitable in applications provided by developers and in services provided by platforms. A management system that periodically checks and repairs applications is therefore indispensable. The container health check mechanism lets users define a health check for a container. When a container is created, the --health-cmd option specifies a command that is periodically executed in the container, and the health status of the container is determined from the return value of that command.

Configuration Methods

Configurations during container startup:

isula run -itd --health-cmd "echo iSulad >> /tmp/health_check_file || exit 1" --health-interval 5m --health-timeout 3s --health-exit-on-unhealthy  busybox bash

The configurable options are as follows:

  • --health-cmd: mandatory. The command that is run in the container to check its health. A return value of 0 indicates that the command succeeded. Any other return value indicates that it failed.
  • --health-interval: interval between two consecutive command executions. The default value is 30s. The value ranges from 1s to the maximum value of Int64 (unit: nanosecond). If the option is set to 0s, the default value is used.
  • --health-timeout: maximum duration for executing a single check command. If the execution times out, the command execution fails. The default value is 30s. The value ranges from 1s to the maximum value of Int64 (unit: nanosecond). If the option is set to 0s, the default value is used. Only containers whose runtime is of the LCR type are supported.
  • --health-start-period: container initialization time. The default value is 0s. The value ranges from 1s to the maximum value of Int64 (unit: nanosecond).
  • --health-retries: maximum number of retries for the health check. The default value is 3. The maximum value is the maximum value of Int32.
  • --health-exit-on-unhealthy: specifies whether to kill a container when it is unhealthy. The default value is false.
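
The defaults and ranges listed above can be sketched in Python as a small helper that normalizes the values and assembles the health check options for isula run. This is a minimal sketch, not part of iSulad: the names effective_seconds and build_health_args are hypothetical, durations are assumed to be whole seconds, and options left at their defaults are still passed explicitly (except the start period).

```python
from typing import List

INT64_MAX_NS = 2**63 - 1   # documented upper bound for durations, in nanoseconds
INT32_MAX = 2**31 - 1      # documented upper bound for --health-retries
NS_PER_S = 1_000_000_000


def effective_seconds(value_s: int, default_s: int) -> int:
    """Apply the documented rule: 0 selects the default, otherwise 1s..Int64 ns."""
    if value_s == 0:
        return default_s
    if value_s < 1 or value_s * NS_PER_S > INT64_MAX_NS:
        raise ValueError(f"duration out of range: {value_s}s")
    return value_s


def build_health_args(cmd: str, interval_s: int = 0, timeout_s: int = 0,
                      start_period_s: int = 0, retries: int = 0,
                      exit_on_unhealthy: bool = False) -> List[str]:
    """Hypothetical helper: build the health check options for `isula run`."""
    if not cmd:
        raise ValueError("--health-cmd is mandatory")
    if retries < 0 or retries > INT32_MAX:
        raise ValueError(f"retries out of range: {retries}")
    args = [
        "--health-cmd", cmd,
        "--health-interval", f"{effective_seconds(interval_s, 30)}s",
        "--health-timeout", f"{effective_seconds(timeout_s, 30)}s",
        "--health-retries", str(retries or 3),  # 0 selects the default of 3
    ]
    start_s = effective_seconds(start_period_s, 0)
    if start_s:
        # Only pass the start period when it differs from the 0s default.
        args += ["--health-start-period", f"{start_s}s"]
    if exit_on_unhealthy:
        args.append("--health-exit-on-unhealthy")
    return args
```

For example, build_health_args("echo iSulad >> /tmp/health_check_file || exit 1", interval_s=300, timeout_s=3, exit_on_unhealthy=True) yields the same health check settings as the startup command shown earlier, with the 5m interval written as 300s.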

Check Rules

  1. After a container is started, the container status is health:starting.
  2. After the period specified by start-period, the cmd command is periodically executed in the container at the interval specified by interval. That is, after one execution of the command completes, the next execution starts once the specified interval has elapsed.
  3. If the cmd command is successfully executed within the time specified by timeout and the return value is 0, the check is successful. Otherwise, the check fails. If the check is successful, the container status changes to health:healthy.
  4. If the cmd command fails to be executed for the number of times specified by retries, the container status changes to health:unhealthy, and the container continues the health check.
  5. When the container status is health:unhealthy, the container status changes to health:healthy if a check succeeds.
  6. If --health-exit-on-unhealthy is set and the container exits for a reason other than being killed (a killed container returns exit code 137), the health check takes effect only after the container is restarted.
  7. When the cmd command execution is complete or times out, the iSulad daemon records the start time, return value, and standard output of the check in the configuration file of the container. A maximum of five records can be stored. The configuration file of the container also stores the health check parameters.
  8. When the container is running, the health check status is written into the container configurations. You can run the isula inspect command to view the status.
"Health": {
    "Status": "healthy",
    "FailingStreak": 0,
    "Log": [
        {
            "Start": "2018-03-07T07:44:15.481414707-05:00",
            "End": "2018-03-07T07:44:15.556908311-05:00",
            "ExitCode": 0,
            "Output": ""
        },
        {
            "Start": "2018-03-07T07:44:18.557297462-05:00",
            "End": "2018-03-07T07:44:18.63035891-05:00",
            "ExitCode": 0,
            "Output": ""
        },
        ......
}
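
The status transitions in check rules 1 to 5 can be modelled as a small state machine. The following Python sketch is illustrative only and is not iSulad code: the class name HealthTracker is hypothetical, and it assumes that the failures counted against retries are consecutive, which is what the FailingStreak field in the inspect output suggests.

```python
class HealthTracker:
    """Toy model of the starting / healthy / unhealthy transitions."""

    def __init__(self, retries: int = 3):
        self.retries = retries
        self.status = "starting"   # rule 1: status right after container start
        self.failing_streak = 0    # assumed to count consecutive failures

    def record(self, exit_code: int, timed_out: bool = False) -> str:
        """Record one check result and return the resulting status."""
        if exit_code == 0 and not timed_out:
            # Rules 3 and 5: any successful check makes the container healthy.
            self.failing_streak = 0
            self.status = "healthy"
        else:
            # Rule 4: unhealthy once the failure count reaches retries.
            self.failing_streak += 1
            if self.failing_streak >= self.retries:
                self.status = "unhealthy"
        return self.status
```

With retries set to 3, two failed checks leave the status at starting, the third changes it to unhealthy, and a later successful check changes it to healthy.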

Usage Restrictions

  • A maximum of five health check status records can be stored in a container. The last five records are saved.
  • If health check parameters are set to 0 during container startup, the default values are used.
  • After a container with configured health check parameters is started, if the iSulad daemon exits, the health check is not executed. After the iSulad daemon is restarted, the health status of the running container changes to starting. Afterwards, the check rules are the same as above.
  • If the first health check fails, the health check status remains starting. It changes to unhealthy only after the number of failures reaches the specified number of retries (--health-retries), or to healthy once a health check succeeds.
  • The health check function is not yet complete for containers whose runtime is of the Open Container Initiative (OCI) type. Only containers whose runtime is of the LCR type are supported.
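
The five-record limit described in the first restriction behaves like a bounded queue that discards the oldest entry. A minimal Python sketch of that behaviour follows. It is an illustration rather than how iSulad stores the records: the function names are hypothetical, and the record keys mirror the Log entries shown in the isula inspect output.

```python
from collections import deque

MAX_RECORDS = 5  # documented maximum number of stored health check records


def new_health_log() -> deque:
    """Create a log that silently drops the oldest record when full."""
    return deque(maxlen=MAX_RECORDS)


def add_record(log: deque, start: str, end: str, exit_code: int, output: str) -> None:
    """Append one check result using the keys seen in the inspect output."""
    log.append({
        "Start": start,
        "End": end,
        "ExitCode": exit_code,
        "Output": output,
    })
```

After seven calls to add_record on the same log, only the five most recent records remain.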
