System Halt Caused by Audit Logs Consuming Drive Space

Context

A system halted unexpectedly, and the cause had to be diagnosed.

Symptom

The audit service logs confirmed that auditd itself initiated the halt. Contrary to the configured log rotation settings, the audit log directory held far more files than expected, and together they had filled the drive:

```txt
-r--------. 1 root root 6291639  May 14 04:10 audit.log.968
-r--------. 1 root root 6291629  May 14 03:28 audit.log.969
-r--------. 1 root root 6291630  May 14 02:45 audit.log.970
-r--------. 1 root root 6291627  May 14 02:03 audit.log.971
-r--------. 1 root root 6291546  May 14 01:20 audit.log.972
-r--------. 1 root root 6291689  May 14 00:38 audit.log.973
-r--------. 1 root root 6291705  May 13 23:57 audit.log.974
-r--------. 1 root root 6291528  May 13 23:14 audit.log.975
...
```

Possible Causes

The issue appears to stem from failed audit log rotation. The relevant auditd.conf settings are as follows:

```txt
...
max_log_file = 6                // 6 MB log file size limit
num_logs = 5                    // Maximum of 5 log files
...
admin_space_left = 50
admin_space_left_action = halt  // System halts if space drops below the threshold
...
```

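The configuration allows only 5 log files, yet the directory listing shows rotated files numbered up to audit.log.975. With admin_space_left_action = halt, the halt itself was expected behavior once free space dropped below the admin_space_left threshold. The open question is why the file count was never capped.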
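To put the gap in numbers, here is a rough back-of-the-envelope sketch in Python. It uses the values from the auditd.conf excerpt and assumes that rotated files are numbered contiguously (audit.log plus audit.log.1 through audit.log.975) and that each file is close to the 6 MB limit, as the sizes in the listing suggest. The function names are illustrative only.

```python
# Values taken from the auditd.conf excerpt in this note.
MAX_LOG_FILE_MB = 6   # max_log_file
NUM_LOGS = 5          # num_logs

# Assumption: suffixes are contiguous, so the highest suffix seen (975)
# implies 975 rotated files plus the active audit.log.
HIGHEST_SUFFIX_SEEN = 975


def expected_cap_mb(max_log_file_mb: int, num_logs: int) -> int:
    """Upper bound on audit log space if num_logs were enforced."""
    return max_log_file_mb * num_logs


def estimated_usage_mb(max_log_file_mb: int, highest_suffix: int) -> int:
    """Rough estimate of space used, treating every file as full."""
    file_count = highest_suffix + 1  # rotated files plus the active audit.log
    return max_log_file_mb * file_count


if __name__ == "__main__":
    print(expected_cap_mb(MAX_LOG_FILE_MB, NUM_LOGS))               # 30
    print(estimated_usage_mb(MAX_LOG_FILE_MB, HIGHEST_SUFFIX_SEEN))  # 5856
```

Under these assumptions the logs should have been capped at about 30 MB but had grown to roughly 5.7 GB.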

Message logs revealed:

```txt
...
2024-06-09T04:59:46.424433+08:00 localhost auditd[21699]: Audit daemon rotating log files with keep option
...
```

This message shows that rotation ran with the "keep" option. The corresponding auditd.conf setting is:

```txt
...
max_log_file_action = keep_logs // Similar to rotate but overrides num_logs setting.
...
```

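The root cause is that max_log_file_action = keep_logs overrides the num_logs = 5 limit: logs are still rotated when they reach max_log_file, but old files are never removed, so they accumulate until the drive is full.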
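The interaction between the two settings can be checked mechanically. The following is a minimal sketch, not part of auditd or any official tooling. It parses `key = value` lines from configuration text and reports whether num_logs is actually the bound on the file count. The helper names and the sample text are hypothetical, and the sample omits the inline annotations used in the excerpts above.

```python
from typing import Dict, Optional


def parse_auditd_conf(text: str) -> Dict[str, str]:
    """Parse 'key = value' lines; skip blank lines and '#' comment lines."""
    settings: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip().lower()] = value.strip()
    return settings


def effective_log_limit(settings: Dict[str, str]) -> Optional[int]:
    """Return num_logs only when max_log_file_action is 'rotate'.

    Returns None otherwise, meaning num_logs does not bound the file count
    (for example with keep_logs, as in this incident).
    """
    action = settings.get("max_log_file_action", "").lower()
    if action == "rotate" and "num_logs" in settings:
        return int(settings["num_logs"])
    return None


# Hypothetical sample mirroring the settings described in this note.
SAMPLE = """
max_log_file = 6
num_logs = 5
max_log_file_action = keep_logs
"""

if __name__ == "__main__":
    limit = effective_log_limit(parse_auditd_conf(SAMPLE))
    print("unbounded" if limit is None else f"capped at {limit} files")
```

Run against the sample, the check reports that the file count is unbounded, which matches the behavior observed on the affected system.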

Solution

Change max_log_file_action from keep_logs to rotate in auditd.conf so that the num_logs limit is enforced.
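As an illustration of the edit, the sketch below rewrites the setting in configuration text held in memory. It is a hypothetical helper and deliberately does not touch the real file. In practice the change would be made to auditd.conf directly, and the daemon would still need to reload its configuration before the new action takes effect.

```python
import re

# Matches the whole max_log_file_action line; group 1 keeps "key = " as written.
_ACTION_LINE = re.compile(
    r"^([ \t]*max_log_file_action[ \t]*=[ \t]*)[^\n]*$",
    re.IGNORECASE | re.MULTILINE,
)


def set_max_log_file_action(conf_text: str, action: str = "rotate") -> str:
    """Return conf_text with max_log_file_action set to `action`.

    Anything after the '=' on that line (including trailing notes) is replaced.
    Raises ValueError if the setting is not present.
    """
    new_text, count = _ACTION_LINE.subn(lambda m: m.group(1) + action, conf_text)
    if count == 0:
        raise ValueError("max_log_file_action not found in configuration text")
    return new_text


if __name__ == "__main__":
    before = "num_logs = 5\nmax_log_file_action = keep_logs\nadmin_space_left = 50\n"
    print(set_max_log_file_action(before))
```

The other lines pass through unchanged, so the result can be diffed against the original before anything is applied.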