Chapter 28.1 — TLPI

Process Accounting
How the Linux kernel can quietly record every process that terminates on your system
Topics covered: the acct() system call, the acct_v3 record format, and comp_t time encoding.

What is Process Accounting?

Every time a process terminates on Linux, if process accounting is enabled, the kernel writes a small record to a special file. This record contains useful statistics: how much CPU the process used, how long it ran, who ran it, and why it ended. Think of it as a flight recorder for processes.

Originally used to bill users on shared UNIX machines for CPU usage, it is today useful for auditing, debugging, and monitoring processes that no parent is watching.

How Process Accounting Works

1. The administrator enables accounting: acct("/var/log/pacct")
2. The kernel watches all processes on the system.
3. A process terminates (for any reason).
4. The kernel appends an acct record to the accounting file.

⚙️ Kernel Configuration Required: Process accounting is an optional kernel feature. It must be compiled in with CONFIG_BSD_PROCESS_ACCT=y. On most distros this is already enabled.

The acct() System Call

acct() is the system call that enables or disables process accounting. Only a privileged process (with CAP_SYS_PACCT capability) can call it.

/* Prototype */
#include <unistd.h>

int acct(const char *acctfile);
/* Returns 0 on success, -1 on error */

/* To ENABLE: pass a pathname to an existing file */
acct("/var/log/pacct");

/* To DISABLE: pass NULL */
acct(NULL);

Example 1 — Enable and Disable Process Accounting

This program takes an optional filename argument. If one is given, it enables accounting to that file. With no argument, it disables accounting.

/* acct_toggle.c — Enable or disable process accounting */
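#define _DEFAULT_SOURCE   /* glibc 2.20 and later: exposes acct() */
#define _BSD_SOURCE       /* older glibc; deprecated in newer versions */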
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
    if (argc > 2) {
        fprintf(stderr, "Usage: %s [file]\n", argv[0]);
        exit(EXIT_FAILURE);
    }

    /* argv[1] is the accounting file path, or NULL to disable */
    if (acct(argc > 1 ? argv[1] : NULL) == -1) {
        perror("acct");
        exit(EXIT_FAILURE);
    }

    printf("Process accounting %s\n",
           (argc == 1) ? "disabled" : "enabled");

    exit(EXIT_SUCCESS);
}

Run it (as root):
sudo touch /var/log/pacct
sudo ./acct_toggle /var/log/pacct    # enables accounting
sudo ./acct_toggle                   # disables accounting
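
Because acct() expects the file to exist already, the touch step can also be done in code. The following is a minimal sketch, not part of the original example: the helper name enable_accounting is hypothetical, and the 0600 file mode is an assumption chosen so that only the owner can read the records.

```c
#define _DEFAULT_SOURCE   /* exposes acct() in <unistd.h> on glibc */
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: make sure the accounting file exists,
   then ask the kernel to start appending records to it.
   Returns 0 on success, -1 on error (errno is set). */
int enable_accounting(const char *path)
{
    /* O_CREAT creates the file if it is missing; an existing
       file is left untouched because O_TRUNC is not used. */
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0600);
    if (fd == -1)
        return -1;

    close(fd);

    /* Fails (typically with EPERM) unless the caller is privileged. */
    return acct(path);
}
```

A caller would check the return value and report errno with perror(), just as acct_toggle.c does.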

The acct Structure — What Gets Recorded

For every process that terminates, the kernel fills in this structure and appends it to the accounting file.

/* From <sys/acct.h> */
typedef u_int16_t comp_t;   /* Compressed clock ticks — see below */

struct acct {
    char      ac_flag;      /* Accounting flags (AFORK, ASU, AXSIG, ACORE) */
    u_int16_t ac_uid;       /* User ID of the process */
    u_int16_t ac_gid;       /* Group ID of the process */
    u_int16_t ac_tty;       /* Controlling terminal (0 if daemon) */
    u_int32_t ac_btime;     /* Start time (seconds since the Epoch) */
    comp_t    ac_utime;     /* User CPU time (clock ticks) */
    comp_t    ac_stime;     /* System (kernel) CPU time (clock ticks) */
    comp_t    ac_etime;     /* Elapsed real time (clock ticks) */
    comp_t    ac_mem;       /* Average memory usage (kilobytes) */
    comp_t    ac_io;        /* Bytes read/written (unused in Linux) */
    comp_t    ac_rw;        /* Blocks read/written (unused) */
    comp_t    ac_minflt;    /* Minor page faults (Linux-specific) */
    comp_t    ac_majflt;    /* Major page faults (Linux-specific) */
    comp_t    ac_swaps;     /* Number of swaps (unused; Linux-specific) */
    u_int32_t ac_exitcode;  /* Process termination status */
    char      ac_comm[17];  /* Command name (last execve basename) */
    char      ac_pad[10];   /* Padding for future use */
};
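
The ac_exitcode field is described as the process termination status. Assuming it holds the same kind of value that wait() returns (an assumption worth verifying on your system), the standard macros from <sys/wait.h> can split it into an exit code or a signal number. The struct exit_info type and the decode_exitcode name are hypothetical, introduced only for this sketch.

```c
#include <sys/wait.h>

/* Hypothetical result type for this sketch. */
struct exit_info {
    int signaled;   /* 1 if the process was killed by a signal */
    int code;       /* exit status or signal number; -1 if unknown */
};

/* Interpret ac_exitcode as a wait()-style status (assumption). */
struct exit_info decode_exitcode(unsigned int ac_exitcode)
{
    int status = (int) ac_exitcode;
    struct exit_info info = { 0, -1 };

    if (WIFEXITED(status)) {
        info.code = WEXITSTATUS(status);    /* normal exit */
    } else if (WIFSIGNALED(status)) {
        info.signaled = 1;
        info.code = WTERMSIG(status);       /* killed by this signal */
    }
    return info;
}
```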

ac_flag Bit Values

Flag    Letter   Meaning
AFORK   F        Process was created by fork() but never called exec() before terminating
ASU     S        Process used superuser (root) privileges at some point
AXSIG   X        Process was terminated by a signal (not on all implementations)
ACORE   C        Process dumped core (not on all implementations)
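
The letters in the table can be produced by testing each bit of ac_flag in turn. This is a small sketch that relies on the AFORK, ASU, AXSIG, and ACORE constants from <sys/acct.h>; the function name acct_flag_letters is hypothetical.

```c
#include <sys/acct.h>

/* Write the letters for the flags set in 'flag' into 'out'.
   'out' must have room for 4 letters plus the terminating NUL. */
void acct_flag_letters(char flag, char out[5])
{
    int n = 0;

    if (flag & AFORK) out[n++] = 'F';   /* forked, never exec'd */
    if (flag & ASU)   out[n++] = 'S';   /* used superuser privileges */
    if (flag & AXSIG) out[n++] = 'X';   /* killed by a signal */
    if (flag & ACORE) out[n++] = 'C';   /* dumped core */
    out[n] = '\0';
}
```

A record whose ac_flag has both AFORK and AXSIG set would yield the string "FX".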

Understanding comp_t (Compressed Clock Ticks)

Time fields (ac_utime, ac_stime, ac_etime) use the comp_t type — a compressed floating-point format stored in 16 bits. It has a 13-bit mantissa and a 3-bit base-8 exponent.

Bits 15-13: 3-bit exponent (base 8)
Bits 12-0:  13-bit mantissa
Value = mantissa × 8^exponent

Example 2 — Decode comp_t to Seconds

/* compt_decode.c — Convert comp_t to actual time in seconds */
#include <stdio.h>
#include <stdint.h>
#include <sys/acct.h>
#include <unistd.h>

/* Convert comp_t value to long long clock ticks */
static long long comptToLL(comp_t ct)
{
    const int EXP_BITS  = 3;            /* 3-bit base-8 exponent */
    const int MANT_BITS = 13;           /* 13-bit mantissa */
    const int MANT_MASK = (1 << MANT_BITS) - 1;  /* 0x1FFF */

    long long mantissa = ct & MANT_MASK;
    long long exponent = (ct >> MANT_BITS) & ((1 << EXP_BITS) - 1);

    /* Multiply mantissa by 8^exponent = left-shift by (exponent * 3) bits */
    return mantissa << (exponent * 3);
}

int main(void)
{
    /* Example: read a few records and print CPU time in seconds */
    FILE *fp = fopen("/var/log/pacct", "rb");
    if (!fp) { perror("fopen"); return 1; }

    struct acct ac;
    long clk_tck = sysconf(_SC_CLK_TCK);  /* Usually 100 ticks/sec */

    printf("%-16s %8s %8s %8s\n",
           "Command", "User(s)", "Sys(s)", "Elapsed(s)");
    printf("%-16s %8s %8s %8s\n",
           "-------", "------", "-----", "--------");

    while (fread(&ac, sizeof(ac), 1, fp) == 1) {
        double utime = (double) comptToLL(ac.ac_utime) / clk_tck;
        double stime = (double) comptToLL(ac.ac_stime) / clk_tck;
        double etime = (double) comptToLL(ac.ac_etime) / clk_tck;

        printf("%-16.16s %8.2f %8.2f %8.2f\n",
               ac.ac_comm, utime, stime, etime);
    }

    fclose(fp);
    return 0;
}
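
To see what the compression costs in precision, it helps to go the other way and pack a tick count into the 16-bit layout. The sketch below is an illustration only: ticks_to_comp and comp_to_ticks are hypothetical names, the encoder simply truncates low bits, and the kernel's own encoder may round differently.

```c
#include <stdint.h>

/* Pack a tick count into a 3-bit base-8 exponent and a 13-bit
   mantissa. Low-order bits are discarded (truncation), and values
   beyond the representable range saturate at the maximum. */
uint16_t ticks_to_comp(unsigned long long ticks)
{
    unsigned exp = 0;

    /* Divide by 8 until the value fits in 13 bits. */
    while (ticks > 0x1FFF && exp < 7) {
        ticks >>= 3;
        exp++;
    }
    if (ticks > 0x1FFF)
        ticks = 0x1FFF;                 /* saturate */

    return (uint16_t) ((exp << 13) | ticks);
}

/* Inverse: mantissa * 8^exponent, as in comptToLL() above. */
unsigned long long comp_to_ticks(uint16_t c)
{
    return (unsigned long long) (c & 0x1FFF) << (3 * (c >> 13));
}
```

Small values survive a round trip unchanged, but larger ones lose their low bits: with this encoder, 100000 ticks comes back as 99968.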

/proc/sys/kernel/acct — Disk Space Control

Accounting files can grow rapidly. The virtual file /proc/sys/kernel/acct holds three values that control when the kernel suspends and resumes accounting, based on free disk space.

Parameter    Default   Meaning
high-water   4%        Resume accounting when free disk space rises to at least this percentage
low-water    2%        Suspend accounting when free disk space falls below this percentage
frequency    30 s      How often the kernel checks free disk space

/* Read and display current /proc/sys/kernel/acct settings */
#include <stdio.h>

int main(void)
{
    FILE *fp = fopen("/proc/sys/kernel/acct", "r");
    if (!fp) { perror("fopen"); return 1; }

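    int high, low, freq;
    /* fscanf returns the number of values converted; anything
       other than 3 means the file was not in the expected format */
    if (fscanf(fp, "%d %d %d", &high, &low, &freq) != 3) {
        fprintf(stderr, "Unexpected format in /proc/sys/kernel/acct\n");
        fclose(fp);
        return 1;
    }
    fclose(fp);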

    printf("Process accounting disk control:\n");
    printf("  high-water: %d%% (resume accounting above this)\n", high);
    printf("  low-water:  %d%% (suspend accounting below this)\n", low);
    printf("  frequency:  %d seconds between checks\n", freq);
    return 0;
}
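
The two thresholds form a hysteresis band: once accounting is suspended, it stays suspended until free space climbs back to the high-water mark, which avoids rapid toggling near a single limit. The following sketch models that rule as described in the table; it is not the kernel's code, and acct_next_state is a hypothetical name.

```c
#include <stdbool.h>

/* Given the current state and the percentage of free disk space,
   return whether accounting should be active after this check. */
bool acct_next_state(bool active, int free_pct,
                     int low_water, int high_water)
{
    if (active && free_pct < low_water)
        return false;               /* space ran low: suspend */
    if (!active && free_pct >= high_water)
        return true;                /* space recovered: resume */
    return active;                  /* in between: no change */
}
```

With the defaults (2 and 4), a suspended system with 3% free space stays suspended, while an active one with 3% free keeps running.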

Process Accounting Version 3 (acct_v3)

From kernel 2.6.8, Linux introduced an improved accounting format, enabled with CONFIG_BSD_PROCESS_ACCT_V3. It addresses several limitations of the original format.

Feature              Original acct            acct_v3
Process ID           Not recorded             ac_pid added
Parent process ID    Not recorded             ac_ppid added
User/group ID size   16-bit only              32-bit (Linux 2.4+)
Elapsed time type    comp_t (limited range)   float (longer times)
Version field        None                     ac_version = 3

/* acct_v3 structure — available from kernel 2.6.8 onwards */
struct acct_v3 {
    char      ac_flag;       /* Accounting flags */
    char      ac_version;    /* Always 3 for this format */
    u_int16_t ac_tty;        /* Controlling terminal */
    u_int32_t ac_exitcode;   /* Termination status */
    u_int32_t ac_uid;        /* 32-bit user ID (fixed from 16-bit) */
    u_int32_t ac_gid;        /* 32-bit group ID */
    u_int32_t ac_pid;        /* Process ID (new in v3) */
    u_int32_t ac_ppid;       /* Parent process ID (new in v3) */
    u_int32_t ac_btime;      /* Start time (time_t) */
    float     ac_etime;      /* Elapsed time as float (wider range) */
    comp_t    ac_utime;      /* User CPU time */
    comp_t    ac_stime;      /* System CPU time */
    comp_t    ac_mem;        /* Average memory usage (KB) */
    comp_t    ac_io;         /* Bytes read/written (unused) */
    comp_t    ac_rw;         /* Blocks read/written (unused) */
    comp_t    ac_minflt;     /* Minor page faults */
    comp_t    ac_majflt;     /* Major page faults */
    comp_t    ac_swaps;      /* Number of swaps (unused) */
    char      ac_comm[16];   /* Command name */
};

/* Check which version we are reading */
void check_version(const char *file)
{
    /* ac_version is the second byte of each record; 3 means acct_v3 */
    FILE *fp = fopen(file, "rb");
    if (!fp) { perror("fopen"); return; }

    struct acct_v3 rec;
    if (fread(&rec, sizeof(rec), 1, fp) == 1) {
        printf("Version: %d\n", (unsigned char)rec.ac_version);
        printf("PID: %u, PPID: %u\n", rec.ac_pid, rec.ac_ppid);
    }
    fclose(fp);
}

⚠️ Important Notes About Accounting Records

  • Records are ordered by termination time, not start time
  • If the system crashes, no record is written for running processes
  • For multi-threaded processes (kernel ≥ 2.6.10): one record written when the last thread exits
  • Under older LinuxThreads: one record written per thread
  • Accounting is not in SUSv3 — format varies across UNIX systems
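
Since records land in the file in termination order, recovering start order means loading them and sorting on ac_btime. A minimal sketch, assuming the records have already been read into an array of struct acct_v3 (as declared by glibc's <sys/acct.h>); cmp_btime and sort_by_start are hypothetical names.

```c
#include <stdlib.h>
#include <sys/acct.h>

/* qsort comparator: orders records by ascending start time. */
int cmp_btime(const void *a, const void *b)
{
    const struct acct_v3 *x = a;
    const struct acct_v3 *y = b;

    /* Compare rather than subtract, to avoid unsigned wraparound. */
    return (x->ac_btime > y->ac_btime) - (x->ac_btime < y->ac_btime);
}

/* Sort an in-memory array of n records by start time. */
void sort_by_start(struct acct_v3 *recs, size_t n)
{
    qsort(recs, n, sizeof(recs[0]), cmp_btime);
}
```

Note that ac_btime has one-second resolution, so processes started within the same second may appear in either order.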

Interview Questions

Q1. What is process accounting and when is it useful?
Process accounting is a Linux kernel feature that writes a record to a file whenever a process terminates. The record contains CPU time, elapsed time, user ID, and command name. It is useful for auditing which processes ran, charging users for resource usage on shared systems, and debugging processes that have no parent watching them.
Q2. What privilege is required to call acct()? Why?
The calling process must have the CAP_SYS_PACCT capability (effectively root). This is required because enabling accounting writes to a system-wide file and can consume disk space affecting the whole system. Unprivileged users should not be able to redirect this file or fill the disk.
Q3. What is comp_t and why is it used instead of a regular integer?
comp_t is a 16-bit compressed floating-point type with a 13-bit mantissa and a 3-bit base-8 exponent. It is used for the time fields (and for counters such as ac_mem and the page-fault counts) to keep each record small while still covering a wide range of values, at the cost of precision for large values. The maximum comp_t value is (2¹³ − 1) × 8⁷ = 17,177,772,032, which exceeds what a 32-bit unsigned integer can hold; a plain integer covering the same range would need 34 bits.
Q4. What does the AFORK flag in ac_flag mean?
AFORK is set when a process was created by fork() but terminated without ever calling exec(). This means the process ran only the code it inherited from its parent, never replacing its program image. This is common for child processes that do a small task and exit quickly without loading a new program.
Q5. What improvements does acct_v3 bring over the original acct structure?
acct_v3 adds: (1) ac_pid and ac_ppid fields so you know exactly which process generated the record, (2) 32-bit uid/gid instead of 16-bit fixing overflow for large user IDs introduced in Linux 2.4, (3) elapsed time as float instead of comp_t allowing longer times to be recorded, and (4) an ac_version field for format identification.
Q6. Why are accounting records ordered by termination time, not start time?
Records are written to the file only when a process terminates, so they are naturally appended in termination order. The start time is stored inside the record (as ac_btime), but the record itself cannot be written until the process ends. If you need start-time ordering, you must read all records and sort them by ac_btime.
Q7. What happens to accounting records if the system crashes?
No accounting record is written for any process still running at crash time. The record is only written at termination, so a hard crash loses all in-flight process data. This is one reason accounting is supplemented by other monitoring tools in production systems.
