The struct rusage
When you call getrusage(), the kernel fills a struct rusage with resource usage statistics. This structure has 16 fields. On Linux, only 9 of them are populated; the other 7 are legacy fields inherited from BSD UNIX that exist purely for source-code compatibility. Knowing which fields are populated is essential: reading a zero from an unused field and treating it as meaningful data leads to incorrect conclusions.
The Complete Structure Definition
/* From <sys/resource.h> */
struct rusage {
    struct timeval ru_utime;  /* User CPU time used               [USED] */
    struct timeval ru_stime;  /* System (kernel) CPU time used    [USED] */
    long ru_maxrss;           /* Max resident set size (KB)       [USED since 2.6.32] */
    long ru_ixrss;            /* Shared text size (KB-sec)        [UNUSED on Linux] */
    long ru_idrss;            /* Unshared data size (KB-sec)      [UNUSED on Linux] */
    long ru_isrss;            /* Unshared stack size (KB-sec)     [UNUSED on Linux] */
    long ru_minflt;           /* Minor (soft) page faults         [USED] */
    long ru_majflt;           /* Major (hard) page faults         [USED] */
    long ru_nswap;            /* Swaps out of physical memory     [UNUSED on Linux] */
    long ru_inblock;          /* Block input ops via filesystem   [USED since 2.6.22] */
    long ru_oublock;          /* Block output ops via filesystem  [USED since 2.6.22] */
    long ru_msgsnd;           /* IPC messages sent                [UNUSED on Linux] */
    long ru_msgrcv;           /* IPC messages received            [UNUSED on Linux] */
    long ru_nsignals;         /* Signals received                 [UNUSED on Linux] */
    long ru_nvcsw;            /* Voluntary context switches       [USED since 2.6] */
    long ru_nivcsw;           /* Involuntary context switches     [USED since 2.6] */
};
Fields Used on Linux — Detailed Explanation
ru_utime — User-Mode CPU Time
A struct timeval containing the total CPU time the process has consumed while executing in user mode. User mode is where your application code runs — the loops, calculations, string operations, and library functions that do not involve a system call transition into the kernel.
struct timeval has two members: tv_sec (whole seconds) and tv_usec (microseconds, 0–999999). To convert to a single floating-point number: tv_sec + tv_usec / 1e6.
Practical meaning: High ru_utime means your code is doing a lot of computation. Compare with ru_stime to understand the ratio of application work vs kernel work.
ru_stime — System (Kernel) CPU Time
A struct timeval containing the total CPU time consumed in kernel mode on behalf of this process. Every time your process makes a system call (open(), read(), write(), fork(), the brk() or mmap() calls that malloc() makes to obtain memory, and so on), the kernel runs code on your behalf, and that time is charged to ru_stime.
Practical meaning: High ru_stime relative to ru_utime means your process is I/O-heavy or makes many system calls. If ru_stime dominates, consider reducing system call frequency by buffering I/O or batching operations.
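The user/kernel comparison described above can be reduced to a single ratio. The following is a minimal sketch, not part of any standard API: tv_seconds() and system_time_fraction() are hypothetical helper names introduced here for illustration.

```c
#include <sys/resource.h>
#include <sys/time.h>

/* Hypothetical helper: convert a struct timeval to seconds. */
static double tv_seconds(struct timeval tv)
{
    return (double)tv.tv_sec + (double)tv.tv_usec / 1e6;
}

/*
 * Hypothetical helper: fraction of the recorded CPU time that was spent
 * in kernel mode. Returns a value in [0, 1], or 0.0 when no CPU time
 * has been recorded yet.
 */
static double system_time_fraction(const struct rusage *ru)
{
    double user = tv_seconds(ru->ru_utime);
    double sys = tv_seconds(ru->ru_stime);
    double total = user + sys;

    return total > 0.0 ? sys / total : 0.0;
}
```

A caller would fill a struct rusage with getrusage(RUSAGE_SELF, &ru) and pass its address to system_time_fraction(). What counts as a "high" fraction depends on the workload, so treat any threshold as a judgment call.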
ru_maxrss — Peak Resident Set Size
The maximum resident set size in kilobytes. This is the largest amount of physical RAM the process had resident at any single point during its lifetime. It is a peak value, not a current value, so it never decreases.
Linux availability: Only populated since kernel 2.6.32. On older kernels it is always 0. On Linux the value is in kilobytes regardless of word size; macOS reports it in bytes, so be careful when porting.
Special case for RUSAGE_CHILDREN: This field reports the peak RSS of the single largest descendant that has terminated and been waited for, not the sum. So if child A peaked at 50 MB and child B peaked at 80 MB, you will see 80 MB here.
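The max-not-sum behaviour can be observed with two short-lived children. This is a rough sketch assuming Linux (kilobyte units) and a small parent process; run_child_touching() and largest_child_maxrss_kb() are hypothetical names chosen for this example.

```c
#include <stdlib.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: fork a child that touches `bytes` of heap memory,
 * then wait for it. Returns 0 on success, -1 on failure.
 */
static int run_child_touching(size_t bytes)
{
    int status;
    pid_t pid = fork();

    if (pid == -1)
        return -1;
    if (pid == 0) {
        /* volatile keeps the compiler from discarding the writes */
        volatile char *buf = malloc(bytes);
        if (buf == NULL)
            _exit(1);
        /* A 4096-byte stride touches every page when pages are >= 4 KB */
        for (size_t i = 0; i < bytes; i += 4096)
            buf[i] = 1;
        _exit(0);
    }
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}

/*
 * Hypothetical helper: run two children of different sizes and return
 * ru_maxrss for RUSAGE_CHILDREN (kilobytes on Linux), or -1 on failure.
 */
static long largest_child_maxrss_kb(size_t bytes_a, size_t bytes_b)
{
    struct rusage ru;

    if (run_child_touching(bytes_a) == -1 || run_child_touching(bytes_b) == -1)
        return -1;
    if (getrusage(RUSAGE_CHILDREN, &ru) == -1)
        return -1;
    return ru.ru_maxrss;
}
```

Calling largest_child_maxrss_kb with 32 MB and 64 MB should return a value slightly above 65536 KB on Linux: the larger child's peak, not the two peaks added together. The result also reflects any other children the caller has already waited for.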
ru_minflt — Minor (Soft) Page Faults
Counts the number of minor page faults. A minor fault occurs when the MMU triggers a page fault that the kernel can resolve without any disk I/O: either the page is already in memory (for example in the page cache) or a fresh zero-filled page can simply be allocated. The kernel only needs to set up a page table entry pointing at the page.
When do minor faults happen?
- Accessing a newly allocated heap page for the first time (Linux allocates pages lazily)
- Accessing a shared library that another process has already loaded into the page cache
- Copy-on-write faults after fork()
Practical meaning: Minor faults are cheap (microseconds). High counts are normal and not usually a concern.
ru_majflt — Major (Hard) Page Faults
Counts the number of major page faults. A major fault occurs when the requested page is not in memory at all, so the kernel must read it from disk (swap space or a file). This involves actual I/O and can take milliseconds, roughly a thousand times slower than a minor fault.
When do major faults happen?
- Reading a part of an executable or shared library that has not been loaded yet
- Accessing a page that was swapped out to disk due to memory pressure
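Practical meaning: A high ru_majflt is a performance warning. A handful of major faults at startup is normal, because code and data are read from disk on first use. A count that keeps climbing during steady-state operation usually means the system is under memory pressure and is swapping or repeatedly evicting and re-reading file pages. For real-time or embedded systems, major faults can violate timing guarantees.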
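Because both fault counters are cumulative, the usual way to attribute faults to a piece of work is to sample getrusage() before and after it and subtract. The sketch below does this around a loop that touches freshly allocated heap memory. struct fault_delta and measure_faults_touching() are hypothetical names introduced for this example.

```c
#include <stdlib.h>
#include <sys/resource.h>

/* Hypothetical result type: faults attributed to the measured work. */
struct fault_delta {
    long minor;
    long major;
};

/*
 * Hypothetical helper: allocate `bytes`, touch one byte per 4096-byte
 * step, and report how many page faults occurred in between.
 * Returns 0 on success, -1 on failure.
 */
static int measure_faults_touching(size_t bytes, struct fault_delta *out)
{
    struct rusage before, after;
    volatile char *buf; /* volatile: keep the compiler from dropping the writes */

    if (getrusage(RUSAGE_SELF, &before) == -1)
        return -1;

    buf = malloc(bytes);
    if (buf == NULL)
        return -1;
    for (size_t i = 0; i < bytes; i += 4096)
        buf[i] = 1;

    if (getrusage(RUSAGE_SELF, &after) == -1) {
        free((void *)buf);
        return -1;
    }
    free((void *)buf);

    out->minor = after.ru_minflt - before.ru_minflt;
    out->major = after.ru_majflt - before.ru_majflt;
    return 0;
}
```

On a typical Linux system the minor delta is positive and the major delta is 0. The exact minor count depends on page size and on whether transparent huge pages back the allocation, so it should not be assumed to equal bytes / 4096.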
ru_inblock / ru_oublock — Block I/O Operations
ru_inblock counts the number of times the filesystem had to perform a block read from the underlying block device (disk). Similarly, ru_oublock counts block writes.
Available since Linux 2.6.22. The key point is that these count actual hardware I/O operations — reads that were served from the page cache (without hitting the disk) do not increment these counters.
Practical meaning: If a process reads the same file twice, only a read that actually goes to disk increments ru_inblock; a read served entirely from the page cache adds nothing. If the file was already cached before the first read, the counter does not move for either read.
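One way to see the caching effect is to measure the change in ru_inblock around a full read of a file. The following is an illustrative sketch only; inblock_delta_for_read() is a hypothetical helper name, and the 64 KB buffer size is an arbitrary choice.

```c
#include <fcntl.h>
#include <sys/resource.h>
#include <unistd.h>

/*
 * Hypothetical helper: read the whole file at `path` and return how much
 * ru_inblock grew while doing so, or -1 on any error.
 */
static long inblock_delta_for_read(const char *path)
{
    struct rusage before, after;
    char buf[65536];
    ssize_t n;
    int fd;

    if (getrusage(RUSAGE_SELF, &before) == -1)
        return -1;

    fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;
    while ((n = read(fd, buf, sizeof buf)) > 0)
        ; /* discard the data; only the I/O accounting matters here */
    close(fd);

    if (n == -1 || getrusage(RUSAGE_SELF, &after) == -1)
        return -1;
    return after.ru_inblock - before.ru_inblock;
}
```

Calling this twice in a row on the same large file would typically show a positive delta the first time (if the file was not yet cached) and 0 the second time. The counters are process-wide, so I/O from other threads in the same process is included in the delta.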
ru_nvcsw — Voluntary Context Switches
Counts how many times the process voluntarily gave up the CPU before its time slice expired. This happens when the process calls a blocking system call: sleep(), read() on a socket with no data, wait(), pthread_mutex_lock() when the mutex is already held, etc.
Practical meaning: A high ru_nvcsw is usually benign. It means the process yields the CPU whenever it has nothing to do, which is efficient behaviour in a multi-process system. An extremely high rate can still be worth a look, because it may point to many tiny I/O operations or heavy lock contention.
ru_nivcsw — Involuntary Context Switches
Counts how many times the kernel forcibly preempted the process. This happens either because a higher-priority process became runnable, or because the process’s CPU time slice expired before it voluntarily yielded.
Practical meaning: Very high ru_nivcsw can indicate CPU contention, where many processes are competing for CPU time. For real-time processes, unexpected preemptions may point to a scheduling or priority configuration problem.
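The two counters can be compared for a specific piece of work with the same before/after technique. The sketch below measures them around a series of short sleeps, which are blocking calls and therefore expected to show up as voluntary switches. struct ctxsw_delta and measure_ctxsw_sleeping() are hypothetical names used only in this example.

```c
#include <sys/resource.h>
#include <time.h>

/* Hypothetical result type: context switches attributed to the work. */
struct ctxsw_delta {
    long voluntary;
    long involuntary;
};

/*
 * Hypothetical helper: sleep for 1 ms `naps` times and report how many
 * context switches of each kind happened in between.
 * Returns 0 on success, -1 on failure.
 */
static int measure_ctxsw_sleeping(int naps, struct ctxsw_delta *out)
{
    struct rusage before, after;
    struct timespec nap = { .tv_sec = 0, .tv_nsec = 1000000 }; /* 1 ms */

    if (getrusage(RUSAGE_SELF, &before) == -1)
        return -1;

    for (int i = 0; i < naps; i++)
        nanosleep(&nap, NULL); /* blocks, so the process gives up the CPU */

    if (getrusage(RUSAGE_SELF, &after) == -1)
        return -1;

    out->voluntary = after.ru_nvcsw - before.ru_nvcsw;
    out->involuntary = after.ru_nivcsw - before.ru_nivcsw;
    return 0;
}
```

On an idle machine the voluntary delta should be close to the number of sleeps and the involuntary delta close to 0. Replacing the sleeps with a long CPU-bound loop on a busy machine would be expected to shift the balance toward involuntary switches.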
Fields Unused on Linux (BSD Legacy)
The following 7 fields are always zero on Linux. They exist because the struct rusage definition was copied from BSD UNIX, where they carry meaningful data. Linux provides them so that code written for BSD can be recompiled on Linux without modification.
| Field | What it was supposed to mean |
|---|---|
| ru_ixrss | Integral shared memory size (kilobyte-seconds) |
| ru_idrss | Integral unshared data size (kilobyte-seconds) |
| ru_isrss | Integral unshared stack size (kilobyte-seconds) |
| ru_nswap | Number of swaps to/from physical memory |
| ru_msgsnd | Number of IPC messages sent |
| ru_msgrcv | Number of IPC messages received |
| ru_nsignals | Number of signals received |
Code Example — Interpreting All Populated Fields
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
#include <sys/resource.h>

/* Helper: convert timeval to seconds as a double */
static double tv_to_sec(struct timeval tv)
{
    return (double)tv.tv_sec + (double)tv.tv_usec / 1000000.0;
}

void print_rusage(const struct rusage *ru)
{
    printf("--- Resource Usage Report ---\n");
    /* CPU time */
    printf("User-mode CPU   : %.6f seconds\n", tv_to_sec(ru->ru_utime));
    printf("Kernel-mode CPU : %.6f seconds\n", tv_to_sec(ru->ru_stime));
    /* Memory */
    printf("Peak RSS        : %ld KB\n", ru->ru_maxrss);
    /* Page faults */
    printf("Minor faults    : %ld (no disk I/O needed)\n", ru->ru_minflt);
    printf("Major faults    : %ld (required disk I/O)\n", ru->ru_majflt);
    /* Block I/O (available since Linux 2.6.22) */
    printf("Block reads     : %ld\n", ru->ru_inblock);
    printf("Block writes    : %ld\n", ru->ru_oublock);
    /* Context switches (available since Linux 2.6) */
    printf("Voluntary ctx   : %ld (process yielded CPU)\n", ru->ru_nvcsw);
    printf("Involuntary ctx : %ld (kernel preempted)\n", ru->ru_nivcsw);
}

int main(void)
{
    struct rusage ru;
    const long size = 4L * 1024 * 1024; /* 4 MB */
    /* Allocate and touch some memory to generate minor faults.
     * volatile keeps the compiler from optimizing the writes away. */
    volatile char *buf = malloc(size);
    long i;

    if (buf == NULL) {
        perror("malloc");
        exit(1);
    }
    for (i = 0; i < size; i += 4096)
        buf[i] = 1; /* touch each page */
    free((void *)buf);

    if (getrusage(RUSAGE_SELF, &ru) == -1) {
        perror("getrusage");
        exit(1);
    }
    print_rusage(&ru);
    return 0;
}
/*
* Compile: gcc -o rusage_fields rusage_fields.c
*
* Note: ru_majflt should be 0 in normal circumstances.
* ru_minflt will be non-zero because pages are allocated lazily.
*/
Interview Questions
The struct has 16 fields. On Linux, 9 are populated (ru_utime, ru_stime, ru_maxrss, ru_minflt, ru_majflt, ru_inblock, ru_oublock, ru_nvcsw, ru_nivcsw). The remaining 7 are BSD legacy fields always filled with 0. The availability of some fields depends on kernel version.
Both are struct timeval with two members: tv_sec (seconds) and tv_usec (microseconds, 0–999999). Total seconds = tv_sec + tv_usec / 1000000.0.
A minor fault (ru_minflt) means the kernel can resolve the fault without disk I/O: the page is already in memory or can be freshly allocated, so only a page table update is needed. Very cheap. A major fault (ru_majflt) means the page must be read from disk (swap or file), which is actual I/O and orders of magnitude slower. Persistently high major fault counts indicate memory pressure or thrashing.
It measures the peak resident set size in kilobytes — the maximum amount of physical RAM the process had at any point. It is populated since Linux 2.6.32. For RUSAGE_CHILDREN it gives the maximum across all descendants, not the sum.
They come from BSD UNIX. Linux provides them so that code written for BSD can be compiled on Linux without modification. If Linux ever implements them, the struct layout will not change, so existing compiled binaries remain valid. It is a source and binary compatibility measure.
ru_nvcsw counts voluntary context switches — the process willingly gave up the CPU (e.g., by calling sleep, blocking on I/O, or waiting on a lock). ru_nivcsw counts involuntary context switches — the kernel forcibly preempted the process because a higher-priority process became runnable or the time slice expired. High involuntary switches suggest CPU contention.
