Capabilities-Only Environments, Securebits, Discovering Required Capabilities & Older Kernels

Linux Capabilities

Chapter 39 — Part 7: Capabilities-Only Environments, Securebits, Discovering Required Capabilities & Older Kernels

Creating Pure Capability-Based Environments

In the preceding sections, we’ve seen that UID 0 (root) receives special treatment in several ways: automatic capability grants, capability adjustments during UID transitions, and file effective bit handling. In a pure capability-based system, none of this special treatment for root would exist — privileges would come entirely from file capabilities.

Linux’s securebits mechanism (introduced in kernel 2.6.26) allows an application to opt out of the special root treatments, creating a capability-based environment where root gets no special privileges beyond what file capabilities provide.

39.8 — The Securebits Mechanism

The securebits mechanism is a set of per-process (technically per-thread) flags that control each of the three special root treatments described in the chapter. These flags exist in pairs: a base flag that controls behavior, and a corresponding locked flag that prevents the base flag from being changed again.

Flag	Meaning when SET
`SECBIT_KEEP_CAPS`	Don’t drop permitted capabilities when a process with one or more UIDs of 0 sets all of its UIDs to nonzero values. Prevents Rule 1 of Section 39.6 from clearing the permitted set. Has an effect only if SECBIT_NO_SETUID_FIXUP is not also set. Cleared on exec().
`SECBIT_NO_SETUID_FIXUP`	Don’t change capabilities when effective or file-system UIDs switch between 0 and nonzero. Disables Rules 1 through 4 of Section 39.6, so UID changes no longer affect capabilities at all.
`SECBIT_NOROOT`	If a process with real or effective UID=0 does an exec(), or if a set-user-ID-root program is execed, don’t grant any capabilities unless the executable has file capabilities explicitly set. Disables the root-preserving semantics of Section 39.5.2.
`SECBIT_KEEP_CAPS_LOCKED`	Lock SECBIT_KEEP_CAPS. Once set, SECBIT_KEEP_CAPS cannot be changed. This is a one-way latch.
`SECBIT_NO_SETUID_FIXUP_LOCKED`	Lock SECBIT_NO_SETUID_FIXUP. Once set, cannot be changed.
`SECBIT_NOROOT_LOCKED`	Lock SECBIT_NOROOT. Once set, cannot be changed.
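
The base/lock pairing is also visible in the bit layout: in <linux/securebits.h>, each lock flag occupies the bit immediately above its base flag. The following is a minimal sketch, assuming a Linux system with the kernel headers installed, that classifies one flag pair in a securebits value such as the one returned by prctl(PR_GET_SECUREBITS). The enum and the function name classify_secbit are illustrative and not part of any kernel or libc API.

```c
#include <linux/securebits.h>

/* Hypothetical helper type for this sketch; not a kernel or libc API. */
enum secbit_state {
    SECBIT_STATE_CLEAR,          /* base flag off, may still be changed */
    SECBIT_STATE_SET,            /* base flag on, may still be changed  */
    SECBIT_STATE_CLEAR_LOCKED,   /* base flag off and frozen            */
    SECBIT_STATE_SET_LOCKED      /* base flag on and frozen             */
};

/*
 * Classify one base flag (for example SECBIT_NOROOT) within a
 * securebits value. Relies on the header's layout: each *_LOCKED
 * flag is the base flag shifted left by one bit.
 */
static enum secbit_state classify_secbit(unsigned int bits, unsigned int base)
{
    unsigned int lock = base << 1;
    int set = (bits & base) != 0;
    int locked = (bits & lock) != 0;

    if (locked)
        return set ? SECBIT_STATE_SET_LOCKED : SECBIT_STATE_CLEAR_LOCKED;
    return set ? SECBIT_STATE_SET : SECBIT_STATE_CLEAR;
}
```

For example, classify_secbit(bits, SECBIT_NOROOT) distinguishes a SECBIT_NOROOT that is merely set from one that is set and can no longer be cleared.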

Inheritance behavior:

Securebits flags are inherited by child processes created via fork().
All flags are preserved across exec(), except SECBIT_KEEP_CAPS, which is cleared on exec() (for historical compatibility with the older prctl(PR_SET_KEEPCAPS) operation).
Modifying securebits flags requires CAP_SETPCAP in the process’s effective set.
A process reads its securebits with prctl(PR_GET_SECUREBITS).
A process sets its securebits with prctl(PR_SET_SECUREBITS, flags).

Creating a Purely Capability-Based Environment

To make a process and all its descendants operate in a pure capability model (where root’s UID has no special meaning), you can call:

prctl(PR_SET_SECUREBITS,
      SECBIT_NO_SETUID_FIXUP |
      SECBIT_NO_SETUID_FIXUP_LOCKED |
      SECBIT_NOROOT |
      SECBIT_NOROOT_LOCKED);

After this call:

UID transitions no longer affect capabilities (SECBIT_NO_SETUID_FIXUP)
Root gets no automatic capability grants from exec (SECBIT_NOROOT)
Both flags are permanently locked — even a process with CAP_SETPCAP cannot undo them (LOCKED flags)
The only way for this process or its descendants to gain capabilities is through file capabilities on executed programs
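
Why the locked flags make this a one-way step can be modelled in user space. The sketch below is an illustrative model, not kernel code: it encodes the rule that a lock bit can never be cleared and that a base flag whose lock is held must keep its value, for the six flags in the table. It deliberately ignores the separate CAP_SETPCAP requirement. The macro names and securebits_change_allowed are hypothetical.

```c
#include <linux/securebits.h>

/* The three lock flags from the table. */
#define DEMO_LOCK_FLAGS \
    (SECBIT_NOROOT_LOCKED | SECBIT_NO_SETUID_FIXUP_LOCKED | \
     SECBIT_KEEP_CAPS_LOCKED)

/*
 * Return 1 if changing securebits from old_bits to new_bits respects
 * the locks, 0 if a real PR_SET_SECUREBITS call would be expected to
 * be refused.
 */
static int securebits_change_allowed(unsigned int old_bits,
                                     unsigned int new_bits)
{
    unsigned int held_locks = old_bits & DEMO_LOCK_FLAGS;

    /* A lock, once set, can never be cleared. */
    if (held_locks & ~new_bits)
        return 0;

    /* A base flag whose lock is held must keep its current value.
     * Each lock sits one bit above its base flag, hence the shift. */
    if ((held_locks >> 1) & (old_bits ^ new_bits))
        return 0;

    return 1;
}
```

With the four flags from the call above as old_bits, a request for new_bits of 0 is rejected by this model, which is consistent with the prctl(PR_SET_SECUREBITS, 0) attempt in the child of the demo program later in this part. Adding SECBIT_KEEP_CAPS to the same value is still allowed, because that flag was never locked.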

📌 SECBIT_KEEP_CAPS vs SECBIT_NO_SETUID_FIXUP:
These two flags are related but different. SECBIT_KEEP_CAPS prevents the permitted set from being cleared when all UIDs go to nonzero (half of the Rule 1 behavior). SECBIT_NO_SETUID_FIXUP prevents ALL capability changes due to UID transitions (Rules 2, 3, and 4 as well). When both are set, SECBIT_NO_SETUID_FIXUP takes precedence and SECBIT_KEEP_CAPS has no additional effect. SECBIT_KEEP_CAPS exists mainly to mirror the older prctl(PR_SET_KEEPCAPS) operation for backward compatibility.
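
The link to the older prctl(PR_SET_KEEPCAPS) operation can be observed directly, because turning on the keep-capabilities flag does not require privilege. A minimal sketch, assuming a Linux system with the kernel headers installed and a process for which SECBIT_KEEP_CAPS_LOCKED is not already set; the function name is illustrative:

```c
#include <sys/prctl.h>
#include <linux/securebits.h>

/*
 * Turn on the legacy keep-capabilities flag, then read it back
 * through the securebits interface.
 * Returns 1 if SECBIT_KEEP_CAPS is visible in the securebits value,
 * 0 if it is not, and -1 if either prctl() call fails.
 * Side effect: the flag stays set for the calling thread until it is
 * cleared again or the process calls exec().
 */
static int keepcaps_shows_in_securebits(void)
{
    int bits;

    if (prctl(PR_SET_KEEPCAPS, 1L, 0L, 0L, 0L) == -1)
        return -1;

    bits = prctl(PR_GET_SECUREBITS, 0L, 0L, 0L, 0L);
    if (bits == -1)
        return -1;

    return (bits & SECBIT_KEEP_CAPS) ? 1 : 0;
}
```

A return value of 1 indicates that both interfaces are views of the same underlying flag.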

39.9 — Discovering the Capabilities Required by a Program

Before you can use setcap to assign file capabilities, you need to know which capabilities a program actually requires. This is a non-trivial problem, especially for binary-only programs or large codebases. There are two main approaches:

Approach 1 — Use strace to find EPERM errors:

Run the program under strace(1) and look for system calls that fail with EPERM. The EPERM (“Operation not permitted”) error is what the kernel returns when a required capability is missing.

$ strace -e trace=all ./program 2>&1 | grep "EPERM"

Once you find the failing system call, look up its man page to determine which capability it requires. Limitation: EPERM can sometimes be returned for reasons unrelated to missing capabilities. Also, a program may silently handle some EPERM errors gracefully, meaning not every missing capability causes a visible failure.

Approach 2 — Use a kernel probe for capability checks:

Kernel probes (using tools like SystemTap, eBPF/bpftrace, or kernel tracepoints) allow you to monitor every single capability check that the kernel performs. For each capability check, you can log: the kernel function called, the capability requested, and the program name.

# Using bpftrace to trace capability checks
$ sudo bpftrace -e '
kprobe:cap_capable {
    printf("PID=%d COMM=%s CAP=%d\n", pid, comm, arg2);
}'

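This approach is more thorough than strace. It logs every capability check regardless of whether it succeeds or fails, and a probe can also record the kernel function involved (the one-liner above prints only the PID, command name, and capability number, and does not filter by program). It requires more setup, and it only covers the code paths exercised while tracing, but for those paths it gives a reliable list of the capabilities the program attempts to use.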
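
A probe on cap_capable reports capability numbers rather than names. In real code, libcap's cap_to_name() performs this translation. The sketch below avoids the libcap dependency by using a deliberately partial table built from the constants in <linux/capability.h>. The function name and the choice of entries are illustrative.

```c
#include <stddef.h>
#include <linux/capability.h>

/*
 * Translate a capability number, as printed by a cap_capable() probe,
 * into its symbolic name. Only a handful of capabilities are listed;
 * any other number yields NULL.
 */
static const char *cap_number_to_name(int cap)
{
    switch (cap) {
    case CAP_CHOWN:            return "CAP_CHOWN";
    case CAP_DAC_OVERRIDE:     return "CAP_DAC_OVERRIDE";
    case CAP_KILL:             return "CAP_KILL";
    case CAP_SETUID:           return "CAP_SETUID";
    case CAP_SETPCAP:          return "CAP_SETPCAP";
    case CAP_NET_BIND_SERVICE: return "CAP_NET_BIND_SERVICE";
    case CAP_NET_ADMIN:        return "CAP_NET_ADMIN";
    case CAP_NET_RAW:          return "CAP_NET_RAW";
    case CAP_SYS_ADMIN:        return "CAP_SYS_ADMIN";
    default:                   return NULL;
    }
}
```

A trace line containing CAP=10, for instance, means the kernel checked CAP_NET_BIND_SERVICE, which is what a program binding to a port below 1024 would need.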

39.10 — Older Kernels and Systems Without File Capabilities

File capabilities were not available before Linux 2.6.24. Systems running older kernels (or newer kernels built without CONFIG_SECURITY_FILE_CAPABILITIES) behave differently in several important ways.

Aspect	With File Capabilities (≥ 2.6.24)	Without File Capabilities (< 2.6.24)
CAP_SETPCAP semantics	Add caps to own inheritable set; drop from bounding set; change securebits	Theoretically allows granting/removing caps from OTHER processes (but always masked out by bounding set)
Bounding set scope	Per-process attribute since Linux 2.6.25 (shown as CapBnd in /proc/PID/status)	System-wide attribute affecting all processes, accessible via /proc/sys/kernel/cap-bound
Initial bounding set content	init starts with ALL capabilities in bounding set	System-wide bounding set always masks out CAP_SETPCAP (value -257 = all bits except bit 8)
Inheritable set as bounding limit	Bounding set limits what can be added to inheritable set	System-wide bounding set does NOT restrict inheritable set (not needed — file caps not supported)

CAP_SETPCAP on older kernels: The capability theoretically allows a process to change other processes’ capabilities. However, the system-wide bounding set always masks out CAP_SETPCAP, making this impossible in practice. The bounding set is initialized to -257, which in two’s complement representation has every bit set except bit 8 (CAP_SETPCAP has the value 8, so its mask is 1 << 8 = 256).
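
The arithmetic behind that initial value can be checked in a few lines. This is a small worked example rather than code from any kernel; the function name is illustrative, and the bit number comes from <linux/capability.h>.

```c
#include <linux/capability.h>

/*
 * Build the mask described above: every bit set except the one for
 * CAP_SETPCAP (capability number 8).
 * 1 << 8 is 256; inverting all bits of 256 gives -257 in two's
 * complement, which is 0xFFFFFEFF when viewed as a 32-bit unsigned
 * value.
 */
static int old_default_cap_bound(void)
{
    return ~(1 << CAP_SETPCAP);
}
```

Testing the result against 1 << CAP_SETPCAP yields zero, while every other capability bit in the low 32 bits tests as set.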

🔑 Security on Older Kernels Without File Capabilities:

Even without file capabilities, you can still improve security by following this pattern:

1. Run the program as a set-user-ID-root process (it gets all capabilities except CAP_SETPCAP).
2. At startup, use libcap to drop all capabilities from the effective set, and drop unnecessary capabilities from the permitted set.
3. Set the SECBIT_KEEP_CAPS flag (or use prctl(PR_SET_KEEPCAPS)) to prevent the permitted set from being cleared in the next step.
4. Set all user IDs to nonzero values (preventing access to root-owned files and exec-based privilege escalation).
5. During operation, selectively raise and lower the remaining permitted capabilities in the effective set as needed.

💻 Coding Example — Creating a Capabilities-Only Environment with Securebits

This program demonstrates how to use the securebits mechanism to create a pure capability-based environment. It sets the necessary flags to disable root’s special treatment, then forks a child to show that the flags are inherited.

/*
 * securebits_demo.c
 *
 * Demonstrates the securebits mechanism for creating a pure
 * capability-based environment where UID 0 gets no special treatment.
 *
 * Must run as root (needs CAP_SETPCAP to modify securebits).
 *
 * Compile:
 *   gcc -o securebits_demo securebits_demo.c -lcap
 *
 * Run:
 *   sudo ./securebits_demo
 */

#include <stdio.h>
#include <stdlib.h>
#include <sys/prctl.h>
#include <sys/capability.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <linux/securebits.h>

/*
 * print_securebits() - Display the current securebits flags.
 * Uses prctl(PR_GET_SECUREBITS) to read the flags.
 */
static void print_securebits(const char *label)
{
    int bits;

    bits = prctl(PR_GET_SECUREBITS);
    if (bits == -1) {
        perror("prctl(PR_GET_SECUREBITS)");
        return;
    }

    printf("  [%s] Securebits=0x%x\n", label, bits);
    printf("    SECBIT_KEEP_CAPS:              %s\n",
           (bits & SECBIT_KEEP_CAPS) ? "SET" : "clear");
    printf("    SECBIT_NO_SETUID_FIXUP:        %s\n",
           (bits & SECBIT_NO_SETUID_FIXUP) ? "SET" : "clear");
    printf("    SECBIT_NOROOT:                 %s\n",
           (bits & SECBIT_NOROOT) ? "SET" : "clear");
    printf("    SECBIT_NO_SETUID_FIXUP_LOCKED: %s\n",
           (bits & SECBIT_NO_SETUID_FIXUP_LOCKED) ? "SET (locked)" : "clear");
    printf("    SECBIT_NOROOT_LOCKED:          %s\n",
           (bits & SECBIT_NOROOT_LOCKED) ? "SET (locked)" : "clear");
    printf("\n");
}

/*
 * print_caps() - Print brief capability state
 */
static void print_caps(const char *label)
{
    cap_t caps = cap_get_proc();
    char *text = caps ? cap_to_text(caps, NULL) : NULL;
    printf("  [%s] caps=[%s]  RUID=%d EUID=%d\n",
           label, text ? text : "?",
           (int)getuid(), (int)geteuid());
    if (text) cap_free(text);
    if (caps) cap_free(caps);
}

int main(void)
{
    pid_t child_pid;

    printf("=== Securebits Mechanism Demo ===\n\n");

    if (geteuid() != 0) {
        fprintf(stderr, "Must run as root. Try: sudo ./securebits_demo\n");
        return 1;
    }

    /* --- Step 1: Show initial state --- */
    printf("--- Initial state ---\n");
    print_securebits("Before");
    print_caps("Before");

    /* --- Step 2: Set securebits to create pure capability environment ---
     *
     * We set four flags:
     *
     * SECBIT_NOROOT:
     *   Root no longer gets automatic capabilities during exec().
     *   A root process execing a program gets no capabilities unless
     *   the file has explicit file capabilities set.
     *
     * SECBIT_NOROOT_LOCKED:
     *   Lock SECBIT_NOROOT permanently. Even CAP_SETPCAP cannot undo it.
     *
     * SECBIT_NO_SETUID_FIXUP:
     *   UID transitions (0<->nonzero) no longer affect capabilities.
     *   Capabilities are fully orthogonal to UIDs.
     *
     * SECBIT_NO_SETUID_FIXUP_LOCKED:
     *   Lock SECBIT_NO_SETUID_FIXUP permanently.
     *
     * After this call, this process and ALL its descendants operate in
     * a pure capability-based model. Capabilities come only from file
     * capabilities — root UID provides no special treatment.
     *
     * prctl(PR_SET_SECUREBITS) requires CAP_SETPCAP in effective set.
     */
    printf("--- Setting NOROOT + NO_SETUID_FIXUP (and locking both) ---\n");

    if (prctl(PR_SET_SECUREBITS,
              SECBIT_NOROOT | SECBIT_NOROOT_LOCKED |
              SECBIT_NO_SETUID_FIXUP | SECBIT_NO_SETUID_FIXUP_LOCKED) == -1) {
        perror("prctl(PR_SET_SECUREBITS)");
        return 1;
    }

    print_securebits("After setting securebits");
    print_caps("After setting securebits");

    /* --- Step 3: Verify that UID transitions no longer affect capabilities ---
     *
     * With SECBIT_NO_SETUID_FIXUP set, changing EUID from 0 to nonzero
     * should NOT clear the effective capability set (contrast with Section 39.6
     * where it normally would).
     */
    printf("--- Testing: seteuid(1000) should NOT affect capabilities now ---\n");

    if (seteuid(1000) == -1) {
        perror("seteuid(1000)");
    } else {
        print_caps("After seteuid(1000) with NO_SETUID_FIXUP");
        printf("  (Capabilities unchanged — SECBIT_NO_SETUID_FIXUP is working)\n\n");
        if (seteuid(0) == -1)  /* Restore root EUID */
            perror("seteuid(0)");
    }

    /* --- Step 4: Fork to show child inherits securebits ---
     *
     * A child process created via fork() inherits the securebits flags.
     * The pure capability environment propagates to all descendants.
     */
    printf("--- Forking child to show securebits inheritance ---\n");

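    /* Flush stdio first so that buffered output is not duplicated in
     * the child when stdout is a pipe or a file. */
    fflush(stdout);

    child_pid = fork();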
    if (child_pid == -1) {
        perror("fork");
        return 1;
    }

    if (child_pid == 0) {
        /* Child process */
        printf("  [Child PID=%d] Inherited securebits from parent:\n",
               (int)getpid());
        print_securebits("Child");

        /* Child cannot change securebits because locked flags prevent it */
        printf("  [Child] Attempting to clear SECBIT_NOROOT (should fail — locked):\n");
        if (prctl(PR_SET_SECUREBITS, 0) == -1) {
            printf("  [Child] prctl failed as expected: %m\n");
            printf("  [Child] Locked securebits cannot be changed.\n");
        } else {
            printf("  [Child] Unexpectedly succeeded in clearing securebits.\n");
        }
        exit(0);
    } else {
        /* Parent waits for child */
        int status;
        waitpid(child_pid, &status, 0);
    }

    printf("\n=== Demo complete. Pure capability environment established. ===\n");
    printf("Any program run from this process can only gain capabilities\n");
    printf("via explicit file capabilities (setcap), never via root UID.\n");

    return 0;
}

Expected output (as root, abridged):

--- Initial state ---
  [Before] Securebits=0x0
    SECBIT_KEEP_CAPS:              clear
    SECBIT_NO_SETUID_FIXUP:        clear
    SECBIT_NOROOT:                 clear

--- Setting NOROOT + NO_SETUID_FIXUP (and locking both) ---
  [After setting securebits] Securebits=0xf
    SECBIT_NO_SETUID_FIXUP:        SET
    SECBIT_NOROOT:                 SET
    SECBIT_NO_SETUID_FIXUP_LOCKED: SET (locked)
    SECBIT_NOROOT_LOCKED:          SET (locked)

--- Testing: seteuid(1000) should NOT affect capabilities now ---
  [After seteuid(1000) with NO_SETUID_FIXUP] caps=[=ep]  RUID=0 EUID=1000
  (Capabilities unchanged — SECBIT_NO_SETUID_FIXUP is working)

--- Forking child to show securebits inheritance ---
  [Child] prctl failed as expected: Operation not permitted
  [Child] Locked securebits cannot be changed.

🎯 Interview Questions

Q1. What is the securebits mechanism and why was it introduced?

Securebits is a set of per-process flags (available since Linux 2.6.26 on kernels with file capabilities enabled) that disable the kernel’s special treatment of UID 0. It allows applications to run in a pure capability-based environment where root’s UID provides no automatic privilege, so all capabilities must come from file capabilities. This is the ideal design for capability-based systems and enables true “capabilities-only” applications.

Q2. What is SECBIT_NOROOT and what specific behavior does it disable?

SECBIT_NOROOT disables the root-semantics preservation in exec(). Normally, when a root process execs a program (or execs a set-user-ID-root program), the kernel notionally sets the file capability sets to all ones, giving the process all capabilities. With SECBIT_NOROOT set, this no longer happens — root processes get no capabilities from exec unless the file has explicit file capabilities assigned to it.

Q3. Why do the securebits flags exist in pairs (base + locked)?

The locked flag provides irreversibility. Once SECBIT_NOROOT_LOCKED is set, even a process with CAP_SETPCAP cannot clear SECBIT_NOROOT. This is critical for security: it allows an application to establish a pure capability environment and guarantee that no descendant process can accidentally or maliciously re-enable root’s special treatment. Without the locked flags, a compromised child process could undo the security constraints set by the parent.

Q4. What are the two methods for discovering which capabilities a program requires?

Method 1: Run the program with strace and look for system calls failing with EPERM. Match the failing calls to their required capabilities via man pages. Limitation: EPERM can have other causes; some failures may be silently handled. Method 2: Use a kernel probe (SystemTap, bpftrace, eBPF) to monitor every capability check the kernel performs. This logs the capability checked, the kernel function, and the program name, giving an accurate record of every capability the program requested on the code paths exercised during the run.

Q5. Before file capabilities were added (pre-2.6.24), what was different about the CAP_SETPCAP capability?

Without file capabilities, CAP_SETPCAP theoretically allowed a process to grant or remove capabilities in other processes (not just itself). However, this was entirely theoretical — the system-wide capability bounding set always had CAP_SETPCAP masked out (the initial value -257 leaves bit 8 clear). Since a process cannot have a capability that is not in the bounding set, no process could ever actually have CAP_SETPCAP on old kernels without file capabilities.

Q6. What is the difference between SECBIT_KEEP_CAPS and SECBIT_NO_SETUID_FIXUP?

SECBIT_KEEP_CAPS prevents only one specific behavior: when all UIDs change from having-any-zero to all-nonzero, it prevents the permitted set from being cleared. The effective set is still cleared. SECBIT_NO_SETUID_FIXUP is a superset — it prevents ALL capability changes caused by UID transitions between 0 and nonzero, including the clearing of the effective set when EUID changes 0→nonzero, the copying of permitted to effective when EUID changes nonzero→0, and the file-related capability adjustments when fsuid changes. When both are set, NO_SETUID_FIXUP takes precedence.

Q7. Summarize the two main differences in capability behavior between Linux ≥ 2.6.24 and older kernels.

Difference 1 — File capabilities: On ≥ 2.6.24, capabilities can be attached to executable files via the security.capability xattr. On older kernels, this is not supported — capabilities can only be attached to processes, and all privilege elevation must go through UID 0.
Difference 2 — Bounding set scope: On ≥ 2.6.25, the bounding set is a per-process attribute that can be independently managed for each process. On older kernels, the bounding set is a system-wide attribute (accessible via /proc/sys/kernel/cap-bound) that applies to all processes, and only init can add bits to it.

39.11 — Chapter Summary

The Linux capabilities scheme replaces the coarse-grained all-or-nothing UNIX privilege model with around 40 distinct capability units. Key points to remember:

Each process has three capability sets: permitted (ceiling), effective (active), and inheritable (for exec propagation).
Each file can have three capability sets: permitted (forced grants), effective (a single bit, used for capability-dumb programs), and inheritable (a mask applied to the process inheritable set).
The exec() formula precisely defines how process and file sets combine to produce new capabilities.
The capability bounding set is a per-process security limit that can only shrink, never grow.
The kernel automatically adjusts capabilities during UID transitions to maintain backward compatibility.
The libcap API (cap_get_proc, cap_set_flag, cap_set_proc, cap_free) is the correct way to manipulate capabilities in C programs.
The securebits mechanism allows creating pure capability environments where root UID has no special meaning.
File capabilities were introduced in Linux 2.6.24. Before that, capabilities could only be used with set-user-ID-root programs using the libcap API to selectively raise and drop them.


embeddedpathashala.com
