Linux Capabilities
Chapter 39 - Part 4: Transformation During exec() & The Bounding Set

How Capabilities Change Across exec()

The most critical moment in the capability lifecycle is when a process calls exec(). At this point, the kernel must compute a completely new set of capabilities for the process that will run the new program. This computation uses a precise mathematical formula that combines the process’s current capabilities with the file’s capability sets.

Understanding this formula is essential for correctly setting up file capabilities and for predicting what privileges a program will have after being executed.

39.5 - The exec() Capability Transformation Rules

When exec() is called, the kernel calculates the process’s new capability sets using these three rules (where P = value before exec, P' = value after exec, F = file capability set, cap_bset = capability bounding set):

/* Rule 1: New Permitted set */
P'(permitted) = (P(inheritable) & F(inheritable)) | (F(permitted) & cap_bset)

/* Rule 2: New Effective set */
P'(effective) = F(effective) ? P'(permitted) : 0

/* Rule 3: New Inheritable set (unchanged) */
P'(inheritable) = P(inheritable)

Let’s break down each rule in plain language:

Rule 1 - New Permitted = (Process Inheritable AND File Inheritable) OR (File Permitted AND Bounding Set)

The new permitted set has two sources of capabilities:
Source A: Capabilities that are in both the process’s inheritable set AND the file’s inheritable set. This is how inheritable capabilities from the process flow through. The file acts as a filter: only capabilities that the file explicitly “allows” can flow from the process’s inheritable set.
Source B: Capabilities in the file’s permitted set, masked by the capability bounding set. The bounding set acts as an upper limit for the process: even if a file says “grant this capability,” a capability that is missing from the bounding set is not granted.

Rule 2 - New Effective Set depends on the File Effective Bit

If the file’s effective bit is 1: the new effective set equals the new permitted set (all permitted capabilities are immediately active).
If the file’s effective bit is 0: the new effective set is empty (the program starts with no active capabilities and must raise them explicitly using libcap).

Rule 3 - Inheritable Set is Preserved Unchanged

The process’s inheritable set passes through exec() unchanged. This makes sense: the inheritable set is the process’s mechanism for specifying which capabilities it wants to preserve across future exec() calls.
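The three rules can also be read as plain bitmask arithmetic. The sketch below is illustrative only: the type and function names (proc_caps, file_caps, exec_transform) are hypothetical, each set is modeled as a 64-bit mask in which bit n stands for capability number n, and it models only the three rules listed above rather than the kernel's actual code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model types: one bit per capability number. */
struct proc_caps {
    uint64_t permitted;
    uint64_t effective;
    uint64_t inheritable;
};

struct file_caps {
    uint64_t permitted;
    uint64_t inheritable;
    int effective_bit;          /* the single file effective bit: 0 or 1 */
};

/* Apply Rules 1-3 to compute the process capability sets after exec(). */
struct proc_caps exec_transform(struct proc_caps p, struct file_caps f,
                                uint64_t cap_bset)
{
    struct proc_caps after;

    /* Rule 1: two sources, ORed together. */
    after.permitted = (p.inheritable & f.inheritable) | (f.permitted & cap_bset);
    /* Rule 2: all of the new permitted set, or nothing. */
    after.effective = f.effective_bit ? after.permitted : 0;
    /* Rule 3: passes through unchanged. */
    after.inheritable = p.inheritable;
    return after;
}

/* Sanity check: a file with no capabilities grants nothing to a process
 * that has none, even with the effective bit set and a full bounding set. */
void exec_transform_self_check(void)
{
    struct proc_caps p = {0, 0, 0};
    struct file_caps f = {0, 0, 1};
    struct proc_caps after = exec_transform(p, f, ~UINT64_C(0));

    assert(after.permitted == 0 && after.effective == 0);
}
```

Note that the process's old permitted and effective sets do not appear on the right-hand side of any rule; only P(inheritable) carries over from the process.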

Worked Example

Scenario: An unprivileged process (no capabilities in any set) executes a file that has CAP_NET_BIND_SERVICE in its file permitted set and the file effective bit is set.

Given:
P(inheritable) = 0 (no inheritable caps)
F(inheritable) = 0 (no inheritable caps on file)
F(permitted) = CAP_NET_BIND_SERVICE
F(effective bit) = 1 (set)
cap_bset = all capabilities (default)
Result:
P'(permitted) = (0 & 0) | (CAP_NET_BIND_SERVICE & ALL) = CAP_NET_BIND_SERVICE
P'(effective) = 1 ? P'(permitted) : 0 = CAP_NET_BIND_SERVICE
P'(inheritable) = P(inheritable) = 0

The process can now bind to port 80, even though it was unprivileged before the exec(). This is exactly what sudo setcap "cap_net_bind_service=pe" myserver achieves.
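One way to confirm this result from inside the executed program is to look at the CapEff line of /proc/self/status, which reports the effective set as a hexadecimal mask. The following is a minimal sketch under stated assumptions: the helper names (mask_has_cap, read_status_mask) are made up for illustration, and the caller is assumed to pass 10 for CAP_NET_BIND_SERVICE, its number in <linux/capability.h>.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Return 1 if bit `cap` is set in a hexadecimal mask string such as
 * "0000000000000400", 0 if it is clear, -1 if the string is not hex. */
int mask_has_cap(const char *hex, int cap)
{
    char *end;
    unsigned long long mask;

    assert(cap >= 0 && cap < 64);
    mask = strtoull(hex, &end, 16);
    if (end == hex)
        return -1;
    return (int)((mask >> cap) & 1ULL);
}

/* Copy the value of a field such as "CapEff" from /proc/self/status into
 * buf. Returns 0 on success, -1 if the file or the field is unavailable. */
int read_status_mask(const char *field, char *buf, size_t buflen)
{
    char line[256], value[64];
    size_t flen = strlen(field);
    int found = -1;
    FILE *fp = fopen("/proc/self/status", "r");

    if (fp == NULL)
        return -1;
    while (fgets(line, sizeof(line), fp) != NULL) {
        /* Match "Field:" at the start of the line, then take one token. */
        if (strncmp(line, field, flen) == 0 && line[flen] == ':' &&
            sscanf(line + flen + 1, "%63s", value) == 1) {
            snprintf(buf, buflen, "%s", value);
            found = 0;
            break;
        }
    }
    fclose(fp);
    return found;
}
```

A program granted the file capability as above could call read_status_mask("CapEff", buf, sizeof(buf)) and then mask_has_cap(buf, 10); a return value of 1 would indicate that the capability is active.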

39.5.1 - The Capability Bounding Set

The capability bounding set (cap_bset) is a per-process security mechanism that acts as a hard upper limit on what capabilities a process can gain during exec(). It serves two purposes:

Purpose 1 - Limits file permitted capabilities: During exec(), the file’s permitted capability set is ANDed with the bounding set before being applied. If a capability is not in the bounding set, the file cannot grant it to the process, no matter what the file’s capability sets say.
Purpose 2 - Limits inheritable capabilities: A process can only add capabilities to its inheritable set if those capabilities are in the bounding set. This prevents a process from setting up its inheritable set to gain capabilities it could never otherwise acquire.

Key properties of the bounding set:

  • It is a per-thread attribute, visible as CapBnd in /proc/PID/status.
  • It is inherited by child processes created via fork().
  • It is preserved across exec().
  • On a kernel that supports file capabilities, init starts with a bounding set containing all capabilities. All other processes inherit this.
  • A process can drop capabilities from its bounding set using prctl(PR_CAPBSET_DROP, cap), but it can never add them back. Dropping is irreversible.
  • A process can check if a capability is in its bounding set using prctl(PR_CAPBSET_READ, cap).
Why the Bounding Set Matters:
Suppose you run a daemon that forks child processes. You want to ensure that even if a child process execs a file that has been tampered with to give it CAP_SYS_MODULE (kernel module loading), the child cannot gain that capability. Before forking, the parent drops CAP_SYS_MODULE from the bounding set. Now, no child or grandchild process can ever gain that capability from any file, no matter how the file’s capabilities are set.
/* Drop CAP_SYS_MODULE from the bounding set (requires CAP_SETPCAP) */
prctl(PR_CAPBSET_DROP, CAP_SYS_MODULE);

/* Check whether CAP_NET_RAW is in the bounding set */
int result = prctl(PR_CAPBSET_READ, CAP_NET_RAW);
/* result == 1 means yes, 0 means no, -1 means error */
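Since the scenario depends on children inheriting the parent's bounding set across fork(), it can be useful to check that directly. The sketch below is one way to do so; the function name child_inherits_bounding_bit is hypothetical, and the child reports its view through its exit status purely to keep the example short.

```c
#include <assert.h>
#include <sys/prctl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a child and compare its view of one bounding-set bit with the
 * parent's. Returns 1 if both see the same value, 0 if they differ,
 * -1 on error (invalid capability number, fork failure, and so on).
 */
int child_inherits_bounding_bit(int cap)
{
    int parent_view, status;
    pid_t pid;

    assert(cap >= 0);
    parent_view = prctl(PR_CAPBSET_READ, (unsigned long)cap, 0, 0, 0);
    if (parent_view == -1)
        return -1;

    pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        /* Child: report its own view (0 or 1) through the exit status;
         * 2 signals that the read itself failed. */
        int child_view = prctl(PR_CAPBSET_READ, (unsigned long)cap, 0, 0, 0);
        _exit(child_view == -1 ? 2 : child_view);
    }

    if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status) == parent_view;
}
```

Calling this once before and once after a PR_CAPBSET_DROP in the parent should return 1 both times, since the child sees whatever the parent had at the moment of the fork().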

39.5.2 - Preserving root Semantics During exec()

To maintain complete backward compatibility with traditional UNIX programs that expect root to have all privileges, the kernel applies special rules during exec() when the process or program has a root identity. In these cases, file capability sets are ignored, and the kernel instead notionally defines the file sets as follows:

Condition | Notional File Sets
Execing a set-user-ID-root program, OR process has real or effective UID = 0 | F(inheritable) = all 1s, F(permitted) = all 1s
Execing a set-user-ID-root program, OR process has effective UID = 0 | F(effective bit) = 1

With these notional definitions, the exec() formula for a typical root process (both real and effective UID are 0) simplifies to:

/* Simplified formula for root processes */
P'(permitted) = P(inheritable) | cap_bset
P'(effective) = P'(permitted)

In practice, since the bounding set contains all capabilities by default, this means a root process that execs any program gets all capabilities, which is exactly the traditional behavior.
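The substitution of notional file sets can be sketched in code as well. This is a simplified model built only from the two conditions in the table above, not the kernel's implementation; the names notional_file_caps, new_permitted, and ALL_CAPS are hypothetical, and "all ones" is modeled as a full 64-bit mask.

```c
#include <assert.h>
#include <stdint.h>

#define ALL_CAPS (~UINT64_C(0))     /* model of "all 1s" */

struct file_caps {
    uint64_t permitted;
    uint64_t inheritable;
    int effective_bit;
};

/*
 * Return the file sets that the exec() formula should use: the real ones,
 * or the notional values from the table when a root identity is involved.
 */
struct file_caps notional_file_caps(struct file_caps real, int setuid_root_prog,
                                    unsigned ruid, unsigned euid)
{
    struct file_caps f = real;

    assert(real.effective_bit == 0 || real.effective_bit == 1);
    if (setuid_root_prog || ruid == 0 || euid == 0) {
        f.permitted = ALL_CAPS;
        f.inheritable = ALL_CAPS;
    }
    if (setuid_root_prog || euid == 0)
        f.effective_bit = 1;
    return f;
}

/* Rule 1 applied to whichever file sets were chosen. */
uint64_t new_permitted(uint64_t p_inheritable, struct file_caps f,
                       uint64_t cap_bset)
{
    return (p_inheritable & f.inheritable) | (f.permitted & cap_bset);
}
```

With both UIDs equal to 0, new_permitted() reduces to P(inheritable) | cap_bset, matching the simplified formula. One consequence visible in the model: a capability that has been dropped from the bounding set but is still in the process's inheritable set survives into the new permitted set of a root process.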

Coding Example - Capability Bounding Set Operations with prctl()

This program demonstrates how to read the bounding set using prctl(PR_CAPBSET_READ) and how to drop capabilities from the bounding set using prctl(PR_CAPBSET_DROP). It shows the state before and after dropping, and explains why this is an irreversible security hardening operation.

/*
 * bounding_set_demo.c
 *
 * Demonstrates capability bounding set operations:
 *   1. Read which capabilities are in the bounding set
 *   2. Drop specific capabilities from the bounding set
 *   3. Verify the drop is permanent (cannot be restored)
 *
 * Must be run as root (or with CAP_SETPCAP) to drop from bounding set.
 *
 * Compile:
 *   gcc -o bounding_set_demo bounding_set_demo.c -lcap
 *
 * Run:
 *   sudo ./bounding_set_demo
 *
 * Inspect bounding set directly (capsh --decode takes the hex mask as-is):
 *   capsh --decode=$(grep CapBnd /proc/self/status | awk '{print $2}')
 */

#include <stdio.h>
#include <stdlib.h>
#include <sys/capability.h>
#include <sys/prctl.h>
#include <unistd.h>
#include <errno.h>
#include <string.h>

/*
 * is_in_bounding_set() - Check if a capability is currently in the bounding set.
 *
 * Uses prctl(PR_CAPBSET_READ, capability).
 * Returns:
 *   1  if capability IS in the bounding set
 *   0  if capability is NOT in the bounding set
 *  -1  on error (e.g., capability number is invalid)
 */
int is_in_bounding_set(cap_value_t cap)
{
    int ret = prctl(PR_CAPBSET_READ, (unsigned long)cap, 0, 0, 0);
    if (ret == -1 && errno != EINVAL)
        perror("prctl(PR_CAPBSET_READ)");
    return ret;
}

/*
 * drop_from_bounding_set() - Permanently remove a capability from bounding set.
 *
 * Uses prctl(PR_CAPBSET_DROP, capability).
 *
 * CRITICAL: This operation is IRREVERSIBLE for the calling process and all
 * descendants. Once dropped, the capability cannot be added back to the
 * bounding set. If exec() is called with a file that has this capability
 * in its file permitted set, the capability will be filtered out.
 *
 * Requires: CAP_SETPCAP in the process's effective set (usually means root).
 *
 * Returns 0 on success, -1 on failure.
 */
int drop_from_bounding_set(cap_value_t cap)
{
    if (prctl(PR_CAPBSET_DROP, (unsigned long)cap, 0, 0, 0) == -1) {
        perror("prctl(PR_CAPBSET_DROP)");
        return -1;
    }
    return 0;
}

/*
 * print_bounding_set() - Print status of selected capabilities in bounding set.
 */
void print_bounding_set(void)
{
    /*
     * We check a representative sample of capabilities.
     * In a real program you would check all capabilities from 0 to CAP_LAST_CAP.
     */
    struct { cap_value_t val; const char *name; } caps_to_check[] = {
        { CAP_SYS_MODULE,        "CAP_SYS_MODULE"        },
        { CAP_SYS_RAWIO,         "CAP_SYS_RAWIO"         },
        { CAP_SYS_TIME,          "CAP_SYS_TIME"          },
        { CAP_NET_RAW,           "CAP_NET_RAW"           },
        { CAP_NET_BIND_SERVICE,  "CAP_NET_BIND_SERVICE"  },
        { CAP_DAC_OVERRIDE,      "CAP_DAC_OVERRIDE"      },
        { CAP_KILL,              "CAP_KILL"              },
        { CAP_SYS_BOOT,          "CAP_SYS_BOOT"          },
    };
    int n = sizeof(caps_to_check) / sizeof(caps_to_check[0]);

    printf("  %-30s  %s\n", "Capability", "In Bounding Set?");
    printf("  %-30s  %s\n", "----------", "----------------");

    for (int i = 0; i < n; i++) {
        int result = is_in_bounding_set(caps_to_check[i].val);
        printf("  %-30s  %s\n",
               caps_to_check[i].name,
               (result == 1) ? "YES" : (result == 0) ? "NO" : "ERROR");
    }
}

int main(void)
{
    printf("=== Capability Bounding Set Demo ===\n");
    printf("PID=%d  UID=%d  EUID=%d\n\n",
           (int)getpid(), (int)getuid(), (int)geteuid());

    if (geteuid() != 0) {
        printf("WARNING: Not running as root. Cannot drop from bounding set.\n");
        printf("(PR_CAPBSET_READ still works for reading)\n\n");
    }

    /*
     * Step 1: Show the bounding set BEFORE any drops.
     * On a default system, all capabilities should be present.
     */
    printf("--- BEFORE dropping capabilities ---\n");
    print_bounding_set();

    if (geteuid() != 0) {
        printf("\nRun as root to see the drop demonstration.\n");
        return 0;
    }

    /*
     * Step 2: Drop dangerous capabilities from the bounding set.
     *
     * A real security-hardened daemon would drop all capabilities
     * that it will never need, before forking worker processes.
     * This ensures no descendant can ever gain these capabilities,
     * even by execing a file that has them set.
     *
     * We drop:
     *   CAP_SYS_MODULE: prevents loading kernel modules
     *   CAP_SYS_RAWIO:  prevents direct hardware I/O access
     *   CAP_SYS_BOOT:   prevents system reboot
     */
    printf("\n--- Dropping CAP_SYS_MODULE, CAP_SYS_RAWIO, CAP_SYS_BOOT ---\n");
    printf("(This is IRREVERSIBLE for this process and all descendants)\n\n");

    if (drop_from_bounding_set(CAP_SYS_MODULE) == 0)
        printf("  Dropped CAP_SYS_MODULE from bounding set.\n");

    if (drop_from_bounding_set(CAP_SYS_RAWIO) == 0)
        printf("  Dropped CAP_SYS_RAWIO from bounding set.\n");

    if (drop_from_bounding_set(CAP_SYS_BOOT) == 0)
        printf("  Dropped CAP_SYS_BOOT from bounding set.\n");

    /*
     * Step 3: Show the bounding set AFTER drops.
     * The three dropped capabilities should now show "NO".
     */
    printf("\n--- AFTER dropping capabilities ---\n");
    print_bounding_set();

    /*
     * Step 4: Note that a dropped capability cannot be restored.
     * There is NO prctl operation that adds to the bounding set.
     * Because the bounding set is inherited across fork() and preserved
     * across exec(), the capability is gone for this process and all
     * of its descendants. Only processes outside this subtree that
     * never dropped it still have it in their bounding sets.
     */
    printf("\n--- Restoring CAP_SYS_MODULE is not possible ---\n");
    printf("  There is no prctl() operation to add to the bounding set.\n");
    printf("  PR_CAPBSET_DROP exists, but there is no PR_CAPBSET_ADD.\n");
    printf("  The drop is permanent for this process and all its children.\n");

    printf("\n=== Done ===\n");
    return 0;
}
Expected output (as root):

--- BEFORE dropping capabilities ---
  CAP_SYS_MODULE                  YES
  CAP_SYS_RAWIO                   YES
  CAP_SYS_TIME                    YES
  CAP_NET_RAW                     YES
  ...

--- Dropping CAP_SYS_MODULE, CAP_SYS_RAWIO, CAP_SYS_BOOT ---
  Dropped CAP_SYS_MODULE from bounding set.
  Dropped CAP_SYS_RAWIO from bounding set.
  Dropped CAP_SYS_BOOT from bounding set.

--- AFTER dropping capabilities ---
  CAP_SYS_MODULE                  NO
  CAP_SYS_RAWIO                   NO
  CAP_SYS_TIME                    YES
  ...

Interview Questions
Q1. Write the complete exec() capability transformation formula and explain each term.

P'(permitted) = (P(inheritable) & F(inheritable)) | (F(permitted) & cap_bset)
P'(effective) = F(effective) ? P'(permitted) : 0
P'(inheritable) = P(inheritable)
P = process caps before exec, P’ = after, F = file caps, cap_bset = bounding set. The new permitted comes from two sources: (1) capabilities that are in both process and file inheritable sets, and (2) file permitted capabilities filtered by the bounding set. The new effective is either all of permitted (if file effective bit is 1) or empty (if 0). The inheritable set passes through unchanged.

Q2. What is the capability bounding set and what two roles does it play?

The bounding set is a per-process security limit on capabilities that can be gained during exec(). Role 1: it is ANDed with the file permitted capabilities, so a file cannot grant capabilities not in the bounding set. Role 2: it limits what can be added to the process’s inheritable set, preventing a process from bootstrapping capabilities it couldn’t otherwise obtain. It is inherited by children and preserved across exec(), but capabilities can only be dropped from it (irreversibly), never added.

Q3. What prctl() operations manage the capability bounding set?

prctl(PR_CAPBSET_READ, cap) returns 1 if the capability is in the bounding set, 0 if not, and -1 on error. prctl(PR_CAPBSET_DROP, cap) permanently removes a capability from the bounding set (requires CAP_SETPCAP). There is no PR_CAPBSET_ADD, so addition is impossible. This asymmetry is intentional: the bounding set can only shrink, never grow.

Q4. What special treatment does the kernel apply to root processes during exec(), and why?

To preserve backward compatibility, when a set-user-ID-root program is executed, or the calling process has a real or effective UID of 0, the kernel ignores the file’s actual capability sets and notionally treats F(inheritable) and F(permitted) as all ones. When a set-user-ID-root program is executed, or the process’s effective UID is 0, the file effective bit is treated as 1. With the default bounding set, a root process therefore gets all capabilities in its new permitted and effective sets, the same as traditional UNIX behavior where root bypasses all checks. Without this, existing root-based programs would break.

Q5. A non-root process has CAP_NET_BIND_SERVICE in its inheritable set. A file has CAP_NET_BIND_SERVICE in its file inheritable set. After exec(), will the process have this capability in its new permitted set?

Yes. Per Rule 1: P'(permitted) = (P(inheritable) & F(inheritable)) | ... Since the capability is in both the process’s and the file’s inheritable sets, the AND keeps that bit, so the capability ends up in the new permitted set. The bounding set masks only the F(permitted) term of Rule 1, not this inheritable term; it matters earlier, when the process adds the capability to its inheritable set. Whether the capability is also effective after exec() depends on the file effective bit (Rule 2). This is exactly how the inheritable mechanism is supposed to work: the process signals “I want this capability preserved,” and the file confirms “I allow this.”

Q6. Why does exec() not simply preserve the process’s permitted capabilities unchanged?

Two reasons: (1) The exec might itself require certain privileges (like CAP_DAC_OVERRIDE) that we don’t want to carry forward. If permitted caps were preserved automatically, there would be no way to prevent this. (2) If you pre-emptively dropped some permitted caps to avoid preserving them, and then exec() failed, the program would have permanently lost capabilities it needed, because dropping from permitted is irreversible. The inheritable set provides a controlled, deliberate mechanism for selectively preserving capabilities.
