The most critical moment in the capability lifecycle is when a process calls exec(). At this point, the kernel must compute a completely new set of capabilities for the process that will run the new program. This computation uses a precise mathematical formula that combines the process’s current capabilities with the file’s capability sets.
Understanding this formula is essential for correctly setting up file capabilities and for predicting what privileges a program will have after being executed.
When exec() is called, the kernel calculates the process’s new capability sets using these three rules (where P = value before exec, P' = value after exec, F = file capability set, cap_bset = capability bounding set):
/* Rule 1: New Permitted set */
P'(permitted) = (P(inheritable) & F(inheritable)) | (F(permitted) & cap_bset)
/* Rule 2: New Effective set */
P'(effective) = F(effective) ? P'(permitted) : 0
/* Rule 3: New Inheritable set (unchanged) */
P'(inheritable) = P(inheritable)
Let’s break down each rule in plain language:
The new permitted set has two sources of capabilities:
Source A: Capabilities that are in both the process’s inheritable set AND the file’s inheritable set. This is how inheritable capabilities from the process flow through. The file acts as a filter: only capabilities that the file explicitly “allows” can flow from the process’s inheritable set.
Source B: Capabilities in the file’s permitted set, masked by the capability bounding set. The bounding set acts as a hard upper limit for the process: even if a file says “grant this capability,” the bounding set can override it.
If the file’s effective bit is 1: the new effective set equals the new permitted set (all permitted capabilities are immediately active).
If the file’s effective bit is 0: the new effective set is empty (the program starts with no active capabilities and must raise them explicitly using libcap).
The process’s inheritable set passes through exec() unchanged. This makes sense: the inheritable set is the process’s mechanism for specifying which capabilities it wants to preserve across future exec() calls.
Scenario: An unprivileged process (no capabilities in any set) executes a file that has CAP_NET_BIND_SERVICE in its file permitted set and the file effective bit is set.
| Given: | |
| P(inheritable) | = 0 (no inheritable caps) |
| F(inheritable) | = 0 (no inheritable caps on file) |
| F(permitted) | = CAP_NET_BIND_SERVICE |
| F(effective bit) | = 1 (set) |
| cap_bset | = all capabilities (default) |
| Result: | |
| P'(permitted) | = (0 & 0) | (CAP_NET_BIND_SERVICE & ALL) = CAP_NET_BIND_SERVICE |
| P'(effective) | = 1 ? P'(permitted) : 0 = CAP_NET_BIND_SERVICE |
| P'(inheritable) | = P(inheritable) = 0 |
The process can now bind to port 80, even though it was unprivileged before the exec(). This is exactly what sudo setcap "cap_net_bind_service=pe" myserver achieves.
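The capability bounding set (cap_bset) is a per-process security mechanism that acts as a hard upper limit on what capabilities a process can gain during exec(). It serves two purposes:
- It is ANDed with the file permitted set during exec(), so a file cannot grant capabilities that are not in the bounding set.
- It limits which capabilities can be added to the process’s inheritable set, which prevents a process from bootstrapping capabilities it could not otherwise obtain.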
Key properties of the bounding set:
- It is a per-thread (per-process) attribute, visible as CapBnd in /proc/PID/status.
- It is inherited by child processes created via fork().
- It is preserved across exec().
- On a kernel that supports file capabilities, init starts with a bounding set containing all capabilities. All other processes inherit this.
- A process can drop capabilities from its bounding set using prctl(PR_CAPBSET_DROP, cap), but it can never add them back. Dropping is irreversible.
- A process can check if a capability is in its bounding set using prctl(PR_CAPBSET_READ, cap).
Suppose you run a daemon that forks child processes. You want to ensure that even if a child process execs a file that has been tampered with to give it CAP_SYS_MODULE (kernel module loading), the child cannot gain that capability. Before forking, the parent drops CAP_SYS_MODULE from the bounding set. Now, no child or grandchild process can ever gain that capability from any file, no matter how the file’s capabilities are set.
prctl(PR_CAPBSET_DROP, CAP_SYS_MODULE);
/* Check if CAP_NET_RAW is in bounding set */
int result = prctl(PR_CAPBSET_READ, CAP_NET_RAW);
/* result == 1 means yes, 0 means no, -1 means error */
To maintain complete backward compatibility with traditional UNIX programs that expect root to have all privileges, the kernel applies special rules during exec() when the process or program has a root identity. In these cases, file capability sets are ignored, and the kernel instead notionally defines the file sets as follows:
| Condition | Notional File Sets |
|---|---|
| Execing a set-user-ID-root program, OR process has real/effective UID = 0 | F(inheritable) = ALL 1s, F(permitted) = ALL 1s |
| Execing a set-user-ID-root program, OR process has effective UID = 0 | F(effective bit) = 1 |
With these notional definitions, the exec() formula for a typical root process (both real and effective UID are 0) simplifies to:
P'(permitted) = P(inheritable) | cap_bset
P'(effective) = P'(permitted)
In practice, since the bounding set contains all capabilities by default, this means a root process that execs any program gets all capabilities, which is exactly the traditional behavior.
This program demonstrates how to read the bounding set using prctl(PR_CAPBSET_READ) and how to drop capabilities from the bounding set using prctl(PR_CAPBSET_DROP). It shows the state before and after dropping, and explains why this is an irreversible security hardening operation.
/*
 * bounding_set_demo.c
 *
 * Demonstrates capability bounding set operations:
 *   1. Read which capabilities are in the bounding set
 *   2. Drop specific capabilities from the bounding set
 *   3. Verify the drop is permanent (cannot be restored)
 *
 * Must be run as root (or with CAP_SETPCAP) to drop from bounding set.
 *
 * Compile:
 *   gcc -o bounding_set_demo bounding_set_demo.c -lcap
 *
 * Run:
 *   sudo ./bounding_set_demo
 *
 * Inspect bounding set directly (CapBnd is a hex mask, which is the
 * form capsh --decode expects):
 *   capsh --decode=$(grep CapBnd /proc/self/status | awk '{print $2}')
 */
#include <stdio.h>
#include <stdlib.h>
#include <sys/capability.h>
#include <sys/prctl.h>
#include <unistd.h>
#include <errno.h>
#include <string.h>
/*
 * is_in_bounding_set() - Check if a capability is currently in the bounding set.
 *
 * Uses prctl(PR_CAPBSET_READ, capability).
 * Returns:
 *    1 if capability IS in the bounding set
 *    0 if capability is NOT in the bounding set
 *   -1 on error (e.g., capability number is invalid)
 */
int is_in_bounding_set(cap_value_t cap)
{
    int ret = prctl(PR_CAPBSET_READ, (unsigned long)cap, 0, 0, 0);
    if (ret == -1 && errno != EINVAL)
        perror("prctl(PR_CAPBSET_READ)");
    return ret;
}
/*
 * drop_from_bounding_set() - Permanently remove a capability from bounding set.
 *
 * Uses prctl(PR_CAPBSET_DROP, capability).
 *
 * CRITICAL: This operation is IRREVERSIBLE for the calling process and all
 * descendants. Once dropped, the capability cannot be added back to the
 * bounding set. If exec() is called with a file that has this capability
 * in its file permitted set, the capability will be filtered out.
 *
 * Requires: CAP_SETPCAP in the process's effective set (usually means root).
 *
 * Returns 0 on success, -1 on failure.
 */
int drop_from_bounding_set(cap_value_t cap)
{
    if (prctl(PR_CAPBSET_DROP, (unsigned long)cap, 0, 0, 0) == -1) {
        perror("prctl(PR_CAPBSET_DROP)");
        return -1;
    }
    return 0;
}
/*
 * print_bounding_set() - Print status of selected capabilities in bounding set.
 */
void print_bounding_set(void)
{
    /*
     * We check a representative sample of capabilities.
     * In a real program you would check all capabilities from 0 to CAP_LAST_CAP.
     */
    struct { cap_value_t val; const char *name; } caps_to_check[] = {
        { CAP_SYS_MODULE,       "CAP_SYS_MODULE" },
        { CAP_SYS_RAWIO,        "CAP_SYS_RAWIO" },
        { CAP_SYS_TIME,         "CAP_SYS_TIME" },
        { CAP_NET_RAW,          "CAP_NET_RAW" },
        { CAP_NET_BIND_SERVICE, "CAP_NET_BIND_SERVICE" },
        { CAP_DAC_OVERRIDE,     "CAP_DAC_OVERRIDE" },
        { CAP_KILL,             "CAP_KILL" },
        { CAP_SYS_BOOT,         "CAP_SYS_BOOT" },
    };
    int n = sizeof(caps_to_check) / sizeof(caps_to_check[0]);

    printf(" %-30s %s\n", "Capability", "In Bounding Set?");
    printf(" %-30s %s\n", "----------", "----------------");
    for (int i = 0; i < n; i++) {
        int result = is_in_bounding_set(caps_to_check[i].val);
        printf(" %-30s %s\n",
               caps_to_check[i].name,
               (result == 1) ? "YES" : (result == 0) ? "NO" : "ERROR");
    }
}
int main(void)
{
    printf("=== Capability Bounding Set Demo ===\n");
    printf("PID=%d UID=%d EUID=%d\n\n",
           (int)getpid(), (int)getuid(), (int)geteuid());

    if (geteuid() != 0) {
        printf("WARNING: Not running as root. Cannot drop from bounding set.\n");
        printf("(PR_CAPBSET_READ still works for reading)\n\n");
    }

    /*
     * Step 1: Show the bounding set BEFORE any drops.
     * On a default system, all capabilities should be present.
     */
    printf("--- BEFORE dropping capabilities ---\n");
    print_bounding_set();

    if (geteuid() != 0) {
        printf("\nRun as root to see the drop demonstration.\n");
        return 0;
    }

    /*
     * Step 2: Drop dangerous capabilities from the bounding set.
     *
     * A real security-hardened daemon would drop all capabilities
     * that it will never need, before forking worker processes.
     * This ensures no descendant can ever gain these capabilities,
     * even by execing a file that has them set.
     *
     * We drop:
     *   CAP_SYS_MODULE: prevents loading kernel modules
     *   CAP_SYS_RAWIO:  prevents direct hardware I/O access
     *   CAP_SYS_BOOT:   prevents system reboot
     */
    printf("\n--- Dropping CAP_SYS_MODULE, CAP_SYS_RAWIO, CAP_SYS_BOOT ---\n");
    printf("(This is IRREVERSIBLE for this process and all descendants)\n\n");
    if (drop_from_bounding_set(CAP_SYS_MODULE) == 0)
        printf(" Dropped CAP_SYS_MODULE from bounding set.\n");
    if (drop_from_bounding_set(CAP_SYS_RAWIO) == 0)
        printf(" Dropped CAP_SYS_RAWIO from bounding set.\n");
    if (drop_from_bounding_set(CAP_SYS_BOOT) == 0)
        printf(" Dropped CAP_SYS_BOOT from bounding set.\n");

    /*
     * Step 3: Show the bounding set AFTER drops.
     * The three dropped capabilities should now show "NO".
     */
    printf("\n--- AFTER dropping capabilities ---\n");
    print_bounding_set();

    /*
     * Step 4: Try to add a capability back to the bounding set.
     * There is NO prctl operation for this; it is impossible.
     * Once dropped, it's gone from this process and all descendants.
     *
     * Only processes that are not descendants of this one keep the
     * capability in their own bounding sets.
     */
    printf("\n--- Attempting to restore CAP_SYS_MODULE (should fail) ---\n");
    printf(" There is no prctl() operation to add to the bounding set.\n");
    printf(" PR_CAPBSET_DROP exists, but there is no PR_CAPBSET_ADD.\n");
    printf(" The drop is permanent for this process and all its children.\n");

    printf("\n=== Done ===\n");
    return 0;
}
--- BEFORE dropping capabilities ---
 CAP_SYS_MODULE                 YES
 CAP_SYS_RAWIO                  YES
 CAP_SYS_TIME                   YES
 CAP_NET_RAW                    YES
 ...
--- Dropping CAP_SYS_MODULE, CAP_SYS_RAWIO, CAP_SYS_BOOT ---
 Dropped CAP_SYS_MODULE from bounding set.
 Dropped CAP_SYS_RAWIO from bounding set.
 Dropped CAP_SYS_BOOT from bounding set.
--- AFTER dropping capabilities ---
 CAP_SYS_MODULE                 NO
 CAP_SYS_RAWIO                  NO
 CAP_SYS_TIME                   YES
 ...
P'(permitted) = (P(inheritable) & F(inheritable)) | (F(permitted) & cap_bset)
P'(effective) = F(effective) ? P'(permitted) : 0
P'(inheritable) = P(inheritable)
P = process caps before exec, P’ = after, F = file caps, cap_bset = bounding set. The new permitted comes from two sources: (1) capabilities that are in both process and file inheritable sets, and (2) file permitted capabilities filtered by the bounding set. The new effective is either all of permitted (if file effective bit is 1) or empty (if 0). The inheritable set passes through unchanged.
The bounding set is a per-process security limit on capabilities that can be gained during exec(). Role 1: it is ANDed with the file permitted capabilities, so a file cannot grant capabilities not in the bounding set. Role 2: it limits what can be added to the process’s inheritable set, preventing a process from bootstrapping capabilities it couldn’t otherwise obtain. It is inherited by children and preserved across exec(), but capabilities can only be dropped from it (irreversibly), never added.
prctl(PR_CAPBSET_READ, cap) returns 1 if the capability is in the bounding set, 0 if not. prctl(PR_CAPBSET_DROP, cap) permanently removes a capability from the bounding set (requires CAP_SETPCAP). There is no PR_CAPBSET_ADD; addition is impossible. This asymmetry is intentional: the bounding set can only shrink, never grow.
To preserve backward compatibility, when a process with UID 0 execs a program (or execs a set-user-ID-root program), the kernel ignores the file’s actual capability sets and notionally treats F(inheritable) and F(permitted) as all ones, and F(effective) as 1. This means root processes get all capabilities in their new effective and permitted sets, the same as traditional UNIX behavior where root bypasses all checks. Without this, existing root-based programs would break.
Yes, assuming CAP_NET_BIND_SERVICE is in the bounding set. Per Rule 1: P'(permitted) = (P(inheritable) & F(inheritable)) | ... Since the capability is in both the process’s and file’s inheritable sets, the AND produces a non-zero result, so the capability ends up in the new permitted set. This is exactly how the inheritable mechanism is supposed to work: the process signals “I want this capability preserved,” and the file confirms “I allow this.”
Two reasons: (1) The exec might itself require certain privileges (like CAP_DAC_OVERRIDE) that we don’t want to carry forward. If permitted caps were preserved automatically, there would be no way to prevent this. (2) If you pre-emptively dropped some permitted caps to avoid preserving them, and then exec() failed, the program would have permanently lost capabilities it needed, because dropping from permitted is irreversible. The inheritable set provides a controlled, deliberate mechanism for selectively preserving capabilities.
