System calls to modify and retrieve scheduling policies and priorities

 

Realtime Scheduling API: Set & Get
Chapter 35.3 — System calls to modify and retrieve scheduling policies and priorities, privilege rules, RLIMIT_RTPRIO, and SCHED_RESET_ON_FORK

sched_setscheduler(): Set Policy and Priority

sched_setscheduler() is the primary system call for changing both the scheduling policy and the priority of a process in one call.


#include <sched.h>

/* Changes scheduling policy AND priority of process 'pid'.
 * If pid = 0, operates on the calling process.
 * Returns 0 on success, -1 on error.
 * Note: Linux returns 0 on success (not the previous policy as SUSv3 requires).
 */
int sched_setscheduler(pid_t pid, int policy, const struct sched_param *param);

/* The sched_param structure (declared in <sched.h>; shown here for reference) */
struct sched_param {
    int sched_priority;  /* Scheduling priority */
                         /* For RT policies: 1 to 99 */
                         /* For SCHED_OTHER, BATCH, IDLE: must be 0 */
};
    

The policy Argument

The policy argument specifies which scheduling policy to apply:

Constant      Description                         POSIX?               sched_priority
SCHED_FIFO    Realtime FIFO                       Yes                  1–99
SCHED_RR      Realtime round-robin                Yes                  1–99
SCHED_OTHER   Standard round-robin time-sharing   Yes                  0
SCHED_BATCH   Batch (CPU-bound)                   No (Linux 2.6.16+)   0
SCHED_IDLE    Idle (lowest priority)              No (Linux 2.6.23+)   0
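⚠️ Queue Position After Set: POSIX specifies that a successful call to sched_setscheduler() moves the process to the back of the queue for its priority level, even if neither the policy nor the priority actually changes, so merely calling the function can cost the process its place in line. Be aware that the sched(7) manual page describes different behavior on current Linux kernels for a running or runnable SCHED_FIFO thread whose priority is changed: raising the priority places it at the back of the queue for the new priority, leaving the priority unchanged keeps its position, and lowering the priority places it at the front of the queue for the new priority.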
📌 Inheritance: The scheduling policy and priority are inherited by children created via fork() and preserved across exec(). So a SCHED_FIFO process that calls exec() still runs as SCHED_FIFO after the new program starts. The SCHED_RESET_ON_FORK flag (see below) affects only fork(): it stops children from inheriting a realtime policy, but it does not change what happens on exec().

sched_setparam(): Change Priority Only

sched_setparam() changes only the priority of a process, leaving the scheduling policy unchanged. It is a subset of sched_setscheduler().


#include <sched.h>

/* Changes only the scheduling priority, not the policy.
 * Returns 0 on success, -1 on error.
 * Like sched_setscheduler(), this moves the process to the BACK
 * of its priority queue.
 */
int sched_setparam(pid_t pid, const struct sched_param *param);
    

Use this when you want to dynamically adjust priority during execution without changing the policy.

sched_getscheduler() and sched_getparam(): Read Policy and Priority


#include <sched.h>

/* Returns the scheduling policy of process 'pid'.
 * If pid = 0, returns the policy of the calling process.
 * Returns the policy constant on success, -1 on error.
 */
int sched_getscheduler(pid_t pid);

/* Retrieves the scheduling priority of process 'pid' into
 * the sched_param structure pointed to by 'param'.
 * Returns 0 on success, -1 on error.
 */
int sched_getparam(pid_t pid, struct sched_param *param);
    
✅ No privileges required to read: Both sched_getscheduler() and sched_getparam() can be called by an unprivileged process to read the scheduling information of any process on the system, regardless of credentials. This is unlike sched_setscheduler() which requires privileges for realtime policies.

Privilege Rules for Changing Scheduling

Changing realtime scheduling is a privileged operation. Here are the rules, organized by kernel version:

Before Kernel 2.6.12

  • Generally, a process must be privileged (CAP_SYS_NICE) to change scheduling policies and priorities.
  • Exception: an unprivileged process can switch another process to SCHED_OTHER if its effective UID matches the real or effective UID of the target.
  • An unprivileged process can only lower its own realtime priority (not raise it).

Kernel 2.6.12 and Later: RLIMIT_RTPRIO

Since kernel 2.6.12, the nonstandard RLIMIT_RTPRIO resource limit controls what unprivileged processes can do (a process with CAP_SYS_NICE can still make arbitrary changes):

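  • If the RLIMIT_RTPRIO soft limit is nonzero: the process can make arbitrary changes to its scheduling policy and priority, except that the highest realtime priority it may set is the larger of its current realtime priority (if it already runs under a realtime policy) and its RLIMIT_RTPRIO soft limit.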
  • If RLIMIT_RTPRIO soft limit is 0: the process can only lower its realtime priority or switch from realtime to non-realtime.
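  • SCHED_IDLE is special: a SCHED_IDLE process cannot change its own policy regardless of RLIMIT_RTPRIO. (The sched(7) manual page notes that since Linux 2.6.39 an unprivileged SCHED_IDLE thread may switch to SCHED_OTHER or SCHED_BATCH if its nice value lies within the range permitted by its RLIMIT_NICE limit.)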
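  • Changes to another process: allowed if the caller’s effective UID matches the real or effective UID of the target. The target’s own RLIMIT_RTPRIO soft limit governs which changes are permitted; a nonzero limit on the caller does not by itself let it change other processes.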

Permission Decision Flow

if (caller has CAP_SYS_NICE)
→ Can make ANY scheduling change
else if (target ≠ caller AND caller's effective UID matches neither the target's real nor effective UID)
→ DENIED
else if (target is SCHED_IDLE)
→ DENIED (cannot change its policy)
else if (target's RLIMIT_RTPRIO soft limit > 0)
→ Can change policy; RT priority ≤ max(target's current RT priority, rlimit)
else  /* target's RLIMIT_RTPRIO soft limit == 0 */
→ Can only LOWER RT priority or switch to non-RT

SCHED_RESET_ON_FORK (Linux 2.6.32+)

SCHED_RESET_ON_FORK is a special flag value (not a separate policy). It is ORed with a policy constant when calling sched_setscheduler(). When this flag is set on a process, its children will not inherit the privileged scheduling policy.


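/* Setting the SCHED_RESET_ON_FORK flag. glibc's <sched.h> exposes the
 * constant only when _GNU_SOURCE is defined before the include. */
struct sched_param sp;
sp.sched_priority = 25;

/* ORed with the policy */
if (sched_setscheduler(0, SCHED_FIFO | SCHED_RESET_ON_FORK, &sp) == -1)
    perror("sched_setscheduler");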

/* Effect: This process runs as SCHED_FIFO priority 25.
 * Any child it creates via fork() will automatically be
 * reset to SCHED_OTHER (priority 0) — it does NOT inherit FIFO.
 */
    

What Gets Reset in Children

  • If the parent has a realtime policy (SCHED_RR or SCHED_FIFO): the child’s policy is reset to SCHED_OTHER.
  • If the parent has a negative nice value (higher priority): the child’s nice value is reset to 0.

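Why is this useful? The flag is intended for applications, such as media players, that are granted a realtime policy but should not be able to pass it on. RLIMIT_RTTIME caps how much CPU time a single realtime process may consume without making a blocking system call. Without SCHED_RESET_ON_FORK, such a process could fork() many children, each inheriting the realtime policy, and so evade that per-process ceiling (in effect, a realtime fork bomb).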

🔒 Security note: Once SCHED_RESET_ON_FORK is set, only a privileged process (CAP_SYS_NICE) can clear it. A newly created child always starts with the flag disabled: the flag itself is not inherited, and the reset is applied once, at fork time.

💻 Code Example 1: Full Set and Get Scheduler


/* sched_set_get.c
 * Demonstrates setting and getting scheduling policies/priorities
 * for processes specified by PID on the command line.
 *
 * Compile: gcc sched_set_get.c -o sched_set_get
 * Run:     sudo ./sched_set_get f 25 $(pgrep sleep)
 *          (Sets a sleep process to SCHED_FIFO priority 25)
 */
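#define _GNU_SOURCE  /* glibc exposes SCHED_BATCH, SCHED_IDLE and
                        SCHED_RESET_ON_FORK only with this defined */
#include <stdio.h>
#include <stdlib.h>
#include <sched.h>
#include <string.h>

const char *policy_str(int p)
{
#ifdef SCHED_RESET_ON_FORK
    p &= ~SCHED_RESET_ON_FORK;   /* sched_getscheduler() may OR in this flag */
#endif
    switch (p) {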
    case SCHED_OTHER: return "SCHED_OTHER";
    case SCHED_FIFO:  return "SCHED_FIFO";
    case SCHED_RR:    return "SCHED_RR";
#ifdef SCHED_BATCH
    case SCHED_BATCH: return "SCHED_BATCH";
#endif
#ifdef SCHED_IDLE
    case SCHED_IDLE:  return "SCHED_IDLE";
#endif
    default:          return "UNKNOWN";
    }
}

int main(int argc, char *argv[])
{
    int j, pol;
    struct sched_param sp;

    /* An empty string would match strchr()'s terminating '\0', so reject it */
    if (argc < 4 || argv[1][0] == '\0' || strchr("rfobi", argv[1][0]) == NULL) {
        fprintf(stderr,
            "Usage: %s policy priority pid...\n"
            "  policy: r=RR, f=FIFO, o=OTHER, b=BATCH, i=IDLE\n",
            argv[0]);
        exit(EXIT_FAILURE);
    }

    /* Parse policy letter */
    switch (argv[1][0]) {
    case 'r': pol = SCHED_RR;    break;
    case 'f': pol = SCHED_FIFO;  break;
    case 'o': pol = SCHED_OTHER; break;
#ifdef SCHED_BATCH
    case 'b': pol = SCHED_BATCH; break;
#endif
#ifdef SCHED_IDLE
    case 'i': pol = SCHED_IDLE;  break;
#endif
    default:  pol = SCHED_OTHER; break;
    }

    sp.sched_priority = atoi(argv[2]);

    /* Apply to each PID on command line */
    for (j = 3; j < argc; j++) {
        pid_t pid = (pid_t) atoi(argv[j]);

        /* Show current settings before change */
        int cur_pol = sched_getscheduler(pid);
        struct sched_param cur_sp;
        if (cur_pol == -1 || sched_getparam(pid, &cur_sp) == -1) {
            perror("sched_getscheduler/sched_getparam");
            continue;
        }
        printf("PID %-6d before: %-12s priority=%d\n",
               (int)pid, policy_str(cur_pol), cur_sp.sched_priority);

        /* Apply new policy and priority */
        if (sched_setscheduler(pid, pol, &sp) == -1) {
            perror("sched_setscheduler");
            continue;
        }

        /* Verify */
        cur_pol = sched_getscheduler(pid);
        if (cur_pol == -1 || sched_getparam(pid, &cur_sp) == -1) {
            perror("sched_getscheduler/sched_getparam");
            continue;
        }
        printf("PID %-6d after:  %-12s priority=%d\n\n",
               (int)pid, policy_str(cur_pol), cur_sp.sched_priority);
    }

    return EXIT_SUCCESS;
}
    

💻 Code Example 2: SCHED_RESET_ON_FORK Demo


/* reset_on_fork.c
 * Demonstrates SCHED_RESET_ON_FORK:
 * Parent runs as SCHED_FIFO. Child is automatically reset to SCHED_OTHER.
 *
 * Compile: gcc reset_on_fork.c -o reset_on_fork
 * Run:     sudo ./reset_on_fork
 */
#define _GNU_SOURCE  /* exposes SCHED_RESET_ON_FORK in glibc's <sched.h> */
#include <stdio.h>
#include <stdlib.h>
#include <sched.h>
#include <unistd.h>
#include <sys/wait.h>

#ifndef SCHED_RESET_ON_FORK
#define SCHED_RESET_ON_FORK 0x40000000
#endif

const char *policy_name(int p) {
    if (p == SCHED_OTHER) return "SCHED_OTHER";
    if (p == SCHED_FIFO)  return "SCHED_FIFO";
    if (p == SCHED_RR)    return "SCHED_RR";
    return "UNKNOWN";
}

int main(void)
{
    struct sched_param sp;
    int policy;
    pid_t child;

    /* Set this process to SCHED_FIFO | SCHED_RESET_ON_FORK */
    sp.sched_priority = sched_get_priority_min(SCHED_FIFO) + 5;

    if (sched_setscheduler(0, SCHED_FIFO | SCHED_RESET_ON_FORK, &sp) == -1) {
        perror("sched_setscheduler (need root)");
        exit(EXIT_FAILURE);
    }

    policy = sched_getscheduler(0);
    sched_getparam(0, &sp);
    printf("[Parent PID=%d] Policy=%s, Priority=%d\n",
           (int)getpid(), policy_name(policy & ~SCHED_RESET_ON_FORK),
           sp.sched_priority);
    printf("[Parent] SCHED_RESET_ON_FORK is %s\n\n",
           (policy & SCHED_RESET_ON_FORK) ? "SET" : "NOT set");

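    /* Flush stdio first: otherwise, if stdout is redirected to a file or
     * pipe, the parent's buffered output is copied into the child and
     * printed twice when the child calls exit(). */
    fflush(stdout);

    /* Fork: child will be automatically reset to SCHED_OTHER */
    child = fork();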
    if (child == -1) {
        perror("fork");
        exit(EXIT_FAILURE);
    }

    if (child == 0) {
        /* CHILD PROCESS */
        policy = sched_getscheduler(0);
        sched_getparam(0, &sp);
        printf("[Child  PID=%d] Policy=%s, Priority=%d\n",
               (int)getpid(), policy_name(policy), sp.sched_priority);
        printf("[Child] Automatically reset — child is NOT SCHED_FIFO!\n");
        exit(EXIT_SUCCESS);
    } else {
        wait(NULL);
        printf("\n[Parent] Child has exited. Resetting parent to SCHED_OTHER.\n");
        sp.sched_priority = 0;
        if (sched_setscheduler(0, SCHED_OTHER, &sp) == -1)
            perror("sched_setscheduler (reset parent)");
    }

    return EXIT_SUCCESS;
}
    
Expected output (PIDs vary):
[Parent PID=…] Policy=SCHED_FIFO, Priority=6
[Parent] SCHED_RESET_ON_FORK is SET

[Child  PID=…] Policy=SCHED_OTHER, Priority=0
[Child] Automatically reset — child is NOT SCHED_FIFO!

[Parent] Child has exited. Resetting parent to SCHED_OTHER.

🎯 Interview Questions

Q1. What is the difference between sched_setscheduler() and sched_setparam()?
sched_setscheduler() changes both the policy and the priority together. sched_setparam() changes only the priority, leaving the current policy unchanged. Use sched_setparam() when you just want to adjust priority dynamically without changing the type of scheduling.
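Q2. What effect does a successful sched_setscheduler() call have on the process’s queue position?
POSIX specifies that a successful call moves the process to the back of the queue for its priority level, even if the policy and priority are unchanged. Note, however, that the sched(7) manual page describes different behavior on current Linux kernels for a runnable SCHED_FIFO thread: it moves to the back of the queue for its new priority only if the priority is raised, keeps its position if the priority is unchanged, and moves to the front of the queue for its new priority if the priority is lowered.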
Q3. What is RLIMIT_RTPRIO and what does it enable?
RLIMIT_RTPRIO (added in kernel 2.6.12) is a per-process resource limit that allows unprivileged processes to use realtime scheduling. If the soft limit is nonzero, the process can freely change its realtime policy and priority, with the constraint that the maximum priority cannot exceed the greater of its current realtime priority and the RLIMIT_RTPRIO soft limit value.
Q4. What is SCHED_RESET_ON_FORK and why is it a security feature?
SCHED_RESET_ON_FORK is a flag ORed with a scheduling policy. When set, children created by fork() do not inherit the realtime policy (they get SCHED_OTHER) and don’t inherit negative nice values (they get 0).

Security value: It prevents fork bomb attacks where a realtime process creates many children each inheriting the realtime policy, overwhelming the system with high-priority processes that bypass RLIMIT_RTTIME limits.

Q5. Linux’s sched_setscheduler() return value deviates from SUSv3 on success. What is the difference?
SUSv3 specifies that a successful call should return the previous scheduling policy. However, Linux returns 0 on success (like most other system calls). A portable application should check for success by verifying the return value is not -1, rather than using it as the previous policy value.
Q6. Can an unprivileged process read the scheduling policy of another process?
Yes. sched_getscheduler() and sched_getparam() can be called by any unprivileged process to read the scheduling information of any process on the system. No credential checking is performed for read operations. Only write operations (setting policy/priority) require privileges.
