UNIX fork+exec vs spawn()

Why Linux separates creation from execution

Two Different Design Philosophies

When you need to run a new program, there are two broad approaches. Some systems provide a single spawn-style call that creates a new process and loads a new program in one shot (CreateProcess() on Windows and the spawn() family on some embedded platforms work this way). UNIX/Linux deliberately separates these into two steps: fork(), then execve(). Understanding why reveals a lot about the UNIX design philosophy.

Keywords:

spawn(), posix_spawn(), fork + exec, MMU, embedded systems, pipe(), dup2(), between fork and exec

⚖ Side-by-Side Comparison

UNIX Way: fork() + exec()
✓ Two separate steps
✓ fork() takes no arguments (simple API)
✓ Can run arbitrary code between fork and exec
✓ fork() without exec is useful on its own (e.g. worker processes)
✓ Flexible I/O redirection before exec
✓ Can change UID/GID before exec
✓ Available on every mainstream UNIX-like system

spawn() Approach (Windows / embedded)
✓ One combined step
✓ Works on systems without an MMU (embedded)
✓ posix_spawn() is the portable POSIX form
✗ Complex API (many parameters)
✗ No arbitrary code between create and exec
✗ Pipes and redirection must be described through parameters
✗ Cannot duplicate the running process for parallel work

💡 What You Can Do Between fork() and exec()

The real power of the UNIX approach is everything you can do after fork() but before exec(). The child can set up the environment exactly how the new program needs it, then exec():

🔄 I/O Redirection
Redirect stdin/stdout to files or pipes using dup2() before exec
🔓 Change User/Group
Call setgid() and then setuid() to drop or change privileges before exec
🏴 Working Directory
Call chdir() to set the child’s working directory before exec
🔕 Close file descriptors
Close sensitive fds so the new program can’t access them
🔌 Set up pipes
Create pipe(), wire it to stdin/stdout before exec
🎕 Signal masks
Reset signal dispositions and masks before exec

💻 Code Example 1: I/O Redirection Between fork() and exec()

This is how a shell implements ls > output.txt: fork a child, redirect its stdout to a file, then exec ls.

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/wait.h>

int main(void)
{
    pid_t pid;
    int fd;

    pid = fork();
    if (pid == -1) { perror("fork"); exit(1); }

    if (pid == 0) {
        /* CHILD: set up I/O redirection BEFORE exec */

        /* Open file for writing (creates if not exists) */
        fd = open("output.txt",
                  O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd == -1) { perror("open"); exit(1); }

        /* Redirect stdout (fd 1) to the file */
        if (dup2(fd, STDOUT_FILENO) == -1) {
            perror("dup2"); exit(1);
        }
        close(fd);  /* no longer needed after dup2 */

        /* Now exec ls: its output goes to output.txt */
        char *argv[] = { "ls", "-l", "/tmp", NULL };
        execvp("ls", argv);
        perror("execvp"); _exit(127);  /* reached only if exec failed */

    } else {
        wait(NULL);
        printf("Done. Check output.txt for ls results.\n");
    }

    return 0;
}
A single-call spawn API can do this too, but only through dedicated parameters that describe each redirection in advance (posix_spawn() uses “file actions” for this). With fork+exec, it’s just ordinary file descriptor manipulation in the child.

💻 Code Example 2: chdir() + setenv() Between fork() and exec()
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void)
{
    pid_t pid = fork();
    if (pid == -1) { perror("fork"); exit(1); }

    if (pid == 0) {
        /* CHILD: customize environment before exec */

        /* 1. Change working directory */
        if (chdir("/tmp") == -1) {
            perror("chdir"); exit(1);
        }
        printf("[Child] Working dir changed to /tmp\n");
        fflush(stdout);  /* exec discards any unflushed stdio buffer */

        /* 2. Set an environment variable */
        if (setenv("MY_VAR", "hello_from_child", 1) == -1) {
            perror("setenv"); exit(1);
        }

        /* 3. Now exec: new program inherits /tmp as cwd
                       and MY_VAR in its environment */
        char *argv[] = { "env", NULL };
        execvp("env", argv);   /* 'env' prints all env vars */
        perror("execvp"); _exit(127);  /* reached only if exec failed */

    } else {
        wait(NULL);
        printf("[Parent] Child has finished.\n");
    }

    return 0;
}

📄 posix_spawn() — The Portable Alternative

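POSIX defines posix_spawn(), which combines process creation and exec into one call. It was standardized largely for systems where fork() is hard to implement, such as embedded targets without an MMU. On regular Linux it is also available, and it can be cheaper than fork() when the parent process is very large, but fork+exec remains the more flexible choice when the child needs custom setup.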

#include <stdio.h>
#include <stdlib.h>
#include <spawn.h>
#include <sys/wait.h>

extern char **environ;

int main(void)
{
    pid_t pid;
    int status;

    /* argv for the new program */
    char *argv[] = { "ls", "-l", "/tmp", NULL };

    /* posix_spawn: one call to create + exec. It takes a full path;
       posix_spawnp() would search PATH instead. */
    int ret = posix_spawn(&pid, "/bin/ls",
                          NULL,    /* file actions (I/O setup) */
                          NULL,    /* spawn attributes         */
                          argv,    /* argument list            */
                          environ  /* environment              */);
    if (ret != 0) {
        fprintf(stderr, "posix_spawn failed: %d\n", ret);
        exit(1);
    }

    printf("Spawned ls with PID = %d\n", (int)pid);
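    if (waitpid(pid, &status, 0) == -1) { perror("waitpid"); exit(1); }
    if (WIFEXITED(status))   /* exit status is only valid after a normal exit */
        printf("ls exited with: %d\n", WEXITSTATUS(status));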

    return 0;
}
/* Compile: gcc -o spawn_demo spawn_demo.c */
When to use posix_spawn(): embedded RTOS environments, cross-platform code that must also run on systems without fork(), or simple launches that need no custom setup in the child. When the child needs setup that the file-actions and attributes parameters cannot express, use fork()+exec().

🅾 Interview Questions
Q1: Why does UNIX separate fork() and exec() instead of combining them?

Because the child can set up its environment (I/O redirection, user IDs, signal masks, working directory) between fork() and exec(). This gives enormous flexibility. Also, fork() alone (without exec) is useful for parallel processing. The combined spawn() approach requires complex APIs to handle all these cases.

Q2: How does a shell implement the “>” redirection operator?

The shell calls fork(). In the child, it opens the output file and uses dup2() to make STDOUT_FILENO refer to that file, then closes the original descriptor. Then it calls exec() to run the command. Open file descriptors survive exec() unless they are marked close-on-exec, so the command’s output goes to the file.

Q3: What is posix_spawn() and on which systems is it important?

posix_spawn() is a POSIX function that combines process creation and exec in one call. It is important on embedded systems without an MMU (Memory Management Unit), where a traditional fork() is impractical because the child cannot be given its own copy of the parent’s address space at the same addresses. Examples include some microcontrollers running POSIX-like RTOS environments.

Q4: Can you change the UID of a process before exec()? How?

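Yes. After fork(), in the child, call setgid(new_gid) and then setuid(new_uid) before calling exec(). The group must be changed first, because once the UID is no longer root the process loses the right to change its GID. The new program will run with the changed credentials. This is how login programs and privilege-dropping daemons work. (Set-user-ID executables are different: there the kernel changes the effective UID during exec() itself.) The caller must have appropriate privileges, typically root, to switch to an arbitrary UID.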

Q5: How does a shell implement pipes (cmd1 | cmd2)?

The shell calls pipe() to get a read/write fd pair. It forks two children. In child 1 (cmd1), it uses dup2() to redirect stdout to the write-end of the pipe, closes both original pipe descriptors, then execs cmd1. In child 2 (cmd2), it uses dup2() to redirect stdin from the read-end, closes both original pipe descriptors, then execs cmd2. The parent must also close both ends; otherwise cmd2 never sees end-of-file, because a write-end is still open somewhere. Doing this setup as ordinary code is only possible because fork+exec allows work between the two steps.
