Why epoll Semantics Matter
epoll has some behaviors that feel surprising at first. For example, closing a file descriptor does not always remove it from the epoll interest list. epoll can keep reporting events for an fd number that you have already closed. Two processes can share the same epoll instance. Understanding why these things happen requires understanding the difference between file descriptors and open file descriptions.
This tutorial explains those concepts from the ground up, then shows you the exact scenarios where epoll behaves in ways you might not expect.
Before we can understand epoll semantics, we must understand how the Linux kernel represents open files. There are three layers:
Process A: fd table
fd 0 (stdin)
fd 3 → OFD #1
fd 4 → OFD #1 (dup of fd 3)
fd 5 → OFD #2
↓
Open file descriptions (OFDs)
OFD #1: offset, flags, ref_count=2 → i-node #10
OFD #2: offset, flags, ref_count=1 → i-node #20
↓
i-node table
i-node #10: type, permissions, size
i-node #20: type, permissions, size
The key point: fd 3 and fd 4 both point to OFD #1 (after a dup()). They are two different numbers in the fd table but they reference the same underlying open file.
The critical rule
epoll monitors open file descriptions (OFDs), not file descriptor numbers. When you call epoll_ctl(EPOLL_CTL_ADD, fd, ...), the kernel records both the fd number AND a reference to the OFD it points to. The kernel watches the OFD.
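Closing a file descriptor removes the OFD from the epoll interest list automatically only when all file descriptors that reference that OFD have been closed. Until then, the entry stays unless you remove it explicitly with EPOLL_CTL_DEL.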
This is the example from the TLPI book. It shows behavior that catches many developers off guard.
/*
* What happens when you close fd1 after calling dup() on it,
* then call epoll_wait()?
*
* Spoiler: epoll still reports fd1 as ready even though fd1 is closed!
*/
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/epoll.h>
#define MAX_EVENTS 5
int main(void)
{
/* Open a FIFO (assume writer exists) */
int fd1 = open("myfifo", O_RDONLY | O_NONBLOCK);
if (fd1 == -1) { perror("open"); exit(1); }
/* Create epoll instance */
int epfd = epoll_create1(0);
if (epfd == -1) { perror("epoll_create1"); exit(1); }
/* Add fd1 to the interest list */
struct epoll_event ev;
ev.events = EPOLLIN;
ev.data.fd = fd1; /* we store fd1 here as a tag */
if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd1, &ev) == -1) {
perror("epoll_ctl"); exit(1);
}
/* Suppose the FIFO now has data ready (a writer wrote something) */
/* Create a duplicate of fd1 */
int fd2 = dup(fd1); /* fd2 points to the SAME open file description */
printf("fd1=%d fd2=%d\n", fd1, fd2);
/* Close fd1 */
close(fd1);
/* At this point: fd1 is closed, but fd2 still refers to the same OFD.
The OFD is NOT removed from epoll because fd2 still holds a reference. */
/* Now call epoll_wait */
struct epoll_event evlist[MAX_EVENTS];
int ready = epoll_wait(epfd, evlist, MAX_EVENTS, 1000); /* 1 sec timeout */
if (ready > 0) {
/* This WILL fire, reporting fd1 as ready, even though fd1 is closed! */
printf("Ready: %d\n", ready);
printf("Reported fd: %d\n", evlist[0].data.fd); /* prints original fd1 value */
printf("BUT fd1 is already closed. Reading via fd2 instead.\n");
/* To actually read, you must use fd2 */
char buf[64];
ssize_t n = read(fd2, buf, sizeof(buf) - 1); /* read() returns ssize_t */
if (n > 0) {
buf[n] = '\0';
printf("Read: %s\n", buf);
}
}
close(fd2); /* NOW the OFD has zero references, so it is removed from epoll */
close(epfd);
return 0;
}
1. epoll_ctl(EPOLL_CTL_ADD, fd1, ...). Kernel records: fd1 number + reference to OFD-X.
2. dup(fd1). Now fd2 also points to OFD-X. OFD-X reference count = 2.
3. close(fd1). OFD-X reference count drops to 1. OFD-X is NOT destroyed.
4. epoll_wait(). OFD-X is still in the interest list, so the event is reported with the fd1 value stored in ev.data.fd.
5. close(fd2). Only now does OFD-X reference count reach 0, and epoll removes it.

When a process calls fork(), the child gets a copy of the parent’s file descriptor table. Every fd in the parent is duplicated in the child. This includes the epoll fd itself.
Since the child’s epoll fd points to the same epoll open file description as the parent’s, they share the exact same interest list and ready list.
/*
* fork_epoll.c
* Shows that parent and child share the epoll interest list after fork().
*
* Compile: gcc -o fork_epoll fork_epoll.c
*/
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/epoll.h>
#include <sys/wait.h>
#include <fcntl.h>
int main(void)
{
/* Create epoll instance BEFORE fork */
int epfd = epoll_create1(0);
if (epfd == -1) { perror("epoll_create1"); exit(1); }
/* Open a pipe */
int pipefd[2];
if (pipe(pipefd) == -1) { perror("pipe"); exit(1); } /* pipefd[0]=read end, pipefd[1]=write end */
/* Add read end to epoll */
struct epoll_event ev;
ev.events = EPOLLIN;
ev.data.fd = pipefd[0];
if (epoll_ctl(epfd, EPOLL_CTL_ADD, pipefd[0], &ev) == -1) { perror("epoll_ctl"); exit(1); }
pid_t pid = fork();
if (pid == -1) { perror("fork"); exit(1); }
if (pid == 0) {
/* CHILD */
/* Child inherits epfd and pipefd[0].
Both parent and child now watch the SAME pipe read-end
through the SAME epoll instance. */
close(pipefd[1]); /* child does not write */
struct epoll_event evlist[5];
printf("Child: waiting on epoll_wait...\n");
int ready = epoll_wait(epfd, evlist, 5, 5000);
if (ready > 0)
printf("Child: got event on fd %d\n", evlist[0].data.fd);
else
printf("Child: timeout\n");
close(epfd);
close(pipefd[0]);
exit(0);
} else {
/* PARENT */
close(pipefd[0]); /* parent does not read via epoll */
sleep(1); /* let child reach epoll_wait */
/* Parent writes to the pipe, so the child's epoll_wait will wake up */
if (write(pipefd[1], "hello", 5) == -1) perror("write");
printf("Parent: wrote to pipe\n");
close(pipefd[1]);
wait(NULL);
}
close(epfd);
return 0;
}
Parent Process
epfd → epoll OFD (interest list)
pipefd[0] → pipe read OFD
↓ fork()
Child Process
epfd (copy) → same epoll OFD
pipefd[0] (copy) → same pipe OFD
Shared OFDs: epoll interest list, pipe read-end
Practical consequence of fork() + epoll
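Both parent and child can call epoll_wait() on the same epoll instance. When an event occurs on an fd registered for edge-triggered notification (EPOLLET), only one of the blocked waiters is woken. With the default level-triggered mode, more than one waiter may be woken for the same event, so each should use non-blocking I/O and be prepared to find nothing left to read or accept. Some server designs use this sharing so that multiple worker processes wait on one epoll instance that watches a listening socket.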
However, this sharing also means: if the child closes its epfd, the parent’s interest list is still intact (because the OFD still exists). The interest list disappears only when both parent and child close their copies of epfd.
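When you call epoll_create(), the kernel creates:
1. A new in-memory i-node (not backed by a filesystem)
2. A new open file description linked to that i-node (this OFD holds the interest list and ready list)
3. A file descriptor in the calling process (points to the OFD above)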
Because the interest list is tied to the OFD (not the fd number), if you dup() the epfd, both the original and the duplicate epfd give you access to the same interest list. You can call epoll_wait() using either.
/* Demonstrate that dup'd epfd shares the same interest list */
int epfd = epoll_create1(0);
int epfd2 = dup(epfd); /* duplicate the epoll fd */
/* Add a socket to the interest list using epfd */
struct epoll_event ev;
ev.events = EPOLLIN;
ev.data.fd = sockfd;
epoll_ctl(epfd, EPOLL_CTL_ADD, sockfd, &ev);
/* Now wait using epfd2: it sees the same interest list! */
struct epoll_event evlist[5];
int ready = epoll_wait(epfd2, evlist, 5, -1);
/* This will return when sockfd has data, even though we added it via epfd */
Similarly, you can modify or remove items from the interest list using either the original epfd or the duplicated epfd2. They are just two handles to the same underlying data structure.
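The common misconception: “closing an fd removes it from epoll”. The truth is more nuanced.
If other fds still reference the OFD after close(fd): epoll keeps watching it, and events still fire.
If close(fd) closes the last fd that references the OFD: the epoll interest list entry is removed, and no more events are reported.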
/*
* Illustrating when an OFD is truly removed from epoll.
* epoll cannot watch regular files (EPOLL_CTL_ADD fails with EPERM),
* so this fragment uses a FIFO, as in the first example.
*/
int fd1 = open("myfifo", O_RDONLY | O_NONBLOCK);
/* Add to epoll */
struct epoll_event ev = { .events = EPOLLIN, .data.fd = fd1 };
epoll_ctl(epfd, EPOLL_CTL_ADD, fd1, &ev);
/* Case 1: No dup, so closing fd1 removes the OFD from epoll */
close(fd1); /* OFD refcount goes to 0, entry removed from epoll */
/* ================================================================ */
fd1 = open("myfifo", O_RDONLY | O_NONBLOCK);
int fd2 = dup(fd1);
ev.data.fd = fd1;
epoll_ctl(epfd, EPOLL_CTL_ADD, fd1, &ev);
/* Case 2: fd2 still open. Closing fd1 now would NOT remove the OFD:
the refcount would drop from 2 to 1 and epoll_wait() could still fire. */
/* To actually remove from epoll: use EPOLL_CTL_DEL with the registered
fd number (fd1) while it is still open. After close(fd1), a DEL via fd1
fails with EBADF and a DEL via fd2 fails with ENOENT, because the entry
is keyed on fd1 together with the OFD. */
epoll_ctl(epfd, EPOLL_CTL_DEL, fd1, NULL);
close(fd1); /* OFD refcount drops from 2 to 1, but epoll no longer watches it */
close(fd2); /* now OFD refcount = 0 */
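Best practice: Always call epoll_ctl(EPOLL_CTL_DEL, ...) explicitly before closing an fd, especially in programs that use dup() or fork(). Pass the same fd number that you registered, because the kernel looks the entry up by fd number and OFD together. Do not rely on close() to clean up epoll entries.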
| Situation | What happens to epoll entry |
|---|---|
| close(fd), no duplicates exist | OFD destroyed, entry removed from interest list |
| close(fd), dup(fd) was called earlier | OFD survives, entry stays in interest list, events still fire |
| close(fd), after fork() | Child still holds fd, so OFD survives and entry remains |
| dup(epfd), duplicating the epoll fd itself | Both fds see the same interest list and ready list |
| fork(), epoll fd is inherited | Parent and child share the same epoll instance |
| EPOLL_CTL_DEL then close(fd) | Entry explicitly removed first: safe and predictable |
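The following program shows the recommended way to remove an fd from epoll before closing it, even when you are not using dup(). Pass it the names of FIFOs or other pollable files: epoll_ctl() fails with EPERM for regular files and directories, and the program skips those arguments.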
/*
* safe_epoll_remove.c
* Best practice: always EPOLL_CTL_DEL before closing an fd.
*
* Compile: gcc -o safe_epoll_remove safe_epoll_remove.c
*/
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/epoll.h>
#define MAX_FDS 64
#define MAX_EVENTS 16
/* Wrapper to safely remove an fd from epoll and close it */
void remove_and_close(int epfd, int fd)
{
/* Remove from epoll first */
if (epoll_ctl(epfd, EPOLL_CTL_DEL, fd, NULL) == -1)
perror("epoll_ctl DEL"); /* log but continue */
/* Now close the fd */
if (close(fd) == -1)
perror("close");
printf(" Removed and closed fd %d\n", fd);
}
int main(int argc, char *argv[])
{
if (argc < 2) {
fprintf(stderr, "Usage: %s fifo1 [fifo2 ...]\n", argv[0]);
return 1;
}
int epfd = epoll_create1(EPOLL_CLOEXEC);
if (epfd == -1) { perror("epoll_create1"); return 1; }
int numFds = 0;
for (int i = 1; i < argc && numFds < MAX_FDS; i++) {
int fd = open(argv[i], O_RDONLY | O_NONBLOCK);
if (fd == -1) { perror("open"); continue; }
struct epoll_event ev;
ev.events = EPOLLIN | EPOLLHUP | EPOLLERR;
ev.data.fd = fd;
if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) == -1) {
perror("epoll_ctl ADD");
close(fd);
continue;
}
printf("Watching fd %d (%s)\n", fd, argv[i]);
numFds++;
}
struct epoll_event evlist[MAX_EVENTS];
char buf[512];
while (numFds > 0) {
int ready = epoll_wait(epfd, evlist, MAX_EVENTS, 5000);
if (ready == -1) {
perror("epoll_wait");
break;
}
if (ready == 0) {
printf("Timeout: no events in 5 seconds\n");
break;
}
for (int j = 0; j < ready; j++) {
int cur_fd = evlist[j].data.fd;
if (evlist[j].events & EPOLLIN) {
ssize_t n = read(cur_fd, buf, sizeof(buf) - 1);
if (n > 0) {
buf[n] = '\0';
printf("fd=%d: read %zd bytes: %s\n", cur_fd, n, buf);
}
}
if (evlist[j].events & (EPOLLHUP | EPOLLERR)) {
printf("fd=%d: hangup or error\n", cur_fd);
remove_and_close(epfd, cur_fd);
numFds--;
}
}
}
close(epfd);
printf("Done.\n");
return 0;
}
A file descriptor is a small integer in a per-process table that acts like a handle. An open file description is a kernel data structure (in the system-wide open file table) that holds the actual state: file offset, flags, and a pointer to the i-node. Multiple file descriptors (even in different processes) can point to the same open file description, which is what happens after dup() or fork().
epoll can still report events for an fd after you close it, provided a duplicate of that fd exists. epoll monitors the open file description, not the fd number. After dup(), fd2 also references the same open file description. Closing fd1 reduces the OFD’s reference count from 2 to 1. The OFD is still alive because fd2 holds a reference, so epoll continues to monitor it and can still report events. The entry is removed automatically only when fd2 is also closed.
After fork(), both parent and child hold file descriptors that point to the same epoll open file description. This means they share the same interest list and ready list. Either process can call epoll_wait() and receive events. If both are blocked in epoll_wait() at the same time, an edge-triggered (EPOLLET) event wakes only one of them; with level-triggered notification, more than one waiter may be woken. This sharing lasts until all fds referencing the epoll OFD are closed across both processes.
epoll_create() creates a new in-memory i-node (not backed by a filesystem), a new open file description linked to that i-node, and a file descriptor in the calling process pointing to that OFD. The interest list and ready list are stored in the OFD, not in the fd. This is why the interest list is shared across duplicate fds.
Always call epoll_ctl(epfd, EPOLL_CTL_DEL, fd, NULL) explicitly before calling close(fd). Do not rely on close() to automatically remove the fd, because if there are duplicate fds (from dup() or fork()), the OFD remains alive and epoll continues to monitor it even after you close one fd.
A duplicate of an epoll fd can be used for epoll_wait() just like the original. Because the interest list and ready list are tied to the open file description (not the fd number), any fd that references the same epoll OFD, whether original or a duplicate, can be used to call epoll_wait() and will see the same events.
EPOLL_CLOEXEC is a flag for epoll_create1(). It marks the epoll fd with the close-on-exec flag, so the fd is automatically closed when the process calls exec(). This prevents accidental fd leaks into child programs started with exec(). It is equivalent to calling fcntl(epfd, F_SETFD, FD_CLOEXEC) after creation, but is atomic and avoids race conditions in multi-threaded programs.
If the child closes its copy of epfd, the parent’s interest list is completely unaffected. The child closing its copy of epfd reduces the epoll OFD’s reference count by 1. But as long as the parent still holds its own copy of epfd (pointing to the same OFD), the OFD stays alive and the interest list remains intact. The interest list disappears only when all file descriptors referencing the epoll OFD have been closed across all processes.
