
📚 Shared Libraries: Overview
Chapter 41.3 | The Linux Programming Interface
Topic: Shared vs Static | Level: Intermediate | Part: 1 of 3

🔑 Key Concepts

Shared Library · Static Library · Dynamic Linking · Virtual Memory · Symbol Relocation · ELF Format · .so file · Run-time Loading

What Is a Shared Library?

When you write a C program and use functions like printf() or malloc(), those functions live in a library. Linux supports two kinds of libraries: static libraries (.a files) and shared libraries (.so files). Understanding the difference between them is fundamental to Linux system programming.

With a static library, the object code your program needs is copied directly into the executable at link time. A shared library keeps its code outside the executable. The code is loaded from disk into memory at run time, and many programs can share the same in-memory copy simultaneously.

⚠️ Problems with Static Libraries

Before shared libraries existed, all programs were statically linked. This means the linker copies the required object modules directly into the final executable binary. This approach works, but creates three major problems when many programs on the same system use the same library code:

Figure: Static linking, where each program gets its own copy. prog_A, prog_B, and prog_C each embed their own copies of libc and libm. The result is 3× the disk space, 3× the RAM, and a relink of every program for each bug fix.

Problem 1 - Disk Space Waste: Every program that uses libm carries its own embedded copy of the libm object code it needs. On a system with 100 programs that all use libm, the same code is stored up to 100 times on disk.

Problem 2 - Memory (RAM) Waste: When those 100 programs run simultaneously, each one holds its own private copy of the library code in memory. A typical Linux system runs many processes that use libc at the same time. With static linking, each of those copies occupies its own physical RAM.

Problem 3 - Bug Fix Nightmare: If a security vulnerability is found in libssl, every program statically linked against it must be relinked with the fixed library and redistributed. The system administrator must also track which programs used that library, which is impractical at scale.

✅ How Shared Libraries Solve These Problems

A shared library stores the object modules in a single .so file on disk. When a program starts, the Linux dynamic linker/loader (ld.so or ld-linux.so) maps the shared library into the process’s virtual address space. If another process also needs the same library, the kernel maps the same physical memory pages into that process’s address space as well, so the code is not duplicated.

Figure: Shared library, one copy for many users. prog_A, prog_B, and prog_C each hold only a reference to libfoo.so. ONE copy sits in RAM and is shared by all, so an update applied once benefits every program.
Key insight: The code (text segment) is shared, but each process gets its own private copies of the library’s global and static variables. Only the executable instructions are shared between processes; the library’s variables never are.

Advantage 1 - Smaller executables on disk: Programs store only a reference to libfoo.so, not its contents. This saves considerable disk space.

Advantage 2 - Less RAM consumed: The kernel shares the physical memory pages holding the library’s code among all processes. 100 programs using libc need only one copy of libc’s code in physical RAM.

Advantage 3 - Easy updates: Replace libssl.so with a patched, binary-compatible version, and all programs automatically use it the next time they start. No recompilation or relinking is needed.

Advantage 4 - Faster startup (in some cases): If another running program already uses the shared library, a new program can start faster because the library pages are already cached in memory. However, the first program to load a library pays the I/O cost of reading it from disk.

βš–οΈ The Costs of Shared Libraries

Shared libraries are not free. They bring real complexity and small performance overheads that every system programmer should understand:

Complexity: Building shared libraries requires understanding compiler flags, versioning, and the dynamic linker. Static libraries are much simpler to create and use.

Position-Independent Code (PIC): Shared library code must be compiled with -fPIC so it can be loaded at any virtual address. PIC uses an extra register (the GOT pointer), which causes a small performance overhead on 32-bit x86.

Run-time relocation: When a shared library is loaded, the dynamic linker must resolve all symbol references (function calls, global variables) to their actual run-time addresses. This adds a small startup delay compared to static linking.

“Dependency hell”: Programs depend on specific library versions. If the wrong version is present, the program may fail to start.

☕ Bonus: Shared Libraries and Java Native Interface (JNI)

Shared libraries are also the foundation for the Java Native Interface (JNI). JNI lets Java code call C functions that are packaged inside a shared library. This is how Java programs can access operating-system-specific features that are not available in pure Java, such as device drivers, ioctl(), or other platform-specific APIs. The Java code loads the .so file at run time, typically with System.loadLibrary().

Note: On Linux, shared library filenames use the .so suffix, often followed by a version number (for example, libc.so.6). On macOS they end in .dylib. On Windows they are called DLLs (Dynamic-Link Libraries) and use the .dll extension. The concept is the same, but the mechanics differ.

💻 Coding Examples

Example 1 - Seeing the Difference: Static vs Shared Linking

This example shows how to observe the size difference between a statically and dynamically linked executable, and confirms that a shared library is not embedded in the binary.

Step 1: Create a simple library source file (mylib.c)

/* mylib.c: a tiny library with two functions */
#include <stdio.h>

void greet(void) {
    printf("Hello from the shared library!\n");
}

int add(int a, int b) {
    return a + b;
}

Step 2: Create a program that uses the library (main.c)

/* main.c: uses functions from mylib */
#include <stdio.h>

/* Declarations of the library functions */
void greet(void);
int add(int a, int b);

int main(void) {
    greet();
    printf("3 + 4 = %d\n", add(3, 4));
    return 0;
}

Step 3: Build as a STATIC library and link

# Compile object file (no -fPIC needed for static)
gcc -c -Wall mylib.c -o mylib.o

# Create static archive
ar rcs libmylib.a mylib.o

# Link statically: library code IS embedded in prog_static
# (this needs the static C library, libc.a, to be installed)
gcc -Wall main.c -L. -lmylib -static -o prog_static

# Check size: this will be large, since it also embeds the libc code the program uses
ls -lh prog_static

Step 4: Build as a SHARED library and link

# Compile with Position-Independent Code (-fPIC)
gcc -c -Wall -fPIC mylib.c -o mylib_pic.o

# Create shared library
gcc -shared -o libmylib.so mylib_pic.o

# Link dynamically: prog_shared only holds a reference
gcc -Wall main.c -L. -lmylib -o prog_shared

# Set library path and run
export LD_LIBRARY_PATH=.
./prog_shared

# Compare binary sizes: prog_shared will be MUCH smaller
ls -lh prog_static prog_shared

Step 5: Inspect what the dynamic binary depends on

# ldd shows which shared libraries a program needs at run time
ldd prog_shared
# Example output (abridged; addresses vary):
#   libmylib.so => ./libmylib.so (0x00007f...)
#   libc.so.6   => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f...)

# ldd on static binary shows no shared deps
ldd prog_static
# Output: not a dynamic executable (or empty list)

Example 2 - Verifying That Code Is Shared but Data Is Private

This example proves the fundamental rule: shared library code is shared, but each process gets its own copy of the library’s global/static variables.

/* counter_lib.c: library with a global counter */
#include <stdio.h>
#include <unistd.h>   /* for getpid() */

/* This variable is NOT shared between processes.
   Each process that loads the library gets its own private copy. */
static int call_count = 0;

void increment(void) {
    call_count++;
    printf("PID %d: call_count is now %d\n", (int)getpid(), call_count);
}

int get_count(void) {
    return call_count;
}

/* test_counter.c: two processes use the same .so */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

void increment(void);
int get_count(void);

int main(void) {
    pid_t pid = fork();

    if (pid == -1) {
        perror("fork");
        exit(EXIT_FAILURE);
    }

    if (pid == 0) {
        /* Child process: increments its OWN counter */
        increment();  /* child: count = 1 */
        increment();  /* child: count = 2 */
        printf("Child final count: %d\n", get_count());
        return 0;
    } else {
        /* Parent process: increments its OWN counter */
        increment();  /* parent: count = 1 */
        wait(NULL);
        printf("Parent final count: %d\n", get_count());
        /* Parent sees 1, not 3: each process has its own copy of call_count */
    }
    return 0;
}

# Build and run
gcc -c -fPIC -Wall counter_lib.c -o counter_lib.o
gcc -shared -o libcounter.so counter_lib.o
gcc -Wall test_counter.c -L. -lcounter -o test_counter
export LD_LIBRARY_PATH=.
./test_counter

# Expected output (PIDs will differ, and the parent and child lines
# may appear in a different order because the processes run concurrently):
# PID 1234: call_count is now 1    (child)
# PID 1234: call_count is now 2    (child)
# Child final count: 2
# PID 1233: call_count is now 1    (parent)
# Parent final count: 1
# Parent count is 1, NOT 3: data is private per process!

🎯 Interview Questions

Q1. What are the three main disadvantages of static libraries compared to shared libraries?

Answer: (1) Disk waste: each executable carries its own embedded copy of the library code, so N programs using the same library means N copies on disk. (2) Memory waste: when the programs run simultaneously, each process holds its own copy of the code in memory, multiplying RAM usage. (3) Maintenance burden: any bug fix in the library requires relinking every program that uses it, and the admin must track all affected programs.

Q2. When a shared library is loaded, is the library’s global variable data shared between processes?

Answer: No. Only the code (text segment) of a shared library is shared between processes at the physical memory level. Each process that loads the library gets its own private copies of any global and static variables defined in the library. This works because the library's data is mapped privately with copy-on-write. When a process first writes to a data page, it gets its own copy of that page.

Q3. If a shared library is used by 50 processes simultaneously, how many times is its code loaded into physical RAM?

Answer: Once. The kernel maps the same physical memory pages (containing the library’s code) into all 50 process virtual address spaces. This is the core efficiency of shared libraries. The processes each have their own virtual mapping to the same physical pages.

Q4. What is “run-time symbol relocation” and why does it add overhead?

Answer: When a shared library is loaded at run time, the dynamic linker must resolve all symbolic references (function calls and global variable accesses) to their actual memory addresses in the current process’s virtual address space. The library can load at a different address each time, so these addresses cannot be hard-coded at build time. This resolution work is called relocation. Much of it is done before main() runs, although function calls may instead be resolved lazily on first use. Either way, it causes a small delay compared to static linking, where all addresses are fixed at link time.

Q5. What is the file naming convention for shared libraries on Linux?

Answer: Shared library filenames use the prefix lib and the suffix .so (shared object), for example libfoo.so, libssl.so, or libc.so. A full versioned (real) name looks like libfoo.so.1.2. Its soname is libfoo.so.1 and its linker name is libfoo.so, and a chain of symbolic links connects them for version management.

Q6. What command shows you which shared libraries a program depends on at run time?

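Answer: The ldd command lists all shared library dependencies of an executable, along with the paths where they will be found and their load addresses. Example: ldd /bin/ls. Note: never run ldd on untrusted executables, because in some cases it may end up executing the program. To inspect an untrusted binary, use readelf -d prog instead and look at the NEEDED entries, which lists the direct dependencies without running it.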
