calloc(), realloc() and Aligned Memory: The Complete Guide

From zeroed sensor arrays to DMA-aligned buffers — master the full C heap allocation toolkit with practical embedded and Linux examples

⏱️ 13 min read

🎯 Intermediate

💻 Linux / Embedded C

Beyond malloc(): Why You Need calloc() and realloc()

Once you go beyond simple single-object allocations, three tools become essential: calloc() for allocating and zeroing arrays, realloc() for resizing existing allocations, and memalign()/posix_memalign() for obtaining memory at specific alignment boundaries. Each solves a problem that malloc() alone handles awkwardly or not at all.

Embedded engineers care especially about alignment: DMA engines, SIMD instruction sets, and certain peripherals require buffers to start at addresses that are multiples of a specific power of two. Handing a misaligned buffer to a DMA engine can produce corrupted transfers, bus errors, or a hard fault, depending on the controller.

🔑 Topics Covered

calloc() zeroed allocation · realloc() dynamic resize · memalign() power-of-two · posix_memalign() · DMA-aligned buffer · Integer overflow guard · Growing dynamic arrays · Embedded memory patterns

1. calloc() — Allocate and Zero in One Step

#include <stdlib.h>

void *calloc(size_t numitems, size_t size);
/* Allocates numitems * size bytes, all zeroed */
/* Returns pointer on success, NULL on failure */

calloc() takes the number of items and the size of each item separately, which gives it an important advantage: the implementation can detect integer overflow when computing the total size. If numitems * size would overflow a size_t, glibc and other modern implementations return NULL rather than silently allocating a tiny buffer for a large number of items. That wraparound is a class of vulnerability that has caused real-world security bugs in code that calls malloc(numitems * size) directly.
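To make the overflow risk concrete, here is a minimal sketch of the kind of check an allocator can perform before multiplying. The helper names array_size_ok() and malloc_array() are invented for this illustration and are not part of any library.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: stores n * size in *total and returns 1 when the
 * product fits in a size_t; returns 0 when it would wrap around. */
static int array_size_ok(size_t n, size_t size, size_t *total)
{
    if (size != 0 && n > SIZE_MAX / size)
        return 0;              /* n * size would overflow */
    *total = n * size;
    return 1;
}

/* Hypothetical malloc()-based array allocator with the same guard. */
static void *malloc_array(size_t n, size_t size)
{
    size_t total;
    if (!array_size_ok(n, size, &total))
        return NULL;           /* refuse instead of under-allocating */
    return malloc(total);
}
```

With n = SIZE_MAX / 2 + 1 and size = 2, the raw product n * size wraps to 0, so malloc(n * size) would be asked for zero bytes. malloc_array(n, 2) returns NULL instead, which is the behavior you get from calloc() on implementations that perform this check.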

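The other key property: every byte of the returned memory is guaranteed to be zero. On all mainstream platforms that means integers read as 0, floats as 0.0, and pointers as NULL, so unlike malloc() you do not need a separate memset() call.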
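As a rough illustration of what the zeroing guarantee replaces, the sketch below spells out the malloc() plus memset() equivalent and a byte-wise check. Both all_zero() and malloc_zeroed() are hypothetical helpers written for this article, not library functions.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns 1 if every byte in [p, p + len) is zero. */
static int all_zero(const void *p, size_t len)
{
    const unsigned char *b = p;
    for (size_t i = 0; i < len; i++)
        if (b[i] != 0)
            return 0;
    return 1;
}

/* The malloc() equivalent of calloc(n, size), minus the overflow check:
 * the caller must make sure n * size cannot wrap. */
static void *malloc_zeroed(size_t n, size_t size)
{
    void *p = malloc(n * size);
    if (p != NULL)
        memset(p, 0, n * size);
    return p;
}
```

A block from calloc(256, sizeof(uint16_t)) and one from malloc_zeroed(256, sizeof(uint16_t)) both pass all_zero(); calloc() simply gets there in one call and adds the overflow guard.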

Property	malloc(n)	calloc(items, size)
Initial content	Uninitialized (garbage)	All bytes set to 0
Overflow detection	None (you compute size)	Internal overflow check
Typical use	Single objects, known size	Arrays, structs that must start zeroed
Deallocation	free()	free() — same mechanism

💡 Example 1: Zeroed ADC Sample Buffer for Signal Processing

In embedded signal processing, you often need to initialize an ADC (Analog-to-Digital Converter) sample buffer to zero before filling it — partly for safety and partly because averaging algorithms can misbehave if the initial state contains garbage. calloc() is perfect here.

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

#define ADC_CHANNELS  8
#define SAMPLES_PER_CHANNEL  256

typedef struct {
    uint16_t  raw[SAMPLES_PER_CHANNEL];   /* ADC raw counts, 0..4095 */
    float     voltage[SAMPLES_PER_CHANNEL]; /* converted values */
    uint8_t   channel_id;
    uint8_t   valid;                       /* 0 = empty, 1 = filled */
} AdcChannelBuffer;

int main(void)
{
    /*
     * calloc() allocates 8 * sizeof(AdcChannelBuffer) bytes,
     * all zeroed. Every raw[], voltage[], and flag starts at 0.
     * No need for memset() — safe to use immediately as "empty".
     */
    AdcChannelBuffer *adc_bufs = calloc(ADC_CHANNELS, sizeof(AdcChannelBuffer));
    if (adc_bufs == NULL) {
        fprintf(stderr, "calloc failed: cannot allocate ADC buffers\n");
        return 1;
    }

    /* Initialize channel IDs — everything else is already 0 */
    for (int ch = 0; ch < ADC_CHANNELS; ch++) {
        adc_bufs[ch].channel_id = (uint8_t)ch;
    }

    /* Simulate filling channel 3 with samples */
    adc_bufs[3].valid = 1;
    for (int s = 0; s < SAMPLES_PER_CHANNEL; s++) {
        adc_bufs[3].raw[s]     = (uint16_t)(2048 + (s % 64));  /* fake data */
        adc_bufs[3].voltage[s] = adc_bufs[3].raw[s] * (3.3f / 4095.0f);
    }

    /* Compute average for channel 3 */
    float sum = 0.0f;
    for (int s = 0; s < SAMPLES_PER_CHANNEL; s++) sum += adc_bufs[3].voltage[s];
    printf("Channel 3 average voltage: %.4f V\n", sum / SAMPLES_PER_CHANNEL);

    /* Check that un-filled channel 0 is truly zeroed */
    printf("Channel 0 valid flag: %d (should be 0)\n", adc_bufs[0].valid);
    printf("Channel 0 raw[0]:     %u (should be 0)\n", adc_bufs[0].raw[0]);

    free(adc_bufs);  /* calloc'd memory freed with free() as usual */
    return 0;
}

Key benefit shown: The valid flag and raw[] arrays are all zero without any memset(). If we had used malloc(), accessing adc_bufs[0].raw[0] before filling it would return garbage, potentially causing a spurious signal detection.

calloc zero init · ADC buffer pattern · Array of structs

2. realloc() — Resize an Existing Allocation

#include <stdlib.h>

void *realloc(void *ptr, size_t size);
/* Resizes the block at ptr to size bytes */
/* May MOVE the block to a new address    */
/* Returns new pointer on success, NULL on failure */
/* On failure: ptr remains valid and unchanged     */

realloc() is powerful but dangerous if used carelessly. The critical rule: never assign the return value directly back to the original pointer. If realloc() fails, it returns NULL and leaves your original block intact. If you assigned NULL back to your only pointer, you have created an irreversible memory leak.

/* WRONG — loses original pointer on failure */
ptr = realloc(ptr, new_size);

/* CORRECT — save original, check, then reassign */
void *tmp = realloc(ptr, new_size);
if (tmp == NULL) {
    /* ptr still valid — handle error */
} else {
    ptr = tmp;  /* only update ptr after success */
}

When realloc() grows a block, the newly added bytes are uninitialized, unlike memory from calloc(). You must explicitly zero or initialize the new portion if needed. When realloc() shrinks a block, the allocator usually trims it in place (although it is still allowed to move it), and the bytes that remain are left unchanged.
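If you want calloc()-like behavior when growing, one option is a small wrapper that zeroes only the new tail. The sketch below assumes the caller tracks the old size; realloc_zero_tail() is a hypothetical name, not a standard function.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: resizes ptr from old_size to new_size bytes
 * (new_size must be greater than 0) and zeroes any newly added tail.
 * Returns the new pointer, or NULL on failure, in which case ptr is
 * still valid and unchanged. */
static void *realloc_zero_tail(void *ptr, size_t old_size, size_t new_size)
{
    unsigned char *tmp = realloc(ptr, new_size);
    if (tmp == NULL)
        return NULL;
    if (new_size > old_size)
        memset(tmp + old_size, 0, new_size - old_size);
    return tmp;
}
```

The caller still follows the temporary-pointer rule: store the result in a separate variable, check it, and only then overwrite the original pointer.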

💡 Example 2: Dynamic String Builder for AT Command Responses

Embedded serial AT command interfaces often return responses of unpredictable length. A dynamically growing string buffer that doubles in capacity when full is the classic use case for realloc().

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Dynamic string builder — grows automatically */
typedef struct {
    char  *data;
    size_t length;    /* current used bytes (excluding null terminator) */
    size_t capacity;  /* total allocated bytes */
} StrBuilder;

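/* Initialize with a small initial capacity */
int sb_init(StrBuilder *sb, size_t initial_capacity)
{
    /* Need at least 1 byte for the terminator; a zero capacity would
     * also make the doubling loop in sb_append() spin forever. */
    if (initial_capacity == 0) initial_capacity = 16;
    sb->data = malloc(initial_capacity);
    if (!sb->data) return -1;
    sb->data[0] = '\0';
    sb->length   = 0;
    sb->capacity = initial_capacity;
    return 0;
}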

/* Append a string — realloc() if necessary */
int sb_append(StrBuilder *sb, const char *str)
{
    size_t add_len = strlen(str);
    size_t needed  = sb->length + add_len + 1;  /* +1 for null terminator */

    if (needed > sb->capacity) {
        /* Double capacity until it fits */
        size_t new_cap = sb->capacity;
        while (new_cap < needed) new_cap *= 2;

        char *tmp = realloc(sb->data, new_cap);
        if (tmp == NULL) {
            /* sb->data is STILL VALID — just report failure */
            return -1;
        }
        sb->data     = tmp;
        sb->capacity = new_cap;
    }

    memcpy(sb->data + sb->length, str, add_len + 1);
    sb->length += add_len;
    return 0;
}

void sb_free(StrBuilder *sb)
{
    free(sb->data);
    sb->data     = NULL;
    sb->length   = 0;
    sb->capacity = 0;
}

int main(void)
{
    StrBuilder resp;
    if (sb_init(&resp, 16) != 0) {  /* start with tiny 16-byte buffer */
        fprintf(stderr, "Init failed\n");
        return 1;
    }

    /* Simulate receiving an AT command response line-by-line */
    const char *lines[] = {
        "+CREG: 1,\"2A3B\",\"0C4D\",7\r\n",
        "+CSQ: 20,99\r\n",
        "+CIMI: 404110123456789\r\n",
        "OK\r\n",
        NULL
    };

    for (int i = 0; lines[i] != NULL; i++) {
        if (sb_append(&resp, lines[i]) != 0) {
            fprintf(stderr, "Buffer grow failed at line %d\n", i);
            sb_free(&resp);
            return 1;
        }
        printf("[Cap: %4zu | Len: %4zu] After appending line %d\n",
               resp.capacity, resp.length, i+1);
    }

    printf("\n--- Full AT response ---\n%s", resp.data);
    printf("---\nFinal buffer capacity: %zu bytes\n", resp.capacity);

    sb_free(&resp);
    return 0;
}

The doubling strategy means realloc() is called O(log N) times for N appended characters, giving amortized O(1) per append. This pattern is used in nearly every dynamic array or string implementation in C.
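The same doubling idea applies to arrays of samples. The following sketch uses a hypothetical SampleVec type (not from any library) and counts how often realloc() is actually called, so the logarithmic growth can be observed directly.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical growable array of 16-bit samples. */
typedef struct {
    uint16_t *data;
    size_t    length;      /* elements in use */
    size_t    capacity;    /* elements allocated */
    size_t    grow_calls;  /* how many times realloc() was needed */
} SampleVec;

/* Appends one sample, doubling the capacity when full.
 * Returns 0 on success, -1 on failure (the vector stays usable). */
static int vec_push(SampleVec *v, uint16_t sample)
{
    if (v->length == v->capacity) {
        size_t new_cap = v->capacity ? v->capacity * 2 : 16;
        if (new_cap < v->capacity || new_cap > SIZE_MAX / sizeof *v->data)
            return -1;                       /* size would overflow */
        uint16_t *tmp = realloc(v->data, new_cap * sizeof *v->data);
        if (tmp == NULL)
            return -1;                       /* v->data still valid */
        v->data = tmp;
        v->capacity = new_cap;
        v->grow_calls++;
    }
    v->data[v->length++] = sample;
    return 0;
}
```

Starting from SampleVec v = {0}, pushing 1000 samples triggers only 7 realloc() calls (capacities 16, 32, 64, 128, 256, 512, 1024). Release the storage with free(v.data) when done.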

Dynamic buffer growth · realloc() safe pattern · AT command parsing

3. memalign() and posix_memalign() — Aligned Memory Allocation

Standard malloc() returns memory aligned for any basic C type, which in practice means 8 or 16 bytes depending on the platform. But certain hardware requires stricter alignment:

DMA engines — often require 64-byte or page-aligned (4096-byte) buffers.
SIMD/NEON/SSE — 16-byte or 32-byte alignment for vectorized operations.
Cache line alignment — 64-byte alignment to prevent false sharing in multi-threaded code.
Crypto accelerators — hardware AES blocks may require 16-byte-aligned input.

#include <malloc.h>
void *memalign(size_t boundary, size_t size);
/* boundary MUST be a power of two */
/* Returns aligned pointer or NULL on failure */

#include <stdlib.h>
int posix_memalign(void **memptr, size_t alignment, size_t size);
/* alignment must be a power of two AND a multiple of sizeof(void *) */
/* Returns 0 on success, positive error number on failure            */
/* (does NOT set errno; check the return value directly)             */

Prefer posix_memalign() for new code. It is standardized by POSIX, supported across all modern Linux distros, and its error-return convention (positive integer rather than -1/errno) prevents confusion. Both functions return memory that should be freed with free() on Linux.
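The alignment rule and the return-value convention can be wrapped up once and reused. This is a minimal sketch, assuming a POSIX system; valid_posix_alignment() and aligned_zalloc() are hypothetical helper names introduced here for illustration.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L   /* expose posix_memalign() under strict -std= modes */
#endif
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: 1 if alignment is acceptable to posix_memalign(),
 * i.e. a power of two that is also a multiple of sizeof(void *). */
static int valid_posix_alignment(size_t alignment)
{
    int power_of_two = alignment != 0 && (alignment & (alignment - 1)) == 0;
    return power_of_two && alignment % sizeof(void *) == 0;
}

/* Hypothetical wrapper: aligned, zero-filled allocation with a
 * malloc()-style return value. Release the result with free(). */
static void *aligned_zalloc(size_t alignment, size_t size)
{
    void *p = NULL;
    if (size == 0 || !valid_posix_alignment(alignment))
        return NULL;
    if (posix_memalign(&p, alignment, size) != 0)
        return NULL;               /* error code is the return value, not errno */
    memset(p, 0, size);
    return p;
}
```

For example, aligned_zalloc(64, 256) yields a zeroed 256-byte block on a 64-byte boundary, while aligned_zalloc(24, 256) is rejected because 24 is not a power of two.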

Alignment Visualized (16-byte boundary)

0x1008    malloc(): may land on any default-aligned address (8-byte aligned in this illustration)
…
0x1010    memalign(16, …): always a multiple of 16
…
0x1020    next 16-byte-aligned slot
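Rounding an address up to the next boundary is a one-line bit trick when the alignment is a power of two. The helpers below are an illustrative sketch with made-up names (align_up(), is_aligned()); they show the arithmetic only and do not allocate anything.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: rounds addr up to the next multiple of alignment.
 * alignment must be a power of two. */
static uintptr_t align_up(uintptr_t addr, size_t alignment)
{
    uintptr_t mask = (uintptr_t)alignment - 1;
    return (addr + mask) & ~mask;
}

/* Hypothetical helper: 1 if p sits on an alignment-byte boundary. */
static int is_aligned(const void *p, size_t alignment)
{
    return ((uintptr_t)p & ((uintptr_t)alignment - 1)) == 0;
}
```

With a 16-byte boundary, align_up(0x1001, 16) gives 0x1010, an already aligned 0x1010 stays at 0x1010, and 0x1011 moves up to 0x1020.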

💡 Example 3: Cache-Line-Aligned Structure for Multi-Threaded Sensor Processing

In a multi-core embedded Linux system, two threads often access different sensor channels stored in adjacent elements of an array. When both elements fit within the same 64-byte cache line, writes to one channel cause the other core to invalidate its cache copy — a performance problem called false sharing. Aligning each channel to a cache line boundary eliminates this.

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <string.h>
#include <pthread.h>

#define CACHE_LINE_SIZE  64   /* typical x86 and ARM cache line */
#define NUM_CHANNELS     4

/* Each channel struct is padded to occupy exactly one cache line. */
/* This prevents false sharing when different threads access       */
/* different channels in the array.                                */
typedef struct {
    uint32_t  sensor_id;
    float     last_reading;
    uint64_t  update_count;
    uint8_t   status;
    /* Pad to fill the cache line. The members above are naturally  */
    /* aligned with no internal gaps, so no packed attribute is     */
    /* needed (packing can force slower byte-wise accesses on some  */
    /* targets).                                                    */
    uint8_t   _pad[CACHE_LINE_SIZE - sizeof(uint32_t) - sizeof(float)
                      - sizeof(uint64_t) - sizeof(uint8_t)];
} SensorChannel;

/* Verify at compile time that we padded correctly */
_Static_assert(sizeof(SensorChannel) == CACHE_LINE_SIZE,
               "SensorChannel must be exactly one cache line");

int main(void)
{
    SensorChannel *channels;
    void *raw = NULL;
    int err;

    /*
     * posix_memalign() allocates an array of NUM_CHANNELS structs,
     * each 64 bytes, starting at a 64-byte boundary.
     * This guarantees each element starts at a new cache line.
     * The result goes through a void * first: casting &channels to
     * void ** would break strict-aliasing rules.
     */
    err = posix_memalign(&raw,
                         CACHE_LINE_SIZE,               /* alignment  */
                         NUM_CHANNELS * sizeof(SensorChannel)); /* size */
    if (err != 0) {
        fprintf(stderr, "posix_memalign failed: error %d\n", err);
        return 1;
    }
    channels = raw;

    /* Zero-initialize the aligned block */
    memset(channels, 0, NUM_CHANNELS * sizeof(SensorChannel));

    /* Verify alignment */
    printf("Base address: %p\n", (void*)channels);
    printf("Address & 63 = %lu (must be 0 for 64-byte alignment)\n",
           (unsigned long)((uintptr_t)channels & 63));

    /* Simulate two threads updating different channels */
    channels[0].sensor_id    = 1001;
    channels[0].last_reading = 23.5f;
    channels[0].update_count = 1;

    channels[1].sensor_id    = 1002;
    channels[1].last_reading = 45.7f;
    channels[1].update_count = 1;

    /* Each channel update touches a different cache line — no false sharing */
    for (int i = 0; i < NUM_CHANNELS; i++) {
        printf("Channel %d | addr: %p | sensor_id: %u | reading: %.1f\n",
               i,
               (void*)&channels[i],
               channels[i].sensor_id,
               channels[i].last_reading);
    }

    free(channels);  /* aligned memory freed with regular free() on Linux */
    return 0;
}

What to observe: When you print the address, addr & 63 will always be 0, confirming cache-line alignment. Each element’s address differs by exactly 64 bytes. In a multi-threaded workload, this can provide a measurable throughput improvement by eliminating cross-core cache invalidations.
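As an alternative to hand-computed padding, C11 lets the compiler do the rounding. The sketch below is one possible variant, not the approach used in the example above; AlignedChannel is a hypothetical type and the 64-byte line size is an assumption about the target CPU.

```c
#include <stdalign.h>
#include <stdint.h>

#define CACHE_LINE_SIZE 64   /* assumption: 64-byte cache lines */

/* alignas on the first member raises the alignment of the whole struct,
 * and the compiler rounds sizeof up to a multiple of that alignment. */
typedef struct {
    alignas(CACHE_LINE_SIZE) uint32_t sensor_id;
    float     last_reading;
    uint64_t  update_count;
    uint8_t   status;
} AlignedChannel;

_Static_assert(sizeof(AlignedChannel) == CACHE_LINE_SIZE,
               "AlignedChannel must fill exactly one cache line");
_Static_assert(alignof(AlignedChannel) == CACHE_LINE_SIZE,
               "AlignedChannel must be cache-line aligned");
```

Static and automatic arrays of AlignedChannel are then placed on 64-byte boundaries by the compiler. Heap arrays still need posix_memalign(), because plain malloc() only promises the default alignment.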

DMA Buffer Example (posix_memalign for page-aligned buffer)

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>   /* uint8_t */
#include <string.h>

#define DMA_ALIGN  4096   /* page-aligned, standard for many DMA controllers */
#define DMA_BUF_SIZE  (16 * 1024)  /* 16 KB DMA transfer buffer */

int main(void)
{
    uint8_t *dma_buf;
    void *raw = NULL;
    int err;

    /* Allocate a 16KB buffer aligned to a 4096-byte page boundary */
    /* Many DMA controllers require the source/destination address  */
    /* to be at least page-aligned                                  */
    err = posix_memalign(&raw, DMA_ALIGN, DMA_BUF_SIZE);
    if (err != 0) {
        fprintf(stderr, "DMA buffer allocation failed: %d\n", err);
        return 1;
    }
    dma_buf = raw;   /* avoids the strict-aliasing-unsafe (void **) cast */

    memset(dma_buf, 0xAB, DMA_BUF_SIZE);  /* fill with test pattern */

    printf("DMA buffer at: %p\n", (void*)dma_buf);
    printf("Page offset  : %lu (must be 0)\n",
           (unsigned long)dma_buf % DMA_ALIGN);
    printf("First byte   : 0x%02X\n", dma_buf[0]);

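    /* -- Here you would pass dma_buf to your kernel DMA API --    */
    /* e.g., ioctl(fd, DMA_START, dma_buf);                        */
    /* or write to a /dev/mem-mapped peripheral register.          */
    /* Note: user-space memory is only virtually contiguous, so    */
    /* the driver must still pin and map the pages for the device. */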

    free(dma_buf);
    return 0;
}
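The page size is not 4096 bytes everywhere; some ARM64 kernels, for instance, are configured with 16 KB or 64 KB pages. A hedged alternative to the hard-coded constant is to query it at run time with sysconf(). The helper name page_aligned_alloc() below is invented for this sketch.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L   /* expose posix_memalign() and sysconf() */
#endif
#include <stddef.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: allocates size bytes aligned to the system page
 * size, queried at run time. Returns NULL on failure; free() releases it. */
static void *page_aligned_alloc(size_t size)
{
    long page = sysconf(_SC_PAGESIZE);
    void *p = NULL;
    if (page <= 0)
        return NULL;               /* page size unavailable */
    if (posix_memalign(&p, (size_t)page, size) != 0)
        return NULL;
    return p;
}
```

Calling page_aligned_alloc(16 * 1024) returns a buffer whose address is a multiple of whatever page size the running kernel reports.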

posix_memalign() · Cache line alignment · DMA buffer · False sharing prevention

🎯 Interview Questions: calloc, realloc, and Aligned Memory

#	Question	Answer / Key Points
1	What is the difference between malloc(n) and calloc(1, n)?	Both allocate n bytes, but calloc() zero-initializes the memory. calloc() also performs an internal overflow check on the product numitems * size, making it safer for array allocations.
2	Why should you never do: ptr = realloc(ptr, new_size)?	If realloc() fails, it returns NULL and the original block remains valid. Assigning NULL back to ptr destroys your only reference to the original block, causing a memory leak. Always use a temporary pointer.
3	Does realloc() zero-initialize the newly added bytes when growing a block?	No. Only the original data is preserved. The added bytes are uninitialized. Use memset() on the new region if you need it zeroed.
4	What happens to pointers into a realloc’d block after the call?	They may be invalidated. realloc() can move the block to a new address, and only the pointer it returns is guaranteed to be valid afterwards. Any saved copy of the old pointer, and all interior pointers (pointing into the block at an offset), must be recomputed from the returned pointer.
5	Why is aligned memory needed for DMA transfers?	DMA engines operate on physical memory addresses. Many require the source/destination to be aligned to a page boundary (4096 bytes) or the controller’s native word size. Misaligned DMA causes undefined behavior, bus errors, or data corruption.
6	What is the alignment constraint for posix_memalign()?	The alignment must be a power of two and a multiple of sizeof(void *). On a 64-bit system (sizeof(void *) = 8), valid values include 8, 16, 32, 64, 4096, etc.
7	What is false sharing and how does aligned allocation prevent it?	False sharing occurs when two CPU cores modify different variables that share a cache line, causing unnecessary cache invalidations. Aligning shared data structures to cache line boundaries (typically 64 bytes) ensures each object occupies its own cache line.
8	Can you free() memory allocated by posix_memalign() on Linux?	Yes, on Linux/glibc, free() works correctly with posix_memalign() and memalign()-allocated memory. Some older/non-glibc implementations have restrictions — check your platform’s documentation.
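One common way to sidestep the interior-pointer problem from question 4 is to store offsets and rebuild pointers on demand. The sketch below illustrates the idea with a hypothetical CursorBuf type; the names are made up for this example.

```c
#include <stdlib.h>

/* Hypothetical buffer that remembers a position as an offset, not a
 * pointer, so the position survives a realloc() that moves the block. */
typedef struct {
    char  *data;
    size_t size;
    size_t cursor;   /* offset of the "current" byte within data */
} CursorBuf;

/* Grows the buffer to new_size bytes. Returns 0 on success, -1 on
 * failure (the buffer is left untouched). */
static int cursor_buf_grow(CursorBuf *b, size_t new_size)
{
    char *tmp = realloc(b->data, new_size);
    if (tmp == NULL)
        return -1;
    b->data = tmp;          /* the old address must not be used any more */
    b->size = new_size;
    return 0;
}

/* Rebuilds the interior pointer from the (possibly new) base address. */
static char *cursor_buf_ptr(const CursorBuf *b)
{
    return b->data + b->cursor;
}
```

Because cursor is an offset, cursor_buf_ptr() returns a valid address both before and after cursor_buf_grow(), regardless of whether the block moved.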

Summary

calloc(n, size) allocates n × size zero-initialized bytes and guards against integer overflow in the size calculation.
realloc(ptr, new_size) resizes an existing block, possibly moving it. Always use a temporary pointer to handle failure without leaking the original block.
Bytes added by realloc() are not zero-initialized — use memset() if you need a clean extension.
posix_memalign() is the portable, standards-compliant way to allocate memory at power-of-two alignment boundaries — essential for DMA buffers, SIMD, and cache-line-aligned structures.
All three functions return memory that is freed with the regular free() call on Linux/glibc.

Next in the Series

Part 4: alloca(), stack-based dynamic allocation. Faster than malloc() and automatically freed on return, it suits small temporary scratch buffers in embedded code, provided the stack has room to spare.


embeddedpathashala.com