3 — Section 3.3
3.3.1 to 3.3.4.5
Intermediate
This tutorial covers the first stage of LC3 encoding: the complete encoder module pipeline, how raw PCM input is scaled to a standard range, and how the Low Delay MDCT (LD-MDCT) transforms that time-domain signal into spectral frequency coefficients grouped into bands. Every frame passes through these stages before any other processing.
3.3.1 Encoder Module Pipeline
The LC3 encoder is a spectral transform coder. It takes time-domain PCM samples, converts them to frequency-domain coefficients using an MDCT, then applies several noise-shaping and quantization stages before packing everything into the output bitstream. Here is the complete encoder pipeline in order:
| Step | Module | What It Does | Section |
|---|---|---|---|
| 1 | Input scaling | Scales PCM to 16-bit range without losing precision | 3.3.3 |
| 2 | LD-MDCT | Converts time-domain samples to frequency-domain coefficients X(k) | 3.3.4 |
| 3 | Bandwidth Detector | Detects if signal is bandlimited (e.g., narrowband speech upsampled to wideband) | 3.3.5 |
| 4 | Attack Detector | Detects transients in the audio to improve quality at attack events | 3.3.6 |
| 5 | SNS (Spectral Noise Shaping) | Applies scale factors to shape quantization noise to be less perceptible | 3.3.7 |
| 6 | TNS (Temporal Noise Shaping) | Applies LPC-based filter in frequency domain to control temporal noise shape | 3.3.8 |
| 7 | LTPF (Long Term Postfilter) | Estimates pitch lag for a decoder-side postfilter that reduces noise in spectral valleys | 3.3.9 |
| 8 | Spectral Quantization | Quantizes MDCT coefficients using a global gain and scalar dead-zone quantizer | 3.3.10 |
| 9 | Residual Coding | Uses leftover bits to refine non-zero coefficients | 3.3.11 |
| 10 | Noise Level Estimation | Computes a noise level parameter for the decoder’s noise-filling step | 3.3.12 |
| 11 | Bitstream Encoding | Packs all side information and arithmetically encoded spectral data into the payload | 3.3.13 |
3.3.2 Input Signal
The input to the encoder for frame b is denoted xb(n), where n = 0 to NF−1. The newest sample is at index NF−1. Samples from previous frames are accessed using negative indices: xb(−1) is the last sample of the previous frame, xb(−2) is the second-to-last, and so on.
The input is standard integer PCM with s bits per sample, with values in the range:
−2^(s−1) ≤ xb(n) ≤ 2^(s−1) − 1
For example, for 16-bit PCM the range is [−32768, 32767]. For 24-bit PCM it is [−8388608, 8388607].
3.3.3 Input Signal Scaling
Before any processing, the input PCM is scaled so that all bit depths (16, 24, and 32 bits) share the same internal range. The target range is [−32768, 32767], the native 16-bit PCM range.
Step 1: Scale to 16-bit range (without losing precision)
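xs0(n) = xb(n) × 2^(16 − s), for n = 0 … NF−1
Here s is the bit depth of the input. For 16-bit input (s = 16) the exponent is 0, so xs0 = xb (no scaling). For 24-bit input (s = 24), xs0 = xb × 2^(−8) = xb / 256. The result is kept as a fractional value, so the extra low-order bits are not discarded.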
Step 2: Clip to native 16-bit range
xs(n) = 32767, if xs0(n) > 32767
xs(n) = −32768, if xs0(n) < −32768
xs(n) = xs0(n), otherwise
This clipping only fires for floating-point inputs that exceed the 16-bit range. For normal integer PCM it is not needed, but the spec requires it for robustness.
3.3.4 Low Delay MDCT Analysis
3.3.4.1 Overview
The LD-MDCT (Low Delay Modified Discrete Cosine Transform) converts time-domain samples into NF frequency-domain spectral coefficients. These coefficients represent which frequencies are present in the audio and how strongly. Everything downstream works on these coefficients.
The key property of the Low Delay variant is its asymmetric window: the last Z samples of the window are zero. Because those samples contribute nothing to the transform, the encoder does not have to wait for them, which gives a shorter look-ahead delay than a standard MDCT with a symmetric window. The trade-off is a slightly less optimal frequency response compared to a symmetric window.
The LD-MDCT produces: spectral coefficients X(k), energy per band EB(b), and a near-Nyquist flag.
3.3.4.2 Update Time Buffer
The MDCT operates on a buffer t of size 2×NF. The first 2×NF − Z entries are filled from the scaled input signal, and the last Z entries are set to zero:
t(n) = xs(Z − NF + n), for n = 0 … 2×NF − 1 − Z
t(2×NF − Z + n) = 0, for n = 0 … Z−1
The first line copies scaled input samples, reaching into the previous frame through negative indices whenever Z − NF + n < 0. The zeros in the second line coincide with the zero-valued tail of the LD-MDCT window: they stand in for samples that have not arrived yet, so the encoder does not have to wait for them, which is what reduces the delay.
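The buffer update can be sketched in C as follows. This is a minimal illustration, assuming the first 2×NF − Z entries are t(n) = xs(Z − NF + n) and that only one previous frame of history is needed (true whenever 0 ≤ Z ≤ NF). The names get_xs, build_time_buffer, and xs_prev are hypothetical and are not taken from any library.

```c
/* Hypothetical accessor: returns xs(i) for i in [-NF, NF-1].
   A negative index reads from the previous frame, so xs(-1) is the
   last sample of xs_prev, as described in Section 3.3.2. */
static float get_xs(const float *xs, const float *xs_prev, int i, int NF)
{
    return (i >= 0) ? xs[i] : xs_prev[NF + i];
}

/* Fill t[0 .. 2*NF-1]: the first 2*NF-Z entries come from the scaled
   signal, the last Z entries are zero. */
static void build_time_buffer(const float *xs, const float *xs_prev,
                              int NF, int Z, float *t)
{
    for (int n = 0; n < 2 * NF - Z; n++)
        t[n] = get_xs(xs, xs_prev, Z - NF + n, NF);
    for (int n = 0; n < Z; n++)
        t[2 * NF - Z + n] = 0.0f;
}
```

With a toy frame of NF = 4 and Z = 1, a previous frame {1, 2, 3, 4} and a current frame {5, 6, 7, 8} give t = {2, 3, 4, 5, 6, 7, 8, 0}: the buffer ends on the newest sample followed by Z zeros.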
3.3.4.3 Time-Frequency Transformation (MDCT equation)
The MDCT computes NF frequency coefficients from the 2×NF time buffer. The formula is the standard MDCT, applied with the asymmetric LD window:
X(k) = √(2/NF) × Σ w(n) × t(n) × cos[(π/NF) × (n + 1/2 + NF/2) × (k + 1/2)], summed over n = 0 … 2×NF−1
for k = 0 … NF−1
Where w(n) is the pre-computed LD-MDCT window from Section 3.7.3. The window was computed by an off-line optimization and cannot be derived analytically — the tabulated values must be used.
Encoded coefficient range: Not all NF coefficients are used. For NF = 480 (48 kHz, 10 ms), only coefficients X(0..399) are encoded — this limits the audio bandwidth to 20 kHz. For NF = 360 (48 kHz, 7.5 ms), only X(0..299) are used.
| NF | NE (encoded lines) | Max bandwidth |
|---|---|---|
| 480 (48/44.1 kHz, 10 ms) | 400 | 20 kHz at 48 kHz (about 18.4 kHz at 44.1 kHz) |
| 360 (48/44.1 kHz, 7.5 ms) | 300 | 20 kHz at 48 kHz (about 18.4 kHz at 44.1 kHz) |
| All other NF values | NF (all coefficients) | Up to Nyquist |
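The rule in the table is small enough to express as a lookup. The sketch below is illustrative only: encoded_lines and max_bandwidth_hz are hypothetical helper names, and the bandwidth estimate assumes that the NF spectral lines evenly cover 0 to fs/2, so each line spans (fs/2)/NF Hz.

```c
/* Number of encoded spectral lines NE for a frame length NF:
   capped at 400 for NF = 480 and at 300 for NF = 360,
   otherwise all NF coefficients are encoded. */
static int encoded_lines(int NF)
{
    if (NF == 480) return 400;
    if (NF == 360) return 300;
    return NF;
}

/* Approximate upper edge of the encoded band in Hz,
   assuming each spectral line spans (fs/2)/NF Hz. */
static double max_bandwidth_hz(int NF, double fs_hz)
{
    return encoded_lines(NF) * (fs_hz / 2.0) / NF;
}
```

For example, max_bandwidth_hz(480, 48000.0) and max_bandwidth_hz(360, 48000.0) both evaluate to 20000 Hz, while max_bandwidth_hz(480, 44100.0) evaluates to 18375 Hz.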
3.3.4.4 Energy Estimation per Band
After the MDCT, the coefficients are grouped into Nb frequency bands (64 in most cases). The energy per band EB(b) is the mean of the squared coefficients in that band:
EB(b) = [Σ X(k)², summed over k = Ifs(b) … Ifs(b+1)−1] / (Ifs(b+1) − Ifs(b)), for b = 0 … Nb−1
Where Ifs(b) is the band index table (different for 10 ms vs 7.5 ms frames, in Section 3.7.1 and 3.7.2). The number of bands Nb = 64 for all configurations, except 7.5 ms at 8 kHz where Nb = 60.
EB(b) is used by SNS (to estimate scale factors), the bandwidth detector, and the near-Nyquist detector in Section 3.3.4.5, whose flag in turn controls TNS. This makes it one of the most widely reused derived quantities in the encoder.
3.3.4.5 Near Nyquist Detector
The near-Nyquist detector is only active for sampling rates ≤ 32 kHz. At low sampling rates, aliasing-like spectral structures can appear near the Nyquist frequency. If TNS processes these structures it produces distortion, so the detector disables TNS when such signals are present.
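It compares the summed energy of the upper bands (nn_idx and above) with the summed energy of the lower bands (below nn_idx):
if Σ EB(b) for b = nn_idx … Nb−1 > NN_thresh × Σ EB(b) for b = 0 … nn_idx−1:
near_nyquist_flag = 1 (disable TNS)
else:
near_nyquist_flag = 0 (TNS can proceed normally)
Here NN_thresh is a fixed threshold constant and nn_idx is the boundary band, which depends on the frame duration: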
| Frame Duration | nn_idx (boundary band) |
|---|---|
| 10 ms | Nb − 2 |
| 7.5 ms | Nb − 4 |
The near_nyquist_flag is passed to TNS (Section 3.3.8) and to the LTPF activation check (Section 3.3.9.8). It does not affect spectral quantization.
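The detector reduces to two partial sums over EB and one comparison. The sketch below assumes the flag is set when the energy at and above nn_idx exceeds a threshold multiple of the energy below it. The names nn_boundary and near_nyquist are hypothetical, and the threshold is a caller-supplied parameter rather than a value taken from this text. The sampling-rate condition (the detector only being active at or below 32 kHz) is left to the caller.

```c
#include <stdbool.h>

/* Boundary band nn_idx: Nb - 2 for 10 ms frames, Nb - 4 for 7.5 ms frames. */
static int nn_boundary(int Nb, bool is_7m5)
{
    return is_7m5 ? Nb - 4 : Nb - 2;
}

/* Returns true (TNS should be disabled) when the summed energy of bands
   nn_idx .. Nb-1 exceeds nn_thresh times the summed energy of bands
   0 .. nn_idx-1. nn_thresh is an assumption supplied by the caller. */
static bool near_nyquist(const float *EB, int Nb, int nn_idx, float nn_thresh)
{
    float low = 0.0f, high = 0.0f;

    for (int b = 0; b < nn_idx; b++)
        low += EB[b];
    for (int b = nn_idx; b < Nb; b++)
        high += EB[b];

    return high > nn_thresh * low;
}
```

A caller would typically compute nn_idx once per configuration with nn_boundary and then call near_nyquist once per frame, after EB has been updated.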
LD-MDCT in C — Key Implementation Notes
In practice, the MDCT is not computed with the direct formula above; it is implemented using an FFT with pre- and post-processing. The open-source liblc3 library (commonly used alongside BlueZ) takes this approach in its mdct.c. The listing below is a simplified conceptual equivalent of the front-end stages, not code copied from the library:

    /* Simplified conceptual sketch of the encoder front end.
       Assumes pcm_in, xs, xs_prev, t, X, EB, Ifs, s, NF, Z and Nb are
       declared by the caller; these names are illustrative. */

    /* Step 1: Input scaling (Section 3.3.3)
       Scale from s-bit PCM to the 16-bit range [-32768, 32767] */
    float scale = powf(2.0f, (float)(16 - s));   /* s = bit depth */
    for (int n = 0; n < NF; n++)
        xs[n] = (float)pcm_in[n] * scale;

    /* Clip to 16-bit range */
    for (int n = 0; n < NF; n++) {
        if (xs[n] >  32767.0f) xs[n] =  32767.0f;
        if (xs[n] < -32768.0f) xs[n] = -32768.0f;
    }

    /* Step 2: Build MDCT time buffer t[0..2*NF-1] (Section 3.3.4.2)
       Copy current + previous frame samples, then zero the last Z samples.
       get_xs() is a helper (not shown) that returns xs[i] for i >= 0 and
       xs_prev[NF + i] for i < 0. */
    for (int n = 0; n < 2*NF - Z; n++)
        t[n] = get_xs(xs, xs_prev, Z - NF + n, NF);
    for (int n = 0; n < Z; n++)
        t[2*NF - Z + n] = 0.0f;

    /* Step 3: Apply LD-MDCT window w[] then compute MDCT (Section 3.3.4.3)
       In practice: an FFT (typically complex, of size NF/2) with
       pre/post-twiddle factors */

    /* Step 4: Energy per band EB (Section 3.3.4.4)
       Ifs[] holds Nb + 1 band boundaries */
    for (int b = 0; b < Nb; b++) {
        float energy = 0.0f;
        int start = Ifs[b], stop = Ifs[b+1];
        for (int k = start; k < stop; k++)
            energy += X[k] * X[k];
        EB[b] = energy / (float)(stop - start);   /* average energy in band */
    }
Next in this Series
Section 3.3.5 and 3.3.6 — Bandwidth Detector and Time Domain Attack Detector
