LC3 Encoding: Spectral Quantization, Residual Coding and Noise Level

 

Chapter 3, Sections 3.3.10 – 3.3.12 — Global Gain, Quantization, Truncation, Residual, Noise Filling
Level: Advanced

After SNS and TNS have shaped the spectrum, the actual compression happens here: the TNS-filtered coefficients Xf(k) are scalar quantized with a single global gain. The global gain sets the quantizer step size. Each coefficient is divided by it before rounding, so a higher gain means coarser quantization (fewer bits) and a lower gain means finer quantization (more bits). The encoder runs a bisection search for the global gain whose estimated bit consumption best matches the available bit budget. Bits left over after arithmetic coding are spent on residual refinement, while the noise level that controls decoder-side noise filling is sent in a fixed 3-bit field.

Keywords

Global Gain · ggind · Dead-Zone Quantization · Bisection Search · lsbMode · Residual Coding · Noise Level · FNF

3.3.10.1 Bit Budget for the Spectrum

The spectral quantizer gets only the bits that are left after all side information is accounted for:

nbits_spec = nbits − nbits_bw − nbits_TNS − nbits_LTPF − 38(SNS) − 8(gain) − 3(NF) − nbits_ari
| Field | Bits | Note |
|---|---|---|
| SNS VQ (both stages) | 38 (fixed) | Stage 1 = 10, Stage 2 = 28 |
| Global gain index | 8 (fixed) | ggind in [0, 255] |
| Noise level FNF | 3 (fixed) | Values 0–7 |
| Bandwidth Pbw | 0–3 | Fixed per sampling rate, from Table 3.6 |
| TNS | Variable | 1 bit (activation flag) per TNS filter when inactive, more when active |
| LTPF | 1 or 11 | 1 if no pitch, 11 if pitch present |
| Arithmetic coder overhead (nbits_ari) | Variable | Depends on NE and nbits |
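The budget itself is plain subtraction. A minimal Python sketch, assuming the variable contributions (nbits_bw, nbits_TNS, nbits_LTPF, nbits_ari) have already been determined; `spectrum_bit_budget` is a hypothetical helper name and the example values are made up for illustration.

```python
def spectrum_bit_budget(nbits: int, nbits_bw: int, nbits_tns: int,
                        nbits_ltpf: int, nbits_ari: int) -> int:
    """Bits left for the quantized spectrum after all side information."""
    NBITS_SNS = 38   # SNS VQ: stage 1 (10) + stage 2 (28)
    NBITS_GAIN = 8   # global gain index ggind
    NBITS_NF = 3     # noise level FNF
    return (nbits - nbits_bw - nbits_tns - nbits_ltpf
            - NBITS_SNS - NBITS_GAIN - NBITS_NF - nbits_ari)


# Hypothetical 800-bit frame: nbits_bw = 3, nbits_TNS = 2, nbits_LTPF = 1,
# and an assumed nbits_ari of 12.
print(spectrum_bit_budget(800, 3, 2, 1, 12))  # → 733
```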

3.3.10.2 First Global Gain Estimation — Bisection Search

The encoder runs an 8-iteration bisection search to find the global gain index ggind (0–255) such that the resulting quantized spectrum uses approximately nbits_spec bits.

Gain offset ggoff is computed once per frame from the bitrate and sampling rate:

ggoff = −min(115, floor(nbits / (10×(fsind+1)))) − 105 − 5×(fsind+1)
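A direct transcription of the ggoff formula, as a small sketch (the function name is hypothetical). Because nbits and fsind are non-negative integers, Python's `//` performs the floor division.

```python
def global_gain_offset(nbits: int, fs_ind: int) -> int:
    """ggoff = -min(115, floor(nbits / (10*(fsind+1)))) - 105 - 5*(fsind+1)."""
    return -min(115, nbits // (10 * (fs_ind + 1))) - 105 - 5 * (fs_ind + 1)


print(global_gain_offset(800, 4))  # fsind = 4 (48 kHz), 800 bits → -146
```

More bits make ggoff more negative, so the same ggind yields a smaller gg and finer quantization, until the min(115, …) cap takes effect.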

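For the bisection, the energy E[k] (in dB) of each block of 4 consecutive TNS-filtered coefficients, Xf(4k) to Xf(4k+3), is precomputed, and the bit count for a candidate ggind is estimated from these block energies. The search starts at ggind = 255 with a step of 128. In each of the 8 iterations, ggind is lowered by the current step, the step is undone if the estimated bit count exceeds the adjusted budget nbits_spec′, and the step is halved. The adjusted budget includes a carry-over offset derived from the previous frame's actual vs. estimated bit usage.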
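The bisection loop does not depend on the details of the bit estimator, so the sketch below takes the estimator as a callable. The spec's energy-based estimator is not reproduced in this section; `estimate_bits`, the helper names and the toy linear estimator in the test are hypothetical, and the spec's loop may contain extra guards (for example for silent frames) that are omitted here. The small constant added before the logarithm is assumed to be 2^-31.

```python
import math
from typing import Callable, List, Sequence


def block_energies_db(xf: Sequence[float], eps: float = 2.0 ** -31) -> List[float]:
    """Energy in dB of each block of 4 consecutive coefficients."""
    # eps avoids log10(0) for silent blocks (assumed value)
    return [10.0 * math.log10(eps + sum(x * x for x in xf[4 * k:4 * k + 4]))
            for k in range(len(xf) // 4)]


def bisect_gain_index(estimate_bits: Callable[[int], float], budget: float) -> int:
    """8-step bisection over ggind in [0, 255].

    estimate_bits(ggind) must be non-increasing in ggind (coarser gain, fewer bits).
    Returns the smallest ggind whose estimate fits the budget, or 255 if none does.
    """
    gg_ind, step = 255, 256
    for _ in range(8):
        step >>= 1                  # 128, 64, ..., 1
        gg_ind -= step              # try a finer gain
        if estimate_bits(gg_ind) > budget:
            gg_ind += step          # too many bits: undo this step
    return gg_ind
```

With a monotone estimator this is a binary search for the lower edge of the set of gains that fit the budget, which is why 8 iterations cover all 256 indices.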

A minimum gain ggmin is enforced to prevent the quantized spectrum from exceeding the 16-bit range [−32768, 32767]:

ggmin = ceil(28 × log10(Xfmax / (32768 − 0.375))) − ggoff
ggind = max(ggind, ggmin)
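The lower limit follows from requiring floor(Xfmax/gg + 0.375) to stay below 32768. A sketch of that check, with a hypothetical function name; returning 0 for an all-zero frame is an assumption (it imposes no constraint, since ggind ≥ 0 anyway).

```python
import math
from typing import Sequence


def min_gain_index(xf: Sequence[float], gg_off: int) -> int:
    """Smallest ggind that keeps every quantized value within 16 bits."""
    x_max = max((abs(x) for x in xf), default=0.0)
    if x_max == 0.0:
        return 0  # assumption: an all-zero frame needs no lower limit
    return math.ceil(28.0 * math.log10(x_max / (32768 - 0.375))) - gg_off


gg_ind = max(150, min_gain_index([100000.0, -2.0], gg_off=-146))
print(gg_ind)  # → 160
```

Here a bisection result of 150 would give gg ≈ 2.51 and push 100000/gg above 32767, so the index is raised to 160.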

3.3.10.3 Quantization — Dead-Zone Scalar Quantizer

The global gain is first dequantized:

gg = 10^((ggind + ggoff) / 28)

Each coefficient is then quantized with a dead-zone scalar quantizer that uses a rounding offset of 0.375:

Xq(n) = floor(Xf(n)/gg + 0.375), if Xf(n) ≥ 0
Xq(n) = ceil(Xf(n)/gg − 0.375), if Xf(n) < 0

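Using a rounding offset of 0.375 instead of the usual 0.5 moves every decision threshold up by 0.125·gg, which biases the quantizer toward smaller magnitudes and produces more zeros, which compress better. In particular, the dead zone widens: any coefficient with |Xf(n)| < 0.625·gg is quantized to 0, compared with 0.5·gg for plain rounding.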
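A minimal sketch of the quantizer, combining the gain dequantization and the two rounding rules; the function name is hypothetical, and clamping to [−32768, 32767] mirrors the 16-bit range that ggmin is meant to protect.

```python
import math
from typing import List, Sequence


def quantize(xf: Sequence[float], gg_ind: int, gg_off: int) -> List[int]:
    """Dead-zone scalar quantizer with rounding offset 0.375."""
    gg = 10.0 ** ((gg_ind + gg_off) / 28.0)   # dequantized global gain
    xq = []
    for x in xf:
        if x >= 0:
            q = min(math.floor(x / gg + 0.375), 32767)
        else:
            q = max(math.ceil(x / gg - 0.375), -32768)
        xq.append(q)
    return xq


# gg_ind + gg_off = 0 gives gg = 1.0, so the inputs are already in quantizer steps.
print(quantize([0.6, 0.624, 0.625, -0.6, 1.6, -1.7], gg_ind=146, gg_off=-146))
# → [0, 0, 1, 0, 1, -2]
```

The second and third inputs sit on either side of the dead-zone edge: 0.624·gg still maps to 0 while 0.625·gg maps to 1.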

3.3.10.4 Bit Consumption Estimation

After quantization, the encoder estimates how many bits the arithmetic coder would need for Xq. This estimate drives the truncation of the spectrum, the lsbMode decision and the global gain adjustment described below. The algorithm walks through spectral 2-tuples (pairs of coefficients), tracking the arithmetic coder context and accumulating bit costs from the ac_spec tables of Section 3.7.7.

Two bitrate flags are set first:

rateFlag = 512 if nbits > 160 + fsind×160, else 0
modeFlag = 1 if nbits ≥ 480 + fsind×160, else 0

rateFlag shifts the arithmetic coder context table offset for higher-bitrate content. modeFlag enables the high-bitrate LSB mode where the LSBs of large coefficients are sent separately in the residual section.
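Both flags are simple threshold tests on nbits. A sketch with a hypothetical function name:

```python
from typing import Tuple


def bitrate_flags(nbits: int, fs_ind: int) -> Tuple[int, int]:
    """Return (rateFlag, modeFlag) for a frame of nbits bits."""
    rate_flag = 512 if nbits > 160 + fs_ind * 160 else 0
    mode_flag = 1 if nbits >= 480 + fs_ind * 160 else 0
    return rate_flag, mode_flag


# At fsind = 4 (48 kHz) the thresholds are 800 bits (rateFlag) and 1120 bits (modeFlag).
for nbits in (800, 801, 1120):
    print(nbits, bitrate_flags(nbits, 4))
```

Note the strict inequality for rateFlag versus the non-strict one for modeFlag: 800 bits at 48 kHz does not yet set rateFlag, but 1120 bits does set modeFlag.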

3.3.10.5 Truncation and lsbMode

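If the estimated bit count exceeds nbits_spec, the spectrum is truncated. lastnz_trunc marks the end of the last non-zero 2-tuple that still fits in the budget, lastnz marks the end of the last non-zero 2-tuple overall, and every coefficient between them is set to zero: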

for k = lastnz_trunc to lastnz−1: Xq[k] = 0
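The truncation point can be found during the same pass that accumulates the estimate. The sketch below assumes the per-2-tuple bit costs have already been produced by the context-based estimator; `tuple_bits` is a hypothetical input, the costs are in whole bits for simplicity, and the initial value lastnz_trunc = 2 (first tuple always kept) is an assumption.

```python
import math
from typing import List, Sequence, Tuple


def truncate_spectrum(xq: List[int], tuple_bits: Sequence[float],
                      lastnz: int, nbits_spec: int) -> Tuple[int, int]:
    """Zero Xq[lastnz_trunc:lastnz] in place and return (lastnz_trunc, nbits_trunc).

    tuple_bits[j] is the (hypothetical, precomputed) cost in bits of coding
    the 2-tuple (Xq[2j], Xq[2j+1]).
    """
    total = 0.0
    lastnz_trunc, nbits_trunc = 2, 0   # assumed initial values
    for k in range(0, lastnz, 2):
        total += tuple_bits[k // 2]
        nonzero = xq[k] != 0 or xq[k + 1] != 0
        if nonzero and math.ceil(total) <= nbits_spec:
            lastnz_trunc, nbits_trunc = k + 2, math.ceil(total)
    for k in range(lastnz_trunc, lastnz):
        xq[k] = 0                      # drop tuples that no longer fit
    return lastnz_trunc, nbits_trunc
```

Because coding proceeds from low to high frequency, the coefficients that get zeroed are always the highest-frequency non-zero ones.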

The lsbMode flag is set to 1 when modeFlag=1 and even after truncation the bit count still exceeds nbits_spec. In lsbMode, the LSBs (least significant bits) of high-value coefficients are stripped from the arithmetic coder stream and stored separately in the backward-written side information area, providing fine-grained bit savings.

3.3.10.6 Global Gain Adjustment

After bit estimation, if the estimated bit count nbits_est overshoots nbits_spec, or undershoots it by more than a margin, ggind is adjusted by −1, +1 or +2 and the spectrum is re-quantized once. The margins delta and delta2 are adaptive: they grow with nbits_est, so high-bitrate frames tolerate a larger absolute mismatch than low-bitrate frames. Three breakpoint tables t1, t2, t3, all indexed by fsind 0–4, govern how the margins change with nbits_est:

t1 = {80, 230, 380, 530, 680}
t2 = {500, 1025, 1550, 2075, 2600}
t3 = {850, 1700, 2550, 3400, 4250}
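The text does not spell out how delta is computed from the tables. The sketch below follows our reading of the specification and should be checked against it: delta is a continuous piecewise-linear function of nbits_est with breakpoints t1, t2, t3, delta2 = delta + 2, ggind drops by 1 on a clear undershoot, and on an overshoot it rises by 1 (small overshoot, or ggind = 254) or by 2. The function names and the final clamp to a caller-supplied gg_min are illustrative assumptions.

```python
import math

# Breakpoint tables from the text, indexed by fsind 0..4.
T1 = (80, 230, 380, 530, 680)
T2 = (500, 1025, 1550, 2075, 2600)
T3 = (850, 1700, 2550, 3400, 4250)


def adjustment_delta(nbits_est: int, fs_ind: int) -> int:
    """Margin delta as a piecewise-linear function of nbits_est (assumed form)."""
    t1, t2, t3 = T1[fs_ind], T2[fs_ind], T3[fs_ind]
    if nbits_est < t1:
        d = (nbits_est + 48) / 16
    elif nbits_est < t2:
        a, b = t1 / 16 + 3, t2 / 48        # values at the t1 and t2 breakpoints
        d = (b - a) * (nbits_est - t1) / (t2 - t1) + a
    elif nbits_est < t3:
        d = nbits_est / 48
    else:
        d = t3 / 48
    return math.floor(d + 0.5)             # nearest integer (d is non-negative)


def adjust_gain_index(gg_ind: int, nbits_est: int, nbits_spec: int,
                      fs_ind: int, gg_min: int = 0) -> int:
    """One-shot correction of ggind after the first bit estimate."""
    delta = adjustment_delta(nbits_est, fs_ind)
    delta2 = delta + 2
    if nbits_est < nbits_spec - delta2 and gg_ind > 0:
        gg_ind -= 1                        # clear undershoot: finer gain
    elif nbits_est > nbits_spec and gg_ind < 255:
        small = gg_ind == 254 or nbits_est < nbits_spec + delta
        gg_ind += 1 if small else 2        # overshoot: coarser gain
    return max(gg_ind, gg_min)             # keep the 16-bit limit
```

The three pieces join continuously: at t1 both formulas give t1/16 + 3, and at t2 both give t2/48.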

The adjustment runs at most once per frame. If re-quantization is done, the original nbits_est is preserved for the next frame’s offset computation.

3.3.11 Residual Coding

Residual coding uses any remaining bits (beyond what arithmetic coding needs) to refine non-zero coefficients. It is only active when lsbMode = 0.

For each non-zero Xq(k), taken in order of increasing k until the residual budget is used up, one bit records whether the original Xf(k) lies at or above its reconstruction Xq(k)·gg (bit 1) or below it (bit 0):

nbits_residual_max = nbits_spec - nbits_trunc + 4
res_bits = []
k = 0
while k < NE and len(res_bits) < nbits_residual_max:
    if Xq[k] != 0:
        # 1 if Xf[k] is at or above its reconstruction Xq[k]*gg, else 0
        res_bits.append(1 if Xf[k] >= Xq[k] * gg else 0)
    k += 1
nbits_residual = len(res_bits)

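At the decoder, each residual bit moves its non-zero quantized value up (bit 1) or down (bit 0), in units of the quantizer step gg. The move is 0.3125 when it points away from zero and 0.1875 when it points toward zero, which improves accuracy without a full re-quantization.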
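The two offsets are the midpoints of the two halves of a quantization cell. With rounding offset 0.375, a positive value m covers [m − 0.375, m + 0.625) in units of gg. The lower half [m − 0.375, m) is centered at m − 0.1875 and the upper half [m, m + 0.625) at m + 0.3125, mirrored for negative values. A decoder-side sketch under that reading; the function name is hypothetical, and the result still has to be multiplied by gg.

```python
from typing import List, Sequence


def apply_residual(xq: Sequence[int], res_bits: Sequence[int]) -> List[float]:
    """Refine non-zero quantized values with one residual bit each."""
    out = [float(q) for q in xq]
    n = 0
    for k, q in enumerate(xq):
        if n >= len(res_bits):
            break                      # no residual bits left
        if q != 0:
            if res_bits[n] == 0:       # original was below Xq*gg
                out[k] -= 0.1875 if q > 0 else 0.3125
            else:                      # original was at or above Xq*gg
                out[k] += 0.3125 if q > 0 else 0.1875
            n += 1
    return out


print(apply_residual([2, 0, -1, 3], [0, 1]))  # → [1.8125, 0.0, -0.8125, 3.0]
```

The last coefficient keeps its integer value because only two residual bits were available.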

3.3.12 Noise Level Estimation

Spectral coefficients quantized to zero carry no energy at the decoder, and leaving those bins empty produces audible spectral holes. The decoder therefore applies noise filling: it substitutes random noise at a level controlled by the transmitted noise factor FNF.

The relevant zero coefficients are those with NFstart ≤ k < bw_stop whose whole neighbourhood of ±NFwidth bins is also quantized to zero, i.e. zeros that sit inside a run of zeros rather than next to a non-zero coefficient. The mean magnitude of the TNS-filtered coefficients in this mask, normalized by the global gain, gives the noise level LNF:

NFstart = 24 (10ms) or 18 (7.5ms)
NFwidth = 3 (10ms) or 2 (7.5ms)
INF(k) = 1 if NFstart ≤ k < bw_stop AND all Xq(i)=0 for i in [k−NFwidth, k+NFwidth]

LNF = ( Σ_{k: INF(k)=1} |Xf(k)| / gg ) / ( Σ_k INF(k) )
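A sketch of the mask and the mean, with hypothetical function names. Two details are assumptions: the ±NFwidth window is clipped to the end of the array, and LNF is taken as 0 when no bin qualifies.

```python
from typing import List, Sequence


def noise_fill_mask(xq: Sequence[int], nf_start: int, nf_width: int,
                    bw_stop: int) -> List[int]:
    """INF(k) = 1 where NFstart <= k < bw_stop and Xq is zero across k +/- NFwidth."""
    ne = len(xq)
    mask = [0] * ne
    for k in range(nf_start, min(bw_stop, ne)):
        lo, hi = max(0, k - nf_width), min(ne - 1, k + nf_width)  # clipped window (assumption)
        if all(xq[i] == 0 for i in range(lo, hi + 1)):
            mask[k] = 1
    return mask


def noise_level(xf: Sequence[float], mask: Sequence[int], gg: float) -> float:
    """LNF: mean of |Xf(k)|/gg over masked bins (0.0 for an empty mask, assumed)."""
    count = sum(mask)
    if count == 0:
        return 0.0
    return sum(abs(x) for x, m in zip(xf, mask) if m) / gg / count
```

With 10 ms parameters (NFstart = 24, NFwidth = 3), a single non-zero coefficient at k = 30 removes k = 27 to 33 from the mask, since each of those bins has it within 3 bins.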

LNF is quantized to 8 levels:

FNF = min(max(nint(8 − 16 × LNF), 0), 7)

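At the decoder the fill level is reconstructed as (8 − FNF)/16, so FNF = 7 selects the weakest fill (1/16) and is chosen when LNF is close to 0, while FNF = 0 selects the strongest fill (0.5) and is chosen once LNF exceeds about 0.47. Because every masked coefficient satisfies |Xf(k)| < 0.625·gg, LNF always stays below 0.625. FNF is transmitted as 3 bits in the side information.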
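The quantization step and the decoder-side mapping fit in a few lines. A sketch with hypothetical function names; floor(x + 0.5) stands in for nint, which is safe here because any negative result is clamped to 0 anyway. The mapping (8 − FNF)/16 is included as an assumption for comparison.

```python
import math


def quantize_noise_level(lnf: float) -> int:
    """FNF = min(max(nint(8 - 16*LNF), 0), 7)."""
    return min(max(math.floor(8.0 - 16.0 * lnf + 0.5), 0), 7)


def noise_fill_level(fnf: int) -> float:
    """Decoder-side fill level (8 - FNF)/16 (assumed mapping)."""
    return (8 - fnf) / 16.0


for lnf in (0.0, 0.25, 0.3, 0.5):
    f = quantize_noise_level(lnf)
    print(lnf, f, noise_fill_level(f))
```

Round-tripping shows the quantization error: LNF = 0.3 is sent as FNF = 3 and comes back as 0.3125, and LNF = 0 is sent as FNF = 7 and comes back as 0.0625 rather than 0.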

Next in this Series

Section 3.3.13 — Bitstream Encoding: how all the side information and spectral data is packed into the output payload
