LC3 Encoding: Spectral Noise Shaping (SNS)

 

Chapter 3, Section 3.3.7 — SNS Analysis, Two-Stage VQ, Scale Factor Interpolation, Spectral Shaping
Chapter: 3, Section 3.3.7
Subsections: 3.3.7.1 to 3.3.7.5
Level: Advanced

SNS is the most complex single module in the LC3 encoder. Its purpose is simple: shape the quantization noise to follow the audio signal’s energy profile, so the noise is masked by the signal and is not audible. The implementation is a two-stage vector quantizer that efficiently represents 16 spectral scale factors using only 38 bits. This tutorial explains every step from the raw band energies to the final shaped spectrum.

Keywords: Spectral Noise Shaping, Scale Factors, Split VQ, Pyramid VQ (PVQ), LFCB / HFCB, DCT Rotation, MPVQ Enumeration

3.3.7.1 Overview — What SNS Does

Human hearing is most sensitive to noise in frequency regions where the audio signal is quiet and least sensitive where the signal is loud, an effect known as noise masking. SNS exploits this by making the quantization noise smaller in quiet regions and larger in loud regions, so that overall the noise stays perceptually hidden under the signal.

SNS works by computing 16 scale factors (one per group of 4 bands), quantizing them with 38 bits total, interpolating them back to 64 per-band values, and then multiplying each MDCT coefficient by its corresponding scale factor. This shapes (normalizes) the spectrum so that the spectral quantizer that follows can apply a uniform step size across all bands — the non-uniform energy is “pre-normalized” away.

Step | Action | Section
1 | SNS Analysis: estimate 16 scale factors from band energies EB(b) | 3.3.7.2
2 | SNS Quantization: encode 16 scale factors using Stage 1 (split VQ, 10 bits) + Stage 2 (PVQ, 28 bits) = 38 bits total | 3.3.7.3
3 | Interpolation: expand 16 quantized scale factors to 64 (one per band) | 3.3.7.4
4 | Spectral Shaping: multiply each MDCT coefficient by its band scale factor | 3.3.7.5

3.3.7.2 SNS Analysis — Computing the 16 Scale Factors

Step 1: Padding (when NB < 64)

SNS always works with 64 bands. If the actual NB < 64 (only happens for 7.5 ms at 8 kHz where NB = 60), the EB array is padded to 64 by duplicating the lowest-frequency entries. This ensures the VQ operates on the same-size input regardless of configuration.
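The padding step can be sketched in a few lines of C. This is a minimal sketch, not the reference implementation: it assumes each of the lowest 64 − NB bands is written twice and the remaining bands are shifted up, and the function name sns_pad_energies is made up for this example.

```c
#include <assert.h>

/* Pad NB band energies to 64 by duplicating the lowest (64 - NB) bands.
 * Assumption: each low band is written twice, the rest are shifted up. */
void sns_pad_energies(const float *EB, int NB, float EB64[64])
{
    assert(NB >= 32 && NB <= 64);   /* each band is duplicated at most once */
    int n2 = 64 - NB;               /* number of bands to duplicate */

    for (int i = 0; i < n2; i++) {
        EB64[2 * i]     = EB[i];
        EB64[2 * i + 1] = EB[i];
    }
    for (int i = n2; i < NB; i++)
        EB64[i + n2] = EB[i];
}
```

For NB = 60, bands 0..3 each appear twice (output positions 0..7) and bands 4..59 land at positions 8..63.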

Step 2: Smoothing

The 64-band energy array EB is smoothed with a 3-tap FIR filter to remove sharp variations between adjacent bands:

ES(0) = 0.75 × EB(0) + 0.25 × EB(1)
ES(63) = 0.25 × EB(62) + 0.75 × EB(63)
ES(b) = 0.25 × EB(b−1) + 0.5 × EB(b) + 0.25 × EB(b+1), for 1 ≤ b ≤ 62

Step 3: Pre-emphasis (tilt)

A frequency tilt is applied to boost high-frequency energy, compensating for the natural roll-off in audio spectra:

EP(b) = ES(b) × 10^(b × gtilt / 630), for b = 0..63

fs (Hz) | gtilt
8,000 | 14
16,000 | 18
24,000 | 22
32,000 | 26
44,100 / 48,000 | 30

Steps 4–7: Noise Floor, Log, Band Grouping, and Mean Removal

Noise floor: EP(b) is limited from below by a noise floor set 40 dB under the average band energy, EP(b) = max(EP(b), noise floor). This keeps zero or near-zero energy bands from producing extreme scale factors.

Log transform: The energy is converted to the log2 domain and halved, EL(b) = log2(EP(b)) / 2. Halving turns a log-energy into a log-amplitude, since log2(√E) = log2(E) / 2.

Band grouping: The 64-band log-energy vector EL is downsampled to 16 values E4(b2) using a 6-tap FIR filter with weights {1,2,3,3,2,1}/12. Each output E4(b2) represents a group of 4 adjacent bands, with weighted overlap at the edges.

Mean removal + scaling: The mean is removed and the result is scaled by 0.85 to produce the initial 16 scale factors scf0(b2).

Attack handling: If Fatt(k) = 1 (attack detected), the scale factors are additionally smoothed across neighboring bands to reduce pre-echo artifacts. The mean is then removed again and the result is scaled by 0.5 for 10 ms frames or 0.3 for 7.5 ms frames.
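
The attack branch might look like the sketch below. It rests on two assumptions that the text above does not spell out: the smoothing is a 5-tap moving average whose window is truncated at the edges, and the 0.5 / 0.3 factor scales the smoothed values after their mean has been removed. The function name is hypothetical.

```c
#include <assert.h>

/* Attack handling sketch: smooth scf0 across bands, remove the mean,
 * then scale by att_fac (0.5 for 10 ms frames, 0.3 for 7.5 ms frames).
 * Assumption: 5-tap moving average, window truncated at the edges. */
void sns_attack_smooth(const float scf0[16], float att_fac, float scf[16])
{
    assert(att_fac > 0.0f && att_fac <= 1.0f);
    float scf1[16];
    float mean = 0.0f;

    for (int n = 0; n < 16; n++) {
        int lo = (n - 2 < 0) ? 0 : n - 2;
        int hi = (n + 2 > 15) ? 15 : n + 2;
        float acc = 0.0f;
        for (int m = lo; m <= hi; m++)
            acc += scf0[m];
        scf1[n] = acc / (float)(hi - lo + 1);
        mean += scf1[n];
    }
    mean /= 16.0f;

    for (int n = 0; n < 16; n++)
        scf[n] = att_fac * (scf1[n] - mean);
}
```

Both operations flatten the scale factor curve, which limits how strongly SNS reshapes the spectrum in a frame that contains a transient.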

The final 16 scale factors scf(n) describe the relative spectral energy shape of the current frame. These 16 values are then quantized and transmitted to the decoder.

3.3.7.3 SNS Quantization — Two-Stage VQ

Stage 1: Split VQ (10 bits)

The 16 scale factors are split into two halves: scf(0..7) and scf(8..15). Each half is independently quantized using a pre-trained codebook with 32 entries of dimension 8:

Half | Codebook | Size | Index | Bits
Lower (0..7) | LFCB | 32 × 8 | ind_LF | 5
Upper (8..15) | HFCB | 32 × 8 | ind_HF | 5

The closest codebook entry to each half is found by minimum MSE. The combined first-stage vector st1 is formed by joining the two best-match codebook entries. The Stage 1 residual r1 = scf − st1 captures what Stage 1 missed.
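
The Stage 1 search is a plain nearest-neighbor lookup, sketched below. The codebooks are passed in as flat 32 × 8 row-major arrays; the actual LFCB and HFCB values come from the specification tables and are not reproduced here. Function names are hypothetical.

```c
#include <assert.h>
#include <float.h>

/* Return the index of the 8-dimensional codebook row closest to target
 * in squared error. cb holds 32 rows of 8 floats, row-major. */
static int nearest_row(const float *cb, const float *target)
{
    int best = 0;
    float best_err = FLT_MAX;
    for (int i = 0; i < 32; i++) {
        float err = 0.0f;
        for (int j = 0; j < 8; j++) {
            float d = target[j] - cb[8 * i + j];
            err += d * d;
        }
        if (err < best_err) { best_err = err; best = i; }
    }
    return best;
}

/* Stage 1: quantize each half independently, then form r1 = scf - st1.
 * lfcb and hfcb are caller-supplied 32x8 tables. */
void sns_stage1(const float scf[16], const float *lfcb, const float *hfcb,
                int *ind_LF, int *ind_HF, float st1[16], float r1[16])
{
    assert(lfcb && hfcb && ind_LF && ind_HF);
    *ind_LF = nearest_row(lfcb, scf);
    *ind_HF = nearest_row(hfcb, scf + 8);
    for (int j = 0; j < 8; j++) {
        st1[j]     = lfcb[8 * (*ind_LF) + j];
        st1[j + 8] = hfcb[8 * (*ind_HF) + j];
    }
    for (int n = 0; n < 16; n++)
        r1[n] = scf[n] - st1[n];
}
```

With only 32 entries per half, an exhaustive search costs 2 × 32 × 8 multiply-adds per frame, so no fast search structure is needed.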

Stage 2: Pyramid Vector Quantizer — PVQ (28 bits)

The Stage 2 PVQ quantizer works in a DCT-rotated domain. First, the 16-dimensional stage 1 residual r1 is rotated using a 16×16 DCT matrix D: t2rot = D × r1. This decorrelates the residual, making PVQ search more efficient.

Four shape candidates are evaluated:

Shape j | Name | Set A (PVQ config) | Set B
0 | regular | PVQ(10, 10), positions 0–9 | PVQ(6, 1), positions 10–15
1 | regular_lf | PVQ(10, 10), positions 0–9 | Zeroed (positions 10–15 = 0)
2 | outlier_near | PVQ(16, 8), all 16 positions | Empty
3 | outlier_far | PVQ(16, 6), all 16 positions | Empty

A PVQ(N, K) vector has N integer components whose absolute values sum to exactly K, that is, K unit pulses distributed over N positions. The “regular” shapes concentrate pulses in the lower 10 coefficients (lower SNS frequency), while the “outlier” shapes spread them over all 16 (for signals with outlier high-frequency components).
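
A simple way to find a PVQ(N, K) point close to a target is to place the K pulses one at a time, each time choosing the position that gives the best normalized correlation with the target. The sketch below is that plain greedy search, shown only to make the pulse idea concrete; it is not the search procedure from the specification, and the function name is hypothetical.

```c
#include <assert.h>
#include <math.h>

/* Greedy PVQ(N, K) search: place K unit pulses one at a time, each time
 * picking the position that maximizes correlation^2 / energy.
 * Plain textbook search, not the spec's optimized procedure. */
void pvq_search(const float *t, int N, int K, int *y)
{
    assert(N > 0 && K >= 0);
    float corr = 0.0f;     /* sum of |t[n]| * y[n] so far */
    float energy = 0.0f;   /* sum of y[n]^2 so far */

    for (int n = 0; n < N; n++)
        y[n] = 0;

    for (int k = 0; k < K; k++) {
        int best = 0;
        float best_num = -1.0f, best_den = 1.0f;
        for (int n = 0; n < N; n++) {
            float c = corr + fabsf(t[n]);
            float e = energy + 2.0f * (float)y[n] + 1.0f;
            /* compare c^2 / e with best_num / best_den without dividing */
            if (c * c * best_den > best_num * e) {
                best = n; best_num = c * c; best_den = e;
            }
        }
        corr   += fabsf(t[best]);
        energy += 2.0f * (float)y[best] + 1.0f;
        y[best] += 1;      /* magnitudes only; signs are applied below */
    }

    for (int n = 0; n < N; n++)
        if (t[n] < 0.0f)
            y[n] = -y[n];
}
```

Whatever the target looks like, the result always satisfies the PVQ constraint: the absolute values of y sum to K.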

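For each shape, an adjustment gain index (gain_i) is also searched. Shapes 0 to 3 have 2, 4, 4, and 8 gain candidates respectively, giving 18 gain-shape combinations in total. The combination minimizing MSE in the rotated domain is selected. The selected shape vector xq is then rotated back with the transposed matrix: scfQ = st1 + Gain × D^T × xq. Because D is orthonormal, D^T is its inverse.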

The PVQ enumeration (converting the integer pulse vector to a compact index called MPVQ index + leading sign) uses a recursive offset table from Section 3.7.3.2.
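
Enumeration works because the number of PVQ(N, K) points is finite and can be computed recursively. The sketch below only counts the points; it does not implement the MPVQ indexing itself. It uses the standard recursion N(n, k) = N(n−1, k) + N(n−1, k−1) + N(n, k−1), and the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Count the integer vectors of dimension N whose absolute values sum to K,
 * using N(n,k) = N(n-1,k) + N(n-1,k-1) + N(n,k-1). */
uint64_t pvq_count(int N, int K)
{
    assert(N >= 1 && N <= 16 && K >= 0 && K <= 10);
    uint64_t tab[17][11];

    for (int k = 0; k <= K; k++)
        tab[0][k] = (k == 0) ? 1 : 0;       /* zero dimensions: only K = 0 */
    for (int n = 1; n <= N; n++) {
        tab[n][0] = 1;                      /* the all-zero vector */
        for (int k = 1; k <= K; k++)
            tab[n][k] = tab[n - 1][k] + tab[n - 1][k - 1] + tab[n][k - 1];
    }
    return tab[N][K];
}
```

For example, pvq_count(2, 2) is 8: (±2, 0), (0, ±2), and (±1, ±1). Since the leading sign is sent separately, the index presumably only has to distinguish half as many points.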

3.3.7.4 SNS Scale Factor Interpolation

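The 16 quantized scale factors scfQ(n) are interpolated to 64 values (one per band) using piecewise linear interpolation. Each pair of consecutive scale factors yields four band values at fractional positions 1/8, 3/8, 5/8, and 7/8; the first two bands are held at scfQ(0) and the last two are extrapolated from scfQ(14) and scfQ(15):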

scfQint(0) = scfQ(0)
scfQint(1) = scfQ(0)
scfQint(4n+2) = scfQ(n) + (1/8) × (scfQ(n+1) − scfQ(n)), n = 0..14
scfQint(4n+3) = scfQ(n) + (3/8) × (scfQ(n+1) − scfQ(n)), n = 0..14
scfQint(4n+4) = scfQ(n) + (5/8) × (scfQ(n+1) − scfQ(n)), n = 0..14
scfQint(4n+5) = scfQ(n) + (7/8) × (scfQ(n+1) − scfQ(n)), n = 0..14
scfQint(62) = scfQ(15) + (1/8) × (scfQ(15) − scfQ(14))
scfQint(63) = scfQ(15) + (3/8) × (scfQ(15) − scfQ(14))
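
These equations translate directly into a short loop. The function name below is hypothetical; the arithmetic follows the formulas above line by line.

```c
#include <assert.h>

/* Expand 16 quantized scale factors to 64 per-band values. */
void sns_interpolate(const float scfQ[16], float scfQint[64])
{
    assert(scfQ && scfQint);

    scfQint[0] = scfQ[0];
    scfQint[1] = scfQ[0];
    for (int n = 0; n < 15; n++) {
        float d = scfQ[n + 1] - scfQ[n];
        scfQint[4 * n + 2] = scfQ[n] + 0.125f * d;
        scfQint[4 * n + 3] = scfQ[n] + 0.375f * d;
        scfQint[4 * n + 4] = scfQ[n] + 0.625f * d;
        scfQint[4 * n + 5] = scfQ[n] + 0.875f * d;
    }
    /* Last two bands: extrapolate using the final slope. */
    float d = scfQ[15] - scfQ[14];
    scfQint[62] = scfQ[15] + 0.125f * d;
    scfQint[63] = scfQ[15] + 0.375f * d;
}
```

The loop writes indices 2 through 61, so together with the two head and two tail assignments all 64 outputs are covered exactly once.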

Finally, the log-domain scale factors are converted back to linear multipliers:

gSNS(b) = 2^(−scfQint(b)), for b = 0..NB−1

Note: At the encoder, the sign is negative (dividing by the scale factor normalizes the spectrum). At the decoder, the sign is positive (multiplying by the scale factor re-introduces the spectral shape).

3.3.7.5 Spectral Shaping

The gains gSNS(b) derived from the interpolated scale factors are applied to the MDCT coefficients. All coefficients within band b are multiplied by the same gain:

for b = 0 to NB−1:
    for k = Ifs(b) to Ifs(b+1)−1:
        Xs(k) = X(k) × gSNS(b)

The output Xs(k) is the SNS-shaped spectrum that is passed to TNS (Section 3.3.8) for further noise shaping.
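
In C, the shaping loop might look like this. The sketch assumes the band table Ifs is passed as an array of NB + 1 bin indices, so that band b covers bins Ifs[b] to Ifs[b+1] − 1; the function name is hypothetical.

```c
#include <assert.h>

/* Apply one gain per band. Ifs has NB + 1 entries: band b covers
 * MDCT bins Ifs[b] .. Ifs[b+1] - 1. */
void sns_shape(const float *X, const int *Ifs, const float *gSNS,
               int NB, float *Xs)
{
    assert(NB > 0 && NB <= 64);
    for (int b = 0; b < NB; b++)
        for (int k = Ifs[b]; k < Ifs[b + 1]; k++)
            Xs[k] = X[k] * gSNS[b];
}
```

Bins at or above Ifs[NB] are not written by this loop, so the caller decides how to treat anything beyond the last band.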

SNS Analysis in C — Key Stages

/* SNS Analysis: simplified C (Section 3.3.7.2).
 * EB must already hold 64 band energies (Step 1 padding is done by the
 * caller). Attack handling is omitted. gtilt depends on fs. */
#include <math.h>

void sns_analysis(const float EB[64], int gtilt, float scf[16])
{
    /* Step 2: Smoothing (3-tap FIR; edge bands reuse their own value) */
    float ES[64];
    ES[0]  = 0.75f * EB[0] + 0.25f * EB[1];
    ES[63] = 0.25f * EB[62] + 0.75f * EB[63];
    for (int b = 1; b < 63; b++)
        ES[b] = 0.25f * EB[b-1] + 0.5f * EB[b] + 0.25f * EB[b+1];

    /* Step 3: Pre-emphasis */
    float EP[64];
    for (int b = 0; b < 64; b++)
        EP[b] = ES[b] * powf(10.0f, (float)b * gtilt / 630.0f);

    /* Step 4: Noise floor, 40 dB (factor 1e-4) below the mean energy
     * and never less than 2^-32 */
    float sum = 0.0f;
    for (int b = 0; b < 64; b++) sum += EP[b];
    float noise_floor = fmaxf((sum / 64.0f) * 1e-4f, powf(2.0f, -32.0f));
    for (int b = 0; b < 64; b++)
        EP[b] = fmaxf(EP[b], noise_floor);

    /* Step 5: Log transform (the 1e-31 offset guards against log2(0)) */
    float EL[64];
    for (int b = 0; b < 64; b++)
        EL[b] = log2f(1e-31f + EP[b]) / 2.0f;

    /* Step 6: Band grouping, 64 bands down to 16 with a 6-tap window;
     * indices outside 0..63 are clamped to the nearest edge band */
    static const float w[6] = {1.0f/12, 2.0f/12, 3.0f/12, 3.0f/12, 2.0f/12, 1.0f/12};
    float E4[16];
    for (int b2 = 0; b2 < 16; b2++) {
        float e = 0.0f;
        for (int k = 0; k < 6; k++) {
            int idx = 4*b2 + k - 1;
            idx = (idx < 0) ? 0 : (idx > 63) ? 63 : idx;
            e += w[k] * EL[idx];
        }
        E4[b2] = e;
    }

    /* Step 7: Mean removal + 0.85 scaling */
    float mean = 0.0f;
    for (int n = 0; n < 16; n++) mean += E4[n];
    mean /= 16.0f;
    for (int n = 0; n < 16; n++)
        scf[n] = 0.85f * (E4[n] - mean);
}

Next in this Series

Section 3.3.8 — Temporal Noise Shaping (TNS): LPC analysis, quantization, and the TNS filter
