3 — Section 3.3.5–3.3.6
3.3.5.1–3.3.5.2
3.3.6.1–3.3.6.4
Intermediate
Two helper modules support the main encoding path: the Bandwidth Detector checks whether the audio signal actually uses its full declared bandwidth, and the Time Domain Attack Detector finds sudden loud transients (such as drum hits or plosive consonants). The bandwidth detector works on the LD-MDCT output, while the attack detector works on the time-domain input signal. Their outputs are flags that control other encoder modules; neither detector modifies the audio signal itself.
3.3.5 Bandwidth Detector
3.3.5.1 Algorithm — Why It Exists
Consider a phone call captured at 8 kHz but resampled to 48 kHz before being passed to the LC3 encoder. The signal physically only contains frequencies up to 4 kHz (NB range), but the encoder sees 480 samples per frame. If TNS and noise-filling treat all 400 spectral bins as active, they will spread noise into the empty upper frequency region — which is audible and wasteful.
The bandwidth detector checks the actual energy in each candidate bandwidth region and outputs a bandwidth index Pbw indicating the detected bandwidth: NB (0–4 kHz), WB (0–8 kHz), SSWB (0–12 kHz), SWB (0–16 kHz), or FB (0–20 kHz).
Stage 1 (quietness test): For each bandwidth candidate k = 0..Nbw−1, compute the average energy in the frequency region above the candidate's cutoff (bands Ibw_start(k) to Ibw_stop(k)). If that energy is below the threshold TQ(k), the region is flagged as quiet (FQ(k) = 1). The first-stage output bw0 is the largest index for which FQ(bw0 − 1) = 0, or 0 if every region is quiet.
Stage 2 (energy drop test): If bw0 < Nbw (the candidate is below full bandwidth), check for a sharp energy drop at the candidate's band edge by comparing the energy just below the cutoff with the energy just above it. If the ratio exceeds the threshold TC(bw0), the candidate is confirmed and Pbw = bw0; otherwise the detector falls back to full bandwidth, Pbw = Nbw.
Note: Pbw is not a fixed session parameter — it can change every frame depending on the detected content.
3.3.5.2 Parameters
The detection parameters depend on both sampling rate and frame duration. The table below shows the 10 ms rows of Table 3.6 from the spec; the 7.5 ms rows have slightly different Ibw_start/Ibw_stop values:
| fs | Nbw | Ibw_start | Ibw_stop | Detected Bandwidths | nbits_bw |
|---|---|---|---|---|---|
| 8 kHz | 0 | — | — | NB only (no detection needed) | 0 |
| 16 kHz | 1 | {53} | {63} | NB, WB | 1 |
| 24 kHz | 2 | {47, 59} | {56, 63} | NB, WB, SSWB | 2 |
| 32 kHz | 3 | {44, 54, 60} | {52, 59, 63} | NB, WB, SSWB, SWB | 2 |
| 44.1/48 kHz | 4 | {41, 51, 57, 61} | {49, 55, 60, 63} | NB, WB, SSWB, SWB, FB | 3 |
The thresholds are TQ = {20, 10, 10, 10} for the quietness test and TC = {15, 23, 20, 20} dB for the energy drop test, each indexed by the candidate bandwidth. The look-back distance L (in bands) is {4, 4, 3, 1} for 10 ms and {4, 4, 3, 2} for 7.5 ms.
At 8 kHz there is only one possible bandwidth (NB), so no detection is needed and Pbw = 0 always. No bits are spent on Pbw in the bitstream.
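The two stages can be sketched in C as follows. This is a minimal illustration for 10 ms frames, not the normative procedure. It assumes the detector reads a 64-entry array of per-band energies (called `eb` here) and that Ibw_start/Ibw_stop are indices into that array. The function name, the array name, and the exact set of bands compared in Stage 2 are assumptions of this sketch and should be checked against the spec. TC is taken to be in dB and is stored pre-converted to a linear energy ratio.

```c
#define NUM_BANDS 64  /* assumed number of per-band energies */

/* 10 ms rows of Table 3.6, indexed by Nbw - 1 (Nbw = 1..4). */
static const int IBW_START[4][4] = {
    {53}, {47, 59}, {44, 54, 60}, {41, 51, 57, 61}
};
static const int IBW_STOP[4][4] = {
    {63}, {56, 63}, {52, 59, 63}, {49, 55, 60, 63}
};
static const float TQ[4]     = {20.0f, 10.0f, 10.0f, 10.0f};
/* TC = {15, 23, 20, 20} dB expressed as linear ratios 10^(TC/10). */
static const float TC_LIN[4] = {31.6227766f, 199.526231f, 100.0f, 100.0f};
static const int   L_10MS[4] = {4, 4, 3, 1};

/* Returns the detected bandwidth index Pbw in 0..nbw. */
int detect_bandwidth(const float eb[NUM_BANDS], int nbw)
{
    if (nbw <= 0)
        return 0;                       /* 8 kHz: NB only, nothing to detect */

    const int *start = IBW_START[nbw - 1];
    const int *stop  = IBW_STOP[nbw - 1];

    /* Stage 1: bw0 = largest index with FQ(bw0 - 1) = 0, or 0 if all quiet. */
    int bw0 = 0;
    for (int k = 0; k < nbw; k++) {
        float sum = 0.0f;
        for (int n = start[k]; n <= stop[k]; n++)
            sum += eb[n];
        float mean = sum / (float)(stop[k] - start[k] + 1);
        if (mean >= TQ[k])
            bw0 = k + 1;                /* region k is not quiet */
    }
    if (bw0 >= nbw)
        return nbw;                     /* full bandwidth, Stage 2 skipped */

    /* Stage 2: look for a sharp drop across the candidate's band edge.
     * The band range below is an assumption of this sketch. */
    int l = L_10MS[bw0];
    for (int b = start[bw0] - l + 1; b <= start[bw0] + 1; b++) {
        if (eb[b - l] > TC_LIN[bw0] * eb[b])
            return bw0;                 /* drop found: candidate confirmed */
    }
    return nbw;                         /* no clear edge: fall back to full band */
}
```

With nbw = 4 (48 kHz), a spectrum that is strong up to band 39 and empty above it yields Pbw = 0 (NB), while a uniformly quiet spectrum with no edge falls back to Pbw = 4 (FB), because Stage 2 finds no energy drop to confirm the candidate.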
3.3.6 Time Domain Attack Detector
3.3.6.1 Overview and Activation Conditions
An attack (also called a transient) is a sudden large increase in signal energy — like a snare drum hit, a click, or a consonant like “k” or “t” in speech. When such an event occurs mid-frame, the MDCT spreads its energy across the entire frame window, causing a kind of “pre-echo” artifact before the transient. The attack detector identifies these events and passes a flag to SNS, which then applies extra smoothing to reduce the artifact.
The attack detector is only active under specific conditions (high bitrate + high sample rate):
| Condition | Activation Rule |
|---|---|
| 10 ms, fs = 32 kHz | nbytes > 80 |
| 10 ms, fs ≥ 44.1 kHz | nbytes ≥ 100 |
| 7.5 ms, fs = 32 kHz | 61 ≤ nbytes < 150 |
| 7.5 ms, fs ≥ 44.1 kHz | 75 ≤ nbytes < 150 |
If not active: Fatt(k) = 0 always. The upper nbytes limit on 7.5 ms prevents interference with higher bitrate modes where pre-echo is less of a concern.
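The activation table translates directly into a small predicate. The sketch below is illustrative only: the function name and the choice to pass the frame duration in microseconds are assumptions, and any combination not listed in the table is treated as inactive.

```c
#include <stdbool.h>

/* frame_us: 10000 or 7500; fs_hz: sampling rate in Hz; nbytes: bytes per frame.
 * Returns true when the attack detector is active for this configuration. */
bool attack_detector_active(int frame_us, int fs_hz, int nbytes)
{
    bool is_32k  = (fs_hz == 32000);
    bool is_high = (fs_hz >= 44100);

    if (!is_32k && !is_high)
        return false;                        /* below 32 kHz: never active */

    if (frame_us == 10000)
        return is_32k ? (nbytes > 80) : (nbytes >= 100);

    if (frame_us == 7500) {
        int lower = is_32k ? 61 : 75;
        return nbytes >= lower && nbytes < 150;
    }
    return false;                            /* unknown frame duration */
}
```

For example, a 10 ms frame at 48 kHz with 100 bytes enables the detector, while the same frame with 99 bytes does not.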
3.3.6.2 Downsampling and High-Pass Filtering
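The attack detector works in a downsampled domain for efficiency. First, the input xs(n) is downsampled to MF = 16 × Nms samples by summing groups of NF/MF input samples:

$$x_{att}(n) = \sum_{m=0}^{N_F/M_F - 1} x_s\!\left(\frac{N_F}{M_F}\,n + m\right), \quad n = 0, \ldots, M_F - 1$$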
For 10 ms: MF = 160 samples (at 16 samples/ms × 10 ms). For 7.5 ms: MF = 120 samples.
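As a quick arithmetic check of the group size NF/MF, assuming the frame length in samples is NF = fs × Nms / 1000 (a relation this section does not state explicitly) and considering only rates at which the detector can be active:

$$
\begin{aligned}
f_s = 32\ \text{kHz},\ 10\ \text{ms}: &\quad N_F = 320,\quad M_F = 160,\quad N_F/M_F = 2 \\
f_s = 48\ \text{kHz},\ 10\ \text{ms}: &\quad N_F = 480,\quad M_F = 160,\quad N_F/M_F = 3 \\
f_s = 32\ \text{kHz},\ 7.5\ \text{ms}: &\quad N_F = 240,\quad M_F = 120,\quad N_F/M_F = 2 \\
f_s = 48\ \text{kHz},\ 7.5\ \text{ms}: &\quad N_F = 360,\quad M_F = 120,\quad N_F/M_F = 3
\end{aligned}
$$

In each case the downsampled signal has 16 samples per millisecond, that is, a 16 kHz rate.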
Then a simple FIR high-pass filter is applied:

$$x_{hp}(n) = 0.375\,x_{att}(n) - 0.5\,x_{att}(n-1) + 0.125\,x_{att}(n-2)$$

The filter removes the DC component and low-frequency energy so that the detector responds to fast, high-energy transients rather than to loud sustained low-frequency content such as bass. The samples xatt(−1) and xatt(−2) are the last two downsampled samples of the previous frame.
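The high-pass behavior can be verified from the filter coefficients 0.375, −0.5, and 0.125 used in the C listing at the end of this section. Writing the filter as a transfer function (the notation H(z) is introduced here for illustration):

$$
\begin{aligned}
H(z) &= 0.375 - 0.5\,z^{-1} + 0.125\,z^{-2} = 0.125\,(1 - z^{-1})(3 - z^{-1}) \\
H(1) &= 0.375 - 0.5 + 0.125 = 0 \\
H(-1) &= 0.375 + 0.5 + 0.125 = 1
\end{aligned}
$$

The factor (1 − z⁻¹) places a zero at DC, so a constant offset is removed completely, while content at the Nyquist frequency of the downsampled signal passes with unit gain.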
3.3.6.3 Block Energy Calculation
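The filtered signal xhp is divided into Nblocks = Nms/2.5 blocks of 40 samples each (2.5 ms at 16 samples/ms). The energy of each block is:

$$E_{att}(n) = \sum_{l=40n}^{40n+39} x_{hp}(l)^2, \quad n = 0, \ldots, N_{blocks} - 1$$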
For 10 ms: Nblocks = 4 blocks. For 7.5 ms: Nblocks = 3 blocks.
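A temporal energy envelope Aatt(n) is tracked with a decaying peak follower:

$$A_{att}(n) = \max\left(0.25\,A_{att}(n-1),\; E_{att}(n-1)\right)$$

For n = 0, the values Aatt(−1) and Eatt(−1) are carried over from the last block of the previous frame. Because Aatt is an energy, the 0.25 factor means the envelope decays by a factor of four, about 6 dB, per block unless the previous block's energy is higher. The envelope therefore follows the recent energy level, delayed by one block, which leaves the current block free to stand out when the energy rises suddenly.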
3.3.6.4 Attack Detection
An attack is detected in block n if that block's energy exceeds 8.5 times the background envelope:

$$E_{att}(n) > 8.5\,A_{att}(n)$$
If an attack is detected, the attack position Patt(k) is set to the largest n satisfying the above. Otherwise Patt(k) = −1.
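The attack flag for frame k is:

$$F_{att}(k) = \begin{cases} 1, & \text{if } P_{att}(k) \ge 0 \text{ or } P_{att}(k-1) \ge T_{att} \\ 0, & \text{otherwise} \end{cases}$$

where $T_{att} = \lfloor N_{blocks} / 2 \rfloor$.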
The flag remains 1 for an extra frame if the previous frame had an attack in its second half. This prevents abrupt transitions right after a transient. Fatt(k) is passed to Section 3.3.7.2.7 (SNS mean removal and scaling).
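To see the hold-over behavior in isolation, the sketch below runs the envelope, detection, and flag logic on precomputed block energies for 10 ms frames. The struct and function names are hypothetical, and the sketch assumes that the envelope recursion A(n) = max(0.25·A(n−1), E(n−1)) simply continues across the frame boundary using the last values of the previous frame.

```c
#define NBLOCKS 4  /* 10 ms frame: Nms / 2.5 */

/* Illustrative state carried from one frame to the next. */
typedef struct {
    float e_prev;     /* Eatt(-1): last block energy of the previous frame */
    float a_prev;     /* Aatt(-1): last envelope value of the previous frame */
    int   patt_prev;  /* Patt(k-1), or -1 if the previous frame had no attack */
} EnvState;

static float max_f(float a, float b) { return a > b ? a : b; }

/* Takes the block energies of frame k, writes Patt(k), returns Fatt(k). */
int attack_flag_from_energies(EnvState *st, const float e[NBLOCKS], int *patt_out)
{
    int patt = -1;
    for (int n = 0; n < NBLOCKS; n++) {
        float a = max_f(0.25f * st->a_prev, st->e_prev);  /* envelope A(n) */
        if (e[n] > 8.5f * a)
            patt = n;                                     /* keep the largest n */
        st->a_prev = a;
        st->e_prev = e[n];
    }

    int tatt = NBLOCKS / 2;
    int fatt = (patt >= 0) || (st->patt_prev >= tatt);
    st->patt_prev = patt;
    if (patt_out)
        *patt_out = patt;
    return fatt;
}
```

Starting from e_prev = a_prev = 1 and patt_prev = −1, a frame with energies {1, 1, 1, 100} gives Patt = 3 and Fatt = 1. A following steady frame {100, 100, 100, 100} has no new attack (Patt = −1) but still returns Fatt = 1 because the previous attack fell in the second half of its frame. A third identical frame returns Fatt = 0.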
Attack Detector in C
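/* Attack detector, simplified C for 10 ms frames (Sections 3.3.6.2 to 3.3.6.4) */
#include <math.h>        /* fmaxf */

#define MF      160      /* 16 samples/ms * 10 ms */
#define NBLOCKS 4        /* Nms/2.5 = 10/2.5 = 4 for 10 ms */

/* Illustrative state carried over from frame k-1.
 * Before the first frame, zero it and set patt_prev = -1. */
typedef struct {
    float xatt_prev[2];  /* xatt(-2), xatt(-1) */
    float Eatt_prev;     /* Eatt(-1): last block energy of the previous frame */
    float Aatt_prev;     /* Aatt(-1): last envelope value of the previous frame */
    int   patt_prev;     /* Patt(k-1) */
} AttackState;

/* xs holds the NF input samples of frame k; returns Fatt(k) */
int attack_detect(AttackState *st, const float *xs, int NF)
{
    float xatt[MF], xhp[MF];

    /* Downsampling: sum NF/MF input samples into each of MF output samples */
    int ratio = NF / MF;
    for (int n = 0; n < MF; n++) {
        float sum = 0.0f;
        for (int m = 0; m < ratio; m++)
            sum += xs[ratio * n + m];
        xatt[n] = sum;
    }

    /* High-pass filter: xatt(-1) and xatt(-2) come from the previous frame */
    for (int n = 0; n < MF; n++) {
        float xm1 = (n >= 1) ? xatt[n-1] : st->xatt_prev[1];
        /* n = 0 needs xatt(-2) = xatt_prev[0]; n = 1 needs xatt(-1) = xatt_prev[1] */
        float xm2 = (n >= 2) ? xatt[n-2] : st->xatt_prev[n];
        xhp[n] = 0.375f * xatt[n] - 0.5f * xm1 + 0.125f * xm2;
    }
    st->xatt_prev[0] = xatt[MF-2];
    st->xatt_prev[1] = xatt[MF-1];

    /* Block energies, envelope, and attack position */
    int patt = -1;
    for (int n = 0; n < NBLOCKS; n++) {
        float e = 0.0f;
        for (int l = 40*n; l < 40*(n+1); l++)
            e += xhp[l] * xhp[l];
        /* Envelope: A(n) = max(0.25*A(n-1), E(n-1)), continued across frames */
        float a = fmaxf(0.25f * st->Aatt_prev, st->Eatt_prev);
        if (e > 8.5f * a)
            patt = n;    /* keep the largest n */
        st->Aatt_prev = a;
        st->Eatt_prev = e;
    }

    /* Attack flag: set if this frame has an attack, or if the previous
     * frame had one in its second half */
    int Tatt = NBLOCKS / 2;  /* = 2 */
    int Fatt = (patt >= 0) || (st->patt_prev >= Tatt);
    st->patt_prev = patt;
    return Fatt;
}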
Next in this Series
Section 3.3.7 — Spectral Noise Shaping (SNS): the full SNS analysis, quantization (Stage 1 and Stage 2 PVQ), interpolation, and spectral shaping
