3 — Sections 3.4.8–3.4.10
Advanced
These are the final decoder stages: the inverse MDCT converts the reconstructed spectrum back to time-domain samples, the LTPF postfilter sharpens the harmonic structure of voiced speech, and the output scaling converts the internal floating-point signal back to the requested PCM integer format (16, 24, or 32 bits). The final OutputPCM samples are ready for playback.
3.4.8 Low Delay MDCT Synthesis
The inverse MDCT converts the NE reconstructed spectral coefficients X̂(k) back to a time-domain signal. This is a three-step process: IMDCT transform, windowing, and overlap-add.
Step 1: Generate time-aliased buffer t̂(n)
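t̂(n) = sqrt(2/NF) × SUM[k=0..NF−1] X̂(k) × cos[(π/NF) × (n + 1/2 + NF/2) × (k + 1/2)], for n = 0..2NF−1
This is the standard IMDCT: the same cosine kernel as the forward MDCT but without the window, producing 2NF output samples from NF input coefficients (with X̂(k) = 0 for k ≥ NE).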
Step 2: Windowing (in-place)
t̂(n) = wN(2NF−1−n) × t̂(n), for n = 0..2NF−1
The same LD-MDCT window wN is applied, but flipped (indexed from the end). Using the time-reversed analysis window for synthesis is what gives perfect reconstruction when no quantization is applied.
Step 3: Overlap-Add
x̂(n) = mem_oladd(n) + t̂(Z + n), for n = 0..NF−Z−1
x̂(n) = t̂(Z + n), for n = NF−Z..NF−1
mem_oladd(n) = t̂(NF + Z + n), for n = 0..NF−Z−1 (save for next frame)
The overlap-add combines the current frame’s IMDCT output with the saved second half from the previous frame (mem_oladd). This produces NF output samples per frame. mem_oladd is initialized to zero before the first frame.
3.4.9 Long Term Postfilter (LTPF) Decoder
3.4.9.1 Overview
The LTPF postfilter is an IIR filter applied in the time domain on the MDCT synthesis output. It sharpens the harmonic structure of voiced speech by attenuating quantization noise in spectral valleys between harmonics. Its parameter set (pitch integer part, fractional part, gain) is derived from the transmitted pitch_index and ltpf_active bits.
If gain_ltpf = 0 (high bitrate — see Section 3.4.9.4), the filter output equals the input; only the internal LTPF buffers are updated. The processing still runs to maintain memory continuity for future frames.
3.4.9.2 Transition Handling — First 2.5 ms
The first norm = NF/4 × 10/Nms samples of each frame undergo a smooth transition to avoid clicks when LTPF parameters change. There are five distinct cases depending on the current and previous frame’s ltpf_active state and whether the pitch parameters changed:
| Case | ltpf_active | mem_ltpf_active | Action |
|---|---|---|---|
| 1 | 0 | 0 | Pass through unchanged (no filter active) |
| 2 | 1 | 0 | Fade in current filter (n/norm × filter correction) |
| 3 | 0 | 1 | Fade out previous filter ((1 − n/norm) × prev filter correction) |
| 4 | 1 | 1 | Same pitch → apply filter continuously (no transition needed) |
| 5 | 1 | 1 | Different pitch → fade out prev, fade in new over norm samples |
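norm = (NF/4) × (10/Nms), where Nms is the frame duration in milliseconds. For 48 kHz, 10 ms: norm = 480/4 × 1 = 120 samples = 2.5 ms. For 48 kHz, 7.5 ms: norm = 360/4 × 10/7.5 = 120 samples, again 2.5 ms.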
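The transition length and the case selection from the table can be sketched as follows. The struct `ltpf_state` and both function names are hypothetical, and the sketch assumes that "same pitch" means both the integer and the fractional pitch parts are unchanged.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-frame LTPF state: activation flag plus pitch lag parts. */
typedef struct {
    bool active;
    int pitch_int;
    int pitch_fr;
} ltpf_state;

/* Transition length in samples: (NF/4) * (10/Nms), frame duration in us. */
int ltpf_norm(int nf, int frame_us)
{
    assert(frame_us == 10000 || frame_us == 7500);
    return (nf / 4) * 10000 / frame_us;
}

/* Returns the case number (1..5) from the transition table. */
int ltpf_transition_case(const ltpf_state *cur, const ltpf_state *prev)
{
    if (!cur->active && !prev->active)
        return 1; /* pass through */
    if (cur->active && !prev->active)
        return 2; /* fade in current filter */
    if (!cur->active && prev->active)
        return 3; /* fade out previous filter */
    if (cur->pitch_int == prev->pitch_int && cur->pitch_fr == prev->pitch_fr)
        return 4; /* same pitch: continuous filtering */
    return 5;     /* pitch changed: cross-fade */
}
```

After each frame the decoder would copy the current state into the previous-state slot, so the next frame compares against it.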
3.4.9.3 Remainder of Frame
For samples n = norm to NF−1 (the remaining 7.5 ms of a 10ms frame), either the filter is completely off (ltpf_active=0: pass through) or the current frame’s filter is applied in full:
x̂_ltpf(n) = x̂(n) − SUM[k=0..Lnum] cnum(k) × x̂(n − k)
+ SUM[k=0..Lden] cden(k, pfr) × x̂_ltpf(n − pint + Lden/2 − k)
This is a feedback IIR filter. The denominator uses the past LTPF output (x̂_ltpf) at the pitch lag position pint, creating the resonant comb structure that amplifies harmonics and attenuates valleys.
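A minimal sketch of this filter for the steady-state part of the frame is shown below. The function name `ltpf_apply` and the buffer layout are hypothetical: `x` and `y` point at sample 0 of the current frame inside larger buffers, so negative indices reach the history. The feed-forward term over cnum is assumed from the LC3 specification's filter equation, and `c_den` is assumed to be already selected for the current pfr.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Apply the LTPF to samples start..end-1 of the current frame.
 *   x     : MDCT synthesis output; x[-1], x[-2], ... must hold at least
 *           l_num past input samples
 *   y     : LTPF output; y[-1], y[-2], ... must hold at least
 *           pint + l_den/2 past output samples
 *   c_num : l_num + 1 feed-forward coefficients
 *   c_den : l_den + 1 feedback coefficients for the current pfr
 */
void ltpf_apply(const double *x, double *y, int start, int end,
                const double *c_num, int l_num,
                const double *c_den, int l_den, int pint)
{
    /* The feedback taps must only reference outputs already computed. */
    assert(pint > l_den / 2);

    for (int n = start; n < end; n++) {
        double acc = x[n];
        for (int k = 0; k <= l_num; k++)
            acc -= c_num[k] * x[n - k];
        for (int k = 0; k <= l_den; k++)
            acc += c_den[k] * y[n - pint + l_den / 2 - k];
        y[n] = acc;
    }
}
```

For the remainder of a frame this would be called with `start = norm` and `end = NF`. Because the feedback taps sit around the pitch lag, each output sample depends on the output roughly one pitch period earlier, which is what produces the comb response.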
3.4.9.4 Filter Parameters
When ltpf_active = 1, the filter parameters are computed from the received pitch_index:
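Step 1: Recover pitch at 12.8 kHz
if pitch_index < 380: pitch_int = floor(pitch_index/4) + 32, pitch_fr = pitch_index − 4 × pitch_int + 128
if 380 ≤ pitch_index < 440: pitch_int = floor(pitch_index/2) − 63, pitch_fr = 2 × pitch_index − 4 × pitch_int − 252
if pitch_index ≥ 440: pitch_int = pitch_index − 283, pitch_fr = 0
pitch = pitch_int + pitch_fr/4
Step 2: Scale pitch to output sample rate
pitchfs = pitch × (8000 × ceil(fs/8000)) / 12800
pup = nint(pitchfs × 4)
pint = floor(pup / 4)
pfr = pup − 4 × pint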
Filter lengths and gain lookup:
Lden = max(4, fs/4000)
Lnum = Lden − 2
cnum(k) = 0.85 × gain_ltpf × tab_ltpf_num_fs[gain_ind][k], for k = 0..Lnum
cden(k, pfr) = gain_ltpf × tab_ltpf_den_fs[pfr][k], for k = 0..Lden
| t_nbits range | gain_ltpf | gain_ind |
|---|---|---|
| < 320 + fsind×80 | 0.4 | 0 |
| < 400 + fsind×80 | 0.35 | 1 |
| < 480 + fsind×80 | 0.3 | 2 |
| < 560 + fsind×80 | 0.25 | 3 |
| ≥ 560 + fsind×80 | 0 (LTPF disabled) | N/A |
t_nbits = nbits × 10/7.5 for 7.5ms frames (to normalize to 10ms equivalent), else t_nbits = nbits.
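The gain table reduces to a short threshold scan. The sketch below uses hypothetical names (`ltpf_gain`, `ltpf_lookup_gain`), reports the disabled case as gain_ind = −1, and assumes fsind is the sample-rate index running from 0 (8 kHz) to 4 (44.1/48 kHz). It keeps t_nbits as a floating-point value rather than choosing a rounding rule, which the text above does not specify.

```c
#include <assert.h>

/* Hypothetical result type; gain_ind is -1 when the LTPF is disabled. */
typedef struct {
    double gain;
    int gain_ind;
} ltpf_gain;

ltpf_gain ltpf_lookup_gain(int nbits, int fs_ind, int frame_us)
{
    assert(fs_ind >= 0 && fs_ind <= 4);

    /* Normalize the frame size in bits to its 10 ms equivalent. */
    double t_nbits = (frame_us == 7500) ? nbits * 10.0 / 7.5 : (double)nbits;

    static const double gains[4] = { 0.4, 0.35, 0.3, 0.25 };
    for (int i = 0; i < 4; i++) {
        if (t_nbits < 320 + 80 * i + 80 * fs_ind)
            return (ltpf_gain){ gains[i], i };
    }
    return (ltpf_gain){ 0.0, -1 }; /* high bitrate: filter disabled */
}
```

For an 80-byte frame at 48 kHz and 10 ms (nbits = 640, fsind = 4) the thresholds are 640, 720, 800 and 880, so the second row applies and the gain is 0.35.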
3.4.10 Output Signal Scaling and Rounding
The LTPF output x̂_ltpf(n) is on the internal 16-bit scale, nominally [−32768, 32767], but as a floating-point signal it can overshoot that range. Two final steps convert it to the output PCM format:
Step 1: Clip to 16-bit integer range
x̂_clip(n) = 32767, if x̂_ltpf(n) > 32767
x̂_clip(n) = −32768, if x̂_ltpf(n) < −32768
x̂_clip(n) = x̂_ltpf(n), otherwise
Step 2: Scale to output bit depth s
xo(n) = round(2^(s−16) × x̂_clip(n))
where s = bits_per_audio_sample_dec (16, 24, or 32)
For 16-bit output (s=16): scale factor = 2^0 = 1 → no change, just round. For 24-bit output: scale factor = 2^8 = 256 → multiply by 256. For 32-bit: multiply by 65536.
xo(n) is the final OutputPCM integer in the requested bit depth — this is what gets written to the audio playback buffer.
Decoder Output in BlueZ — Writing to Audio Buffer
/* Per-frame decode with BFI handling (BlueZ ISO socket + liblc3) */
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <alsa/asoundlib.h>
#include <lc3.h>
#include <bluetooth/bluetooth.h>
#include <bluetooth/iso.h>
/* Session config: 48kHz, 10ms frame, 16-bit PCM */
void *dec_mem = malloc(lc3_decoder_size(10000, 48000));
lc3_decoder_t dec = lc3_setup_decoder(10000, 48000, 0, dec_mem); /* NULL on bad parameters */
/* Per-frame processing: run this once for every received SDU */
uint8_t payload[400]; /* max byte_count per channel */
int16_t pcm_out[480]; /* NF = 480 samples for 48kHz 10ms */
uint8_t bfi = 0; /* bad frame indication */
int rc;
/* Receive payload and extract BFI from cmsg (shown in tutorial 2) */
ssize_t len = receive_from_iso_socket(iso_fd, payload, sizeof(payload), &bfi);
if (bfi == 0 && len > 0) {
    /* Good frame: decode normally. lc3_decode returns 0 on success, 1 when it
       detected a corrupted frame and already produced PLC output, and -1 on
       invalid parameters. Do not decode again after a return of 1, or the
       decoder state advances by two frames. */
    rc = lc3_decode(dec, payload, (int)len, LC3_PCM_FORMAT_S16, pcm_out, 1);
} else {
    /* Bad or missing frame: a NULL payload triggers the built-in PLC */
    rc = lc3_decode(dec, NULL, 0, LC3_PCM_FORMAT_S16, pcm_out, 1);
}
if (rc < 0) {
    /* Invalid parameters: pcm_out was not written, so play silence */
    memset(pcm_out, 0, sizeof(pcm_out));
}
/* Write pcm_out to ALSA playback buffer (480 frames, mono) */
snd_pcm_writei(pcm_handle, pcm_out, 480);
Next in this Series
Sections 3.5 and 3.6 — Frame Structure layout and External Rate Adaptation
