LC3 Decoding: MDCT Synthesis, LTPF Decoder, and Output Scaling

 

Chapter 3, Sections 3.4.8–3.4.10: Inverse Transform, Pitch Postfilter, and PCM Output
Level: Advanced

These are the final decoder stages: the inverse MDCT converts the reconstructed spectrum back to time-domain samples, the LTPF postfilter sharpens the harmonic structure of voiced speech, and the output scaling converts the internal floating-point signal back to the requested PCM integer format (16, 24, or 32 bits). The final OutputPCM samples are ready for playback.

Keywords

Inverse LD-MDCT, Overlap-Add, mem_oladd, LTPF, IIR Filter, Transition Handling, Output Clipping, PCM Scaling

3.4.8 Low Delay MDCT Synthesis

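The inverse MDCT converts the reconstructed spectrum X̂(k) back to a time-domain signal. Only the first NE coefficients are coded; the bins from NE up to NF−1 are zero, so the sums below run over all NF bins. This is a three-step process: IMDCT transform, windowing, and overlap-add.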

Step 1: Generate time-aliased buffer t̂(n)

t̂(n) = sqrt(2/NF) × SUM[k=0..NF−1] X̂(k) × cos(π/NF × (n + 0.5 + NF/2) × (k + 0.5))
for n = 0..2NF−1

This is the standard IMDCT. It uses the same cosine kernel as the forward MDCT, summed over the frequency index k instead of the time index n and without the window, and it produces 2NF output samples from NF input coefficients.

Step 2: Windowing (in-place)

t̂(n) = wN(2NF − 1 − n) × t̂(n), for n = 0..2NF−1

The same LD-MDCT window wN used by the encoder is applied, but time-reversed (indexed from the end). Together with the overlap-add in Step 3, this analysis-synthesis pairing cancels the time-domain aliasing and gives perfect reconstruction when no quantization is applied.

Step 3: Overlap-Add

x̂(n) = mem_oladd(n) + t̂(Z + n), for n = 0..NF−Z−1
x̂(n) = t̂(Z + n), for n = NF−Z..NF−1
mem_oladd(n) = t̂(NF + Z + n), for n = 0..NF−Z−1 (save for next frame)

The overlap-add combines the head of the current frame's windowed IMDCT output with the tail saved from the previous frame (mem_oladd, NF − Z samples). Here Z is the number of leading zero samples in the LD-MDCT window. Each frame yields NF output samples, and mem_oladd is initialized to zero before the first frame.
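Steps 2 and 3 map almost line by line onto code. The sketch below assumes the window coefficients are supplied by the caller (they come from the specification's tables and are not reproduced here); the function name and argument layout are hypothetical.

```c
#include <assert.h>

/* Windowing and overlap-add for one frame (hypothetical helper).
 * t:   2*nf samples from Step 1, windowed in place
 * w:   2*nf LD-MDCT window coefficients
 * z:   number of leading zeros in the window
 * mem: nf - z samples carried between frames (all zero before frame 0)
 * out: nf reconstructed time-domain samples */
static void window_overlap_add(double *t, const double *w, int nf, int z,
                               double *mem, double *out)
{
    assert(nf > 0 && z >= 0 && z < nf);

    for (int n = 0; n < 2 * nf; n++)      /* Step 2: flipped window */
        t[n] *= w[2 * nf - 1 - n];

    for (int n = 0; n < nf - z; n++)      /* Step 3: overlapped part */
        out[n] = mem[n] + t[z + n];
    for (int n = nf - z; n < nf; n++)     /* Step 3: part without overlap */
        out[n] = t[z + n];

    for (int n = 0; n < nf - z; n++)      /* save tail for the next frame */
        mem[n] = t[nf + z + n];
}
```

Keeping mem in caller-owned storage makes the per-frame state explicit: one such buffer is needed per decoded channel.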

3.4.9 Long Term Postfilter (LTPF) Decoder

3.4.9.1 Overview

The LTPF postfilter is an IIR filter applied in the time domain to the MDCT synthesis output. It sharpens the harmonic structure of voiced speech by attenuating quantization noise in the spectral valleys between harmonics. The pitch lag (integer and fractional part) is derived from the transmitted pitch_index, the filter is switched by the ltpf_active bit, and the gain depends on the bitrate.

If gain_ltpf = 0 (high bitrate — see Section 3.4.9.4), the filter output equals the input; only the internal LTPF buffers are updated. The processing still runs to maintain memory continuity for future frames.

3.4.9.2 Transition Handling — First 2.5 ms

The first norm = NF/4 × 10/Nms samples of each frame undergo a smooth transition to avoid clicks when LTPF parameters change. There are five distinct cases depending on the current and previous frame’s ltpf_active state and whether the pitch parameters changed:

Case | ltpf_active | mem_ltpf_active | Action
1    | 0           | 0               | Pass through unchanged (no filter active)
2    | 1           | 0               | Fade in current filter (n/norm × filter correction)
3    | 0           | 1               | Fade out previous filter ((1 − n/norm) × prev filter correction)
4    | 1           | 1               | Same pitch: apply filter continuously (no transition needed)
5    | 1           | 1               | Different pitch: fade out prev, fade in new over norm samples

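norm = NF/4 × 10/Nms. For 48 kHz, 10 ms: norm = 480/4 × 1 = 120 samples = 2.5 ms. For 48 kHz, 7.5 ms: norm = 360/4 × 10/7.5 = 120 samples, so the transition always lasts 2.5 ms regardless of frame duration.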
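The case selection and the transition length can be sketched as two small helpers. The enum and function names are hypothetical, and same_pitch is assumed to be computed by the caller by comparing the current and previous pint and pfr.

```c
#include <assert.h>

/* The five transition cases from the table (hypothetical names). */
enum ltpf_transition {
    LTPF_PASS_THROUGH = 1,  /* case 1: filter off in both frames    */
    LTPF_FADE_IN,           /* case 2: filter switched on           */
    LTPF_FADE_OUT,          /* case 3: filter switched off          */
    LTPF_CONTINUE,          /* case 4: on in both, same pitch       */
    LTPF_CROSS_FADE         /* case 5: on in both, pitch changed    */
};

/* Transition length in samples: NF/4 * 10/Nms, with the frame
 * duration given in microseconds (10000 or 7500). */
static int ltpf_norm(int nf, int frame_us)
{
    assert(frame_us == 10000 || frame_us == 7500);
    return nf * 10000 / (4 * frame_us);
}

static enum ltpf_transition ltpf_select_case(int active, int prev_active,
                                             int same_pitch)
{
    if (!active && !prev_active) return LTPF_PASS_THROUGH;
    if (active && !prev_active)  return LTPF_FADE_IN;
    if (!active && prev_active)  return LTPF_FADE_OUT;
    return same_pitch ? LTPF_CONTINUE : LTPF_CROSS_FADE;
}
```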

3.4.9.3 Remainder of Frame

For samples n = norm to NF−1 (the remaining 7.5 ms of a 10ms frame), either the filter is completely off (ltpf_active=0: pass through) or the current frame’s filter is applied in full:

x̂_ltpf(n) = x̂(n) − SUM[k=0..Lnum] cnum(k) × x̂(n−k)
+ SUM[k=0..Lden] cden(k, pfr) × x̂_ltpf(n − pint + Lden/2 − k)

This is a feedback IIR filter. The denominator uses the past LTPF output (x̂_ltpf) at the pitch lag position pint, creating the resonant comb structure that amplifies harmonics and attenuates valleys.
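A minimal sketch of the full-strength filter for the remainder of the frame follows. It assumes the caller passes pointers into larger buffers, so that x has at least Lnum samples of history before index 0 and y has at least pint + Lden/2 samples of past filter output before index 0. The function name is hypothetical, and cnum and cden are the already scaled coefficient sets for the current gain and pfr.

```c
#include <assert.h>

/* Apply the LTPF to samples start..end-1 (hypothetical helper).
 * x: MDCT synthesis output, with history at negative indices
 * y: LTPF output, with history at negative indices
 * cnum: lnum + 1 numerator coefficients
 * cden: lden + 1 denominator coefficients for the current pfr */
static void ltpf_filter_full(const double *x, double *y, int start, int end,
                             const double *cnum, int lnum,
                             const double *cden, int lden, int pint)
{
    /* The feedback taps must only read samples that already exist. */
    assert(pint > lden / 2);

    for (int n = start; n < end; n++) {
        double acc = x[n];
        for (int k = 0; k <= lnum; k++)
            acc -= cnum[k] * x[n - k];
        for (int k = 0; k <= lden; k++)
            acc += cden[k] * y[n - pint + lden / 2 - k];
        y[n] = acc;
    }
}
```

With all coefficients set to zero the function copies x to y, which corresponds to the pass-through behaviour when the gain is zero.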

3.4.9.4 Filter Parameters

When ltpf_active = 1, the filter parameters are computed from the received pitch_index:

Step 1: Recover pitch at 12.8 kHz

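if pitch_index ≥ 440: pitch_int = pitch_index − 283, pitch_fr = 0
if 380 ≤ pitch_index < 440: pitch_int = floor(pitch_index/2) − 63, pitch_fr = 2 × pitch_index − 4 × pitch_int − 252
if pitch_index < 380: pitch_int = floor(pitch_index/4) + 32, pitch_fr = pitch_index − 4 × pitch_int + 128
pitch = pitch_int + pitch_fr/4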
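The pitch recovery can be sketched in integer arithmetic. Beyond the integer part given above, this sketch also recovers the fractional part pitch_fr in quarter-sample steps, following my reading of the LC3 specification; treat that part as an assumption to verify against the spec. The function name is hypothetical, and the result is returned in quarter-sample units (4 × pitch_int + pitch_fr) to avoid floating point.

```c
#include <assert.h>

/* Decode the 9-bit pitch_index into a lag at 12.8 kHz, expressed in
 * quarter-sample units (hypothetical helper). */
static int ltpf_decode_pitch_q2(int pitch_index)
{
    int pitch_int, pitch_fr;

    assert(pitch_index >= 0 && pitch_index < 512);

    if (pitch_index >= 440) {            /* integer resolution */
        pitch_int = pitch_index - 283;
        pitch_fr  = 0;
    } else if (pitch_index >= 380) {     /* half-sample resolution */
        pitch_int = pitch_index / 2 - 63;
        pitch_fr  = 2 * pitch_index - 4 * pitch_int - 252;
    } else {                             /* quarter-sample resolution */
        pitch_int = pitch_index / 4 + 32;
        pitch_fr  = pitch_index - 4 * pitch_int + 128;
    }
    return 4 * pitch_int + pitch_fr;
}
```

The three ranges join without gaps: index 379 decodes to 126.75 samples, index 380 to 127, index 439 to 156.5, and index 440 to 157.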

Step 2: Scale pitch to output sample rate

pitchfs = pitch × 8000 × ceil(fs/8000) / 12800
pup = nint(pitchfs × 4)
pint = floor(pup / 4)
pfr = pup − 4 × pint
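Step 2 can also be done with integers only. The sketch assumes the 12.8 kHz pitch is supplied in quarter-sample units (pitch × 4), so pitchfs × 4 reduces to pitch_q2 × 5 × ceil(fs/8000) / 8; the function name and this representation are choices made for the example.

```c
#include <assert.h>

/* Scale a 12.8 kHz pitch lag (quarter-sample units) to the output rate
 * and split it into integer and fractional parts (hypothetical helper). */
static void ltpf_scale_pitch(int pitch_q2, int fs, int *pint, int *pfr)
{
    assert(pitch_q2 >= 0 && fs > 0);

    int up  = (fs + 7999) / 8000;            /* ceil(fs / 8000)        */
    int pup = (pitch_q2 * 5 * up + 4) / 8;   /* nint(pitchfs * 4)      */

    *pint = pup / 4;                         /* floor(pup / 4)         */
    *pfr  = pup - 4 * *pint;                 /* 0..3 quarter samples   */
}
```

For example, the minimum lag of 32 samples at 12.8 kHz (2.5 ms) becomes pint = 120, pfr = 0 at 48 kHz. A 44.1 kHz stream gives ceil(fs/8000) = 6 and is therefore scaled like 48 kHz.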

Filter lengths and gain lookup:

Lden = max(4, ceil(fs/4000))
Lnum = Lden − 2
cnum(k) = 0.85 × gain_ltpf × tab_ltpf_num_fs[gain_ind][k]
cden(k, pfr) = gain_ltpf × tab_ltpf_den_fs[pfr][k]

Gain lookup (rows are tested top to bottom; the first match applies):

t_nbits range       | gain_ltpf         | gain_ind
< 320 + fsind×80    | 0.4               | 0
< 400 + fsind×80    | 0.35              | 1
< 480 + fsind×80    | 0.3               | 2
< 560 + fsind×80    | 0.25              | 3
≥ 560 + fsind×80    | 0 (LTPF disabled) | N/A

t_nbits = nbits × 10/7.5 for 7.5ms frames (to normalize to 10ms equivalent), else t_nbits = nbits.
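The gain lookup is a short threshold cascade. In this sketch the function name is hypothetical, fs_ind is taken as a parameter because the section does not define it, and t_nbits is kept as a real value since no rounding rule is stated here.

```c
#include <assert.h>

/* Returns gain_ind (0..3) and writes gain_ltpf to *gain, or returns -1
 * with *gain = 0 when the LTPF is disabled (hypothetical helper). */
static int ltpf_gain_lookup(int nbits, int frame_us, int fs_ind, double *gain)
{
    static const double gains[4] = {0.4, 0.35, 0.3, 0.25};

    assert(frame_us == 10000 || frame_us == 7500);

    /* Normalize 7.5 ms frames to their 10 ms equivalent. */
    double t_nbits = (frame_us == 7500) ? nbits * 10.0 / 7.5 : (double)nbits;

    for (int i = 0; i < 4; i++) {
        if (t_nbits < 320 + 80 * i + 80 * fs_ind) {
            *gain = gains[i];
            return i;
        }
    }
    *gain = 0.0;   /* bitrate high enough that the postfilter is off */
    return -1;
}
```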

3.4.10 Output Signal Scaling and Rounding

The LTPF output x̂_ltpf(n) is a floating-point signal at 16-bit scale: nominally within [−32768, 32767], although individual decoded samples can overshoot that range. Two final steps convert it to the output PCM format:

Step 1: Clip to 16-bit integer range

x̂_clip(n) = 32767, if x̂_ltpf(n) > 32767
x̂_clip(n) = −32768, if x̂_ltpf(n) < −32768
x̂_clip(n) = x̂_ltpf(n), otherwise

Step 2: Scale to output bit depth s

xo(n) = nint(x̂_clip(n) × 2^(−15 + s − 1))
where s = bits_per_audio_sample_dec (16, 24, or 32)

For 16-bit output (s=16): scale factor = 2^0 = 1 → no change, just round. For 24-bit output: scale factor = 2^8 = 256 → multiply by 256. For 32-bit: multiply by 65536.

xo(n) is the final OutputPCM integer in the requested bit depth — this is what gets written to the audio playback buffer.
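Both steps fit in one small function. This is a sketch with a hypothetical name; nint is implemented here as round-half-away-from-zero, which is an assumption about the tie-breaking rule, and the result is returned as int32_t so that all three bit depths fit.

```c
#include <assert.h>
#include <stdint.h>

/* Clip one sample to the 16-bit range, then scale it to s bits
 * (hypothetical helper). */
static int32_t pcm_scale_sample(double x, int s)
{
    assert(s == 16 || s == 24 || s == 32);

    /* Step 1: clip at 16-bit scale */
    if (x > 32767.0)
        x = 32767.0;
    else if (x < -32768.0)
        x = -32768.0;

    /* Step 2: multiply by 2^(-15 + s - 1) = 2^(s - 16), then round */
    double scaled = x * (double)(1L << (s - 16));
    return (int32_t)(scaled >= 0.0 ? scaled + 0.5 : scaled - 0.5);
}
```

Because clipping happens before scaling, the largest positive 32-bit output is 32767 × 65536 = 2147418112, slightly below INT32_MAX.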

Decoder Output in BlueZ — Writing to Audio Buffer

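/* Per-frame decode with BFI handling (BlueZ ISO socket + liblc3) */
#include <stdint.h>
#include <stdlib.h>
#include <sys/types.h>
#include <alsa/asoundlib.h>
#include <lc3.h>
#include <bluetooth/bluetooth.h>
#include <bluetooth/iso.h>

#define FRAME_US      10000   /* 10 ms frame                      */
#define SAMPLE_RATE   48000   /* 48 kHz                           */
#define FRAME_SAMPLES 480     /* NF = 480 samples for 48kHz 10ms  */

/* Receives one payload and extracts BFI from cmsg (shown in tutorial 2).
   The prototype is inferred from the call below.                      */
ssize_t receive_from_iso_socket(int fd, uint8_t *buf, size_t size, uint8_t *bfi);

/* Session config: 48kHz, 10ms frame, 16-bit PCM. Call once per stream. */
lc3_decoder_t setup_decoder(void)
{
    void *mem = malloc(lc3_decoder_size(FRAME_US, SAMPLE_RATE));
    return mem ? lc3_setup_decoder(FRAME_US, SAMPLE_RATE, 0, mem) : NULL;
}

/* Decode and play one frame. Call once per frame interval.
   Returns 0 for a normal frame, 1 if PLC was used, -1 on error. */
int decode_and_play(lc3_decoder_t dec, int iso_fd, snd_pcm_t *pcm_handle)
{
    uint8_t payload[400];            /* max byte_count per channel */
    int16_t pcm_out[FRAME_SAMPLES];
    uint8_t bfi = 0;                 /* bad frame indication       */

    ssize_t len = receive_from_iso_socket(iso_fd, payload, sizeof(payload), &bfi);
    int good = (bfi == 0 && len > 0);

    /* Good frame: decode normally. Bad or missing frame: a NULL payload
       triggers the built-in PLC. lc3_decode returns 1 when PLC operated,
       including when it detected a corrupt payload itself; pcm_out then
       already holds concealed audio, so it must not be called again.   */
    int rc = lc3_decode(dec, good ? payload : NULL, good ? (int)len : 0,
                        LC3_PCM_FORMAT_S16, pcm_out, 1);
    if (rc < 0)
        return -1;                   /* wrong parameters */

    /* Write pcm_out to ALSA playback buffer; try to recover from underruns */
    snd_pcm_sframes_t written = snd_pcm_writei(pcm_handle, pcm_out, FRAME_SAMPLES);
    if (written < 0)
        snd_pcm_recover(pcm_handle, (int)written, 1);

    return rc;
}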

Next in this Series

Sections 3.5 and 3.6 — Frame Structure layout and External Rate Adaptation
