LC3 Internals, QoS & Multi-channel Audio

🔬 BLE Audio Deep-Dive

Chapter 5 · Part 2 — LC3 Internals, QoS & Multi-channel Audio

~20 ms · Minimum LC3 Latency

16 · BAP Codec Configs

PLC · Packet Loss Concealment

What this post covers

This is Part 2 of our Chapter 5 series. We now go inside LC3 — understanding the encoder pipeline, decoder, Packet Loss Concealment, how latency builds up in a real BLE Audio link, and finally how Quality of Service (QoS) parameters are chosen. We also look at how stereo audio is handled with multiple channels.

Keywords in this post

LD-MDCT · SNS · TNS · LTPF · PLC · BFI Flag · Presentation Delay · QoS · RTN · Isochronous Interval · Low Latency / High Reliability · Airtime · CIS / BIS · Channel Allocation · BAP

⚙️ 5. Inside the LC3 Encoder

The LC3 encoder takes in raw PCM audio samples and outputs a small compressed packet. Let’s trace the audio through each block — like an assembly line for sound.

LC3 Encoder Pipeline — Step by Step

🎙️ INPUT Raw PCM audio samples (e.g. 480 samples @ 48 kHz = 10 ms)	Audio comes in as uncompressed integer samples. At 48 kHz with 10 ms frames, that’s 480 samples per frame — about 960 bytes raw.
⬇
1️⃣ LD-MDCT (Low Delay MDCT)	Converts time-domain samples → frequency-domain coefficients. Think of it like decomposing music into its individual notes and frequencies. Apart from waiting for the frame itself to fill, this is where the codec’s delay comes from: its overlapping window needs about 2.5 ms of look-ahead with 10 ms frames. Analogy: like a spectrum analyser on a music player, it splits the audio into frequency bands.
⬇
2️⃣ Bandwidth Detector	Detects what bandwidth the incoming audio actually uses. Possible bandwidths: NB (0–4 kHz for old phones), WB (0–8 kHz), SSWB (0–12 kHz), SWB (0–16 kHz), FB (0–20 kHz). If audio only goes up to 8 kHz, there is no point encoding the coefficients above 8 kHz. This avoids smearing noise into empty frequency ranges.
⬇
3️⃣ SNS (Spectral Noise Shaper)	Maximises perceived audio quality by shaping quantisation noise. When compressing, some noise is unavoidable. SNS shapes this noise so the human ear perceives it as least annoying — pushing noise into frequency regions where masking hides it. This is the psychoacoustic heart of LC3.
⬇
4️⃣ Attack Detector + TNS (Temporal Noise Shaper)	Handles sharp attack sounds like percussion. Percussive instruments (castanets, glockenspiel, triangles) are notoriously hard for codecs because they have a flat spectrum with sudden spikes. The Attack Detector flags these, and TNS then shapes the quantisation noise in time so it does not smear ahead of the attack as audible pre-echo. BLE Audio test suites specifically use castanets and triangles to evaluate this.
⬇
5️⃣ Spectral Quantiser + Noise Level	Decides how many bits each frequency coefficient gets. Like an intelligent automatic gain control: louder, more important frequencies get more bits, while near-silent ones are quantised to zero (coded as silence). The encoder then estimates a Noise Level for those zeroed regions and sends it, so the decoder can fill the gaps with pseudo-random noise instead of leaving complete silence where there should be low-level sound.
⬇
6️⃣ LTPF (Long Term Post Filter)	Reduces coding noise for pitched/tonal sounds at low bitrates. Runs on a resampled 12.8 kHz version of the input. Particularly helpful for voice (which is tonal) at low bitrates. The filter parameters are embedded in the bitstream so the decoder can mirror them.
⬇
📤 OUTPUT Compressed LC3 bitstream packet (e.g. 100 bytes for 48 kHz / 80 kbps)	All the coefficients, noise parameters, and LTPF data are entropy-coded and multiplexed into a single packet. No timestamps or sequence numbers — the transport layer handles timing.
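The sizes in this pipeline follow from simple arithmetic. Here is a minimal C sketch (no codec library involved; the function names are our own, not liblc3’s) that computes samples per frame, raw PCM bytes for 16-bit mono, and the compressed frame size for a given bitrate. At 48 kHz / 10 ms / 80 kbps it gives 480 samples, 960 raw bytes and a 100-byte LC3 frame, a compression ratio of about 9.6:1.

```c
#include <stdint.h>

/* Samples in one frame: sample rate (Hz) x frame duration (µs) / 1e6 */
int samples_per_frame(int sample_rate_hz, int frame_us)
{
    return (int)((int64_t)sample_rate_hz * frame_us / 1000000);
}

/* Raw PCM size of one mono frame at the given bits per sample */
int raw_pcm_bytes(int sample_rate_hz, int frame_us, int bits_per_sample)
{
    return samples_per_frame(sample_rate_hz, frame_us) * bits_per_sample / 8;
}

/* Compressed frame size: bitrate (bit/s) x frame duration (s) / 8 bits per byte */
int coded_frame_bytes(int bitrate_bps, int frame_us)
{
    return (int)((int64_t)bitrate_bps * frame_us / 8000000);
}

/* How many times smaller the coded frame is than the raw PCM frame */
double compression_ratio(int raw_bytes, int coded_bytes)
{
    return (double)raw_bytes / coded_bytes;
}
```

For these settings, liblc3’s lc3_frame_samples() and lc3_frame_bytes() return the same 480 and 100.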

LC3 Encoding in Code (BlueZ / liblc3 usage)

#include <lc3.h>        /* liblc3: Google's open-source LC3 codec library */
#include <stdint.h>
#include <stdlib.h>     /* malloc */
#include <sys/socket.h> /* send */

#define SAMPLE_RATE   48000
#define FRAME_US      10000   /* 10 ms in microseconds */
#define BITRATE       80000   /* 80 kbps */
#define CHANNELS      1       /* mono (one earbud) */

static lc3_encoder_t encoder;
static int frame_samples;     /* = 480 samples for 48 kHz / 10 ms */
static int encoded_bytes;     /* = 100 bytes for 80 kbps / 10 ms */

/* One-time setup: allocate one encoder per channel */
int encoder_init(void)
{
    frame_samples = lc3_frame_samples(FRAME_US, SAMPLE_RATE);
    encoded_bytes = lc3_frame_bytes(FRAME_US, BITRATE);

    void *mem = malloc(lc3_encoder_size(FRAME_US, SAMPLE_RATE));
    if (mem == NULL)
        return -1;

    encoder = lc3_setup_encoder(
        FRAME_US,     /* frame duration in µs: 7500 or 10000 in BAP */
        SAMPLE_RATE,  /* codec rate: 8000/16000/24000/32000/48000   */
        0,            /* sr_pcm_hz: 0 = PCM input is at SAMPLE_RATE */
        mem
    );
    return encoder != NULL ? 0 : -1;
}

/* Per-frame encode, called every 10 ms by your audio thread with
 * 480 samples collected from the audio source (mic / file / ALSA).
 * iso_sock_fd is a BlueZ ISO socket that is already connected. */
int encode_and_send(int iso_sock_fd, const int16_t pcm_buffer[480])
{
    uint8_t lc3_packet[100];   /* compressed output */

    int ret = lc3_encode(
        encoder,
        LC3_PCM_FORMAT_S16,    /* 16-bit signed PCM    */
        pcm_buffer,            /* input samples        */
        1,                     /* stride (1 = packed)  */
        encoded_bytes,         /* target size in bytes */
        lc3_packet             /* output buffer        */
    );
    if (ret != 0)
        return -1;

    /* Hand lc3_packet to the BlueZ ISO socket for transmission */
    return send(iso_sock_fd, lc3_packet, encoded_bytes, 0) < 0 ? -1 : 0;
}

/* Note: lc3_packet has NO timestamp or sequence number.
 * The BLE Controller handles timing via the Isochronous Channel. */

🔄 6. The LC3 Decoder & Packet Loss Concealment (PLC)

6.1 The Decoder — Reverse Assembly Line

The decoder is essentially the encoder in reverse. It starts with the compressed bitstream and reconstructs the original audio waveform.

LC3 Decoder Pipeline

📥 INPUT: Compressed LC3 packet (e.g. 100 bytes)

⬇

Arithmetic Decode → Restore Spectral Coefficients

⬇

Noise Filling → Fill zeroed coefficients with noise (using bandwidth info)

⬇

TNS Decode + SNS Decode → Undo noise shaping

⬇

Inverse LD-MDCT → Convert frequency back to time domain (PCM samples)

⬇

Long Term Post Filter → Apply pitch-based filter using transmitted parameters

⬇

📤 OUTPUT: Reconstructed PCM audio samples → Speaker
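To make the noise-filling step concrete, here is a toy C sketch. It is a simplification under stated assumptions, not LC3’s specified noise filler: the real decoder only fills above a start frequency and within the detected bandwidth, and uses its own pseudo-random sequence and level quantisation. The function name noise_fill and the linear congruential generator constants are illustrative choices.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy noise filling (not the exact LC3 algorithm): every coefficient the
 * quantiser zeroed becomes +level or -level, with the sign taken from a
 * simple linear congruential generator. Non-zero coefficients pass through. */
void noise_fill(const float *coeffs_in, float *coeffs_out, size_t n,
                float level, uint32_t seed)
{
    for (size_t k = 0; k < n; k++) {
        if (coeffs_in[k] != 0.0f) {
            coeffs_out[k] = coeffs_in[k];            /* real coded value: keep */
        } else {
            seed = seed * 1103515245u + 12345u;      /* LCG step (wraps mod 2^32) */
            coeffs_out[k] = (seed & 0x8000u) ? level : -level;
        }
    }
}
```

The level is the only information the decoder needs from the encoder here; the exact noise samples do not matter perceptually, which is why no waveform is transmitted for these bands.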

6.2 What Happens When a Packet is Lost? — PLC

Bluetooth shares the 2.4 GHz band with Wi-Fi, baby monitors, and countless other devices. Packets will get lost occasionally, especially outdoors, where there are no walls to reflect the signal around obstacles such as the listener’s head.

When a packet is lost, the Controller sets a BFI flag (Bad Frame Indication). The decoder then runs PLC (Packet Loss Concealment) instead of decoding a real packet.

❌

Insert Silence
Simple but annoying. You hear a pop/click.

🔁

Repeat Last Frame
OK for 1 frame, sounds robotic if multiple frames lost.

🧠

LC3 PLC Algorithm
Predicts missing audio from surrounding frames. Nearly transparent on voice, good on music.

💡 Real analogy: Imagine you’re on a phone call and 1 word gets cut. Your brain often fills in the word from context — “Can you meet me at the _____ tomorrow?” → “restaurant?”. LC3 PLC does something similar for audio, predicting the most likely waveform.
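The simpler “Repeat Last Frame” strategy, plus the usual refinement of fading out when losses continue, is easy to sketch. The C below is a hypothetical helper illustrating only that simple strategy; it is not LC3’s PLC, which liblc3 runs internally when the decoder is given no frame. The halving per lost frame and the four-frame limit are arbitrary choices.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_REPEATED_FRAMES 4   /* illustrative: mute after 4 losses in a row */

/* Toy concealment for one lost frame (NOT the LC3 PLC algorithm):
 * repeat the last good frame, halving its level for every additional
 * consecutive loss, and output silence once MAX_REPEATED_FRAMES is exceeded.
 * consecutive_losses counts lost frames in a row, starting at 1. */
void conceal_repeat_fade(const int16_t *last_good, int16_t *out, size_t n,
                         unsigned consecutive_losses)
{
    if (consecutive_losses > MAX_REPEATED_FRAMES) {
        for (size_t i = 0; i < n; i++)
            out[i] = 0;                              /* give up: insert silence */
        return;
    }
    unsigned extra = consecutive_losses > 0 ? consecutive_losses - 1 : 0;
    int32_t divisor = (int32_t)1 << extra;           /* 1, 2, 4, 8 */
    for (size_t i = 0; i < n; i++)
        out[i] = (int16_t)(last_good[i] / divisor);  /* repeat, attenuated */
}
```

Fading avoids the “robotic” loop of an identical frame repeating, but it still cannot invent new content the way a predictive PLC can.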

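/* BlueZ ISO socket receive step with PLC handling (liblc3) */

#include <lc3.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper: hands decoded PCM to ALSA / PulseAudio */
void write_to_alsa(const int16_t *pcm, int n_samples);

/* decoder: created with lc3_setup_decoder(10000, 48000, 0, mem)
 * iso_sock_fd: BlueZ ISO socket that is already connected      */
int receive_and_decode(lc3_decoder_t decoder, int iso_sock_fd)
{
    uint8_t recv_buf[200];
    int16_t pcm_out[480];         /* decoded 10 ms of audio */
    int     bad_frame = 0;        /* BFI flag */

    ssize_t bytes_received = recv(iso_sock_fd, recv_buf, sizeof(recv_buf), 0);

    if (bytes_received <= 0) {
        /* No usable frame (lost packet or socket error): trigger PLC.
         * Simplification: a real app would handle socket errors separately. */
        bad_frame = 1;
    }

    /* A NULL input buffer tells liblc3 to run PLC internally */
    int ret = lc3_decode(
        decoder,
        bad_frame ? NULL : recv_buf,
        bad_frame ? 0    : (int)bytes_received,
        LC3_PCM_FORMAT_S16,
        pcm_out,
        1                              /* stride */
    );
    if (ret < 0)
        return -1;                     /* invalid parameters */

    /* ret == 0: real decoded audio; ret == 1: PLC-generated audio.
     * Feed it to ALSA / PulseAudio for playback at the Presentation Delay. */
    write_to_alsa(pcm_out, 480);
    return ret;
}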

⏱️ 7. LC3 Latency — How It Adds Up

Let’s trace exactly how the latency builds up end-to-end in a BLE Audio link. The key insight is that latency is not one big delay — it’s the sum of many small delays, each with a purpose.

End-to-End Latency Timeline for a 10 ms LC3 Frame

Time	What Happens	Duration	Running Total
t = 0 ms	🎙️ Audio Capture starts: first sample enters the encoder buffer	10 ms (frame fill)	10 ms
t = 10 ms	⚙️ LC3 Encoding: LD-MDCT look-ahead + SNS + TNS + Quantiser	~2.5 ms	12.5 ms
t = 12.5 ms	📡 Radio Transmission: packet sent over the BLE ISO channel	<1 ms (if no retransmission)	~13 ms
t = 13–14 ms	🔁 Retransmissions: extra copies sent for reliability (BLE can fit several in one frame)	~1–2 ms	~14 ms
t = 14 ms	⏸️ Synchronisation Reference Point: the fixed instant from which every Acceptor counts its Presentation Delay	n/a	14 ms (fixed)
t = 14 ms onwards	🔄 Presentation Delay: LC3 decode (~2 ms) + optional PLC (~2 ms) + audio processing (noise cancellation etc.)	~5–40 ms	19–54 ms
t ≈ 19–54 ms	🔊 Audio rendered at the speaker: Rendering Point	n/a	≥ ~20 ms total
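Adding up the rows makes the structure explicit: everything before the Synchronisation Reference Point is roughly fixed, and the Presentation Delay is the variable term. This sketch hard-codes the table’s approximate stage durations (they are estimates, not values read from any stack), so its result is only as accurate as those figures; the names are illustrative.

```c
/* Approximate stage durations for one 10 ms LC3 frame, in microseconds */
enum {
    FRAME_FILL_US   = 10000,  /* collect 480 samples at 48 kHz          */
    ENCODE_US       =  2500,  /* LD-MDCT look-ahead + processing        */
    AIR_AND_RETX_US =  1500,  /* first transmission + retransmissions   */
};

/* Time from the first captured sample to rendering, for a given Presentation Delay */
int end_to_end_latency_us(int presentation_delay_us)
{
    int sync_ref_point_us = FRAME_FILL_US + ENCODE_US + AIR_AND_RETX_US;  /* ~14 ms */
    return sync_ref_point_us + presentation_delay_us;
}
```

With a 5 ms Presentation Delay this gives 19 ms, and with 40 ms it gives 54 ms, matching the 19–54 ms range in the table.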

🌍 Real-world comparison: 20 ms of latency = time for sound to travel 7 metres in air. Human hearing evolved to handle 25–30 ms echoes in a room without noticing. That’s why BLE Audio hearing aids feel natural — the delay is within what the brain ignores.

7.1 The Synchronisation Reference Point — Why It’s Critical for Stereo

Imagine left and right earbuds each playing their audio at slightly different times. You’d hear the stereo image time-shifted, and the music would feel “off”. The Synchronisation Reference Point prevents this: it is a common instant that every Acceptor derives from the timing of the isochronous group (CIG or BIG) set up by the Initiator. Each Acceptor applies the same Presentation Delay from that point and renders its audio at the end of it, so left and right earbuds play in sync.

7.2 Presentation Delay — The Buffer Before Rendering

BAP (Basic Audio Profile) requires every Audio Sink to support a Presentation Delay of 40 ms. This is neither a default nor a minimum: 40 ms simply has to fall within the range of Presentation Delays the device supports. Why 40 ms? It gives enough buffer time for:

LC3 decoding (~2 ms)
PLC if needed (~2 ms)
Noise cancellation / speech enhancement algorithms (~5–30 ms)
Synchronisation buffer between devices

High-end hearing aids can achieve Presentation Delay as low as 5 ms. TMAP and HAP profiles require support for 20 ms.
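One way to see how a single Synchronisation Reference Point and a single Presentation Delay keep two sinks aligned: if each sink advertises the range of Presentation Delays it can support, the Initiator can pick one value inside every range, and each sink then renders at SRP + that value. The sketch assumes that min/max-range model and uses hypothetical names (struct pd_range, pick_common_pd_us); choosing the lowest common value is one reasonable policy, not a requirement.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of one Audio Sink's supported Presentation Delay range */
struct pd_range {
    int min_us;
    int max_us;
};

/* BAP requires 40 ms to lie inside every Audio Sink's supported range */
bool supports_bap_40ms(struct pd_range r)
{
    return r.min_us <= 40000 && 40000 <= r.max_us;
}

/* Lowest Presentation Delay every sink supports, or -1 if the ranges do not overlap */
int pick_common_pd_us(const struct pd_range *sinks, size_t n)
{
    int lo = 0, hi = 1 << 30;
    for (size_t i = 0; i < n; i++) {
        if (sinks[i].min_us > lo) lo = sinks[i].min_us;
        if (sinks[i].max_us < hi) hi = sinks[i].max_us;
    }
    return lo <= hi ? lo : -1;
}

/* Every sink renders at the same instant: SRP + the shared Presentation Delay */
long long render_time_us(long long sync_ref_point_us, int presentation_delay_us)
{
    return sync_ref_point_us + presentation_delay_us;
}
```

A left earbud supporting 20–40 ms and a right earbud supporting 25–60 ms would both be given 25 ms, so both render at the same SRP + 25 ms instant.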

🎛️ 8. Choosing LC3 Parameters — What You Configure as a Developer

You never tweak the internals of LC3. As a BLE Audio developer, you configure four key parameters, and BAP defines 16 standard combinations to help interoperability.

LC3 Configuration Parameters

Parameter	Options	Use Case Hint
Sampling Rate	8 / 16 / 24 / 32 / 44.1 / 48 kHz	8 kHz = basic voice; 48 kHz = hi-fi music
Frame Duration	7.5 ms or 10 ms	10 ms for new designs; 7.5 ms for Classic BT interop
Bytes per Frame	20 – 400 bytes	More bytes = better quality + more airtime used
Bits per Sample	16, 24, or 32	16 is standard; 24/32 for professional audio

Key BAP Codec Configurations (from BAP Table 3.5) — Selected Entries

Config Name	Sampling (kHz)	Frame Duration	Bytes/Frame	Bitrate (kbps)	Typical Use
8_2	8	10 ms	30	24	Basic voice / IoT
16_2 ★	16	10 ms	40	32	Mandatory for all BLE Audio; good voice
24_2 ★	24	10 ms	60	48	Mandatory for Broadcast Sinks; music with background noise
32_2	32	10 ms	80	64	Good music, most users won’t notice difference
48_2	48	10 ms	100	80	Hi-fi music; indistinguishable from original
48_6	48	10 ms	155	124	Maximum quality — but uses heavy airtime

★ Mandatory configurations: 16_2 must be supported by every BLE Audio device, and 24_2 additionally by Broadcast Sinks.
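The Bitrate column is derived rather than chosen: it is bytes per frame × 8 ÷ frame duration. This C sketch encodes the selected rows of the table above and recomputes it. The struct and function names are illustrative, and the array holds only the entries listed here, not all 16 BAP configurations.

```c
#include <stddef.h>
#include <string.h>

/* One BAP codec configuration: name, sampling rate, frame duration, bytes per frame */
struct bap_codec_cfg {
    const char *name;
    int sample_rate_hz;
    int frame_us;
    int octets_per_frame;
};

static const struct bap_codec_cfg BAP_CFGS[] = {
    { "8_2",   8000, 10000,  30 },
    { "16_2", 16000, 10000,  40 },
    { "24_2", 24000, 10000,  60 },
    { "32_2", 32000, 10000,  80 },
    { "48_2", 48000, 10000, 100 },
    { "48_6", 48000, 10000, 155 },
};

/* Bitrate in bit/s = bytes x 8 bits / frame duration in seconds */
long long bitrate_bps(const struct bap_codec_cfg *c)
{
    return (long long)c->octets_per_frame * 8 * 1000000 / c->frame_us;
}

/* Look up a configuration by its BAP name; NULL if it is not in this table */
const struct bap_codec_cfg *find_cfg(const char *name)
{
    for (size_t i = 0; i < sizeof BAP_CFGS / sizeof BAP_CFGS[0]; i++)
        if (strcmp(BAP_CFGS[i].name, name) == 0)
            return &BAP_CFGS[i];
    return NULL;
}
```

For example, 48_6 works out to 155 × 8 ÷ 0.01 s = 124,000 bit/s, the 124 kbps in the table.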

Subjective Audio Quality at Different Sampling Rates

Sampling Rate	What You Hear	Best For
8 kHz	Old telephone quality — intelligible voice	IoT devices, baby monitors
16 kHz	Clear voice; good for voice recognition	Phone calls, voice assistants
24 kHz	Adequate music with some noticeable quality limit	Hearing aids, PA systems, multi-stream scenarios
32 kHz	Most users cannot tell difference from original	Earbuds, headphones
48 kHz	Transparent — indistinguishable from original	Premium earbuds, audiophile use

📊 9. Quality of Service (QoS) — The Reliability vs Latency Trade-off

QoS in BLE Audio is about answering: “How aggressively should we retransmit packets to avoid audio glitches, and what latency price are we willing to pay?”

BLE Audio uses Isochronous Channels to send audio. Transmissions are scheduled in repeating windows called Isochronous (ISO) Intervals. The ISO Interval often equals the frame duration (typically 10 ms), but the Controller may choose a multiple of it, as the 48_2_2 example below shows. If a packet can be retransmitted multiple times within one interval, latency stays low. If retransmissions spill into the next interval (or the one after), latency grows by one ISO Interval for each extra interval used.

BAP QoS Modes — Low Latency vs High Reliability

Mode	Strategy	Isochronous Intervals Used	Typical Latency (16 kHz, PD=40ms)	Best For
Low Latency	Fit all retransmissions in 1 interval	1 (most configs)	~62.5 ms	Hearing aids, gaming, video sync
High Reliability (Unicast)	Spread retransmissions across up to 10 intervals	Up to 10	~147.5 ms	Music streaming (glitch-free matters more than latency)
High Reliability (Broadcast)	Spread across up to 6–8 intervals	Up to 6–8	~112.5 ms	Cinema / concert hall broadcast
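The Typical Latency figures can be reproduced with a simple sum, assuming the end-to-end figure is frame duration + 2.5 ms LC3 look-ahead + the configuration’s Max_Transport_Latency + Presentation Delay. The transport latencies used below (10 ms, 95 ms and 60 ms for the 16 kHz Low Latency, unicast High Reliability and broadcast High Reliability settings) are our reading of BAP’s QoS tables and should be checked against the specification; the function name is illustrative.

```c
/* Approximate end-to-end latency = frame duration + LC3 look-ahead
 * + Max_Transport_Latency + Presentation Delay, all in microseconds. */
#define LC3_FRAME_US      10000
#define LC3_LOOKAHEAD_US   2500

int qos_latency_us(int max_transport_latency_ms, int presentation_delay_us)
{
    return LC3_FRAME_US + LC3_LOOKAHEAD_US
         + max_transport_latency_ms * 1000 + presentation_delay_us;
}
```

With a 40 ms Presentation Delay this yields 62.5 ms, 147.5 ms and 112.5 ms, the three rows of the table. The only term QoS changes is the transport latency, which is exactly the retransmission budget.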

9.1 RTN — The Retransmission Number

The Host tells the Controller how many retransmissions to allow via the RTN (Retransmission Number) parameter. But RTN is only a hint — the Controller (hardware scheduler) can adjust this based on what else the chip is doing (e.g. Wi-Fi coexistence).

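Key insight: RTN is the maximum number of retransmissions, not the average. On a Connected Isochronous Stream, if the link is clean the packet is acknowledged on the first try and the remaining retransmission slots go unused; RTN only matters when interference occurs. (Broadcast streams have no acknowledgements, so a BIS sends every scheduled repetition.) The number of transmission opportunities per frame in each ISO interval is NSE / BN (from the Isochronous Channel parameters).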

9.2 The 48_2_2 Example — Three Ways the Controller Can Schedule It

BAP provides a concrete example: the 48_2_2 High Reliability broadcast configuration (48 kHz, 10 ms, 100 bytes, RTN=4, max transport latency 65 ms). The Controller can implement this in three ways:

Controller Scheduling Options for 48_2_2 Broadcast QoS

Option	ISO Interval	BN	NSE	Latency (Stereo)	Airtime (Stereo)	Best For
1	30 ms	3	9	71 ms	36% ✅ lowest	Wi-Fi coexistence (phone streaming)
2	10 ms	1	5	46 ms ✅ lowest	59% ❌ highest	Dedicated broadcast device, no Wi-Fi
3	20 ms	2	8	70 ms	48%	Balanced — Wi-Fi + moderate reliability

BN = Burst Number (frames per ISO interval); NSE = Number of Sub-Events. The Host has no control over which option the chip chooses — ask your silicon vendor.
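Two quick consistency checks on the table: BN should equal the ISO interval divided by the 10 ms SDU interval, and NSE ÷ BN gives the sub-event slots each frame gets per interval. Option 2’s five slots match one transmission plus RTN = 4. The sketch is plain arithmetic over a hypothetical struct; it says nothing about how a particular Controller actually places those slots.

```c
/* One Controller scheduling option for 48_2_2 broadcast */
struct iso_sched {
    int iso_interval_us;
    int bn;    /* Burst Number: new frames (SDUs) per ISO interval */
    int nse;   /* Number of Sub-Events per ISO interval            */
};

/* Frames carried per ISO interval = ISO interval / SDU interval (should equal BN) */
int frames_per_interval(const struct iso_sched *s, int sdu_interval_us)
{
    return s->iso_interval_us / sdu_interval_us;
}

/* Sub-event slots available to each frame within one ISO interval */
int slots_per_frame(const struct iso_sched *s)
{
    return s->nse / s->bn;
}
```

Applied to the three options (30 ms/3/9, 10 ms/1/5, 20 ms/2/8), the slots per frame come out as 3, 5 and 4, which is why Option 2 buys the lowest latency at the cost of the most airtime.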

9.3 Airtime — The Scarce Resource

Airtime is like road space on a single-lane highway. Every packet occupies the road for a while. LC3 is already much better than SBC (roughly half the airtime for the same audio quality). But as you increase quality (48 kHz, high bitrate) and reliability (more retransmissions), airtime usage climbs fast:

Approximate Airtime Usage by Configuration (10 ms frames; a _1 suffix is the Low Latency QoS setting, _2 is High Reliability)

16_2_1	~16% mono / ~20% stereo
24_2_1	~22% mono / ~30% stereo
32_2_1	~28% mono / ~38% stereo
48_2_2	~30% mono / ~59% stereo (Option 2)
48_6_2	~80%+ stereo — leaves very little room for anything else

At 24 kHz you can fit two stereo streams (multi-language) in the air. At 48 kHz with high reliability, you can barely fit one stereo stream.
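The “how many streams fit” reasoning is just addition against a 100% budget. This sketch uses the approximate stereo percentages listed above and ignores everything else competing for the radio (Wi-Fi coexistence, other links, scheduling gaps), so treat it as a back-of-envelope check rather than a scheduler model. The function name is hypothetical.

```c
#include <stddef.h>

/* Airtime left over (percent) after scheduling these streams.
 * A negative result means the streams do not fit. */
int airtime_headroom_pct(const int *airtime_pct, size_t n_streams)
{
    int used = 0;
    for (size_t i = 0; i < n_streams; i++)
        used += airtime_pct[i];
    return 100 - used;
}
```

Two 24 kHz stereo streams at ~30% each leave about 40%, while two 48_2_2 stereo streams at ~59% each would need 118%.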

🎧 10. Multi-channel Audio — How Stereo Works in BLE Audio

Stereo means two audio channels (Left and Right). BLE Audio gives you flexibility in how to deliver those two channels. This is controlled by Channel Allocation and how many CISes (Connected Isochronous Streams) are established.

Two Ways to Deliver Stereo from Phone to Headphones

Option	How	Pros	Cons
Option (a) Two Separate CISes	CIS 1 → Left channel only; CIS 2 → Right channel only; Channel Allocation = 1 per CIS	Smaller individual packets → better reliability	Two CIS connections must be maintained
Option (b) Single CIS, Multiplexed	L+R LC3 frames concatenated by the Host → one SDU in a single CIS packet; Channel Allocation = 2 per CIS	One CIS; simpler connection management	Larger packets, more susceptible to interference
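For Option (b), the two LC3 frames sit back to back in one SDU. This sketch assumes the Audio Location bit values Front Left = 0x00000001 and Front Right = 0x00000002 from the Bluetooth Assigned Numbers, and assumes frames are ordered by ascending Audio Location bit (left, then right); the helper names are hypothetical. It also shows the link between Channel Allocation and channel count: a CIS whose allocation has both bits set carries two channels.

```c
#include <stdint.h>
#include <string.h>

#define AUDIO_LOCATION_FRONT_LEFT   0x00000001u
#define AUDIO_LOCATION_FRONT_RIGHT  0x00000002u

/* Channels carried by a CIS = number of bits set in its Channel Allocation */
int channels_in_allocation(uint32_t allocation)
{
    int count = 0;
    while (allocation) {
        count += (int)(allocation & 1u);
        allocation >>= 1;
    }
    return count;
}

/* Option (b): build one SDU from two LC3 frames of frame_bytes each,
 * Front Left first, then Front Right. Returns the SDU length in bytes. */
size_t pack_stereo_sdu(const uint8_t *left_frame, const uint8_t *right_frame,
                       size_t frame_bytes, uint8_t *sdu)
{
    memcpy(sdu, left_frame, frame_bytes);
    memcpy(sdu + frame_bytes, right_frame, frame_bytes);
    return 2 * frame_bytes;
}
```

At 48_2 this doubles the SDU from 100 to 200 bytes, which is the “larger packets” drawback in the table.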

10.1 The Popular Case — Separate Earbuds

Stereo to Separate Earbuds — Option (a): One CIS per Earbud

📱 Phone (Initiator)	CIS — Left (L)	🎧 Left Earbud Audio Location: Front Left
	CIS — Right (R)	🎧 Right Earbud Audio Location: Front Right

This is the standard configuration for LE Audio TWS (True Wireless Stereo) earbuds, where each earbud is a separate Acceptor with its own CIS.

10.2 Broadcast for Multiple Languages

A cinema can broadcast audio in multiple languages simultaneously using BIS (Broadcast Isochronous Streams). Each language stream is a separate BIS. Audience members tune in to whichever BIS matches their preferred language — like selecting a radio channel.

📡

Cinema BIS 1
English Stereo (24 kHz, 60 bytes)

📡

Cinema BIS 2
Hindi Stereo (24 kHz, 60 bytes)

📡

Cinema BIS 3
Audio Description (8 kHz, 30 bytes)

This works at 24 kHz because each stereo stream uses ~30% airtime — two streams = 60%, leaving 40% for other uses. At 48 kHz, one stereo stream already uses 59% — there’s no room for a second language!
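On the receiving side, “tuning in” comes down to picking the stream whose announced metadata matches the listener’s preference. This C sketch models that with a hypothetical struct bis_info carrying a three-letter language code (assumed here to be ISO 639-3, e.g. "eng", "hin"); in a real broadcast the receiver learns about the streams from metadata whose exact structure BAP defines, not from this struct.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical view of one broadcast stream as a receiver might store it */
struct bis_info {
    int         bis_index;
    const char *language;      /* e.g. "eng", "hin"; NULL if untagged */
    int         sample_rate_hz;
};

/* Return the index of the first BIS whose language matches, or -1 if none */
int pick_bis_for_language(const struct bis_info *streams, size_t n,
                          const char *wanted)
{
    for (size_t i = 0; i < n; i++)
        if (streams[i].language != NULL && strcmp(streams[i].language, wanted) == 0)
            return streams[i].bis_index;
    return -1;
}
```

For the cinema above, a listener preferring Hindi would be steered to BIS 2; a language nobody broadcasts returns -1.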

/* BlueZ: Setting up two CIS connections (Left + Right earbuds)
 * using the ISO socket API.
 * The struct bt_iso_qos layout below (.ucast.out) matches recent BlueZ
 * and kernel headers; early releases used a flat layout, so check yours. */

#include <stdint.h>
#include <sys/socket.h>
#include <bluetooth/bluetooth.h>
#include <bluetooth/iso.h>
#include <lc3.h>

/* Same CIG ID on both sockets so both CISes join one CIG
 * (0x01 is an assumed-free ID; BT_ISO_QOS_CIG_UNSET lets the kernel pick). */
static const struct bt_iso_qos qos = {
    .ucast = {
        .cig     = 0x01,
        .cis     = BT_ISO_QOS_CIS_UNSET,
        .framing = 0x00,               /* unframed */
        .out = {                       /* phone → earbud direction */
            .interval = 10000,         /* SDU interval = 10 ms in µs */
            .latency  = 10,            /* max transport latency in ms */
            .sdu      = 100,           /* 100 bytes per frame = 80 kbps @ 48 kHz */
            .phy      = BT_ISO_PHY_2M,
            .rtn      = 2,             /* retransmission number (guidance to controller) */
        },
    },
};

/* Create one ISO socket per earbud and apply the QoS. Returns 0 on success. */
int open_cis_sockets(int *left_sock, int *right_sock)
{
    *left_sock  = socket(PF_BLUETOOTH, SOCK_SEQPACKET, BTPROTO_ISO);
    *right_sock = socket(PF_BLUETOOTH, SOCK_SEQPACKET, BTPROTO_ISO);
    if (*left_sock < 0 || *right_sock < 0)
        return -1;

    if (setsockopt(*left_sock,  SOL_BLUETOOTH, BT_ISO_QOS, &qos, sizeof(qos)) < 0 ||
        setsockopt(*right_sock, SOL_BLUETOOTH, BT_ISO_QOS, &qos, sizeof(qos)) < 0)
        return -1;

    /* Bind and connect to each earbud address...
     * (bind/connect code omitted for brevity)   */
    return 0;
}

/* In the audio loop: encode LEFT channel, send on left_sock;
 * encode RIGHT channel, send on right_sock. One encoder per channel. */
void send_stereo_frame(int left_sock, int right_sock,
                       lc3_encoder_t enc_left, lc3_encoder_t enc_right,
                       const int16_t *left_pcm, const int16_t *right_pcm)
{
    uint8_t left_pkt[100], right_pkt[100];

    lc3_encode(enc_left,  LC3_PCM_FORMAT_S16, left_pcm,  1, 100, left_pkt);
    lc3_encode(enc_right, LC3_PCM_FORMAT_S16, right_pcm, 1, 100, right_pkt);

    send(left_sock,  left_pkt,  100, 0);  /* sent on CIS for left earbud  */
    send(right_sock, right_pkt, 100, 0);  /* sent on CIS for right earbud */
}

/* Both CISes are in the same CIG, so both earbuds share one synchronisation
 * reference point; with the same Presentation Delay they render audio at
 * the same moment → synchronised stereo */

🧠 Complete Chapter 5 Summary

LD-MDCT	Converts time-domain audio to frequency-domain (like a spectrum analyser). Core of LC3 encoding.
SNS / TNS	SNS shapes quantisation noise so the ear perceives it as least annoying; TNS handles percussive transients.
PLC	Predicts and fills in lost audio packets. Triggered by BFI flag. Works well on voice, good on music.
Presentation Delay	Buffer time at receiver for decode + PLC + audio processing. BAP requires 40 ms in supported range.
16_2 / 24_2	Mandatory BAP configurations: 16_2 for every BLE Audio device, 24_2 additionally for Broadcast Sinks.
Low Latency QoS	Fits all retransmissions in 1 ISO interval. ~62 ms total at 16 kHz with PD=40ms.
High Reliability QoS	Spreads retransmissions across up to 10 intervals. ~147 ms. Best for glitch-free music streaming.
RTN	Retransmission Number: a hint to the Controller, not a hard command. Transmission opportunities per frame per ISO interval = NSE / BN.
Stereo Options	Two separate CISes (one per earbud — most common) OR single CIS with L+R multiplexed.

Series Complete! 🎉

You now understand the full stack: why codecs exist, how LC3 compresses and reconstructs audio, how latency builds up in BLE Audio, and how to choose QoS settings for your application. Next up: explore BAP, TMAP, and HAP profiles that build on top of everything you’ve learned here.


embeddedpathashala.com