LC3 Internals, QoS & Multi-channel Audio

🔬 BLE Audio Deep-Dive
Chapter 5 · Part 2: LC3 Internals, QoS & Multi-channel Audio
~20 ms: Minimum LC3 Latency
16: BAP Codec Configs
PLC: Packet Loss Concealment

What this post covers

This is Part 2 of our Chapter 5 series. We now go inside LC3: the encoder pipeline, the decoder, Packet Loss Concealment, how latency builds up in a real BLE Audio link, and finally how Quality of Service (QoS) parameters are chosen. We also look at how stereo audio is handled with multiple channels.
Keywords in this post
LD-MDCT, SNS, TNS, LTPF, PLC, BFI Flag, Presentation Delay, QoS, RTN, Isochronous Interval, Low Latency / High Reliability, Airtime, CIS / BIS, Channel Allocation, BAP
⚙️ 5. Inside the LC3 Encoder
The LC3 encoder takes in raw PCM audio samples and outputs a small compressed packet. Let’s trace the audio through each block, like an assembly line for sound.
LC3 Encoder Pipeline β€” Step by Step
🎙️ INPUT: Raw PCM audio samples (e.g. 480 samples @ 48 kHz = 10 ms). Audio comes in as uncompressed integer samples. At 48 kHz with 10 ms frames, that’s 480 samples per frame, about 960 bytes of raw 16-bit audio.
⬇
1️⃣ LD-MDCT (Low Delay MDCT): Converts time-domain samples → frequency-domain coefficients. Think of it like decomposing music into its individual notes and frequencies. This is where most of the codec’s added delay comes from (~2.5 ms of look-ahead for LC3 with 10 ms frames). Analogy: like a spectrum analyser on a music player, it splits the audio into frequency bands.
⬇
2️⃣ Bandwidth Detector: Detects what bandwidth the incoming audio actually uses. Possible bandwidths: NB (0–4 kHz for old phones), WB (0–8 kHz), SSWB (0–12 kHz), SWB (0–16 kHz), FB (0–20 kHz). If the audio only goes up to 8 kHz, there is no point encoding coefficients above 8 kHz. This avoids smearing noise into empty frequency ranges.
⬇
3️⃣ SNS (Spectral Noise Shaper): Maximises perceived audio quality by shaping quantisation noise. When compressing, some noise is unavoidable. SNS shapes this noise so the human ear perceives it as least annoying, pushing noise into frequency regions where masking hides it. This is the psychoacoustic heart of LC3.
⬇
4️⃣ Attack Detector + TNS (Temporal Noise Shaper): Handles sharp attack sounds like percussion. Percussion (castanets, glockenspiel, triangles) is notoriously hard for codecs because each hit is a sudden spike in time with a broad, flat spectrum. The Attack Detector flags these, and TNS then suppresses the ringing artefacts that would otherwise appear. BLE Audio test suites specifically use castanets and triangles to evaluate this.
⬇
5️⃣ Spectral Quantiser + Noise Level: Decides how many bits each frequency coefficient gets. Like an intelligent automatic gain control: louder/more important frequencies get more bits; near-silent ones are quantised to zero (coded as silence). Noise Level then tells the decoder how much pseudo-random noise to fill those gaps with, so the listener doesn’t hear complete silence where there should be low-level sound.
⬇
6️⃣ LTPF (Long Term Post Filter): Reduces coding noise for pitched/tonal sounds at low bitrates. The analysis runs on a resampled 12.8 kHz version of the input. Particularly helpful for voice (which is tonal) at low bitrates. The filter parameters are embedded in the bitstream so the decoder can apply the matching filter.
⬇
📤 OUTPUT: Compressed LC3 bitstream packet (e.g. 100 bytes for 48 kHz / 80 kbps). All the coefficients, noise parameters, and LTPF data are entropy-coded and multiplexed into a single packet. No timestamps or sequence numbers; the transport layer handles timing.
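
The frame sizes quoted in the pipeline follow from simple arithmetic. Here is a minimal sketch of that arithmetic, assuming 16-bit mono PCM. The helper names are hypothetical and are not part of any LC3 library.

```c
#include <assert.h>

/* Samples in one frame: sample rate x frame duration. */
int pcm_frame_samples(int frame_us, int sample_rate_hz)
{
    assert(frame_us > 0 && sample_rate_hz > 0);
    return (int)((long long)sample_rate_hz * frame_us / 1000000);
}

/* Raw size of one frame, assuming 16-bit (2-byte) samples and one channel. */
int pcm_frame_bytes(int frame_us, int sample_rate_hz)
{
    return pcm_frame_samples(frame_us, sample_rate_hz) * 2;
}

/* Compressed size of one frame at a given bitrate in bits per second. */
int compressed_frame_bytes(int frame_us, int bitrate_bps)
{
    assert(frame_us > 0 && bitrate_bps > 0);
    return (int)((long long)bitrate_bps * frame_us / 8000000);
}
```

For 48 kHz, 10 ms and 80 kbps these give 480 samples, 960 raw bytes and a 100-byte packet, so the encoder is shrinking each frame by a factor of roughly 9.6.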

LC3 Encoding in Code (BlueZ / liblc3 usage)

#include <lc3.h>         /* liblc3, Google's open-source LC3 implementation */
#include <stdint.h>
#include <stdlib.h>      /* malloc */
#include <sys/socket.h>  /* send */

#define SAMPLE_RATE   48000
#define FRAME_US      10000   /* 10 ms in microseconds */
#define BITRATE       80000   /* 80 kbps */
#define CHANNELS      1       /* mono (one earbud) */

/* Allocate and initialise one encoder; call once per channel.
 * Returns NULL on allocation failure or unsupported parameters. */
static lc3_encoder_t create_encoder(void)
{
    void *mem = malloc(lc3_encoder_size(FRAME_US, SAMPLE_RATE));
    if (mem == NULL)
        return NULL;

    return lc3_setup_encoder(
        FRAME_US,     /* frame duration in µs: 7500 or 10000 */
        SAMPLE_RATE,  /* 8000 / 16000 / 24000 / 32000 / 48000 */
        0,            /* sr_pcm_hz: 0 = PCM input rate equals SAMPLE_RATE */
        mem
    );
}

/* Encode one 10 ms frame and send it; called every 10 ms by your audio thread.
 * pcm_buffer holds 480 samples collected from the audio source (mic / file / ALSA). */
static int encode_and_send(lc3_encoder_t encoder, int iso_sock_fd,
                           const int16_t pcm_buffer[480])
{
    /* Derived values: 480 samples and 100 bytes for 48 kHz / 10 ms / 80 kbps */
    int frame_samples = lc3_frame_samples(FRAME_US, SAMPLE_RATE);
    int encoded_bytes = lc3_frame_bytes(FRAME_US, BITRATE);
    uint8_t lc3_packet[100];      /* compressed output */

    if (frame_samples != 480 || encoded_bytes > (int)sizeof(lc3_packet))
        return -1;

    int ret = lc3_encode(
        encoder,
        LC3_PCM_FORMAT_S16,   /* 16-bit signed PCM */
        pcm_buffer,           /* input samples     */
        1,                    /* stride (1 = packed) */
        encoded_bytes,        /* target size in bytes */
        lc3_packet            /* output buffer       */
    );
    if (ret != 0)
        return -1;            /* wrong parameters */

    /* Hand lc3_packet to the BlueZ ISO socket for transmission */
    return send(iso_sock_fd, lc3_packet, encoded_bytes, 0) < 0 ? -1 : 0;
}

/* Note: lc3_packet has NO timestamp or sequence number.
 * The BLE Controller handles timing via the Isochronous Channel. */

🔄 6. The LC3 Decoder & Packet Loss Concealment (PLC)

6.1 The Decoder: Reverse Assembly Line

The decoder is essentially the encoder in reverse. It starts with the compressed bitstream and reconstructs a close approximation of the original audio waveform.

LC3 Decoder Pipeline
📥 INPUT: Compressed LC3 packet (e.g. 100 bytes)
⬇
Arithmetic Decode → Restore Spectral Coefficients
⬇
Noise Filling → Fill zeroed coefficients with noise (using bandwidth info)
⬇
TNS Decode + SNS Decode → Undo noise shaping
⬇
Inverse LD-MDCT → Convert frequency back to time domain (PCM samples)
⬇
Long Term Post Filter → Apply pitch-based filter using transmitted parameters
⬇
📤 OUTPUT: Reconstructed PCM audio samples → Speaker

6.2 What Happens When a Packet is Lost? PLC

Bluetooth shares the 2.4 GHz band with Wi-Fi, baby monitors, and countless other devices. Packets will get lost occasionally, especially outdoors where there are no walls to reflect the signal.

When a packet is lost, the Controller sets a BFI flag (Bad Frame Indication). The decoder then runs PLC (Packet Loss Concealment) instead of decoding a real packet.

  • ❌ Insert Silence: Simple but annoying. You hear a pop/click.
  • 🔁 Repeat Last Frame: OK for 1 frame, sounds robotic if multiple frames are lost.
  • 🧠 LC3 PLC Algorithm: Predicts the missing audio from the frames that came before it. Nearly transparent on voice, good on music.

💡 Real analogy: Imagine you’re on a phone call and 1 word gets cut. Your brain often fills in the word from context: “Can you meet me at the _____ tomorrow?” → “restaurant?”. LC3 PLC does something similar for audio, predicting the most likely waveform.
/* BlueZ / ISO socket receive loop with PLC handling.
 * Assumes `decoder` was created earlier with lc3_setup_decoder(). */

uint8_t  recv_buf[200];
int16_t  pcm_out[480];       /* decoded 10 ms of audio */
int      bad_frame = 0;      /* BFI flag */

ssize_t bytes_received = recv(iso_sock_fd, recv_buf, sizeof(recv_buf), 0);

if (bytes_received <= 0) {
    /* Nothing usable arrived: treat it as a lost packet and trigger PLC.
     * A real application should also inspect errno to tell a socket
     * error apart from a missing frame. */
    bad_frame = 1;
}

/* LC3 decode: a NULL input pointer activates PLC internally */
int ret = lc3_decode(
    decoder,
    bad_frame ? NULL : recv_buf,  /* NULL triggers PLC */
    bad_frame ? 0    : (int)bytes_received,
    LC3_PCM_FORMAT_S16,
    pcm_out,
    1                              /* stride */
);

/* ret is 0 for a normal decode, 1 when PLC was used, -1 on bad parameters.
 * On success pcm_out holds either real decoded audio or PLC-generated audio.
 * Feed it to ALSA / PulseAudio for playback at the Presentation Delay.
 * write_to_alsa() stands in for your own playback routine. */
if (ret >= 0)
    write_to_alsa(pcm_out, 480);
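
For contrast with LC3's built-in PLC, the two naive strategies from the list earlier in this section can be sketched in a few lines. This is an illustration only and is not the LC3 PLC algorithm. The function names and the halve-the-level-per-lost-frame fade are assumptions chosen for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Strategy 1: replace the lost frame with silence. */
void conceal_with_silence(int16_t *out, int samples)
{
    assert(out != NULL && samples > 0);
    memset(out, 0, (size_t)samples * sizeof(int16_t));
}

/* Strategy 2: repeat the last good frame, halving the level for every
 * additional consecutive lost frame so a long gap fades out. */
void conceal_with_repeat(const int16_t *last_good, int16_t *out,
                         int samples, int consecutive_lost)
{
    assert(last_good != NULL && out != NULL);
    assert(samples > 0 && consecutive_lost >= 1);

    /* First lost frame: full level. Second: 1/2. Third: 1/4, and so on. */
    int shift = consecutive_lost - 1;
    if (shift > 15)
        shift = 15;

    for (int i = 0; i < samples; i++)
        out[i] = (int16_t)(last_good[i] / (1 << shift));
}
```

Neither function looks at the signal itself, which is why both sound worse than a concealment scheme that models pitch and spectrum.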

⏱️ 7. LC3 Latency: How It Adds Up

Let’s trace exactly how the latency builds up end-to-end in a BLE Audio link. The key insight is that latency is not one big delay; it’s the sum of many small delays, each with a purpose.

End-to-End Latency Timeline for a 10 ms LC3 Frame
| Time | What Happens | Duration | Running Total |
|---|---|---|---|
| t = 0 ms | 🎙️ Audio Capture starts: first sample enters the encoder | 10 ms (frame fill) | 10 ms |
| t = 10 ms | ⚙️ LC3 Encoding: LD-MDCT + SNS + TNS + Quantiser run | ~2.5 ms | 12.5 ms |
| t = 12.5 ms | 📡 Radio Transmission: packet sent over BLE ISO channel | <1 ms (if no retx) | ~13 ms |
| t = 13–14 ms | 🔁 Retransmissions: extra copies sent for reliability (BLE can fit several in one frame) | ~1–2 ms | ~14 ms |
| t = 14 ms | ⏸️ Synchronisation Reference Point: every Acceptor starts decoding at this fixed moment | n/a | 14 ms (fixed) |
| t = 14–19 ms | 🔄 Presentation Delay: LC3 decode (~2 ms) + optional PLC (~2 ms) + audio processing (noise cancellation etc.) | ~5–40 ms | 19–54 ms |
| t = 20+ ms | 🔊 Audio rendered at speaker: the Rendering Point | n/a | ≥ ~20 ms total |

🌍 Real-world comparison: 20 ms of latency is the time sound takes to travel about 7 metres in air. Human hearing evolved to handle 25–30 ms echoes in a room without noticing. That’s why BLE Audio hearing aids feel natural: the delay is within what the brain ignores.

7.1 The Synchronisation Reference Point: Why It’s Critical for Stereo

Imagine left and right earbuds each decoding at slightly different times. You’d hear stereo as slightly time-shifted and the music would feel “off”. The Synchronisation Reference Point is a fixed moment in the isochronous timing that the Initiator and every Acceptor share. All Acceptors start decoding at exactly this moment and render one Presentation Delay later, which keeps the left and right earbuds synchronised.
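
Here is a minimal sketch of that idea, assuming each earbud knows the shared reference point and the agreed Presentation Delay. The struct and function names are hypothetical. The point is that the render time depends only on those two shared values, not on when each earbud's packet happened to arrive.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Timing seen by one Acceptor (earbud), all values in microseconds. */
struct acceptor_timing {
    uint32_t packet_arrival_us;      /* when this earbud received its packet  */
    uint32_t sync_reference_us;      /* shared Synchronisation Reference Point */
    uint32_t presentation_delay_us;  /* agreed Presentation Delay              */
};

/* Render time is the shared reference plus the Presentation Delay.
 * The arrival time is ignored; it only has to be early enough. */
uint32_t render_time_us(const struct acceptor_timing *t)
{
    assert(t != NULL);
    assert(t->packet_arrival_us <= t->sync_reference_us);
    return t->sync_reference_us + t->presentation_delay_us;
}
```

Two earbuds that receive their packets at 12.6 ms and 13.4 ms, with a reference point at 14 ms and a 40 ms Presentation Delay, would both render at 54 ms.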

7.2 Presentation Delay: The Buffer Before Rendering

BAP (Basic Audio Profile) requires every Audio Sink to support a Presentation Delay of 40 ms. This is not a default; it is a value that must fall within the range the Sink supports. Why 40 ms? It gives enough buffer time for:

  • LC3 decoding (~2 ms)
  • PLC if needed (~2 ms)
  • Noise cancellation / speech enhancement algorithms (~5–30 ms)
  • Synchronisation buffer between devices

High-end hearing aids can achieve Presentation Delay as low as 5 ms. TMAP and HAP profiles require support for 20 ms.
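
The budget in the list above can be checked with a one-line subtraction. This is a rough sketch using the approximate figures from this section; the function name is hypothetical and real decode and processing times depend on your hardware.

```c
#include <assert.h>

/* Time left for the synchronisation buffer once decode, PLC and audio
 * processing have been paid for. A negative result means the chosen
 * Presentation Delay is too short for this processing chain. */
int presentation_delay_headroom_us(int presentation_delay_us,
                                   int decode_us, int plc_us, int dsp_us)
{
    assert(presentation_delay_us >= 0);
    assert(decode_us >= 0 && plc_us >= 0 && dsp_us >= 0);
    return presentation_delay_us - (decode_us + plc_us + dsp_us);
}
```

With 2 ms decode, 2 ms PLC and 30 ms of processing, a 40 ms Presentation Delay leaves 6 ms of headroom, while a 20 ms one falls 14 ms short. That is one way to see why very low Presentation Delays only suit devices with little or no extra processing.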

🎛️ 8. Choosing LC3 Parameters: What You Configure as a Developer

You never tweak the internals of LC3. As a BLE Audio developer, you configure four key parameters, and BAP defines 16 standard combinations to help interoperability.

LC3 Configuration Parameters
| Parameter | Options | Use Case Hint |
|---|---|---|
| Sampling Rate | 8 / 16 / 24 / 32 / 44.1 / 48 kHz | 8 kHz = basic voice; 48 kHz = hi-fi music |
| Frame Duration | 7.5 ms or 10 ms | 10 ms for new designs; 7.5 ms for Classic BT interop |
| Bytes per Frame | 20 – 400 bytes | More bytes = better quality + more airtime used |
| Bits per Sample | 16, 24, or 32 | 16 is standard; 24/32 for professional audio |

Key BAP Codec Configurations (from BAP Table 3.5), Selected Entries
| Config Name | Sampling (kHz) | Frame Duration | Bytes/Frame | Bitrate (kbps) | Typical Use |
|---|---|---|---|---|---|
| 8_2 | 8 | 10 ms | 30 | 24 | Basic voice / IoT |
| 16_2 ★ | 16 | 10 ms | 40 | 32 | Mandatory for all BLE Audio; good voice |
| 24_2 ★ | 24 | 10 ms | 60 | 48 | Mandatory for Broadcast Sinks; music with background noise |
| 32_2 | 32 | 10 ms | 80 | 64 | Good music, most users won’t notice difference |
| 48_2 | 48 | 10 ms | 100 | 80 | Hi-fi music; indistinguishable from original |
| 48_6 | 48 | 10 ms | 155 | 124 | Maximum quality, but uses heavy airtime |

★ Mandatory configurations: 16_2 for every BLE Audio device, 24_2 for Broadcast Sinks.
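
The Bitrate column is not an independent setting; it follows from Bytes/Frame and Frame Duration. A small sketch of the conversion, with a hypothetical helper name:

```c
#include <assert.h>

/* Bitrate in bits per second for a given payload size and frame duration. */
int config_bitrate_bps(int bytes_per_frame, int frame_us)
{
    assert(bytes_per_frame > 0 && frame_us > 0);
    return (int)((long long)bytes_per_frame * 8 * 1000000 / frame_us);
}
```

For example, 40 bytes every 10 ms is 32 kbps (16_2) and 155 bytes every 10 ms is 124 kbps (48_6), matching the table.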

Subjective Audio Quality at Different Sampling Rates
| Sampling Rate | What You Hear | Best For |
|---|---|---|
| 8 kHz | Old telephone quality: intelligible voice | IoT devices, baby monitors |
| 16 kHz | Clear voice; good for voice recognition | Phone calls, voice assistants |
| 24 kHz | Adequate music with some noticeable quality limit | Hearing aids, PA systems, multi-stream scenarios |
| 32 kHz | Most users cannot tell difference from original | Earbuds, headphones |
| 48 kHz | Transparent: indistinguishable from original | Premium earbuds, audiophile use |

📊 9. Quality of Service (QoS): The Reliability vs Latency Trade-off

QoS in BLE Audio is about answering: “How aggressively should we retransmit packets to avoid audio glitches, and what latency price are we willing to pay?”

BLE Audio uses Isochronous Channels to send audio. Transmissions are scheduled in repeating windows called Isochronous Intervals (often equal to the frame duration, typically 10 ms, although the Controller may choose a multiple of it). If a packet can be retransmitted multiple times within one interval, latency stays low. If retransmissions spill into the next interval (or the one after), latency grows with every extra interval used.

BAP QoS Modes: Low Latency vs High Reliability
| Mode | Strategy | Isochronous Intervals Used | Typical Latency (16 kHz, PD = 40 ms) | Best For |
|---|---|---|---|---|
| Low Latency | Fit all retransmissions in 1 interval | 1 (most configs) | ~62.5 ms | Hearing aids, gaming, video sync |
| High Reliability (Unicast) | Spread retransmissions across up to 10 intervals | Up to 10 | ~147.5 ms | Music streaming (glitch-free matters more than latency) |
| High Reliability (Broadcast) | Spread across up to 6–8 intervals | Up to 6–8 | ~112.5 ms | Cinema / concert hall broadcast |
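
The latency column can be reproduced with a simple sum. The sketch below is an approximation, not a formula quoted from BAP. It assumes one 10 ms frame of capture, a 2.5 ms LC3 look-ahead, and Max Transport Latency values of 10 ms, 95 ms and 60 ms for the three rows. Those three transport values are assumptions taken from our reading of BAP's 16_2 QoS settings, so check them against the specification before relying on them.

```c
#include <assert.h>

/* End-to-end latency in microseconds:
 * frame fill + codec look-ahead + transport latency + Presentation Delay. */
long end_to_end_latency_us(long frame_us, long lookahead_us,
                           long transport_us, long presentation_delay_us)
{
    assert(frame_us > 0 && lookahead_us >= 0);
    assert(transport_us >= 0 && presentation_delay_us >= 0);
    return frame_us + lookahead_us + transport_us + presentation_delay_us;
}
```

Under those assumptions the three rows come out at 62.5 ms, 147.5 ms and 112.5 ms. The only term that changes between modes is the transport latency, which is exactly the part that retransmissions consume.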

9.1 RTN: The Retransmission Number

The Host tells the Controller how many retransmissions to allow via the RTN (Retransmission Number) parameter. But RTN is only a hint: the Controller (hardware scheduler) can adjust this based on what else the chip is doing (e.g. Wi-Fi coexistence).

Key insight: RTN is the maximum number of retransmissions, not the average. If the audio link is clean, the packet gets through on the first try. RTN only kicks in when interference happens. The number of transmission slots the Controller actually schedules per frame is NSE / BN (from the Isochronous Channel parameters), so the retransmissions available per frame are NSE / BN − 1.

9.2 The 48_2_2 Example: Three Ways the Controller Can Schedule It

BAP provides a concrete example: the 48_2_2 High Reliability broadcast configuration (48 kHz, 10 ms, 100 bytes, RTN=4, max transport latency 65 ms). The Controller can implement this in three ways:

Controller Scheduling Options for 48_2_2 Broadcast QoS
| Option | ISO Interval | BN | NSE | Latency (Stereo) | Airtime (Stereo) | Best For |
|---|---|---|---|---|---|---|
| 1 | 30 ms | 3 | 9 | 71 ms | 36% ✅ lowest | Wi-Fi coexistence (phone streaming) |
| 2 | 10 ms | 1 | 5 | 46 ms ✅ lowest | 59% ❌ highest | Dedicated broadcast device, no Wi-Fi |
| 3 | 20 ms | 2 | 8 | 70 ms | 48% | Balanced: Wi-Fi + moderate reliability |

BN = Burst Number (frames per ISO interval); NSE = Number of Sub-Events. The Host has no control over which option the chip chooses, so ask your silicon vendor.
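
To compare the three options, it helps to work out how many transmission slots each frame actually gets. The sketch below reads NSE / BN as the slots per frame and treats every slot after the first as a retransmission. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct iso_schedule {
    int iso_interval_ms;
    int bn;   /* Burst Number: new frames per ISO interval */
    int nse;  /* Number of Sub-Events per ISO interval     */
};

/* Transmission slots available to each frame. */
int transmissions_per_frame(const struct iso_schedule *s)
{
    assert(s != NULL && s->bn > 0 && s->nse >= s->bn);
    return s->nse / s->bn;
}

/* Retransmissions are every slot after the first one. */
int retransmissions_per_frame(const struct iso_schedule *s)
{
    return transmissions_per_frame(s) - 1;
}
```

On this reading the options give 2, 4 and 3 retransmissions per frame respectively, so only Option 2 delivers the full RTN=4 that the Host asked for. That is the sense in which RTN is a hint.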

9.3 Airtime: The Scarce Resource

Airtime is like road space on a single-lane highway. Every packet occupies the road for a while. LC3 is already much better than SBC (roughly half the airtime for the same audio quality). But as you increase quality (48 kHz, high bitrate) and reliability (more retransmissions), airtime usage climbs fast:

Approximate Airtime Usage by Configuration (10 ms frame; _1 = Low Latency, _2 = High Reliability)
  • 16_2_1: ~16% mono / ~20% stereo
  • 24_2_1: ~22% mono / ~30% stereo
  • 32_2_1: ~28% mono / ~38% stereo
  • 48_2_2: ~30% mono / ~59% stereo (Option 2)
  • 48_6_2: ~80%+ stereo, which leaves very little room for anything else

At 24 kHz you can fit two stereo streams (multi-language) in the air. At 48 kHz with high reliability, you can barely fit one stereo stream.
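
A back-of-the-envelope airtime estimate for a broadcast can be sketched as follows. It rests on several assumptions that are not stated in this post: the LE 2M PHY (4 µs per byte), 11 bytes of per-packet overhead for an unencrypted packet (preamble 2 + access address 4 + header 2 + CRC 3), and a 150 µs gap after every packet. Encryption would add a 4-byte MIC. The function name is hypothetical.

```c
#include <assert.h>

#define LE_2M_US_PER_BYTE   4    /* 2 Mbit/s means 4 us per byte           */
#define PACKET_OVERHEAD     11   /* preamble 2 + access addr 4 + hdr 2 + CRC 3 */
#define SUBEVENT_SPACING_US 150  /* assumed gap after each packet          */

/* Percentage of each ISO interval that a broadcast keeps the radio busy. */
double broadcast_airtime_percent(int payload_bytes, int nse, int streams,
                                 int iso_interval_us)
{
    assert(payload_bytes > 0 && nse > 0 && streams > 0 && iso_interval_us > 0);
    int slot_us = (payload_bytes + PACKET_OVERHEAD) * LE_2M_US_PER_BYTE
                  + SUBEVENT_SPACING_US;
    return 100.0 * nse * streams * slot_us / iso_interval_us;
}
```

With 100-byte payloads and two streams (left and right), these assumptions give about 35.6%, 59.4% and 47.5% for the three 48_2_2 scheduling options in section 9.2, which lines up with the 36%, 59% and 48% quoted there.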

🎧 10. Multi-channel Audio: How Stereo Works in BLE Audio

Stereo means two audio channels (Left and Right). BLE Audio gives you flexibility in how to deliver those two channels. This is controlled by Channel Allocation and how many CISes (Connected Isochronous Streams) are established.

Two Ways to Deliver Stereo from Phone to Headphones
| Option | How | Pros | Cons |
|---|---|---|---|
| Option (a): Two Separate CISes | CIS 1 → Left channel only; CIS 2 → Right channel only; Channel Allocation = 1 per CIS | Smaller individual packets → better reliability | Two CIS connections must be maintained |
| Option (b): Single CIS, Multiplexed | L+R frames concatenated → single CIS packet; Channel Allocation = 2 per CIS | One CIS; simpler connection management | Larger packets, more susceptible to interference |
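
For Option (b), the two encoded frames travel back to back in one packet. A minimal sketch of that packing is shown below. The left-then-right order and the function name are assumptions made for illustration; in a real product the order follows the configured Audio Channel Allocation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Place the left frame first and the right frame second in one SDU.
 * Returns the SDU length in bytes. */
size_t pack_stereo_sdu(const uint8_t *left, const uint8_t *right,
                       size_t frame_bytes, uint8_t *sdu, size_t sdu_capacity)
{
    assert(left != NULL && right != NULL && sdu != NULL);
    assert(sdu_capacity >= 2 * frame_bytes);
    memcpy(sdu, left, frame_bytes);
    memcpy(sdu + frame_bytes, right, frame_bytes);
    return 2 * frame_bytes;
}
```

With 100-byte frames this produces a 200-byte SDU, which is why Option (b) packets spend longer on the air and are more exposed to interference than the two 100-byte packets of Option (a).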

10.1 The Popular Case: Separate Earbuds

Stereo to Separate Earbuds, Option (a): One CIS per Earbud

📱 Phone (Initiator)
  • CIS, Left (L) → 🎧 Left Earbud (Audio Location: Front Left)
  • CIS, Right (R) → 🎧 Right Earbud (Audio Location: Front Right)

This is the standard configuration for TWS (True Wireless Stereo) earbuds, the separate-buds form factor popularised by AirPods.

10.2 Broadcast for Multiple Languages

A cinema can broadcast audio in multiple languages simultaneously using BIS (Broadcast Isochronous Streams). Each language stream is a separate BIS. Audience members tune in to whichever BIS matches their preferred language, like selecting a radio channel.

  • 📡 Cinema BIS 1: English Stereo (24 kHz, 60 bytes)
  • 📡 Cinema BIS 2: Hindi Stereo (24 kHz, 60 bytes)
  • 📡 Cinema BIS 3: Audio Description (8 kHz, 30 bytes)

This works at 24 kHz because each stereo stream uses ~30% airtime: two streams = 60%, leaving 40% for other uses. At 48 kHz, one stereo stream already uses 59%, so there’s no room for a second language!

/* BlueZ: Setting up two CIS connections (Left + Right earbuds)
 * using the ISO socket API (BlueZ 5.65+).
 * The field layout below follows recent BlueZ headers, where the unicast
 * settings live in qos.ucast and the per-direction values in .in / .out.
 * Older releases keep cig/cis/framing/in/out directly in struct bt_iso_qos,
 * so check the headers of the version you build against.               */

#include <sys/socket.h>
#include <bluetooth/bluetooth.h>
#include <bluetooth/iso.h>

struct bt_iso_qos qos = {
    .ucast = {
        .cig     = BT_ISO_QOS_CIG_UNSET,
        .cis     = BT_ISO_QOS_CIS_UNSET,
        .framing = 0x00,           /* unframed */
        .out = {                   /* phone → earbud direction */
            .interval = 10000,     /* SDU interval = 10 ms in µs */
            .latency  = 10,        /* max transport latency in ms */
            .sdu      = 100,       /* 100 bytes per frame = 80 kbps @ 48 kHz */
            .phy      = BT_ISO_PHY_2M,
            .rtn      = 2,         /* retransmission number (guidance to controller) */
        },
    },
};
/* rtn and latency are illustrative; take the real values from the BAP QoS
 * settings for the configuration you select. */

/* Create two ISO sockets, one for each earbud */
int left_sock  = socket(PF_BLUETOOTH, SOCK_SEQPACKET, BTPROTO_ISO);
int right_sock = socket(PF_BLUETOOTH, SOCK_SEQPACKET, BTPROTO_ISO);

/* Set QoS on each socket (check the return values in real code) */
setsockopt(left_sock,  SOL_BLUETOOTH, BT_ISO_QOS, &qos, sizeof(qos));
setsockopt(right_sock, SOL_BLUETOOTH, BT_ISO_QOS, &qos, sizeof(qos));

/* Bind and connect to each earbud address... */
/* (bind/connect code omitted for brevity)   */

/* In the audio loop: encode LEFT channel, send on left_sock
 *                    encode RIGHT channel, send on right_sock
 *                    Both share the same CIG so they're synchronised */

lc3_encode(enc_left,  LC3_PCM_FORMAT_S16, left_pcm,  1, 100, left_pkt);
lc3_encode(enc_right, LC3_PCM_FORMAT_S16, right_pcm, 1, 100, right_pkt);

send(left_sock,  left_pkt,  100, 0);  /* sent on CIS for left earbud  */
send(right_sock, right_pkt, 100, 0);  /* sent on CIS for right earbud */

/* Both earbuds share the same CIG synchronisation reference point
 * so they render audio at exactly the same moment → perfect stereo */

🧠 Complete Chapter 5 Summary
  • LD-MDCT: Converts time-domain audio to frequency-domain (like a spectrum analyser). Core of LC3 encoding.
  • SNS / TNS: Shape quantisation noise so the ear perceives it as least annoying. Handle percussion transients.
  • PLC: Predicts and fills in lost audio packets. Triggered by BFI flag. Works well on voice, good on music.
  • Presentation Delay: Buffer time at receiver for decode + PLC + audio processing. BAP requires 40 ms in supported range.
  • 16_2 / 24_2: Mandatory BAP configurations: 16_2 for every BLE Audio device, 24_2 for Broadcast Sinks.
  • Low Latency QoS: Fits all retransmissions in 1 ISO interval. ~62.5 ms total at 16 kHz with PD = 40 ms.
  • High Reliability QoS: Spreads retransmissions across up to 10 intervals. ~147.5 ms. Best for glitch-free music streaming.
  • RTN: Retransmission Number, a hint to the Controller, not a hard command. Transmission slots per frame = NSE/BN.
  • Stereo Options: Two separate CISes (one per earbud, most common) OR single CIS with L+R multiplexed.

Series Complete! 🎉

You now understand the full stack: why codecs exist, how LC3 compresses and reconstructs audio, how latency builds up in BLE Audio, and how to choose QoS settings for your application. Next up: explore BAP, TMAP, and HAP profiles that build on top of everything you’ve learned here.
