What this post covers
This is Part 2 of our Chapter 5 series. We now go inside LC3: the encoder pipeline, the decoder, Packet Loss Concealment, how latency builds up in a real BLE Audio link, and finally how Quality of Service (QoS) parameters are chosen. We also look at how stereo audio is handled with multiple channels.

| Stage | What it does |
|---|---|
| INPUT: Raw PCM audio samples (e.g. 480 samples @ 48 kHz = 10 ms) | Audio comes in as uncompressed integer samples. At 48 kHz with 10 ms frames, that’s 480 samples per frame, or 960 bytes raw at 16 bits per sample. |
| ↓ | |
| 1. LD-MDCT (Low Delay MDCT) | Converts time-domain samples into frequency-domain coefficients. Think of it like decomposing music into its individual notes and frequencies. This is where most of the codec’s delay comes from (~2.5 ms for LC3). Analogy: like a spectrum analyser on a music player, it splits the audio into frequency bands. |
| ↓ | |
| 2. Bandwidth Detector | Detects what bandwidth the incoming audio actually uses. Possible bandwidths: NB (0-4 kHz, for old phones), WB (0-8 kHz), SSWB (0-12 kHz), SWB (0-16 kHz), FB (0-20 kHz). If the audio only goes up to 8 kHz, there is no point encoding the 16-20 kHz coefficients. This avoids smearing noise into empty frequency ranges. |
| ↓ | |
| 3. SNS (Spectral Noise Shaper) | Maximises perceived audio quality by shaping quantisation noise. When compressing, some noise is unavoidable. SNS shapes this noise so the human ear perceives it as least annoying, pushing noise into frequency regions where masking hides it. This is the psychoacoustic heart of LC3. |
| ↓ | |
| 4. Attack Detector + TNS (Temporal Noise Shaper) | Handles sharp attack sounds like percussion. Percussion instruments (castanets, glockenspiel, triangles) are notoriously hard for codecs because they have a flat spectrum with sudden spikes. The Attack Detector flags these, and TNS then suppresses the ringing artefacts that would otherwise appear. BLE Audio test suites specifically use castanets and triangles to evaluate this. |
| ↓ | |
| 5. Spectral Quantiser + Noise Level | Decides how many bits each frequency coefficient gets. Like an intelligent automatic gain control: louder or more important frequencies get more bits; near-silent ones are quantised to zero (coded as silence). Noise Level then fills any gaps with pseudo-random noise so the listener doesn’t hear complete silence where there should be low-level sound. |
| ↓ | |
| 6. LTPF (Long Term Post Filter) | Reduces coding noise for pitched/tonal sounds at low bitrates. Runs on a resampled 12.8 kHz version of the input. Particularly helpful for voice (which is tonal) at low bitrates. The filter parameters are embedded in the bitstream so the decoder can mirror them. |
| ↓ | |
| OUTPUT: Compressed LC3 bitstream packet (e.g. 100 bytes for 48 kHz / 80 kbps) | All the coefficients, noise parameters, and LTPF data are entropy-coded and multiplexed into a single packet. No timestamps or sequence numbers: the transport layer handles timing. |
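The frame arithmetic quoted for the INPUT and OUTPUT stages can be checked by hand. The sketch below is plain arithmetic, not part of any LC3 library; the function names are made up for this post, and it assumes mono PCM.

```c
#include <assert.h>
#include <stdint.h>

/* PCM samples in one frame: sample_rate_hz * frame_us / 1,000,000. */
int32_t pcm_samples_per_frame(int32_t sample_rate_hz, int32_t frame_us)
{
    assert(sample_rate_hz > 0 && frame_us > 0);
    return (int32_t)(((int64_t)sample_rate_hz * frame_us) / 1000000);
}

/* Bitrate of the uncompressed mono stream, in bits per second. */
int32_t raw_bitrate_bps(int32_t sample_rate_hz, int32_t bits_per_sample)
{
    assert(sample_rate_hz > 0 && bits_per_sample > 0);
    return sample_rate_hz * bits_per_sample;
}

/* How many times smaller the coded stream is than the raw one. */
double compression_ratio(int32_t raw_bps, int32_t coded_bps)
{
    assert(coded_bps > 0);
    return (double)raw_bps / (double)coded_bps;
}
```

For 48 kHz and 10 ms frames, `pcm_samples_per_frame(48000, 10000)` gives 480 samples. With 16-bit samples the raw stream is 768 kbps, so coding it at 80 kbps is a ratio of about 9.6, which matches 960 raw bytes shrinking to 100 bytes per frame.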
LC3 Encoding in Code (BlueZ / liblc3 usage)
```c
#include <lc3.h>         /* liblc3, Google's open-source LC3 implementation */
#include <stdint.h>
#include <stdlib.h>      /* malloc */
#include <sys/socket.h>  /* send */

#define SAMPLE_RATE 48000
#define FRAME_US    10000  /* 10 ms in microseconds */
#define BITRATE     80000  /* 80 kbps */
#define CHANNELS    1      /* mono (one earbud) */

/* Allocate an encoder (one per channel). Returns NULL on failure. */
lc3_encoder_t create_encoder(void)
{
    void *mem = malloc(lc3_encoder_size(FRAME_US, SAMPLE_RATE));
    if (mem == NULL)
        return NULL;
    return lc3_setup_encoder(
        FRAME_US,     /* frame duration in µs: 7500 or 10000 */
        SAMPLE_RATE,  /* 8000 / 16000 / 24000 / 32000 / 48000 */
        0,            /* sr_pcm_hz: 0 = PCM input is already at SAMPLE_RATE */
        mem);
}

/* Per-frame encode, called every 10 ms by your audio thread.
 * pcm_buffer holds one frame from the audio source (mic / file / ALSA):
 * lc3_frame_samples(FRAME_US, SAMPLE_RATE) = 480 samples for 48 kHz / 10 ms. */
int encode_and_send(lc3_encoder_t encoder, int iso_sock_fd,
                    const int16_t *pcm_buffer)
{
    /* = 100 bytes for 80 kbps / 10 ms */
    int encoded_bytes = lc3_frame_bytes(FRAME_US, BITRATE);
    uint8_t lc3_packet[100];  /* compressed output */

    int ret = lc3_encode(
        encoder,
        LC3_PCM_FORMAT_S16,  /* 16-bit signed PCM */
        pcm_buffer,          /* input samples */
        1,                   /* stride (1 = packed mono) */
        encoded_bytes,       /* target size in bytes */
        lc3_packet);         /* output buffer */
    if (ret != 0)
        return ret;

    /* Hand lc3_packet to the BlueZ ISO socket for transmission.
     * Note: lc3_packet has NO timestamp or sequence number.
     * The BLE Controller handles timing via the Isochronous Channel. */
    return send(iso_sock_fd, lc3_packet, (size_t)encoded_bytes, 0) < 0 ? -1 : 0;
}
```
6.1 The Decoder: Reverse Assembly Line
The decoder is essentially the encoder in reverse. It starts with the compressed bitstream and reconstructs a close approximation of the original audio waveform (LC3 is lossy, so the output is not bit-identical to the input).

| Decoder stage |
|---|
| INPUT: Compressed LC3 packet (e.g. 100 bytes) |
| ↓ |
| Arithmetic Decode: restore spectral coefficients |
| ↓ |
| Noise Filling: fill zeroed coefficients with noise (using bandwidth info) |
| ↓ |
| TNS Decode + SNS Decode: undo noise shaping |
| ↓ |
| Inverse LD-MDCT: convert frequency back to time domain (PCM samples) |
| ↓ |
| Long Term Post Filter: apply pitch-based filter using transmitted parameters |
| ↓ |
| OUTPUT: Reconstructed PCM audio samples, sent to the speaker |
6.2 What Happens When a Packet is Lost? (PLC)
Bluetooth shares the 2.4 GHz band with Wi-Fi, baby monitors, and countless other devices. Packets will get lost occasionally, especially outdoors, where there are no walls to reflect the signal.
When a packet is lost, the Controller sets a BFI flag (Bad Frame Indication). The decoder then runs PLC (Packet Loss Concealment) instead of decoding a real packet.

| Strategy | Result |
|---|---|
| Insert Silence | Simple but annoying. You hear a pop/click. |
| Repeat Last Frame | OK for 1 frame, sounds robotic if multiple frames are lost. |
| LC3 PLC Algorithm | Predicts the missing audio from the preceding frames. Nearly transparent on voice, good on music. |
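```c
#include <lc3.h>
#include <stdint.h>
#include <sys/socket.h>  /* recv */
#include <sys/types.h>   /* ssize_t */

/* Hypothetical playback helper, not part of BlueZ or liblc3. */
void write_to_alsa(const int16_t *pcm, int samples);

/* BlueZ / ISO socket receive step with PLC handling: call once per 10 ms frame. */
int receive_and_decode(lc3_decoder_t decoder, int iso_sock_fd)
{
    uint8_t recv_buf[200];
    int16_t pcm_out[480];  /* decoded 10 ms of audio at 48 kHz */
    int bad_frame = 0;     /* BFI flag */

    ssize_t bytes_received = recv(iso_sock_fd, recv_buf, sizeof(recv_buf), 0);
    if (bytes_received <= 0) {
        /* Packet lost or socket error: trigger PLC */
        bad_frame = 1;
    }

    /* LC3 decode: a NULL input makes liblc3 run PLC internally.
     * Returns 0 on a normal decode, 1 when PLC ran, -1 on bad parameters. */
    int ret = lc3_decode(
        decoder,
        bad_frame ? NULL : recv_buf,
        bad_frame ? 0 : (int)bytes_received,
        LC3_PCM_FORMAT_S16,
        pcm_out,
        1);  /* stride */
    if (ret < 0)
        return ret;

    /* pcm_out now has either real decoded audio or PLC-generated audio.
     * Feed it to ALSA / PulseAudio for playback at the Presentation Delay. */
    write_to_alsa(pcm_out, 480);
    return ret;
}
```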
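To make the difference between the concealment strategies concrete, here is a minimal sketch of the two naive ones, silence insertion and last-frame repetition. This is not the LC3 PLC algorithm. The function names are invented for illustration, and the halving of the gain on each further consecutive loss is an added assumption to stop a long burst from buzzing.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define FRAME_SAMPLES 480  /* 10 ms at 48 kHz */

/* Strategy 1: replace the lost frame with silence. */
void conceal_with_silence(int16_t *out)
{
    memset(out, 0, FRAME_SAMPLES * sizeof(int16_t));
}

/* Strategy 2: repeat the last good frame. The first lost frame is an exact
 * copy; each further consecutive loss halves the amplitude again. */
void conceal_with_repeat(const int16_t *last_good, int consecutive_losses,
                         int16_t *out)
{
    assert(consecutive_losses >= 1);
    int shift = consecutive_losses - 1;
    if (shift > 15)
        shift = 15;
    for (int i = 0; i < FRAME_SAMPLES; i++)
        out[i] = (int16_t)(last_good[i] / (1 << shift));
}
```

Both functions need only the previous output, which is why they are cheap, and also why they sound poor: neither tries to continue the pitch or spectral shape of the signal the way a real PLC does.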
Let’s trace exactly how the latency builds up end-to-end in a BLE Audio link. The key insight is that latency is not one big delay: it’s the sum of many small delays, each with a purpose.

| Time | What Happens | Duration | Running Total |
|---|---|---|---|
| t = 0 ms | Audio capture starts: the first sample enters the encoder | 10 ms (frame fill) | 10 ms |
| t = 10 ms | LC3 encoding: LD-MDCT + SNS + TNS + Quantiser run | ~2.5 ms | 12.5 ms |
| t = 12.5 ms | Radio transmission: packet sent over BLE ISO channel | <1 ms (if no retx) | ~13 ms |
| t = 13-14 ms | Retransmissions: extra copies sent for reliability (BLE can fit several in one frame) | ~1-2 ms | ~14 ms |
| t = 14 ms | Synchronisation Reference Point: every Acceptor starts decoding at this fixed moment | - | 14 ms (fixed) |
| t = 14-19 ms | Presentation Delay: LC3 decode (~2 ms) + optional PLC (~2 ms) + audio processing (noise cancellation etc.) | ~5-40 ms | 19-54 ms |
| t = 19+ ms | Rendering Point: audio rendered at the speaker | - | ≥ ~19 ms total |
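The running total in the table is a plain sum of stage durations, which a few lines of code make explicit. This is a sketch under stated assumptions: the struct and function names are invented here, and the 0.5 ms transmission and 1 ms retransmission figures are picked from within the table's "<1 ms" and "~1-2 ms" ranges so that the Synchronisation Reference Point lands at 14 ms.

```c
#include <assert.h>

/* One stage of the end-to-end budget, in microseconds. */
struct latency_stage {
    const char *name;
    long duration_us;
};

/* Sum the stages: time from first captured sample to rendering. */
long total_latency_us(const struct latency_stage *stages, int count)
{
    long total = 0;
    for (int i = 0; i < count; i++) {
        assert(stages[i].duration_us >= 0);
        total += stages[i].duration_us;
    }
    return total;
}

/* Best case: everything fits in one interval and Presentation Delay is 5 ms. */
const struct latency_stage best_case[] = {
    { "frame fill",         10000 },
    { "LC3 encode",          2500 },
    { "radio transmission",   500 },  /* assumed value within "<1 ms" */
    { "retransmissions",     1000 },  /* assumed value within "~1-2 ms" */
    { "presentation delay",  5000 },
};
```

Calling `total_latency_us(best_case, 5)` returns 19000 µs, the bottom of the 19-54 ms range in the Running Total column. Swapping the last entry for the 40 ms Presentation Delay gives 54000 µs, the top of that range.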
7.1 The Synchronisation Reference Point: Why It’s Critical for Stereo
Imagine left and right earbuds each decoding at slightly different times. You’d hear stereo as slightly time-shifted, and the music would feel “off”. The Synchronisation Reference Point is a fixed timestamp broadcast by the Initiator. All Acceptors start decoding at exactly this moment, ensuring left and right earbuds are perfectly synchronised.
7.2 Presentation Delay: The Buffer Before Rendering
BAP (Basic Audio Profile) requires every Audio Sink to support a Presentation Delay of 40 ms. This is not the default: it is a value that must fall within the range the sink supports. Why 40 ms? It gives enough buffer time for:
- LC3 decoding (~2 ms)
- PLC if needed (~2 ms)
- Noise cancellation / speech enhancement algorithms (~5-30 ms)
- Synchronisation buffer between devices
High-end hearing aids can achieve Presentation Delay as low as 5 ms. TMAP and HAP profiles require support for 20 ms.
You never tweak the internals of LC3. As a BLE Audio developer, you configure four key parameters, and BAP defines 16 standard combinations to help interoperability.
| Parameter | Options | Use Case Hint |
|---|---|---|
| Sampling Rate | 8 / 16 / 24 / 32 / 44.1 / 48 kHz | 8 kHz = basic voice; 48 kHz = hi-fi music |
| Frame Duration | 7.5 ms or 10 ms | 10 ms for new designs; 7.5 ms for Classic BT interop |
| Bytes per Frame | 20 to 400 bytes | More bytes = better quality + more airtime used |
| Bits per Sample | 16, 24, or 32 | 16 is standard; 24/32 for professional audio |
| Config Name | Sampling (kHz) | Frame Duration | Bytes/Frame | Bitrate (kbps) | Typical Use |
|---|---|---|---|---|---|
| 8_2 | 8 | 10 ms | 30 | 24 | Basic voice / IoT |
| 16_2 ★ | 16 | 10 ms | 40 | 32 | Mandatory for all BLE Audio; good voice |
| 24_2 ★ | 24 | 10 ms | 60 | 48 | Mandatory for Broadcast Sinks; music with background noise |
| 32_2 | 32 | 10 ms | 80 | 64 | Good music, most users won’t notice difference |
| 48_2 | 48 | 10 ms | 100 | 80 | Hi-fi music; indistinguishable from original |
| 48_6 | 48 | 10 ms | 155 | 124 | Maximum quality, but uses heavy airtime |

★ Mandatory configurations: 16_2 for every BLE Audio device, 24_2 for Broadcast Sinks.
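The Bitrate column follows directly from Bytes/Frame and Frame Duration: bits per frame divided by the frame time. A small sketch of that arithmetic (the helper name `config_bitrate_bps` is hypothetical, not a BAP or liblc3 function):

```c
#include <assert.h>
#include <stdint.h>

/* Bitrate in bits per second for frame_bytes sent once every frame_us. */
int64_t config_bitrate_bps(int32_t frame_bytes, int32_t frame_us)
{
    assert(frame_bytes > 0 && frame_us > 0);
    return ((int64_t)frame_bytes * 8 * 1000000) / frame_us;
}
```

For 16_2 this gives `config_bitrate_bps(40, 10000)` = 32000, i.e. 32 kbps, and for 48_6 `config_bitrate_bps(155, 10000)` = 124000. The same formula covers 7.5 ms frames: 75 bytes every 7500 µs is also 80 kbps.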
| Sampling Rate | What You Hear | Best For |
|---|---|---|
| 8 kHz | Old telephone quality: intelligible voice | IoT devices, baby monitors |
| 16 kHz | Clear voice; good for voice recognition | Phone calls, voice assistants |
| 24 kHz | Adequate music with some noticeable quality limit | Hearing aids, PA systems, multi-stream scenarios |
| 32 kHz | Most users cannot tell difference from original | Earbuds, headphones |
| 48 kHz | Transparent: indistinguishable from original | Premium earbuds, audiophile use |
QoS in BLE Audio is about answering: “How aggressively should we retransmit packets to avoid audio glitches, and what latency price are we willing to pay?”
BLE Audio uses Isochronous Channels to send audio. Each transmission window is called an Isochronous Interval (= frame duration = typically 10 ms). If a packet can be retransmitted multiple times within one interval, latency stays low. If retransmissions spill into the next interval (or the one after), latency grows by one frame per extra interval.
| Mode | Strategy | Isochronous Intervals Used | Typical Latency (16 kHz, PD=40ms) | Best For |
|---|---|---|---|---|
| Low Latency | Fit all retransmissions in 1 interval | 1 (most configs) | ~62.5 ms | Hearing aids, gaming, video sync |
| High Reliability (Unicast) | Spread retransmissions across up to 10 intervals | Up to 10 | ~147.5 ms | Music streaming (glitch-free matters more than latency) |
| High Reliability (Broadcast) | Spread across up to 6-8 intervals | Up to 6-8 | ~112.5 ms | Cinema / concert hall broadcast |
9.1 RTN: The Retransmission Number
The Host tells the Controller how many retransmissions to allow via the RTN (Retransmission Number) parameter. But RTN is only a hint: the Controller (hardware scheduler) can adjust this based on what else the chip is doing (e.g. Wi-Fi coexistence).
9.2 The 48_2_2 Example: Three Ways the Controller Can Schedule It
BAP provides a concrete example: the 48_2_2 High Reliability broadcast configuration (48 kHz, 10 ms, 100 bytes, RTN=4, max transport latency 65 ms). The Controller can implement this in three ways:
| Option | ISO Interval | BN | NSE | Latency (Stereo) | Airtime (Stereo) | Best For |
|---|---|---|---|---|---|---|
| 1 | 30 ms | 3 | 9 | 71 ms | 36% (lowest) | Wi-Fi coexistence (phone streaming) |
| 2 | 10 ms | 1 | 5 | 46 ms (lowest) | 59% (highest) | Dedicated broadcast device, no Wi-Fi |
| 3 | 20 ms | 2 | 8 | 70 ms | 48% | Balanced: Wi-Fi + moderate reliability |

BN = Burst Number (frames per ISO interval); NSE = Number of Sub-Events. The Host has no control over which option the chip chooses, so ask your silicon vendor.
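One way to compare the three options is the average number of transmission slots each audio frame gets, which is NSE divided by BN. The sketch below uses invented struct and function names and simply restates the table's numbers.

```c
#include <assert.h>

/* One way the Controller may schedule the stream. */
struct iso_schedule {
    int iso_interval_ms;  /* ISO Interval */
    int bn;               /* Burst Number: new frames per ISO interval */
    int nse;              /* Number of Sub-Events per ISO interval */
};

/* Average number of transmission slots available to each frame. */
double transmissions_per_frame(struct iso_schedule s)
{
    assert(s.bn > 0 && s.nse >= s.bn);
    return (double)s.nse / (double)s.bn;
}

/* Audio time carried by one frame; should equal the codec frame duration. */
int frame_duration_ms(struct iso_schedule s)
{
    assert(s.bn > 0);
    return s.iso_interval_ms / s.bn;
}
```

Option 1 (NSE 9, BN 3) works out to 3 transmissions per frame, Option 3 to 4, and Option 2 to 5. That ordering lines up with the airtime column: the more slots each frame gets, the more of the air the stream occupies. All three carry 10 ms of audio per frame.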
9.3 Airtime: The Scarce Resource
Airtime is like road space on a single-lane highway. Every packet occupies the road for a while. LC3 is already much better than SBC (roughly half the airtime for the same audio quality). But as you increase quality (48 kHz, high bitrate) and reliability (more retransmissions), airtime usage climbs fast:

| Configuration | Approximate airtime |
|---|---|
| 16_2_1 | ~16% mono / ~20% stereo |
| 24_2_1 | ~22% mono / ~30% stereo |
| 32_2_1 | ~28% mono / ~38% stereo |
| 48_2_2 | ~30% mono / ~59% stereo (Option 2) |
| 48_6_2 | ~80%+ stereo: leaves very little room for anything else |

At 24 kHz you can fit two stereo streams (multi-language) in the air. At 48 kHz with high reliability, you can barely fit one stereo stream.
Stereo means two audio channels (Left and Right). BLE Audio gives you flexibility in how to deliver those two channels. This is controlled by Channel Allocation and how many CISes (Connected Isochronous Streams) are established.
| Option | How | Pros | Cons |
|---|---|---|---|
| Option (a): Two Separate CISes | CIS 1 → Left channel only; CIS 2 → Right channel only; Channel Allocation = 1 per CIS | Smaller individual packets → better reliability | Two CIS connections must be maintained |
| Option (b): Single CIS, Multiplexed | L+R concatenated by Controller → single CIS packet; Channel Allocation = 2 per CIS | One CIS; simpler connection management | Larger packets, more susceptible to interference |
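With Option (a), each channel is encoded separately, so stereo PCM that arrives interleaved (L0 R0 L1 R1 ...) has to be split per channel first. A minimal sketch of that step follows; the function name is invented for this post, and it assumes 16-bit interleaved input.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Split interleaved stereo PCM (L0 R0 L1 R1 ...) into two mono buffers.
 * frames is the number of sample pairs; left and right must each hold
 * at least that many samples. */
void deinterleave_stereo(const int16_t *interleaved, size_t frames,
                         int16_t *left, int16_t *right)
{
    assert(interleaved != NULL && left != NULL && right != NULL);
    for (size_t i = 0; i < frames; i++) {
        left[i]  = interleaved[2 * i];
        right[i] = interleaved[2 * i + 1];
    }
}
```

If the encoder API accepts a stride argument, as the `lc3_encode` calls elsewhere in this post do, an alternative that avoids the copy is to pass a stride of 2 and point the input at the first left or right sample.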
10.1 The Popular Case: Separate Earbuds

| Stream from Phone (Initiator) | Receiver | Audio Location |
|---|---|---|
| CIS → Left (L) | Left Earbud | Front Left |
| CIS → Right (R) | Right Earbud | Front Right |

This is the standard configuration for TWS (True Wireless Stereo) earbuds like AirPods.
10.2 Broadcast for Multiple Languages
A cinema can broadcast audio in multiple languages simultaneously using BIS (Broadcast Isochronous Streams). Each language stream is a separate BIS. Audience members tune in to whichever BIS matches their preferred language, like selecting a radio channel.

| Stream | Content |
|---|---|
| Cinema BIS 1 | English Stereo (24 kHz, 60 bytes) |
| Cinema BIS 2 | Hindi Stereo (24 kHz, 60 bytes) |
| Cinema BIS 3 | Audio Description (8 kHz, 30 bytes) |

This works at 24 kHz because each stereo stream uses ~30% airtime: two streams = 60%, leaving 40% for other uses. At 48 kHz, one stereo stream already uses 59%, so there’s no room for a second language!
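The airtime reasoning above is a simple budget calculation. Here is a sketch, with a hypothetical helper name and the assumption that all streams are identical and that a fixed share of airtime is held back for everything else the radio has to do:

```c
#include <assert.h>

/* Number of identical streams that fit once reserved_percent of the airtime
 * is set aside for other traffic. */
int streams_that_fit(int airtime_percent_per_stream, int reserved_percent)
{
    assert(airtime_percent_per_stream > 0);
    assert(reserved_percent >= 0 && reserved_percent <= 100);
    return (100 - reserved_percent) / airtime_percent_per_stream;
}
```

With 40% held in reserve, `streams_that_fit(30, 40)` is 2 for 24 kHz stereo, while `streams_that_fit(59, 40)` is 1 for 48 kHz stereo. Even with nothing reserved, a 59% stream still only fits once.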
```c
/* BlueZ: setting up two CIS connections (Left + Right earbuds)
 * using the ISO socket API (BlueZ 5.65+).
 * The field layout below follows the BlueZ 5.65 headers; newer headers nest
 * these fields under .ucast, so check your installed <bluetooth/bluetooth.h>. */
#include <lc3.h>
#include <stdint.h>
#include <sys/socket.h>
#include <bluetooth/bluetooth.h>
#include <bluetooth/iso.h>

static struct bt_iso_qos qos = {
    .cig = 0x01,                  /* assumption: same explicit CIG ID on both sockets */
    .cis = BT_ISO_QOS_CIS_UNSET,  /* let the kernel pick each CIS ID */
    .framing = 0x00,              /* unframed */
    .out = {
        .interval = 10000,        /* SDU interval = 10 ms in µs */
        .latency  = 10,           /* max transport latency in ms */
        .sdu      = 100,          /* 100 bytes per frame = 80 kbps @ 48 kHz */
        .phy      = BT_ISO_PHY_2M,
        .rtn      = 2,            /* retransmission number (guidance to controller) */
    },
};

/* Create two ISO sockets, one per earbud, and set the QoS on each.
 * Bind/connect to each earbud address is omitted for brevity. */
int open_stereo_sockets(int *left_sock, int *right_sock)
{
    *left_sock  = socket(PF_BLUETOOTH, SOCK_SEQPACKET, BTPROTO_ISO);
    *right_sock = socket(PF_BLUETOOTH, SOCK_SEQPACKET, BTPROTO_ISO);
    if (*left_sock < 0 || *right_sock < 0)
        return -1;
    if (setsockopt(*left_sock, SOL_BLUETOOTH, BT_ISO_QOS, &qos, sizeof(qos)) < 0 ||
        setsockopt(*right_sock, SOL_BLUETOOTH, BT_ISO_QOS, &qos, sizeof(qos)) < 0)
        return -1;
    return 0;
}

/* In the audio loop: encode the LEFT channel and send it on left_sock,
 * encode the RIGHT channel and send it on right_sock. */
int send_stereo_frame(lc3_encoder_t enc_left, lc3_encoder_t enc_right,
                      const int16_t *left_pcm, const int16_t *right_pcm,
                      int left_sock, int right_sock)
{
    uint8_t left_pkt[100], right_pkt[100];

    if (lc3_encode(enc_left, LC3_PCM_FORMAT_S16, left_pcm, 1, 100, left_pkt) != 0 ||
        lc3_encode(enc_right, LC3_PCM_FORMAT_S16, right_pcm, 1, 100, right_pkt) != 0)
        return -1;

    send(left_sock, left_pkt, sizeof(left_pkt), 0);    /* CIS for left earbud */
    send(right_sock, right_pkt, sizeof(right_pkt), 0); /* CIS for right earbud */
    /* CISes in the same CIG share one synchronisation reference point,
     * so both earbuds render the audio at the same moment. */
    return 0;
}
```

| Concept | Key point |
|---|---|
| LD-MDCT | Converts time-domain audio to frequency-domain (like a spectrum analyser). Core of LC3 encoding. |
| SNS / TNS | Shape quantisation noise so the ear perceives it as least annoying. Handle percussion transients. |
| PLC | Predicts and fills in lost audio packets. Triggered by BFI flag. Works well on voice, good on music. |
| Presentation Delay | Buffer time at receiver for decode + PLC + audio processing. BAP requires 40 ms in supported range. |
| 16_2 / 24_2 | Mandatory BAP configurations every BLE Audio device must support. |
| Low Latency QoS | Fits all retransmissions in 1 ISO interval. ~62 ms total at 16 kHz with PD=40ms. |
| High Reliability QoS | Spreads retransmissions across up to 10 intervals. ~147 ms. Best for glitch-free music streaming. |
| RTN | Retransmission Number: a hint to the Controller, not a hard command. Average tx/frame = NSE/BN. |
| Stereo Options | Two separate CISes (one per earbud, the most common) OR a single CIS with L+R multiplexed. |

You now understand the full stack: why codecs exist, how LC3 compresses and reconstructs audio, how latency builds up in BLE Audio, and how to choose QoS settings for your application. Next up: explore BAP, TMAP, and HAP profiles that build on top of everything you’ve learned here.
