BLE Audio course: Coordinated Sets & Presentation Delay

BLE Audio: Coordinated Sets & Presentation Delay
TWS earbuds done right · CSIP/CSIS Lock · Synchronised rendering across earbuds
25 µs: max inter-ear delta · CSIP: Coordinated Set Identification Profile · 40 ms: universal broadcast max

Classic Bluetooth TWS earbuds rely entirely on proprietary silicon extensions — there is no standard. BLE Audio fixes this with two mechanisms: Coordinated Sets (CSIP/CSIS) for grouping devices and managing synchronized actions, and Presentation Delay for ensuring both ears render audio at exactly the same microsecond.

1. Coordinated Sets — Treating Multiple Devices as One

The Problem: Classic Audio TWS Has No Standard

Every TWS earbud brand uses its own proprietary protocol for left-right coordination. Isochronous Channels fix the data transport, but BLE Audio still needed a way to:

  • Expose that two earbuds belong to the same set
  • Prevent different Initiators sending conflicting commands to left and right earbuds simultaneously
  • Apply volume changes / audio source switches atomically to both earbuds
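
The first requirement, exposing set membership, can be sketched as a grouping step on the Initiator. This is a minimal illustration, assuming the Initiator has already read each device's SIRK, set size, and rank (the CSIS characteristics listed in the BlueZ section of this document). The dictionary keys and the `group_by_sirk` helper are hypothetical names, not part of any Bluetooth API.

```python
from collections import defaultdict


def group_by_sirk(devices):
    """Group discovered devices by SIRK and return only complete sets."""
    sets = defaultdict(list)
    for dev in devices:
        sets[bytes(dev["sirk"])].append(dev)

    complete = {}
    for sirk, members in sets.items():
        members.sort(key=lambda d: d["rank"])
        # A set is usable once every member it advertises has been found.
        if len(members) == members[0]["set_size"]:
            complete[sirk] = members
    return complete


SIRK_A = bytes(range(1, 17))  # made-up 16-byte key shared by one pair
devices = [
    {"name": "Right", "sirk": SIRK_A, "set_size": 2, "rank": 2},
    {"name": "Left", "sirk": SIRK_A, "set_size": 2, "rank": 1},
    {"name": "Speaker", "sirk": bytes(16), "set_size": 2, "rank": 1},
]

found = group_by_sirk(devices)
names = [d["name"] for d in found[SIRK_A]]
print(names)
```

Here the two earbuds share a SIRK and form a complete set, while the lone speaker is left out because its second member has not been discovered.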

Head Attenuation Problem: Why Earbuds Can’t Just Talk to Each Other

The head is roughly 70% water, which attenuates 2.4 GHz Bluetooth signals severely. Add a small earbud antenna and low TX power, and there is no reliable ear-to-ear link, so Bluetooth cannot be counted on for ear-to-ear coordination.

Current Workarounds

  • NFMI (Near Field Magnetic Induction): Lower frequency, penetrates the head. Used by many premium earbuds for ear-to-ear communication. Adds cost and space.
  • BLE Audio CSIP: Coordinates via the Initiator (phone) as the intermediary, so no direct earbud-to-earbud communication is needed.

CSIP / CSIS — How Coordination Works via Lock

The Coordinated Set Identification Profile (CSIP) and its underlying service, the Coordinated Set Identification Service (CSIS), let the Initiator act as the coordinator. The key mechanism is a Lock.

Phone Call Scenario: Accepting Call on Both Earbuds via CSIP Lock

| Step | Initiator (Phone) Action | Left Earbud | Right Earbud |
|------|--------------------------|-------------|--------------|
| 1 | Incoming call detected | Idle | Idle |
| 2 | Write CSIS Lock=Locked on both earbuds via ACL | 🔒 Locked | 🔒 Locked |
| 3 | Set up CIS to left earbud (call audio out + mic in) | ✅ CIS active | 🔒 Waiting |
| 4 | Set up second CIS to right earbud (call audio out, no mic) | ✅ CIS active | ✅ CIS active |
| 5 | Release Lock on both earbuds | 🔓 Unlocked, streaming | 🔓 Unlocked, streaming |

Why Lock matters: Without the Lock, a second Initiator (e.g., your laptop) could grab the right earbud between steps 3 and 4, leaving your left ear on a phone call while your right ear plays music. The Lock prevents exactly this.
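
The Lock behaviour in the scenario above can be modelled in a few lines. This is a toy in-memory simulation, not a GATT client: `SetMember`, `lock_set`, and `release_set` are hypothetical names, and the choice to lock in ascending rank order and release in descending order is an assumption made here so that two Initiators cannot each end up holding one earbud. Only the lock values 0x01 (Unlocked) and 0x02 (Locked) come from the text.

```python
UNLOCKED, LOCKED = 0x01, 0x02


class SetMember:
    """One earbud with a Set Member Lock value."""

    def __init__(self, name, rank):
        self.name = name
        self.rank = rank
        self.lock = UNLOCKED
        self.owner = None

    def write_lock(self, initiator, value):
        """Return True if the write is accepted, False if it is refused."""
        if value == LOCKED:
            if self.lock == LOCKED and self.owner != initiator:
                return False  # another Initiator already holds the lock
            self.lock, self.owner = LOCKED, initiator
            return True
        if self.owner not in (None, initiator):
            return False  # only the lock holder may release
        self.lock, self.owner = UNLOCKED, None
        return True


def lock_set(initiator, members):
    """Lock every member in rank order; roll back if any write is refused."""
    acquired = []
    for member in sorted(members, key=lambda m: m.rank):
        if not member.write_lock(initiator, LOCKED):
            for got in reversed(acquired):
                got.write_lock(initiator, UNLOCKED)
            return False
        acquired.append(member)
    return True


def release_set(initiator, members):
    """Release in reverse rank order."""
    for member in sorted(members, key=lambda m: m.rank, reverse=True):
        member.write_lock(initiator, UNLOCKED)


earbuds = [SetMember("left", 1), SetMember("right", 2)]
phone_ok = lock_set("phone", earbuds)      # phone takes both earbuds
laptop_ok = lock_set("laptop", earbuds)    # refused while the phone holds the lock
release_set("phone", earbuds)
laptop_retry = lock_set("laptop", earbuds)
print(phone_ok, laptop_ok, laptop_retry)   # True False True
```

The laptop's first attempt fails on the lowest-ranked member, so it never holds a partial set; once the phone releases, the retry succeeds.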

2. Presentation Delay — Synchronized Rendering

The Serial Transmission Problem

CISes send packets serially — first to left earbud, then to right. Packets arrive at different times. Without compensation, the brain perceives the sound source as moving.

Why Timing Matters — Head Rotation & Inter-ear Delay Perception
Physical Reality (Sound in Air)
With the audio source 2 m away, a 10° head rotation changes the arrival time of sound between the two ears by about 70 µs. The brain uses this difference to locate sounds.

If Bluetooth introduces a rendering difference > 25µs between left and right ear that changes regularly, the user perceives the sound moving around inside their head — very unpleasant.
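
As a rough check on these numbers, the inter-ear arrival difference can be estimated with a simple far-field model: the extra path to the far ear is the ear spacing times the sine of the rotation angle. The 0.14 m effective ear spacing and 343 m/s speed of sound are assumptions chosen for this sketch, not values from the course, and real heads deviate from this model.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at about 20 °C (assumed)
EAR_SPACING = 0.14      # metres, assumed effective ear-to-ear spacing


def interaural_delay_us(rotation_deg, ear_spacing=EAR_SPACING):
    """Arrival-time difference between the ears for a distant source."""
    path_diff = ear_spacing * math.sin(math.radians(rotation_deg))
    return path_diff / SPEED_OF_SOUND * 1e6


def rotation_for_delay_deg(delay_us, ear_spacing=EAR_SPACING):
    """Head rotation that produces a given arrival-time difference."""
    ratio = delay_us * 1e-6 * SPEED_OF_SOUND / ear_spacing
    return math.degrees(math.asin(ratio))


itd_10deg = interaural_delay_us(10)
angle_25us = rotation_for_delay_deg(25)
print(f"10 degree rotation: {itd_10deg:.0f} us")
print(f"25 us corresponds to about {angle_25us:.1f} degrees")
```

Under these assumptions a 10° rotation gives roughly 71 µs, and a 25 µs rendering error is equivalent to the source shifting by about 3.5°, which is why a fluctuating error of that size is audible as movement.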

BLE Audio Solution: Presentation Delay

1. The Initiator defines a common Synchronisation Point: the time by which all Acceptors are guaranteed to have received their packet (after all retransmission slots).

2. Initiator sets a Presentation Delay — time AFTER the Sync Point when ALL Acceptors render simultaneously.

Both earbuds hold decoded audio in a buffer and play at exactly the Presentation Delay offset.
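
The two steps above reduce to one addition per earbud. The sketch below uses made-up timestamps in microseconds to show that, although the left and right packets arrive at different times, both earbuds compute the same render time and simply buffer for different durations. All names and numbers are hypothetical.

```python
def render_time_us(sync_point_us, presentation_delay_us):
    """Absolute render time shared by every Acceptor in the stream."""
    return sync_point_us + presentation_delay_us


# Hypothetical timeline for one SDU, in microseconds.
sync_point_us = 1_007_500          # after the last retransmission slot
presentation_delay_us = 20_000     # value written to both earbuds
arrival_us = {"left": 1_000_400, "right": 1_002_900}

render_us = {}
buffered_us = {}
for ear, arrived in arrival_us.items():
    render_us[ear] = render_time_us(sync_point_us, presentation_delay_us)
    buffered_us[ear] = render_us[ear] - arrived

print(render_us)
print(buffered_us)
```

The right earbud received its packet 2,500 µs later, so it buffers 2,500 µs less, and both render at the same instant.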

Presentation Delay Timeline (Acceptor / Audio Sink Side)
First TX attempt (isochronous) → last retransmit slot → SDU Sync Point (C→P reference) → decode + ANC + PLC → render (audio out 🔊)
The Acceptor buffers all packets until the SDU Sync Point. The Presentation Delay spans from the SDU Sync Point to the render point and covers decoding and processing. The same Presentation Delay value is applied to all Acceptors, so they render at the same absolute time.

| Presentation Delay Parameter | Meaning | Set By |
|------------------------------|---------|--------|
| Presentation_Delay_Min | Shortest decode + processing time before the Acceptor can render. Hardware limit. | Acceptor (read by Initiator) |
| Presentation_Delay_Max | Longest buffer the Acceptor can maintain. Must not be exceeded by the Initiator. | Acceptor (read by Initiator) |
| Preferred_Presentation_Delay | The value the Acceptor recommends. The Initiator should use this unless overridden (e.g., for lip-sync). | Acceptor (advisory) |
| Final Presentation Delay | Must be ≥ max(all Delay_Min) and ≤ min(all Delay_Max) across all Acceptors in the stream. | Initiator (written to all Acceptors) |

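
The rule for the final Presentation Delay can be written as a small selection function. This is a sketch under stated assumptions: the dictionary keys (`pd_min`, `pd_max`, `pd_preferred`) are hypothetical, and falling back to the largest preferred value when the application requests nothing is a policy chosen for illustration, not something the text prescribes.

```python
def choose_presentation_delay(acceptors, requested_us=None):
    """Pick one Presentation Delay (µs) that every Acceptor can honour."""
    low = max(a["pd_min"] for a in acceptors)    # slowest decoder sets the floor
    high = min(a["pd_max"] for a in acceptors)   # smallest buffer sets the ceiling
    if low > high:
        raise ValueError("Acceptors share no common Presentation Delay range")
    if requested_us is None:
        # Assumed policy: honour the most demanding preference.
        requested_us = max(a["pd_preferred"] for a in acceptors)
    # Clamp the request into the common range.
    return min(max(requested_us, low), high)


earbuds = [
    {"name": "left", "pd_min": 10_000, "pd_max": 40_000, "pd_preferred": 20_000},
    {"name": "right", "pd_min": 12_000, "pd_max": 35_000, "pd_preferred": 15_000},
]

default_pd = choose_presentation_delay(earbuds)
lip_sync_pd = choose_presentation_delay(earbuds, requested_us=60_000)
low_latency_pd = choose_presentation_delay(earbuds, requested_us=5_000)
print(default_pd, lip_sync_pd, low_latency_pd)   # 20000 35000 12000
```

A lip-sync request of 60,000 µs is clamped to the smaller of the two maximums, and a 5,000 µs request is raised to the larger of the two minimums.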
Broadcast Presentation Delay: When there’s no ACL connection (broadcast), the Initiator cannot read Acceptor min/max values. Keep the broadcast Presentation Delay ≤ 40 ms, a value that every Acceptor is required to support. Values above 40 ms may introduce echo when ambient sound is present.
Capture direction (P→C): Presentation Delay also applies to audio capture (microphone → phone). Here it represents the time from sound capture through processing and encoding to the first CIS transmission slot. This ensures both microphones in a stereo headset capture at the same moment.

3. BlueZ: Coordinated Sets & ISO QoS Presentation Delay

CSIP via BlueZ D-Bus + Presentation Delay in ISO QoS
```python
# ============================================================
# Coordinated Set: BlueZ CSIS (GATT Service)
# The Acceptor (earbud) exposes CSIS with these characteristics:
#   - Set Identity Resolving Key (SIRK): 16-byte key identifies the set
#   - Coordinated Set Size: number of members (e.g., 2 for a pair)
#   - Set Member Lock: Initiator writes 0x02 (Lock) / 0x01 (Release)
#   - Set Member Rank: order within the set (left=1, right=2)
# ============================================================

# BlueZ D-Bus: register CSIS service on the Acceptor (Python)
import dbus
import dbus.service

CSIS_UUID = "00001846-0000-1000-8000-00805f9b34fb"


class CoordinatedSetService(dbus.service.Object):
    SIRK = [0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08,
            0x09, 0x0A, 0x0B, 0x0C, 0x0D, 0x0E, 0x0F, 0x10]  # 16-byte set key
    SET_SIZE = 2        # pair of earbuds
    MEMBER_LOCK = 0x01  # 0x01 = Unlocked, 0x02 = Locked
    MEMBER_RANK = 1     # this is the left earbud

    @dbus.service.method("org.bluez.GattCharacteristic1",
                         in_signature="a{sv}", out_signature="ay")
    def ReadValue(self, options):
        # Returns the SIRK, which is the same for both earbuds in the pair
        return dbus.Array(self.SIRK, signature="y")
```

```c
/* ============================================================
 * Presentation Delay in BlueZ ISO QoS (Initiator / C side)
 * QoS is set via setsockopt on the ISO socket before connect()
 * ============================================================ */
#include <stdio.h>
#include <sys/socket.h>
#include <bluetooth/bluetooth.h>  /* struct bt_iso_qos, BT_ISO_QOS */

static struct bt_iso_qos qos = {
    .ucast = {
        .cig = BT_ISO_QOS_CIG_UNSET,
        .cis = BT_ISO_QOS_CIS_UNSET,
        /* Presentation Delay (in microseconds) is not a field of this
         * struct. Read the range from the Acceptor's ASE characteristics
         * first, then pick a value within
         * [max(delay_min), min(delay_max)]. */
        .out = {
            .interval = 10000,       /* SDU interval in µs (10 ms frames) */
            .phy      = BT_ISO_PHY_2M,
            .sdu      = 120,         /* bytes: LC3 @ 48kHz/10ms */
            .rtn      = 2,           /* retransmission number */
            .latency  = 10,          /* max transport latency ms */
        },
        .in = {
            .interval = 10000,
            .phy      = BT_ISO_PHY_2M,
            .sdu      = 40,          /* mic: LC3 @ 16kHz/10ms */
            .rtn      = 2,
            .latency  = 10,
        },
    },
};

/* The actual Presentation Delay value is negotiated via ASCS
 * (the Config QoS operation on the ASE Control Point, over ACL),
 * not through this socket option.
 *
 * Typical values:
 *   HAP/TMAP Live context (with ambient sound): PD ≤ 20,000 µs (20ms)
 *   General media streaming: 20,000 – 40,000 µs
 *   Video lip-sync: set explicitly by app, often 60,000 – 100,000 µs
 *   Broadcast (no ACL): ≤ 40,000 µs (40ms mandatory max)
 */

/* Apply the QoS before connect() on an ISO socket. */
static int apply_qos(int iso_sk)
{
    return setsockopt(iso_sk, SOL_BLUETOOTH, BT_ISO_QOS, &qos, sizeof(qos));
}

/* Read back the QoS after connection. The latency field is the
 * transport latency in ms; the negotiated Presentation Delay lives
 * in ASCS, so it cannot be derived from this value. */
static void print_qos(int iso_sk)
{
    struct bt_iso_qos qos_result;
    socklen_t qos_len = sizeof(qos_result);

    if (getsockopt(iso_sk, SOL_BLUETOOTH, BT_ISO_QOS,
                   &qos_result, &qos_len) == 0)
        printf("Max transport latency: %u ms\n",
               (unsigned)qos_result.ucast.out.latency);
}
```

Key Takeaways

  • Coordinated Sets group earbuds as one logical device (CSIP/CSIS)
  • Lock prevents conflicting Initiators grabbing left vs right earbud
  • Human head attenuates 2.4 GHz → no reliable direct ear-to-ear BT link
  • Presentation Delay = common render point after SDU Sync Point
  • A regularly changing inter-ear delta above 25 µs = perceived sound movement → both earbuds must render at the same instant
  • Broadcast PD ≤ 40 ms (a value all Acceptors must support)
