BLE Audio course – Inside the Generic Audio Framework
BAPS, stream state machines, volume control, media/telephony control, coordinated sets — with BlueZ implementation notes
At a glance: BAPS — the foundation specs · 6 states — the ASCS state machine · 25 µs — the earbud L/R sync window · CAP — the Commander role

The GAF — How the Pieces Fit Together

Hi students, welcome to this BLE Audio course – Inside the Generic Audio Framework. The Generic Audio Framework (GAF) is a set of 23 profiles and services that live in the Host, each handling a specific aspect of audio operation. They are designed to be composable — you can use a subset for a simple product, or combine most of them for a feature-rich one.

This post goes through each functional group in detail: how stream setup works step by step through the ASCS state machine, how volume and microphone control is structured, how media and telephony control is handled, and how coordination across paired devices works. BlueZ-specific code and commands are included throughout.

Four Functional Groups

GAF Functional Grouping

  • Stream Control (BAPS) — BAP, PACS, ASCS, BASS: set up and manage the ISO audio channels
  • Rendering & Capture — VCP/VCS, VOCS, AICS, MICP/MICS: volume and microphone gain on the sink
  • Content Control — MCP/MCS, GMCS, CCP/TBS, GTBS: play/pause media; answer and hold calls
  • Coordination — CAP/CAS, CSIP/CSIS: synchronised control across L+R earbuds, transitions between streams

1. Stream Control — BAPS

BAPS is the collective name for the four foundation specs: BAP, PACS, ASCS, and BASS. They are responsible for everything related to setting up the underlying ISO channels that carry audio data.

PACS — Published Audio Capabilities Service

Before a phone (Unicast Client) starts streaming to an earbud (Unicast Server), it needs to know what the earbud supports. PACS is a GATT service on the earbud that exposes its capabilities as a list of PAC (Published Audio Capability) records. Each record describes one supported codec configuration — sampling rate, frame duration, supported octets per frame, and audio channel count.

PACS has four characteristics:

Characteristic             Audio Direction   Meaning
Sink PAC                   Phone → Earbud    Codec configs the earbud can receive
Source PAC                 Earbud → Phone    Codec configs the earbud can transmit (microphone)
Sink Audio Location        —                 Which audio channels: Left, Right, or Mono
Available Audio Contexts   —                 Context types currently available: Media, Conversational, etc.
# Read PACS records with bluetoothctl — remote endpoints expose PACS data:
[bluetooth]# menu endpoint
[bluetooth]# list
[bluetooth]# show /org/bluez/hci0/dev_.../pac_sink_0

EndPoint /org/bluez/hci0/dev_.../pac_sink_0
  UUID: 00002bc9-...  (PACS Sink PAC characteristic)
  Codec: LC3 (0x06)
  Capabilities:
    Sampling Frequencies: 16kHz 24kHz 32kHz 48kHz
    Frame Durations:       7.5ms 10ms
    Supported Octets/Frame: 26..240
    Supported Frames/SDU:   1
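Those Capabilities fields arrive over GATT as LTV (Length–Type–Value) structures. A minimal sketch of decoding one (the `parse_ltv` helper is hypothetical; type codes follow the LC3 capability definitions in the Bluetooth Assigned Numbers — verify the bitfield values against the current document):

```python
# Decode LC3 capability LTV structures as exposed in a PAC record.
# Type codes (per the BAP codec capability definitions):
#   0x01 Supported Sampling Frequencies (16-bit bitfield)
#   0x02 Supported Frame Durations (8-bit bitfield)
#   0x04 Supported Octets per Codec Frame (min u16, max u16)

def parse_ltv(data: bytes) -> dict:
    """Split an LTV byte string into {type: value_bytes}."""
    out, i = {}, 0
    while i < len(data):
        length = data[i]                  # length covers type + value
        t = data[i + 1]
        out[t] = data[i + 2 : i + 1 + length]
        i += 1 + length
    return out

# Example capability set: 16/24/32/48 kHz, 7.5 ms + 10 ms, 26..240 octets
caps = bytes([0x03, 0x01, 0xb4, 0x00,              # sampling freq bitfield
              0x02, 0x02, 0x03,                    # frame durations
              0x05, 0x04, 0x1a, 0x00, 0xf0, 0x00]) # octets min/max

fields = parse_ltv(caps)
octets_min = int.from_bytes(fields[0x04][0:2], 'little')
octets_max = int.from_bytes(fields[0x04][2:4], 'little')
print(octets_min, octets_max)   # 26 240
```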

ASCS — Audio Stream Control Service and the Stream State Machine

ASCS defines the state machine for each individual Audio Stream Endpoint (ASE). An ASE represents one direction of one audio stream — for example, the left earbud’s sink ASE. The phone (Client) drives the state machine by writing control opcodes to the ASE Control Point characteristic in ASCS. The earbud (Server) holds the state and notifies the phone whenever state changes.

The state machine has six states for unicast (the sixth, Releasing, is a transient teardown state not shown below; source ASEs additionally pass through a Disabling state):

ASCS Unicast Stream State Machine

IDLE              ──Config Codec──────────▶  CODEC CONFIGURED   (Release ◀)
CODEC CONFIGURED  ──Config QoS────────────▶  QoS CONFIGURED     (Release ◀)
QoS CONFIGURED    ──Enable─────────────────▶ ENABLING           (Disable ◀)
ENABLING          ──Receiver Start Ready──▶  STREAMING          (Disable ◀)

  • Config Codec — sets the LC3 config (sampling rate, frame duration, octets/frame)
  • Config QoS — sets CIG/CIS parameters (SDU interval, latency, RTN)
  • Enable — arms the stream (the CIS is created at this point)
  • Receiver Start Ready — the earbud signals it is ready for audio
  • Disable / Release — stop or tear down the stream

The state lives in ASCS on the earbud. The phone writes opcodes to the ASE Control Point (ASCP) characteristic. For a typical stereo earbud, there are two ASEs — one for the left channel, one for the right. Both go through the same state machine independently, but are coordinated via the CIG so they stream in sync.
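The opcode-per-state rules above can be sketched as a small table-driven model (illustrative only — not how BlueZ structures its code; the Releasing state is collapsed into the return to IDLE):

```python
# Sketch of the ASCS sink-ASE state machine: which ASE Control Point
# opcode is legal in which state. Names follow the text above.

IDLE, CODEC_CONFIGURED, QOS_CONFIGURED, ENABLING, STREAMING = range(5)

TRANSITIONS = {
    (IDLE,             'config_codec'): CODEC_CONFIGURED,
    (CODEC_CONFIGURED, 'config_codec'): CODEC_CONFIGURED,  # reconfigure
    (CODEC_CONFIGURED, 'config_qos'):   QOS_CONFIGURED,
    (QOS_CONFIGURED,   'enable'):       ENABLING,
    (ENABLING,         'rcvr_start'):   STREAMING,
    (ENABLING,         'disable'):      QOS_CONFIGURED,
    (STREAMING,        'disable'):      QOS_CONFIGURED,
    # 'release' is valid from any non-idle state and returns to IDLE
}

class ASE:
    def __init__(self, ase_id):
        self.ase_id = ase_id
        self.state = IDLE

    def handle(self, opcode):
        """Apply one Control Point opcode, or reject it as invalid."""
        if opcode == 'release' and self.state != IDLE:
            self.state = IDLE
            return
        nxt = TRANSITIONS.get((self.state, opcode))
        if nxt is None:
            raise ValueError(f"opcode {opcode!r} invalid in state {self.state}")
        self.state = nxt

# Drive one ASE through the full setup sequence, as the phone would:
left = ASE(1)
for op in ('config_codec', 'config_qos', 'enable', 'rcvr_start'):
    left.handle(op)
print(left.state == STREAMING)   # True
```

For a stereo pair, the phone runs two of these in parallel (one ASE per earbud) and relies on the CIG to keep the resulting streams time-aligned.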

# Monitor ASCS state changes during stream setup with btmon
$ sudo btmon | grep -A5 "Write Request\|Notification"

# Config Codec opcode (0x01) written to ASE Control Point:
> ACL Data TX: handle 0x0040 flags 0x00 dlen 20
  ATT: Write Request (0x12) len 15
  Handle: 0x002a (ASE Control Point)
  Data: 01 01 06 10 00 0f 00 ...
  #    [opcode][ase_id][codec_id LC3=0x06][config...]

# Server notifies new ASE state = CODEC CONFIGURED (0x01):
< ACL Data RX: handle 0x0040 flags 0x02 dlen 12
  ATT: Handle Value Notification (0x1b)
  Handle: 0x0028 (ASE Characteristic)
  Data: 01 01 ...   # ase_id=1, state=CODEC_CONFIGURED

BAP — Basic Audio Profile and Its Roles

BAP defines the roles that devices take and the procedures they use. There are five roles:

BAP Role             What It Does                                       Example Device
Unicast Client       Reads PACS, drives the ASCS state machine          Phone, laptop, tablet
Unicast Server       Hosts PACS and ASCS, receives or sends audio       Earbud, hearing aid, speaker
Broadcast Source     Advertises and transmits a BIG                     TV, audio transmitter
Broadcast Sink       Scans, syncs to PA, receives BIS audio             Hearing aid, earbud
Broadcast Assistant  Helps a Broadcast Sink find sources (uses PAST)    Phone acting as remote for a hearing aid

The Broadcast Assistant uses BASS (Broadcast Audio Scan Service) to write the scan result to the Broadcast Sink over the ACL connection. The Broadcast Sink then uses PAST to synchronise to the broadcast without scanning for it itself — this saves significant battery on the hearing aid side.

2. Rendering and Capture Control

Once a stream is running, the user wants to control volume and microphones. These specs handle that. An important design decision in BLE Audio: the final volume gain is always applied at the audio sink (earbud or speaker), not at the source. The audio stream is transmitted at line level, preserving maximum dynamic range. The gain is adjusted at the sink end.

VCP, VCS, VOCS, AICS — Volume and Input Control

Volume control in BLE Audio is more complex than a single slider because audio devices can have multiple inputs, multiple outputs, and multiple controllers at the same time:

VCP / VCS
Volume Control Profile/Service. VCS lives on the audio sink (earbud). One VCS instance per sink. Exposes absolute volume (0–255) and mute state. VCP on the controller (phone/watch) reads and writes it. Volume can be expressed as absolute or relative delta.
VOCS — Volume Offset
Acts as a balance control between multiple outputs. If the left earbud is 3dB louder than the right, VOCS adjusts the relative offset. Works across separate physical devices (L+R earbuds from different manufacturers).
AICS — Audio Input Control
Controls individual audio inputs that get mixed together inside the sink. A soundbar with BT, HDMI, and mic inputs would have one AICS instance per input, each with its own gain and mute. For a hearing aid: BT stream + ambient mic + telecoil.
Key behaviour: VCS notifies all connected clients whenever the volume changes — whether changed from the phone, a smartwatch, or a local button on the earbud. This keeps every controller in sync with the current volume state without polling.
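That multi-controller consistency relies on a change counter in the Volume State: every control-point write carries the counter the client last read, and stale writes are rejected. A minimal sketch of the idea (class and method names are hypothetical, not a GATT implementation):

```python
# Sketch of the VCS change-counter mechanism that keeps several
# controllers (phone, watch, on-device button) consistent.

class VolumeControlService:
    def __init__(self):
        self.volume = 128        # Volume_Setting, 0..255
        self.muted = False
        self.change_counter = 0  # part of the Volume State characteristic

    def _bump(self):
        self.change_counter = (self.change_counter + 1) % 256
        # a real server would notify Volume State to every client here

    def set_absolute(self, counter, volume):
        """Absolute Volume operation: rejected if the counter is stale."""
        if counter != self.change_counter:
            raise ValueError("invalid change counter: re-read Volume State")
        self.volume = max(0, min(255, volume))
        self._bump()

earbud = VolumeControlService()
counter = earbud.change_counter       # phone reads Volume State first
earbud.set_absolute(counter, 200)     # accepted; counter is now stale
try:
    earbud.set_absolute(counter, 50)  # a watch reusing the old counter
except ValueError as e:
    print(e)                          # rejected until it re-reads state
```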

MICP / MICS — Microphone Control

MICP (Microphone Control Profile) and MICS (Microphone Control Service) control the overall mute state of the microphone(s) in an earbud or hearing aid. They work alongside AICS — AICS controls the gain of each individual microphone input, while MICS provides a master mute for all captured audio that is destined for a BLE stream.

A typical hearing aid has at least two microphones (front and rear) for directional processing. AICS gives individual gain control over each; MICS mutes them all at once (useful for push-to-talk scenarios or muting yourself on a call).

3. Content Control

Content control covers everything related to what is playing and the state of calls — not the audio stream itself, but the application that generates it. These specs replicate what AVRCP does for A2DP and what HFP does for calls, but in a more flexible, decoupled way. Because they are separated from the audio streams, they can manage transitions — for example, automatically pausing music when a call arrives.

MCP / MCS — Media Control

MCS (Media Control Service) lives on the audio source — the phone or PC that is playing music. It exposes the state of the media player as a GATT service. MCP (Media Control Profile) on the earbud reads and controls that state.

The MCS state machine covers four states:

MCS Media Player State Machine

INACTIVE  ──Play──▶   PLAYING   (Stop returns to INACTIVE)
PLAYING   ──Pause──▶  PAUSED    (Play returns to PLAYING)
PAUSED    ──Seek──▶   SEEKING   (Pause returns to PAUSED)

Beyond basic play/pause, MCS provides: track navigation (next, previous, fast-forward, rewind), playback order (single, repeat, shuffle), group management, metadata (track title, artist, duration), playback speed control, and content search using the Object Transfer Service (OTS). A suitably capable MCP implementation can fully replicate a music player UI from an earbud. If there are multiple media applications on the phone, each gets its own MCS instance. The single-instance variant — GMCS (Generic Media Control Service) — acts as a unified interface across all players.
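The GMCS idea can be sketched as a facade over per-application players (names and structure here are illustrative, not the GATT service layout):

```python
# Sketch of GMCS as a facade: one generic control point that forwards
# commands to whichever per-application MCS instance is the active player.

class MediaPlayer:
    """One per-application MCS instance (e.g. a music app, a podcast app)."""
    def __init__(self, name):
        self.name, self.state = name, 'paused'
    def play(self):  self.state = 'playing'
    def pause(self): self.state = 'paused'

class GMCS:
    def __init__(self, players):
        self.players = players
        self.active = players[0]   # the player currently in focus

    def control(self, opcode):
        # forward the opcode to the active player's MCS instance
        getattr(self.active, opcode)()

music, podcasts = MediaPlayer("music"), MediaPlayer("podcasts")
g = GMCS([music, podcasts])
g.control('play')
print(music.state)   # playing
```

An earbud talking only to GMCS never needs to know how many media apps the phone is running; the phone routes commands to the right one.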

CCP / TBS — Telephony Control

TBS (Telephone Bearer Service) lives on the call device (phone, PC, laptop). CCP (Call Control Profile) on the earbud controls the call by writing opcodes to TBS. The key difference from HFP: TBS is designed for the way telephony actually works today.

HFP was designed around a single cellular call. TBS handles:

  • Multiple simultaneous calls (cellular, SIP, Zoom, Teams) — each call type is a separate bearer
  • Call operations: accept, terminate, hold, retrieve, join, silence incoming ring
  • Caller ID, call state (incoming, dialing, active, held, remotely held)
  • Inband and out-of-band ringtone selection
  • Signal strength (useful for informing the user of call quality)

Like MCS, TBS can be instantiated once per bearer (e.g., one TBS for the cellular app, one for the VoIP app), or as a single GTBS (Generic TBS) that directs all commands to the correct underlying app.
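The multi-call bookkeeping can be sketched as follows — each call gets its own index and state, which is the structural difference from HFP's single-call model (class and state names are illustrative):

```python
# Sketch of a TBS bearer tracking several simultaneous calls,
# each with its own call index and state.

class Bearer:
    def __init__(self, name):
        self.name = name
        self.calls = {}          # call_index -> (state, uri)
        self.next_index = 1

    def incoming(self, uri):
        """Register an incoming call and return its call index."""
        idx = self.next_index
        self.next_index += 1
        self.calls[idx] = ('incoming', uri)
        return idx

    def accept(self, idx):
        """Accept a call, putting any currently active call on hold."""
        for i, (state, uri) in self.calls.items():
            if state == 'active':
                self.calls[i] = ('locally-held', uri)
        self.calls[idx] = ('active', self.calls[idx][1])

cellular = Bearer("cellular")
a = cellular.incoming("tel:+4912345")
cellular.accept(a)
b = cellular.incoming("tel:+4967890")
cellular.accept(b)
print(cellular.calls)   # call a locally-held, call b active
```

With one Bearer instance per app (cellular, SIP, conferencing), GTBS plays the same unifying role for calls that GMCS plays for media.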

4. Transition and Coordination Control

Coordination is the glue layer. When you pause music on a left earbud, the right one should pause too. When a new connection arrives, both earbuds should transition together — not one going to the phone and the other staying with the TV. CSIP/CSIS and CAP handle this.

CSIP / CSIS — Coordinated Set Identification

When two earbuds are manufactured as a pair, each one is configured with a CSIS (Coordinated Set Identification Service). CSIS holds a Set Identity Resolving Key (SIRK) that allows the phone to discover that two separate BLE devices belong to the same set.

Coordinated Set — L+R Earbuds as a Single Logical Device

📱 Phone (Unicast Client, CSIP Client)
   ├── CIS (left)  ──▶ 🎧 Left Earbud  — ASCS + PACS + VCS, CSIS member #1
   └── CIS (right) ──▶ 🎧 Right Earbud — ASCS + PACS + VCS, CSIS member #2
Both earbuds are members of the same Coordinated Set.

CSIS Lock mechanism: before transitioning both earbuds to a new stream, CAP locks both members simultaneously so no partial transition can occur.

CSIS introduces two key concepts: Lock — before a transition (e.g., switching from TV to phone), CAP acquires a lock on all set members so they transition together; and Rank — determines which set member has priority for operations that can only apply to one member at a time. Devices configured as Coordinated Set members are typically set up this way at manufacturing time.
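The rank-ordered locking can be sketched like this — locking members in ascending Rank order is what prevents two Commanders from deadlocking each other (a toy model; the real locks are GATT writes to the CSIS Lock characteristic):

```python
# Sketch of CAP acquiring CSIS locks before a set-wide transition.

class SetMember:
    def __init__(self, name, rank):
        self.name, self.rank = name, rank
        self.locked_by = None

    def lock(self, client):
        if self.locked_by not in (None, client):
            raise RuntimeError(f"{self.name} is locked by {self.locked_by}")
        self.locked_by = client

    def unlock(self, client):
        if self.locked_by == client:
            self.locked_by = None

def locked_transition(client, members, action):
    """Lock every member in Rank order, run the transition, then unlock."""
    ordered = sorted(members, key=lambda m: m.rank)
    acquired = []
    try:
        for m in ordered:
            m.lock(client)
            acquired.append(m)
        for m in ordered:
            action(m)           # e.g. switch the member's stream
    finally:
        for m in reversed(acquired):
            m.unlock(client)

left, right = SetMember("left", 1), SetMember("right", 2)
locked_transition("phone", [left, right],
                  lambda m: print(f"{m.name}: switch stream to phone"))
```

Because every Commander sorts by Rank before locking, two controllers contending for the same pair always collide on the same member first, instead of each holding one earbud and waiting for the other.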

CAP — Common Audio Profile and the Commander Role

CAP (Common Audio Profile) is the orchestrator. It introduces the Commander role — a device that can remotely control Bluetooth LE Audio streams across multiple devices. The Commander can be a phone, a tablet, or a smartwatch.

CAP uses CSIP/CSIS to treat a Coordinated Set as a single entity, and introduces two important concepts:

  • Context Types — metadata about what kind of audio is being played (Conversational, Media, Game, Ringtone, Alarm, etc.). Devices use this to decide whether to accept or prioritise a connection request. For example, a hearing aid set to “prioritise ringtones” will interrupt media to take an incoming call.
  • Content Control IDs (CCID) — links a specific audio stream to the media or telephony service that controls it. This allows the Commander to know which TBS or MCS instance controls which stream, enabling clean transitions (pause stream X when call Y arrives).
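Context Types travel as a bitmask (e.g. in the PACS Available Audio Contexts characteristic), so availability checks reduce to bit tests. A sketch — the bit values below follow the Context Type assignments but should be verified against the current Bluetooth Assigned Numbers:

```python
# Sketch: checking an Available Audio Contexts bitmask before
# starting a stream for a given context.

CONTEXTS = {
    'unspecified':    0x0001,
    'conversational': 0x0002,
    'media':          0x0004,
    'game':           0x0008,
    'ringtone':       0x0200,
    'alarms':         0x0400,
}

def may_start(available_mask: int, context: str) -> bool:
    """True if the sink currently accepts streams of this context type."""
    return bool(available_mask & CONTEXTS[context])

# A hearing aid currently accepting conversational audio + ringtones only:
available = CONTEXTS['conversational'] | CONTEXTS['ringtone']
print(may_start(available, 'ringtone'))  # True
print(may_start(available, 'media'))     # False
```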
Broadcast + CAP: For encrypted broadcast streams, the Commander role is particularly useful. The Commander can hold the broadcast encryption key (Broadcast_Code) and distribute it to authorised Broadcast Sinks — converting a public broadcast into a private listening experience without the Sink needing to connect to the Broadcast Source.

5. Top Level Profiles

Top level profiles sit above the GAF. They add application-specific requirements — mandating optional GAF features and specifying codec configurations required for their use case. They are intentionally lean, building on what GAF already provides.

  • HAP / HAS — Hearing Access Profile/Service: mandates specific BAP configs and hearing-aid presets (programmed by an audiologist). Target: hearing aids.
  • TMAP — Telephony and Media Audio Profile: higher-quality codec settings (32/48 kHz), richer MCP/CCP control. Target: consumer earbuds, headsets.
  • PBP — Public Broadcast Profile: standardises broadcast stream discovery and metadata for public installations (no GATT — no connection). Target: airport PA, cinemas, events.
PBP note: PBP has no accompanying service specification. Broadcast means no connection, and no connection means no GATT Client-Server interaction is possible. PBP specifies what must be in the Extended Advertising data so that any BLE Audio receiver can identify and select the broadcast without any prior association.

BlueZ GAF Implementation Notes

Registering a BAP Audio Endpoint in BlueZ

In BlueZ, GAF profiles are exposed to applications via D-Bus. An application that wants to act as a BAP audio endpoint (Unicast Server or Broadcast Sink) registers itself with bluetoothd using the org.bluez.Media1 interface’s RegisterEndpoint method.

"""
BlueZ BAP Sink endpoint registration (Python + dbus-python)
Registers an LC3 capable sink endpoint with bluetoothd.
bluetoothd will then populate PACS on the device.
"""
import dbus, dbus.service, dbus.mainloop.glib
from gi.repository import GLib

BLUEZ_SERVICE = 'org.bluez'
MEDIA_IFACE   = 'org.bluez.Media1'
ENDPOINT_IFACE = 'org.bluez.MediaEndpoint1'

# LC3 codec ID as per Bluetooth Assigned Numbers
LC3_CODEC_ID = 0x06

# BlueZ registers BAP endpoints against the PACS Sink PAC
# characteristic UUID (0x2BC9), not the PACS service UUID (0x1850)
BAP_SINK_UUID = '00002bc9-0000-1000-8000-00805f9b34fb'

class BAPSinkEndpoint(dbus.service.Object):
    def __init__(self, bus, path):
        dbus.service.Object.__init__(self, bus, path)

    @dbus.service.method(ENDPOINT_IFACE,
                         in_signature='oa{sv}', out_signature='')
    def SetConfiguration(self, transport, props):
        # bluetoothd calls this when the ASE reaches CODEC CONFIGURED;
        # props contains Device, UUID, Codec, Configuration (LTV encoded)
        print("Stream configured:", transport, dict(props))

    @dbus.service.method(ENDPOINT_IFACE,
                         in_signature='o', out_signature='')
    def ClearConfiguration(self, transport):
        print("Stream cleared (ASE back to IDLE)")

    @dbus.service.method(ENDPOINT_IFACE,
                         in_signature='ay', out_signature='ay')
    def SelectConfiguration(self, caps):
        # Return the preferred LC3 config from the offered capabilities.
        # LTV encoding: the length byte covers the type byte plus the value.
        return [0x02, 0x01, 0x03,         # Sampling freq: 0x03 = 16 kHz
                0x02, 0x02, 0x00,         # Frame duration: 0x00 = 7.5 ms
                0x03, 0x04, 0x1a, 0x00]   # Octets per frame: 26 (u16 LE)

def main():
    dbus.mainloop.glib.DBusGMainLoop(set_as_default=True)
    bus = dbus.SystemBus()

    adapter_path = '/org/bluez/hci0'
    media = dbus.Interface(
        bus.get_object(BLUEZ_SERVICE, adapter_path),
        MEDIA_IFACE)

    endpoint_path = '/org/embeddedpathashala/bap_sink_0'
    endpoint = BAPSinkEndpoint(bus, endpoint_path)

    # LC3 capabilities as LTV structures (length covers type + value)
    caps = [
        0x03, 0x01, 0xb4, 0x00,              # Sampling freqs: 16/24/32/48 kHz bitfield
        0x02, 0x02, 0x03,                    # Frame durations: 7.5 ms + 10 ms
        0x05, 0x04, 0x1a, 0x00, 0xf0, 0x00,  # Octets per frame: min 26, max 240
    ]

    media.RegisterEndpoint(
        dbus.ObjectPath(endpoint_path),
        {
            'UUID':         dbus.String(BAP_SINK_UUID),
            'Codec':        dbus.Byte(LC3_CODEC_ID),
            'Capabilities': dbus.Array(caps, signature='y'),
        }
    )
    GLib.MainLoop().run()

if __name__ == '__main__':
    main()

Monitor the ASCS state machine transitions during stream setup:

# btmon output — full unicast stream setup sequence
$ sudo btmon

# 1. Phone reads PACS from earbud
> ATT Read Request  Handle: PACS Sink PAC

# 2. Phone writes Config Codec opcode to ASCS Control Point
> ATT Write Request  Handle: ASE Control Point
   Opcode: 0x01 (Config Codec)  ASE_ID: 0x01
   Codec: LC3  Freq: 48kHz  Duration: 10ms  Octets: 120

# 3. Earbud notifies CODEC CONFIGURED state
< ATT Handle Value Notification  Handle: ASE 0x01
   State: 0x01 (Codec Configured)

# 4. Phone writes Config QoS
> ATT Write Request  Opcode: 0x02 (Config QoS)
   CIG_ID: 0x00  CIS_ID: 0x00  SDU_Interval: 10000us  RTN: 2

# 5. Earbud: QoS CONFIGURED
< ATT Notification  State: 0x02 (QoS Configured)

# 6. Phone writes Enable — ASE enters ENABLING
> ATT Write Request  Opcode: 0x03 (Enable)
< ATT Notification  State: 0x03 (Enabling)

# 7. Phone creates the CIS at HCI level
> HCI: LE Create CIS (CIG 0x00, CIS 0x00)
< HCI Event: LE CIS Established

# 8. Phone writes Receiver Start Ready — earbud enters STREAMING
> ATT Write Request  Opcode: 0x04 (Receiver Start Ready)
< ATT Notification  State: 0x04 (Streaming)

# 9. ISO audio data flows
< HCI ISO Data  handle: CIS_0  length: 120 bytes  [LC3 encoded]

Summary — GAF at a Glance

Spec               Type      Resides On             Responsibility
PACS               Service   Earbud / Sink          Advertises supported codec configurations
ASCS               Service   Earbud / Sink          Holds the ASE state machine for each stream
BAP                Profile   Phone / Source         Reads PACS, drives ASCS, sets up CIG/CIS
BASS               Service   Broadcast Sink         Accepts broadcast source info from a Broadcast Assistant
VCS / VOCS / AICS  Services  Earbud / Sink          Volume gain, balance, per-input gain
MCS / TBS          Services  Phone / Source         Media player state and call bearer state
CSIS               Service   Each earbud in a pair  Identifies L+R as members of a Coordinated Set
CAP                Profile   Commander device       Orchestrates all profiles; Context Types; CCID linking

Key Terms from This Post: ASE (Audio Stream Endpoint), ASCP (ASE Control Point), PAC (Published Audio Capability), CIG/BIG, CIS/BIS, Coordinated Set, CCID (Content Control ID), Context Types, Commander role, Broadcast Assistant, GTBS/GMCS, HAP/TMAP/PBP.

Continue the Series

Next posts in this series will cover the LC3 codec internals in depth, how ISO sockets work at the kernel level in Linux, and practical BlueZ programming for building a BLE Audio source and sink.
