ALSA Compress-Offload API
Pierre-Louis.Bossart <pierre-louis.bossart@linux.intel.com>
Vinod Koul <vinod.koul@linux.intel.com>
Overview
Since its early days, the ALSA API was defined with PCM support or constant-bitrate payloads such as IEC61937 in mind. Arguments and returned values in frames are the norm, making it a challenge to extend the existing API to compressed data streams.
In recent years, audio digital signal processors (DSP) were integrated in system-on-chip designs, and DSPs are also integrated in audio codecs. Processing compressed data on such DSPs results in a dramatic reduction of power consumption compared to host-based processing. Support for such hardware has not been very good in Linux, mostly because of a lack of a generic API available in the mainline kernel.
Rather than requiring a compatibility break with an API change of the ALSA PCM interface, a new ‘Compressed Data’ API is introduced to provide a control and data-streaming interface for audio DSPs.
The design of this API was inspired by the 2-year experience with the Intel Moorestown SOC, with many corrections required to upstream the API in the mainline kernel instead of the staging tree and make it usable by others.
Requirements
The main requirements are:
- Separation between byte counts and time. Compressed formats may have a header per file, per frame, or no header at all. The payload size may vary from frame to frame. As a result, it is not possible to reliably estimate the duration of audio buffers when handling compressed data. Dedicated mechanisms are required to allow for reliable audio-video synchronization, which requires precise reporting of the number of samples rendered at any given time.
- Handling of multiple formats. PCM data only requires a specification of the sampling rate, number of channels and bits per sample. In contrast, compressed data comes in a variety of formats. Audio DSPs may also provide support for a limited number of audio encoders and decoders embedded in firmware, or may support more choices through dynamic download of libraries.
- Focus on main formats. This API provides support for the most popular formats used for audio and video capture and playback. It is likely that as audio compression technology advances, new formats will be added.
- Handling of multiple configurations. Even for a given format like AAC, some implementations may support multichannel AAC but only stereo HE-AAC. Likewise, WMA10 level M3 may require too much memory and too many CPU cycles. The new API needs to provide a generic way of listing these formats.
- Rendering/Grabbing only. This API does not provide any means of hardware acceleration, where PCM samples are provided back to user-space for additional processing. This API focuses instead on streaming compressed data to a DSP, with the assumption that the decoded samples are routed to a physical output or logical back-end.
- Complexity hiding. Existing user-space multimedia frameworks all have existing enums/structures for each compressed format. This new API assumes the existence of a platform-specific compatibility layer to expose, translate and make use of the capabilities of the audio DSP, e.g. Android HAL or PulseAudio sinks. By construction, regular applications are not supposed to make use of this API.
Design
The new API shares a number of concepts with the PCM API for flow control. Start, pause, resume, drain and stop commands have the same semantics no matter what the content is.
The concept of a memory ring buffer divided into a set of fragments is borrowed from the ALSA PCM API. However, only sizes in bytes can be specified.
Seeks/trick modes are assumed to be handled by the host.
The notion of rewinds/forwards is not supported. Data committed to the ring buffer cannot be invalidated, except when dropping all buffers.
The Compressed Data API does not make any assumptions on how the data is transmitted to the audio DSP. DMA transfers from main memory to an embedded audio cluster or to an SPI interface for external DSPs are possible. As in the ALSA PCM case, a core set of routines is exposed; each driver implementer will have to write support for a set of mandatory routines and possibly make use of optional ones.
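On the driver side, these routines materialize as a kernel-side ops table. The fragment below is only a hypothetical sketch based on struct snd_compr_ops from <sound/compress_driver.h>: the foo_ names are placeholders, only a few callbacks are stubbed out, and the exact structure and registration path may differ between kernel versions:

  #include <sound/compress_driver.h>

  /* hypothetical driver callbacks, bodies reduced to stubs */
  static int foo_compr_open(struct snd_compr_stream *stream)
  {
          /* allocate DSP stream context, DMA resources, ... */
          return 0;
  }

  static int foo_compr_set_params(struct snd_compr_stream *stream,
                                  struct snd_compr_params *params)
  {
          /* program the decoder from params->codec and size the ring buffer
           * from params->buffer.fragment_size / params->buffer.fragments */
          return 0;
  }

  static int foo_compr_trigger(struct snd_compr_stream *stream, int cmd)
  {
          /* handle start/stop/pause/resume/drain requests from the core */
          return 0;
  }

  static struct snd_compr_ops foo_compr_ops = {
          .open           = foo_compr_open,
          .set_params     = foo_compr_set_params,
          .trigger        = foo_compr_trigger,
          /* .free, .pointer, .copy, .ack, .get_caps, .get_codec_caps,
           * .set_metadata, ... as needed */
  };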
The main additions are
- get_caps
This routine returns the list of audio formats supported. Querying the codecs on a capture stream will return encoders; decoders will be listed for playback streams.
- get_codec_caps
For each codec, this routine returns a list of capabilities. The intent is to make sure all the capabilities correspond to valid settings, and to minimize the risks of configuration failures. For example, for a complex codec such as AAC, the number of channels supported may depend on a specific profile. If the capabilities were exposed with a single descriptor, it may happen that a specific combination of profiles/channels/formats is not supported. Likewise, since embedded DSPs have limited memory and cpu cycles, it is likely that some implementations make the list of capabilities dynamic and dependent on existing workloads. In addition to codec settings, this routine returns the minimum buffer size handled by the implementation. This information can be a function of the DMA buffer sizes, the number of bytes required to synchronize, etc., and can be used by userspace to define how much needs to be written in the ring buffer before playback can start.
- set_params
This routine sets the configuration chosen for a specific codec. The most important field in the parameters is the codec type; in most cases decoders will ignore other fields, while encoders will strictly comply with the settings.
- get_params
This routine returns the actual settings used by the DSP. Changes to the settings should remain the exception.
- get_timestamp
The timestamp becomes a multiple-field structure. It lists the number of bytes transferred, the number of samples processed and the number of samples rendered/grabbed. All these values can be used to determine the average bitrate, figure out if the ring buffer needs to be refilled, or the delay due to decoding/encoding/I/O on the DSP. (A minimal user-space sketch of these calls is shown after the note below.)
Note that the list of codecs/profiles/modes was derived from the OpenMAX AL specification instead of reinventing the wheel. Modifications include:
- Addition of FLAC and IEC formats
- Merge of encoder/decoder capabilities
- Profiles/modes listed as bitmasks to make descriptors more compact
- Addition of set_params for decoders (missing in OpenMAX AL)
- Addition of AMR/AMR-WB encoding modes (missing in OpenMAX AL)
- Addition of format information for WMA
- Addition of encoding options when required (derived from OpenMAX IL)
- Addition of rateControlSupported (missing in OpenMAX AL)
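As an illustration, the capability query and configuration sequence described above could look as follows from user space, using the ioctls and structures defined in the uapi header <sound/compress_offload.h>. This is only a minimal sketch: the device node, the choice of MP3 and the parameter values are examples, and error handling is reduced to a minimum:

  #include <fcntl.h>
  #include <stdio.h>
  #include <string.h>
  #include <sys/ioctl.h>
  #include <unistd.h>
  #include <sound/compress_offload.h>  /* also pulls in <sound/compress_params.h> */

  int main(void)
  {
          /* example device node: compress playback device 0 on card 0 */
          int fd = open("/dev/snd/comprC0D0", O_WRONLY);
          if (fd < 0) {
                  perror("open");
                  return 1;
          }

          /* get_caps: supported codecs and fragment constraints */
          struct snd_compr_caps caps;
          memset(&caps, 0, sizeof(caps));
          if (ioctl(fd, SNDRV_COMPRESS_GET_CAPS, &caps) == 0)
                  printf("%u codecs, fragment size %u..%u bytes\n",
                         caps.num_codecs, caps.min_fragment_size,
                         caps.max_fragment_size);

          /* get_codec_caps: per-codec descriptors, here for MP3 */
          struct snd_compr_codec_caps ccaps;
          memset(&ccaps, 0, sizeof(ccaps));
          ccaps.codec = SND_AUDIOCODEC_MP3;
          if (ioctl(fd, SNDRV_COMPRESS_GET_CODEC_CAPS, &ccaps) == 0)
                  printf("MP3: %u descriptors\n", ccaps.num_descriptors);

          /* set_params: ring-buffer geometry plus codec settings */
          struct snd_compr_params params;
          memset(&params, 0, sizeof(params));
          params.buffer.fragment_size = caps.min_fragment_size;
          params.buffer.fragments = caps.min_fragments;
          params.codec.id = SND_AUDIOCODEC_MP3;
          params.codec.ch_in = 2;
          params.codec.ch_out = 2;
          params.codec.sample_rate = 44100;   /* example rate, Hz assumed */
          params.codec.bit_rate = 128000;     /* example bitrate, bps assumed */
          if (ioctl(fd, SNDRV_COMPRESS_SET_PARAMS, &params) < 0)
                  perror("SNDRV_COMPRESS_SET_PARAMS");

          close(fd);
          return 0;
  }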
State Machine
The compressed audio stream state machine is described below:

                                            +----------+
                                            |          |
                                            |   OPEN   |
                                            |          |
                                            +----------+
                                                 |
                                                 |
                                                 | compr_set_params()
                                                 |
                                                 v
             compr_free()                   +----------+
      +-------------------------------------|          |
      |                                     |   SETUP  |
      |         +---------------------------|          |<-------------------------+
      |         |       compr_write()       +----------+                          |
      |         |                                ^                                |
      |         |                                | compr_drain_notify()           |
      |         |                                |        or                      |
      |         |                                |     compr_stop()               |
      |         |                                |                                |
      |         |                           +----------+                          |
      |         |                           |          |                          |
      |         |                           |   DRAIN  |                          |
      |         |                           |          |                          |
      |         |                           +----------+                          |
      |         |                                ^                                |
      |         |                                |                                |
      |         |                                | compr_drain()                  |
      |         |                                |                                |
      |         v                                |                                |
      |    +----------+                     +----------+                          |
      |    |          |    compr_start()    |          |        compr_stop()      |
      |    | PREPARE  |-------------------->|  RUNNING |--------------------------+
      |    |          |                     |          |                          |
      |    +----------+                     +----------+                          |
      |         |                             |     ^                             |
      |         | compr_free()                |     |                             |
      |         |              compr_pause()  |     | compr_resume()              |
      |         |                             |     |                             |
      |         v                             v     |                             |
      |    +----------+                    +----------+                           |
      |    |          |                    |          |   compr_stop()            |
      +--->|   FREE   |                    |  PAUSE   |---------------------------+
           |          |                    |          |
           +----------+                    +----------+
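For reference, these transitions map onto plain write() calls and trigger ioctls on the compress device node. The fragment below is a simplified sketch under the same assumptions as the previous one (the file descriptor is an already-configured compress playback device, error handling is minimal, and the function name is a placeholder):

  #include <stdio.h>
  #include <sys/ioctl.h>
  #include <unistd.h>
  #include <sound/compress_offload.h>

  /* 'fd' is an already-configured compress playback device (see the
   * previous sketch); 'data'/'len' hold one buffer of encoded audio. */
  int play_one_buffer(int fd, const void *data, size_t len)
  {
          /* SETUP -> PREPARE: writing data prepares the stream (compr_write) */
          if (write(fd, data, len) < 0)
                  return -1;

          /* PREPARE -> RUNNING */
          if (ioctl(fd, SNDRV_COMPRESS_START) < 0)
                  return -1;

          /* while RUNNING, the multiple-field timestamp reports progress */
          struct snd_compr_tstamp tstamp;
          if (ioctl(fd, SNDRV_COMPRESS_TSTAMP, &tstamp) == 0)
                  printf("copied %u bytes, rendered %u frames at %u Hz\n",
                         tstamp.copied_total, tstamp.pcm_io_frames,
                         tstamp.sampling_rate);

          /* RUNNING <-> PAUSE */
          ioctl(fd, SNDRV_COMPRESS_PAUSE);
          ioctl(fd, SNDRV_COMPRESS_RESUME);

          /* end of stream: RUNNING -> DRAIN, back to SETUP when playback ends */
          if (ioctl(fd, SNDRV_COMPRESS_DRAIN) < 0)
                  return -1;

          /* compr_stop() would instead abort immediately and return to SETUP:
           *   ioctl(fd, SNDRV_COMPRESS_STOP);
           */
          return 0;
  }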
Gapless Playback
When playing through an album, the decoders have the ability to skip the encoder delay and padding and directly move from one track's content to another. The end user can perceive this as gapless playback, as we don't have silence while switching from one track to another.
Also, there might be low-intensity noises due to encoding. Perfect gapless is difficult to reach with all types of compressed data, but works fine with most music content. The decoder needs to know the encoder delay and encoder padding, so we need to pass these to the DSP. This metadata is extracted from ID3/MP4 headers and is not present by default in the bitstream, hence the need for a new interface to pass this information to the DSP. The DSP and userspace also need to switch from one track to another and start using the data of the second track.
The main additions are:
- set_metadata
This routine sets the encoder delay and encoder padding. This can be used by the decoder to strip the silence. This needs to be set before the data in the track is written.
- set_next_track
This routine tells the DSP that the metadata and write operations sent after this call correspond to the subsequent track.
- partial drain
This is called when the end of file is reached. Userspace can inform the DSP that EOF is reached and that the DSP can now start skipping the padding delay. The next data written will belong to the next track.
The sequence flow for gapless would be:
- Open
- Get caps / codec caps
- Set params
- Set metadata of the first track
- Fill data of the first track
- Trigger start
- User-space finishes sending all data of the first track
- Indicate next track data by sending set_next_track
- Set metadata of the next track
- Then call partial_drain to flush most of the buffer in the DSP
- Fill data of the next track
- DSP switches to the second track
(Note: the order of partial_drain and the write for the next track can be reversed as well.)
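At the ioctl level, the whole gapless sequence could be sketched as below. This is a simplified illustration rather than a reference implementation: the read_track_data() helper, the delay/padding values and the omitted error handling are placeholders, and the file descriptor is assumed to be an already-configured compress playback device:

  #include <string.h>
  #include <sys/ioctl.h>
  #include <unistd.h>
  #include <sound/compress_offload.h>

  /* hypothetical helper: copies encoded bytes of 'track' into buf, 0 at EOF */
  extern ssize_t read_track_data(int track, void *buf, size_t size);

  static void set_gapless_metadata(int fd, unsigned int delay, unsigned int padding)
  {
          struct snd_compr_metadata md;

          memset(&md, 0, sizeof(md));
          md.key = SNDRV_COMPRESS_ENCODER_DELAY;
          md.value[0] = delay;
          ioctl(fd, SNDRV_COMPRESS_SET_METADATA, &md);
          md.key = SNDRV_COMPRESS_ENCODER_PADDING;
          md.value[0] = padding;
          ioctl(fd, SNDRV_COMPRESS_SET_METADATA, &md);
  }

  void play_two_tracks_gapless(int fd)
  {
          char buf[4096];
          ssize_t n;

          /* set metadata of the first track (values come from ID3/MP4 headers) */
          set_gapless_metadata(fd, 576, 1728);    /* example delay/padding */

          /* fill data of the first track, then trigger start */
          n = read_track_data(0, buf, sizeof(buf));
          write(fd, buf, n);
          ioctl(fd, SNDRV_COMPRESS_START);
          while ((n = read_track_data(0, buf, sizeof(buf))) > 0)
                  write(fd, buf, n);

          /* first track fully queued: announce the next track ... */
          ioctl(fd, SNDRV_COMPRESS_NEXT_TRACK);

          /* ... set its metadata ... */
          set_gapless_metadata(fd, 576, 960);     /* example delay/padding */

          /* ... and flush what is left of the first track in the DSP */
          ioctl(fd, SNDRV_COMPRESS_PARTIAL_DRAIN);

          /* fill data of the next track; the DSP switches tracks seamlessly */
          while ((n = read_track_data(1, buf, sizeof(buf))) > 0)
                  write(fd, buf, n);
          ioctl(fd, SNDRV_COMPRESS_DRAIN);
  }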
Gapless Playback SM
For gapless playback, we move from the RUNNING state to partial drain and back, along with setting the metadata and signalling for the next track:

                                           +----------+
              compr_drain_notify()         |          |
      +----------------------------------->|  RUNNING |
      |                                    |          |
      |                                    +----------+
      |                                         |
      |                                         |
      |                                         | compr_next_track()
      |                                         |
      |                                         V
      |                                    +----------+
      |        compr_set_params()          |          |
      |            +-----------------------|NEXT_TRACK|
      |            |                       |          |
      |            |                       +--+-------+
      |            |                          | |
      |            +--------------------------+ |
      |                                         |
      |                                         | compr_partial_drain()
      |                                         |
      |                                         V
      |                                    +----------+
      |                                    |          |
      +----------------------------------- | PARTIAL_ |
                                           |  DRAIN   |
                                           +----------+
Not supported
- Support for VoIP/circuit-switched calls is not the target of this API. Support for dynamic bit-rate changes would require a tight coupling between the DSP and the host stack, limiting power savings.
- Packet-loss concealment is not supported. This would require an additional interface to let the decoder synthesize data when frames are lost during transmission. This may be added in the future.
- Volume control/routing is not handled by this API. Devices exposing a compressed data interface will be considered as regular ALSA devices; volume changes and routing information will be provided with regular ALSA kcontrols.
- Embedded audio effects. Such effects should be enabled in the same manner, no matter whether the input is PCM or compressed.
- Multichannel IEC encoding. It is unclear if this is required.
- Encoding/decoding acceleration is not supported, as mentioned above. It is possible to route the output of a decoder to a capture stream, or even implement transcoding capabilities. This routing would be enabled with ALSA kcontrols.
- Audio policy/resource management. This API does not provide any hooks to query the utilization of the audio DSP, nor any preemption mechanisms.
- No notion of underrun/overrun. Since the bytes written are compressed in nature and the data written/read does not translate directly to rendered output in time, this API does not deal with underrun/overrun; this may be handled in a user-space library.
Credits
- Mark Brown and Liam Girdwood for discussions on the need for this API
- Harsha Priya for her work on the intel_sst compressed API
- Rakesh Ughreja for valuable feedback
- Sing Nallasellan, Sikkandar Madar and Prasanna Samaga for demonstrating and quantifying the benefits of audio offload on a real platform.