Introduction to Sound Programming with ALSA
ALSA stands for the Advanced Linux Sound Architecture. It consists of a set of kernel drivers, an application programming interface (API) library and utility programs for supporting sound under Linux. In this article, I present a brief overview of the ALSA Project and its software components. The focus is on programming the PCM interfaces of ALSA, including programming examples with which you can experiment.
You may want to explore ALSA simply because it is new, but it is not the only sound API available. ALSA is a good choice if you are performing low-level audio functions for maximum control and performance or want to make use of special features not supported by other sound APIs. If you already have written an audio application, you may want to add native support for the ALSA sound drivers. If your primary interest isn't audio and you simply want to play sound files, using one of the higher-level sound toolkits, such as SDL, OpenAL or those provided in desktop environments, may be a better choice. By using ALSA you are restricted to using systems running a Linux kernel with ALSA support.
The ALSA Project was started because the sound drivers in the Linux kernel (OSS/Free drivers) were not being maintained actively and were lagging behind the capabilities of new sound technology. Jaroslav Kysela, who previously had written a sound card driver, started the project. Over time, more developers joined, support for many sound cards was added and the structure of the API was refined.
During development of the 2.5 series of the Linux kernel, ALSA was merged into the official kernel source. With the release of the 2.6 kernel, ALSA will be part of the stable Linux kernel and should be in wide use.
Sound, consisting of waves of varying air pressure, is converted to its electrical form by a transducer, such as a microphone. An analog-to-digital converter (ADC) converts the analog voltages into discrete values, called samples, at regular intervals in time; the number of samples taken per second is known as the sampling rate. By sending the samples to a digital-to-analog converter and an output transducer, such as a loudspeaker, the original sound can be reproduced.
The size of the samples, expressed in bits, is one factor that determines how accurately the sound is represented in digital form. The other major factor affecting sound quality is the sampling rate. The Nyquist Theorem states that the highest frequency that can be represented accurately is at most one-half the sampling rate.
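To make these two factors concrete, here is a minimal sketch in C. The function names nyquist_limit_hz and quantization_levels are illustrative only and are not part of ALSA; the sketch simply applies the one-half rule stated above and counts the distinct values a sample of a given bit width can hold.

```c
#include <assert.h>

/* Highest frequency (in Hz) that can be represented at a given
 * sampling rate, per the one-half rule described in the text. */
unsigned int nyquist_limit_hz(unsigned int rate_hz)
{
    return rate_hz / 2;
}

/* Number of distinct amplitude values a sample of `bits` bits can hold.
 * Illustrative helper; limited to sizes below 32 bits. */
unsigned long quantization_levels(unsigned int bits)
{
    assert(bits > 0 && bits < 32);
    return 1UL << bits;
}
```

For CD-quality audio, nyquist_limit_hz(44100) evaluates to 22050 and quantization_levels(16) to 65536.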
ALSA consists of a series of kernel device drivers for many different sound cards, and it also provides an API library, libasound. Application developers are encouraged to program using the library API and not the kernel interface. The library provides a higher-level and more developer-friendly programming interface along with a logical naming of devices so that developers do not need to be aware of low-level details such as device files.
In contrast, OSS/Free drivers are programmed at the kernel system call level and require the developer to specify device filenames and perform many functions using ioctl calls. For backward compatibility, ALSA provides kernel modules that emulate the OSS/Free sound drivers, so most existing sound applications continue to run unchanged. An emulation wrapper library, libaoss, is available to emulate the OSS/Free API without kernel modules.
ALSA has a capability called plugins that allows extension to new devices, including virtual devices implemented entirely in software. ALSA provides a number of command-line utilities, including a mixer, sound file player and tools for controlling special features of specific sound cards.
The ALSA API can be broken down into the major interfaces it supports:
Control interface: a general-purpose facility for managing registers of sound cards and querying the available devices.
PCM interface: the interface for managing digital audio capture and playback. The rest of this article focuses on this interface, as it is the one most commonly used for digital audio applications.
Raw MIDI interface: supports MIDI (Musical Instrument Digital Interface), a standard for electronic musical instruments. This API provides access to a MIDI bus on a sound card. The raw interface works directly with the MIDI events, and the programmer is responsible for managing the protocol and timing.
Timer interface: provides access to timing hardware on sound cards used for synchronizing sound events.
Sequencer interface: a higher-level interface for MIDI programming and sound synthesis than the raw MIDI interface. It handles much of the MIDI protocol and timing.
Mixer interface: controls the devices on sound cards that route signals and control volume levels. It is built on top of the control interface.
The library API works with logical device names rather than device files. The device names can be real hardware devices or plugins. Hardware devices use the format hw:i,j, where i is the card number and j is the device on that card. The first sound device is hw:0,0. The alias default refers to the first sound device and is used in all of the examples in this article. Plugins use other unique names; plughw:, for example, is a plugin that provides access to the hardware device but provides features, such as sampling rate conversion, in software for hardware that does not directly support it. The dmix and dshare plugins allow you to downmix several streams and split a single stream dynamically among different applications.
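The hw:i,j naming scheme can be illustrated with a small parser. This is only a sketch: parse_hw_name is a hypothetical helper, not a libasound function, and it handles only the two-number form described above. The library performs its own, more general name resolution, so an application normally passes the name string straight to the library.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Parse a device name of the form "hw:i,j" into a card number and a
 * device number. Returns 0 on success and -1 if the name does not have
 * that form (for example "default" or "plughw:0,0").
 * Hypothetical helper for illustration only.
 */
int parse_hw_name(const char *name, int *card, int *device)
{
    assert(name != NULL && card != NULL && device != NULL);
    /* sscanf returns the number of fields converted; we need both. */
    if (sscanf(name, "hw:%d,%d", card, device) != 2)
        return -1;
    if (*card < 0 || *device < 0)
        return -1;
    return 0;
}
```

Calling parse_hw_name("hw:0,0", &card, &device) succeeds with card and device both set to 0, while a plugin name such as plughw:0,0 is rejected because it does not begin with hw:.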
A sound card has a hardware buffer that stores recorded samples. When the buffer is sufficiently full, it generates an interrupt. The kernel sound driver then uses direct memory access (DMA) to transfer samples to an application buffer in memory. Similarly, for playback, another application buffer is transferred from memory to the sound card's hardware buffer using DMA.
These hardware buffers are ring buffers, meaning the data wraps back to the start when the end of the buffer is reached. A pointer is maintained to keep track of the current positions in both the hardware buffer and the application buffer. Outside of the kernel, only the application buffer is of interest, so from here on we discuss only the application buffer.
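The wrap-around behavior of a ring buffer comes down to modular arithmetic on the position pointer. The sketch below shows the idea; ring_advance is an illustrative name, and the code models only the pointer arithmetic, not how the ALSA drivers actually track positions.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Advance a position within a ring buffer that holds `buffer_frames`
 * frames. When the end of the buffer is reached, the position wraps
 * back to the start.
 */
size_t ring_advance(size_t pos, size_t frames, size_t buffer_frames)
{
    assert(buffer_frames > 0 && pos < buffer_frames);
    /* Reduce `frames` first so the sum stays small before wrapping. */
    return (pos + frames % buffer_frames) % buffer_frames;
}
```

For a hypothetical 4,096-frame buffer, advancing from position 4,000 by 200 frames lands on position 104 rather than running past the end.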
The size of the buffer can be programmed by ALSA library calls. The buffer can be quite large, and transferring it in one operation could result in unacceptable delays, called latency. To solve this, ALSA splits the buffer up into a series of periods (called fragments in OSS/Free) and transfers the data in units of a period.
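The latency difference between transferring a whole buffer and transferring one period can be estimated from the frame count and the sampling rate. The following sketch uses an illustrative helper, frames_to_usec, and hypothetical sizes; it is plain arithmetic and does not call ALSA.

```c
#include <assert.h>

/*
 * Time in microseconds (rounded down) needed to play or capture
 * `frames` frames at `rate_hz` frames per second.
 */
unsigned long frames_to_usec(unsigned long frames, unsigned int rate_hz)
{
    assert(rate_hz > 0);
    /* Use a 64-bit intermediate so frames * 1000000 cannot overflow
     * on platforms where unsigned long is 32 bits. */
    return (unsigned long)((unsigned long long)frames * 1000000ULL / rate_hz);
}
```

At 44,100 Hz, a hypothetical 8,192-frame buffer represents roughly 186 ms of audio, while a 1,024-frame period represents roughly 23 ms, which is why transferring in units of a period keeps delays manageable.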
A period stores frames, each of which contains the samples captured at one point in time. For a stereo device, the frame would contain samples for two channels. Figure 1 illustrates the breakdown of a buffer into periods, frames and samples with some hypothetical values. Here, left and right channel information is stored alternately within a frame; this is called interleaved mode. A non-interleaved mode, where all the sample data for one channel is stored followed by the data for the next channel, also is supported.

Figure 1. The Application Buffer
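The difference between the two layouts can be shown by converting one into the other. This is a minimal sketch assuming 16-bit stereo samples; deinterleave_stereo is an illustrative helper and not an ALSA call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Split interleaved stereo data (L R L R ...) into separate left and
 * right arrays, which is the layout used in non-interleaved mode.
 * `frames` is the number of frames; each frame holds two samples.
 */
void deinterleave_stereo(const int16_t *interleaved, size_t frames,
                         int16_t *left, int16_t *right)
{
    assert(interleaved != NULL && left != NULL && right != NULL);
    for (size_t i = 0; i < frames; i++) {
        left[i]  = interleaved[2 * i];      /* first sample of frame i */
        right[i] = interleaved[2 * i + 1];  /* second sample of frame i */
    }
}
```

The caller provides an input array of 2 * frames samples and two output arrays of frames samples each.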
When a sound device is active, data is transferred continuously between the hardware and application buffers. In the case of data capture (recording), if the application does not read the data in the buffer rapidly enough, the circular buffer is overwritten with new data. The resulting data loss is known as overrun. During playback, if the application does not pass data into the buffer quickly enough, it becomes starved for data, resulting in an error called underrun. The ALSA documentation sometimes refers to both of these conditions using the term XRUN. Properly designed applications can minimize XRUN and recover if it occurs.
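The overrun condition can be modeled with a simple counter of frames waiting to be read. The sketch below is a toy model only: struct capture_model and its two functions are hypothetical, and the model ignores what the driver does to the stream once an XRUN has occurred. It shows just the bookkeeping that decides when captured data no longer fits.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of a capture buffer: counts frames waiting to be read. */
struct capture_model {
    size_t buffer_frames;   /* capacity of the buffer, in frames */
    size_t pending_frames;  /* frames captured but not yet read */
};

/* The hardware delivers `frames` new frames.
 * Returns 1 if unread data had to be overwritten (overrun), else 0. */
int capture_produce(struct capture_model *m, size_t frames)
{
    assert(m != NULL && m->pending_frames <= m->buffer_frames);
    if (frames > m->buffer_frames - m->pending_frames) {
        m->pending_frames = m->buffer_frames; /* buffer is now full */
        return 1;
    }
    m->pending_frames += frames;
    return 0;
}

/* The application reads up to `frames` frames.
 * Returns the number of frames actually available and removed. */
size_t capture_consume(struct capture_model *m, size_t frames)
{
    assert(m != NULL);
    if (frames > m->pending_frames)
        frames = m->pending_frames;
    m->pending_frames -= frames;
    return frames;
}
```

If the application calls capture_consume less often than the hardware calls capture_produce, the pending count grows until capture_produce reports an overrun. Playback underrun is the mirror image, with the buffer running empty instead of full.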
Programs that use the PCM interface generally follow this pseudo-code:
    open interface for capture or playback
    set hardware parameters
    (access mode, data format, channels, rate, etc.)
    while there is data to be processed:
        read PCM data (capture)
        or write PCM data (playback)
    close interface
We look at some working code in the following sections. I recommend you compile and run these on your Linux system, look at the output and try some of the suggested modifications. The full listings for the example programs that accompany this article are available for download from ftp.linuxjournal.com/pub/lj/listings/issue126/6735.tgz.
Listing 1. Display Some PCM Types and Formats
    #include <alsa/asoundlib.h>

    int main() {
      int val;

      printf("ALSA library version: %s\n",
             SND_LIB_VERSION_STR);

      printf("\nPCM stream types:\n");
      for (val = 0; val <= SND_PCM_STREAM_LAST; val++)
        printf("  %s\n",
          snd_pcm_stream_name((snd_pcm_stream_t)val));

      printf("\nPCM access types:\n");
      for (val = 0; val <= SND_PCM_ACCESS_LAST; val++)
        printf("  %s\n",
          snd_pcm_access_name((snd_pcm_access_t)val));

      printf("\nPCM formats:\n");
      for (val = 0; val <= SND_PCM_FORMAT_LAST; val++)
        if (snd_pcm_format_name((snd_pcm_format_t)val)
            != NULL)
          printf("  %s (%s)\n",
            snd_pcm_format_name((snd_pcm_format_t)val),
            snd_pcm_format_description(
              (snd_pcm_format_t)val));

      printf("\nPCM subformats:\n");
      for (val = 0; val <= SND_PCM_SUBFORMAT_LAST; val++)
        printf("  %s (%s)\n",
          snd_pcm_subformat_name(
            (snd_pcm_subformat_t)val),
          snd_pcm_subformat_description(
            (snd_pcm_subformat_t)val));

      printf("\nPCM states:\n");
      for (val = 0; val <= SND_PCM_STATE_LAST; val++)
        printf("  %s\n",
          snd_pcm_state_name((snd_pcm_state_t)val));

      return 0;
    }

Listing 1 displays some of the PCM data types and parameters used by ALSA. The first requirement is to include the header file that brings in the definitions for all of the ALSA library functions. One of the definitions is the version of ALSA, which is displayed.
The remainder of the program iterates through a number of PCM data types, starting with the stream types. ALSA provides symbolic names for the last enumerated value and a utility function that returns a descriptive string for a value. As you can see in the output, ALSA supports many different data formats, 38 for the version of ALSA on my system.
The program must be linked with the ALSA library, libasound, to run. Typically, you would add the option -lasound on the linker command line. Some ALSA library functions use the dlopen function and floating-point operations, so you also may need to add -ldl and -lm.
Listing 2. Opening PCM Device and Setting Parameters
    /*
    This example opens the default PCM device, sets
    some parameters, and then displays the value
    of most of the hardware parameters. It does not
    perform any sound playback or recording.
    */

    /* Use the newer ALSA API */
    #define ALSA_PCM_NEW_HW_PARAMS_API

    /* All of the ALSA library API is defined
     * in this header */
    #include <alsa/asoundlib.h>

    int main() {
      int rc;
      snd_pcm_t *handle;
      snd_pcm_hw_params_t *params;
      unsigned int val, val2;
      int dir = 0;
      snd_pcm_uframes_t frames;

      /* Open PCM device for playback. */
      rc = snd_pcm_open(&handle, "default",
                        SND_PCM_STREAM_PLAYBACK, 0);
      if (rc < 0) {
        fprintf(stderr,
                "unable to open pcm device: %s\n",
                snd_strerror(rc));
        exit(1);
      }

      /* Allocate a hardware parameters object. */
      snd_pcm_hw_params_alloca(&params);

      /* Fill it in with default values. */
      snd_pcm_hw_params_any(handle, params);

      /* Set the desired hardware parameters. */

      /* Interleaved mode */
      snd_pcm_hw_params_set_access(handle, params,
                          SND_PCM_ACCESS_RW_INTERLEAVED);

      /* Signed 16-bit little-endian format */
      snd_pcm_hw_params_set_format(handle, params,
                                  SND_PCM_FORMAT_S16_LE);

      /* Two channels (stereo) */
      snd_pcm_hw_params_set_channels(handle, params, 2);

      /* 44100 samples/second sampling rate (CD quality) */
      val = 44100;
      snd_pcm_hw_params_set_rate_near(handle,
                                     params, &val, &dir);

      /* Write the parameters to the driver */
      rc = snd_pcm_hw_params(handle, params);
      if (rc < 0) {
        fprintf(stderr,
                "unable to set hw parameters: %s\n",
                snd_strerror(rc));
        exit(1);
      }

      /* Display information about the PCM interface */

      printf("PCM handle name = '%s'\n",
             snd_pcm_name(handle));

      printf("PCM state = %s\n",
             snd_pcm_state_name(snd_pcm_state(handle)));

      snd_pcm_hw_params_get_access(params,
                              (snd_pcm_access_t *) &val);
      printf("access type = %s\n",
             snd_pcm_access_name((snd_pcm_access_t)val));

      snd_pcm_hw_params_get_format(params,
                              (snd_pcm_format_t *) &val);
      printf("format = '%s' (%s)\n",
             snd_pcm_format_name((snd_pcm_format_t)val),
             snd_pcm_format_description(
                                 (snd_pcm_format_t)val));

      snd_pcm_hw_params_get_subformat(params,
                           (snd_pcm_subformat_t *) &val);
      printf("subformat = '%s' (%s)\n",
             snd_pcm_subformat_name(
                              (snd_pcm_subformat_t)val),
             snd_pcm_subformat_description(
                              (snd_pcm_subformat_t)val));

      snd_pcm_hw_params_get_channels(params, &val);
      printf("channels = %d\n", val);

      snd_pcm_hw_params_get_rate(params, &val, &dir);
      printf("rate = %d Hz\n", val);

      snd_pcm_hw_params_get_period_time(params,
                                        &val, &dir);
      printf("period time = %d us\n", val);

      snd_pcm_hw_params_get_period_size(params,
                                        &frames, &dir);
      printf("period size = %d frames\n", (int)frames);

      snd_pcm_hw_params_get_buffer_time(params,
                                        &val, &dir);
      printf("buffer time = %d us\n", val);

      /* Read into a snd_pcm_uframes_t, which may be
       * wider than unsigned int */
      snd_pcm_hw_params_get_buffer_size(params, &frames);
      printf("buffer size = %d frames\n", (int)frames);

      snd_pcm_hw_params_get_periods(params, &val, &dir);
      printf("periods per buffer = %d\n", val);

      snd_pcm_hw_params_get_rate_numden(params,
                                        &val, &val2);
      printf("exact rate = %d/%d Hz\n", val, val2);

      val = snd_pcm_hw_params_get_sbits(params);
      printf("significant bits = %d\n", val);

      snd_pcm_hw_params_get_tick_time(params,
                                      &val, &dir);
      printf("tick time = %d us\n", val);

      val = snd_pcm_hw_params_is_batch(params);
      printf("is batch = %d\n", val);

      val = snd_pcm_hw_params_is_block_transfer(params);
      printf("is block transfer = %d\n", val);

      val = snd_pcm_hw_params_is_double(params);
      printf("is double = %d\n", val);

      val = snd_pcm_hw_params_is_half_duplex(params);
      printf("is half duplex = %d\n", val);

      val = snd_pcm_hw_params_is_joint_duplex(params);
      printf("is joint duplex = %d\n", val);

      val = snd_pcm_hw_params_can_overrange(params);
      printf("can overrange = %d\n", val);

      val = snd_pcm_hw_params_can_mmap_sample_resolution(params);
      printf("can mmap = %d\n", val);

      val = snd_pcm_hw_params_can_pause(params);
      printf("can pause = %d\n", val);

      val = snd_pcm_hw_params_can_resume(params);
      printf("can resume = %d\n", val);

      val = snd_pcm_hw_params_can_sync_start(params);
      printf("can sync start = %d\n", val);

      snd_pcm_close(handle);

      return 0;
    }

Listing 2 opens the default PCM device, sets some parameters and then displays the values of most of the hardware parameters. It does not perform any sound playback or recording.
The call to snd_pcm_open opens the default PCM device as a playback stream. This function returns a handle in the first function argument that is used in subsequent calls to manipulate the PCM stream. Like most ALSA library calls, the function returns an integer return status, a negative value indicating an error condition. In this case, we check the return code; if it indicates failure, we display the error message using the snd_strerror function and exit. In the interest of clarity, I have omitted most of the error checking from the example programs. In a production application, one should check the return code of every API call and provide appropriate error handling.
In order to set the hardware parameters for the stream, we need to allocate a variable of type snd_pcm_hw_params_t. We do this with the macro snd_pcm_hw_params_alloca. Next, we initialize the variable using the function snd_pcm_hw_params_any, passing the previously opened PCM stream.
We now set the desired hardware parameters using API calls that take the PCM stream handle, the hardware parameters structure and the parameter value. We set the stream to interleaved mode, 16-bit sample size, 2 channels and a 44,100 Hz sampling rate. In the case of the sampling rate, sound hardware is not always able to support every sampling rate exactly. We use the function snd_pcm_hw_params_set_rate_near to request the nearest supported sampling rate to the requested value. The hardware parameters are not actually made active until we call the function snd_pcm_hw_params.
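The idea of requesting the nearest supported rate can be sketched without any sound hardware. In the example below, nearest_rate and the list of supported rates are hypothetical; the code illustrates the concept of "nearest" and makes no claim about how the ALSA library or a driver chooses a rate internally.

```c
#include <assert.h>
#include <stddef.h>

/* Absolute difference between two unsigned values. */
static unsigned int rate_distance(unsigned int a, unsigned int b)
{
    return a > b ? a - b : b - a;
}

/*
 * Return the entry in `supported` that is closest to `requested`.
 * If two entries are equally close, the earlier one wins.
 */
unsigned int nearest_rate(const unsigned int *supported, size_t count,
                          unsigned int requested)
{
    assert(supported != NULL && count > 0);
    unsigned int best = supported[0];
    for (size_t i = 1; i < count; i++) {
        if (rate_distance(supported[i], requested) <
            rate_distance(best, requested))
            best = supported[i];
    }
    return best;
}
```

With a hypothetical device that supports only 8,000, 22,050, 48,000 and 96,000 Hz, a request for 44,100 Hz would select 48,000 Hz. This is why an application should read back the value the library stores in the rate variable rather than assume the requested rate was granted.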
The rest of the program obtains and displays a number of the PCM stream parameters, including the period and buffer sizes. The results displayed vary somewhat depending on the sound hardware.
After running the program on your system, experiment and make some changes. Change the device name from default to hw:0,0 or plughw: and see whether the results change. Change the hardware parameter values and observe how the displayed results change.
Listing 3. Simple Sound Playback
    /*
    This example reads from standard input and writes
    to the default PCM device for 5 seconds of data.
    */

    /* Use the newer ALSA API */
    #define ALSA_PCM_NEW_HW_PARAMS_API

    #include <alsa/asoundlib.h>

    int main() {
      long loops;
      int rc;
      int size;
      snd_pcm_t *handle;
      snd_pcm_hw_params_t *params;
      unsigned int val;
      int dir = 0;
      snd_pcm_uframes_t frames;
      char *buffer;

      /* Open PCM device for playback. */
      rc = snd_pcm_open(&handle, "default",
                        SND_PCM_STREAM_PLAYBACK, 0);
      if (rc < 0) {
        fprintf(stderr,
                "unable to open pcm device: %s\n",
                snd_strerror(rc));
        exit(1);
      }

      /* Allocate a hardware parameters object. */
      snd_pcm_hw_params_alloca(&params);

      /* Fill it in with default values. */
      snd_pcm_hw_params_any(handle, params);

      /* Set the desired hardware parameters. */

      /* Interleaved mode */
      snd_pcm_hw_params_set_access(handle, params,
                          SND_PCM_ACCESS_RW_INTERLEAVED);

      /* Signed 16-bit little-endian format */
      snd_pcm_hw_params_set_format(handle, params,
                                  SND_PCM_FORMAT_S16_LE);

      /* Two channels (stereo) */
      snd_pcm_hw_params_set_channels(handle, params, 2);

      /* 44100 samples/second sampling rate (CD quality) */
      val = 44100;
      snd_pcm_hw_params_set_rate_near(handle, params,
                                      &val, &dir);

      /* Set period size to 32 frames. */
      frames = 32;
      snd_pcm_hw_params_set_period_size_near(handle,
                                  params, &frames, &dir);

      /* Write the parameters to the driver */
      rc = snd_pcm_hw_params(handle, params);
      if (rc < 0) {
        fprintf(stderr,
                "unable to set hw parameters: %s\n",
                snd_strerror(rc));
        exit(1);
      }

      /* Use a buffer large enough to hold one period */
      snd_pcm_hw_params_get_period_size(params, &frames,
                                        &dir);
      size = frames * 4; /* 2 bytes/sample, 2 channels */
      buffer = (char *) malloc(size);

      /* We want to loop for 5 seconds */
      snd_pcm_hw_params_get_period_time(params,
                                        &val, &dir);
      /* 5 seconds in microseconds divided by
       * period time */
      loops = 5000000 / val;

      while (loops > 0) {
        loops--;
        rc = read(0, buffer, size);
        if (rc == 0) {
          fprintf(stderr, "end of file on input\n");
          break;
        } else if (rc != size) {
          fprintf(stderr,
                  "short read: read %d bytes\n", rc);
        }
        rc = snd_pcm_writei(handle, buffer, frames);
        if (rc == -EPIPE) {
          /* EPIPE means underrun */
          fprintf(stderr, "underrun occurred\n");
          snd_pcm_prepare(handle);
        } else if (rc < 0) {
          fprintf(stderr,
                  "error from writei: %s\n",
                  snd_strerror(rc));
        } else if (rc != (int)frames) {
          fprintf(stderr,
                  "short write, write %d frames\n", rc);
        }
      }

      snd_pcm_drain(handle);
      snd_pcm_close(handle);
      free(buffer);

      return 0;
    }

Listing 3 extends the previous example by writing sound samples to the sound card to produce playback. In this case we read bytes from standard input, enough for one period, and write them to the sound card until five seconds of data has been transferred.
The beginning of the program is the same as in the previous example: the PCM device is opened and the hardware parameters are set. We use the period size chosen by ALSA and make this the size of our buffer for storing samples. We then find out the period time so we can calculate how many periods the program should process in order to run for five seconds.
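The two calculations described here, the buffer size for one period and the number of periods in five seconds, can be written as small helpers. The names period_bytes and periods_for_duration are illustrative and not part of ALSA; the period time of 725 microseconds used below is simply 32 frames at 44,100 Hz rounded down, and the value reported by real hardware may differ.

```c
#include <assert.h>

/* Bytes needed to hold one period of interleaved samples. */
unsigned long period_bytes(unsigned long frames, unsigned int channels,
                           unsigned int bytes_per_sample)
{
    return frames * channels * bytes_per_sample;
}

/* Number of whole periods that fit into `total_usec` microseconds
 * when each period lasts `period_usec` microseconds. */
long periods_for_duration(unsigned long total_usec,
                          unsigned int period_usec)
{
    assert(period_usec > 0);
    return (long)(total_usec / period_usec);
}
```

For a 32-frame period of 16-bit stereo data, period_bytes(32, 2, 2) is 128 bytes, and periods_for_duration(5000000, 725) is 6896 loop iterations.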
In the loop that manages data, we read from standard input and fill our buffer with one period of samples. We check for and handle errors resulting from reaching the end of file or reading a different number of bytes from what was expected.
To send data to the PCM device, we use the snd_pcm_writei call. It operates much like the kernel write system call, except that the size is specified in frames. We check the return code for a number of error conditions. A return code of -EPIPE indicates that underrun occurred, which causes the PCM stream to go into the XRUN state and stop processing data. The standard method to recover from this state is to use the snd_pcm_prepare function call to put the stream in the PREPARED state so it can start again the next time we write data to the stream. If we receive a different error result, we display the error code and continue. Finally, if the number of frames written is not what was expected, we display an error message.
The program loops until five seconds' worth of frames has been transferred or end of file occurs on the input. We then call snd_pcm_drain to allow any pending sound samples to be transferred, then close the stream. We free the dynamically allocated buffer and exit.
The program is not useful unless its input is redirected from something other than a console. Try running it with the device /dev/urandom, which produces random data, like this:
./listing3 < /dev/urandom
The random data should produce white noise for five seconds.
Next, try redirecting the input to /dev/null or /dev/zero and compare the results. Change some parameters, such as the sampling rate and data format, and see how it affects the results.
Listing 4. Simple Sound Recording
    /*
    This example reads from the default PCM device
    and writes to standard output for 5 seconds of data.
    */

    /* Use the newer ALSA API */
    #define ALSA_PCM_NEW_HW_PARAMS_API

    #include <alsa/asoundlib.h>

    int main() {
      long loops;
      int rc;
      int size;
      snd_pcm_t *handle;
      snd_pcm_hw_params_t *params;
      unsigned int val;
      int dir = 0;
      snd_pcm_uframes_t frames;
      char *buffer;

      /* Open PCM device for recording (capture). */
      rc = snd_pcm_open(&handle, "default",
                        SND_PCM_STREAM_CAPTURE, 0);
      if (rc < 0) {
        fprintf(stderr,
                "unable to open pcm device: %s\n",
                snd_strerror(rc));
        exit(1);
      }

      /* Allocate a hardware parameters object. */
      snd_pcm_hw_params_alloca(&params);

      /* Fill it in with default values. */
      snd_pcm_hw_params_any(handle, params);

      /* Set the desired hardware parameters. */

      /* Interleaved mode */
      snd_pcm_hw_params_set_access(handle, params,
                          SND_PCM_ACCESS_RW_INTERLEAVED);

      /* Signed 16-bit little-endian format */
      snd_pcm_hw_params_set_format(handle, params,
                                  SND_PCM_FORMAT_S16_LE);

      /* Two channels (stereo) */
      snd_pcm_hw_params_set_channels(handle, params, 2);

      /* 44100 samples/second sampling rate (CD quality) */
      val = 44100;
      snd_pcm_hw_params_set_rate_near(handle, params,
                                      &val, &dir);

      /* Set period size to 32 frames. */
      frames = 32;
      snd_pcm_hw_params_set_period_size_near(handle,
                                  params, &frames, &dir);

      /* Write the parameters to the driver */
      rc = snd_pcm_hw_params(handle, params);
      if (rc < 0) {
        fprintf(stderr,
                "unable to set hw parameters: %s\n",
                snd_strerror(rc));
        exit(1);
      }

      /* Use a buffer large enough to hold one period */
      snd_pcm_hw_params_get_period_size(params,
                                        &frames, &dir);
      size = frames * 4; /* 2 bytes/sample, 2 channels */
      buffer = (char *) malloc(size);

      /* We want to loop for 5 seconds */
      snd_pcm_hw_params_get_period_time(params,
                                        &val, &dir);
      loops = 5000000 / val;

      while (loops > 0) {
        loops--;
        rc = snd_pcm_readi(handle, buffer, frames);
        if (rc == -EPIPE) {
          /* EPIPE means overrun */
          fprintf(stderr, "overrun occurred\n");
          snd_pcm_prepare(handle);
        } else if (rc < 0) {
          fprintf(stderr,
                  "error from read: %s\n",
                  snd_strerror(rc));
        } else if (rc != (int)frames) {
          fprintf(stderr, "short read, read %d frames\n", rc);
        }
        rc = write(1, buffer, size);
        if (rc != size)
          fprintf(stderr,
                  "short write: wrote %d bytes\n", rc);
      }

      snd_pcm_drain(handle);
      snd_pcm_close(handle);
      free(buffer);

      return 0;
    }

Listing 4 is much like Listing 3, except that we perform PCM capture (recording). When we open the PCM stream, we specify the mode as SND_PCM_STREAM_CAPTURE.
In the main processing loop, we read the samples from the sound hardware using snd_pcm_readi and write them to standard output using write. We check for overrun and handle it in the same manner as we did underrun in Listing 3.
Running Listing 4 records approximately five seconds of data and sends it to standard output; you should redirect it to a file. If you have a microphone connected to your sound card, use a mixer program to set the recording source and level. Alternatively, you can run a CD player program and set the recording source to CD. Try running Listing 4 and redirecting the output to a file. You then can run Listing 3 to play back the data:
./listing4 > sound.raw
./listing3 < sound.raw
If your sound card supports full duplex sound, you should be able to pipe the programs together and hear the recorded sound coming out of the sound card by typing:
./listing4 | ./listing3
By changing the PCM parameters you can experiment with the effect of sampling rates and formats.
In the previous examples, the PCM streams were operating in blocking mode, that is, the calls would not return until the data had been transferred. In an interactive event-driven application, this situation could lock up the application for unacceptably long periods of time. ALSA allows opening a stream in nonblocking mode where the read and write functions return immediately. If the transfer cannot be carried out at that moment, the call returns an error code of -EAGAIN instead of waiting.
Many graphical applications use callbacks to handle events. ALSA supports opening a PCM stream in asynchronous mode. This allows registering a callback function to be called when a period of sample data has been transferred.
The snd_pcm_readi and snd_pcm_writei calls used here are similar to the Linux read and write system calls. The letter i indicates that the frames are interleaved; corresponding functions exist for non-interleaved mode. Many devices under Linux also support the mmap system call, which maps them into memory where they can be manipulated with pointers. Finally, ALSA supports opening a PCM channel in mmap mode, which allows efficient zero copy access to sound data.
I hope this article has motivated you to try some ALSA programming. As the 2.6 kernel becomes commonly used by Linux distributions, ALSA should become more widely used, and its advanced features should help Linux audio applications move forward.
My thanks to Jaroslav Kysela and Takashi Iwai for reviewing a draft of this article and providing me with useful comments.
Resources for this article: /article/7705.
Jeff Tranter has been using, writing about and contributing to Linux since 1992. He works for Xandros Corporation in Ottawa, Canada.
