State-of-the-art audio codec with 90x compression factor. Supports 44.1kHz, 24kHz, and 16kHz mono/stereo audio.
descriptinc/descript-audio-codec
This repository contains training and inference scripts for the Descript Audio Codec (.dac), a high fidelity general neural audio codec, introduced in the paper titled High-Fidelity Audio Compression with Improved RVQGAN.
arXiv Paper: High-Fidelity Audio Compression with Improved RVQGAN
📈 Demo Site
⚙ Model Weights
👉 With Descript Audio Codec, you can compress 44.1 kHz audio into discrete codes at a low 8 kbps bitrate.
🤌 That's approximately 90x compression while maintaining exceptional fidelity and minimizing artifacts.
💪 Our universal model works on all domains (speech, environment, music, etc.), making it widely applicable to generative modeling of all audio.
👌 It can be used as a drop-in replacement for EnCodec for all audio language modeling applications (such as AudioLMs, MusicLMs, MusicGen, etc.)
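As a rough check on the 90x figure, the sketch below compares the 8 kbps code bitrate with uncompressed PCM. The 16-bit mono baseline is an assumption made here for illustration; the text above states only the 44.1 kHz sampling rate and the 8 kbps target.

```python
# Rough compression-ratio arithmetic. The 16-bit mono PCM baseline is an
# assumption for illustration; only 44.1 kHz and 8 kbps come from the text above.
SAMPLE_RATE_HZ = 44_100
BITS_PER_SAMPLE = 16       # assumed
CHANNELS = 1               # assumed
CODEC_BITRATE_BPS = 8_000

pcm_bitrate_bps = SAMPLE_RATE_HZ * BITS_PER_SAMPLE * CHANNELS
compression_ratio = pcm_bitrate_bps / CODEC_BITRATE_BPS

print(f"PCM bitrate: {pcm_bitrate_bps / 1000:.1f} kbps")
print(f"Compression ratio: {compression_ratio:.1f}x")
```

Under these assumptions the ratio comes out to about 88x, which is consistent with the approximate 90x figure.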
```
pip install descript-audio-codec
```

OR

```
pip install git+https://github.com/descriptinc/descript-audio-codec
```

Weights are released as part of this repo under the MIT license. We release weights for models that natively support 16 kHz, 24 kHz, and 44.1 kHz sampling rates. Weights are automatically downloaded when you first run the `encode` or `decode` command. You can cache them using one of the following commands:
```
python3 -m dac download # downloads the default 44kHz variant
python3 -m dac download --model_type 44khz # downloads the 44kHz variant
python3 -m dac download --model_type 24khz # downloads the 24kHz variant
python3 -m dac download --model_type 16khz # downloads the 16kHz variant
```
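To cache every variant in one go, the download commands above can be wrapped in a short script. This is a minimal sketch: `download_command` and `cache_all_weights` are hypothetical helper names, not part of the `dac` package, and they only assemble the same `--model_type` invocations listed above.

```python
import subprocess
import sys

# Model types taken from the download commands above.
MODEL_TYPES = ("44khz", "24khz", "16khz")


def download_command(model_type):
    """Build the `python -m dac download` command for one model type."""
    return [sys.executable, "-m", "dac", "download", "--model_type", model_type]


def cache_all_weights(run=subprocess.run):
    """Run the download command once per model type (hypothetical helper)."""
    for model_type in MODEL_TYPES:
        run(download_command(model_type), check=True)


# Uncomment to download the weights (requires the package to be installed):
# cache_all_weights()
```

Passing `run` as a parameter makes it possible to dry-run the helper, for example with a function that just prints each command.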
We provide a Dockerfile that installs all required dependencies for encoding and decoding. The build process caches the default model weights inside the image. This allows the image to be used without an internet connection. Please refer to the instructions below.
```
python3 -m dac encode /path/to/input --output /path/to/output/codes
```

This command will create `.dac` files with the same name as the input files. It will also preserve the directory structure relative to input root and re-create it in the output directory. Please use `python -m dac encode --help` for more options.

```
python3 -m dac decode /path/to/output/codes --output /path/to/reconstructed_input
```

This command will create `.wav` files with the same name as the input files. It will also preserve the directory structure relative to input root and re-create it in the output directory. Please use `python -m dac decode --help` for more options.
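The directory mirroring described above can be pictured with a small path computation. This sketch only illustrates the described behavior and is not the package's own implementation; `mirrored_output_path` is a hypothetical helper and the example paths are made up.

```python
from pathlib import Path


def mirrored_output_path(input_file, input_root, output_root, suffix):
    """Map an input file to its output location, keeping the subdirectory layout.

    Hypothetical helper for illustration; not part of the dac package.
    """
    relative = Path(input_file).relative_to(input_root)
    return Path(output_root) / relative.with_suffix(suffix)


# Encoding: a .wav under the input root maps to a .dac under the output root.
codes = mirrored_output_path("audio/speech/a.wav", "audio", "codes", ".dac")
print(codes.as_posix())  # codes/speech/a.dac
```

The same mapping with `suffix=".wav"` describes the decode direction.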
```python
import dac
from audiotools import AudioSignal

# Download a model
model_path = dac.utils.download(model_type="44khz")
model = dac.DAC.load(model_path)

model.to('cuda')

# Load audio signal file
signal = AudioSignal('input.wav')

# Encode audio signal as one long file
# (may run out of GPU memory on long files)
signal.to(model.device)

x = model.preprocess(signal.audio_data, signal.sample_rate)
z, codes, latents, _, _ = model.encode(x)

# Decode audio signal
y = model.decode(z)

# Alternatively, use the `compress` and `decompress` functions
# to compress long files.

signal = signal.cpu()
x = model.compress(signal)

# Save and load to and from disk
x.save("compressed.dac")
x = dac.DACFile.load("compressed.dac")

# Decompress it back to an AudioSignal
y = model.decompress(x)

# Write to file
y.write('output.wav')
```
We provide a Dockerfile to build a Docker image with all the necessary dependencies.

Building the image.

```
docker build -t dac .
```

Using the image.

Usage on CPU:

```
docker run dac <command>
```

Usage on GPU:

```
docker run --gpus=all dac <command>
```

`<command>` can be one of the compression and reconstruction commands listed above. For example, if you want to run compression,

```
docker run --gpus=all dac python3 -m dac encode ...
```
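The CPU and GPU invocations above differ only in the `--gpus=all` flag, so they can be assembled by one small helper. This is a sketch under stated assumptions: `docker_run_command` is a hypothetical name, the image tag `dac` matches the build command above, and the paths are the placeholders from the encode example. How host files are made visible inside the container is not covered here.

```python
import shlex


def docker_run_command(dac_command, use_gpu=False, image="dac"):
    """Wrap a dac CLI command in `docker run` (hypothetical helper)."""
    args = ["docker", "run"]
    if use_gpu:
        args.append("--gpus=all")
    args.append(image)
    args.extend(shlex.split(dac_command))
    return args


cmd = docker_run_command(
    "python3 -m dac encode /path/to/input --output /path/to/output/codes",
    use_gpu=True,
)
print(shlex.join(cmd))
```

The returned list can be passed directly to `subprocess.run` if you want to launch the container from Python.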
The baseline model configuration can be trained using the following commands.
Please install the correct dependencies
```
pip install -e ".[dev]"
```

We have provided a Dockerfile and docker compose setup that makes running experiments easy.

To build the docker image do:

```
docker compose build
```

Then, to launch a container, do:

```
docker compose run -p 8888:8888 -p 6006:6006 dev
```

The port arguments (`-p`) are optional, but useful if you want to launch Jupyter and Tensorboard instances within the container. The default password for Jupyter is `password`, and the current directory is mounted to `/u/home/src`, which also becomes the working directory.
Then, run your training command.
```
# Single GPU
export CUDA_VISIBLE_DEVICES=0
python scripts/train.py --args.load conf/ablations/baseline.yml --save_path runs/baseline/
```

```
# Multiple GPUs
export CUDA_VISIBLE_DEVICES=0,1
torchrun --nproc_per_node gpu scripts/train.py --args.load conf/ablations/baseline.yml --save_path runs/baseline/
```

We provide two test scripts to test CLI + training functionality. Please make sure that the training prerequisites are satisfied before launching these tests. To launch these tests please run

```
python -m pytest tests
```

