xiph/rnnoise

Recurrent neural network for audio noise reduction

License: BSD-3-Clause

RNNoise is a noise suppression library based on a recurrent neural network.
A description of the algorithm is provided in the following paper:

J.-M. Valin, A Hybrid DSP/Deep Learning Approach to Real-Time Full-Band Speech
Enhancement, Proceedings of IEEE Multimedia Signal Processing (MMSP) Workshop,
arXiv:1709.08243, 2018.
https://arxiv.org/pdf/1709.08243.pdf

An interactive demo of version 0.1 is available at:
https://jmvalin.ca/demo/rnnoise/

To compile, just type:
% ./autogen.sh
% ./configure
% make

Optionally:
% make install

It is recommended to either set -march= in the CFLAGS to an architecture
with AVX2 support or to add --enable-x86-rtcd to the configure script
so that AVX2 (or SSE4.1) can at least be used as an option.
Note that the autogen.sh script will automatically download the model files
from the Xiph.Org servers, since those are too large to put in Git.

While it is meant to be used as a library, a simple command-line tool is
provided as an example. It operates on RAW 16-bit (machine endian) mono
PCM files sampled at 48 kHz. It can be used as:

% ./examples/rnnoise_demo <noisy speech> <output denoised>

The output is also a 16-bit raw PCM file.
NOTE AGAIN, THE INPUT and OUTPUT ARE IN RAW FORMAT, NOT WAV.

The latest version of the source is available from
https://gitlab.xiph.org/xiph/rnnoise .  The GitHub repository
is a convenience copy.

== Training ==

The models distributed with RNNoise are now trained using only the publicly
available datasets listed below and using the training procedure described
here.
Exact results will still depend on the exact mix of data used,
on how long the training is performed and on the various random seeds involved.

To train an RNNoise model, you need both clean speech data and noise data.
Both need to be sampled at 48 kHz, in 16-bit PCM format (machine endian).
Clean speech data can be obtained from the datasets listed in the datasets.txt
file, or by downloading the already-concatenated version of those files at
https://media.xiph.org/rnnoise/data/tts_speech_48k.sw
For noise data, we suggest the background_noise.sw and foreground_noise.sw
(or later versions) noise files from
https://media.xiph.org/rnnoise/data/
The foreground_noise.sw file contains noise signals that are meant to be added
to the background noise (e.g. keyboard sounds). Optionally, the foreground noise
file can even be denoised with a traditional denoiser (e.g. libspeexdsp) to
keep only the transient components. For background noise, the data from the
original RNNoise noise collection have now been sufficiently filtered to
provide good results -- either alone or in combination with the
background_noise.sw file. The dataset can be downloaded (updated Jan 30th 2025)
from:
https://media.xiph.org/rnnoise/rnnoise_contributions.tar.gz

The first step is to take the speech and noise, and mix them in a variety of
ways to simulate real life conditions (including pauses, filtering and more).
Assuming the files are called speech.pcm, background_noise.pcm and
foreground_noise.pcm, start by generating the training feature data with:

% ./dump_features speech.pcm background_noise.pcm foreground_noise.pcm features.f32 <count>

where <count> is the number of sequences to process. The number of sequences
should be at least 10000, but the more the better (200000 or more is
recommended).

Optionally, training can also simulate reverberation, in which case room impulse
responses (RIR) are also needed.
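
The speech and noise inputs described above must be headerless 16-bit PCM
(machine endian) at 48 kHz, not WAV. If your recordings are WAV files, the
header has to be stripped first. The following is a minimal sketch of one way
to do that with the Python standard library; the function name
wav_to_raw_pcm and the file names are hypothetical, and the sketch assumes the
WAV file is already mono, 16-bit and sampled at 48 kHz (it does no resampling
or downmixing).

```python
import sys
import wave
from array import array


def wav_to_raw_pcm(wav_path, raw_path):
    """Write the samples of a mono 16-bit 48 kHz WAV file as headerless PCM.

    Returns the number of samples written. Hypothetical helper, not part of
    RNNoise.
    """
    with wave.open(wav_path, "rb") as wav:
        if wav.getnchannels() != 1:
            raise ValueError("expected mono audio")
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit samples")
        if wav.getframerate() != 48000:
            raise ValueError("expected a 48 kHz sample rate")
        frames = wav.readframes(wav.getnframes())

    samples = array("h")  # signed 16-bit integers
    samples.frombytes(frames)
    # WAV data is little endian; swap bytes on big-endian hosts so the
    # output is in machine byte order.
    if sys.byteorder == "big":
        samples.byteswap()

    with open(raw_path, "wb") as raw:
        samples.tofile(raw)
    return len(samples)
```

For example, wav_to_raw_pcm("speech.wav", "speech.pcm") would produce a file
in the raw format expected by ./dump_features and ./examples/rnnoise_demo.
Recordings at other sample rates need to be resampled to 48 kHz beforehand.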
Limited RIR data is available at:
https://media.xiph.org/rnnoise/data/measured_rirs-v2.tar.gz
The format for those is raw 32-bit floating-point (files are little endian).
Assuming a list of all the RIR files is contained in a rir_list.txt file,
the training feature data can be generated with:

% ./dump_features -rir_list rir_list.txt speech.pcm background_noise.pcm foreground_noise.pcm features.f32 <count>

To make the feature generation faster, you can use the script provided in
scripts/dump_features_parallel.sh (you will need to modify the script if you
want to add RIR augmentation).

To use it:
% scripts/dump_features_parallel.sh ./dump_features speech.pcm background_noise.pcm foreground_noise.pcm features.f32 <count> rir_list.txt

which will run nb_processes processes, each for count sequences, and
concatenate the output to a single file.

Once the feature file is computed, you can start the training with:
% python3 train_rnnoise.py features.f32 output_directory

Choose a number of epochs (using --epochs) that leads to about 75000 weight
updates. The training will produce .pth files, e.g. rnnoise_50.pth .
The next step is to convert the model to C files using:

% python3 dump_rnnoise_weights.py --quantize rnnoise_50.pth rnnoise_c

which will produce the rnnoise_data.c and rnnoise_data.h files in the
rnnoise_c directory.

Copy these files to src/ and then build RNNoise using the instructions above.

For slightly better results, a trained model can be used to remove any noise
from the "clean" training speech, before restarting the denoising process
again (no need to do that more than once).

== Loadable Models ==

The model format has changed since v0.1.1. Models now use a binary
"machine endian" format. To output a model in that format, build RNNoise
with that model and use the dump_weights_blob executable to output a
weights_blob.bin binary file. That file can then be used with the
rnnoise_model_from_file() API call.
Note that the model object MUST NOT
be deleted while the RNNoise state is active and the file MUST NOT
be closed.

To avoid including the default model in the build (e.g. to reduce download
size) and rely only on model loading, add -DUSE_WEIGHTS_FILE to the CFLAGS.
To be able to load different models, the model size (and header file) needs
to match the size used during build. Otherwise the model will not load.

As an alternative, we provide a "little" model that is about half as large.
To use the smaller model, rename rnnoise_data_little.c to rnnoise_data.c.
It is possible to build both the regular and little binary weights and load
any of them at run time since the little model has the same size as the
regular one (except for the increased sparsity).
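
Returning to the training guideline of about 75000 weight updates: the
matching --epochs value follows from the number of sequences in features.f32
and the batch size used for training. The following is a minimal sketch of
that arithmetic, not code from the repository. It assumes one weight update
per batch and that a final partial batch still counts as an update; the
function name and the batch size are illustrative, so check the actual value
in train_rnnoise.py.

```python
import math


def epochs_for_target_updates(num_sequences, batch_size, target_updates=75000):
    """Return the epoch count whose total number of updates is closest to the target.

    Hypothetical helper. Assumes one optimizer step per batch, with the last
    partial batch of each epoch included.
    """
    if num_sequences <= 0 or batch_size <= 0:
        raise ValueError("num_sequences and batch_size must be positive")
    updates_per_epoch = math.ceil(num_sequences / batch_size)
    # Train for at least one epoch even when a single epoch exceeds the target.
    return max(1, round(target_updates / updates_per_epoch))
```

For instance, with 200000 sequences and an assumed batch size of 128, each
epoch performs 1563 updates and the function returns 48 epochs.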