# torchcrepe
Pytorch implementation of the CREPE [1] pitch tracker. The original TensorFlow implementation can be found [here](https://github.com/marl/crepe). The provided model weights were obtained by converting the "tiny" and "full" models using [MMdnn](https://github.com/microsoft/MMdnn), an open-source model management framework.
## Installation

Perform the system-dependent PyTorch install using the instructions found [here](https://pytorch.org/).

```
pip install torchcrepe
```
## Usage

### Computing pitch and periodicity from audio

```python
import torchcrepe

# Load audio
audio, sr = torchcrepe.load.audio( ... )

# Here we'll use a 5 millisecond hop length
hop_length = int(sr / 200.)

# Provide a sensible frequency range for your domain (upper limit is 2006 Hz)
# This would be a reasonable range for speech
fmin = 50
fmax = 550

# Select a model capacity--one of "tiny" or "full"
model = 'tiny'

# Choose a device to use for inference
device = 'cuda:0'

# Pick a batch size that doesn't cause memory errors on your gpu
batch_size = 2048

# Compute pitch using first gpu
pitch = torchcrepe.predict(audio,
                           sr,
                           hop_length,
                           fmin,
                           fmax,
                           model,
                           batch_size=batch_size,
                           device=device)
```
A periodicity metric similar to the Crepe confidence score can also be extracted by passing `return_periodicity=True` to `torchcrepe.predict`.
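For example, a minimal sketch reusing the variables defined above (this assumes that with `return_periodicity=True` the function returns a `(pitch, periodicity)` pair):

```python
# Also compute a periodicity (confidence) signal alongside pitch.
# Assumes predict returns a (pitch, periodicity) pair in this mode.
pitch, periodicity = torchcrepe.predict(audio,
                                        sr,
                                        hop_length,
                                        fmin,
                                        fmax,
                                        model,
                                        batch_size=batch_size,
                                        device=device,
                                        return_periodicity=True)
```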
### Decoding

By default, `torchcrepe` uses Viterbi decoding on the softmax of the network output. This is different from the original implementation, which uses a weighted average near the argmax of binary cross-entropy probabilities. The argmax operation can cause double/half frequency errors. These can be removed by penalizing large pitch jumps via Viterbi decoding. The `decode` submodule provides some options for decoding.
```python
# Decode using viterbi decoding (default)
torchcrepe.predict(..., decoder=torchcrepe.decode.viterbi)

# Decode using weighted argmax (as in the original implementation)
torchcrepe.predict(..., decoder=torchcrepe.decode.weighted_argmax)

# Decode using argmax
torchcrepe.predict(..., decoder=torchcrepe.decode.argmax)
```
### Filtering and thresholding

When periodicity is low, the pitch is less reliable. For some problems, it makes sense to mask these less reliable pitch values. However, the periodicity can be noisy and the pitch has quantization artifacts. `torchcrepe` provides submodules `filter` and `threshold` for this purpose. The filter and threshold parameters should be tuned to your data. For clean speech, a 10-20 millisecond window with a threshold of 0.21 has worked.
```python
# We'll use a 15 millisecond window assuming a hop length of 5 milliseconds
win_length = 3

# Median filter noisy confidence value
periodicity = torchcrepe.filter.median(periodicity, win_length)

# Remove inharmonic regions
pitch = torchcrepe.threshold.At(.21)(pitch, periodicity)

# Optionally smooth pitch to remove quantization artifacts
pitch = torchcrepe.filter.mean(pitch, win_length)
```
For more fine-grained control over pitch thresholding, see `torchcrepe.threshold.Hysteresis`. This is especially useful for removing spurious voiced regions caused by noise in the periodicity values, but has more parameters and may require more manual tuning to your data.
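As a sketch, assuming `Hysteresis` follows the same construct-then-call pattern as `threshold.At` above (its tunable parameters are left at their defaults here; consult the module for their names and meanings):

```python
# Hysteresis thresholding with default parameters.
# Assumes Hysteresis, like At, is constructed and then called on
# (pitch, periodicity).
pitch = torchcrepe.threshold.Hysteresis()(pitch, periodicity)
```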
CREPE was not trained on silent audio. Therefore, it sometimes assigns high confidence to pitch bins in silent regions. You can use `torchcrepe.threshold.Silence` to manually set the periodicity in silent regions to zero.
```python
periodicity = torchcrepe.threshold.Silence(-60.)(periodicity,
                                                 audio,
                                                 sr,
                                                 hop_length)
```
### Computing the CREPE model output activations

```python
batch = next(torchcrepe.preprocess(audio, sr, hop_length))
probabilities = torchcrepe.infer(batch)
```
### Computing the CREPE pitch embedding

As in Differentiable Digital Signal Processing [2], this uses the output of the fifth max-pooling layer as a pretrained pitch embedding.
```python
embeddings = torchcrepe.embed(audio, sr, hop_length)
```
### Computing pitch and periodicity from audio on disk

`torchcrepe` defines the following functions convenient for predicting directly from audio files on disk. Each of these functions also takes a `device` argument that can be used for device placement (e.g., `device='cuda:0'`).
```python
torchcrepe.predict_from_file(audio_file, ...)
torchcrepe.predict_from_file_to_file(
    audio_file, output_pitch_file, output_periodicity_file, ...)
torchcrepe.predict_from_files_to_files(
    audio_files, output_pitch_files, output_periodicity_files, ...)

torchcrepe.embed_from_file(audio_file, ...)
torchcrepe.embed_from_file_to_file(audio_file, output_file, ...)
torchcrepe.embed_from_files_to_files(audio_files, output_files, ...)
```
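For example, a sketch of processing one file from disk, assuming these functions forward the same keyword arguments as `torchcrepe.predict` (suggested by the `...` in the signatures above; the filenames are hypothetical):

```python
# Hypothetical input/output filenames; the keyword arguments reuse the
# values defined in the earlier example.
torchcrepe.predict_from_file_to_file('speech.wav',
                                     'pitch.pt',
                                     'periodicity.pt',
                                     hop_length=hop_length,
                                     fmin=fmin,
                                     fmax=fmax,
                                     model=model,
                                     batch_size=batch_size,
                                     device=device)
```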
### Command-line interface

```
usage: python -m torchcrepe
    [-h]
    --audio_files AUDIO_FILES [AUDIO_FILES ...]
    --output_files OUTPUT_FILES [OUTPUT_FILES ...]
    [--hop_length HOP_LENGTH]
    [--output_periodicity_files OUTPUT_PERIODICITY_FILES [OUTPUT_PERIODICITY_FILES ...]]
    [--embed]
    [--fmin FMIN]
    [--fmax FMAX]
    [--model MODEL]
    [--decoder DECODER]
    [--gpu GPU]
    [--no_pad]

optional arguments:
  -h, --help            show this help message and exit
  --audio_files AUDIO_FILES [AUDIO_FILES ...]
                        The audio file to process
  --output_files OUTPUT_FILES [OUTPUT_FILES ...]
                        The file to save pitch or embedding
  --hop_length HOP_LENGTH
                        The hop length of the analysis window
  --output_periodicity_files OUTPUT_PERIODICITY_FILES [OUTPUT_PERIODICITY_FILES ...]
                        The file to save periodicity
  --embed               Performs embedding instead of pitch prediction
  --fmin FMIN           The minimum frequency allowed
  --fmax FMAX           The maximum frequency allowed
  --model MODEL         The model capacity. One of "tiny" or "full"
  --decoder DECODER     The decoder to use. One of "argmax", "viterbi", or
                        "weighted_argmax"
  --gpu GPU             The gpu to perform inference on
  --no_pad              Whether to pad the audio
```
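For instance, an invocation using only the flags documented above (the filenames are illustrative, and the hop length is assumed to be in samples, e.g., 80 samples is 5 milliseconds at 16 kHz):

```
python -m torchcrepe \
    --audio_files speech.wav \
    --output_files pitch.pt \
    --output_periodicity_files periodicity.pt \
    --hop_length 80 \
    --fmin 50 \
    --fmax 550 \
    --model tiny \
    --decoder viterbi \
    --gpu 0
```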
## Tests

The module tests can be run as follows.

```
pip install pytest
pytest
```
## References

[1] J. W. Kim, J. Salamon, P. Li, and J. P. Bello, “Crepe: A Convolutional Representation for Pitch Estimation,” in 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[2] J. H. Engel, L. Hantrakul, C. Gu, and A. Roberts, “DDSP: Differentiable Digital Signal Processing,” in 2020 International Conference on Learning Representations (ICLR).