
Faster Whisper transcription with CTranslate2


SYSTRAN/faster-whisper


faster-whisper is a reimplementation of OpenAI's Whisper model using CTranslate2, which is a fast inference engine for Transformer models.

This implementation is up to 4 times faster than openai/whisper for the same accuracy while using less memory. The efficiency can be further improved with 8-bit quantization on both CPU and GPU.

Benchmark

Whisper

For reference, here's the time and memory usage required to transcribe 13 minutes of audio using different implementations:

Large-v2 model on GPU

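| Implementation | Precision | Beam size | Time | VRAM Usage |
| --- | --- | --- | --- | --- |
| openai/whisper | fp16 | 5 | 2m23s | 4708MB |
| whisper.cpp (Flash Attention) | fp16 | 5 | 1m05s | 4127MB |
| transformers (SDPA) [1] | fp16 | 5 | 1m52s | 4960MB |
| faster-whisper | fp16 | 5 | 1m03s | 4525MB |
| faster-whisper (batch_size=8) | fp16 | 5 | 17s | 6090MB |
| faster-whisper | int8 | 5 | 59s | 2926MB |
| faster-whisper (batch_size=8) | int8 | 5 | 16s | 4500MB |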

distil-whisper-large-v3 model on GPU

| Implementation | Precision | Beam size | Time | YT Commons WER |
| --- | --- | --- | --- | --- |
| transformers (SDPA) (batch_size=16) | fp16 | 5 | 46m12s | 14.801 |
| faster-whisper (batch_size=16) | fp16 | 5 | 25m50s | 13.527 |

GPU benchmarks were executed with CUDA 12.4 on an NVIDIA RTX 3070 Ti 8GB.

Small model on CPU

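| Implementation | Precision | Beam size | Time | RAM Usage |
| --- | --- | --- | --- | --- |
| openai/whisper | fp32 | 5 | 6m58s | 2335MB |
| whisper.cpp | fp32 | 5 | 2m05s | 1049MB |
| whisper.cpp (OpenVINO) | fp32 | 5 | 1m45s | 1642MB |
| faster-whisper | fp32 | 5 | 2m37s | 2257MB |
| faster-whisper (batch_size=8) | fp32 | 5 | 1m06s | 4230MB |
| faster-whisper | int8 | 5 | 1m42s | 1477MB |
| faster-whisper (batch_size=8) | int8 | 5 | 51s | 3608MB |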

Executed with 8 threads on an Intel Core i7-12700K.
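
As a rough worked example, the relative speedups implied by the large-v2 GPU table can be computed directly from the listed times. The sketch below only does arithmetic on numbers copied from that table; the helper name to_seconds is made up for illustration and is not part of faster-whisper.

```python
import re


def to_seconds(duration: str) -> int:
    """Convert a duration such as '2m23s' or '17s' to whole seconds."""
    match = re.fullmatch(r"(?:(\d+)m)?(\d+)s", duration)
    if match is None:
        raise ValueError(f"unrecognized duration: {duration!r}")
    minutes, seconds = match.groups()
    return int(minutes or 0) * 60 + int(seconds)


# Times copied from the large-v2 GPU table above.
gpu_times = {
    "openai/whisper fp16": "2m23s",
    "faster-whisper fp16": "1m03s",
    "faster-whisper fp16 (batch_size=8)": "17s",
    "faster-whisper int8": "59s",
    "faster-whisper int8 (batch_size=8)": "16s",
}

# Speedup of each configuration relative to openai/whisper.
baseline = to_seconds(gpu_times["openai/whisper fp16"])
speedups = {name: baseline / to_seconds(t) for name, t in gpu_times.items()}

for name, speedup in speedups.items():
    print(f"{name}: {speedup:.2f}x")
```

These ratios apply only to the hardware and the 13-minute audio sample used for the table; other setups may behave differently.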

Requirements

  • Python 3.9 or greater

Unlike openai-whisper, FFmpeg does not need to be installed on the system. The audio is decoded with the Python library PyAV, which bundles the FFmpeg libraries in its package.

GPU

GPU execution requires the following NVIDIA libraries to be installed:

  • cuBLAS for CUDA 12
  • cuDNN 9 for CUDA 12

Note: The latest versions of ctranslate2 only support CUDA 12 and cuDNN 9. For CUDA 11 and cuDNN 8, the current workaround is to downgrade to version 3.24.0 of ctranslate2; for CUDA 12 and cuDNN 8, downgrade to version 4.4.0. This can be done with pip install --force-reinstall ctranslate2==4.4.0 or by specifying the version in a requirements.txt.
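
The version rules in the note above can be written down as a small lookup table. This is only an illustrative sketch: the names CTRANSLATE2_PINS and pip_command are hypothetical, and the table covers only the three combinations mentioned in the note.

```python
# (CUDA major, cuDNN major) -> ctranslate2 version pin, taken from the note above.
# None means the latest release is expected to work.
CTRANSLATE2_PINS = {
    (12, 9): None,
    (12, 8): "4.4.0",
    (11, 8): "3.24.0",
}


def pip_command(cuda_major: int, cudnn_major: int) -> str:
    """Return a pip command that installs a matching ctranslate2 version."""
    key = (cuda_major, cudnn_major)
    if key not in CTRANSLATE2_PINS:
        raise ValueError(
            f"no known ctranslate2 pin for CUDA {cuda_major} / cuDNN {cudnn_major}"
        )
    pin = CTRANSLATE2_PINS[key]
    if pin is None:
        return "pip install --upgrade ctranslate2"
    return f"pip install --force-reinstall ctranslate2=={pin}"


print(pip_command(11, 8))
```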

There are multiple ways to install the NVIDIA libraries mentioned above. The recommended way is described in the official NVIDIA documentation, but we also suggest other installation methods below.

Other installation methods

Note: For all the methods below, keep in mind the above note regarding CUDA versions. Depending on your setup, you may need to install the CUDA 11 versions of the libraries that correspond to the CUDA 12 libraries listed in the instructions below.

Use Docker

The libraries (cuBLAS, cuDNN) are installed in the official NVIDIA CUDA Docker image: nvidia/cuda:12.3.2-cudnn9-runtime-ubuntu22.04.

Install with pip (Linux only)

On Linux these libraries can be installed with pip. Note that LD_LIBRARY_PATH must be set before launching Python.

pip install nvidia-cublas-cu12 nvidia-cudnn-cu12==9.*

export LD_LIBRARY_PATH=`python3 -c 'import os; import nvidia.cublas.lib; import nvidia.cudnn.lib; print(os.path.dirname(nvidia.cublas.lib.__file__) + ":" + os.path.dirname(nvidia.cudnn.lib.__file__))'`

Download the libraries from Purfview's repository (Windows & Linux)

Purfview's whisper-standalone-win provides the required NVIDIA libraries for Windows & Linux in a single archive. Decompress the archive and place the libraries in a directory included in the PATH.

Installation

The module can be installed from PyPI:

pip install faster-whisper

Other installation methods

Install the master branch

pip install --force-reinstall "faster-whisper @ https://github.com/SYSTRAN/faster-whisper/archive/refs/heads/master.tar.gz"

Install a specific commit

pip install --force-reinstall "faster-whisper @ https://github.com/SYSTRAN/faster-whisper/archive/a4f1cc8f11433e454c3934442b5e1a4ed5e865c3.tar.gz"

Usage

Faster-whisper

from faster_whisper import WhisperModel

model_size = "large-v3"

# Run on GPU with FP16
model = WhisperModel(model_size, device="cuda", compute_type="float16")

# or run on GPU with INT8
# model = WhisperModel(model_size, device="cuda", compute_type="int8_float16")
# or run on CPU with INT8
# model = WhisperModel(model_size, device="cpu", compute_type="int8")

segments, info = model.transcribe("audio.mp3", beam_size=5)

print("Detected language '%s' with probability %f" % (info.language, info.language_probability))

for segment in segments:
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))

Warning: segments is a generator, so the transcription only starts when you iterate over it. The transcription can be run to completion by gathering the segments in a list or by iterating over them in a for loop:

segments, _ = model.transcribe("audio.mp3")
segments = list(segments)  # The transcription will actually run here.
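
The lazy behavior described above can be demonstrated without a model or an audio file. In the sketch below, fake_transcribe is a hypothetical stand-in that mimics only the shape of the return value (a generator of segments plus an info object); it is not the library's implementation.

```python
from types import SimpleNamespace


def fake_transcribe(log):
    """Hypothetical stand-in for model.transcribe: returns a lazy generator."""

    def generate():
        for i, text in enumerate(["Hello.", "World."]):
            # The "decoding" work is recorded only when the generator is consumed.
            log.append(f"decoded segment {i}")
            yield SimpleNamespace(start=float(i), end=float(i + 1), text=text)

    return generate(), SimpleNamespace(language="en")


log = []
segments, info = fake_transcribe(log)
work_before = len(log)        # nothing has been decoded yet

segments = list(segments)     # the work actually happens here
work_after = len(log)

second_pass = list(segments)  # a list, unlike a generator, can be iterated again

print(work_before, work_after, len(second_pass))
```

One practical consequence is that timing only the call to transcribe measures almost nothing; any timing should include the loop or list() call that consumes the segments.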

Batched Transcription

The following code snippet illustrates how to run batched transcription on an example audio file. BatchedInferencePipeline.transcribe is a drop-in replacement for WhisperModel.transcribe.

from faster_whisper import WhisperModel, BatchedInferencePipeline

model = WhisperModel("turbo", device="cuda", compute_type="float16")
batched_model = BatchedInferencePipeline(model=model)
segments, info = batched_model.transcribe("audio.mp3", batch_size=16)

for segment in segments:
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))

Faster Distil-Whisper

The Distil-Whisper checkpoints are compatible with the Faster-Whisper package. In particular, the latest distil-large-v3 checkpoint is intrinsically designed to work with the Faster-Whisper transcription algorithm. The following code snippet demonstrates how to run inference with distil-large-v3 on a specified audio file:

from faster_whisper import WhisperModel

model_size = "distil-large-v3"

model = WhisperModel(model_size, device="cuda", compute_type="float16")
segments, info = model.transcribe("audio.mp3", beam_size=5, language="en", condition_on_previous_text=False)

for segment in segments:
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))

For more information about the distil-large-v3 model, refer to the original model card.

Word-level timestamps

segments, _ = model.transcribe("audio.mp3", word_timestamps=True)

for segment in segments:
    for word in segment.words:
        print("[%.2fs -> %.2fs] %s" % (word.start, word.end, word.word))
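
One possible use of word-level timestamps is building short subtitle cues. The sketch below groups words into SRT-style cues; srt_timestamp and words_to_srt are hypothetical helpers, not part of faster-whisper. It relies only on the start, end and word attributes shown above, and it assumes each word string carries its own leading space, which may not hold for every language.

```python
from types import SimpleNamespace


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words, max_words: int = 4) -> str:
    """Group words into cues of at most max_words and render them as SRT text."""
    words = list(words)
    cues = []
    for index, begin in enumerate(range(0, len(words), max_words), start=1):
        chunk = words[begin:begin + max_words]
        # Assumption: each word string already includes its leading space.
        text = "".join(w.word for w in chunk).strip()
        start = srt_timestamp(chunk[0].start)
        end = srt_timestamp(chunk[-1].end)
        cues.append(f"{index}\n{start} --> {end}\n{text}\n")
    return "\n".join(cues)


# Stand-in objects with the same attributes as the words printed above.
demo_words = [
    SimpleNamespace(start=0.0, end=0.4, word=" Hello"),
    SimpleNamespace(start=0.4, end=0.9, word=" world"),
]
demo_srt = words_to_srt(demo_words, max_words=1)
print(demo_srt)
```

With a real transcription, the words from every segment would be concatenated into one list before calling words_to_srt.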

VAD filter

The library integrates the Silero VAD model to filter out parts of the audio without speech:

segments,_=model.transcribe("audio.mp3",vad_filter=True)

The default behavior is conservative and only removes silence longer than 2 seconds. See the available VAD parameters and default values in the source code. They can be customized with the dictionary argument vad_parameters:

segments, _ = model.transcribe(
    "audio.mp3",
    vad_filter=True,
    vad_parameters=dict(min_silence_duration_ms=500),
)

The VAD filter is enabled by default for batched transcription.
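
To build intuition for what a threshold like min_silence_duration_ms does, here is a toy sketch that merges speech spans separated by pauses shorter than the threshold, so only longer silences are left out. It is not the Silero VAD implementation and not what faster-whisper runs internally; merge_short_gaps and the example spans are invented for illustration.

```python
def merge_short_gaps(spans, min_silence_duration_ms):
    """Merge (start_ms, end_ms) speech spans whose gap is below the threshold."""
    merged = []
    for start, end in sorted(spans):
        if merged and start - merged[-1][1] < min_silence_duration_ms:
            # The pause is too short to count as silence: extend the previous span.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Hypothetical speech spans in milliseconds, with pauses of 300 ms and 1500 ms.
spans = [(0, 1000), (1300, 2500), (4000, 5000)]

aggressive = merge_short_gaps(spans, min_silence_duration_ms=500)
conservative = merge_short_gaps(spans, min_silence_duration_ms=2000)

print(aggressive)
print(conservative)
```

In this toy model, the 2000 ms threshold keeps both pauses, while the 500 ms threshold drops the 1500 ms pause.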

Logging

The library logging level can be configured like this:

import logging

logging.basicConfig()
logging.getLogger("faster_whisper").setLevel(logging.DEBUG)

Going further

See more model and transcription options in the WhisperModel class implementation.

Community integrations

Here is a non-exhaustive list of open-source projects using faster-whisper. Feel free to add your project to the list!

  • speaches is an OpenAI-compatible server using faster-whisper. It's easily deployable with Docker, works with OpenAI SDKs/CLI, and supports streaming and live transcription.
  • WhisperX is an award-winning Python library that offers speaker diarization and accurate word-level timestamps using wav2vec2 alignment.
  • whisper-ctranslate2 is a command line client based on faster-whisper and compatible with the original client from openai/whisper.
  • whisper-diarize is a speaker diarization tool that is based on faster-whisper and NVIDIA NeMo.
  • whisper-standalone-win provides standalone CLI executables of faster-whisper for Windows, Linux & macOS.
  • asr-sd-pipeline provides a scalable, modular, end-to-end multi-speaker speech-to-text solution implemented using AzureML pipelines.
  • Open-Lyrics is a Python library that transcribes voice files using faster-whisper, and translates/polishes the resulting text into .lrc files in the desired language using OpenAI-GPT.
  • wscribe is a flexible transcript generation tool supporting faster-whisper. It can export word-level transcripts, which can then be edited with wscribe-editor.
  • aTrain is a graphical user interface implementation of faster-whisper developed at the BANDAS-Center at the University of Graz for transcription and diarization in Windows (Windows Store App) and Linux.
  • Whisper-Streaming implements real-time mode for offline Whisper-like speech-to-text models with faster-whisper as the most recommended back-end. It implements a streaming policy with self-adaptive latency based on the actual source complexity, and demonstrates the state of the art.
  • WhisperLive is a nearly-live implementation of OpenAI's Whisper which uses faster-whisper as the backend to transcribe audio in real-time.
  • Faster-Whisper-Transcriber is a simple but reliable voice transcriber that provides a user-friendly interface.
  • Open-dubbing is an AI dubbing system which uses machine learning models to automatically translate and synchronize audio dialogue into different languages.

Model conversion

When loading a model from its size, such as WhisperModel("large-v3"), the corresponding CTranslate2 model is automatically downloaded from the Hugging Face Hub.

We also provide a script to convert any Whisper models compatible with the Transformers library. They could be the original OpenAI models or user fine-tuned models.

For example, the command below converts the original "large-v3" Whisper model and saves the weights in FP16:

pip install "transformers[torch]>=4.23"

ct2-transformers-converter --model openai/whisper-large-v3 --output_dir whisper-large-v3-ct2 \
    --copy_files tokenizer.json preprocessor_config.json --quantization float16

  • The option --model accepts a model name on the Hub or a path to a model directory.
  • If the option --copy_files tokenizer.json is not used, the tokenizer configuration is automatically downloaded when the model is loaded later.
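
When several models need converting, the same command line can be assembled from Python. The sketch below only builds the argument list from the flags shown above (--model, --output_dir, --copy_files, --quantization); build_converter_command is a hypothetical helper, and actually running the command is left to the caller.

```python
import shlex


def build_converter_command(model, output_dir, copy_files=(), quantization=None):
    """Build the ct2-transformers-converter argument list for one model."""
    command = [
        "ct2-transformers-converter",
        "--model", model,
        "--output_dir", output_dir,
    ]
    if copy_files:
        command += ["--copy_files", *copy_files]
    if quantization is not None:
        command += ["--quantization", quantization]
    return command


command = build_converter_command(
    "openai/whisper-large-v3",
    "whisper-large-v3-ct2",
    copy_files=("tokenizer.json", "preprocessor_config.json"),
    quantization="float16",
)

# Print a shell-quoted version for inspection.
print(shlex.join(command))
```

The resulting list can be passed to subprocess.run(command, check=True) once transformers and ctranslate2 are installed.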

Models can also be converted from the code. See the conversion API.

Load a converted model

  1. Directly load the model from a local directory:
     model = faster_whisper.WhisperModel("whisper-large-v3-ct2")
  2. Upload your model to the Hugging Face Hub and load it from its name:
     model = faster_whisper.WhisperModel("username/whisper-large-v3-ct2")

Comparing performance against other implementations

If you are comparing the performance against other Whisper implementations, you should make sure to run the comparison with similar settings. In particular:

  • Verify that the same transcription options are used, especially the same beam size. For example, in openai/whisper, model.transcribe uses a default beam size of 1, but here we use a default beam size of 5.
  • Transcription speed is closely affected by the number of words in the transcript, so ensure that other implementations have a similar WER (Word Error Rate) to this one.
  • When running on CPU, make sure to set the same number of threads. Many frameworks will read the environment variable OMP_NUM_THREADS, which can be set when running your script:
OMP_NUM_THREADS=4 python3 my_script.py
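
To check that two implementations reach a similar WER before comparing their speed, a basic word error rate can be computed with a word-level edit distance. This is a minimal sketch: word_error_rate is a hypothetical helper, it only lowercases and splits on whitespace, and published benchmarks typically apply more thorough text normalization, so its numbers may not match theirs.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # previous[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


reference = "the cat sat on the mat"
print(word_error_rate(reference, "the cat sat on the mat"))
print(word_error_rate(reference, "the cat sat on mat"))
```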

Footnotes

  1. transformers runs out of memory (OOM) for any batch size > 1.

