-
Can you tell me what the fastest solution for Whisper is? Are there any models that do not have the 30-second audio limitation?
-
Do you want to recognize audio longer or shorter than 30s?
-
Why not? In the 'Using Fine-tuned Whisper' section you say: "Official Whisper models only accept 30-second audio. To improve the throughput, you could fine-tune the Whisper model to remove the 30-second restriction. See examples. We prepared two Chinese fine-tuned Whisper TensorRT-LLM weights repos. They can be used directly from here."
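To see why padding hurts throughput, it helps to count how many encoder positions a clip occupies. The sketch below assumes the openai-whisper front end: 16 kHz audio, a 160-sample hop (100 mel frames per second), and two encoder convolutions of which only the second has stride 2. It ignores the decoder, batching, and kernel-level effects, so it only shows sequence length, not measured speed.

```python
SAMPLE_RATE = 16000            # Whisper's input sample rate (Hz)
HOP_LENGTH = 160               # STFT hop: 100 mel frames per second
N_SAMPLES = 30 * SAMPLE_RATE   # one padded 30 s window = 480000 samples

def mel_frames(n_samples: int) -> int:
    # The log-mel front end yields one frame per hop.
    return n_samples // HOP_LENGTH

def encoder_positions(n_frames: int) -> int:
    # conv1 (kernel 3, stride 1, padding 1) keeps the length;
    # conv2 (kernel 3, stride 2, padding 1) gives floor((L - 1) / 2) + 1.
    return (n_frames - 1) // 2 + 1

def positions_for(duration_s: float, pad_to_30s: bool) -> int:
    """Encoder sequence length for a clip, with or without 30 s padding."""
    n = N_SAMPLES if pad_to_30s else int(duration_s * SAMPLE_RATE)
    return encoder_positions(mel_frames(n))

for d in (5.0, 10.0, 30.0):
    print(f"{d:>4} s: padded={positions_for(d, True)} unpadded={positions_for(d, False)}")
```

With padding, every clip costs the full 1500 positions; without it, a 5 s clip needs only 250, which is where the throughput gain of removing the restriction comes from.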
-
Theoretically, if your RAM were infinite, it would be possible. Removing the 30s constraint means you can input audio shorter than 30s without padding; it does not mean you can input files of arbitrary length. By the way, is there any disadvantage to using a VAD model here?
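A VAD front end is the usual way to feed long recordings to a model with a 30s window: split the audio into speech segments that each fit in one window. The sketch below uses a simple RMS-energy threshold as a stand-in for a real VAD model (for example Silero VAD); the function names, the 30 ms frame size, and the 0.01 threshold are illustrative assumptions, not part of any library.

```python
import math
from typing import List, Tuple

SAMPLE_RATE = 16000      # Whisper expects 16 kHz mono audio
MAX_SEGMENT_S = 30.0     # length of Whisper's fixed input window

def frame_rms(samples: List[float], frame_len: int) -> List[float]:
    """RMS energy of consecutive, non-overlapping frames (the last may be shorter)."""
    out = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        out.append(math.sqrt(sum(s * s for s in frame) / len(frame)))
    return out

def vad_segments(samples: List[float], sr: int = SAMPLE_RATE, frame_ms: int = 30,
                 threshold: float = 0.01,
                 max_s: float = MAX_SEGMENT_S) -> List[Tuple[int, int]]:
    """Return (start, end) sample spans of detected speech, each at most max_s long."""
    frame_len = sr * frame_ms // 1000
    max_len = int(sr * max_s)
    segments: List[Tuple[int, int]] = []
    seg_start = None
    for i, energy in enumerate(frame_rms(samples, frame_len)):
        pos = i * frame_len
        if energy >= threshold:
            if seg_start is None:
                seg_start = pos
            elif pos + frame_len - seg_start > max_len:
                # Adding this frame would overflow the 30s window: cut here.
                segments.append((seg_start, pos))
                seg_start = pos
        elif seg_start is not None:
            # A speech -> silence transition closes the current segment.
            segments.append((seg_start, pos))
            seg_start = None
    if seg_start is not None:
        segments.append((seg_start, len(samples)))
    return segments
```

Each returned span can then be transcribed on its own. A model-based VAD mainly improves where the cuts land, since a pure energy threshold can split mid-word during long stretches of continuous speech.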
-
Okay, I understand that long files aren't possible. How do I input audio files shorter than 30 seconds without padding?
-
Padding is invisible to users if the input audio is shorter than 30s; it is an implementation detail.
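For illustration, this is roughly what that hidden step does. openai-whisper ships a `pad_or_trim` function for it; the version below is a pure-Python re-implementation of the idea on a plain list of samples, assuming 16 kHz input, and is not the library's code or the TensorRT-LLM pipeline's.

```python
from typing import List

SAMPLE_RATE = 16000
N_SAMPLES = 30 * SAMPLE_RATE  # 480000 samples = one 30s Whisper window

def pad_or_trim(samples: List[float], length: int = N_SAMPLES) -> List[float]:
    """Zero-pad short audio up to `length` samples, or cut long audio down to it."""
    if len(samples) >= length:
        return samples[:length]
    return samples + [0.0] * (length - len(samples))

clip = [0.1] * (5 * SAMPLE_RATE)   # a 5s clip
window = pad_or_trim(clip)         # 5s of audio followed by 25s of zeros
```

The caller only supplies `clip`; the zeros are appended internally, which is why a stock model still spends a full 30s window of compute on a short input.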
-
Has fine-tuning the model improved your results with this change? `x = (x + self.positional_embedding[:x.shape[1], :]).to(x.dtype)`
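The line above slices the positional embedding table to the actual sequence length instead of adding the full table, which as far as we know is what the openai-whisper reference encoder does (it expects exactly 1500 positions, i.e. 30s). The sketch below shows that idea with plain lists: the sinusoid table follows the sin-half/cos-half layout of Whisper's encoder, while `add_positional` and the small channel count are hypothetical names and sizes chosen for illustration.

```python
import math
from typing import List

Matrix = List[List[float]]

def sinusoids(length: int, channels: int, max_timescale: float = 10000.0) -> Matrix:
    """Sinusoidal position table: sin values in the first half, cos in the second."""
    if channels % 2 or channels < 4:
        raise ValueError("channels must be even and at least 4")
    half = channels // 2
    log_inc = math.log(max_timescale) / (half - 1)
    inv = [math.exp(-log_inc * i) for i in range(half)]
    table = []
    for pos in range(length):
        scaled = [pos * f for f in inv]
        table.append([math.sin(s) for s in scaled] + [math.cos(s) for s in scaled])
    return table

def add_positional(x: Matrix, pos_emb: Matrix) -> Matrix:
    """Add only the first len(x) rows of pos_emb, mirroring pos_emb[:x.shape[1], :]."""
    if len(x) > len(pos_emb):
        raise ValueError(f"{len(x)} positions exceed n_ctx={len(pos_emb)}")
    return [[a + b for a, b in zip(row, pos_emb[t])] for t, row in enumerate(x)]

pos_emb = sinusoids(1500, 8)                        # n_ctx = 1500 (30s of audio)
short = add_positional([[0.0] * 8] * 500, pos_emb)  # a 10s clip -> 500 positions
```

Slicing only makes shorter inputs shape-compatible; a model trained solely on padded 30s windows may still behave differently on unpadded input, which is why the quoted section recommends fine-tuning rather than just editing this line.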