ccoreilly/wav2vec2-catala
Wav2Vec2 automatic speech recognition models for Catalan.

The models were fine-tuned from two base models, facebook/wav2vec2-large-xlsr-53 and facebook/wav2vec2-large-100k-voxpopuli. They are available on the Hugging Face Hub; the VoxPopuli-based model, for example, is published as ccoreilly/wav2vec2-large-100k-voxpopuli-catala.
Word error rate (WER) was evaluated on the following datasets, none of which were seen by the models during training:
| Dataset | XLSR-53 | VoxPopuli |
|---|---|---|
| Test split CV+ParlamentParla | 6.92% | 5.98% |
| Google Crowdsourced Corpus | 12.99% | 12.14% |
| Audiobook “La llegenda de Sant Jordi” | 13.23% | 12.02% |
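
The figures above are word error rates. As a minimal sketch of how such a score can be computed, the function below divides the word-level edit distance (substitutions, deletions, insertions) by the number of reference words. The function name and the example sentences are illustrative assumptions; the source does not say which tool or text normalization produced the reported numbers.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Illustrative sentences (not taken from the evaluation data):
# one word missing out of four reference words.
print(word_error_rate("bon dia a tothom", "bon dia tothom"))
```

For a whole dataset, the usual convention is to sum the errors and the reference words over all utterances before dividing, rather than averaging per-utterance rates.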
Because the CommonVoice data includes metadata on each speaker's age, gender, and dialect, the models can also be evaluated along these dimensions. Unfortunately, some categories do not contain enough data for the sample to be considered significant, so each error rate is reported together with its sample size.
| Age | Sample size | XLSR-53 | VoxPopuli |
|---|---|---|---|
| 10-19 | 64 | 7.96% | 8.54% |
| 20-29 | 330 | 7.52% | 6.10% |
| 30-39 | 377 | 5.65% | 4.55% |
| 40-49 | 611 | 6.37% | 6.17% |
| 50-59 | 438 | 5.75% | 5.30% |
| 60-69 | 166 | 4.82% | 4.20% |
| 70-79 | 37 | 5.81% | 5.33% |

| Accent | Sample size | XLSR-53 | VoxPopuli |
|---|---|---|---|
| Balearic | 64 | 5.84% | 5.11% |
| Central | 1202 | 5.98% | 5.37% |
| North-western | 140 | 6.60% | 5.77% |
| Northern | 75 | 5.11% | 5.58% |
| Valencian | 290 | 5.69% | 5.30% |

| Gender | Sample size | XLSR-53 | VoxPopuli |
|---|---|---|---|
| Female | 749 | 5.57% | 4.95% |
| Male | 1280 | 6.65% | 5.98% |
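
A breakdown like the ones above can be sketched by grouping utterances on a metadata field and aggregating errors per group, keeping the sample size next to each rate. The helper names, the `prediction` field, and the three records below are hypothetical and only illustrate the idea; this is not the script used to produce the tables.

```python
from collections import defaultdict


def word_errors(reference, hypothesis):
    """Return (word-level edit distance, number of reference words)."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j - 1] + (r != h), prev[j] + 1, curr[j - 1] + 1))
        prev = curr
    return prev[-1], len(ref)


def wer_by_group(records, key):
    """Aggregate WER and sample size for each value of a metadata field."""
    totals = defaultdict(lambda: [0, 0, 0])  # errors, reference words, samples
    for rec in records:
        errors, n_words = word_errors(rec["sentence"], rec["prediction"])
        group = totals[rec[key]]
        group[0] += errors
        group[1] += n_words
        group[2] += 1
    return {
        name: {"samples": n, "wer": errors / words if words else float("nan")}
        for name, (errors, words, n) in totals.items()
    }


# Hypothetical records: "prediction" is an assumed field holding model output.
records = [
    {"accent": "central", "sentence": "bon dia a tothom", "prediction": "bon dia tothom"},
    {"accent": "central", "sentence": "moltes gràcies", "prediction": "moltes gràcies"},
    {"accent": "balear", "sentence": "fins demà", "prediction": "fins dema"},
]

for accent, stats in wer_by_group(records, "accent").items():
    print(f"{accent}: n={stats['samples']}, WER={stats['wer']:.2%}")
```

Reporting the sample count alongside each rate makes it easier to spot groups, such as the smallest age brackets, where the estimate rests on very few utterances.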
```python
import torch
import torchaudio
from datasets import load_dataset
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

test_dataset = load_dataset("common_voice", "ca", split="test[:2%]")

processor = Wav2Vec2Processor.from_pretrained("ccoreilly/wav2vec2-large-100k-voxpopuli-catala")
model = Wav2Vec2ForCTC.from_pretrained("ccoreilly/wav2vec2-large-100k-voxpopuli-catala")

resampler = torchaudio.transforms.Resample(48_000, 16_000)

# Preprocessing the datasets.
# We need to read the audio files as arrays
def speech_file_to_array_fn(batch):
    speech_array, sampling_rate = torchaudio.load(batch["path"])
    batch["speech"] = resampler(speech_array).squeeze().numpy()
    return batch

test_dataset = test_dataset.map(speech_file_to_array_fn)
inputs = processor(test_dataset["speech"][:2], sampling_rate=16_000, return_tensors="pt", padding=True)

with torch.no_grad():
    logits = model(inputs.input_values, attention_mask=inputs.attention_mask).logits

predicted_ids = torch.argmax(logits, dim=-1)

print("Prediction:", processor.batch_decode(predicted_ids))
print("Reference:", test_dataset["sentence"][:2])
```
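
The example above takes the argmax over the logits and passes the resulting ids to `batch_decode`. Conceptually, that step is greedy CTC decoding: consecutive repeated ids are collapsed, blank tokens are dropped, and the remaining ids are mapped to characters. The sketch below illustrates this in plain Python; the toy vocabulary, the blank id `0`, and the `|` word delimiter are assumptions for illustration, and the real mapping comes from the model's tokenizer.

```python
from itertools import groupby


def ctc_greedy_decode(ids, id_to_token, blank_id=0, word_delimiter="|"):
    """Collapse repeated ids, drop blanks, and map the rest to characters."""
    # Step 1: merge runs of identical consecutive ids into a single id.
    collapsed = [token_id for token_id, _ in groupby(ids)]
    # Step 2: remove the blank token, which only separates repeated symbols.
    chars = [id_to_token[i] for i in collapsed if i != blank_id]
    # Step 3: turn the word delimiter into spaces.
    return "".join(chars).replace(word_delimiter, " ").strip()


# Hypothetical toy vocabulary; not the vocabulary of the published models.
vocab = {0: "<pad>", 1: "|", 2: "b", 3: "o", 4: "n", 5: "d", 6: "i", 7: "a"}

# One id per audio frame, as produced by an argmax over the logits.
frame_ids = [2, 2, 0, 3, 3, 4, 0, 1, 1, 5, 6, 0, 7, 7]
print(ctc_greedy_decode(frame_ids, vocab))
```

Because blanks are removed only after repeats are collapsed, a blank between two identical ids preserves a genuine double letter.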
About
Wav2Vec 2.0 Catalan training scripts and models