sovaai/sova-dataset
SOVA Dataset is a free, public STT/ASR dataset.
Key facts:
- Russian, English and Chinese languages
- ~ 32 328 hours
- ~ 3,21 TB in .wav format
Name | Link | Lang | Hours | Size | Source | Equipment | Annotation | Speech type | Augmentation | Quality |
---|---|---|---|---|---|---|---|---|---|---|
EngAudiobooksOriginal | Download | EN | 7 130 | 743 GB | audiobook | professional | forced alignment | reading | none | 95% |
EngAudiobooksNoisy | Download | EN | 3 873 | 310 GB | audiobook | professional | forced alignment | reading | phone calls | 95% |
RuAudiobooksDevices | Download | RU | 298 | 30,24 GB | audiobook | unprofessional | manual | reading | none | 99% |
RuDevices | Download | RU | 101 | 10,42 GB | audio records | unprofessional | manual | live speech | none | 98% |
RuYoutube | Download | RU | 17 451 | 1 873 GB | audio records | unprofessional | ASR | live speech | none | 95% |
ZhYoutube | Download | CN | 3 475,1 | 321 GB | audio records | unprofessional | ASR | live speech | none | 97.83% |
TOTAL | - | - | 32 328,1 | 3 287,66 GB (3,21 TB) | - | - | - | - | - | - |
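The TOTAL row can be reproduced from the per-subset figures. The sketch below is only an illustration: the `SUBSETS` list and the `totals` helper are names introduced here, not part of the dataset tooling, and the TB figure assumes 1 TB = 1024 GB, which is the conversion that matches the table.

```python
# Per-subset figures copied from the table above: (name, hours, size in gigabytes).
SUBSETS = [
    ("EngAudiobooksOriginal", 7130.0, 743.0),
    ("EngAudiobooksNoisy", 3873.0, 310.0),
    ("RuAudiobooksDevices", 298.0, 30.24),
    ("RuDevices", 101.0, 10.42),
    ("RuYoutube", 17451.0, 1873.0),
    ("ZhYoutube", 3475.1, 321.0),
]


def totals(subsets):
    """Return (total_hours, total_gb, total_tb), assuming 1 TB = 1024 GB."""
    total_hours = sum(hours for _, hours, _ in subsets)
    total_gb = sum(size for _, _, size in subsets)
    return total_hours, total_gb, total_gb / 1024


hours, gb, tb = totals(SUBSETS)
print(f"{hours:.1f} h, {gb:.2f} GB, {tb:.2f} TB")
```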
- Bit rate mode: constant
- Bit rate: 256 kbps
- Channel(s): 1 channel
- Sample rate: 16.0 kHz
- Bit depth: 16 bit
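The audio parameters listed above are consistent with each other: 1 channel × 16 000 samples/s × 16 bit = 256 kbps. A minimal sketch for checking a file against them, using only Python's standard `wave` module, might look like this. The helper names (`describe_wav`, `matches_spec`, `make_silence`) are hypothetical and are not shipped with the dataset; the in-memory silent clip stands in for a real file so the example runs without downloading anything.

```python
import io
import wave

# Expected parameters, taken from the list above.
EXPECTED_CHANNELS = 1
EXPECTED_SAMPLE_RATE_HZ = 16_000
EXPECTED_BIT_DEPTH = 16


def describe_wav(source):
    """Read header fields from a WAV file path or a binary file object."""
    with wave.open(source, "rb") as wav:
        channels = wav.getnchannels()
        sample_rate = wav.getframerate()
        bit_depth = wav.getsampwidth() * 8  # sample width is reported in bytes
        frames = wav.getnframes()
    return {
        "channels": channels,
        "sample_rate_hz": sample_rate,
        "bit_depth": bit_depth,
        # Uncompressed PCM bit rate: channels * sample rate * bits per sample.
        "bit_rate_kbps": channels * sample_rate * bit_depth / 1000,
        "duration_s": frames / sample_rate,
    }


def matches_spec(info):
    """True if the header fields agree with the parameters listed above."""
    return (
        info["channels"] == EXPECTED_CHANNELS
        and info["sample_rate_hz"] == EXPECTED_SAMPLE_RATE_HZ
        and info["bit_depth"] == EXPECTED_BIT_DEPTH
    )


def make_silence(seconds=1.0):
    """Build an in-memory mono 16 kHz / 16-bit WAV clip of silence for the demo."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(EXPECTED_CHANNELS)
        wav.setsampwidth(EXPECTED_BIT_DEPTH // 8)
        wav.setframerate(EXPECTED_SAMPLE_RATE_HZ)
        wav.writeframes(b"\x00\x00" * int(EXPECTED_SAMPLE_RATE_HZ * seconds))
    buf.seek(0)
    return buf


info = describe_wav(make_silence())
print(info["bit_rate_kbps"], matches_spec(info))  # prints: 256.0 True
```

To check a downloaded file instead, pass its path to `describe_wav` in place of the in-memory clip.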
- 08/11/2022: Release v0.4.0
- 10/12/2021: Release v0.3.0
- 22/12/2020: Release v0.2.0
- 24/12/2019: Published dataset with 116 hours.
For all questions, please feel free to contact us at support@sova.ai
SOVA Dataset is licensed under the Creative Commons BY 4.0 license by Virtual Assistant, LLC.