
sovaai/sova-dataset


SOVA Dataset is a free public STT/ASR dataset.

Key facts:

  • Russian, English and Chinese languages
  • ~ 32 328 hours
  • ~ 3,21 TB in .wav format

Dataset composition

| Name | Lang | Hours | Size | Source | Equipment | Annotation | Speech type | Augmentation | Quality |
|---|---|---|---|---|---|---|---|---|---|
| EngAudiobooksOriginal | EN | 7 130 | 743 GB | audiobook | professional | forced alignment | reading | none | 95% |
| EngAudiobooksNoisy | EN | 3 873 | 310 GB | audiobook | professional | forced alignment | reading | phone calls | 95% |
| RuAudiobooksDevices | RU | 298 | 30,24 GB | audiobook | unprofessional | manual | reading | none | 99% |
| RuDevices | RU | 101 | 10,42 GB | audio records | unprofessional | manual | live speech | none | 98% |
| RuYoutube | RU | 17 451 | 1 873 GB | audio records | unprofessional | asr | live speech | none | 95% |
| ZhYoutube | CN | 3 475,1 | 321 GB | audio records | unprofessional | asr | live speech | none | 97.83% |
| TOTAL | - | 32 328,1 | 3 287,66 GB (3,21 TB) | - | - | - | - | - | - |
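
As a quick sanity check, the TOTAL row can be reproduced from the per-subset figures. The sketch below is illustrative only: the hours and sizes are copied from the composition table, sizes are read as gigabytes, and the 1024 GB per TB conversion is an assumption chosen because it reproduces the stated 3,21 TB.

```python
# (name, hours, size in GB) copied from the composition table.
SUBSETS = [
    ("EngAudiobooksOriginal", 7130.0, 743.0),
    ("EngAudiobooksNoisy", 3873.0, 310.0),
    ("RuAudiobooksDevices", 298.0, 30.24),
    ("RuDevices", 101.0, 10.42),
    ("RuYoutube", 17451.0, 1873.0),
    ("ZhYoutube", 3475.1, 321.0),
]

total_hours = sum(hours for _, hours, _ in SUBSETS)
total_size_gb = sum(size for _, _, size in SUBSETS)

# Assumption: 1 TB = 1024 GB, which matches the 3,21 TB quoted in the table.
total_size_tb = total_size_gb / 1024

print(f"{total_hours:.1f} hours, {total_size_gb:.2f} GB, {total_size_tb:.2f} TB")
```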

Audio characteristics

  • Bit rate mode: constant
  • Bit rate: 256 kbps
  • Channel(s): 1 channel
  • Sample rate: 16.0 kHz
  • Bit depth: 16 bit
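
The listed values are mutually consistent: for uncompressed PCM, bit rate = sample rate × bit depth × channels, so 16 000 Hz × 16 bit × 1 channel = 256 kbps. A minimal sketch for checking a downloaded file against these values, using only Python's standard `wave` module, might look like this. The function names are hypothetical and not part of the dataset's tooling, and the check assumes the files are plain PCM .wav.

```python
import wave

# Expected values taken from the "Audio characteristics" list.
EXPECTED_CHANNELS = 1
EXPECTED_SAMPLE_RATE_HZ = 16_000
EXPECTED_BIT_DEPTH = 16


def pcm_bit_rate_kbps(sample_rate_hz: int, bit_depth: int, channels: int) -> float:
    """Bit rate of uncompressed PCM audio in kbps (1 kbps = 1000 bit/s)."""
    return sample_rate_hz * bit_depth * channels / 1000


def matches_expected_format(source) -> bool:
    """Return True if the .wav file has the expected parameters.

    `source` is a file path or a binary file-like object,
    as accepted by `wave.open`.
    """
    with wave.open(source, "rb") as wav:
        return (
            wav.getnchannels() == EXPECTED_CHANNELS
            and wav.getframerate() == EXPECTED_SAMPLE_RATE_HZ
            and wav.getsampwidth() * 8 == EXPECTED_BIT_DEPTH
        )
```

For example, `matches_expected_format("sample.wav")` (a placeholder file name) can be run on a few files after download to confirm that no resampling is needed before training.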

Updates

Contacts

For any questions, please feel free to contact us: support@sova.ai

License

SOVA Dataset is licensed under the Creative Commons BY 4.0 license by Virtual Assistant, LLC.

