Voice Activity Detection in R using the "webrtc" toolkit

License

MPL-2.0, Unknown licenses found

Licenses found

MPL-2.0
LICENSE
Unknown
LICENSE.note
NotificationsYou must be signed in to change notification settings

bnosac/audio.vadwebrtc

Repository files navigation

This repository contains an R package which is an Rcpp wrapper around the webrtc Voice Activity Detection module.

Demo video: example-vad.mp4

The package was created with the main goal of removing non-speech audio segments before running an automatic transcription with audio.whisper, in order to avoid transcription hallucinations. It contains:

  • functions to detect the location of voice in audio, using the Gaussian Mixture Model implemented in webrtc
  • functions to extract the voiced / silent parts of an audio file into a new audio file
  • functionality to rewrite the timepoints of transcribed sentences after specific non-speech sections have been removed, so that the timepoints of a transcription made on the silence-stripped audio still align with the original audio signal (a base-R sketch of this idea follows the list)
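
To illustrate the idea behind that last point, here is a minimal base-R sketch. remap_time() is a hypothetical helper written for this README, not a function of the package: it maps a timepoint measured on the silence-stripped audio back to the original audio by adding back the durations of the removed unvoiced segments that occur before it.

    ## Hypothetical helper, not part of audio.vadwebrtc: illustrates the timepoint remapping idea.
    ## 'removed' holds the start/end (in seconds of the original audio, sorted by start)
    ## of the unvoiced segments that were cut out before transcription.
    remap_time <- function(t, removed) {
      for (i in seq_len(nrow(removed))) {
        if (removed$start[i] <= t) {
          t <- t + (removed$end[i] - removed$start[i])
        }
      }
      t
    }
    removed <- data.frame(start = c(0.00, 3.31), end = c(0.08, 3.71))
    remap_time(2.50, removed)  # 2.58: shifted by the 0.08s cut at the start
    remap_time(4.00, removed)  # 4.48: shifted by both removed segments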

Installation

  • The package is currently not on CRAN
  • For the development version of this package: remotes::install_github("bnosac/audio.vadwebrtc")

Look at the documentation of the functions: help(package = "audio.vadwebrtc")

Example

Get an audio file with 16-bit mono PCM samples (pcm_s16le codec) and a sampling rate of 8 kHz, 16 kHz or 32 kHz.
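
If the audio is not yet in that format, one way to convert it is with the av package (a sketch under the assumption that av is installed; the file names are placeholders):

    library(av)
    ## convert any audio/video file to 16-bit mono PCM wav sampled at 16 kHz
    av_audio_convert("recording.mp3", output = "recording_16khz.wav",
                     format = "wav", channels = 1, sample_rate = 16000)

The example below uses the 16 kHz mono wav file that ships with the package.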

    library(audio.vadwebrtc)
    file <- system.file(package = "audio.vadwebrtc", "extdata", "test_wav.wav")
    vad  <- VAD(file, mode = "normal")
    vad
    Voice Activity Detection
      - file: D:/Jan/R/win-library/4.1/audio.vadwebrtc/extdata/test_wav.wav
      - sample rate: 16000
      - VAD type: webrtc-gmm, VAD mode: normal, VAD by milliseconds: 10, VAD frame_length: 160
      - Percent of audio containing a voiced signal: 90.2%
        - Seconds voiced: 6.3
        - Seconds unvoiced: 0.7
    vad$vad_segments
     vad_segment start  end has_voice
               1  0.00 0.08     FALSE
               2  0.09 3.30      TRUE
               3  3.31 3.71     FALSE
               4  3.72 6.78      TRUE
               5  6.79 6.99     FALSE
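
The detected segments are returned as a plain data.frame, so they can be inspected with base R; a small sketch reproducing the summary numbers shown above:

    segments <- vad$vad_segments
    ## total duration of the voiced segments (roughly the 6.3 seconds reported above)
    sum(with(subset(segments, has_voice), end - start))
    ## keep only voiced segments of at least one second
    subset(segments, has_voice & (end - start) >= 1)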

Example of a simple plot of the audio signal and the detected voice segments:

    library(av)
    x <- read_audio_bin(file)
    plot(seq_along(x) / 16000, x, type = "l", xlab = "Seconds", ylab = "Signal")
    abline(v = vad$vad_segments$start, col = "red", lwd = 2)
    abline(v = vad$vad_segments$end, col = "blue", lwd = 2)
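
As an alternative to the vertical lines, the voiced regions can be shaded on the same plot; a sketch using base graphics only:

    ## shade the voiced segments on top of the plot created above
    voiced <- subset(vad$vad_segments, has_voice)
    usr <- par("usr")   # current plot limits: c(xmin, xmax, ymin, ymax)
    rect(voiced$start, usr[3], voiced$end, usr[4],
         col = adjustcolor("green", alpha.f = 0.2), border = NA)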

Or show it interactively with the R package wavesurfer:

    library(wavesurfer)
    library(shiny)
    file <- system.file(package = "audio.vadwebrtc", "extdata", "test_wav.wav")
    vad  <- VAD(file, mode = "lowbitrate")
    anno <- data.frame(audio_id = vad$file,
                       region_id = vad$vad_segments$vad_segment,
                       start = vad$vad_segments$start,
                       end = vad$vad_segments$end,
                       label = ifelse(vad$vad_segments$has_voice, "Voiced", "Silent"))
    anno <- subset(anno, label %in% "Silent")
    wavs_folder <- system.file(package = "audio.vadwebrtc", "extdata")
    shiny::addResourcePath("wav", wavs_folder)
    ui <- fluidPage(
      wavesurferOutput("my_ws", height = "128px"),
      tags$p("Press spacebar to toggle play/pause.")
    )
    server <- function(input, output, session) {
      output$my_ws <- renderWavesurfer({
        wavesurfer(audio = paste0("wav/", "test_wav.wav"), annotations = anno) %>%
          ws_set_wave_color('#5511aa') %>%
          ws_cursor()
      })
    }
    shinyApp(ui = ui, server = server)

Support in text mining

Need support in text mining? Contact BNOSAC: http://www.bnosac.be
