Shahriar-0/Wake-Word-Detection


Using a simple convolutional neural network to classify voice recordings based on the presence of a wake word.

License

MPL-2.0 license


Wake-Word-Detection


Introduction

In this notebook we design a voice classifier that labels a recording according to whether it contains a specific word (or combination of words), known as the wake word.

Data Collection

For data, we gathered 100 voice recordings for each of the two classes: wake word and not wake word. You can use the functions in the notebook to record your own samples, or you can use other data available on the internet. In either case, run the preprocessing in the notebook so that all recordings have the same length and format.
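As a rough illustration of bringing recordings to the same length, the sketch below trims or zero-pads a 1-D signal to a fixed number of samples. The helper name `fix_length`, the 16 kHz sample rate, and the 2-second clip length are assumptions made for this example; the notebook's own preprocessing may differ.

```python
import numpy as np


def fix_length(signal: np.ndarray, target_len: int) -> np.ndarray:
    """Trim or zero-pad a 1-D audio signal to exactly target_len samples."""
    if len(signal) >= target_len:
        return signal[:target_len]
    # Pad with zeros (silence) at the end of the clip.
    return np.pad(signal, (0, target_len - len(signal)), mode="constant")


# Hypothetical values, not taken from the notebook.
SAMPLE_RATE = 16_000
CLIP_SECONDS = 2
target = SAMPLE_RATE * CLIP_SECONDS

short_clip = np.ones(10_000, dtype=np.float32)
long_clip = np.ones(50_000, dtype=np.float32)

fixed_short = fix_length(short_clip, target)
fixed_long = fix_length(long_clip, target)
print(fixed_short.shape, fixed_long.shape)
```

Both clips come out with `target` samples, so features extracted from them have matching shapes.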

Data Preprocessing

There are many possible steps for preprocessing voice data. In our experiment we used the following:

Feature extraction: we extracted the MFCC (Mel-frequency cepstral coefficient) features from each recording.
Normalization: we normalized the features by subtracting the mean and dividing by the standard deviation.
Data augmentation: we augmented the data by adding noise, time shifting, time stretching, and pitch shifting.
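The normalization step and two of the augmentations can be sketched with NumPy alone. This is a minimal sketch, not the notebook's code: the function names, the noise level, the maximum shift, and the choice to normalize over the whole feature array (rather than per coefficient) are all assumptions. MFCC extraction, time stretching, and pitch shifting usually rely on an audio library and are left out here.

```python
import numpy as np


def normalize(features: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Subtract the mean and divide by the standard deviation (global, assumed)."""
    return (features - features.mean()) / (features.std() + eps)


def add_noise(signal: np.ndarray, noise_level: float = 0.005, rng=None) -> np.ndarray:
    """Add Gaussian noise scaled by noise_level (hypothetical default)."""
    if rng is None:
        rng = np.random.default_rng()
    return signal + noise_level * rng.standard_normal(signal.shape)


def time_shift(signal: np.ndarray, max_shift: int, rng=None) -> np.ndarray:
    """Shift the signal by a random offset; samples wrap around the ends."""
    if rng is None:
        rng = np.random.default_rng()
    shift = int(rng.integers(-max_shift, max_shift + 1))
    return np.roll(signal, shift)


rng = np.random.default_rng(0)
signal = np.sin(np.linspace(0.0, 10.0 * np.pi, 1000))  # stand-in for a recording
features = rng.normal(loc=5.0, scale=3.0, size=40)      # stand-in for MFCC values

normalized = normalize(features)
noisy = add_noise(signal, rng=rng)
shifted = time_shift(signal, max_shift=100, rng=rng)
print(round(float(normalized.mean()), 6), round(float(normalized.std()), 6))
```

Augmentations are applied to the raw signal before feature extraction, while normalization is applied to the extracted features.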

Sample MFCC features extracted from the recordings:

[Figure: wake word sample]
[Figure: not wake word sample]

Model

We used a Convolutional Neural Network (CNN) for our experiment. Convolutional 1D, Batch Normalization, and MaxPooling 1D layers extract features, and a simple fully connected network then classifies the recording.

The model architecture is summarized below:

Model: "sequential_1"
_________________________________________________________________________
 Layer (type)                                 Output Shape       Param #
=========================================================================
 conv1d_2 (Conv1D)                            (None, 36, 32)     192
 batch_normalization_2 (BatchNormalization)   (None, 36, 32)     128
 max_pooling1d_2 (MaxPooling1D)               (None, 18, 32)     0
 dropout_3 (Dropout)                          (None, 18, 32)     0
 conv1d_3 (Conv1D)                            (None, 14, 64)     10304
 batch_normalization_3 (BatchNormalization)   (None, 14, 64)     256
 max_pooling1d_3 (MaxPooling1D)               (None, 7, 64)      0
 dropout_4 (Dropout)                          (None, 7, 64)      0
 flatten_1 (Flatten)                          (None, 448)        0
 dense_2 (Dense)                              (None, 128)        57472
 dropout_5 (Dropout)                          (None, 128)        0
 dense_3 (Dense)                              (None, 2)          258
=========================================================================
Total params: 68,610
Trainable params: 68,418
Non-trainable params: 192
_________________________________________________________________________
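The shapes and parameter counts in the summary can be checked by hand. The sketch below redoes that arithmetic in plain Python. The input shape (40, 1), kernel size 5, and pool size 2 are not stated in the README; they are inferred here because they are consistent with the listed output shapes and parameter counts, so treat them as assumptions.

```python
def conv1d(length, in_channels, filters, kernel):
    """Unpadded, stride-1 convolution: (output length, output channels, params)."""
    params = kernel * in_channels * filters + filters  # weights + biases
    return length - kernel + 1, filters, params


def batch_norm(channels):
    """(trainable, non-trainable) counts: gamma/beta and moving mean/variance."""
    return 2 * channels, 2 * channels


def dense(inputs, units):
    """Fully connected layer: weights + biases."""
    return inputs * units + units


# Assumed input: 40 features with 1 channel (inferred, not from the README).
length, channels = 40, 1
trainable = 0
non_trainable = 0

# Two blocks of Conv1D -> BatchNormalization -> MaxPooling1D(2) -> Dropout.
for filters in (32, 64):
    length, channels, params = conv1d(length, channels, filters=filters, kernel=5)
    trainable += params
    bn_trainable, bn_frozen = batch_norm(channels)
    trainable += bn_trainable
    non_trainable += bn_frozen
    length //= 2  # max pooling halves the length; dropout adds no parameters

flat = length * channels                       # Flatten
trainable += dense(flat, 128) + dense(128, 2)  # two Dense layers
total = trainable + non_trainable
print(flat, trainable, non_trainable, total)   # 448 68418 192 68610
```

The 192 non-trainable parameters are the moving mean and variance kept by the two batch normalization layers (2 × 32 + 2 × 64).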

Results

The results of our experiment on the test set are shown below:

Test Loss: 0.03955698758363724
Test Accuracy: 0.9739999771118164

As the confusion matrix shows, the model is very accurate, reaching about 97.4% accuracy on the test set. Models of this type can be used in voice-assistant applications, especially word-triggered ones such as Bixby or Alexa.
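For readers who want to build such a confusion matrix for their own predictions, here is a minimal NumPy sketch. The labels below are made up purely for illustration and are not the project's results; the mapping of class 1 to "wake word" is also an assumption.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, num_classes=2):
    """Rows are true classes, columns are predicted classes."""
    matrix = np.zeros((num_classes, num_classes), dtype=int)
    for true_label, predicted_label in zip(y_true, y_pred):
        matrix[true_label, predicted_label] += 1
    return matrix


# Made-up labels for illustration only (assumed: 1 = wake word, 0 = not wake word).
y_true = np.array([1, 1, 1, 0, 0, 0, 1, 0])
y_pred = np.array([1, 1, 0, 0, 0, 1, 1, 0])

cm = confusion_matrix(y_true, y_pred)
accuracy = np.trace(cm) / cm.sum()  # correct predictions lie on the diagonal
print(cm)
print(accuracy)
```

Off-diagonal entries separate the two error types: missed wake words (row 1, column 0) and false triggers (row 0, column 1), which matter differently in a voice-assistant setting.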


Languages

Jupyter Notebook 100.0%
