OtosakuKWS is a lightweight, privacy-focused keyword spotting engine for iOS, designed to detect speech commands in real time — entirely on device.
It uses a CRNN CoreML model combined with log-Mel spectrograms for fast, accurate, and low-latency voice command recognition.
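At inference time, the recognition step reduces to turning the model's per-class outputs into a keyword and a confidence. A minimal post-processing sketch (the label order and the softmax step here are assumptions for illustration; the actual `classes.txt` ordering may differ):

```swift
import Foundation

// Hypothetical label list mirroring classes.txt (order is an assumption).
let labels = ["go", "no", "stop", "yes", "other"]

// Numerically stable softmax over raw model outputs.
func softmax(_ logits: [Double]) -> [Double] {
    let m = logits.max() ?? 0
    let exps = logits.map { exp($0 - m) }
    let sum = exps.reduce(0, +)
    return exps.map { $0 / sum }
}

// Pick the most likely class and its confidence from one inference step.
func topKeyword(logits: [Double]) -> (keyword: String, confidence: Double) {
    let probs = softmax(logits)
    let best = probs.enumerated().max(by: { $0.element < $1.element })!
    return (labels[best.offset], best.element)
}

let (kw, conf) = topKeyword(logits: [0.1, 0.2, 4.0, 0.3, 0.5])
print("Detected: \(kw) [\(conf)]")  // "stop" with high confidence
```

In practice a detection threshold on the confidence (and the dedicated `other` class) keeps background speech from triggering spurious keywords.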
Watch the model running live on iPhone 13:
This project depends on the OtosakuFeatureExtractor-iOS Swift package, which extracts log-Mel spectrograms in real time using Accelerate.
It also includes a ready-to-use filterbank archive (`filterbank.npy`, `hann_window.npy`).
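The shipped filterbank is presumably built from the standard mel scale. A small sketch of the underlying formulas (HTK-style mel conversion; the band count and sample rate below are illustrative assumptions, not the package's actual parameters):

```swift
import Foundation

// HTK-style mel scale conversions.
func hzToMel(_ hz: Double) -> Double { 2595.0 * log10(1.0 + hz / 700.0) }
func melToHz(_ mel: Double) -> Double { 700.0 * (pow(10.0, mel / 2595.0) - 1.0) }

// Evenly spaced (in mel) band edges from 0 Hz to Nyquist, e.g. for a
// 16 kHz signal; each triangular filter spans three consecutive edges.
func melBandEdges(numBands: Int, sampleRate: Double) -> [Double] {
    let low = hzToMel(0), high = hzToMel(sampleRate / 2)
    return (0...(numBands + 1)).map { i in
        melToHz(low + (high - low) * Double(i) / Double(numBands + 1))
    }
}

print(melBandEdges(numBands: 40, sampleRate: 16000).prefix(3))
```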
The CRNN model was trained on the keywords: “go”, “no”, “stop”, “yes”.
Includes:
- `CRNNKeywordSpotter.mlmodelc`
- `classes.txt`
| Metric | Value |
|---|---|
| val_accuracy | 0.971313 |
| val_f1_go | 0.964216 |
| val_f1_no | 0.974067 |
| val_f1_other | 0.949783 |
| val_f1_stop | 0.983282 |
| val_f1_yes | 0.98564 |
| val_loss | 0.0846668 |
| val_precision_go | 0.977573 |
| val_precision_no | 0.966123 |
| val_precision_other | 0.949195 |
| val_precision_stop | 0.985112 |
| val_precision_yes | 0.979248 |
| val_recall_go | 0.95122 |
| val_recall_no | 0.982143 |
| val_recall_other | 0.950372 |
| val_recall_stop | 0.981459 |
| val_recall_yes | 0.992116 |
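As a quick sanity check, each reported F1 score is the harmonic mean of the corresponding precision and recall, e.g. for “go”:

```swift
import Foundation

// F1 is the harmonic mean of precision and recall.
func f1(precision: Double, recall: Double) -> Double {
    2 * precision * recall / (precision + recall)
}

// Values taken from the table above.
print(f1(precision: 0.977573, recall: 0.951220))  // ≈ 0.9642, matching val_f1_go
```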
The model was trained on a balanced subset of the Google Speech Commands v2 dataset, using strong augmentations and class balancing.
```swift
let kws = try OtosakuKWS(
    modelRootURL: modelURL,
    featureExtractorRootURL: featurizerURL,
    configuration: .init()
)

kws.onKeywordDetected = { keyword, confidence in
    print("Detected: \(keyword) [\(confidence)]")
}

let audioInput = AudioStreamer()
// The `onBuffer` callback receives a chunk of audio sampled at 16 kHz, mono (1 channel).
// `AudioStreamer` here is a dummy real-time microphone streamer that simulates live input.
audioInput.onBuffer = { buffer in
    Task { await kws.handleAudioBuffer(buffer) }
}
```
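Under the hood, a streaming engine like this typically accumulates arbitrary-length 16 kHz chunks and emits fixed-size, overlapping analysis windows. A minimal sketch of that buffering pattern (the 25 ms window / 10 ms hop below are common illustrative values, not necessarily what OtosakuKWS uses):

```swift
import Foundation

// Accumulates arbitrary-length 16 kHz mono chunks and emits fixed-size
// overlapping windows suitable for spectrogram frames or model input.
final class WindowedBuffer {
    private var samples: [Float] = []
    private let windowSize: Int
    private let hopSize: Int
    var onWindow: (([Float]) -> Void)?

    init(windowSize: Int, hopSize: Int) {
        self.windowSize = windowSize
        self.hopSize = hopSize
    }

    func append(_ chunk: [Float]) {
        samples.append(contentsOf: chunk)
        // Emit every full window, then slide forward by the hop size.
        while samples.count >= windowSize {
            onWindow?(Array(samples.prefix(windowSize)))
            samples.removeFirst(hopSize)
        }
    }
}

// Usage: 25 ms windows with a 10 ms hop at 16 kHz (400 / 160 samples).
let buffer = WindowedBuffer(windowSize: 400, hopSize: 160)
var windows = 0
buffer.onWindow = { _ in windows += 1 }
buffer.append([Float](repeating: 0, count: 1600))  // 100 ms of audio
print(windows)  // prints 8
```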
If you need a custom KWS model for your use case — different keywords, languages, or domain-specific speech — feel free to reach out.
CoreML, keyword spotting, speech commands, offline voice recognition, privacy-first AI, log-Mel spectrogram, iOS speech processing, CRNN, on-device inference, streaming audio, Swift AI