OtosakuKWS-iOS


OtosakuKWS is a lightweight, privacy-focused keyword spotting engine for iOS, designed to detect speech commands in real time — entirely on device.

It uses a CRNN CoreML model combined with log-Mel spectrograms for fast, accurate, and low-latency voice command recognition.


🎥 Demo

Watch the model running live on iPhone 13:

[Demo video: model running live on iPhone]


🚀 Getting Started

1. Install Feature Extractor

This project depends on the OtosakuFeatureExtractor-iOS Swift package, which extracts log-Mel spectrograms in real time using Accelerate.

It also includes a ready-to-use filterbank archive (filterbank.npy, hann_window.npy).
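If you add the package via Swift Package Manager, a manifest entry along these lines should work. This is a minimal sketch: the repository URL, version, product name, and the `MyKWSApp` target are assumptions inferred from the package name, not taken from the package's own documentation.

```swift
// swift-tools-version:5.9
import PackageDescription

let package = Package(
    name: "MyKWSApp",                    // hypothetical app target
    platforms: [.iOS(.v15)],
    dependencies: [
        // URL and version are assumptions inferred from the package name.
        .package(url: "https://github.com/Otosaku/OtosakuFeatureExtractor-iOS.git",
                 from: "1.0.0")
    ],
    targets: [
        .target(
            name: "MyKWSApp",
            dependencies: [
                // Product name is an assumption; check the package manifest.
                .product(name: "OtosakuFeatureExtractor",
                         package: "OtosakuFeatureExtractor-iOS")
            ]
        )
    ]
)
```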


2. Download Pretrained Model

The CRNN model was trained on the keywords: “go”, “no”, “stop”, “yes”.

⬇️ Download model archive

Includes:

  • CRNNKeywordSpotter.mlmodelc
  • classes.txt
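The integration example below expects directory URLs for the unzipped model and featurizer archives. One minimal sketch, assuming both archives are shipped as folder references inside the app bundle (the folder names here are hypothetical):

```swift
import Foundation

// Hypothetical folder names; point these at wherever you unzip the archives.
guard
    let modelURL = Bundle.main.url(forResource: "KWSModel", withExtension: nil),
    let featurizerURL = Bundle.main.url(forResource: "Featurizer", withExtension: nil)
else {
    fatalError("Model or featurizer folder missing from the app bundle")
}
```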

🧪 Validation Metrics

| Metric | Value |
| --- | --- |
| val_accuracy | 0.971313 |
| val_f1_go | 0.964216 |
| val_f1_no | 0.974067 |
| val_f1_other | 0.949783 |
| val_f1_stop | 0.983282 |
| val_f1_yes | 0.98564 |
| val_loss | 0.0846668 |
| val_precision_go | 0.977573 |
| val_precision_no | 0.966123 |
| val_precision_other | 0.949195 |
| val_precision_stop | 0.985112 |
| val_precision_yes | 0.979248 |
| val_recall_go | 0.95122 |
| val_recall_no | 0.982143 |
| val_recall_other | 0.950372 |
| val_recall_stop | 0.981459 |
| val_recall_yes | 0.992116 |

The model was trained on a balanced subset of the Google Speech Commands v2 dataset, using strong augmentations and class balancing.


🧩 Integration Example

```swift
let kws = try OtosakuKWS(
    modelRootURL: modelURL,
    featureExtractorRootURL: featurizerURL,
    configuration: .init()
)

kws.onKeywordDetected = { keyword, confidence in
    print("Detected: \(keyword) [\(confidence)]")
}

let audioInput = AudioStreamer()

// The `onBuffer` callback receives a chunk of audio sampled at 16 kHz, mono (1 channel).
// `AudioStreamer` here is a dummy real-time microphone streamer that simulates live input.
audioInput.onBuffer = { buffer in
    Task { await kws.handleAudioBuffer(buffer) }
}
```
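`AudioStreamer` above is a stand-in. A real microphone streamer could be built on AVAudioEngine along these lines. This is a sketch, assuming `handleAudioBuffer` accepts 16 kHz mono Float32 `AVAudioPCMBuffer`s; the class name and conversion details are illustrative, not part of the OtosakuKWS package.

```swift
import AVFoundation

// A sketch of a real microphone streamer, assuming the KWS engine accepts
// 16 kHz mono Float32 AVAudioPCMBuffers. Not part of the OtosakuKWS package.
final class MicrophoneStreamer {
    var onBuffer: ((AVAudioPCMBuffer) -> Void)?

    private let engine = AVAudioEngine()
    private let outputFormat = AVAudioFormat(commonFormat: .pcmFormatFloat32,
                                             sampleRate: 16_000,
                                             channels: 1,
                                             interleaved: false)!

    func start() throws {
        let input = engine.inputNode
        let hwFormat = input.outputFormat(forBus: 0)   // hardware rate, often 48 kHz
        guard let converter = AVAudioConverter(from: hwFormat, to: outputFormat) else {
            fatalError("Unsupported format conversion")
        }

        input.installTap(onBus: 0, bufferSize: 1024, format: hwFormat) { [weak self] buffer, _ in
            guard let self else { return }
            let ratio = self.outputFormat.sampleRate / hwFormat.sampleRate
            let capacity = AVAudioFrameCount((Double(buffer.frameLength) * ratio).rounded(.up))
            guard let out = AVAudioPCMBuffer(pcmFormat: self.outputFormat,
                                             frameCapacity: capacity) else { return }

            // Feed the tap buffer to the converter exactly once per callback.
            var served = false
            let status = converter.convert(to: out, error: nil) { _, inputStatus in
                if served {
                    inputStatus.pointee = .noDataNow
                    return nil
                }
                served = true
                inputStatus.pointee = .haveData
                return buffer
            }
            if status != .error, out.frameLength > 0 {
                self.onBuffer?(out)
            }
        }

        try engine.start()
    }

    func stop() {
        engine.inputNode.removeTap(onBus: 0)
        engine.stop()
    }
}
```

Remember to add an `NSMicrophoneUsageDescription` entry to Info.plist and configure the `AVAudioSession` for recording before starting the engine, or the tap will deliver silence.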

📬 Need custom commands?

If you need a custom KWS model for your use case — different keywords, languages, or domain-specific speech — feel free to reach out:

📧 otosaku.dsp@gmail.com


🗝️ Keywords

CoreML, keyword spotting, speech commands, offline voice recognition, privacy-first AI, log-Mel spectrogram, iOS speech processing, CRNN, on-device inference, streaming audio, Swift AI

