A Whisper + ChatGPT MagicMirror Module.

Nikro/MMM-WhisperGPT

This is a module for MagicMirror².

How it works 👉 https://nikro.me/articles/professional/crafting-our-ai-assistant/

The goal of this module is to create a custom interactive widget that uses the following AI tools:

  • Whisper - self-hosted model for voice-to-text transcription.
  • LangChain - intended to be used with the ChatGPT API to process the requests.
  • Picovoice Porcupine - used for offline (self-hosted) wake-word detection, with an emphasis on privacy.
  • also... mimic3, for text-to-speech :)

The idea is the following:

  1. Wake word (Porcupine).
  2. ...record query (show a sexy animation, will be done later)
  3. ...pass to self-hosted Whisper
  4. ...transcribe voice-to-text
  5. Show the question as transcribed rendered-text (in the module render)
  6. ...pass through LangChain to ChatGPT
  7. ...pass the textual reply back to the module and render on-screen
  8. ...use TTS (mimic3) - self-hosted on the network, to throw back a wav file to play.
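
The steps above can be sketched as a single async "turn". This is only an illustration of the flow, not the module's actual code: `runAssistantTurn` and every function on the `deps` object (`waitForWakeWord`, `recordQuery`, `transcribe`, `render`, `askChatGpt`, `speak`) are hypothetical names, injected so that each stage can be swapped out or stubbed.

```javascript
// Hypothetical orchestration of one wake-word-to-speech turn.
// Every function on `deps` is an assumed placeholder, not part of MMM-WhisperGPT.
async function runAssistantTurn(deps) {
  await deps.waitForWakeWord();                  // 1. wake word (Porcupine)
  const audio = await deps.recordQuery();        // 2. record the query
  const question = await deps.transcribe(audio); // 3-4. Whisper voice-to-text
  deps.render({ question });                     // 5. show the transcribed question
  const reply = await deps.askChatGpt(question); // 6. LangChain -> ChatGPT
  deps.render({ question, reply });              // 7. show the reply on-screen
  await deps.speak(reply);                       // 8. TTS (mimic3) playback
  return { question, reply };
}
```

Passing the stages in as functions keeps the order of operations in one place and lets you try the flow with fake stages before any audio hardware or servers are involved.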

Using the module

To use this module, add the following configuration block to the modules array in the config/config.js file:

    var config = {
      modules: [
        {
          module: 'MMM-WhisperGPT',
          config: {
            // See below for configurable options
            picovoiceKey: 'xxx',
            picovoiceWord: 'JARVIS',
            picovoiceSilenceTime: 3,
            picovoiceSilenceThreshold: 600,
            audioDeviceIndex: 3,
            openAiKey: 'xxx',
            openAiSystemMsg: 'xxx',
            whisperUrl: '192.168.1.5:9000/asr',
            whisperMethod: 'openai-whisper',
            mimic3Url: '192.168.1.6:59125'
          }
        }
      ]
    }
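
To make the split between required and optional settings concrete, here is a minimal standalone sketch that merges a user config over the defaults documented under "Configuration options" and fails fast when a required option is missing. `DEFAULTS`, `REQUIRED` and `resolveConfig` are hypothetical names for illustration; the module itself may handle this differently.

```javascript
// Defaults as documented in this README's "Configuration options" table.
const DEFAULTS = {
  picovoiceWord: 'JARVIS',
  picovoiceSilenceTime: 3,
  picovoiceSilenceThreshold: 1.1,
  audioDeviceIndex: 0,
  whisperMethod: 'faster-whisper',
  whisperLanguage: 'en',
  mimic3Voice: 'en_US/cmu-arctic_low%23gka',
  debug: false,
};

// Options the README marks as "Required".
const REQUIRED = ['picovoiceKey', 'whisperUrl', 'openAiKey', 'mimic3Url'];

// Hypothetical helper: user values win over defaults; missing required
// options raise an error that names them.
function resolveConfig(userConfig) {
  const config = { ...DEFAULTS, ...userConfig };
  const missing = REQUIRED.filter((key) => !config[key]);
  if (missing.length > 0) {
    throw new Error(`MMM-WhisperGPT: missing required option(s): ${missing.join(', ')}`);
  }
  return config;
}
```

Calling `resolveConfig` with only the four required keys returns an object that also carries every documented default.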

Configuration options

| Option | Required? | Description |
|---|---|---|
| picovoiceKey | Required | Picovoice access key. You have to register to obtain it. It is used for the trigger word. |
| picovoiceWord | Optional | Picovoice trigger word, e.g. BUMBLEBEE, JARVIS, etc. Defaults to JARVIS. |
| picovoiceSilenceTime | Optional | Silence period in seconds. Defaults to 3. |
| picovoiceSilenceThreshold | Optional | Silence threshold, usually background noise * THIS NUMBER. Defaults to 1.1 (i.e. 10% above background noise). |
| audioDeviceIndex | Optional | Audio device index, e.g. 3. The available devices are printed out when you're using debug mode. Defaults to 0. |
| whisperUrl | Required | URL (or IP address) of the self-hosted Whisper instance. |
| whisperMethod | Optional | Whisper method: openai-whisper or faster-whisper. Defaults to faster-whisper. |
| whisperLanguage | Optional | Defaults to en. |
| openAiKey | Required | OpenAI API key. |
| openAiSystemMsg | Optional | System message: how the AI should behave. |
| mimic3Url | Required | Mimic3 URL (server), with protocol and port, without /api/tts. |
| mimic3Voice | Optional | Mimic3 voice. Defaults to en_US/cmu-arctic_low%23gka. |
| debug | Optional | Set to true if you want to debug. Defaults to false. |
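
One plausible reading of how `picovoiceSilenceTime` and `picovoiceSilenceThreshold` work together is sketched below. It assumes frames of 16-bit PCM samples, RMS as the loudness measure, and a background-noise level measured beforehand; `rms`, `createSilenceDetector` and the `frameSeconds` parameter are hypothetical, and the module's real detection logic may differ.

```javascript
// Root-mean-square loudness of one frame of 16-bit PCM samples.
function rms(frame) {
  if (frame.length === 0) return 0;
  let sumSquares = 0;
  for (const sample of frame) sumSquares += sample * sample;
  return Math.sqrt(sumSquares / frame.length);
}

// Hypothetical detector: returns true once the input has stayed below
// backgroundNoise * silenceThreshold for silenceTime seconds in a row.
function createSilenceDetector({ backgroundNoise, silenceThreshold = 1.1, silenceTime = 3, frameSeconds }) {
  const limit = backgroundNoise * silenceThreshold;
  let quietSeconds = 0;
  return function isDoneRecording(frame) {
    // Any loud frame resets the quiet streak.
    quietSeconds = rms(frame) < limit ? quietSeconds + frameSeconds : 0;
    return quietSeconds >= silenceTime;
  };
}
```

Under these assumptions, a threshold of 1.1 means a frame counts as silent only while it stays within 10% of the measured background noise, and recording stops after 3 consecutive seconds of such frames.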

What is Picovoice / Porcupine

Picovoice / Porcupine is used for the "Trigger" word. It's a self-hosted small AI / Neural Network (NN). Picovoice offers a range of services, including a license for this offline AI. It only sends usage statistics, not the actual audio conversations.

What is Whisper

Whisper is an open-source product from OpenAI. It's a neural speech-recognition (ASR) model that handles speech-to-text (transcription). In my personal case, I have it self-hosted on my local network.

I used this: https://github.com/ahmetoner/whisper-asr-webservice
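
For orientation, a request to that webservice could look roughly like the sketch below. Treat the details as assumptions to verify against the version you run: the `audio_file` form field, the `task`, `language` and `output` query parameters, and the `text` field of the JSON response reflect my understanding of the webservice's `/asr` endpoint. `buildWhisperRequestUrl` and `transcribe` are hypothetical helper names, and a bare host:port value (as in the config example) is assumed to mean plain HTTP.

```javascript
// Builds the request URL for the webservice's /asr endpoint.
// Assumption: a whisperUrl without a protocol is reached over plain HTTP.
function buildWhisperRequestUrl(whisperUrl, language = 'en') {
  const withProtocol = /^https?:\/\//.test(whisperUrl) ? whisperUrl : `http://${whisperUrl}`;
  const url = new URL(withProtocol);
  url.searchParams.set('task', 'transcribe');
  url.searchParams.set('language', language);
  url.searchParams.set('output', 'json');
  return url.toString();
}

// Sends recorded audio as multipart form data and returns the transcript.
// Needs a runtime with global fetch, FormData and Blob (e.g. Node.js 18+).
async function transcribe(wavBuffer, whisperUrl, language = 'en') {
  const form = new FormData();
  form.append('audio_file', new Blob([wavBuffer], { type: 'audio/wav' }), 'query.wav');
  const response = await fetch(buildWhisperRequestUrl(whisperUrl, language), {
    method: 'POST',
    body: form,
  });
  if (!response.ok) throw new Error(`Whisper request failed: HTTP ${response.status}`);
  const result = await response.json();
  return result.text; // assumed response field
}
```

Keeping the URL construction in its own function makes it easy to check what will be requested before any audio is sent over the network.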

What is ChatGPT

ChatGPT is another product from OpenAI, built on a Large Language Model (LLM). You will need to register and get an API key to use it.

What is LangChain

LangChain is a library built around LLMs that allows for extra functionality, such as long-term memory.
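
To give a feel for what "memory" means here, the sketch below is a hand-rolled sliding-window conversation buffer. It is deliberately not LangChain's API, and it is much simpler than long-term memory: it only shows the kind of bookkeeping such a library takes over. `createConversation` and `maxExchanges` are hypothetical names; the message objects follow the role/content shape used by OpenAI's chat API.

```javascript
// Hand-rolled stand-in for conversation memory. NOT LangChain's API.
function createConversation(systemMsg, maxExchanges = 5) {
  const history = []; // alternating user / assistant messages

  return {
    // Messages to send with the next request: system message first,
    // then remembered exchanges, then the new question.
    buildMessages(question) {
      return [
        { role: 'system', content: systemMsg },
        ...history,
        { role: 'user', content: question },
      ];
    },
    // Store a finished exchange and drop the oldest ones beyond the limit.
    remember(question, reply) {
      history.push(
        { role: 'user', content: question },
        { role: 'assistant', content: reply }
      );
      while (history.length > maxExchanges * 2) history.splice(0, 2);
    },
  };
}
```

Here `systemMsg` plays the role of the `openAiSystemMsg` option, and each follow-up question is sent together with the most recent exchanges so the model can refer back to them.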

What is Mimic3 (Mycroft)

Mycroft's Mimic3 is a neural Text-to-Speech (TTS) system. It offers realistic TTS that can run on somewhat resource-restricted systems. I initially tried to set it up on my OrangePi, but instead, I installed it on the same machine as Whisper and use it via the network.

I used this docker-compose.yml 😉

    version: '3.7'
    services:
      mimic3:
        image: mycroftai/mimic3
        ports:
          - 59125:59125
        volumes:
          - .:/home/mimic3/.local/share/mycroft/mimic3
        stdin_open: true
        tty: true
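
With the container running, fetching speech for a reply could look roughly like this. It is a sketch under assumptions: that the Mimic3 server accepts a POST to `/api/tts` with the text as the request body and a `voice` query parameter, and that a `mimic3Url` without a protocol means plain HTTP. `buildMimic3Url` and `synthesize` are hypothetical helper names.

```javascript
// Builds the Mimic3 TTS URL from the module options.
// mimic3Url comes without /api/tts; mimic3Voice may already be URL-encoded
// (the documented default contains %23, an encoded '#'), so it is decoded
// first to avoid double encoding.
function buildMimic3Url(mimic3Url, mimic3Voice = 'en_US/cmu-arctic_low%23gka') {
  const withProtocol = /^https?:\/\//.test(mimic3Url) ? mimic3Url : `http://${mimic3Url}`;
  const url = new URL('/api/tts', withProtocol);
  url.searchParams.set('voice', decodeURIComponent(mimic3Voice));
  return url.toString();
}

// Posts the reply text and resolves with the returned WAV bytes.
// Needs a runtime with global fetch (e.g. Node.js 18+).
async function synthesize(text, mimic3Url, mimic3Voice) {
  const response = await fetch(buildMimic3Url(mimic3Url, mimic3Voice), {
    method: 'POST',
    body: text,
  });
  if (!response.ok) throw new Error(`Mimic3 request failed: HTTP ${response.status}`);
  return Buffer.from(await response.arrayBuffer());
}
```

The resulting buffer can be written to a .wav file and handed to whatever audio player is available on the mirror.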

Troubleshooting

  1. If your audio doesn't work, check whether you're using alsa or pulseaudio. You might need to install mpg123. You can install it using the command sudo apt-get install mpg123.
  2. You might also need to install lame for audio encoding. You can install it using the command sudo apt-get install lame.
