A Whisper + ChatGPT MagicMirror Module.
Nikro/MMM-WhisperGPT
This is a module for the MagicMirror².
How it works 👉 https://nikro.me/articles/professional/crafting-our-ai-assistant/
The goal of the module is to create a custom interactive widget built on OpenAI tools and related libraries:
- Whisper - self-hosted model for voice-to-text transcription.
- LangChain - intended to be used with the ChatGPT API to process the requests.
- Picovoice Porcupine - used for the offline (self-hosted) wake word, with an emphasis on privacy.
- also... mimic3, for text-to-speech :)
The idea is the following:
- Wake word (Porcupine).
- ...record the query (show a sexy animation, will be done later)
- ...pass it to the self-hosted Whisper
- ...transcribe voice-to-text
- Show the question as transcribed text (in the module render)
- ...pass it through LangChain to ChatGPT
- ...pass the textual reply back to the module and render it on-screen
- ...use TTS (mimic3), self-hosted on the network, to return a WAV file to play.
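The steps above can be sketched as a single asynchronous "turn". This is only an illustration of the flow, not the module's actual code: `runAssistantTurn`, the `stages` object and all of its method names are hypothetical, and each stage is injected so the order of operations can be read without any hardware or network services.

```javascript
// Hypothetical pipeline sketch: every stage is passed in as an async function,
// so the control flow is visible without real audio devices or servers.
async function runAssistantTurn(stages, render) {
  await stages.waitForWakeWord();                   // wake word (Porcupine)
  const audio = await stages.recordQuery();         // record the spoken query
  const question = await stages.transcribe(audio);  // self-hosted Whisper: voice-to-text
  render({ question });                             // show the transcribed question
  const reply = await stages.askChatGpt(question);  // LangChain -> ChatGPT
  render({ question, reply });                      // show the textual reply
  const wav = await stages.synthesize(reply);       // mimic3 returns a WAV file
  await stages.play(wav);                           // play it back
  return { question, reply };
}
```

Passing stub functions for each stage (for example, a `transcribe` that returns a fixed string) is enough to exercise the whole flow on a machine without a microphone.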
To use this module, add the following configuration block to the modules array in the `config/config.js` file:
```js
var config = {
  modules: [
    {
      module: 'MMM-WhisperGPT',
      config: {
        // See below for configurable options
        picovoiceKey: 'xxx',
        picovoiceWord: 'JARVIS',
        picovoiceSilenceTime: 3,
        picovoiceSilenceThreshold: 600,
        audioDeviceIndex: 3,
        openAiKey: 'xxx',
        openAiSystemMsg: 'xxx',
        whisperUrl: '192.168.1.5:9000/asr',
        whisperMethod: 'openai-whisper',
        mimic3Url: '192.168.1.6:59125'
      }
    }
  ]
}
```
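As a rough illustration of how the required and optional settings fit together, the sketch below merges a user config with the defaults listed in the options table that follows and rejects a config that lacks a required key. `resolveConfig`, `DEFAULTS` and `REQUIRED` are hypothetical names used only for this example; the module itself may handle this differently.

```javascript
// Defaults copied from the options table; required options have no default.
const DEFAULTS = {
  picovoiceWord: 'JARVIS',
  picovoiceSilenceTime: 3,
  picovoiceSilenceThreshold: 1.1,
  audioDeviceIndex: 0,
  whisperMethod: 'faster-whisper',
  whisperLanguage: 'en',
  mimic3Voice: 'en_US/cmu-arctic_low%23gka',
  debug: false,
};

// Options marked "Required" in the table.
const REQUIRED = ['picovoiceKey', 'whisperUrl', 'openAiKey', 'mimic3Url'];

// Hypothetical helper (not part of the module): validates and fills in defaults.
function resolveConfig(userConfig) {
  const missing = REQUIRED.filter((key) => !userConfig[key]);
  if (missing.length > 0) {
    throw new Error(`Missing required options: ${missing.join(', ')}`);
  }
  // User-supplied values win over defaults.
  return { ...DEFAULTS, ...userConfig };
}
```

With the example block above, `resolveConfig` would keep `whisperMethod: 'openai-whisper'` from the user config and fill in `whisperLanguage: 'en'` from the defaults.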
Option | Required? | Description |
---|---|---|
picovoiceKey | Required | Picovoice access key. You have to register with Picovoice to obtain it; it is used for the trigger word. |
picovoiceWord | Optional | Picovoice trigger word, i.e. BUMBLEBEE, JARVIS, etc. Defaults to JARVIS. |
picovoiceSilenceTime | Optional | Silence period, in seconds. Defaults to 3. |
picovoiceSilenceThreshold | Optional | Silence threshold, usually computed as background noise multiplied by this number. Default value is 1.1 (i.e. 10% above background noise). |
audioDeviceIndex | Optional | Audio device index, e.g. 3. The available devices are printed out when debug mode is enabled. Defaults to 0. |
whisperUrl | Required | URL (or IP address) of the self-hosted Whisper instance, e.g. 192.168.1.5:9000/asr. |
whisperMethod | Optional | Whisper method: openai-whisper or faster-whisper. Defaults to: faster-whisper. |
whisperLanguage | Optional | Language used for transcription. Defaults to: en. |
openAiKey | Required | API Key of OpenAI. |
openAiSystemMsg | Optional | System message that tells the AI how it should behave. |
mimic3Url | Required | Mimic3 server URL, including protocol and port, without the /api/tts path. |
mimic3Voice | Optional | Mimic3 Voice - default: en_US/cmu-arctic_low%23gka |
debug | Optional | Enables debug output. Defaults to false. |
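To make the two silence options more concrete, here is a minimal sketch of one way they could work together: recording stops once the input has stayed below "background noise × threshold" for `picovoiceSilenceTime` seconds. This is an assumption about the intended behaviour, not the module's actual implementation; `findStopFrame`, `frameSeconds` and the idea of one loudness value per frame are all hypothetical.

```javascript
// Hypothetical sketch: decide at which audio frame recording should stop.
// `levels` holds one loudness value (for example RMS) per audio frame.
function findStopFrame(levels, backgroundNoise, options) {
  const { silenceTime, silenceThreshold, frameSeconds } = options;
  // Anything quieter than this limit is treated as silence.
  const limit = backgroundNoise * silenceThreshold;
  // Number of consecutive quiet frames that add up to the silence period.
  const framesNeeded = Math.ceil(silenceTime / frameSeconds);

  let quietFrames = 0;
  for (let i = 0; i < levels.length; i += 1) {
    quietFrames = levels[i] < limit ? quietFrames + 1 : 0; // any loud frame resets the count
    if (quietFrames >= framesNeeded) {
      return i; // index of the frame that completes the silence period
    }
  }
  return -1; // not enough continuous silence yet
}
```

A short pause in the middle of a sentence resets nothing permanently: only an unbroken run of quiet frames as long as the silence period ends the recording.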
Picovoice Porcupine is used for the "trigger" word. It's a small, self-hosted AI / Neural Network (NN). Picovoice offers a range of services, including a license for this offline AI. It only sends usage statistics, not the actual audio conversations.
Whisper is an open-source product from OpenAI. It's a speech recognition model that handles speech-to-text (transcription). In my case, I have it self-hosted on my local network.
I used this: https://github.com/ahmetoner/whisper-asr-webservice
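For illustration, a request to such a self-hosted instance might look like the sketch below. The helper names are hypothetical, and the query parameters (`task`, `language`, `output`), the `audio_file` form field and the `text` field in the response are assumptions based on whisper-asr-webservice's `/asr` endpoint, so check them against the version you deploy. It relies on the global `fetch`, `FormData` and `Blob` available in Node 18+.

```javascript
// Hypothetical helper: builds the transcription URL from the whisperUrl option.
function buildWhisperUrl(whisperUrl, language = 'en') {
  // The example config omits the protocol, so assume plain HTTP when it is missing.
  const withProtocol = /^https?:\/\//.test(whisperUrl) ? whisperUrl : `http://${whisperUrl}`;
  const url = new URL(withProtocol);
  url.searchParams.set('task', 'transcribe'); // assumed parameter name
  url.searchParams.set('language', language); // assumed parameter name
  url.searchParams.set('output', 'json');     // assumed parameter name
  return url.toString();
}

// Sends a recorded WAV buffer as multipart form data and returns the transcript.
async function transcribe(whisperUrl, wavBuffer, language = 'en') {
  const form = new FormData();
  // 'audio_file' is the assumed name of the upload field.
  form.append('audio_file', new Blob([wavBuffer], { type: 'audio/wav' }), 'query.wav');
  const response = await fetch(buildWhisperUrl(whisperUrl, language), {
    method: 'POST',
    body: form,
  });
  if (!response.ok) {
    throw new Error(`Whisper request failed: ${response.status}`);
  }
  const result = await response.json();
  return result.text; // assumed shape of the JSON output
}
```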
ChatGPT is another product from OpenAI. It's a Large Language Model (LLM) AI. You will need to register and get an API Key to use it.
LangChain is a library built around LLMs that allows for extra functionality, such as long-term memory.
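To show what "memory" means in practice, here is a deliberately simplified stand-in written in plain JavaScript. It is not LangChain's API and it only keeps a short rolling history rather than long-term memory; `ConversationMemory` and `maxTurns` are hypothetical names. The `system` / `user` / `assistant` roles follow the message format used by chat-style OpenAI requests.

```javascript
// Simplified stand-in for conversation memory (plain JavaScript, not LangChain).
class ConversationMemory {
  constructor(systemMsg, maxTurns = 5) {
    this.systemMsg = systemMsg; // e.g. the openAiSystemMsg option
    this.maxTurns = maxTurns;   // how many past question/reply pairs to keep
    this.turns = [];
  }

  // Store a finished exchange, dropping the oldest one when the limit is reached.
  remember(question, reply) {
    this.turns.push({ question, reply });
    if (this.turns.length > this.maxTurns) {
      this.turns.shift();
    }
  }

  // Build the messages array for the next request: system message first,
  // then the remembered exchanges, then the new question.
  buildMessages(question) {
    const history = this.turns.flatMap((turn) => [
      { role: 'user', content: turn.question },
      { role: 'assistant', content: turn.reply },
    ]);
    return [
      { role: 'system', content: this.systemMsg },
      ...history,
      { role: 'user', content: question },
    ];
  }
}
```

Sending the earlier exchanges along with each new question is what lets the model answer follow-ups such as "and tomorrow?" in context.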
Mycroft's Mimic3 is a neural Text-to-Speech (TTS) system. It offers realistic TTS that can run on somewhat resource-restricted systems. I initially tried to set it up on my OrangePi, but ended up installing it on the same machine as Whisper and using it over the network.
I used this docker-compose.yml 😉
```yaml
version: '3.7'
services:
  mimic3:
    image: mycroftai/mimic3
    ports:
      - 59125:59125
    volumes:
      - .:/home/mimic3/.local/share/mycroft/mimic3
    stdin_open: true
    tty: true
```
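Once the container is running, the module's `mimic3Url` and `mimic3Voice` options can be combined into a TTS request roughly as follows. This is a sketch under assumptions: the helper names are hypothetical, and the use of `POST /api/tts` with the text as the request body and a `voice` query parameter should be verified against your Mimic3 server. It uses the global `fetch` from Node 18+.

```javascript
// Hypothetical helper: builds the TTS endpoint from the module options.
// mimic3Url is given without /api/tts, so the path is appended here.
function buildMimic3Url(mimic3Url, voice = 'en_US/cmu-arctic_low%23gka') {
  const base = mimic3Url.replace(/\/+$/, ''); // tolerate a trailing slash
  // The default voice is already URL-encoded (%23 is '#'), so it is appended
  // as-is; encodeURIComponent would double-encode the percent sign.
  return `${base}/api/tts?voice=${voice}`;
}

// Posts the reply text and returns the synthesized WAV bytes.
async function synthesize(mimic3Url, text, voice) {
  const response = await fetch(buildMimic3Url(mimic3Url, voice), {
    method: 'POST',
    body: text,
  });
  if (!response.ok) {
    throw new Error(`Mimic3 request failed: ${response.status}`);
  }
  return Buffer.from(await response.arrayBuffer());
}
```

The returned buffer can then be written to a temporary `.wav` file and handed to whichever audio player is available on the mirror.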
- If your audio doesn't work, check whether you're using `alsa` or `pulseaudio`. You might need to install `mpg123`. You can install it using the command `sudo apt-get install mpg123`.
- You might also need to install `lame` for audio encoding. You can install it using the command `sudo apt-get install lame`.