A Whisper + ChatGPT MagicMirror Module.
Nikro/MMM-WhisperGPT
This is a module for the MagicMirror².
How it works 👉 https://nikro.me/articles/professional/crafting-our-ai-assistant/
The goal of this module is to create a custom interactive widget built on the following tools:
- Whisper - self-hosted model for voice-to-text transcription.
- LangChain - intended to be used with the ChatGPT API, to process the requests.
- Picovoice Porcupine - used for offline (self-hosted) wake-word detection, with an emphasis on privacy.
- also... Mimic3 :)
The idea is the following:
- Wake word (Porcupine).
- ...record query (show a sexy animation, will be done later)
- ...pass to self-hosted Whisper
- ...transcribe voice-to-text
- Show the question as transcribed rendered-text (in the module render)
- ...pass through LangChain to ChatGPT
- ...pass the textual reply back to the module and render on-screen
- ...use TTS (mimic3) - self-hosted on the network, to throw back a wav file to play.
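The flow above can be sketched as a single asynchronous function. This is only an illustration of the order of the steps, not the module's actual code: `runAssistantTurn`, the `steps` object and all of its function names are hypothetical, and each step is injected so that no real Porcupine, Whisper, OpenAI or Mimic3 call is made here.

```javascript
// Hypothetical orchestration of one "wake word -> answer" turn.
// Every step is an injected async function (an assumption made for this sketch).
async function runAssistantTurn(steps, render) {
  await steps.waitForWakeWord();                  // Porcupine hears the trigger word
  const audio = await steps.recordQuery();        // record until silence
  const question = await steps.transcribe(audio); // self-hosted Whisper
  render({ question });                           // show the transcribed question
  const reply = await steps.askLlm(question);     // LangChain -> ChatGPT
  render({ question, reply });                    // show the textual reply
  const wav = await steps.synthesize(reply);      // Mimic3 returns a wav file
  await steps.play(wav);                          // play it back
  return { question, reply };
}
```

Keeping the steps injectable makes it possible to try the flow with stub functions before wiring in the real services.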
To use this module, add the following configuration block to the modules array in the `config/config.js` file:

```js
var config = {
  modules: [
    {
      module: 'MMM-WhisperGPT',
      config: {
        // See below for configurable options
        picovoiceKey: 'xxx',
        picovoiceWord: 'JARVIS',
        picovoiceSilenceTime: 3,
        picovoiceSilenceThreshold: 600,
        audioDeviceIndex: 3,
        openAiKey: 'xxx',
        openAiSystemMsg: 'xxx',
        whisperUrl: '192.168.1.5:9000/asr',
        whisperMethod: 'openai-whisper',
        mimic3Url: '192.168.1.6:59125'
      }
    }
  ]
}
```
| Option | Required? | Description |
|---|---|---|
| `picovoiceKey` | Required | Picovoice access key, used for the trigger word. You have to register to obtain it. |
| `picovoiceWord` | Optional | Picovoice trigger word, e.g. BUMBLEBEE, JARVIS, etc. Defaults to JARVIS. |
| `picovoiceSilenceTime` | Optional | Silence period in seconds. Defaults to 3. |
| `picovoiceSilenceThreshold` | Optional | Silence threshold multiplier: usually background noise * this number. Defaults to 1.1 (i.e. 10% above background noise). |
| `audioDeviceIndex` | Optional | Audio device index, e.g. 3. Available devices are printed when debug mode is on. Defaults to 0. |
| `whisperUrl` | Required | URL (or IP address) of the self-hosted Whisper instance. |
| `whisperMethod` | Optional | Whisper method: `openai-whisper` or `faster-whisper`. Defaults to `faster-whisper`. |
| `whisperLanguage` | Optional | Transcription language. Defaults to `en`. |
| `openAiKey` | Required | OpenAI API key. |
| `openAiSystemMsg` | Optional | System message: how the AI should behave. |
| `mimic3Url` | Required | Mimic3 server URL, with protocol and port, without `/api/tts`. |
| `mimic3Voice` | Optional | Mimic3 voice. Defaults to `en_US/cmu-arctic_low%23gka`. |
| `debug` | Optional | Enables debug output. Defaults to `false`. |
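As a rough illustration of how the required options and defaults from the table fit together, here is a minimal sketch that validates a user config and fills in the documented defaults. `resolveConfig`, `DEFAULTS` and `REQUIRED` are hypothetical names for this example; the module itself may handle this differently.

```javascript
// Defaults as documented in the options table (not read from the module's source).
const DEFAULTS = {
  picovoiceWord: 'JARVIS',
  picovoiceSilenceTime: 3,
  picovoiceSilenceThreshold: 1.1,
  audioDeviceIndex: 0,
  whisperMethod: 'faster-whisper',
  whisperLanguage: 'en',
  mimic3Voice: 'en_US/cmu-arctic_low%23gka',
  debug: false
};

// Options marked "Required" in the table.
const REQUIRED = ['picovoiceKey', 'whisperUrl', 'openAiKey', 'mimic3Url'];

// Hypothetical helper: reject configs missing required options,
// then let user values override the defaults.
function resolveConfig(userConfig) {
  const missing = REQUIRED.filter((key) => !userConfig[key]);
  if (missing.length > 0) {
    throw new Error(`Missing required options: ${missing.join(', ')}`);
  }
  return { ...DEFAULTS, ...userConfig };
}
```

For example, passing only the four required options plus `picovoiceWord: 'BUMBLEBEE'` would return a config with that trigger word and every other optional value set to its default.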
Picovoice / Porcupine is used for the "Trigger" word. It's a self-hosted small AI / Neural Network (NN). Picovoice offers a range of services, including a license for this offline AI. It only sends usage statistics, not the actual audio conversations.
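After the trigger word, recording has to stop at some point, which is what `picovoiceSilenceTime` and `picovoiceSilenceThreshold` control. The sketch below shows one way such a check could work, assuming the threshold is a multiplier on a measured background noise level (as the options table describes) and that audio arrives as frames of PCM samples. `rms`, `createSilenceDetector` and `backgroundLevel` are hypothetical names; this is not the module's actual implementation.

```javascript
// Root-mean-square level of one frame of PCM samples.
function rms(frame) {
  let sum = 0;
  for (const sample of frame) sum += sample * sample;
  return Math.sqrt(sum / frame.length);
}

// Hypothetical detector: returns a function that is fed one frame at a time
// and answers "has it been quiet for at least silenceTime seconds?".
function createSilenceDetector({ backgroundLevel, threshold = 1.1, silenceTime = 3 }) {
  let silentSeconds = 0;
  return function pushFrame(frame, frameSeconds) {
    if (rms(frame) <= backgroundLevel * threshold) {
      silentSeconds += frameSeconds; // still quiet: keep counting
    } else {
      silentSeconds = 0;             // speech detected: start over
    }
    return silentSeconds >= silenceTime;
  };
}
```

With the defaults, a frame counts as silent when its level is at most 10% above the background noise, and recording would stop after 3 consecutive seconds of such frames.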
Whisper is an open-source product from OpenAI. It's a speech recognition model that handles speech-to-text (transcription). In my case, I have it self-hosted on my local network.
I used this: https://github.com/ahmetoner/whisper-asr-webservice
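To give an idea of what a request to that service might look like, here is a small sketch that builds the transcription URL from `whisperUrl` and `whisperLanguage`. The query parameters (`task`, `language`, `output`) are my assumption about the webservice's `/asr` endpoint, and `buildWhisperRequestUrl` is a hypothetical helper, so check the webservice's own docs before relying on it.

```javascript
// Hypothetical helper: turn the configured whisperUrl into a full request URL.
// Assumes the /asr endpoint accepts task, language and output query parameters.
function buildWhisperRequestUrl(whisperUrl, language = 'en') {
  // The config example omits the protocol, so default to http:// (assumption).
  const withProtocol = /^https?:\/\//.test(whisperUrl) ? whisperUrl : `http://${whisperUrl}`;
  const url = new URL(withProtocol);
  url.searchParams.set('task', 'transcribe');
  url.searchParams.set('language', language);
  url.searchParams.set('output', 'json');
  return url.toString();
}
```

The recorded audio would then be sent to that URL as a multipart form upload; the exact form field name depends on the webservice version.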
ChatGPT is another product from OpenAI. It's a Large Language Model (LLM) AI. You will need to register and get an API Key to use it.
LangChain is a library built around LLMs that allows for extra functionality, such as long-term memory.
Mycroft's Mimic3 is a neural Text-to-Speech (TTS) system. It offers realistic TTS that can run on somewhat resource-restricted systems. I initially tried to set it up on my OrangePi, but instead, I installed it on the same machine as Whisper and use it over the network.
I used this docker-compose.yml 😉
```yaml
version: '3.7'
services:
  mimic3:
    image: mycroftai/mimic3
    ports:
      - 59125:59125
    volumes:
      - .:/home/mimic3/.local/share/mycroft/mimic3
    stdin_open: true
    tty: true
```
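Since `mimic3Url` is configured without the `/api/tts` path, the full TTS endpoint has to be assembled from the server URL and the voice. A minimal sketch, assuming the server takes the voice as a `voice` query parameter; `buildMimic3TtsUrl` is a hypothetical helper, not part of the module.

```javascript
// Hypothetical helper: build the Mimic3 TTS endpoint from the configured values.
// The voice is expected to be already URL-encoded, like the documented default
// (%23 is an encoded "#"), so it is not encoded a second time.
function buildMimic3TtsUrl(mimic3Url, voice = 'en_US/cmu-arctic_low%23gka') {
  const base = mimic3Url.replace(/\/+$/, ''); // drop trailing slashes
  return `${base}/api/tts?voice=${voice}`;
}
```

The text to speak would then be sent to this URL, and the wav response played back on the mirror.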
- If your audio doesn't work, check whether you're using `alsa` or `pulseaudio`. You might need to install `mpg123`. You can install it using the command `sudo apt-get install mpg123`.
- You might also need to install `lame` for audio encoding. You can install it using the command `sudo apt-get install lame`.