HRI-2025 system for emoji-based emotion TTS
An expressive pseudo Speech-to-Speech system 🗣️ for HRI experiments 🤖, a part of Do You Feel Me?
The system is structured as follows:
ASR -> LLM -> TTS
- ASR: a modified version of WhisperLive
- LLM: an Ollama and langchain chatbot implementation of llama3
- TTS: a fine-tuned Matcha-TTS
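Conceptually, the glue between these three stages is a simple loop. A minimal sketch, with placeholder functions standing in for the WhisperLive, Ollama/langchain, and Matcha-TTS calls (none of these names are the actual feel_me.py API):

```python
# Minimal sketch of the ASR -> LLM -> TTS loop; transcribe(), chat(),
# and speak() are placeholders, not the real feel_me.py functions.

def transcribe() -> str:
    """Stand-in for the WhisperLive ASR call (type instead of speak)."""
    return input("you> ")

def chat(text: str, history: list) -> str:
    """Stand-in for the Ollama/langchain chatbot call."""
    return f"🙂 You said: {text}"

def speak(reply: str) -> None:
    """Stand-in for Matcha-TTS synthesis; the leading emoji picks the voice."""
    print("bot>", reply)

history = []
while True:
    text = transcribe()                # ASR: audio -> text
    if "end session" in text.lower():  # same exit phrase as feel_me.py
        break
    reply = chat(text, history)        # LLM: emoji-annotated response
    history.append((text, reply))
    speak(reply)                       # TTS: emoji selects the voice
```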
We currently have 3 available emoji checkpoints:
- Paige - Female, intense emotions
- Olivia - Female, subtle emotions
- Zach - Male
Current checkpoints and data can be found here
To see per-model (WhisperLive and Matcha-TTS) information and make edits within the pipeline, see the internal READMEs in the respective folders
Clone this repo
git clone git@github.com:rosielab/do_you_feel_me.git
Create a conda environment or virtualenv and install the requirements. Note: this repo has been tested with Python 3.11.9
pip install -r requirements.txt
Speech-to-Speech system:
You will need to pull the llama 3 model
curl -fsSL https://ollama.com/install.sh | sh
ollama run llama3
If Ollama is not already running, you may need to run this before running llama3:
ollama serve
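The chatbot stage talks to this local Ollama server through langchain. As a quick sanity check that llama3 is reachable from Python (this assumes the langchain-ollama integration; the repo's pinned requirements may wire it up differently):

```python
# Sanity check: ask the locally served llama3 for a one-emoji reply.
# Assumes `pip install langchain-ollama`; not necessarily the exact
# integration used by feel_me.py.
from langchain_ollama import ChatOllama

llm = ChatOllama(model="llama3", temperature=0.8)
print(llm.invoke("Reply with a single emoji.").content)
```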
You will need espeak to run Matcha-TTS:
sudo apt-get install espeak-ng
Then run:
python feel_me.py
You can end the session by saying 'end session'
It is possible to customize the pipeline. You can perform the following modifications:
- Modify the LLM prompt and emojis
- Change to a different LLM available from Ollama
- Change the Whisper model
- Change the temperature of the TTS and LLM
- Use a different Matcha-TTS checkpoint
- Modify the speaking rate
- Change the number of steps in the ODE solver for the TTS
- Change the TTS vocoder
All of these settings can be found at the top of feel_me.py, as sketched below.
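As an illustration, the tunable block might look something like the following; every name and default below is hypothetical except PROMPT, so check feel_me.py itself for the real identifiers:

```python
# Hypothetical configuration block -- only PROMPT is confirmed by this
# README; the other names and defaults are illustrative placeholders.
PROMPT = "You are a playful robot ..."     # chatbot personality + emoji set
LLM_MODEL = "llama3"                       # any model served by Ollama
WHISPER_MODEL = "small.en"                 # Whisper ASR model size
LLM_TEMPERATURE = 0.8                      # higher = more varied replies
TTS_TEMPERATURE = 0.667                    # Matcha-TTS sampling temperature
TTS_CHECKPOINT = "checkpoints/paige.ckpt"  # emoji voice checkpoint to load
SPEAKING_RATE = 0.9                        # < 1 slower, > 1 faster
ODE_STEPS = 10                             # steps for the TTS ODE solver
VOCODER = "hifigan_univ_v1"                # vocoder used for synthesis
```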
Currently the system contains 12 emoji voices: 😎🤔😍🤣🙂😮🙄😅🥲😭😡😁
If you wish to change the personality of the chatbot or the emojis used by the chatbot, edit the PROMPT parameter
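For example, a prompt in the spirit of the system (illustrative, not the shipped default) could look like:

```python
# Illustrative prompt only -- see feel_me.py for the real PROMPT value.
PROMPT = (
    "You are a friendly robot having a spoken conversation. "
    "Keep replies to one or two short sentences, and start every reply "
    "with exactly one emoji from this set that matches its emotion: "
    "😎🤔😍🤣🙂😮🙄😅🥲😭😡😁"
)
```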
If you wish to use a different voice or add new emojis, you can quickly and easily fine-tune Matcha-TTS to create your own voice
Matcha-TTS can be fine-tuned for your own emojis with as little as 2 minutes of data per emoji. The new checkpoint can be trained directly from the base Matcha-TTS checkpoint (see the README for links) or from our provided checkpoints.
You can use our script record_audio.py to easily record your data and get_duration.ipynb to check the duration of all of your recordings.
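If you would rather not open the notebook, a stdlib-only equivalent of the duration check (assuming your recordings are PCM .wav files in a folder named recordings/) is:

```python
# Sum the duration of every .wav under recordings/ -- a stdlib-only
# stand-in for get_duration.ipynb; adjust the path to your setup.
import wave
from pathlib import Path

total = 0.0
for path in Path("recordings").rglob("*.wav"):
    with wave.open(str(path)) as w:
        total += w.getnframes() / w.getframerate()
print(f"Total audio: {total / 60:.1f} minutes")
```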
To record audio, create a <emoji_name>.txt file where each line is a script to read, then set the emoji and emoji name (file name) with the EMOJI_MAPPING parameter in record_audio.py
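For instance, an entry pairing the 😎 emoji with a cool.txt script might look like this (the exact structure record_audio.py expects may differ; treat this as a sketch):

```python
# Hypothetical shape of the EMOJI_MAPPING parameter in record_audio.py:
# emoji -> file/voice name, so 😎 reads its scripts from cool.txt.
EMOJI_MAPPING = {
    "😎": "cool",
    "😭": "crying",
}
```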
When fine tuning you will be overwriting the current voices. In general, we have produced better quality voices when selecting a voice to overwrite that is similar to the target voice, e.g. same accent and gender. To easily hear all the voices along with their speaker numbers, use this Hugging Face space.
Follow the information in the README for fine tuning on the vctk checkpoint, where each speaker number is an emoji number. You may see our data and transcription setup in emojis-hri-clean.zip
here as an example.
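Matcha-TTS's multi-speaker recipes use pipe-separated filelists of the form <wav path>|<speaker id>|<transcript>, with the speaker id here being the emoji's number. An assumed example line (path and text are made up):

```
data/emoji_audio/cool_001.wav|0|Let's head to the beach before sunset.
```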
Hints for fine tuning:
First create your own experiment and data configs following the examples, mapping to your transcription file location.
Then follow the original Matcha-TTS instructions
To train from a checkpoint run:
python matcha/train.py experiment=<YOUR EXPERIMENT> ckpt_path=<PATH TO CHECKPOINT>
To run multi-speaker synthesis:
matcha-tts --text "<INPUT TEXT>" --checkpoint_path <PATH TO CHECKPOINT> --spk <SPEAKER NUMBER> --vocoder hifigan_univ_v1 --speaking_rate <SPEECH RATE>