HRI-2025 system for emoji-based emotion TTS
An expressive pseudo Speech-to-Speech system 🗣️ for HRI experiments 🤖, a part of Do You Feel Me?
The system is structured as follows:
ASR -> LLM -> TTS
- ASR: a modified version of WhisperLive
- LLM: an Ollama and langchain chatbot implementation of llama3
- TTS: a fine-tuned Matcha-TTS
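Conceptually, the glue between these three stages is a simple loop. A minimal sketch, with placeholder functions standing in for the WhisperLive, Ollama/langchain, and Matcha-TTS calls (none of these names are the actual feel_me.py API):

```python
# Minimal sketch of the ASR -> LLM -> TTS loop; transcribe(), chat(),
# and speak() are placeholders, not the real feel_me.py functions.

def transcribe() -> str:
    """Stand-in for the WhisperLive ASR call (type instead of speak)."""
    return input("you> ")

def chat(text: str, history: list) -> str:
    """Stand-in for the Ollama/langchain chatbot call."""
    return f"🙂 You said: {text}"

def speak(reply: str) -> None:
    """Stand-in for Matcha-TTS synthesis; the leading emoji picks the voice."""
    print("bot>", reply)

history = []
while True:
    text = transcribe()                # ASR: audio -> text
    if "end session" in text.lower():  # same exit phrase as feel_me.py
        break
    reply = chat(text, history)        # LLM: emoji-annotated response
    history.append((text, reply))
    speak(reply)                       # TTS: emoji selects the voice
```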
We currently have 3 available emoji checkpoints:
- Paige - Female, intense emotions
- Olivia - Female, subtle emotions
- Zach - Male
Current checkpoints and data can be found here
To see per-model (WhisperLive and Matcha-TTS) information and make edits within the pipeline, see the internal READMEs in the respective folders
Clone this repo
git clone git@github.com:rosielab/do_you_feel_me.git
Create a conda environment or virtualenv and install the requirements. Note: this repo has been tested with Python 3.11.9
pip install -r requirements.txt
Speech-to-Speech system:
You will need to pull the llama 3 model
curl -fsSL https://ollama.com/install.sh | sh
ollama run llama3
If Ollama is not already running, you may need to run this before running llama3:
ollama serve
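The chatbot stage talks to this local Ollama server through langchain. As a quick sanity check that llama3 is reachable from Python (this assumes the langchain-ollama integration; the repo's pinned requirements may wire it up differently):

```python
# Sanity check: ask the locally served llama3 for a one-emoji reply.
# Assumes `pip install langchain-ollama`; not necessarily the exact
# integration used by feel_me.py.
from langchain_ollama import ChatOllama

llm = ChatOllama(model="llama3", temperature=0.8)
print(llm.invoke("Reply with a single emoji.").content)
```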
You will need espeak to run Matcha-TTS:
sudo apt-get install espeak-ng
Then run:
python feel_me.py
You can end the session by saying 'end session'
It is possible to customize the pipeline. You can perform the following modifications:
- Modify the LLM prompt and emojis
- Change to a different LLM available from Ollama
- Change the Whisper model
- Change the temperature of the TTS and LLM
- Use a different Matcha-TTS checkpoint
- Modify the speaking rate
- Change the number of steps in the ODE solver for the TTS
- Change the TTS vocoder
All of these settings can be found at the top of feel_me.py, as sketched below.
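As an illustration, the tunable block might look something like the following; every name and default below is hypothetical except PROMPT, so check feel_me.py itself for the real identifiers:

```python
# Hypothetical configuration block -- only PROMPT is confirmed by this
# README; the other names and defaults are illustrative placeholders.
PROMPT = "You are a playful robot ..."     # chatbot personality + emoji set
LLM_MODEL = "llama3"                       # any model served by Ollama
WHISPER_MODEL = "small.en"                 # Whisper ASR model size
LLM_TEMPERATURE = 0.8                      # higher = more varied replies
TTS_TEMPERATURE = 0.667                    # Matcha-TTS sampling temperature
TTS_CHECKPOINT = "checkpoints/paige.ckpt"  # emoji voice checkpoint to load
SPEAKING_RATE = 0.9                        # < 1 slower, > 1 faster
ODE_STEPS = 10                             # steps for the TTS ODE solver
VOCODER = "hifigan_univ_v1"                # vocoder used for synthesis
```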
Currently the system contains 12 emoji voices: 😎🤔😍🤣🙂😮🙄😅🥲😭😡😁
If you wish to change the personality of the chatbot or the emojis used by the chatbot, edit the PROMPT parameter
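For example, a prompt in the spirit of the system (illustrative, not the shipped default) could look like:

```python
# Illustrative prompt only -- see feel_me.py for the real PROMPT value.
PROMPT = (
    "You are a friendly robot having a spoken conversation. "
    "Keep replies to one or two short sentences, and start every reply "
    "with exactly one emoji from this set that matches its emotion: "
    "😎🤔😍🤣🙂😮🙄😅🥲😭😡😁"
)
```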
If you wish to use a different voice or add new emojis, you can quickly and easily fine-tune Matcha-TTS to create your own voice
Matcha-TTS can be fine-tuned for your own emojis with as little as 2 minutes of data per emoji. The new checkpoint can be trained directly from the base Matcha-TTS checkpoint (see the README for links) or from our provided checkpoints.
You can use our script record_audio.py to easily record your data and get_duration.ipynb to check the duration of all of your recordings.
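If you would rather not open the notebook, a stdlib-only equivalent of the duration check (assuming your recordings are PCM .wav files in a folder named recordings/) is:

```python
# Sum the duration of every .wav under recordings/ -- a stdlib-only
# stand-in for get_duration.ipynb; adjust the path to your setup.
import wave
from pathlib import Path

total = 0.0
for path in Path("recordings").rglob("*.wav"):
    with wave.open(str(path)) as w:
        total += w.getnframes() / w.getframerate()
print(f"Total audio: {total / 60:.1f} minutes")
```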
To record audio, create a <emoji_name>.txt file where each line is a script to read, then set the emoji and emoji name (file name) with the EMOJI_MAPPING parameter in record_audio.py
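For instance, an entry pairing the 😎 emoji with a cool.txt script might look like this (the exact structure record_audio.py expects may differ; treat this as a sketch):

```python
# Hypothetical shape of the EMOJI_MAPPING parameter in record_audio.py:
# emoji -> file/voice name, so 😎 reads its scripts from cool.txt.
EMOJI_MAPPING = {
    "😎": "cool",
    "😭": "crying",
}
```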
When fine tuning you will be overwriting the current voices. In general, we have produced better quality voices when selecting a voice to overwrite that is similar to the target voice, e.g. same accent and gender. To easily hear all the voices along with their speaker numbers, use this Hugging Face space.
Follow the information in the README for fine tuning on the vctk checkpoint, where each speaker number is an emoji number. You may see our data and transcription setup in emojis-hri-clean.zip
here as an example.
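Matcha-TTS's multi-speaker recipes use pipe-separated filelists of the form <wav path>|<speaker id>|<transcript>, with the speaker id here being the emoji's number. An assumed example line (path and text are made up):

```
data/emoji_audio/cool_001.wav|0|Let's head to the beach before sunset.
```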
Hints for fine tuning:
First create your own experiment and data configs following the examples, mapping to your transcription file location.
Then follow the original Matcha-TTS instructions
To train from a checkpoint run:
python matcha/train.py experiment=<YOUR EXPERIMENT> ckpt_path=<PATH TO CHECKPOINT>
To run multi-speaker synthesis:
matcha-tts --text "<INPUT TEXT>" --checkpoint_path <PATH TO CHECKPOINT> --spk <SPEAKER NUMBER> --vocoder hifigan_univ_v1 --speaking_rate <SPEECH RATE>