emojivoice

HRI-2025 system for emoji-based emotion TTS

An expressive pseudo Speech-to-Speech system 🗣️ for HRI experiments 🤖, a part of Do You Feel Me?

Structure

The system is structured as follows:

ASR -> LLM -> TTS
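For intuition, here is a conceptual sketch of that loop. Every name and function body below is an illustrative stub, not the actual feel_me.py API; the emoji-prefixed reply format is also an assumption for illustration.

# A conceptual sketch only: all functions are illustrative stubs,
# not the actual API in feel_me.py.

def transcribe_speech() -> str:
    # ASR step: the real system uses a modified WhisperLive to turn
    # microphone audio into text. Typed input stands in for speech here.
    return input("you> ")

def chat(text: str) -> str:
    # LLM step: the real system uses an Ollama/langchain llama3 chatbot
    # prompted to tag each reply with one of the supported emojis.
    return "🙂 " + text

def synthesize(message: str, voice: str) -> None:
    # TTS step: the real system speaks with the Matcha-TTS voice
    # fine-tuned for the given emoji. We just print here.
    print(f"[{voice}] {message}")

while True:
    user_text = transcribe_speech()
    if "end session" in user_text.lower():
        break
    reply = chat(user_text)
    emoji, message = reply[0], reply[2:]  # assumes an emoji-prefixed reply
    synthesize(message, voice=emoji)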

ASR

Modified version of: WhisperLive

LLM

Ollama and langchain chatbot implementation of: Llama 3

TTS

Fine tuned: Matcha-TTS

We currently have 3 available emoji checkpoints:

  • Paige - Female, intense emotions
  • Olivia - Female, subtle emotions
  • Zach - Male

Current checkpoints and data can be found here

For per-model (WhisperLive and Matcha-TTS) information, and to make edits within the pipeline, see the internal READMEs in the respective folders

Usage

Clone this repo

git clone git@github.com:rosielab/do_you_feel_me.git

Create a conda environment or virtualenv and install the requirements. Note: this repo has been tested with Python 3.11.9

pip install -r requirements.txt

Speech-to-Speech system:

You will need to pull the llama 3 model

curl -fsSL https://ollama.com/install.sh | sh
ollama run llama3

If Ollama is not already running, you may need to run this before ollama run llama3:

ollama serve
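Before launching the full pipeline, you can sanity-check that llama3 is reachable. This snippet is not part of the repo; it assumes the langchain-community package is installed alongside the requirements.

# Quick connectivity check for the llama3 model via langchain (assumption:
# langchain-community is installed; this is not part of feel_me.py).
from langchain_community.llms import Ollama

llm = Ollama(model="llama3")
print(llm.invoke("Reply with the single word: ready"))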

You will need espeak to run Matcha-TTS

sudo apt-get install espeak-ng

Then run:

python feel_me.py

You can end the session by saying 'end session'

Customize

It is possible to customize the pipeline. You can perform the following modifications:

  • Modify the LLM prompt and emojis
  • Change to a different LLM available from Ollama
  • Change the Whisper model
  • Change the temperature of the TTS and LLM
  • Use a different Matcha-TTS checkpoint
  • Modify the speaking rate
  • Change the number of steps in the ODE solver for the TTS
  • Change the TTS vocoder

All of these settings can be found at the top of feel_me.py

Currently the system contains 11 emoji voices: 😎🤔😍🤣🙂😮🙄😅😭😡😁. If you wish to change the personality of the chatbot or the emojis used by the chatbot, edit the PROMPT parameter
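As an illustration, a prompt along these lines would tie the chatbot's persona to the emoji set; the actual PROMPT text in feel_me.py differs.

# Illustrative only -- not the actual PROMPT shipped in feel_me.py.
PROMPT = (
    "You are a cheerful robot assistant. Begin every reply with exactly one "
    "emoji from this set, matching the emotion of your answer: "
    "😎🤔😍🤣🙂😮🙄😅😭😡😁. Keep replies short and conversational."
)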

If you wish to use a different voice or add new emojis, you can quickly and easily fine-tune Matcha-TTS to create your own voice

Fine tune TTS

Matcha-TTS can be fine-tuned for your own emojis with as little as 2 minutes of data per emoji. The new checkpoint can be trained directly from the base Matcha-TTS checkpoint (see the README for links) or from our provided checkpoints.

You can use our script record_audio.py to easily record your data and get_duration.ipynb to check the duration of all of your recordings. If fine tuning from a checkpoint, the sampling rate of the audio files must be 22050 Hz.
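If your recordings were captured at a different rate, a short resampling pass can bring them to 22050 Hz. This snippet is not part of the repo and assumes librosa and soundfile are installed; the paths are placeholders.

# Resample a recording to 22050 Hz (sketch; paths are placeholders).
import librosa
import soundfile as sf

# librosa resamples on load when sr is given explicitly
audio, _ = librosa.load("recordings/joy_001.wav", sr=22050)
sf.write("recordings_22050/joy_001.wav", audio, 22050)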

To record audio, create a <emoji_name>.txt where each line is a script to read, then set the emoji and emoji name (file name) with the EMOJI_MAPPING parameter in record_audio.py
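For example, an entry might pair each emoji with the base file name of its script. This is a hypothetical sketch of the structure; check record_audio.py for the real one.

# Hypothetical EMOJI_MAPPING structure -- confirm against record_audio.py.
EMOJI_MAPPING = {
    "🤣": "joy",    # reads scripts from joy.txt, saves recordings under that name
    "😭": "sad",
    "😡": "angry",
}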

When fine tuning you will be overwriting the current voices. In general, we have produced better quality voices when selecting a voice to overwrite that is more similar to the target voice, e.g. same accent and gender. To easily hear all the voices along with their speaker numbers, use this Hugging Face space.

Follow the information in the README for fine tuning on the VCTK checkpoint, where each speaker number is an emoji number. You may see our data and transcription setup in emojis-hri-clean.zip here as an example.

Hints for fine tuning

First create your own experiment and data configs following the examples, mapping to your transcription file location. The two primary configs to create (and check the data paths in) are one in data and one in experiments. The paths here should point to where your train and validation files are stored, and your train and validation files should point to your audio file locations. You can test that all these files are pointing the right way before training by running matcha-data-stats -i ljspeech.yaml, as per the Matcha repo training steps.
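As an illustration of that chain, a multi-speaker filelist line pairs an audio path, a speaker (emoji) number, and the transcription. The exact format here is an assumption; copy the real layout from the Matcha-TTS example filelists.

data/emoji_voices/joy_001.wav|3|Today was such a wonderful day!
data/emoji_voices/sad_001.wav|7|I really miss how things used to be.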

Then follow the original Matcha-TTS instructions

To train from a checkpoint run:

python matcha/train.py experiment=<YOUR EXPERIMENT> ckpt_path=<PATH TO CHECKPOINT>

You can train from the Matcha base release checkpoints or the emojivoice checkpoints.

To run multi-speaker synthesis:

matcha-tts --text "<INPUT TEXT>" --checkpoint_path <PATH TO CHECKPOINT> --spk <SPEAKER NUMBER> --vocoder hifigan_univ_v1 --speaking_rate <SPEECH RATE>
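For example, with placeholder values (the checkpoint path and speaker number here are made up):

matcha-tts --text "Hello there!" --checkpoint_path checkpoints/emoji_paige.ckpt --spk 3 --vocoder hifigan_univ_v1 --speaking_rate 0.95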

If you are having issues, CUDA can sometimes make the error messages convoluted; run training in CPU mode (set accelerator to cpu and remove devices) to get clearer error output.
