GestureGeneration/Speech_driven_gesture_generation_with_autoencoder


This is the official implementation for IVA '19 paper "Analyzing Input and Output Representations for Speech-Driven Gesture Generation".

License: Apache-2.0


Folders and files:

- data_processing/
- evaluation/
- example_scripts/
- helpers/
- motion_repr_learning/
- visuals/
- LICENSE
- README.md
- hierarchy.txt
- predict.py
- requirements.txt
- train.py

Aud2Repr2Pose: Analyzing input and output representations for speech-driven gesture generation

Taras Kucherenko, Dai Hasegawa, Gustav Eje Henter, Naoshi Kaneko, Hedvig Kjellström

This repository contains a Keras- and TensorFlow-based implementation of speech-driven gesture generation with a neural network. The method was published at the International Conference on Intelligent Virtual Agents (IVA '19), and an extension was published in the International Journal of Human-Computer Interaction in 2021.

The project website contains all the information about this project, including a video explanation of the method and the paper.

Demo on another dataset

This model has also been applied to an English dataset.

The demo video, as well as the code to run the pre-trained model, is available online.

Requirements

Python 3 (TensorFlow 1.15, installed below, requires Python 3.7 or earlier)

Initial setup

install packages

# if you have a GPU
pip install tensorflow-gpu==1.15.2
# if you don't have a GPU
pip install tensorflow==1.15.2

pip install -r requirements.txt

install ffmpeg

# macOS
brew install ffmpeg

# Ubuntu
sudo add-apt-repository ppa:jonathonf/ffmpeg-4
sudo apt-get update
sudo apt-get install ffmpeg

How to use this repository?

0. Notation

All parameters that need to be specified by the user are written in capital letters (for example, DATA_DIR).

1. Download raw data

Clone this repository
Download a dataset from https://www.dropbox.com/sh/j419kp4m8hkt9nd/AAC_pIcS1b_WFBqUp5ofBG1Ia?dl=0
Create a directory named dataset and put two directories, motion/ and speech/, under dataset/

2. Split dataset

Put the folder with the dataset in the data_processing directory of this repo, next to the script prepare_data.py
Run the following command:

python data_processing/prepare_data.py DATA_DIR
# DATA_DIR = directory to save the data, such as 'data/'

Note: DATA_DIR is not the directory where the raw data is stored (the raw data folder, "dataset", has to be stored in the root folder of this repo). DATA_DIR is the directory where the processed data will be saved. After this step, you no longer need "dataset" in the root folder. Use the same DATA_DIR in all the following scripts.

After this command:

- train/, test/ and dev/ are created under DATA_DIR/
  - in inputs/ inside each directory, audio(id).wav files are stored
  - in labels/ inside each directory, gesture(id).bvh files are stored
- under DATA_DIR/, three csv files, gg-train.csv, gg-test.csv and gg-dev.csv, are created; these files contain paths to the actual data
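To sanity-check the result of this step, a small script can count the files in each split. This is a minimal sketch that relies only on the layout described above (train/, test/ and dev/ with inputs/*.wav and labels/*.bvh, plus the gg-*.csv files); the function name `summarize_splits` is hypothetical and not part of the repository.

```python
from pathlib import Path


def summarize_splits(data_dir):
    """Count audio (.wav) and gesture (.bvh) files per split and check the CSV index files.

    Assumes the layout described in step 2:
    DATA_DIR/{train,test,dev}/inputs/*.wav, DATA_DIR/{train,test,dev}/labels/*.bvh
    and DATA_DIR/gg-{train,test,dev}.csv.
    """
    root = Path(data_dir)
    summary = {}
    for split in ("train", "test", "dev"):
        split_dir = root / split
        summary[split] = {
            "wav": len(list((split_dir / "inputs").glob("*.wav"))),
            "bvh": len(list((split_dir / "labels").glob("*.bvh"))),
            "csv_exists": (root / f"gg-{split}.csv").is_file(),
        }
    return summary


if __name__ == "__main__":
    import sys

    for split, info in summarize_splits(sys.argv[1]).items():
        # Audio and gesture files are paired by id, so the two counts would normally match.
        print(split, info)
```

Run it as `python summarize_splits.py DATA_DIR` from wherever you saved it.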

3. Convert the dataset into vectors

python data_processing/create_vector.py DATA_DIR N_CONTEXT
# N_CONTEXT = number of context frames; in our experiments it was set to 60
# (this means 30 steps backwards and 30 forwards)

Note: if you change the N_CONTEXT value, you also need to update it in the train.py script.

(You are likely to get a warning like this: "WARNING:root:frame length (5513) is greater than FFT size (512), frame will be truncated. Increase NFFT to avoid.")

As a result of running this script:

- numpy binary files X_train.npy and Y_train.npy (the vectorized dataset) are created under DATA_DIR/
- test audios, such as X_test_audio1168.npy, are created under DATA_DIR/test_inputs/
- when N_CONTEXT = 60, the audio vector's shape is (num of timesteps, 61, 26), i.e. 60 context frames plus the current frame, each with 26 speech features
- the gesture vector's shape is (num of timesteps, 384), where 384 = 64 joints × (x, y, z positions + x, y, z velocities)
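The shapes above follow from simple arithmetic, which the NumPy sketch below reproduces. It rests on two assumptions made only for illustration: edge frames are padded by repeating the first/last frame, and velocities are frame-to-frame differences stored per joint as [x, y, z, vx, vy, vz]. data_processing/create_vector.py may handle edges (for example, by dropping them) and value ordering differently; `context_windows` and `positions_with_velocities` are hypothetical names.

```python
import numpy as np


def context_windows(features, n_context=60):
    """Stack n_context // 2 frames on each side of every frame.

    features: array of shape (T, n_features), e.g. (T, 26) speech features.
    Returns an array of shape (T, n_context + 1, n_features).
    Edge frames are padded by repeating the first/last frame (an assumption).
    """
    half = n_context // 2
    padded = np.pad(features, ((half, half), (0, 0)), mode="edge")
    # Window t covers padded rows t .. t + n_context, i.e. original frames t - half .. t + half.
    return np.stack([padded[t:t + n_context + 1] for t in range(features.shape[0])])


def positions_with_velocities(positions):
    """Append per-joint velocities to joint positions.

    positions: array of shape (T, n_joints, 3).
    Returns shape (T, n_joints * 6), laid out per joint as [x, y, z, vx, vy, vz] (assumed layout).
    The first frame's velocity is set to zero (an assumption).
    """
    velocities = np.diff(positions, axis=0, prepend=positions[:1])
    return np.concatenate([positions, velocities], axis=2).reshape(positions.shape[0], -1)


if __name__ == "__main__":
    T = 100
    audio = np.random.rand(T, 26)
    joints = np.random.rand(T, 64, 3)
    print(context_windows(audio).shape)             # (100, 61, 26)
    print(positions_with_velocities(joints).shape)  # (100, 384)
```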

If you don't want to customize anything, you can skip reading about steps 4-7 and just use the already prepared scripts in the `example_scripts` folder.

4. Learn motion representation by AutoEncoder

Create a directory to save training checkpoints, such as chkpt/, and use it as the CHKPT_DIR parameter.

Learn dataset encoding

python motion_repr_learning/ae/learn_dataset_encoding.py DATA_DIR -chkpt_dir=CHKPT_DIR -layer1_width=DIM

In our experiments, the optimal dimensionality (DIM) was 325.

Encode dataset

Create the DATA_DIR/DIM directory (for example, DATA_DIR/325 when DIM = 325).

python motion_repr_learning/ae/encode_dataset.py DATA_DIR -chkpt_dir=CHKPT_DIR -restore=True -pretrain=False -layer1_width=DIM
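The two autoencoder commands above can also be chained from Python. This is a hypothetical helper (`autoencoder_commands` is not part of the repository) that creates CHKPT_DIR and DATA_DIR/DIM and returns both commands with exactly the flags shown above, so they can be inspected before running. It assumes it is run from the repository root with the same Python interpreter that has the requirements installed.

```python
import subprocess
import sys
from pathlib import Path


def autoencoder_commands(data_dir, chkpt_dir, dim=325):
    """Create CHKPT_DIR and DATA_DIR/DIM, then return the step-4 learn and encode commands."""
    Path(chkpt_dir).mkdir(parents=True, exist_ok=True)
    # Step 4 asks for DATA_DIR/DIM to exist before the dataset is encoded.
    Path(data_dir, str(dim)).mkdir(parents=True, exist_ok=True)
    learn = [sys.executable, "motion_repr_learning/ae/learn_dataset_encoding.py", str(data_dir),
             f"-chkpt_dir={chkpt_dir}", f"-layer1_width={dim}"]
    encode = [sys.executable, "motion_repr_learning/ae/encode_dataset.py", str(data_dir),
              f"-chkpt_dir={chkpt_dir}", "-restore=True", "-pretrain=False",
              f"-layer1_width={dim}"]
    return [learn, encode]


if __name__ == "__main__":
    for cmd in autoencoder_commands("data/", "chkpt/", dim=325):
        print(" ".join(cmd))
        subprocess.run(cmd, check=True)  # stop at the first failing step
```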

More information can be found in the motion_repr_learning folder.

5. Learn speech-driven gesture generation model

python train.py MODEL_NAME EPOCHS DATA_DIR N_INPUT ENCODE DIM
# MODEL_NAME = hdf5 file name, such as 'model_500ep_posvel_60.hdf5'
# EPOCHS = number of training epochs (recommended: 100)
# DATA_DIR = directory with the data (should be the same as above)
# N_INPUT = dimensionality of the speech features (default: 26)
# ENCODE = whether to train on the encoded gestures (proposed model) or on the raw gestures (baseline model)
# DIM = dimensionality of the encoding (ignored if you don't encode)
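Because train.py takes six positional arguments, it is easy to pass them in the wrong order. The sketch below builds the command in the documented order and checks a few obvious mistakes. `train_command` is a hypothetical helper, and passing ENCODE as the string "True"/"False" is an assumption; check train.py for the exact format it parses.

```python
import sys


def train_command(model_name, epochs, data_dir, n_input=26, encode=True, dim=325):
    """Build: train.py MODEL_NAME EPOCHS DATA_DIR N_INPUT ENCODE DIM (order from step 5).

    ENCODE is rendered as "True"/"False" (an assumption). DIM is still passed when
    encode is False because the arguments are positional; the README says it is ignored then.
    """
    if not model_name.endswith(".hdf5"):
        raise ValueError("MODEL_NAME should be an .hdf5 file name")
    if epochs <= 0:
        raise ValueError("EPOCHS must be positive")
    return [sys.executable, "train.py", model_name, str(epochs), str(data_dir),
            str(n_input), str(encode), str(dim)]


if __name__ == "__main__":
    print(" ".join(train_command("model.hdf5", 100, "data/")))
```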

6. Predict gesture

python predict.py MODEL_NAME INPUT_SPEECH_FILE OUTPUT_GESTURE_FILE

# Usage example
python predict.py model.hdf5 data/test_inputs/X_test_audio1168.npy data/test_inputs/predict_1168_20fps.txt

# If the model was trained on encoded gestures (ENCODE), you need to decode the predictions
python motion_repr_learning/ae/decode.py DATA_DIR ENCODED_PREDICTION_FILE DECODED_GESTURE_FILE -restore=True -pretrain=False -layer1_width=DIM -chkpt_dir=CHKPT_DIR -batch_size=8

Note: these commands can be run in a for loop over all the test sequences. Examples are provided in the example_scripts folder of this repository.
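A loop of this kind might look as follows in Python. This is a sketch, not the scripts in example_scripts: `prediction_commands` is a hypothetical name, the encoded file name follows the usage example above (predict_<ID>_20fps.txt), and the decoded file name (decoded_<ID>_20fps.txt) is an arbitrary choice.

```python
import re
import subprocess
import sys
from pathlib import Path


def prediction_commands(model, data_dir, chkpt_dir, dim=325, out_dir=None):
    """Yield (predict, decode) command pairs for every X_test_audio<ID>.npy in DATA_DIR/test_inputs/."""
    test_dir = Path(data_dir) / "test_inputs"
    out_dir = Path(out_dir) if out_dir is not None else test_dir
    for audio in sorted(test_dir.glob("X_test_audio*.npy")):
        match = re.fullmatch(r"X_test_audio(\d+)\.npy", audio.name)
        if match is None:
            continue  # skip files that do not follow the X_test_audio<ID>.npy pattern
        seq_id = match.group(1)
        encoded = out_dir / f"predict_{seq_id}_20fps.txt"
        decoded = out_dir / f"decoded_{seq_id}_20fps.txt"  # hypothetical naming
        predict = [sys.executable, "predict.py", str(model), str(audio), str(encoded)]
        decode = [sys.executable, "motion_repr_learning/ae/decode.py", str(data_dir),
                  str(encoded), str(decoded), "-restore=True", "-pretrain=False",
                  f"-layer1_width={dim}", f"-chkpt_dir={chkpt_dir}", "-batch_size=8"]
        yield predict, decode


if __name__ == "__main__":
    for predict, decode in prediction_commands("model.hdf5", "data/", "chkpt/"):
        subprocess.run(predict, check=True)
        # Decoding is only needed for models trained on encoded gestures.
        subprocess.run(decode, check=True)
```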

# The network produces both coordinates and velocities,
# so we need to remove the velocities
python helpers/remove_velocity.py -g PATH_TO_GESTURES
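Conceptually, removing velocities means dropping the velocity values of every joint and keeping the positions. The NumPy sketch below assumes the per-joint [x, y, z, vx, vy, vz] layout suggested by step 3 (64 joints × 6 = 384 values per frame); `remove_velocities` is a hypothetical name, and helpers/remove_velocity.py may read, order or write the data differently.

```python
import numpy as np


def remove_velocities(gestures, dim=3):
    """Keep only positions from a (T, n_joints * 2 * dim) array.

    Assumes each joint is stored as [x, y, z, vx, vy, vz]; returns shape (T, n_joints * dim).
    """
    n_frames, n_values = gestures.shape
    if n_values % (2 * dim) != 0:
        raise ValueError("expected positions and velocities for every joint")
    per_joint = gestures.reshape(n_frames, n_values // (2 * dim), 2 * dim)
    # The first `dim` values of each joint are its position.
    return per_joint[:, :, :dim].reshape(n_frames, -1)


if __name__ == "__main__":
    print(remove_velocities(np.zeros((10, 384))).shape)  # (10, 192)
```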

7. Quantitative evaluation

Use the scripts in the evaluation folder of this repository.

Examples are provided in the example_scripts folder of this repository.
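One common quantitative measure of gesture smoothness is average jerk, the mean magnitude of the third time derivative of joint positions. The sketch below computes it with finite differences. It is an illustration, not necessarily what the scripts in evaluation/ compute; `average_jerk` is a hypothetical name, and it assumes positions of shape (T, n_joints, 3) at 20 fps (taken from the `_20fps` file names above).

```python
import numpy as np


def average_jerk(positions, fps=20):
    """Mean jerk magnitude over all frames and joints.

    positions: array of shape (T, n_joints, 3), T >= 4.
    The third finite difference is scaled by fps**3 to convert it to units per second cubed.
    """
    jerk = np.diff(positions, n=3, axis=0) * fps ** 3
    return float(np.linalg.norm(jerk, axis=-1).mean())


if __name__ == "__main__":
    t = np.arange(40) / 20.0
    # A joint moving along x as t**3 has a constant jerk of 6.
    cubic = np.stack([t ** 3, np.zeros_like(t), np.zeros_like(t)], axis=-1)[:, None, :]
    print(average_jerk(cubic))
```

Lower values indicate smoother motion; comparing them against the ground-truth motion is more informative than the raw number.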

8. Qualitative evaluation

Use the animation server.

Citation

If you use this code in your research please cite the paper:

@article{kucherenko2021moving,
  title={Moving fast and slow: Analysis of representations and post-processing in speech-driven automatic gesture generation},
  author={Kucherenko, Taras and Hasegawa, Dai and Kaneko, Naoshi and Henter, Gustav Eje and Kjellstr{\"o}m, Hedvig},
  journal={International Journal of Human–Computer Interaction},
  doi={10.1080/10447318.2021.1883883},
  year={2021}
}

Contact

If you encounter any problems, bugs or issues, or have questions or suggestions, please contact me on GitHub or by email at tarask@kth.se. I prefer questions and bug reports on GitHub, as that provides visibility to others who might be encountering the same issues or have the same questions.

Project website: svito-zar.github.io/audio2gestures/
