TaoRuijie/TalkNet-ASD

ACM MM 2021: 'Is Someone Speaking? Exploring Long-term Temporal Features for Audio-visual Active Speaker Detection'

License

MIT license


Is someone talking? TalkNet: Audio-visual active speaker detection Model

This repository contains the code for our ACM MM 2021 paper (oral), TalkNet, an active speaker detection model that determines whether a face on screen is speaking. [Paper] [Video_English] [Video_Chinese].

Updates:

A new demo page. Thanks to mvoodarla for the contribution!

Awesome ASD: Papers about active speaker detection from recent years.
TalkNet in AVA-Activespeaker dataset: The code to preprocess the AVA-ActiveSpeaker dataset, train TalkNet on the AVA train set, and evaluate it on the AVA val/test set.
TalkNet in TalkSet and Columbia ASD dataset: The code to generate TalkSet, an in-the-wild ASD dataset based on VoxCeleb2 and LRS3, train TalkNet on TalkSet, and evaluate it on the Columbia ASD dataset.
An ASD Demo with pretrained TalkNet model: An end-to-end script to detect and mark the speaking face using the pretrained TalkNet model.

Dependencies

Start from building the environment

conda create -n TalkNet python=3.7.9 anaconda
conda activate TalkNet
pip install -r requirement.txt

Start from the existing environment

pip install -r requirement.txt

TalkNet in AVA-Activespeaker dataset

Data preparation

The following script can be used to download and prepare the AVA dataset for training.

python trainTalkNet.py --dataPathAVA AVADataPath --download

AVADataPath is the folder where you want to save the AVA dataset and its preprocessing outputs. The details can be found here. Please read them carefully.

Training

Then you can train TalkNet in AVA end-to-end by using:

python trainTalkNet.py --dataPathAVA AVADataPath

After training, the outputs are: exps/exp1/score.txt: output score file, exps/exp1/model/model_00xx.model: trained model, exps/exp1/val_res.csv: predictions for the val set.
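Training writes one checkpoint per saved epoch, so it can be handy to pick out the most recent one. The sketch below is only an illustration: the helper name `latest_checkpoint` is hypothetical and not part of this repository, and it assumes checkpoints follow the `model_00xx.model` naming shown above.

```python
import re
from pathlib import Path
from typing import Optional

# Assumption: checkpoints are named model_<epoch digits>.model,
# following the "model_00xx.model" pattern described above.
CHECKPOINT_RE = re.compile(r"model_(\d+)\.model$")


def latest_checkpoint(model_dir: str) -> Optional[Path]:
    """Return the checkpoint with the highest epoch number, or None if there is none."""
    best_epoch = -1
    best_path = None
    for path in Path(model_dir).glob("model_*.model"):
        match = CHECKPOINT_RE.search(path.name)
        if match and int(match.group(1)) > best_epoch:
            best_epoch = int(match.group(1))
            best_path = path
    return best_path


if __name__ == "__main__":
    # Hypothetical usage: point this at the model folder of your experiment.
    print(latest_checkpoint("exps/exp1/model"))
```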

Pretrained model

Our pretrained model achieves an mAP of 92.3 on the validation set. You can check it with:

python trainTalkNet.py --dataPathAVA AVADataPath --evaluation

The pretrained model will be downloaded automatically into TalkNet_ASD/pretrain_AVA.model. It achieves an mAP of 90.8 on the test set.
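The three commands above differ only in the `--download` and `--evaluation` flags. As a rough sketch of how such flags could map to modes, the snippet below uses `argparse` with the flag names from the commands above. The functions `build_parser` and `select_mode` and the mode names are hypothetical, and the real `trainTalkNet.py` may handle its options differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with the three flags used in the commands above."""
    parser = argparse.ArgumentParser(description="Sketch of the trainTalkNet.py modes")
    parser.add_argument("--dataPathAVA", type=str, required=True,
                        help="Folder for the AVA dataset and its preprocessing outputs")
    parser.add_argument("--download", action="store_true",
                        help="Download and prepare the AVA dataset")
    parser.add_argument("--evaluation", action="store_true",
                        help="Evaluate the pretrained model")
    return parser


def select_mode(args: argparse.Namespace) -> str:
    """Map the parsed flags to a mode name (the names are illustrative only)."""
    if args.download:
        return "download"
    if args.evaluation:
        return "evaluation"
    return "training"


if __name__ == "__main__":
    print(select_mode(build_parser().parse_args()))
```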

TalkNet in TalkSet and Columbia ASD dataset

Data preparation

We found it challenging to apply the model trained on AVA to videos outside AVA (the reason is given here, Q3.1). So we built TalkSet, an active speaker detection dataset in the wild, based on VoxCeleb2 and LRS3.

We do not plan to upload this dataset, since we only modified existing data rather than building it from scratch. In the TalkSet folder we provide the .txt files that describe which files we used to generate TalkSet and their ASD labels. You can generate TalkSet yourself if you are interested in training an ASD model in the wild.

We also provide our TalkNet model pretrained on TalkSet. You can evaluate it on the Columbia ASD dataset or on other raw videos in the wild.

Usage

A model pretrained on TalkSet will be downloaded into TalkNet_ASD/pretrain_TalkSet.model when you run the following script:

python demoTalkNet.py --evalCol --colSavePath colDataPath

The Columbia ASD dataset and its labels will also be downloaded into colDataPath. You should then get the following F1 results.

Name	Bell	Boll	Lieb	Long	Sick	Avg.
F1	98.1	88.8	98.7	98.0	97.7	96.3

(These results differ from those in our paper because we trained the model again, but the average F1 is very similar.)
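As a quick check of the table, the average column is the plain mean of the five per-speaker F1 scores. The function name `average_f1` is just for illustration.

```python
# Per-speaker F1 scores copied from the table above.
f1_scores = {"Bell": 98.1, "Boll": 88.8, "Lieb": 98.7, "Long": 98.0, "Sick": 97.7}


def average_f1(scores: dict) -> float:
    """Unweighted mean of the per-speaker F1 scores, rounded to one decimal."""
    return round(sum(scores.values()) / len(scores), 1)


print(average_f1(f1_scores))  # 96.3
```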

An ASD Demo with pretrained TalkNet model

Data preparation

We built an end-to-end script to detect and extract the active speaker from a raw video using our model pretrained on TalkSet.

You can put the raw video (.mp4 and .avi are both fine) into the demo folder, for example 001.mp4.

Usage

python demoTalkNet.py --videoName 001

A model pretrained on TalkSet will be downloaded into TalkNet_ASD/pretrain_TalkSet.model. The structure of the output results can be found here.

The output video is demo/001/pyavi/video_out.avi, in which the active speaker is marked with a green box and non-active speakers with a red box.
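If you run the demo on several videos, a small wrapper can help keep the input and output paths straight. This is only a sketch: the helpers `find_input_video`, `demo_command`, and `expected_output` are hypothetical, and the output location is inferred from the `demo/001/pyavi/video_out.avi` example above rather than read from the script.

```python
from pathlib import Path
from typing import List, Optional


def find_input_video(video_name: str, demo_dir: str = "demo") -> Optional[Path]:
    """Look for <video_name>.mp4 or <video_name>.avi in the demo folder."""
    for ext in (".mp4", ".avi"):
        candidate = Path(demo_dir) / f"{video_name}{ext}"
        if candidate.is_file():
            return candidate
    return None


def demo_command(video_name: str) -> List[str]:
    """Build the demo command shown above as an argument list."""
    return ["python", "demoTalkNet.py", "--videoName", video_name]


def expected_output(video_name: str, demo_dir: str = "demo") -> Path:
    """Assumed output location, following the demo/001/pyavi/video_out.avi example."""
    return Path(demo_dir) / video_name / "pyavi" / "video_out.avi"


if __name__ == "__main__":
    name = "001"
    if find_input_video(name) is None:
        print(f"Put {name}.mp4 or {name}.avi into the demo folder first.")
    else:
        print(" ".join(demo_command(name)))
        print("Expected output:", expected_output(name))
```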

If you want to run the evaluation on CPU only, modify demoTalkNet.py and talkNet.py: change every cuda to cpu, then replace line 83 in talkNet.py with loadedState = torch.load(path, map_location=torch.device('cpu'))
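Instead of editing every `cuda` occurrence by hand, one option is to pick the device once and reuse it. The sketch below is not how this repository is written: `pick_device` and `load_state_on_cpu` are hypothetical helpers, and only the `torch.load(..., map_location=torch.device('cpu'))` call comes from the instruction above.

```python
def pick_device(cuda_available: bool) -> str:
    """Return the device name to use, falling back to CPU when CUDA is missing."""
    return "cuda" if cuda_available else "cpu"


def load_state_on_cpu(path: str):
    """Load a saved checkpoint so that all tensors end up on the CPU."""
    # Imported lazily so that pick_device stays usable without PyTorch installed.
    import torch

    return torch.load(path, map_location=torch.device("cpu"))


if __name__ == "__main__":
    import torch

    device = pick_device(torch.cuda.is_available())
    print("Using device:", device)
```

With a helper like this, a tensor or model could be moved with `.to(device)` rather than a hard-coded `.cuda()` call.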

Citation

Please cite the following if our paper or code is helpful to your research.

@inproceedings{tao2021someone,
  title={Is Someone Speaking? Exploring Long-term Temporal Features for Audio-visual Active Speaker Detection},
  author={Tao, Ruijie and Pan, Zexu and Das, Rohan Kumar and Qian, Xinyuan and Shou, Mike Zheng and Li, Haizhou},
  booktitle = {Proceedings of the 29th ACM International Conference on Multimedia},
  pages = {3927--3935},
  year={2021}
}

I have summarized some potential FAQs. You can also check the issues on GitHub for other questions that I have answered.

This is my first open-source work. Please let me know if I can further improve this repository or if there is anything wrong in our work. Thanks for your support!

Acknowledgements

We studied many useful projects while writing our code, including:

The structure of the project layout and the audio encoder are learnt from this repository.

The demo for visualization is modified from this repository.

The AVA data download code is learnt from this repository.

The model for the visual frontend is learnt from this repository.

Thanks to these authors for open-sourcing their code!

Cooperation

If you are interested in working on this topic and have some ideas to implement, I am glad to collaborate and contribute my experience and knowledge on this topic. Please contact me at ruijie.tao@u.nus.edu.
