Vladimir Iashin (@v-iashin)

👨‍💻 Postdoc in VGG at the University of Oxford. Researcher in multi-modal machine learning.

Pinned

  1. video_features

    Extract video features from raw videos using multiple GPUs. We support RAFT flow frames as well as S3D, I3D, R(2+1)D, VGGish, CLIP, and TIMM models.

    Python · 624 stars · 102 forks

  2. SpecVQGAN

    Source code for "Taming Visually Guided Sound Generation" (Oral at BMVC 2021)

    Jupyter Notebook · 367 stars · 39 forks

  3. BMT

    Source code for "Bi-modal Transformer for Dense Video Captioning" (BMVC 2020)

    Jupyter Notebook · 228 stars · 56 forks

  4. MDVC

    PyTorch implementation of Multi-modal Dense Video Captioning (CVPR 2020 Workshops)

    Python · 143 stars · 20 forks

  5. Synchformer

    Source code for "Synchformer: Efficient Synchronization from Sparse Cues" (ICASSP 2024)

    Python · 94 stars · 9 forks

  6. SparseSync

    Source code for "Sparse in Space and Time: Audio-visual Synchronisation with Trainable Selectors" (Spotlight at BMVC 2022)

    Python · 53 stars · 10 forks

