This repo hosts the code and models of "Masked Autoencoders that Listen" (NeurIPS 2022).
- This repo follows the MAE repo; installation and preparation are the same as described there.
- Copy files and patch the timm package with `bash timm_patch.sh` (please change the path to your own timm package path). We use timm==0.3.2, for which a fix is needed to work with PyTorch 1.8.1+.
- Please find mae_env.yml for all the dependencies.
- You may also download the conda-packed conda env, untar it, and then activate it:
```bash
source path_to_env/bin/activate
```

Please download AudioSet here. Due to copyright we cannot release the data. The data annotation JSON parsed and used in this work is available here. The format follows the one in AST. Please be sure to modify the paths in the scripts to reflect your own setup.
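For reference, an AST-style annotation file is a JSON object with a `data` list, one entry per clip, giving the wav path and its comma-separated AudioSet label IDs. Below is a minimal sketch for writing such a file; the paths, output file name, and labels are placeholders, so check the released JSON for the exact fields:

```python
import json

# Hypothetical entries: replace the wav paths and AudioSet label IDs
# (e.g. "/m/09x0r" = Speech, "/m/04rlf" = Music) with your own data.
annotations = {
    "data": [
        {"wav": "/path/to/audioset/clip_0001.wav", "labels": "/m/09x0r"},
        {"wav": "/path/to/audioset/clip_0002.wav", "labels": "/m/09x0r,/m/04rlf"},
    ]
}

with open("audioset_train.json", "w") as f:
    json.dump(annotations, f, indent=2)
```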
For the brave ones who want to pre-train on AudioSet-2M, please use pretrain_audioset2M.sh:
```bash
bash pretrain_audioset2M.sh
```
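For background on what the pre-training script optimizes: AudioMAE masks a large fraction of spectrogram patches and reconstructs them, MAE-style. The sketch below shows only the per-sample random-masking step on patch embeddings; the shapes and the 0.8 masking ratio are illustrative, and this function is a sketch of the technique, not the repo's implementation:

```python
import torch

def random_masking(x: torch.Tensor, mask_ratio: float = 0.8):
    """Randomly mask patch tokens, MAE-style.

    x: (batch, num_patches, embed_dim) patch embeddings of a log-mel spectrogram.
    Returns the kept (visible) tokens and a binary mask (1 = masked).
    """
    n, l, d = x.shape
    len_keep = int(l * (1 - mask_ratio))

    noise = torch.rand(n, l)                   # uniform noise per patch
    ids_shuffle = torch.argsort(noise, dim=1)  # ascending: smallest noise is kept
    ids_keep = ids_shuffle[:, :len_keep]

    # Gather only the visible tokens; the encoder runs on these alone.
    x_visible = torch.gather(x, 1, ids_keep.unsqueeze(-1).expand(-1, -1, d))

    mask = torch.ones(n, l)
    mask.scatter_(1, ids_keep, 0.0)            # 0 = kept, 1 = masked
    return x_visible, mask

# e.g. 512 patches from a 1024-frame x 128-mel spectrogram cut into 16x16 patches
tokens = torch.randn(4, 512, 768)
visible, mask = random_masking(tokens, mask_ratio=0.8)
print(visible.shape, mask.sum(dim=1))          # (4, 102, 768), 410 masked per clip
```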
For fine-tuning from an AudioSet-pretrained model, please use your own pre-trained model from the previous step, or download our pre-trained ckpt and put it under ./ckpt/. Then run the script submit_ft_mask_bal.sh:

```bash
bash submit_ft_mask_bal.sh 2e-4 0.2 0.2 ./ckpt/pretrained.pth
```

This will perform weighted distributed sampling on the unbalanced AudioSet to fine-tune the model with class-balanced data for 100 epochs (a sketch of the idea follows the example log below). The resulting mAP on AudioSet should be around 47.3. We provide our fine-tuned checkpoint here. An example fine-tuning log is as follows:
```
[07:10:32.717347] log_dir: /checkpoint/berniehuang/experiments/419909
[07:10:36.394431] Epoch: [99]  [  0/781]  eta: 0:47:51  lr: 0.000001  loss: 0.0066 (0.0066)  time: 3.6761  data: 1.6724  max mem: 2606
[07:12:24.728503] Epoch: [99]  [500/781]  eta: 0:01:02  lr: 0.000001  loss: 0.0116 (0.0128)  time: 0.2130  data: 0.0002  max mem: 2606
[07:13:24.602830] Epoch: [99]  [780/781]  eta: 0:00:00  lr: 0.000001  loss: 0.0122 (0.0128)  time: 0.1837  data: 0.0003  max mem: 2606
[07:13:24.853957] Epoch: [99] Total time: 0:02:52 (0.2204 s / it)
[07:13:25.085416] Averaged stats: lr: 0.000001  loss: 0.0122 (0.0126)
[07:13:28.343364] Test:  [ 0/79]  eta: 0:02:01  time: 1.5353  data: 1.5029  max mem: 2606
[07:13:30.942012] Test:  [78/79]  eta: 0:00:00  time: 0.0206  data: 0.0001  max mem: 2606
[07:13:31.180169] Test: Total time: 0:00:04 (0.0554 s / it)
[07:13:42.547896] mAP: 0.472873
[07:13:42.552120] mAP of the network on the 19148 test images: 0.4728
[07:13:42.552198] Max mAP: 0.473
[07:13:42.566228] Training time 5:16:14
submitit INFO (2022-04-22 07:13:43,404) - Job completed successfully
```
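For intuition about the weighted sampling mentioned above, here is a minimal single-GPU sketch built on torch's WeightedRandomSampler; the repo uses a distributed variant, and the exact weighting scheme here (inverse class frequency, summed over a clip's labels) is an assumption meant to illustrate the idea, not the script's implementation:

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, WeightedRandomSampler

def per_sample_weights(labels_per_clip, num_classes):
    """labels_per_clip: list of label-index lists, one per audio clip."""
    class_count = np.zeros(num_classes)
    for labels in labels_per_clip:
        for c in labels:
            class_count[c] += 1
    # Rare classes get large weights; a multi-label clip sums its labels' weights.
    class_weight = 1.0 / np.maximum(class_count, 1)
    return np.array([class_weight[labels].sum() for labels in labels_per_clip])

# Toy 3-class set: label 0 is common, labels 1 and 2 are rare, so the clips
# carrying labels 1 and 2 are drawn more often than the label-0-only clips.
labels_per_clip = [[0], [0, 1], [0], [2]]
weights = per_sample_weights(labels_per_clip, num_classes=3)

sampler = WeightedRandomSampler(torch.as_tensor(weights), num_samples=4, replacement=True)
loader = DataLoader(list(range(4)), batch_size=2, sampler=sampler)
for batch in loader:
    print(batch)   # indices resampled toward the rare-label clips
```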
You can also try fine-tuning on AudioSet-20K for 60 epochs with:

```bash
sbatch ft_as.sh 1e-3 ./ckpt/pretrained.pth
```

The log.txt will look like:
{"train_lr": 2.1997867184321786e-06, "train_loss": 0.01310475811136991, "test_mAP": 0.36981118189071294, "epoch": 56, "n_parameters": 85659407}{"train_lr": 1.6171788925401227e-06, "train_loss": 0.01304934614071496, "test_mAP": 0.37001905352752995, "epoch": 57, "n_parameters": 85659407}{"train_lr": 1.2277041313086816e-06, "train_loss": 0.013038477757025324, "test_mAP": 0.36998449127640076, "epoch": 58, "n_parameters": 85659407}{"train_lr": 1.0325878664284776e-06, "train_loss": 0.012981618695671238, "test_mAP": 0.36999196624276054, "epoch": 59, "n_parameters": 85659407}The peformance on AudioSet-20K is around 37.0 mAP.
To run inference with the fine-tuned model, please put your fine-tuned model under ./ckpt, or download our fine-tuned ckpt. Then:
```bash
bash inf.sh ckpt/finetuned.pth
```

This should give you 47.3 mAP on AudioSet. An example log is as follows:
```
[18:22:12.877430] number of params (M): 85.66
[18:22:12.877460] base lr: 2.00e-03
[18:22:12.877479] actual lr: 1.25e-04
[18:22:12.877495] accumulate grad iterations: 1
[18:22:12.877511] effective batch size: 16
[18:22:12.898235] criterion = BCEWithLogitsLoss()
[18:22:14.068845] Test:  [   0/1197]  eta: 0:23:19  time: 1.1690  data: 1.0901  max mem: 1035
[18:22:55.447027] Test:  [ 300/1197]  eta: 0:02:06  time: 0.1402  data: 0.0001  max mem: 1046
[18:23:37.699615] Test:  [ 600/1197]  eta: 0:01:24  time: 0.1411  data: 0.0001  max mem: 1061
[18:24:20.110863] Test:  [ 900/1197]  eta: 0:00:41  time: 0.1417  data: 0.0001  max mem: 1075
[18:25:02.194206] Test:  [1196/1197]  eta: 0:00:00  time: 0.1526  data: 0.0001  max mem: 1090
[18:25:02.321579] Test: Total time: 0:02:49 (0.1415 s / it)
[18:25:11.997641] mAP: 0.472873
[18:25:12.004128] Accuracy of the network on the 19148 test images: 0.4729
```

Per-class APs can be found in ./aps.txt, and per-example results are in inf_output.npy.
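If you want to recompute metrics offline, mAP is the mean of the per-class average precisions. The sketch below uses scikit-learn; the layout of inf_output.npy and the existence of a matching multi-hot target file are assumptions for illustration, not the repo's documented interface:

```python
import numpy as np
from sklearn.metrics import average_precision_score

# Assumption: outputs are (num_examples, num_classes) scores saved by inf.sh,
# and targets is a matching multi-hot matrix built from the eval annotation json.
outputs = np.load("inf_output.npy")
targets = np.load("eval_targets.npy")   # hypothetical file; shapes must match

ap_per_class = [
    average_precision_score(targets[:, c], outputs[:, c])
    for c in range(targets.shape[1])
    if targets[:, c].sum() > 0          # skip classes absent from the eval set
]
print(f"mAP: {np.mean(ap_per_class):.4f} over {len(ap_per_class)} classes")
```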
Checkpoints:

- ViT-B, AS-2M pretrained
- ViT-B, AS-2M pretrained+finetuned
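A hedged sketch of loading one of these checkpoints into your own code; the assumption that the weights live under a "model" key follows MAE-style checkpoints and may not hold, so inspect the file if it fails:

```python
import torch

ckpt = torch.load("ckpt/pretrained.pth", map_location="cpu")
state_dict = ckpt.get("model", ckpt)   # fall back to a bare state_dict
print(f"{len(state_dict)} tensors, e.g. {next(iter(state_dict))}")
# model.load_state_dict(state_dict, strict=False)  # into your ViT-B definition
```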
TODO:

- Code and Model Release
- Provide conda-pack envs
- Notebook demos for reconstruction (legal blocked)
- Additional exps
```
@inproceedings{huang2022amae,
  title     = {Masked Autoencoders that Listen},
  author    = {Huang, Po-Yao and Xu, Hu and Li, Juncheng and Baevski, Alexei and Auli, Michael and Galuba, Wojciech and Metze, Florian and Feichtenhofer, Christoph},
  booktitle = {NeurIPS},
  year      = {2022}
}
```

Please contact Bernie Huang (berniehuang@meta.com) if you have any questions. Thank you.
The codebase is based on the awesome MAE and AST repos.
This project is under the CC-BY 4.0 license. See LICENSE for details.