Inspired by the convolutional recurrent neural network (CRNN) and Inception, we propose a multiscale time-frequency convolutional recurrent neural network (MTF-CRNN) for audio event detection. Our goal is to improve audio event detection performance and recognize target audio events that have different lengths and accompany the complex audio back…
zhang201882/MTF-CRNN
MTF-CRNN: Multi-scale Time-Frequency Convolutional Recurrent Neural Network For Sound Event Detection
Based on DCASE2017 task 2.

References:
- Toni Heittola: baseline system, DCASE Framework, documentation (toni.heittola@tut.fi, http://www.cs.tut.fi/~heittolt/, https://github.com/toni-heittola)
- Aleksandr Diment: dataset synthesis (Task 2) (aleksandr.diment@tut.fi, http://www.cs.tut.fi/~diment/)
- Annamaria Mesaros: documentation (annamaria.mesaros@tut.fi, http://www.cs.tut.fi/~mesaros/)

Documentation
See https://tut-arg.github.io/DCASE2017-baseline-system/ for detailed instructions, manuals and tutorials.
Getting started
1. Clone the repository from GitHub or download the latest release.
2. Install the requirements: pip install -r requirements.txt
3. Run the application with default settings: python applications/task2.py

System description
This is the Multi-scale Time-Frequency Convolutional Recurrent Neural Network For Sound Event Detection (MTF-CRNN) system for task 2 of the Detection and Classification of Acoustic Scenes and Events 2017 (DCASE2017) challenge.
The code is based on the DCASE2017 baseline system. We propose a multi-scale time-frequency convolutional recurrent neural network (MTF-CRNN) for sound event detection. We use four groups of parallel and serial convolutional kernels to learn high-level shift-invariant features from the time and frequency domains of acoustic samples. A two-layer bi-directional gated recurrent unit (GRU) then captures the temporal context of the extracted high-level features. The proposed method is evaluated on two different sound event datasets. Compared with the baseline and other methods, it substantially improves performance as a single model with few parameters and no pre-training. On the TUT Rare Sound Events 2017 evaluation dataset, our method achieves an error rate (ER) of 0.09 ± 0.01, a relative improvement of 83% over the baseline. On the TAU Spatial Sound Events 2019 evaluation dataset, it achieves an ER of 0.11 ± 0.01, a relative improvement of 61% over the baseline, and both F1 and ER are better than on the development dataset. Compared with state-of-the-art methods, the proposed network achieves highly competitive detection performance with few parameters and good generalization.
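The ER and F1 figures above follow the standard sound event detection definitions: errors are split into substitutions S, deletions D and insertions I, ER = (S + D + I) / N where N is the number of active reference events, and F1 = 2TP / (2TP + FP + FN). A relative improvement is the ER reduction divided by the baseline ER. The sketch below computes these quantities with segment-based counting, assuming each segment is given as a set of active labels; the function names and toy labels are hypothetical. The official DCASE scoring uses the sed_eval toolbox, and task 2 is scored event-based (onset matching within a tolerance collar), so this is a simplified illustration, not the challenge scorer.

```python
from typing import List, Set, Tuple


def segment_based_er_f1(reference: List[Set[str]], estimated: List[Set[str]]) -> Tuple[float, float]:
    """Segment-based ER and F1 from per-segment sets of active event labels (illustrative sketch)."""
    if len(reference) != len(estimated):
        raise ValueError("reference and estimated must cover the same segments")
    tp = fp = fn = subs = dels = ins = n_ref = 0
    for ref, est in zip(reference, estimated):
        seg_tp, seg_fp, seg_fn = len(ref & est), len(est - ref), len(ref - est)
        tp, fp, fn = tp + seg_tp, fp + seg_fp, fn + seg_fn
        subs += min(seg_fn, seg_fp)      # a miss plus a false alarm in one segment = one substitution
        dels += max(0, seg_fn - seg_fp)  # remaining misses are deletions
        ins += max(0, seg_fp - seg_fn)   # remaining false alarms are insertions
        n_ref += len(ref)
    er = (subs + dels + ins) / n_ref if n_ref else float("nan")
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else float("nan")
    return er, f1


def relative_improvement(baseline_er: float, system_er: float) -> float:
    """Relative ER reduction with respect to a baseline, as a fraction (0.83 means 83%)."""
    return (baseline_er - system_er) / baseline_er


# Toy example with hypothetical labels, one set per segment.
ref = [{"babycry"}, {"babycry"}, set(), {"gunshot"}]
est = [{"babycry"}, set(), {"glassbreak"}, {"glassbreak"}]
print(segment_based_er_f1(ref, est))  # (1.0, 0.3333333333333333)
```

Segment 2 is a deletion, segment 3 an insertion and segment 4 a substitution, so ER = 3/3 = 1.0 even though one of the three reference events was detected. This is why ER can exceed 1 and why it is reported alongside F1.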
The main approach implemented in the system:
- Acoustic features: log mel-band energies extracted in 40 ms windows with a 20 ms hop size.
- Machine learning: a neural network approach using the multi-scale time-frequency convolutional recurrent neural network (MTF-CRNN) for sound event detection (300 neurons per layer and 20% dropout between layers).

Directory layout
.
├── applications               # Task-specific applications (task2.py)
│   └── parameters             # Default parameters for the applications
├── dcase_framework            # DCASE Framework code
│   ├── application_core.py    # The main body for the applications
│   └── pytorch_utils.py       # The model code
├── README.md                  # This file
└── requirements.txt           # External module dependencies
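The model itself lives in dcase_framework/pytorch_utils.py. To show the data flow without the framework, here is a minimal NumPy sketch: log mel-band energies with the 40 ms / 20 ms settings listed above, four parallel convolution branches with time-only, frequency-only, joint and pointwise kernels, two stacked bi-directional GRU layers, and a sigmoid per frame. Everything beyond those settings is an assumption for illustration: the 44.1 kHz sample rate, Hamming window, HTK mel formula, 40 mel bands, kernel shapes, hidden size of 32, and all function names are hypothetical. Weights are random and untrained, and pooling, batch normalization, serial kernel stacking and dropout are omitted, so this is not the repository's model.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# Random, untrained weights: this only shows tensor shapes and data flow.
rng = np.random.default_rng(0)

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + np.asarray(f) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m) / 2595.0) - 1.0)

def log_mel_energies(signal, sr=44100, win_s=0.04, hop_s=0.02, n_mels=40, eps=1e-10):
    """Log mel-band energies from Hamming-windowed frames (assumed HTK mel scale)."""
    win_len, hop_len = int(round(win_s * sr)), int(round(hop_s * sr))
    if len(signal) < win_len:
        raise ValueError("signal is shorter than one analysis window")
    frames = sliding_window_view(signal, win_len)[::hop_len] * np.hamming(win_len)
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2  # (frames, win_len // 2 + 1)
    fft_freqs = np.fft.rfftfreq(win_len, d=1.0 / sr)
    edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2))
    fbank = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):  # triangular filter between edges[i] and edges[i + 2]
        rising = (fft_freqs - edges[i]) / (edges[i + 1] - edges[i])
        falling = (edges[i + 2] - fft_freqs) / (edges[i + 2] - edges[i + 1])
        fbank[i] = np.maximum(0.0, np.minimum(rising, falling))
    return np.log(power @ fbank.T + eps)  # (frames, n_mels)

def conv2d_same(x, kernel):
    """Single-channel 2-D cross-correlation with zero 'same' padding (odd kernel sizes)."""
    kt, kf = kernel.shape
    padded = np.pad(x, ((kt // 2, kt // 2), (kf // 2, kf // 2)))
    return np.einsum("tfij,ij->tf", sliding_window_view(padded, (kt, kf)), kernel)

def multiscale_branches(spec, kernel_shapes=((5, 1), (1, 5), (3, 3), (1, 1))):
    """Four parallel branches (time-only, frequency-only, joint, pointwise kernels) with ReLU."""
    maps = [np.maximum(conv2d_same(spec, 0.1 * rng.standard_normal(k)), 0.0) for k in kernel_shapes]
    return np.stack(maps)  # (branches, frames, mel bands)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_pass(xs, W, U, b):
    """One GRU direction over xs (T, D); W: (3, H, D), U: (3, H, H), b: (3, H); gates r, z, n."""
    h = np.zeros(U.shape[1])
    outputs = []
    for x in xs:
        r = sigmoid(W[0] @ x + U[0] @ h + b[0])  # reset gate
        z = sigmoid(W[1] @ x + U[1] @ h + b[1])  # update gate
        n = np.tanh(W[2] @ x + U[2] @ (r * h) + b[2])  # candidate state
        h = (1.0 - z) * n + z * h
        outputs.append(h)
    return np.stack(outputs)

def bigru(xs, hidden):
    """Bi-directional GRU layer: forward and time-reversed passes, concatenated per frame."""
    d = xs.shape[1]
    fwd, bwd = [(0.1 * rng.standard_normal((3, hidden, d)),
                 0.1 * rng.standard_normal((3, hidden, hidden)),
                 np.zeros((3, hidden))) for _ in range(2)]
    forward = gru_pass(xs, *fwd)
    backward = gru_pass(xs[::-1], *bwd)[::-1]
    return np.concatenate([forward, backward], axis=1)  # (T, 2 * hidden)

def mtf_crnn_like_forward(spec, hidden=32, n_classes=1):
    feats = multiscale_branches(spec)  # (branches, T, F)
    n_br, t, f = feats.shape
    seq = feats.transpose(1, 0, 2).reshape(t, n_br * f)  # one feature vector per frame
    seq = bigru(bigru(seq, hidden), hidden)  # two stacked bi-GRU layers
    w_out = 0.1 * rng.standard_normal((2 * hidden, n_classes))
    return sigmoid(seq @ w_out)  # frame-wise event activity in (0, 1)

sr = 44100
t = np.arange(sr) / sr  # 1 s of synthetic audio
audio = 0.1 * np.sin(2 * np.pi * 1000.0 * t) + 0.01 * rng.standard_normal(sr)
spec = log_mel_energies(audio, sr=sr)
print(spec.shape)  # (49, 40)
print(mtf_crnn_like_forward(spec).shape)  # (49, 1)
```

At 44.1 kHz the 40 ms window is 1764 samples and the 20 ms hop is 882 samples, so one second of audio yields 1 + (44100 - 1764) // 882 = 49 frames. Keeping the time axis at full resolution through the convolutions is what lets the GRU emit one activity value per frame.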
Installation
The system is developed for Python 3.6.5 and is tested to work on Linux.
To get started, run the command:
python3 task2.py
See the documentation for more detailed instructions.

References:
MTF-CRNN: Multi-scale Time-Frequency Convolutional Recurrent Neural Network For Sound Event Detection

License
The DCASE Framework and the baseline system are released only for academic research under EULA.pdf from Tampere University of Technology.