Chetupalli et al., 2024
| Publication | Title | Publication Date |
|---|---|---|
| Veluri et al. | Real-time target sound extraction | |
| Chazan et al. | Multi-microphone speaker separation based on deep DOA estimation | |
| Mousazadeh et al. | Voice activity detection in presence of transient noise using spectral clustering | |
| Phan et al. | Random regression forests for acoustic event detection and classification | |
| Seetharaman et al. | Class-conditional embeddings for music source separation | |
| Zhang et al. | X-TaSNet: Robust and accurate time-domain speaker extraction network | |
| Maiti et al. | EEND-SS: Joint end-to-end neural speaker diarization and speech separation for flexible number of speakers | |
| Friedland et al. | The ICSI RT-09 speaker diarization system | |
| Lee et al. | Feature extraction based on the non-negative matrix factorization of convolutional neural networks for monitoring domestic activity with acoustic signals | |
| Rosner et al. | Classification of music genres based on music separation into harmonic and drum components | |
| Chetupalli et al. | Speaker counting and separation from single-channel noisy mixtures | |
| Xu et al. | Towards weakly supervised text-to-audio grounding | |
| Phan et al. | A multi-channel fusion framework for audio event detection | |
| Chetupalli et al. | Speech Separation for an Unknown Number of Speakers Using Transformers With Encoder-Decoder Attractors | |
| Jiang et al. | Unified audio event detection | |
| Köpüklü et al. | ResectNet: An Efficient Architecture for Voice Activity Detection on Mobile Devices | |
| Sofer et al. | CNN self-attention voice activity detector | |
| Makishima et al. | Joint autoregressive modeling of end-to-end multi-talker overlapped speech recognition and utterance-level timestamp prediction | |
| Zhang et al. | Multi-level speaker representation for target speaker extraction | |
| Petermann et al. | Hyperbolic distance-based speech separation | |
| Chetupalli et al. | A Unified Approach to Speaker Separation and Target Speaker Extraction Using Encoder-Decoder Based Attractors | |
| Zhang et al. | Audio-visual speech separation with adversarially disentangled visual representation | |
| Xiao et al. | Improved source counting and separation for monaural mixture | |
| US20240304205A1 (en) | System and Method for Audio Processing using Time-Invariant Speaker Embeddings | |
| Alam et al. | An ensemble approach to unsupervised anomalous sound detection | |