Abstract
To achieve high-efficiency blind identification (BI) of underdetermined speech mixing systems without degrading recovery, this paper proposes a novel BI scheme based on effective pattern recognition and the find-density-peaks (FDP) clustering algorithm. To lower the computational complexity of BI, a 3-step effective pattern recognition procedure is proposed, consisting of voiced-sound pattern sifting, spectrum-correction-based harmonic representation, and phase-uniformity-based single-active-source (SAS) pattern recognition. Furthermore, a 5-step FDP clustering procedure is summarized and used to determine the source number and estimate all the columns of the mixing matrix. Experimental results show that the proposed 3-step effective pattern recognition procedure condenses the original 56383 time-frequency (TF) patterns into only 194 effective SAS patterns, which considerably alleviates the computational burden of BI. Moreover, with FDP clustering, the source number can be determined intuitively and readily, and the mixing matrix can be estimated with a higher recovery SNR than with existing BI schemes. Because harmonic-like components arise in a wide range of applications, the proposed BI scheme has considerable potential in other harmonics-related blind-signal-separation (BSS) fields, such as mechanical vibration analysis and channel estimation in communications.
Acknowledgements
This work was financially supported by Qingdao National Laboratory for Marine Science and Technology under Grant No. QNLM2016OPR0411.
Author information
Authors and Affiliations
School of Electrical and Information Engineering, Tianjin University, Tianjin, 300072, China
Xiangdong Huang, Lin Yang, Runan Song & Wei Lu
Corresponding author
Correspondence to Xiangdong Huang.
Appendix A: Dataflow of ratio-based spectrum correction
Given the Hanning-windowed STFT spectrograms of \(M\) mixtures \(X_{m}(t_{0}, k{\Delta}f)\), \(m = 1,\ldots,M\), \({\Delta}f = f_{s}/L\) (written \(X_{m}(t_{0}, k)\) for simplicity) at some moment \(t = t_{0}\), the results of spectrum correction (i.e., the outputs \(\hat{f}_{m}, \hat{d}_{m}, \hat{\phi}_{m}\)) are acquired by the following steps:
Step 1 Collect all the large-amplitude peak indices \(k^{*}\) of \(X_{m}(t_{0}, k)\). For each index \(k^{*}\), calculate the amplitude ratio \(v\) between \(X_{m}(t_{0}, k^{*})\) and its sub-peak neighbor, i.e.,
$$ v = \frac{ |X_{m}(t_{0}, k^{*})| }{ \max \{ |X_{m}(t_{0}, k^{*}-1)|, |X_{m}(t_{0}, k^{*}+1)| \} }. \tag{16} $$
Further, a variable \(u\) can be calculated as
$$ u = (2-v)/(1+v). \tag{17} $$
Step 2 Estimate the aforementioned frequency offset \(\hat{\delta}\) as
$$ \hat{\delta} = \left\{ \begin{array}{rl} u, ~&\text{if} \ |X_{m}(t_{0}, k^{*}+1)| > |X_{m}(t_{0}, k^{*}-1)| \\ -u, ~&\text{else} \end{array}\right. \tag{18} $$
Then, the frequency estimate is \(\hat{f}_{m} = (k^{*}+\hat{\delta}) f_{s}/L\).
Step 3 Acquire the amplitude estimate \(\hat{d}_{m}\) and phase estimate \(\hat{\phi}_{m}\) as
$$ \hat{d}_{m} = 2 \pi \hat{\delta} (1-\hat{\delta}^{2}) |X_{m}(t_{0}, k^{*})| / \sin (\pi \hat{\delta}), \tag{19} $$
$$ \hat{\phi}_{m} = \text{ang}[X_{m}(t_{0}, k^{*})] - \pi \hat{\delta} (L-1)/L, \tag{20} $$
where ang(⋅) refers to the angle operation.
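Steps 1 to 3 can be illustrated with a minimal Python sketch for a single frame and a single peak. It is a sketch under stated assumptions, not the authors' implementation: the function name `ratio_spectrum_correction` is hypothetical, the frame is real-valued, the window is NumPy's symmetric `np.hanning`, the spectrum is scaled by \(2/L\) (a normalization the appendix does not state) so that Eq. (19) returns the cosine amplitude, and only the strongest peak is corrected instead of all large-amplitude peaks.

```python
import numpy as np


def ratio_spectrum_correction(frame, fs):
    """Correct the strongest spectral peak of one real-valued frame.

    Returns (f_hat, d_hat, phi_hat) following Eqs. (16)-(20).
    Assumption: the spectrum is scaled by 2/L, which the source does not specify.
    """
    L = len(frame)
    # Hanning-windowed spectrum of the frame (one column of the STFT).
    X = np.fft.rfft(frame * np.hanning(L)) * 2.0 / L
    mag = np.abs(X)

    # Step 1: peak index k* (interior bins only, so both neighbours exist)
    # and the amplitude ratio v of Eq. (16), then u of Eq. (17).
    k = int(np.argmax(mag[1:-1])) + 1
    v = mag[k] / max(mag[k - 1], mag[k + 1])
    u = (2.0 - v) / (1.0 + v)

    # Step 2: signed frequency offset, Eq. (18), and the frequency estimate.
    delta = u if mag[k + 1] > mag[k - 1] else -u
    f_hat = (k + delta) * fs / L

    # Step 3: amplitude, Eq. (19). np.sinc(x) = sin(pi x) / (pi x), so
    # 2*pi*delta*(1 - delta^2) / sin(pi*delta) = 2*(1 - delta^2) / sinc(delta),
    # which also stays finite when delta is exactly zero.
    d_hat = 2.0 * (1.0 - delta**2) * mag[k] / np.sinc(delta)

    # Phase, Eq. (20), wrapped back into (-pi, pi].
    phi = np.angle(X[k]) - np.pi * delta * (L - 1) / L
    phi_hat = float(np.angle(np.exp(1j * phi)))
    return f_hat, d_hat, phi_hat


# Usage: one off-bin sinusoid (bin spacing fs/L = 7.8125 Hz).
fs, L = 8000, 1024
n = np.arange(L)
x = 1.5 * np.cos(2 * np.pi * 1003.7 * n / fs + 0.6)
f_hat, d_hat, phi_hat = ratio_spectrum_correction(x, fs)
print(f_hat, d_hat, phi_hat)
```

In this usage example the true frequency lies between two bins, so the raw peak bin alone would be off by a few hertz; the ratio-based offset \(\hat{\delta}\) is what recovers the frequency, amplitude, and phase of the tone from just the peak and its larger neighbor.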
Cite this article
Huang, X., Yang, L., Song, R. et al. Effective pattern recognition and find-density-peaks clustering based blind identification for underdetermined speech mixing systems. Multimed Tools Appl 77, 22115–22129 (2018). https://doi.org/10.1007/s11042-018-5619-z