CN106971703A - Song synthesis method and device based on HMM - Google Patents

Song synthesis method and device based on HMM

Info

Publication number: CN106971703A (application CN201710160104.2A)
Authority: CN (China)
Prior art keywords: HMM, model, speaker, voice, synthesis
Legal status: Pending (the listed status is an assumption, not a legal conclusion)
Original language: Chinese (zh)
Inventors: 杨鸿武, 赵娜, 冯欢, 甘振业
Current and original assignee: Northwest Normal University (the listed assignee may be inaccurate)
Priority/filing date: 2017-03-17
Publication date: 2017-07-21

Abstract

The invention discloses an HMM-based song synthesis method and device. Using text-to-speech (TTS) technology, the HMM-based speech synthesis system HTS, and the STRAIGHT algorithm, it builds a speaker-dependent acoustic model for song synthesis and a melody control model for songs, and performs speaker-adaptive training, realizing a personalized HMM-based speech synthesis device that converts lyrics to songs in real time. The device enriches the research content of speech synthesis and makes the synthesized voice more expressive and emotional; in particular, it gives music lovers an opportunity to learn technical operations such as song production and music processing, and it adds to the social resources available to people, which gives it practical value and significance.

Description

Song synthesis method and device based on HMM

Technical Field

The present invention relates to the fields of human-computer interaction, text-to-speech (TTS) conversion, and speech synthesis, and specifically to an HMM-based song synthesis method and device.

Background

With the continuous innovation and improvement of information technology, many music-related multimedia applications of human-computer interaction have entered daily life, such as computer-based song requests, composition, singing-voice enhancement, and song recognition on mobile phones. Making computers more human, able to "sing" like people, that is, automatically producing a beautiful, pleasant singing voice given a numbered musical score and lyrics, has become a new demand. The rapid development of multimedia technology in entertainment also provides broader application space for this technology.

At present, most music is recorded and distributed in digital formats such as WAV, MP3, MIDI, and real-time music streaming. Compared with traditional music production, digital music has unmatched advantages in production, storage, and distribution. With a computer, a creator can hear the result while composing, and any modification to the score is fed back immediately, without the traditional cycle of rehearsal, performance, recording, and editing. This greatly reduces the production cycle and labor cost of music and also prevents composers from losing fleeting inspiration during a long creative process.

Speech synthesis is an important research topic in human-computer interaction and an important component of embedded-systems research. Singing-voice synthesis has gradually become a hot topic as well. Before singing-voice synthesis emerged, however, speech synthesis was already relatively mature, and some researchers tried to synthesize singing with speech-synthesis methods. Singing and speech differ to some degree: speech emphasizes content (though it can also express the speaker's intention and emotion), while singing emphasizes the rendering and fluctuation of melody, so speech-synthesis methods cannot be applied directly to singing synthesis.

Over long-term research at home and abroad, singing-voice synthesis, like speech synthesis, has developed three mainstream approaches: (1) waveform-concatenation synthesis; (2) parametric synthesis; (3) speech-modification synthesis. Concatenative and parametric synthesis are both corpus-based and yield limited sound quality, while speech modification is more flexible: the acoustic parameters of a speech signal are modified according to melody information to produce singing. Personalized real-time lyrics-to-singing conversion has been proposed at home and abroad: singing is produced immediately from a song's score information, taking continuous speech of the song's lyrics as input. After the speech corresponding to the lyrics is recorded, such a system uses the Viterbi algorithm to select among continuous synthesis units and the pitch-synchronous overlap-add (PSOLA) method to convert pitch, duration, energy, and spectrum in real time and synthesize the singing. Because that system did not account for acoustic differences between speech and singing, such as pitch and duration, the synthesis results were unsatisfactory. On this basis, a large-corpus lyrics-to-singing system was proposed that achieved better naturalness and sound quality; it built three Mandarin corpora and used the Viterbi algorithm to determine the optimal combination of synthesis units. The drawback of this approach is that building the corpora costs a great deal of time and human effort.

Therefore, those skilled in the art have been working to develop a new HMM-based personalized song synthesis method and device for users with music-processing needs.

Summary of the Invention

In view of the above defects of the prior art, the present invention addresses the problems raised in the background, namely the limited research on Chinese singing-voice synthesis, the low quality of the synthesized sound, and the time- and labor-intensive operation, and provides an HMM-based personalized song synthesis method and device for users with music-processing needs.

To solve the above technical problems, the present invention provides the following technical solutions:

An HMM-based song synthesis method, comprising the following steps:

A. Analyze the differences in acoustic features between speech and singing, and build a melody control model for the singing voice;

B. Build a speaker-dependent HMM-based acoustic model for song synthesis;

C. Synthesize the singing voice with the HMM-based speech synthesis system.

Further, the specific steps for analyzing the differences in acoustic features between speech and singing in step A are as follows:

a. Perform spectral analysis of the speech signal using time-domain and frequency-domain analysis, and compare the fundamental frequency of the speech signal with that of the singing signal;

b. Use MIDI technology to extract the required musical-score information from the MIDI system;

c. Read the melody information of the score extracted from the MIDI file and analyze the structural features of the score file to obtain music parameter information, including channel number, note pitch, key velocity, note onset time, and note duration.
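As an illustration of this extraction step, a minimal sketch follows. It assumes the third-party mido library and its seconds-based message iteration; the patent does not name a specific MIDI toolkit.

```python
import mido  # third-party MIDI library (an assumption; the patent names no toolkit)

def extract_note_parameters(path):
    """Collect (channel, pitch, velocity, onset, duration) tuples from a MIDI file."""
    notes, pending, now = [], {}, 0.0
    for msg in mido.MidiFile(path):       # iteration yields messages with .time as delta seconds
        now += msg.time
        if msg.type == 'note_on' and msg.velocity > 0:
            pending[(msg.channel, msg.note)] = (now, msg.velocity)
        elif msg.type in ('note_off', 'note_on'):   # a note_on with velocity 0 also ends a note
            key = (msg.channel, msg.note)
            if key in pending:
                onset, velocity = pending.pop(key)
                notes.append((msg.channel, msg.note, velocity, onset, now - onset))
    return notes
```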

Further, the melody control model of the singing voice in step A includes a fundamental frequency (F0) control model and a duration control model; the F0 control model converts the discrete pitches in the score into a continuous F0 curve, and the duration control model yields the articulation duration of each sung note.
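A minimal sketch of the F0 control idea: discrete MIDI pitches become a frame-level contour, here under equal temperament (MIDI note 69 = A4 = 440 Hz) with a crude moving-average smoothing. The patent does not specify the interpolation it uses, so the smoothing below is an assumption.

```python
import numpy as np

def midi_pitch_to_f0_curve(notes, frame_shift=0.005):
    """notes: list of (pitch, onset, duration) in seconds -> frame-level F0 contour in Hz."""
    total = max(onset + dur for _, onset, dur in notes)
    t = np.arange(0.0, total, frame_shift)
    f0 = np.zeros_like(t)
    for pitch, onset, dur in notes:
        hz = 440.0 * 2.0 ** ((pitch - 69) / 12.0)   # equal-temperament pitch-to-Hz conversion
        f0[(t >= onset) & (t < onset + dur)] = hz
    voiced = f0 > 0
    f0[voiced] = np.convolve(f0, np.ones(9) / 9.0, mode='same')[voiced]  # crude smoothing
    return t, f0
```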

Further, building the speaker-dependent HMM-based acoustic model for song synthesis in step B comprises the following steps:

a. Analyze the speech data of the speakers' corpora to obtain the acoustic parameters, including fundamental frequency F0, duration, spectrum SP, and aperiodicity index AP; then use HMM-based speaker-adaptive training to train an average voice model of the mixed speech;

b. Using a small amount of speech data from the target speaker to be synthesized, obtain the target speaker's adaptive acoustic model through speaker-adaptive transformation, and correct and update the adaptive model.

Further, training the average voice model of the mixed speech through HMM-based speaker-adaptive training comprises the following steps:

a. Perform speech analysis on the source speakers' corpora and the target speaker's corpus, extract the acoustic parameters (Mel-cepstral coefficients), and compute their first-order and second-order differences;
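For illustration, the first- and second-order differences over a Mel-cepstral sequence can be computed as below; the two-frame regression window is a typical choice assumed here, not one mandated by the patent.

```python
import numpy as np

def append_deltas(mcep):
    """mcep: (frames, order) Mel-cepstra -> (frames, 3*order) statics + delta + delta-delta."""
    padded = np.pad(mcep, ((1, 1), (0, 0)), mode='edge')
    delta = (padded[2:] - padded[:-2]) / 2.0         # delta[t] = (c[t+1] - c[t-1]) / 2
    padded2 = np.pad(delta, ((1, 1), (0, 0)), mode='edge')
    delta2 = (padded2[2:] - padded2[:-2]) / 2.0      # same regression applied to the deltas
    return np.concatenate([mcep, delta, delta2], axis=1)
```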

b. Combined with the context attribute set, train the HMM models of the spectral and F0 parameters and the multi-space-distribution hidden semi-Markov model (MSD-HSMM) of the state duration parameters;

c. Using a small speech corpus of the target speaker, perform speaker-adaptive training to obtain the average voice model of the mixed speech, and thereby the context-dependent MSD-HSMM.
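For context, the multi-space distribution named above is what lets a single stream model F0, which is continuous in voiced frames and undefined in unvoiced frames. In the standard MSD formulation, which the patent invokes but does not write out, the state output probability is

$$ b_i(o) = \sum_{g \in S(o)} w_{i,g}\, \mathcal{N}_{i,g}\!\bigl(V(o)\bigr), $$

where $S(o)$ is the set of sub-spaces (voiced or unvoiced) consistent with the observation $o$, $w_{i,g}$ are the space weights of state $i$, and $V(o)$ is the continuous part of the observation.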

Further, obtaining the target speaker's adaptive acoustic model from a small amount of the target speaker's speech data through speaker-adaptive transformation, and correcting and updating the adaptive model, comprises the following steps:

a. After speaker-adaptive training, use the HSMM-based CMLLR adaptation algorithm to compute the mean vectors and covariance matrices of the speaker-transformed state output probability distributions and state duration probability distributions. The transformation equations for the feature vector o and the state duration d in state i are:

$$ b_i(o) = \mathcal{N}\bigl(o;\, A\mu_i - b,\; A\Sigma_i A^{\mathsf T}\bigr) = \lvert A^{-1} \rvert\, \mathcal{N}\bigl(W\xi;\, \mu_i, \Sigma_i\bigr) $$

$$ p_i(d) = \mathcal{N}\bigl(d;\, \alpha m_i - \beta,\; \alpha \sigma_i^2 \alpha\bigr) = \lvert \alpha^{-1} \rvert\, \mathcal{N}\bigl(\alpha\psi;\, m_i, \sigma_i^2\bigr) $$

where $\xi = [o^{\mathsf T}, 1]^{\mathsf T}$ and $\psi = [d, 1]^{\mathsf T}$; $\mu_i$ is the mean of the state output distribution and $m_i$ the mean of the duration distribution; $\Sigma_i$ is the diagonal covariance matrix and $\sigma_i^2$ the variance; $W = [A^{-1}, b^{-1}]$ is the linear transformation matrix of the target speaker's state output probability density distribution, and $X = [\alpha^{-1}, \beta^{-1}]$ is the transformation matrix of the state duration probability density distribution;
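A toy illustration of applying such an affine transform to a Gaussian state (shapes only; estimating A, b, α, β is the maximum-likelihood step described next):

```python
import numpy as np

def apply_cmllr(mu, sigma, A, b):
    """Transform a Gaussian state output N(mu, sigma) under the mapping o -> A o - b."""
    return A @ mu - b, A @ sigma @ A.T

# toy usage with a 3-dimensional state
mu = np.array([1.0, 0.5, -0.2])
sigma = np.diag([0.10, 0.20, 0.15])
A = 1.1 * np.eye(3)
b = np.array([0.05, 0.0, -0.02])
print(apply_cmllr(mu, sigma, A, b))
```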

b. Through the HSMM-based adaptive transformation algorithm, the spectral, F0, and duration parameters of the speech data can be normalized and transformed; for adaptation data O of length T, maximum-likelihood estimation can be performed on the transform Λ = (W, X);
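Written out, this maximum-likelihood criterion is simply

$$ \hat{\Lambda} = (\hat{W}, \hat{X}) = \operatorname*{arg\,max}_{\Lambda}\; P(O \mid \lambda, \Lambda), $$

where $\lambda$ is the HSMM parameter set; this restates the sentence above in the standard CMLLR form.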

c. The maximum a posteriori (MAP) algorithm is used to correct and update the adaptive speech model. For a given HSMM parameter set λ, with forward and backward probabilities α_t(i) and β_t(i) respectively, the generation probability χ_t^d(i) of the continuous observation sequence o_{t−d+1} … o_t in state i can be written down from these quantities.
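The formula itself is standard in the HSMM adaptation literature; assuming the patent follows that convention, the generation probability reads

$$ \chi_t^d(i) = \frac{1}{P(O \mid \lambda)} \sum_{j \ne i} \alpha_{t-d}(j)\, a_{ji}\, p_i(d) \prod_{s=t-d+1}^{t} b_i(o_s)\, \beta_t(i), $$

with $a_{ji}$ the transition probabilities; this is offered as the standard form rather than a verbatim reproduction of the patent's expression.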

The MAP estimation is described as follows:
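Assuming, as above, the standard HSMM MAP updates, the re-estimated state output and duration means take the form

$$ \hat{\mu}_i = \frac{\omega\,\bar{\mu}_i + \sum_{t=1}^{T}\sum_{d=1}^{t} \chi_t^d(i) \sum_{s=t-d+1}^{t} o_s}{\omega + \sum_{t=1}^{T}\sum_{d=1}^{t} \chi_t^d(i)\, d}, \qquad \hat{m}_i = \frac{\tau\,\bar{m}_i + \sum_{t=1}^{T}\sum_{d=1}^{t} \chi_t^d(i)\, d}{\tau + \sum_{t=1}^{T}\sum_{d=1}^{t} \chi_t^d(i)}, $$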

where $\bar{\mu}_i$ and $\bar{m}_i$ are the mean vectors after the linear-regression transformation; ω and τ are the MAP estimation parameters of the state output and duration distributions, respectively; and $\hat{\mu}_i$ and $\hat{m}_i$ are the weighted-average MAP estimates of the adaptive mean vectors $\bar{\mu}_i$ and $\bar{m}_i$.

Further, the speech analysis and synthesis method used in step C to synthesize the singing voice with the HMM-based speech synthesis system is based on the STRAIGHT algorithm.

Further, synthesizing the singing voice with the HMM-based speech synthesis system in step C comprises the following steps:

a. Analyze the input lyrics text with a text analysis tool: a text analysis program converts the given lyrics text into an acoustic label sequence containing context description information; the decision trees obtained by clustering during training predict the context-dependent HMM for each phone in its context, and these models are concatenated into a sentence HMM;

b. From the MIDI file, obtain the pitch and length of each note of the lyrics, obtain the corresponding fundamental frequency and duration through the melody control model, and use the note durations to modify the durations of each syllable's spectrum SP, aperiodicity index AP, and fundamental frequency F0;

c. Use the speaker-dependent acoustic model and the STRAIGHT algorithm to generate the parameter sequences of spectrum SP, aperiodicity index AP, duration, and fundamental frequency F0 from the sentence HMM, synthesize the speech, and then add the musical accompaniment to complete the song synthesis.
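Taken together, steps a through c amount to the pipeline sketched below. Every function name in the sketch (analyze_lyrics, predict_sentence_hmm, and so on) is hypothetical and stands for the corresponding component of the method, not for any real toolkit API.

```python
def synthesize_song(lyrics_text, midi_path, accompaniment):
    labels = analyze_lyrics(lyrics_text)            # step a: context-dependent label sequence
    sentence_hmm = predict_sentence_hmm(labels)     # step a: decision-tree lookup, then concatenation
    notes = extract_note_parameters(midi_path)      # step b: per-note pitch and length from MIDI
    f0_curve, durations = melody_control(notes)     # step b: melody control model (F0 + duration)
    sp, ap, f0 = generate_parameters(sentence_hmm, f0_curve, durations)  # SP/AP/F0 fit to note durations
    voice = straight_vocoder(sp, ap, f0)            # step c: STRAIGHT-based waveform generation
    return mix(voice, accompaniment)                # step c: add the musical accompaniment
```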

Further, the speech analysis and synthesis method used in step C, based on the STRAIGHT algorithm, comprises the following steps:

First, the speaker's speech signal is input and the STRAIGHT algorithm extracts its fundamental frequency F0 and spectral envelope; the acoustic parameters are then modulated to produce a new excitation source and a time-varying filter, and the speech is resynthesized according to the original filter model using a synthesis formula with the following quantities.

Here Q denotes the positions of a group of sample points in the synthesis excitation, and G(·) denotes pitch modulation, which allows the modulated F0 to be matched arbitrarily to the F0 of the original speech. An all-pass filter controls the fine pitch and the temporal structure of the original signal; for example, a linear phase shift proportional to frequency controls the fine structure of F0. From the modulated amplitude spectrum A(S(u(ω), r(t)), u(ω), r(t)), the Fourier transform V(ω, t_i) of the corresponding minimum-phase pulse can be computed, where A(·), u(·), and r(·) denote modulation in the amplitude, frequency, and time dimensions, respectively;

where q denotes frequency.
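As a concrete illustration of the minimum-phase pulse computation mentioned above, the real-cepstrum construction below is a common realization; it is an assumed implementation choice, not necessarily the exact one used inside STRAIGHT.

```python
import numpy as np

def minimum_phase_pulse(amplitude, n_fft):
    """Build a minimum-phase impulse response from a one-sided amplitude spectrum."""
    log_mag = np.log(np.maximum(amplitude, 1e-10))   # avoid log(0)
    cep = np.fft.irfft(log_mag, n_fft)               # real cepstrum of the log spectrum
    w = np.zeros(n_fft)                              # minimum-phase cepstral window:
    w[0] = 1.0                                       #   keep c[0],
    w[1:n_fft // 2] = 2.0                            #   double the causal part,
    w[n_fft // 2] = 1.0                              #   keep the Nyquist term
    min_phase_spectrum = np.exp(np.fft.rfft(cep * w, n_fft))
    return np.fft.irfft(min_phase_spectrum, n_fft)
```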

An HMM-based song synthesis device, characterized by comprising:

a melody control module, for building the melody control model of the singing voice;

an HMM-based speaker-dependent acoustic module, for building the speaker-dependent acoustic model for song synthesis;

an HMM-based singing-voice synthesis module, for synthesizing the singing voice to be synthesized.

Further, the melody control module comprises:

a MIDI analysis unit, for analyzing the score information extracted from a MIDI file and obtaining the corresponding music parameter information;

a prosody control unit, for building the melody control model of the singing voice according to the differences in acoustic features between speech and singing.

Further, the HMM-based speaker-dependent acoustic module comprises:

an acoustic model unit, for obtaining the acoustic model of the target speaker;

an acoustic parameter subunit, for HMM-based parametric speech synthesis.

Further, the HMM-based singing-voice synthesis module comprises:

a text analysis unit, which performs text analysis on the input lyrics text to obtain context-dependent labels;

an HMM training subunit, for building the HMM model library of the speech data;

a speaker adaptation subunit, for normalizing and transforming the feature parameters of the training speakers to obtain the adaptive model;

a speech synthesis unit, for synthesizing the singing voice to be synthesized;

a singing-voice synthesis unit, for adding musical accompaniment to the synthesized singing voice to complete the song.

The advantages and positive effects of the present invention are as follows. The HMM-based song synthesis method and device use TTS (text-to-speech) technology, HTS (the HMM-based speech synthesis system), and the STRAIGHT algorithm; they build a speaker-dependent HMM acoustic model for song synthesis and a melody control model for the song; and they perform speaker-adaptive training, realizing a personalized HMM-based device for real-time lyrics-to-song conversion. Compared with traditional singing-voice synthesis systems, the speech analysis and synthesis method used here is based on the STRAIGHT algorithm, and a speaker-adaptive training process is added in the training stage to obtain an average voice model of the mixed speech. This training process reduces the influence of speaker differences in the speech corpus and thus improves the voice quality of the synthesized singing; on the basis of the average voice model, speaker-adaptive transformation synthesizes singing speech of good naturalness and pleasantness from a small amount of speaker data. The device enriches the research content of speech synthesis and makes the synthesized speech more expressive and emotional; in particular, it offers music lovers an opportunity to learn technical operations such as song production and music processing, and it adds to the social resources available to people, which gives it practical value and significance.

Brief Description of the Drawings

To explain the technical solutions in the embodiments of the present invention or the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.

Fig. 1 is a system flow diagram of an HMM-based song synthesis method according to a preferred embodiment of the present invention;

Fig. 2 is a block diagram of the MIDI system of a preferred embodiment of the present invention;

Fig. 3 is a block diagram of the speaker-adaptive speech synthesis system of a preferred embodiment of the present invention;

Fig. 4 is a block diagram of the STRAIGHT analysis-modulation-synthesis system of a preferred embodiment of the present invention;

Fig. 5 is a schematic structural diagram of an HMM-based song synthesis device according to a preferred embodiment of the present invention.

Detailed Description

The technical solutions of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. Based on these embodiments, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the protection scope of the present invention.

As shown in Fig. 1, a preferred embodiment of the present invention discloses an HMM-based song synthesis method. It uses TTS (text-to-speech) technology, HTS (the HMM-based speech synthesis system), and the STRAIGHT algorithm; it builds a speaker-dependent HMM acoustic model for song synthesis and a melody control model for the song; and it performs speaker-adaptive training, realizing a personalized HMM-based speech synthesis method for real-time lyrics-to-song conversion. The method comprises the following steps:

A. Analyze the differences in acoustic features between speech and singing, and build a melody control model for the singing voice;

The specific steps for analyzing the differences in acoustic features between speech and singing in step A are as follows:

a. Perform spectral analysis of the speech signal using time-domain and frequency-domain analysis, and compare the fundamental frequency of the speech signal with that of the singing signal;

b. Use MIDI technology to extract the required musical-score information from the MIDI system;

c. Read the melody information of the score extracted from the MIDI file and analyze the structural features of the score file to obtain music parameter information, including channel number, note pitch, key velocity, note onset time, and note duration.

Fig. 2 shows a block diagram of the MIDI system.

The melody control model of the singing voice in step A includes an F0 control model and a duration control model; the F0 control model converts the discrete pitches in the score into a continuous F0 curve, and the duration control model yields the articulation duration of each sung note.

B. Build a speaker-dependent HMM-based acoustic model for song synthesis;

As shown in Fig. 3, building the speaker-dependent HMM-based acoustic model for song synthesis in step B comprises the following steps:

a. Analyze the speech data of the speakers' corpora to obtain the acoustic parameters, including fundamental frequency F0, duration, spectrum SP, and aperiodicity index AP; then use HMM-based speaker-adaptive training to train an average voice model of the mixed speech;

b. Using a small amount of speech data from the target speaker to be synthesized, obtain the target speaker's adaptive acoustic model through speaker-adaptive transformation, and correct and update the adaptive model, so as to synthesize speech with the target speaker's timbre.

As shown in Fig. 3, training the average voice model of the mixed speech through HMM-based speaker-adaptive training comprises the following steps:

a. Perform speech analysis on the source speakers' corpora and the target speaker's corpus, extract the acoustic parameters (Mel-cepstral coefficients), and compute their first-order and second-order differences;

b. Combined with the context attribute set, train the HMM models of the spectral and F0 parameters and the multi-space-distribution hidden semi-Markov model (MSD-HSMM) of the state duration parameters;

c. Using a small speech corpus of the target speaker, perform speaker-adaptive training to obtain the average voice model of the mixed speech, and thereby the context-dependent MSD-HSMM, through the following sub-steps:

① Use the constrained maximum-likelihood linear regression (CMLLR) algorithm to express the difference between each training speaker's speech data and the average voice as a linear regression function;

② Normalize the differences between the training speakers with a set of linear regression equations for the state output distributions and the state duration distributions;

③ Train the average voice model of the mixed speech, thereby obtaining the context-dependent MSD-HSMM.
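Formally, and as a standard statement of speaker-adaptive training rather than a formula given in the patent, the average voice model λ and the per-speaker transforms Λ^(s) are estimated jointly:

$$ \bigl(\hat{\lambda}, \{\hat{\Lambda}^{(s)}\}\bigr) = \operatorname*{arg\,max}_{\lambda,\, \{\Lambda^{(s)}\}} \prod_{s=1}^{S} P\!\bigl(O^{(s)} \mid \lambda, \Lambda^{(s)}\bigr), $$

where $O^{(s)}$ is the training data of speaker $s$; the transforms absorb inter-speaker differences so that $\lambda$ captures the shared average voice.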

Obtaining the target speaker's adaptive acoustic model from a small amount of the target speaker's speech data through speaker-adaptive transformation, and correcting and updating the adaptive model, comprises the following steps:

a. After speaker-adaptive training, use the HSMM-based CMLLR adaptation algorithm to compute the mean vectors and covariance matrices of the speaker-transformed state output probability distributions and state duration probability distributions. The transformation equations for the feature vector o and the state duration d in state i are:

$$ b_i(o) = \mathcal{N}\bigl(o;\, A\mu_i - b,\; A\Sigma_i A^{\mathsf T}\bigr) = \lvert A^{-1} \rvert\, \mathcal{N}\bigl(W\xi;\, \mu_i, \Sigma_i\bigr) $$

$$ p_i(d) = \mathcal{N}\bigl(d;\, \alpha m_i - \beta,\; \alpha \sigma_i^2 \alpha\bigr) = \lvert \alpha^{-1} \rvert\, \mathcal{N}\bigl(\alpha\psi;\, m_i, \sigma_i^2\bigr) $$

where $\xi = [o^{\mathsf T}, 1]^{\mathsf T}$ and $\psi = [d, 1]^{\mathsf T}$; $\mu_i$ is the mean of the state output distribution and $m_i$ the mean of the duration distribution; $\Sigma_i$ is the diagonal covariance matrix and $\sigma_i^2$ the variance; $W = [A^{-1}, b^{-1}]$ is the linear transformation matrix of the target speaker's state output probability density distribution, and $X = [\alpha^{-1}, \beta^{-1}]$ is the transformation matrix of the state duration probability density distribution;

b. Through the HSMM-based adaptive transformation algorithm, the spectral, F0, and duration parameters of the speech data can be normalized and transformed; for adaptation data O of length T, maximum-likelihood estimation can be performed on the transform Λ = (W, X);

c. The maximum a posteriori (MAP) algorithm is used to correct and update the adaptive speech model. For a given HSMM parameter set λ, with forward and backward probabilities α_t(i) and β_t(i) respectively, the generation probability χ_t^d(i) of the continuous observation sequence o_{t−d+1} … o_t in state i is obtained from these quantities, as given above.

The MAP estimation is as described above: $\bar{\mu}_i$ and $\bar{m}_i$ are the mean vectors after the linear-regression transformation; ω and τ are the MAP estimation parameters of the state output and duration distributions, respectively; and $\hat{\mu}_i$ and $\hat{m}_i$ are the weighted-average MAP estimates of the adaptive mean vectors.

C. Synthesize the singing voice with the HMM-based speech synthesis system.

The speech analysis and synthesis method used in step C to synthesize the singing voice with the HMM-based speech synthesis system is based on the STRAIGHT algorithm.

Synthesizing the singing voice with the HMM-based speech synthesis system in step C comprises the following steps: a. Analyze the input lyrics text with a text analysis tool: a text analysis program converts the given lyrics text into an acoustic label sequence containing context description information; the decision trees obtained by clustering during training predict the context-dependent HMM for each phone in its context, and these models are concatenated into a sentence HMM;

b. From the MIDI file, obtain the pitch and length of each note of the lyrics, obtain the corresponding fundamental frequency and duration through the melody control model, and use the note durations to modify the durations of each syllable's spectrum SP, aperiodicity index AP, and fundamental frequency F0;

c. Use the speaker-dependent acoustic model and the STRAIGHT algorithm to generate the parameter sequences of spectrum SP, aperiodicity index AP, duration, and fundamental frequency F0 from the sentence HMM, synthesize the speech, and then add the musical accompaniment to complete the song synthesis.

As shown in Fig. 4, during singing-voice synthesis the STRAIGHT analysis-modulation-synthesis system is used to extract the F0 information accurately and to exclude the periodic interference of the spectral envelope. The speech analysis and synthesis method used in step C is based on the STRAIGHT algorithm and comprises the following steps:

First, the speaker's speech signal is input and the STRAIGHT algorithm extracts its fundamental frequency F0 and spectral envelope; the acoustic parameters are then modulated to produce a new excitation source and a time-varying filter, and the speech is resynthesized according to the original filter model using a synthesis formula with the following quantities.

Here Q denotes the positions of a group of sample points in the synthesis excitation, and G(·) denotes pitch modulation, which allows the modulated F0 to be matched arbitrarily to the F0 of the original speech. An all-pass filter controls the fine pitch and the temporal structure of the original signal; for example, a linear phase shift proportional to frequency controls the fine structure of F0. From the modulated amplitude spectrum A(S(u(ω), r(t)), u(ω), r(t)), the Fourier transform V(ω, t_i) of the corresponding minimum-phase pulse can be computed, where A(·), u(·), and r(·) denote modulation in the amplitude, frequency, and time dimensions, respectively;

where q denotes frequency.

Corresponding to the above method, another preferred embodiment of the present invention discloses an HMM-based song synthesis device. The device builds the speaker-dependent HMM acoustic model for song synthesis and the melody control model of the song, performs speaker-adaptive training, and realizes personalized real-time lyrics-to-song conversion through HTS (the HMM-based speech synthesis system) with the STRAIGHT algorithm, combined with TTS (text-to-speech) technology. The functions of the device may be implemented in software, hardware, or a combination of both.

As shown in Fig. 5, the song synthesis device comprises a melody control module, an HMM-based speaker-dependent acoustic module, and an HMM-based singing-voice synthesis module.

The melody control module builds the melody control model of the singing voice.

The melody control module comprises:

a MIDI analysis unit, for analyzing the score information extracted from a MIDI file and obtaining the corresponding music parameter information;

a prosody control unit, for building the melody control model of the singing voice according to the differences in acoustic features between speech and singing.

Through the MIDI analysis unit, the score information extracted from the MIDI file is analyzed and the corresponding music parameter information is obtained; the melody control module then builds the melody control model of the singing voice according to the differences in acoustic features between speech and singing.

The HMM-based speaker-dependent acoustic module builds the speaker-dependent acoustic model for song synthesis.

The HMM-based speaker-dependent acoustic module comprises:

an acoustic model unit, for obtaining the acoustic model of the target speaker;

an acoustic parameter subunit, for HMM-based parametric speech synthesis.

The HMM-based singing-voice synthesis module synthesizes the singing voice to be synthesized.

The HMM-based singing-voice synthesis module comprises:

a text analysis unit, which performs text analysis on the input lyrics text to obtain context-dependent labels;

an HMM training subunit, for building the HMM model library of the speech data: it extracts the speakers' acoustic parameters from the speech corpus (mainly the F0, spectrum, and duration parameters), trains the statistical acoustic models in combination with the context labels of the corpus, and then determines the F0, spectrum, and duration parameters according to the context attribute set;

a speaker adaptation subunit, for normalizing and transforming the feature parameters of the training speakers to obtain the adaptive model: through speaker training, the differences in state output distributions and state duration distributions between the training speakers and the average voice model are normalized, and the maximum-likelihood linear regression algorithm determines the average voice model of the multi-speaker mixed speech; the adaptation data are then used to compute the mean vectors and covariance matrices of the speaker's state output probability distributions and duration probability distributions and to transform them toward the target speaker model, thereby building the target speaker's adaptive MSD-HSMM;

a speech synthesis unit, for synthesizing the singing voice to be synthesized: the corrected adaptive model predicts the speech parameters of the input lyrics text, the speech acoustic parameters are extracted, and the singing speech is synthesized by a STRAIGHT-based speech synthesizer;

a singing-voice synthesis unit, for adding musical accompaniment to the synthesized singing voice to complete the song.
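For illustration, the module structure of Fig. 5 could be mirrored by a class skeleton such as the one below; all class and method names are hypothetical and serve only to make the division of responsibilities concrete.

```python
class MelodyControlModule:
    def analyze_midi(self, midi_path): ...            # MIDI analysis unit: score -> music parameters
    def build_melody_model(self, music_params): ...   # prosody control unit: F0 and duration models

class SpeakerAcousticModule:
    def train_average_voice(self, corpora): ...       # acoustic model unit: adaptive training over speakers
    def adapt_to_target(self, target_data): ...       # acoustic parameter subunit: CMLLR transform + MAP update

class SingingSynthesisModule:
    def label(self, lyrics): ...                      # text analysis unit: context-dependent labels
    def train_hmms(self, corpus): ...                 # HMM training subunit: model library of speech data
    def synthesize_voice(self, labels, melody, model): ...   # speech synthesis unit (STRAIGHT-based)
    def add_accompaniment(self, voice, accompaniment): ...   # singing-voice synthesis unit: final song
```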

The above method steps may be carried out by hardware under the direction of program instructions; the program may be stored in a readable storage medium and, when executed, performs the corresponding steps of the above method.

The above is a further detailed description of the present invention in combination with specific preferred embodiments, and the specific implementation of the present invention cannot be considered limited to these descriptions. For those of ordinary skill in the art, simple deductions or substitutions made without departing from the concept of the present invention shall be deemed to fall within the protection scope of the present invention.

Claims (13)






Legal Events

PB01: Publication (application publication date: 2017-07-21)
SE01: Entry into force of request for substantive examination
RJ01: Rejection of invention patent application after publication
