CN107221344A - A kind of speech emotional moving method - Google Patents

A kind of speech emotional moving method

Info

Publication number
CN107221344A
CN107221344A
Authority
CN
China
Prior art keywords
speech
emotion
emotional
voice
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710222674.XA
Other languages
Chinese (zh)
Inventor
李华康
杜阳阳
金旭
胡晓东
丘添元
张笑源
孙国梓
李涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Post and Telecommunication University
Original Assignee
Nanjing Post and Telecommunication University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Post and Telecommunication University
Priority to CN201710222674.XA
Publication of CN107221344A
Legal status: Pending

Abstract

The invention discloses a speech emotion transfer method. First, a speech emotion data set is generated from a speech database and labeled. Audio features are then extracted from each audio file using a speech feature parameter model to obtain a speech feature set, and a machine learning tool is applied to the speech feature sets and speech emotion labels to build an emotion model library. A transfer target is selected, a speech signal is input from a multimedia terminal, and the feature set of the current speech signal is obtained; emotion classification yields the current emotion category, which is compared with the selected target. If they match, the original input speech signal is output directly as the target emotional speech; otherwise, feature-level emotion transfer is performed. Finally, speech synthesis produces the final target emotional speech output. The proposed method, based on emotion classification and feature transfer, can change the emotion of speech without losing the vocal characteristics of the original speaker.

Description

A Speech Emotion Transfer Method

Technical Field

The invention belongs to the technical field of speech recognition and relates to a speech emotion transfer method, in particular to a speech emotion transfer method based on models of different speech providers.

Background

With the development of smart chip technology, terminal devices have become increasingly intelligent and integrated, and their miniaturization, portability, and networking have made daily life ever more convenient. Users constantly exchange voice and video through network terminals, accumulating massive amounts of multimedia data. With the accumulation of platform data, intelligent question answering systems have gradually emerged. These systems incorporate cutting-edge technologies such as speech recognition, sentiment analysis, information retrieval, semantic matching, sentence generation, and speech synthesis.

Speech recognition technology enables machines to convert speech signals into corresponding text or machine instructions through recognition and understanding, so that machines can comprehend human expression. It mainly involves speech unit selection, speech feature extraction, pattern matching, and model training. Speech units include words (or sentences), syllables, and phonemes, chosen according to the scenario and task. Word units are mainly suitable for small-vocabulary speech recognition systems; syllable units are better suited to Chinese speech recognition; and although phonemes explain the basic components of speech well, the complexity and variability of speakers make it difficult to obtain stable data sets, so phoneme-based recognition is still under study.

Another research direction is speech emotion recognition, which mainly consists of speech signal acquisition, emotion feature extraction, and emotion recognition. Emotion features fall into three main categories: prosodic features, spectrum-based features, and voice quality features. These features are generally extracted at frame granularity, and emotion recognition is performed on global feature statistics. Emotion recognition algorithms fall into two main categories: discrete speech emotion classifiers and dimensional speech emotion predictors. Speech emotion recognition technology is widely used in call centers, driver alertness monitoring, distance learning, and other fields.

Intelligent agents are hailed as the comprehensive product of the next generation of artificial intelligence. An agent must not only recognize environmental factors and understand human behavior and language, but also, in the course of interacting with people, understand human emotions and imitate human emotional expression in order to achieve softer interaction. Current research on agent emotion focuses mainly on virtual image processing, drawing on results from computer graphics, psychology, cognitive science, neurophysiology, artificial intelligence, and other fields. Research shows that although more than 90% of human environmental perception comes from vision, the vast majority of emotional perception comes from speech. How to build the emotional system of a human-like agent from speech has not yet been addressed in published research.

Summary of the Invention

The purpose of the present invention is to propose a method for expressing human speech emotion, with machine learning as the primary means, and on this basis to use deep learning and convolutional network algorithms to realize speech emotion transfer at the system level. It not only provides a reference method for speech recognition and sentiment analysis, but can also be widely applied to future human-like agents.

To achieve the above object, the technical solution proposed by the present invention is a speech emotion transfer method, which specifically comprises the following steps:

Step 1. Prepare a speech database and generate a speech emotion data set S = {s_1, s_2, …, s_n} through standard sampling;

Step 2. Manually label the speech database from Step 1, annotating the emotion E = {e_1, e_2, …, e_n} of each speech file;

Step 3. Use a speech feature parameter model to extract audio features from each audio file s_i in the speech library, obtaining the basic speech feature set F_i = {f_1i, f_2i, …, f_ni};

Step 4. Use a machine learning tool to train on each speech feature set from Step 3 together with the speech emotion labels from Step 2, obtaining a feature model for each class of speech emotion and building the emotion model library E_b;

Step 5. Through a multimedia terminal, select the target emotion Target_e for speech emotion transfer;

Step 6. Input the speech signal s_t from the multimedia terminal;

Step 7. Feed the current input s_t into the speech emotion feature extraction module to obtain the feature set F_t = {f_1t, f_2t, …, f_nt} of the current speech signal;

Step 8. Using the same machine learning algorithm as in Step 4, classify the speech feature set F_t of s_t against the emotion model library E_b from Step 4 to obtain the current emotion category s_e of s_t;

Step 9. Judge whether s_e from Step 8 matches the Target_e selected in Step 5: if s_e = Target_e, output the original input speech signal directly as the target emotional speech; if s_e ≠ Target_e, invoke Step 10 for feature emotion transfer;

Step 10. Migrate the main emotional features of the current speech toward the main emotional features of the target emotion in the emotion model library;

Step 11. Process the migrated speech features from Step 10 with a speech synthesis algorithm to synthesize the final target emotional speech output. The runtime flow of Steps 5 through 11 is sketched below.
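
For illustration only, the decision flow of Steps 5 through 11 can be outlined as follows. This is a minimal sketch, not the patented implementation; the helper functions extract_features, classify_emotion, migrate_features, and synthesize are hypothetical names standing in for the modules described above.

```python
def transfer_emotion(s_t, target_e, emotion_model_lib):
    """Sketch of Steps 5-11: classify the input, transfer only on mismatch."""
    f_t = extract_features(s_t)                      # Step 7: feature set F_t
    s_e = classify_emotion(f_t, emotion_model_lib)   # Step 8: current emotion s_e
    if s_e == target_e:                              # Step 9: already the target emotion
        return s_t                                   # output original speech unchanged
    migrated = migrate_features(f_t, emotion_model_lib[target_e])  # Step 10
    return synthesize(migrated)                      # Step 11: synthesize output
```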

Further, in Step 1 above, the speech data is sampled at 44.1 kHz, each recording lasts between 3 and 10 s, and the files are saved in wav format.

Also in Step 1, to obtain better performance, the natural attribute dimensions of the sampled data should not be overly concentrated; samples should be collected from people of different ages, genders, occupations, and so on.

In Step 6, the input may be entered in real time or submitted by clicking after the recording is complete.

The present invention has the following beneficial effects:

1. The present invention is the first to propose the concept of speech emotion transfer, which can provide an emotion construction method for future virtual reality.

2. The method based on emotion classification and feature transfer proposed by the present invention can change the emotion of speech without losing the vocal characteristics of the original speaker.

Brief Description of the Drawings

Figure 1 is a schematic diagram of the speech emotion transfer method provided by the present invention.

Figure 2 is a spectral feature diagram of an original input speech sample.

Figure 3 is a spectral feature diagram of the original speech sample after emotion transformation.

Detailed Description

The present invention is now described in further detail with reference to the accompanying drawings.

The present invention provides a user speech emotion transfer method based on a speech emotion database. As shown in Figure 1, the modules or functions involved in the method include:

Basic speech library: stores raw speech data from speakers of different ages and genders in different scenarios.

Label library: emotional annotations of the basic speech library, such as calm, happy, angry, furious, sad, etc.

Speech input device: such as a microphone, enabling real-time speech input by the user.

Speech emotion feature extraction: uses sound feature analysis tools to obtain general acoustic features and, based on the characteristics of human speech signals and emotional expression, selects the required feature set as the speech emotion features.

Machine learning: uses machine learning algorithms with the speech emotion label library to build a training model over the speech emotion feature set.

Emotion model library: a library of speech emotion models obtained from the speech library through machine learning, classified by gender, age, emotion, and other dimensions.

Emotion selection: before inputting the speech signal, the user selects the emotion mode into which the current speech should be converted in real time.

Emotion category judgment: determines whether the emotion of the current user input matches the selected emotion. If they match, the target emotional speech is output directly; if not, the emotion transfer module is invoked.

Emotion transfer: when the user's input speech and the selected emotion are inconsistent, the feature distance between the input speech emotion feature set and the selected emotion feature set is compared, and the feature space representation of the input speech emotion is adjusted to realize emotion transfer. The adjusted emotional speech is then output as the target emotional speech.

An embodiment is now provided to illustrate the speech emotion transfer process, which specifically comprises the following steps:

Step 1. Prepare a speech database. Preferably, the speech data is sampled at the standard 44.1 kHz; one sentence is recorded from each test speaker, lasting between 3 and 10 s, and saved in wav format, yielding the speech emotion data set S = {s_1, s_2, …, s_n}. To obtain better performance, the samples should not be overly concentrated in natural attribute dimensions such as age, gender, and occupation.

Step 2. Manually label the speech database prepared in Step 1, annotating the emotion E = {e_1, e_2, …, e_n} of each speech file, such as "worried", "surprised", "angry", "disappointed", "sad", and so on.

Step 3. Use a speech feature parameter model to extract audio features from each audio file s_i in the speech library, obtaining the basic speech feature set F_i = {f_1i, f_2i, …, f_ni} (Figure 2 shows the spectral features of an original speech sample), including the envelope (env), speech rate (speed), zero-crossing rate (zcr), energy (eng), entropy of energy (eoe), spectral centroid (spec_cent), spectral spread (spec_spr), Mel-frequency cepstral coefficients (mfccs), chroma vector (chroma), and so on. A feature extraction sketch follows.
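
As one possible realization of this step (the patent does not name a specific extraction tool), the following sketch computes several of the listed features with librosa and reduces them to global statistics, as described in the background section. The envelope, speech rate, and energy entropy features would require additional code and are omitted here.

```python
import numpy as np
import librosa

def extract_features(wav_path, sr=44100):
    """Extract a fixed-length feature vector F_i from one audio file s_i."""
    y, sr = librosa.load(wav_path, sr=sr)
    frame_feats = [
        librosa.feature.zero_crossing_rate(y),           # zcr
        librosa.feature.rms(y=y),                        # short-time energy (eng)
        librosa.feature.spectral_centroid(y=y, sr=sr),   # spec_cent
        librosa.feature.spectral_bandwidth(y=y, sr=sr),  # spec_spr (spread proxy)
        librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13),     # mfccs
        librosa.feature.chroma_stft(y=y, sr=sr),         # chroma vector
    ]
    # Frame-level features are summarized by global statistics (mean and std).
    return np.concatenate([np.hstack([f.mean(axis=1), f.std(axis=1)])
                           for f in frame_feats])
```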

Step 4. Use a machine learning tool (such as LIBSVM) to train on the feature set of each speech file from Step 3 together with the speech emotion labels from Step 2, obtaining a feature model for each class of speech emotion and building the emotion model library E_b, for example as sketched below.
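
A sketch of this step under stated assumptions: the patent suggests LIBSVM, and scikit-learn's SVC (which wraps libsvm) is used here as a convenient stand-in. A single multi-class SVM over all emotion labels plays the role of the emotion model library E_b; files and labels are assumed to be the outputs of Steps 1 and 2.

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def build_emotion_model_library(files, labels):
    """Train an emotion classifier from labeled speech files (Step 4)."""
    X = [extract_features(path) for path in files]  # Step 3 feature sets F_i
    model = make_pipeline(StandardScaler(),         # normalize feature scales
                          SVC(kernel="rbf"))        # libsvm-backed classifier
    model.fit(X, labels)                            # labels: emotions E from Step 2
    return model                                    # acts as the model library E_b
```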

Step 5. Through a multimedia terminal, select the target emotion Target_e for speech emotion transfer, such as "sad".

Step 6. Input the speech signal s_t from the multimedia terminal, either in real time or by clicking submit after the recording is complete.

Step 7. Feed the current input s_t into the speech emotion feature extraction module to obtain the feature set F_t = {f_1t, f_2t, …, f_nt} of the current speech signal.

Step 8. Using the same machine learning algorithm as in Step 4, classify the speech feature set F_t of s_t against the emotion model library E_b from Step 4 to obtain the current emotion category s_e of s_t.

Step 9. Judge whether s_e from Step 8 matches the Target_e selected in Step 5. If s_e = Target_e, the original input speech signal is output directly as the target emotional speech. If s_e ≠ Target_e, Step 10 is invoked for feature emotion transfer. Steps 8 and 9 are sketched below.
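
Steps 8 and 9 reduce to a classify-then-compare check, sketched here with the extract_features sketch above and the model from the Step 4 sketch (both assumptions, not the patented implementation).

```python
def needs_transfer(wav_path, target_e, model):
    """Return True if the input's emotion differs from the chosen target."""
    f_t = extract_features(wav_path)  # Step 7: feature set F_t of input s_t
    s_e = model.predict([f_t])[0]     # Step 8: current emotion category s_e
    return s_e != target_e            # Step 9: transfer only on mismatch
```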

Step 10. Migrate the main emotional features of the current speech toward the main emotional features of the target emotion in the emotion model library (Figure 3 shows the spectral features after migration), for example envelope migration result_env = (s_env + Target_env)/2 and speech rate adjustment result_speed = (s_speed + Target_speed)/2.
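
The migration rule quoted here is a midpoint shift per feature. A direct transcription follows, assuming input_feats and target_feats are dictionaries of per-feature scalar values (the target values taken from the emotion model library); this structure is an illustrative assumption.

```python
def migrate_features(input_feats, target_feats, keys=("env", "speed")):
    """Step 10: move each main feature halfway toward the target emotion's value."""
    migrated = dict(input_feats)
    for k in keys:
        # e.g. result_env = (s_env + Target_env) / 2
        migrated[k] = (input_feats[k] + target_feats[k]) / 2.0
    return migrated
```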

Step 11. Use a speech synthesis algorithm (pitch synchronous overlap-add, PSOLA) to process the migrated speech features from Step 10 and synthesize the final target emotional speech output.
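
A full PSOLA implementation is beyond a short example. As a rough stand-in only (librosa's effects use a phase vocoder, not pitch-synchronous overlap-add), the migrated rate and pitch adjustments could be applied as follows:

```python
import librosa

def synthesize(y, sr, rate_factor, pitch_steps):
    """Apply migrated speech-rate and pitch changes (approximation of Step 11)."""
    y = librosa.effects.time_stretch(y, rate=rate_factor)           # speech rate
    y = librosa.effects.pitch_shift(y, sr=sr, n_steps=pitch_steps)  # pitch contour
    return y
```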

The above are only preferred embodiments of the present invention and are not intended to limit it. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art may still improve the technical solutions described therein or make equivalent substitutions for some of the techniques. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (4)


Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN201710222674.XA | 2017-04-07 | 2017-04-07 | A kind of speech emotional moving method


Publications (1)

Publication Number | Publication Date
CN107221344A (en) | 2017-09-29

Family

ID=59928228

Family Applications (1)

Application Number | Status | Publication | Priority Date | Filing Date | Title
CN201710222674.XA | Pending | CN107221344A (en) | 2017-04-07 | 2017-04-07 | A kind of speech emotional moving method

Country Status (1)

Country | Link
CN (1) | CN107221344A (en)



Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN1787074A (en)* | 2005-12-13 | 2006-06-14 | 浙江大学 | Method for distinguishing speak person based on feeling shifting rule and voice correction
CN101064104A (en)* | 2006-04-24 | 2007-10-31 | 中国科学院自动化研究所 | Emotion voice creating method based on voice conversion
CN101261832A (en)* | 2008-04-21 | 2008-09-10 | 北京航空航天大学 | Extraction and modeling method of emotional information in Chinese speech
CN102184731A (en)* | 2011-05-12 | 2011-09-14 | 北京航空航天大学 | Method for converting emotional speech by combining rhythm parameters with tone parameters
CN103198827A (en)* | 2013-03-26 | 2013-07-10 | 合肥工业大学 | Voice emotion correction method based on relevance of prosodic feature parameter and emotion parameter
CN103544963A (en)* | 2013-11-07 | 2014-01-29 | 东南大学 | A Speech Emotion Recognition Method Based on Kernel Semi-Supervised Discriminant Analysis

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
WO2019218773A1 (en)* | 2018-05-15 | 2019-11-21 | 中兴通讯股份有限公司 | Voice synthesis method and device, storage medium, and electronic device
CN112786026A (en)* | 2019-12-31 | 2021-05-11 | 深圳市木愚科技有限公司 | Parent-child story personalized audio generation system and method based on voice migration learning
CN112786026B (en)* | 2019-12-31 | 2024-05-07 | 深圳市木愚科技有限公司 | Parent-child story personalized audio generation system and method based on voice transfer learning
CN111951778A (en)* | 2020-07-15 | 2020-11-17 | 天津大学 | A low-resource approach to emotional speech synthesis using transfer learning
CN111951778B (en)* | 2020-07-15 | 2023-10-17 | 天津大学 | A method of using transfer learning for emotional speech synthesis with low resources
CN113421544A (en)* | 2021-06-30 | 2021-09-21 | 平安科技(深圳)有限公司 | Singing voice synthesis method and device, computer equipment and storage medium
CN113421544B (en)* | 2021-06-30 | 2024-05-10 | 平安科技(深圳)有限公司 | Singing voice synthesizing method, singing voice synthesizing device, computer equipment and storage medium
CN113555004A (en)* | 2021-07-15 | 2021-10-26 | 复旦大学 | Voice depression state identification method based on feature selection and transfer learning
CN114495988A (en)* | 2021-08-31 | 2022-05-13 | 荣耀终端有限公司 | Emotion processing method and electronic device for input information
CN116955572A (en)* | 2023-09-06 | 2023-10-27 | 宁波尚煦智能科技有限公司 | Online service feedback interaction method based on artificial intelligence and big data system

Similar Documents

Publication | Title
Singh et al. | A multimodal hierarchical approach to speech emotion recognition from audio and text
CN107993665B (en) | Method for determining role of speaker in multi-person conversation scene, intelligent conference method and system
CN107221344A (en) | A kind of speech emotional moving method
CN110910903B (en) | Speech emotion recognition method, device, equipment and computer readable storage medium
CN112650831A (en) | Virtual image generation method and device, storage medium and electronic equipment
CN113538636B (en) | A virtual object control method, device, electronic device and medium
CN110634491A (en) | System and method for tandem feature extraction for general speech tasks in speech signals
CN111145777A (en) | A virtual image display method, device, electronic device and storage medium
CN110880198A (en) | Animation generation method and device
CN107972028A (en) | Man-machine interaction method, device and electronic equipment
CN104538025A (en) | Method and device for converting gestures to Chinese and Tibetan bilingual voices
Wang et al. | Comic-guided speech synthesis
CN113257225A (en) | Emotional voice synthesis method and system fusing vocabulary and phoneme pronunciation characteristics
CN118646940A (en) | Video generation method, device and system based on multimodal input
CN116597858A (en) | Voice mouth shape matching method and device, storage medium and electronic equipment
CN118193702A (en) | Intelligent man-machine interaction system and method for English teaching
CN118519524A (en) | Virtual digital person system and method for learning disorder group
CN114550707A (en) | A semantic-integrated speech emotion recognition method
Reddy et al. | Indian sign language generation from live audio or text for Tamil
CN117352000A (en) | Speech classification method, device, electronic equipment and computer readable medium
CN118571229A (en) | Voice labeling method and device for voice feature description
CN119763546B (en) | Speech synthesis method, system, electronic device and storage medium
CN116226372A (en) | Multimodal Speech Emotion Recognition Method Based on Bi-LSTM-CNN
CN116129868A (en) | Method and system for generating structured photo
TWI574254B (en) | Speech synthesis method and apparatus for electronic system

Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination
RJ01 | Rejection of invention patent application after publication

Application publication date: 2017-09-29

