Technical Field
The present invention belongs to the technical field of speech recognition and relates to a speech emotion transfer method, in particular to a method for transferring speech emotion based on models of different speech providers.
Background Art
With the development of smart chip technology, terminal devices have become increasingly intelligent and integrated, and their miniaturization, portability, and networking have made daily life more and more convenient. Users constantly exchange voice and video through network terminals, accumulating massive amounts of multimedia data. With the accumulation of platform data, intelligent question answering systems have gradually emerged. These systems draw on cutting-edge technologies such as speech recognition, emotion analysis, information retrieval, semantic matching, sentence generation, and speech synthesis.
Speech recognition technology enables machines to convert speech signals into the corresponding text or machine instructions through recognition and understanding, so that machines can comprehend what people say. It mainly involves speech unit selection, speech feature extraction, pattern matching, and model training. Speech units include words (or sentences), syllables, and phonemes, selected according to the scenario and task. Word units are mainly suitable for small-vocabulary speech recognition systems; syllable units are better suited to Chinese speech recognition; and although phonemes describe the basic components of speech well, the complexity and variability of speakers make stable data sets difficult to obtain, so phoneme-based recognition is still under study.
Another research direction is speech emotion recognition, which mainly consists of speech signal acquisition, emotional feature extraction, and emotion recognition. Emotional features fall into three main categories: prosodic features, spectrum-based features, and voice quality features. These features are generally extracted at frame granularity, and emotion recognition is performed on global statistics of the features. Emotion recognition algorithms fall into two broad classes: discrete speech emotion classifiers and dimensional speech emotion predictors. Speech emotion recognition technology is widely applied in call centers, driver mental-state assessment, remote online courses, and other fields.
Intelligent agents are regarded as the comprehensive product of the next generation of artificial intelligence. They must not only perceive the surrounding environment and understand human behavior and language, but also, when interacting with people, understand human emotions and be able to imitate human emotional expression in order to achieve a gentler interaction. Current research on agent emotion focuses mainly on virtual image processing and draws on results from computer graphics, psychology, cognitive science, neurophysiology, artificial intelligence, and other fields. Studies show that although more than 90% of human environmental perception comes from vision, the great majority of emotional perception comes from speech. How to build an emotion system for human-like agents from the speech domain has, to date, not been addressed in any published research.
Summary of the Invention
The purpose of the present invention is to propose a method for expressing human speech emotion, using machine learning as the principal means, and on this basis to apply deep learning and convolutional network algorithms to realize speech emotion transfer at the system level. This not only provides a reference method for speech recognition and emotion analysis, but can also be widely applied to future human-like agents.
To achieve the above object, the technical solution proposed by the present invention is a speech emotion transfer method, which specifically includes the following steps:
Step 1. Prepare a speech database and, through standard sampling, generate a speech emotion data set S = {s_1, s_2, ..., s_n};
Step 2. Manually label the speech database of step 1, annotating the emotion of each speech file, E = {e_1, e_2, ..., e_n};
Step 3. Use a speech feature parameter model to extract audio features from each audio file s_i in the speech database, obtaining a basic speech feature set F_i = {f_1i, f_2i, ..., f_ni};
Step 4. Use a machine learning tool to train on each speech feature set from step 3 together with the speech emotion labels from step 2, obtaining a feature model for each class of speech emotion and constructing an emotion model library E_b;
Step 5. Through a multimedia terminal, select the target emotion Target_e of the speech emotion transfer;
Step 6. Input a speech signal s_t from the multimedia terminal;
Step 7. Feed the current input s_t into the speech emotion feature extraction module to obtain the feature set of the current speech signal, F_t = {f_1t, f_2t, ..., f_nt};
Step 8. Using the same machine learning algorithm as in step 4, perform emotion classification on the speech feature set F_t of s_t obtained in step 7 against the emotion model library E_b obtained in step 4, obtaining the current emotion category s_e of s_t;
Step 9. Judge whether the s_e obtained in step 8 is consistent with the Target_e input in step 5. If s_e = Target_e, output the original input speech signal directly as the target emotional speech; if s_e ≠ Target_e, invoke step 10 to perform feature-level emotion transfer;
Step 10. Transfer the main features of the current speech emotion toward the main features of the corresponding speech emotion in the emotion model library;
Step 11. Use a speech synthesis algorithm to process the transferred speech features obtained in step 10 and synthesize the final target emotional speech output. (An illustrative end-to-end sketch of these steps is given below.)
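As an illustrative sketch only, and not part of the claimed method, the overall flow of steps 6 to 11 could be organized in Python as follows. All helper functions (extract_features, classify_emotion, transfer_features, synthesize) are hypothetical placeholders standing in for the modules described above.

```python
# Illustrative orchestration of steps 6-11; helper functions are placeholders.
def speech_emotion_transfer(s_t, target_e, emotion_model_lib,
                            extract_features, classify_emotion,
                            transfer_features, synthesize):
    """Transfer the emotion of the input speech s_t toward target_e."""
    # Step 7: extract the feature set F_t of the current input
    f_t = extract_features(s_t)

    # Step 8: classify the current emotion s_e against the model library E_b
    s_e = classify_emotion(f_t, emotion_model_lib)

    # Step 9: if the emotion already matches the target, pass the input through
    if s_e == target_e:
        return s_t

    # Step 10: migrate the main emotion features toward the target model
    f_migrated = transfer_features(f_t, emotion_model_lib[target_e])

    # Step 11: synthesize the output speech from the migrated features
    return synthesize(s_t, f_migrated)
```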
Further, in the above step 1, the speech data are sampled at 44.1 kHz, each recording lasts between 3 and 10 s, and the files are saved in wav format.
In step 1, to obtain better performance, the natural-attribute dimensions of the sampled data should not be overly concentrated; the samples should as far as possible be collected from people of different ages, genders, occupations, and so on.
In step 6, the input may be entered in real time, or submitted with a click after the recording is completed.
The present invention has the following beneficial effects:
1. The present invention is the first to propose the concept of speech emotion transfer, and can provide an emotion construction method for future virtual reality.
2. The method based on emotion classification and feature transfer proposed by the present invention can change the emotion of speech without losing the vocal characteristics of the original speaker.
Brief Description of the Drawings
Fig. 1 is a schematic diagram of the speech emotion transfer method provided by the present invention.
Fig. 2 is a spectral feature diagram of an original input speech sample of the present invention.
Fig. 3 is a spectral feature diagram of the original speech sample after emotion transformation in the present invention.
Detailed Description
The present invention is now described in further detail with reference to the accompanying drawings.
The present invention provides a method for transferring the emotion expressed in a user's speech, based on a speech emotion database. As shown in Fig. 1, the modules or functions involved in the method include:
Basic speech library: stores raw speech data covering different ages, genders, and scenes.
Label library: emotion annotations of the basic speech library, such as calm, happy, angry, furious, sad, and so on.
Speech input device: such as a microphone, enabling real-time speech input by the user.
Speech emotion feature extraction: general acoustic features are obtained with a sound feature analysis tool, and the required feature set is selected as the speech emotion features according to the characteristics of human speech signals and of emotional expression.
Machine learning: a machine learning algorithm is applied, against the speech emotion label library, to build a training model over the speech emotion feature sets.
Emotion model library: the speech emotion model library obtained from the speech library data through machine learning, classified along dimensions such as gender, age, and emotion.
Emotion selection: before inputting the speech signal, the user selects the emotion mode into which the current speech is to be converted in real time.
Emotion category judgment: judges whether the emotion of the current user input is consistent with the selected emotion. If they are consistent, the target emotional speech is output directly; if not, the emotion transfer module is invoked.
Emotion transfer: when the emotion of the user's input speech is inconsistent with the selected emotion, the feature distance between the input speech emotion feature set and the selected emotion feature set is compared, and the feature-space representation of the input speech emotion is adjusted to realize emotion transfer. The adjusted emotional speech is then output as the target emotional speech. (A sketch of such a feature-distance comparison and adjustment is given below.)
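A minimal sketch of the judgment and adjustment just described, assuming each emotion model can be summarized by a mean feature vector; the vector representation, the Euclidean distance, and the interpolation weight alpha are illustrative assumptions and are not fixed by the method above.

```python
import numpy as np

def emotion_distance(feature_vec, emotion_mean):
    """Euclidean distance between an input feature vector and an emotion
    model summarized by its mean feature vector."""
    return float(np.linalg.norm(np.asarray(feature_vec, dtype=float)
                                - np.asarray(emotion_mean, dtype=float)))

def adjust_toward_emotion(feature_vec, target_mean, alpha=0.5):
    """Shift the input feature vector toward the target emotion's mean by a
    fraction alpha; alpha = 0.5 matches the averaging used in the embodiment,
    e.g. result = (s + Target) / 2."""
    f = np.asarray(feature_vec, dtype=float)
    t = np.asarray(target_mean, dtype=float)
    return (1.0 - alpha) * f + alpha * t
```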
An embodiment is now provided to illustrate the speech emotion transfer process, which specifically includes the following steps:
Step 1. The method requires a speech database to be prepared. Preferably, the speech data are sampled at the standard rate of 44.1 kHz; a sentence spoken by a test person is recorded for 3 to 10 s and saved in wav format, yielding a speech emotion data set S = {s_1, s_2, ..., s_n}. To obtain better performance, the sampled data should, as far as possible, not be overly concentrated in natural-attribute dimensions such as age, gender, and occupation.
Step 2. The speech database prepared in step 1 is labeled manually, annotating the emotion of each speech file, E = {e_1, e_2, ..., e_n}, such as "worried", "surprised", "angry", "disappointed", "sad", and so on.
Step 3. A speech feature parameter model is used to extract audio features from each audio file s_i in the speech database, obtaining a basic speech feature set F_i = {f_1i, f_2i, ..., f_ni} (Fig. 2 shows a schematic diagram of the spectral features of an original speech sample), including features such as the envelope (env), speech rate (speed), zero-crossing rate (zcr), energy (eng), energy entropy (eoe), spectral centroid (spec_cent), spectral spread (spec_spr), Mel-frequency cepstral coefficients (mfccs), and chroma vector (chroma).
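As an illustrative sketch only, several of the features named in step 3 could be computed with the open-source librosa library as follows. The exact feature definitions and parameters used by the invention are not specified, so the frame settings, the aggregation by mean, and the choice to omit the envelope, speech rate, and energy entropy are assumptions made here for brevity.

```python
import numpy as np
import librosa

def extract_basic_features(wav_path, sr=44100):
    """Compute a simple frame-level feature set and aggregate it by mean:
    zero-crossing rate, energy (RMS), spectral centroid, spectral spread,
    MFCCs, and chroma, roughly matching the features listed in step 3."""
    y, sr = librosa.load(wav_path, sr=sr)

    zcr = librosa.feature.zero_crossing_rate(y)              # zero-crossing rate
    eng = librosa.feature.rms(y=y)                           # frame energy (RMS)
    cent = librosa.feature.spectral_centroid(y=y, sr=sr)     # spectral centroid
    spread = librosa.feature.spectral_bandwidth(y=y, sr=sr)  # spectral spread
    mfccs = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)      # MFCCs
    chroma = librosa.feature.chroma_stft(y=y, sr=sr)         # chroma vector

    # Global statistics (mean over frames), i.e. one fixed-length vector per file
    return np.concatenate([
        zcr.mean(axis=1), eng.mean(axis=1), cent.mean(axis=1),
        spread.mean(axis=1), mfccs.mean(axis=1), chroma.mean(axis=1),
    ])
```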
Step 4. A machine learning tool (such as Libsvm) is used to train on the feature set of each speech file obtained in step 3 together with the speech emotion labels obtained in step 2, obtaining a feature model for each class of speech emotion and constructing the emotion model library E_b.
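A minimal training sketch, assuming the features have been collected into a matrix X (one row per speech file) with emotion labels y. The text names Libsvm; scikit-learn's SVC is a wrapper around libsvm, so it is used here for brevity, and the RBF kernel and feature scaling are assumptions.

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def train_emotion_models(X, y):
    """Train a multi-class SVM emotion classifier (scikit-learn's SVC wraps
    libsvm). X: feature matrix, one row per file; y: emotion labels."""
    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", probability=True))
    clf.fit(X, y)
    return clf  # serves as the emotion model library E_b in this sketch
```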
Step 5. Through a multimedia terminal, the target emotion Target_e of the speech emotion transfer is selected, such as "sad".
Step 6. The speech signal s_t is input from the multimedia terminal, either in real time or submitted with a click after the recording is completed.
Step 7. The current input s_t is fed into the speech emotion feature extraction module to obtain the feature set of the current speech signal, F_t = {f_1t, f_2t, ..., f_nt}.
Step 8. Using the same machine learning algorithm as in step 4, emotion classification is performed on the speech feature set F_t of s_t obtained in step 7 against the emotion model library E_b obtained in step 4, yielding the current emotion category s_e of s_t.
Step 9. It is judged whether the s_e obtained in step 8 is consistent with the Target_e input in step 5. If s_e = Target_e, the original input speech signal is output directly as the target emotional speech; if s_e ≠ Target_e, step 10 is invoked to perform feature-level emotion transfer.
Step 10. The main features of the current speech emotion are migrated toward the main features of the corresponding speech emotion in the emotion model library (Fig. 3 shows the spectral features after migration), for example envelope migration result_env = (s_env + Target_env)/2 and speech-rate adjustment result_speed = (s_speed + Target_speed)/2.
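A minimal sketch of this averaging-based migration, assuming the current features and the target emotion model are held as dictionaries keyed by feature name (a representation chosen for illustration and not fixed by the embodiment):

```python
def migrate_features(current, target, keys=("env", "speed")):
    """Move the selected emotion features halfway toward the target model,
    e.g. result_env = (s_env + Target_env) / 2."""
    result = dict(current)
    for k in keys:
        result[k] = (current[k] + target[k]) / 2.0
    return result

# Example usage (illustrative values only):
# migrated = migrate_features({"env": 0.8, "speed": 4.2},
#                             {"env": 0.4, "speed": 3.0})
```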
Step 11. A speech synthesis algorithm (pitch-synchronous overlap-add, PSOLA) is used to process the transferred speech features obtained in step 10 and to synthesize the final target emotional speech output.
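PSOLA modifies pitch and duration by pitch-synchronously overlapping and adding analysis windows; the sketch below is only a rough, non-PSOLA stand-in that applies a migrated duration (speech-rate) and pitch change using librosa's time-stretch and pitch-shift effects, so that the migrated prosodic features can be heard. The choice of these effects and of the output path handling are assumptions, not the synthesis algorithm named in step 11.

```python
import librosa
import soundfile as sf

def resynthesize_approx(wav_path, out_path, rate=1.0, pitch_steps=0.0, sr=44100):
    """Rough stand-in for PSOLA-style prosody modification: stretch the
    duration by `rate` (rate > 1 speeds the speech up) and shift the pitch
    by `pitch_steps` semitones, then write the result to out_path."""
    y, sr = librosa.load(wav_path, sr=sr)
    y = librosa.effects.time_stretch(y, rate=rate)
    y = librosa.effects.pitch_shift(y, sr=sr, n_steps=pitch_steps)
    sf.write(out_path, y, sr)
```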
The above is only a preferred embodiment of the present invention and is not intended to limit the present invention. Although the present invention has been described in detail with reference to the foregoing embodiment, those skilled in the art may still improve the technical solutions described in the foregoing embodiment or make equivalent replacements for some of the technical features. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.