




TECHNICAL FIELD
The present invention relates to the field of speech signal processing, and in particular to a method, apparatus, device, and storage medium for intelligent conference role classification.
BACKGROUND
In the traditional meeting mode, meeting minutes and related information must be entered manually by administrative staff, which is time-consuming and inefficient. To improve meeting efficiency and enable real-time publication of meeting minutes, intelligent meeting recording systems have been adopted. An intelligent meeting recording system receives the speech content of a meeting in real time through a microphone, obtains the meeting's speech information, and performs speech recognition on that information to convert it into text.
Although existing intelligent meeting recording systems solve the problems caused by manual recording, in meeting scenarios where multiple people converse they cannot perform speech role separation; that is, they cannot automatically identify the specific speaker corresponding to each utterance in the speech information.
SUMMARY OF THE INVENTION
The present invention provides a method, apparatus, device, and storage medium for intelligent conference role classification, aiming to achieve convenient and effective speech role separation in multi-person meeting scenarios.
A first aspect of the embodiments of the present invention provides a method for intelligent conference role classification, including:
acquiring conference audio data, and segmenting the conference audio data to obtain a plurality of candidate audio data, each of the plurality of candidate audio data correspondingly including a number;
performing breakpoint identification on each of the plurality of candidate audio data to obtain a target time node;
intercepting first audio data and second audio data of a preset duration from the plurality of candidate audio data according to the target time node;
performing feature parameter extraction on the first audio data and the second audio data, respectively, to obtain a first feature parameter and a second feature parameter;
performing speaker comparison analysis on the first feature parameter and the second feature parameter to obtain a target feature parameter;
determining a target role corresponding to the target feature parameter according to a preset role database and the number.
Optionally, in a second implementation of the first aspect of the embodiments of the present invention, performing breakpoint identification on each of the plurality of candidate audio data to obtain a target time node includes:
performing important-point detection on each candidate audio data to obtain a number of segments;
segmenting the plurality of candidate audio data according to the number of segments by a preset time-series segmentation algorithm to obtain segmented data, and acquiring turning points of an audio curve corresponding to the segmented data;
acquiring a left adjacent point and a right adjacent point of each turning point;
calculating a first slope and a second slope according to the turning point, the first slope being the slope of the line connecting the turning point and the left adjacent point, and the second slope being the slope of the line connecting the turning point and the right adjacent point;
calculating a difference between the first slope and the second slope;
taking a turning point whose difference is greater than a first preset threshold as a target time node in the plurality of candidate audio data.
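The slope test described above can be sketched as follows. This is an illustrative sketch only: it assumes the audio curve is available as a list of (time, value) points, treats each interior point as a candidate turning point, and reads "difference" as the absolute difference; none of these choices is fixed by the text.

```python
def find_target_time_nodes(curve, threshold):
    """Return the times of turning points whose left/right slope
    difference exceeds the first preset threshold.

    curve: list of (time, value) points on the audio curve.
    threshold: the first preset threshold for the slope difference.
    """
    targets = []
    for i in range(1, len(curve) - 1):
        t_prev, v_prev = curve[i - 1]   # left adjacent point
        t, v = curve[i]                 # candidate turning point
        t_next, v_next = curve[i + 1]   # right adjacent point
        first_slope = (v - v_prev) / (t - t_prev)
        second_slope = (v_next - v) / (t_next - t)
        # absolute difference is one reasonable reading of the claim
        if abs(first_slope - second_slope) > threshold:
            targets.append(t)
    return targets

# A point whose left slope is 0.3 and right slope is 0.5 differs by
# 0.2 > 0.15, so it is selected as a target time node.
curve = [(0.0, 0.0), (1.0, 0.3), (2.0, 0.8)]   # slopes 0.3 then 0.5
print(find_target_time_nodes(curve, 0.15))      # -> [1.0]
```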
Optionally, in a first implementation of the first aspect of the embodiments of the present invention, intercepting first audio data and second audio data of a preset duration from the plurality of candidate audio data according to the target time node includes:
taking the target time node as an end time point, and intercepting first audio data of a preset duration from the plurality of candidate audio data according to the end time point;
taking the target time node as a starting time point, and intercepting second audio data of a preset duration from the plurality of candidate audio data according to the starting time point.
Optionally, in a third implementation of the first aspect of the embodiments of the present invention, performing speaker comparison analysis on the first feature parameter and the second feature parameter to obtain a target feature parameter includes:
performing a time-series similarity comparison analysis on the first feature parameter and the second feature parameter to obtain an initial feature parameter, the initial feature parameter including the first feature parameter and/or the second feature parameter;
acquiring candidate comparison feature parameters, the time nodes corresponding to the candidate comparison feature parameters being later than the target time node;
calculating a similarity between the candidate comparison feature parameters and the initial feature parameter to obtain comparison feature parameters, the comparison feature parameters being the candidate comparison feature parameters whose similarity is greater than a second preset threshold;
performing speaker recognition on the comparison feature parameters and the initial feature parameter to obtain the target feature parameter.
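The similarity filtering step can be sketched as follows. The text does not specify a similarity measure, so cosine similarity is used here as a hypothetical stand-in, and feature parameters are modeled as plain numeric vectors.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def select_comparison_features(candidates, initial, threshold):
    """Keep the candidate comparison feature parameters whose similarity
    to the initial feature parameter exceeds the second preset threshold."""
    return [c for c in candidates if cosine_similarity(c, initial) > threshold]

initial = [1.0, 0.0, 1.0]
candidates = [[0.9, 0.1, 1.1],   # similar to the initial parameter: kept
              [0.0, 1.0, 0.0]]   # dissimilar: discarded
kept = select_comparison_features(candidates, initial, 0.8)
print(len(kept))  # -> 1
```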
Optionally, in a fourth implementation of the first aspect of the embodiments of the present invention, determining the target role corresponding to the target feature parameter according to the preset role database and the number includes:
performing role matching on the target feature parameter in the preset role database;
if a role corresponding to the target feature parameter is matched in the role database, taking the matched role as the target role;
if no role corresponding to the target feature parameter is matched in the role database, acquiring the number of the candidate audio data corresponding to the target feature parameter, and taking the role corresponding to the acquired number as the target role corresponding to the target feature parameter.
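The match-or-fallback logic can be sketched as follows. The role database format, the distance measure, and the `max_distance` threshold are all illustrative assumptions not fixed by the text.

```python
import math

def match_role(target_feature, role_db, audio_number, max_distance=0.5):
    """Return the role from the preset database whose stored feature is
    closest to target_feature; if nothing falls within max_distance,
    fall back to the role derived from the candidate audio number."""
    best_role, best_dist = None, float("inf")
    for role, feature in role_db:
        dist = math.dist(target_feature, feature)  # Euclidean distance
        if dist < best_dist:
            best_role, best_dist = role, dist
    if best_dist <= max_distance:
        return best_role          # matched in the role database
    return audio_number           # fallback: the number acts as the role

role_db = [("Alice", [0.1, 0.2]), ("Bob", [0.9, 0.8])]
print(match_role([0.12, 0.21], role_db, "D1"))  # -> Alice
print(match_role([5.0, 5.0], role_db, "D3"))    # -> D3 (no match)
```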
Optionally, in a fifth implementation of the first aspect of the embodiments of the present invention, after determining the target role corresponding to the target feature parameter according to the preset role database and the number, the method further includes:
acquiring the number corresponding to the target role, updating the role database according to the number corresponding to the target role, and converting the candidate audio data corresponding to the target role into text form to obtain role information for the meeting minutes.
Optionally, in a sixth implementation of the first aspect of the embodiments of the present invention, acquiring conference audio data and segmenting the conference audio data to obtain a plurality of candidate audio data, the plurality of candidate audio data including a plurality of numbers, includes:
acquiring conference audio data, and performing speaker voice recognition on the conference audio data according to a preset first frame length to obtain data to be classified;
performing speaker classification on the data to be classified according to a preset second frame length based on a speaker segmentation algorithm to obtain classified data;
segmenting the classified data according to a preset computation time to obtain a plurality of initial audio data;
assigning role numbers to the plurality of initial audio data to obtain a plurality of candidate audio data.
A second aspect of the embodiments of the present invention provides an apparatus for intelligent conference role classification, which has the function of implementing the method for intelligent conference role classification provided in the first aspect. The function may be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the above function, and the units may be software and/or hardware.
The apparatus for intelligent conference role classification includes:
a segmentation unit, configured to acquire conference audio data and segment the conference audio data to obtain a plurality of candidate audio data, each of the plurality of candidate audio data correspondingly including a number;
an identification unit, configured to perform breakpoint identification on each of the plurality of candidate audio data to obtain a target time node;
an interception unit, configured to intercept first audio data and second audio data of a preset duration from the plurality of candidate audio data according to the target time node;
an extraction unit, configured to perform feature parameter extraction on the first audio data and the second audio data, respectively, to obtain a first feature parameter and a second feature parameter;
a comparison analysis unit, configured to perform speaker comparison analysis on the first feature parameter and the second feature parameter to obtain a target feature parameter;
an acquisition unit, configured to determine a target role corresponding to the target feature parameter according to a preset role database and the number.
Optionally, in a second implementation of the second aspect of the embodiments of the present invention, the identification unit is specifically configured to:
perform important-point detection on each candidate audio data to obtain a number of segments;
segment the plurality of candidate audio data according to the number of segments by a preset time-series segmentation algorithm to obtain segmented data, and acquire turning points of an audio curve corresponding to the segmented data;
acquire a left adjacent point and a right adjacent point of each turning point;
calculate a first slope and a second slope according to the turning point, the first slope being the slope of the line connecting the turning point and the left adjacent point, and the second slope being the slope of the line connecting the turning point and the right adjacent point;
calculate a difference between the first slope and the second slope;
take a turning point whose difference is greater than a first preset threshold as a target time node in the plurality of candidate audio data.
Optionally, in a first implementation of the second aspect of the embodiments of the present invention, the interception unit is specifically configured to:
take the target time node as an end time point, and intercept first audio data of a preset duration from the plurality of candidate audio data according to the end time point;
take the target time node as a starting time point, and intercept second audio data of a preset duration from the plurality of candidate audio data according to the starting time point.
Optionally, in a third implementation of the second aspect of the embodiments of the present invention, the comparison analysis unit is specifically configured to:
perform a time-series similarity comparison analysis on the first feature parameter and the second feature parameter to obtain an initial feature parameter, the initial feature parameter including the first feature parameter and/or the second feature parameter;
acquire candidate comparison feature parameters, the time nodes corresponding to the candidate comparison feature parameters being later than the target time node;
calculate a similarity between the candidate comparison feature parameters and the initial feature parameter to obtain comparison feature parameters, the comparison feature parameters being the candidate comparison feature parameters whose similarity is greater than a second preset threshold;
perform speaker recognition on the comparison feature parameters and the initial feature parameter to obtain the target feature parameter.
Optionally, in a fourth implementation of the second aspect of the embodiments of the present invention, the acquisition unit is specifically configured to:
perform role matching on the target feature parameter in the preset role database;
if a role corresponding to the target feature parameter is matched in the role database, take the matched role as the target role;
if no role corresponding to the target feature parameter is matched in the role database, acquire the number of the candidate audio data corresponding to the target feature parameter, and take the role corresponding to the acquired number as the target role corresponding to the target feature parameter.
Optionally, in a fifth implementation of the second aspect of the embodiments of the present invention, the apparatus for intelligent conference role classification further includes:
an updating unit, configured to acquire the number corresponding to the target role, update the role database according to the number corresponding to the target role, and convert the candidate audio data corresponding to the target role into text form to obtain role information for the meeting minutes.
Optionally, in a sixth implementation of the second aspect of the embodiments of the present invention, the segmentation unit is specifically configured to:
acquire conference audio data, and perform speaker voice recognition on the conference audio data according to a preset first frame length to obtain data to be classified;
perform speaker classification on the data to be classified according to a preset second frame length to obtain classified data;
segment the classified data according to a preset computation time to obtain a plurality of initial audio data;
assign role numbers to the plurality of initial audio data to obtain a plurality of candidate audio data.
A third aspect of the embodiments of the present invention provides a device for intelligent conference role classification, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the method for intelligent conference role classification described in any of the above embodiments.
A fourth aspect of the embodiments of the present invention provides a computer-readable storage medium including instructions which, when run on a computer, cause the computer to execute the method for intelligent conference role classification described in any of the above embodiments.
Compared with the prior art, in the technical solution provided by the embodiments of the present invention, a plurality of candidate audio data are obtained by segmenting conference audio data, each of the plurality of candidate audio data correspondingly including a number; breakpoint identification is performed on each of the plurality of candidate audio data to obtain a target time node; first audio data and second audio data of a preset duration are intercepted from the plurality of candidate audio data according to the target time node; feature parameter extraction is performed on the first audio data and the second audio data, respectively, to obtain a first feature parameter and a second feature parameter; speaker comparison analysis is performed on the first feature parameter and the second feature parameter to obtain a target feature parameter; and a target role corresponding to the target feature parameter is determined according to a preset role database and the number. In the embodiments of the present invention, the feature parameters of the two audio segments before and after a breakpoint in the audio data are automatically compared to judge whether the two segments correspond to the same speaker; the resulting feature parameter is then matched against the role database to obtain the target role, or the number corresponding to the audio data is used as the target role, thereby achieving convenient and effective speech role separation in multi-person meeting scenarios.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a schematic diagram of an embodiment of the method for intelligent conference role classification in an embodiment of the present invention;
FIG. 2 is a schematic diagram of another embodiment of the method for intelligent conference role classification in an embodiment of the present invention;
FIG. 3 is a schematic diagram of an embodiment of the apparatus for intelligent conference role classification in an embodiment of the present invention;
FIG. 4 is a schematic diagram of another embodiment of the apparatus for intelligent conference role classification in an embodiment of the present invention;
FIG. 5 is a schematic diagram of an embodiment of the device for intelligent conference role classification in an embodiment of the present invention.
DETAILED DESCRIPTION
The embodiments of the present invention provide a method, apparatus, device, and storage medium for intelligent conference role classification, which automatically compare the feature parameters of the two audio segments before and after a breakpoint in the audio data to judge whether the two segments correspond to the same speaker, and then either match the resulting feature parameter against a role database to obtain the target role or use the number corresponding to the audio data as the target role, thereby achieving convenient and effective speech role separation in multi-person meeting scenarios.
To enable those skilled in the art to better understand the solutions of the present invention, the embodiments of the present invention are described below with reference to the accompanying drawings.
It should be understood that the specific embodiments described herein are only intended to explain the present invention, not to limit it. The terms "first", "second", and the like in the description, claims, and drawings of the present invention are used to distinguish similar objects, not necessarily to describe a particular order or sequence. It should be understood that data so used are interchangeable where appropriate, so that the embodiments described herein can be practiced in orders other than those illustrated or described herein. Furthermore, the terms "comprising" and "having", and any variations thereof, are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or device comprising a series of steps or modules is not necessarily limited to those steps or modules expressly listed, but may include other steps or modules not expressly listed or inherent to such processes, methods, products, or devices. The division of modules in the present invention is merely a logical division; in practical applications there may be other divisions, for example, multiple modules may be combined or integrated into another system, or some features may be omitted or not performed.
Referring to FIG. 1, a flowchart of a method for intelligent conference role classification provided by an embodiment of the present invention, the method is described below by way of example. The method is executed by a computer device, which may be a server or a terminal; the present invention does not limit the type of the executing entity. The method specifically includes:
101. Acquire conference audio data, and segment the conference audio data to obtain a plurality of candidate audio data, each of the plurality of candidate audio data correspondingly including a number.
The server collects speech data in a multi-person meeting scenario through a sound acquisition device and performs digital signal processing on the speech data to obtain conference audio data. The server removes the silent segments of the conference audio data through a Matlab dual-threshold voice endpoint detection program to obtain initial audio data, and identifies front endpoints and back endpoints in the initial audio data, between which there is a temporally discrete or continuous connection. The server then segments the initial audio data according to the front and back endpoints to obtain segmented audio data, and generates a globally unique identifier (Guid, i.e., the number) for each segmented audio data through a unique numbering algorithm, obtaining a plurality of candidate audio data.
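The silence removal and numbering described above can be sketched as follows. This is a simplified stand-in: it uses a single energy threshold rather than the dual-threshold (energy plus zero-crossing-rate) detection of the referenced Matlab program, and uses Python's `uuid` module to generate the Guid.

```python
import uuid

def remove_silence(samples, frame_len, energy_threshold):
    """Drop frames whose mean energy falls below the threshold; a
    simplified single-threshold stand-in for dual-threshold
    (energy + zero-crossing-rate) endpoint detection."""
    voiced = []
    for i in range(0, len(samples), frame_len):
        frame = samples[i:i + frame_len]
        energy = sum(s * s for s in frame) / len(frame)
        if energy >= energy_threshold:
            voiced.append(frame)
    return voiced

def number_segments(segments):
    """Attach a globally unique identifier (Guid) to each segment."""
    return [(str(uuid.uuid4()), seg) for seg in segments]

# Silence, a burst of speech, then silence: one voiced frame survives.
samples = [0.0] * 4 + [0.5, -0.5, 0.5, -0.5] + [0.0] * 4
voiced = remove_silence(samples, frame_len=4, energy_threshold=0.1)
print(len(voiced))  # -> 1
```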
Specifically, step 101 may include: acquiring conference audio data, and performing speaker voice recognition on the conference audio data according to a preset first frame length based on a voice endpoint detection algorithm to obtain data to be classified; performing speaker classification on the data to be classified according to a preset second frame length based on a speaker segmentation algorithm to obtain classified data; segmenting the classified data according to a preset computation time to obtain a plurality of initial audio data; and assigning role numbers to the plurality of initial audio data to obtain a plurality of candidate audio data.
For example, the voice activity detection algorithm (VAD) and the speaker diarization algorithm (SD) are used jointly to find each utterance of each person in a multi-person meeting scenario. With a VAD frame length of 20 ms, each frame of the conference audio data is judged for the presence of a speaker's voice, and the conference audio data containing a speaker's voice is taken as the data to be classified. With an SD window of 64 × 20 ms = 1.28 s and a step of 32 × 20 ms = 0.64 s as the classification window size, the data to be classified is divided into the audio data of the current speaker (i.e., speaker classification and segmentation), obtaining a plurality of initial audio data. After the preset computation time is reached, the front and back endpoints of each utterance of each person are calculated from the VAD and SD results, the initial audio data between a front endpoint and a back endpoint is taken as one audio segment, and each audio segment is numbered and marked with the corresponding speaker's role in chronological order, obtaining a plurality of candidate audio data. For example, if speaker Ding gives one stretch of speech that is split into three audio segments, in chronological order segment 1, segment 2, and segment 3, then segment 1 is numbered Ding-1, segment 2 is numbered Ding-2, and segment 3 is numbered Ding-3.
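The framing arithmetic in this example (20 ms VAD frames, a 64-frame SD window, a 32-frame step) can be sketched as follows; the window enumeration is a straightforward reading of those sizes, not an implementation of any particular SD algorithm.

```python
VAD_FRAME_MS = 20        # VAD frame length, per the example
SD_WINDOW_FRAMES = 64    # 64 * 20 ms = 1.28 s classification window
SD_STEP_FRAMES = 32      # 32 * 20 ms = 0.64 s step

def sd_windows(total_frames):
    """Enumerate (start_frame, end_frame) classification windows over a
    sequence of 20 ms VAD frames, using the sizes quoted in the text."""
    windows = []
    start = 0
    while start + SD_WINDOW_FRAMES <= total_frames:
        windows.append((start, start + SD_WINDOW_FRAMES))
        start += SD_STEP_FRAMES
    return windows

# 10 s of audio = 500 frames of 20 ms each.
wins = sd_windows(500)
print(wins[0], wins[1])                         # -> (0, 64) (32, 96)
print(SD_WINDOW_FRAMES * VAD_FRAME_MS / 1000)   # -> 1.28 (seconds)
```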
102. Perform breakpoint identification on each of the plurality of candidate audio data to obtain a target time node.
The server generates a speech signal graph for each of the plurality of candidate audio data through the Matlab tool. From the speech signal graph, the server can observe that the energy values corresponding to the silent portions of the candidate audio data are small, while those corresponding to the voiced portions are large. The candidate audio data is a time series ordered in time, and the sample value at each point of the time series reflects the energy of the candidate audio data at that sampling point. The server can determine breakpoints in the audio through energy detection, a breakpoint being a target time node; alternatively, the server performs speech break-point identification on each of the plurality of candidate audio data through a preset breakpoint detection algorithm to obtain the target time node. Obtaining the target time node facilitates fast and accurate sampling of the audio data.
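The energy-drop test can be sketched as follows, assuming per-frame energies are already available; the drop threshold is an illustrative assumption.

```python
def detect_energy_breakpoints(energies, drop_threshold):
    """Return indices where the energy drops sharply from one frame to
    the next, treated as candidate breakpoints (target time nodes)."""
    return [i + 1 for i in range(len(energies) - 1)
            if energies[i] - energies[i + 1] > drop_threshold]

# Energy falls abruptly between frames 2 and 3: a candidate breakpoint.
energies = [0.8, 0.9, 0.85, 0.05, 0.04, 0.7]
print(detect_energy_breakpoints(energies, 0.5))  # -> [3]
```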
Specifically, step 102 may include: performing important-point detection on each candidate audio data to obtain a number of segments; segmenting the plurality of candidate audio data according to the number of segments to obtain segmented data, and acquiring turning points of the audio curve corresponding to the segmented data; acquiring a left adjacent point and a right adjacent point of each turning point; calculating a first slope and a second slope according to the turning point, the first slope being the slope of the line connecting the turning point and the left adjacent point, and the second slope being the slope of the line connecting the turning point and the right adjacent point; calculating a difference between the first slope and the second slope; and taking a turning point whose difference is greater than a first preset threshold as a target time node in the plurality of candidate audio data.
For example, the server performs important-point detection on the plurality of candidate audio data through a preset important-point detection algorithm and obtains a segment count of 3; the server then combines the plurality of candidate audio data into one piece of data and divides it into 3 sub-segments, obtaining segmented data 1, segmented data 2, and segmented data 3. Taking segmented data 1 as an example, the turning points of its audio curve are turning point 1 and turning point 2. For turning point 1, the first slope is 0.3 and the second slope is 0.5; for turning point 2, the first slope is 0.6 and the second slope is 0.5. With a first preset threshold of 0.15, the time point corresponding to turning point 1 is taken as the target time node. Because a speaker may pause slightly while speaking, the speech signal graph of the candidate audio data will also show time nodes of sudden energy drops at such pauses, but the time node corresponding to a slight pause should not be taken as a target time node. Therefore, performing important-point detection on the plurality of candidate audio data through the preset important-point detection algorithm, segmenting them by segment count through the preset time-series segmentation algorithm, and performing the slope calculation improve the accuracy of the first audio data and the second audio data and guarantee their data quality.
103、根据目标时间节点从多个候选音频数据中截取预设时段的第一音频数据和第二音频数据。103. Intercept the first audio data and the second audio data of a preset time period from the plurality of candidate audio data according to the target time node.
服务器截取目标时间节点前预设时段的候选音频数据和目标时间后预设时段后的候选音频数据,作为第一音频数据和第二音频数据,例如:在多个候选音频数据的语音信号图中识别到能量骤降的时间节点(即目标时间节点),再从该时间节点的前后截取5秒的候选音频数据,得到第一音频数据和第二音频数据。截取第一音频数据和第二音频数据,以便于后续进行断点前后说话人的声纹处理,以判断是否为同一说话人,进而实现简化操作,提高操作效率。The server intercepts the candidate audio data of a preset period before the target time node and of a preset period after the target time node as the first audio data and the second audio data. For example, a time node of sudden energy drop (i.e., the target time node) is identified in the speech signal graph of the multiple candidate audio data, and 5 seconds of candidate audio data are intercepted before and after that time node to obtain the first audio data and the second audio data. Intercepting the first and second audio data facilitates the subsequent voiceprint processing of the speakers before and after the breakpoint, so as to determine whether they are the same speaker, thereby simplifying the operation and improving operation efficiency.
具体地,该步骤103可以包括:服务器将时间节点作为末端时间点,根据末端时间点从多个候选音频数据中截取预设时段的第一音频数据;服务器将目标时间节点作为始发时间点,根据始发时间点从多个候选音频数据中截取预设时段的第二音频数据。Specifically, step 103 may include: the server taking the target time node as an end time point, and intercepting first audio data of a preset period from the multiple candidate audio data according to the end time point; and the server taking the target time node as a start time point, and intercepting second audio data of a preset period from the multiple candidate audio data according to the start time point.
例如:目标时间节点为2分01秒,预设时段为20秒,则以2分01秒为末端时间点,从多个候选音频数据中截取前20秒的第一音频数据,该第一音频数据的始末时间为1分41秒-2分01秒,以2分01秒为始发时间点,从多个候选音频数据中截取后20秒的第二音频数据,该第二音频数据的始末时间为2分01秒-2分21秒。For example, if the target time node is 2 minutes 01 seconds and the preset period is 20 seconds, then with 2 minutes 01 seconds as the end time point, the first audio data covering the preceding 20 seconds is intercepted from the multiple candidate audio data, running from 1 minute 41 seconds to 2 minutes 01 seconds; with 2 minutes 01 seconds as the start time point, the second audio data covering the following 20 seconds is intercepted, running from 2 minutes 01 seconds to 2 minutes 21 seconds.
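The two-sided interception can be sketched on a flat sample buffer. This is a hedged illustration, not the patent's implementation; the `samples` list and a toy 10 Hz sample rate stand in for real PCM audio.

```python
def split_at_breakpoint(samples, sample_rate, target_sec, window_sec):
    """Cut `window_sec` seconds of audio on each side of the target time node.

    Returns (first_audio, second_audio): the first ends at the target node
    (the node is the end time point), the second starts at it (the node is
    the start time point), as described in step 103.
    """
    idx = int(target_sec * sample_rate)
    win = int(window_sec * sample_rate)
    first = samples[max(0, idx - win):idx]  # preceding window, ends at the node
    second = samples[idx:idx + win]         # following window, starts at the node
    return first, second

# Target node at 2 min 01 s (121 s), 20-second windows, toy 10 Hz "audio".
audio = list(range(10 * 150))  # 150 seconds of dummy samples at 10 samples/s
first, second = split_at_breakpoint(audio, 10, 121, 20)
print(len(first), len(second))  # 200 200
```

With these numbers the first segment covers 1:41–2:01 and the second covers 2:01–2:21, matching the worked example.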
104、分别对第一音频数据和第二音频数据进行特征参数提取,获得第一特征参数和第二特征参数。104. Perform feature parameter extraction on the first audio data and the second audio data, respectively, to obtain the first feature parameter and the second feature parameter.
具体地,该步骤104可以包括:通过基于聚类算法的隐式马尔可夫模型分别对第一音频数据和第二音频数据进行声纹识别,得到第一声纹数据和第二声纹数据;对第一声纹数据和第二声纹数据进行数据处理和离散余弦变换,得到第一语音特征向量和第二语音特征向量;对第一语音特征向量和第二语音特征向量分别进行聚类算法分析,获得第一特征参数和第二特征参数。Specifically, step 104 may include: performing voiceprint recognition on the first audio data and the second audio data respectively through a hidden Markov model based on a clustering algorithm to obtain first voiceprint data and second voiceprint data; performing data processing and discrete cosine transform on the first voiceprint data and the second voiceprint data to obtain a first speech feature vector and a second speech feature vector; and performing clustering-algorithm analysis on the first speech feature vector and the second speech feature vector respectively to obtain the first feature parameter and the second feature parameter.
在本实施例中,服务器可通过线性预测分析法、感知线性预测系数法、线性预测倒谱系数法和基于滤波器组的Fbank特征法对分别对第一音频数据和第二音频数据进行特征参数提取,分别获得第一特征参数和第二特征参数,特征参数提取为声纹的语音特征参数提取。服务器也可以通过预置的计算公式获取第一音频数据(或第二音频数据)的声强和声强级,通过等响度曲线分析第一音频数据(或第二音频数据)获得响度,通过主观音高与实际频率的关系曲线分析第一音频数据(或第二音频数据)获得音高,通过倒谱算法获得基音周期和基音频率,声强和声强级、响度、音高、基音周期和基音频率为第一特征参数(或第二特征参数)。其中,数据预处理包括时域结合频域的高频分量能量增加处理和根据预设规则选择的函数窗进行的分割处理。通过将第一语音特征向量和第二语音特征向量分别进行聚类算法分析,获得混合特征参数(即第一特征参数或第二特征参数),增强特征参数的获取和提高特征参数的分析力度。In this embodiment, the server may perform feature parameter extraction on the first audio data and the second audio data respectively through linear prediction analysis, perceptual linear prediction coefficients, linear prediction cepstral coefficients, and the filter-bank-based Fbank feature method, to obtain the first feature parameter and the second feature parameter respectively; here, feature parameter extraction means extracting the speech feature parameters of the voiceprint. The server may also obtain the sound intensity and sound intensity level of the first audio data (or the second audio data) through a preset calculation formula, obtain the loudness by analyzing the first audio data (or the second audio data) against equal-loudness contours, obtain the pitch by analyzing it against the curve relating subjective pitch to actual frequency, and obtain the pitch period and pitch frequency through a cepstrum algorithm; the sound intensity and sound intensity level, loudness, pitch, pitch period, and pitch frequency constitute the first feature parameter (or the second feature parameter). The data preprocessing includes high-frequency-component energy boosting combining the time domain and the frequency domain, and segmentation using a window function selected according to preset rules. By performing clustering-algorithm analysis on the first speech feature vector and the second speech feature vector respectively, mixed feature parameters (i.e., the first feature parameter or the second feature parameter) are obtained, which enriches the acquisition of the feature parameters and strengthens their analysis.
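The discrete cosine transform step in the pipeline above can be illustrated with a minimal type-II DCT. This is a stdlib-only sketch of the transform itself, applied here to a dummy frame; it is not the full voiceprint pipeline and omits normalization conventions that production libraries apply.

```python
import math

def dct2(frame):
    """Type-II discrete cosine transform of one frame, as applied to the
    (log) voiceprint spectrum to produce a speech feature vector."""
    n = len(frame)
    return [
        sum(frame[i] * math.cos(math.pi * k * (2 * i + 1) / (2 * n)) for i in range(n))
        for k in range(n)
    ]

# Energy compaction: a flat frame collapses into the k = 0 coefficient,
# which is why a few low-order DCT coefficients suffice as compact features.
coeffs = dct2([1.0, 1.0, 1.0, 1.0])
print(coeffs[0])  # 4.0; the remaining coefficients are ~0
```

The useful property for feature extraction is exactly this energy compaction: most of the frame's information ends up in the first few coefficients.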
105、将第一特征参数和第二特征参数进行说话人对比分析,获得目标特征参数。105. Perform speaker comparison analysis on the first feature parameter and the second feature parameter to obtain the target feature parameter.
服务器将第一特征参数和第二特征参数进行对比分析,若第一特征参数和第二特征参数近似,则第一特征参数和第二特征参数对应的说话人为同一人(即说话人对比分析),则只需将参数数量多的或者特征参数识别力度强的第一特征参数或第二特征参数作为目标特征参数,进行后续的角色匹配分析即可;若第一特征参数和第二特征参数存在非常大的差异,则第一特征参数和第二特征参数对应的说话人不为同一人(即说话人对比分析),则将第一特征参数和第二特征参数作为目标特征参数,进行后续的角色匹配分析。The server compares and analyzes the first feature parameter and the second feature parameter. If the two are similar, the speakers corresponding to them are the same person (this is the speaker comparison analysis), and it suffices to take whichever of the first or second feature parameter has more parameters or stronger recognition power as the target feature parameter for the subsequent role-matching analysis. If the two differ greatly, the corresponding speakers are not the same person, and both the first feature parameter and the second feature parameter are taken as target feature parameters for the subsequent role-matching analysis.
具体地,该步骤105可以包括:对第一特征参数和第二特征参数进行时间序列相似度的对比分析,得到初始特征参数,初始特征参数包括第一特征参数和/或第二特征参数;获取候选对比特征参数,候选对比特征参数对应的时间节点大于目标时间节点;计算候选对比特征参数与初始特征参数之间的相似度,获得对比特征参数,对比特征参数为相似度大于第二预设阈值的候选对比特征参数;对该对比特征参数和初始特征参数进行说话人识别处理,得到目标特征参数。Specifically, step 105 may include: performing a time-series similarity comparison between the first feature parameter and the second feature parameter to obtain an initial feature parameter, where the initial feature parameter includes the first feature parameter and/or the second feature parameter; obtaining candidate comparison feature parameters whose corresponding time nodes are later than the target time node; computing the similarity between the candidate comparison feature parameters and the initial feature parameter to obtain comparison feature parameters, a comparison feature parameter being a candidate whose similarity is greater than a second preset threshold; and performing speaker identification on the comparison feature parameters and the initial feature parameter to obtain the target feature parameter.
服务器通过动态时间规整算法对第一特征参数和第二特征参数进行时间序列相似度的对比分析,即对第一特征参数和第二特征参数分别对应的长度不同的时间序列进行相似度计算,并根据时间序列相似度对第一特征参数和第二特征参数进行对比分析,获得相同或相似的特征参数(即初始特征参数),获取时间节点在初始特征参数之后的候选对比特征参数,通过计算所述候选对比特征参数与初始特征参数之间的相似度且该相似度大于第二预设阈值以判断初始特征参数对应的说话人和候选对比特征参数对应的说话人为同一人。服务器通过分析时间节点大于目标时间节点和候选对比特征参数与初始特征参数的相似度,获得在初始特征参数之后的对比特征参数;通过结合初始特征参数和对比特征参数进行角色匹配,提高角色匹配的准确性。其中,初始特征参数对应的说话人和对比特征参数对应的说话人为同一人,例如:会议中有甲、乙和丙三人在说话,说话顺序1为甲1—乙—丙—甲2,说话顺序2为甲3—乙—丙—乙—丙—甲4,则初始特征参数和对比特征参数分别对应说话顺序1中的甲1和甲2,初始特征参数和对比特征参数分别对应说话顺序2中的甲3和甲4。The server performs a time-series similarity comparison between the first and second feature parameters through a dynamic time warping algorithm, i.e., it computes the similarity between the time series of different lengths corresponding to the two parameters, compares the parameters according to that similarity to obtain the identical or similar feature parameter (the initial feature parameter), and obtains candidate comparison feature parameters whose time nodes come after the initial feature parameter. By computing the similarity between a candidate comparison feature parameter and the initial feature parameter and checking that it exceeds the second preset threshold, the server determines that the speaker of the initial feature parameter and the speaker of the candidate are the same person. By requiring the time node to be later than the target time node and analyzing the similarity between the candidates and the initial feature parameter, the server obtains the comparison feature parameters that follow the initial feature parameter; combining the initial and comparison feature parameters for role matching improves matching accuracy. Here, the speaker of the initial feature parameter and the speaker of the comparison feature parameter are the same person. For example, suppose speakers A, B, and C talk in a meeting. In speaking order 1, A1—B—C—A2, the initial and comparison feature parameters correspond to A1 and A2 respectively; in speaking order 2, A3—B—C—B—C—A4, they correspond to A3 and A4 respectively.
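The threshold filter over candidate comparison feature parameters might be sketched as below. Cosine similarity is used here purely as an illustrative similarity measure; the text does not fix a specific one, and the vectors and threshold value are invented for the example.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two feature vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def pick_comparison_params(initial, candidates, threshold):
    """Keep only the candidate comparison feature parameters whose similarity
    to the initial feature parameter exceeds the second preset threshold."""
    return [c for c in candidates if cosine_similarity(initial, c) > threshold]

initial = [0.3, 0.5, 0.2]                        # hypothetical initial feature parameter
candidates = [[0.31, 0.49, 0.21],                # near-identical -> same speaker
              [0.90, 0.05, 0.05]]                # very different -> rejected
print(pick_comparison_params(initial, candidates, 0.95))
```

Only the first candidate survives the 0.95 threshold, playing the role of the "comparison feature parameter" whose speaker matches the initial one.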
进一步地,上述的对第一特征参数和第二特征参数进行时间序列相似度的对比分析,得到初始特征参数,初始特征参数包括第一特征参数和/或第二特征参数可以包括:通过动态时间规整算法计算第一特征参数和第二特征参数之间的规整路径距离;若规整路径距离小于预设值,则判断第一音频数据和第二音频数据对应的说话人为同一人,将特征参数数量多的第一特征参数或第二特征参数作为初始特征参数;若规整路径距离大于或等于预设值,则判断第一音频数据和第二音频数据对应的说话人非为同一人,将第一特征参数和第二特征参数作为初始特征参数。通过动态时间规整算DTW分析第一特征参数和所述第二特征参数对应的说话人是否为同一人,有效地计算语音的时间序列数据之间的相似度。Further, the above comparison of time-series similarity between the first and second feature parameters to obtain the initial feature parameter (the initial feature parameter including the first feature parameter and/or the second feature parameter) may include: computing the warping path distance between the first feature parameter and the second feature parameter through the dynamic time warping algorithm; if the warping path distance is less than a preset value, determining that the speakers corresponding to the first and second audio data are the same person, and taking whichever of the first or second feature parameter has more feature parameters as the initial feature parameter; if the warping path distance is greater than or equal to the preset value, determining that the speakers corresponding to the first and second audio data are not the same person, and taking both the first and second feature parameters as initial feature parameters. Analyzing through dynamic time warping (DTW) whether the speakers corresponding to the first and second feature parameters are the same person effectively computes the similarity between time-series speech data.
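A minimal dynamic time warping sketch of the warping-path-distance test described above. Using absolute difference as the local cost and scalar feature sequences is an assumption for illustration; real feature parameters would be vectors.

```python
def dtw_distance(a, b):
    """Warping path distance between two feature sequences of possibly
    different lengths, via the standard DTW dynamic program."""
    inf = float("inf")
    n, m = len(a), len(b)
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])  # local cost (illustrative choice)
            d[i][j] = cost + min(d[i - 1][j],      # insertion
                                 d[i][j - 1],      # deletion
                                 d[i - 1][j - 1])  # match
    return d[n][m]

def same_speaker(f1, f2, preset_value):
    """Same-person decision: warping path distance below the preset value."""
    return dtw_distance(f1, f2) < preset_value

# A stretched copy of the same contour warps to distance 0 despite the
# length mismatch; a reversed contour yields a clearly larger distance.
print(dtw_distance([1, 2, 3, 4], [1, 2, 2, 3, 4]))  # 0.0
print(same_speaker([1, 2, 3, 4], [1, 2, 2, 3, 4], 0.5))  # True
```

This is exactly why DTW suits the text's setting: the two intercepted segments produce time series of different lengths, and DTW compares them without forced alignment.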
106、根据预置的角色数据库和编号确定目标特征参数对应的目标角色。106. Determine the target character corresponding to the target feature parameter according to the preset character database and serial number.
服务器通过预置的声学模型组分析目标特征参数,在声学模型组对应的角色数据中匹配与目标特征参数对应的语音段,并获取该语音段上标记的角色名称,将该角色名称作为目标角色。若在声学模型组对应的角色数据中匹配不到与目标特征参数对应的语音段,则代表角色数据库中未存有目标特征参数对应的语音段以及目标特征参数对应的说话人名称,则将目标特征参数对应的候选音频数据上的编号作为目标角色,在接收用户输入的该目标角色的目标名称后,将该编号创建目标名称的对应关系,对应地更新角色数据库。The server analyzes the target feature parameter through a preset acoustic model group, matches a speech segment corresponding to the target feature parameter in the role data corresponding to the acoustic model group, obtains the role name marked on that speech segment, and takes that role name as the target role. If no speech segment corresponding to the target feature parameter can be matched in the role data, it means that neither the speech segment nor the speaker name corresponding to the target feature parameter exists in the role database; the number on the candidate audio data corresponding to the target feature parameter is then taken as the target role, and after receiving the user-input target name for that target role, a correspondence between the number and the target name is created and the role database is updated accordingly.
具体地,该步骤106可以包括:将目标特征参数在预置的角色数据库中进行角色匹配;若在角色数据库中匹配到目标特征参数对应的角色,则将匹配到的角色作为目标角色;若在角色数据库中匹配不到目标特征参数对应的角色,则获取目标特征参数对应的候选音频数据的编号,将获取到的编号所对应的角色作为与目标特征参数对应的目标角色。Specifically, step 106 may include: performing role matching on the target feature parameter in a preset role database; if a role corresponding to the target feature parameter is matched in the role database, taking the matched role as the target role; and if no role corresponding to the target feature parameter is matched in the role database, obtaining the number of the candidate audio data corresponding to the target feature parameter, and taking the role corresponding to the obtained number as the target role corresponding to the target feature parameter.
服务器可将各类特征参数和对应的角色生成哈希表或树结构,该树结构可为二叉树或B+树,并将该哈希表(或树结构)存储在预置的角色数据库中,以目标特征参数作为索引,遍历该哈希表,若在该哈希表(或树结构)中获取到与目标特征参数对应的特征参数,则获取该特征参数对应的角色,得到目标角色;若在该哈希表(或树结构)中无法获取到与目标特征参数对应的特征参数,则获取该目标特征参数对应的候选音频数据上标记的编号,获取该编号对应的角色,并将该角色作为与目标特征参数对应的目标角色。其中,当在角色数据库中匹配不到目标特征参数对应的具体人名时,通过目标角色对应的编号,达到多人语音的角色分离效果。通过编号清晰地显示该角色和该角色对应的音频数据。The server may generate a hash table or tree structure (the tree may be a binary tree or a B+ tree) from the various feature parameters and their corresponding roles, store the hash table (or tree structure) in the preset role database, and traverse it using the target feature parameter as the index. If a feature parameter corresponding to the target feature parameter is found in the hash table (or tree structure), the role corresponding to that feature parameter is obtained as the target role; if not, the number marked on the candidate audio data corresponding to the target feature parameter is obtained, the role corresponding to that number is obtained, and that role is taken as the target role corresponding to the target feature parameter. When no specific person name corresponding to the target feature parameter can be matched in the role database, the number corresponding to the target role still achieves role separation of multi-person speech, and the number clearly associates the role with its corresponding audio data.
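The lookup-with-fallback logic can be sketched with a plain dict standing in for the hash table (or tree) described above. The key format, the `speaker-N` fallback naming, and the stored values are assumptions for illustration only.

```python
def resolve_role(target_params, role_db, candidate_number):
    """Look the target feature parameter up in the role database; fall back
    to the candidate audio data's number when no stored role matches."""
    role = role_db.get(target_params)  # dict lookup stands in for the hash table / tree
    if role is not None:
        return role
    # No name stored yet: the number itself serves as the (separable) role.
    return f"speaker-{candidate_number}"

role_db = {("vp-A",): "Alice"}  # hypothetical stored voiceprint key -> role name
print(resolve_role(("vp-A",), role_db, 3))  # Alice
print(resolve_role(("vp-B",), role_db, 3))  # speaker-3
```

Even in the fallback branch, utterances remain separated by speaker because each unnamed speaker keeps a stable number.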
本发明实施例,通过自动比较音频数据断点的前后两个音频数据的特征参数,判断两个音频数据对应的说话人是否同一人,并根据判断说话人是否同一人所得的特征参数在角色数据库中进行角色匹配获得目标角色或者将音频数据对应的编号所作为目标角色,实现便捷而有效地进行多人会议场景的语音角色分离。In this embodiment of the present invention, by automatically comparing the feature parameters of the two audio data before and after a breakpoint in the audio data, it is determined whether the speakers corresponding to the two audio data are the same person; according to the feature parameters obtained from that determination, role matching is performed in the role database to obtain the target role, or the number corresponding to the audio data is taken as the target role, thereby realizing convenient and effective voice role separation in multi-person conference scenes.
请参阅图2,本发明实施例中智能会议角色分类的方法的另一个实施例包括:Referring to FIG. 2, another embodiment of the method for classifying roles in an intelligent conference according to an embodiment of the present invention includes:
201、获取会议音频数据,并对会议音频数据进行分割获得多个候选音频数据,多个候选音频数据中的每个候选音频数据对应包括一个编号;201. Obtain conference audio data, and divide the conference audio data to obtain multiple candidate audio data, and each candidate audio data in the multiple candidate audio data corresponds to a number;
202、对多个候选音频数据中的每个候选音频数据分别进行断点识别,获得目标时间节点;202. Perform breakpoint identification on each candidate audio data in the plurality of candidate audio data, respectively, to obtain a target time node;
203、根据目标时间节点从多个候选音频数据中截取预设时段的第一音频数据和第二音频数据;203, intercepting the first audio data and the second audio data of a preset time period from a plurality of candidate audio data according to the target time node;
204、分别对第一音频数据和第二音频数据进行特征参数提取,获得第一特征参数和第二特征参数;204. Perform feature parameter extraction on the first audio data and the second audio data respectively to obtain the first feature parameter and the second feature parameter;
205、将第一特征参数和第二特征参数进行说话人对比分析,获得目标特征参数;205. Perform speaker comparison analysis on the first feature parameter and the second feature parameter to obtain the target feature parameter;
206、根据预置的角色数据库和编号确定目标特征参数对应的目标角色;206. Determine the target role corresponding to the target feature parameter according to the preset role database and number;
本发明实施例中,201至206的方法可参见101至106,此处不再赘述。In this embodiment of the present invention, reference may be made to 101 to 106 for the methods of 201 to 206, and details are not described herein again.
207、获取目标角色对应的编号,并根据目标角色对应的编号更新角色数据库,以及将目标角色对应的候选音频数据转化为文字形式,得到会议记录角色信息。207. Obtain the serial number corresponding to the target role, update the role database according to the serial number corresponding to the target role, and convert the candidate audio data corresponding to the target role into text to obtain the role information of the meeting record.
服务器将第一次获取到的声纹信息(即目标角色对应的编号对应的目标特征信息)制作为语音模板,存放在角色数据库中,便于后续对该角色的会议音频数据的角色识别,自动匹配到相关的角色。通过将目标角色对应的候选音频数据转化为文字形式并进行整理得到会议记录角色信息,增强会议记录角色信息的清晰度和可读性。The server makes the voiceprint information obtained for the first time (i.e., the target feature information corresponding to the number of the target role) into a voice template and stores it in the role database, which facilitates later role recognition of that role's conference audio data and automatic matching to the relevant role. Converting the candidate audio data corresponding to the target role into text form and organizing it yields the meeting-minutes role information, enhancing its clarity and readability.
具体地,该步骤207可以包括:服务器获取目标角色编号对应的角色名称,以及制作目标角色编号对应的目标特征参数的语音模板,目标角色编号用于指示目标角色对应的编号;服务器在语音模板上标记角色名称,并将语音模板存储在角色数据库中,得到更新后的角色数据库;服务器通过基于预置的隐马尔可夫模型和深度神经网络将目标角色对应的候选音频数据转化为文字形式,得到文字信息;服务器对文字信息进行自然语言处理、重点内容标记处理和方案链接处理,得到候选会议信息,方案链接处理用于将文字信息中的问题内容相似的历史策略或爬取所得的解决方案添加链接至文字信息;服务器将候选会议信息按照会议音频数据的说话顺序进行排序,得到会议记录角色信息。Specifically, step 207 may include: the server obtaining the role name corresponding to a target role number and creating a voice template of the target feature parameter corresponding to the target role number, the target role number indicating the number corresponding to the target role; the server marking the role name on the voice template and storing the voice template in the role database to obtain an updated role database; the server converting the candidate audio data corresponding to the target role into text form through a preset hidden Markov model and deep neural network to obtain text information; the server performing natural-language processing, key-content marking, and solution-link processing on the text information to obtain candidate meeting information, the solution-link processing being used to add, to the text information, links to historical strategies with similar problem content or to crawled solutions; and the server sorting the candidate meeting information according to the speaking order of the conference audio data to obtain the meeting-minutes role information.
服务器将用户输入的目标角色编号对应的角色名称标记在语音模板上。重点内容标记处理可通过结合声谱图和长短期记忆网络的模型对目标角色编号对应的目标特征信息的会议音频数据进行情感识别,以及根据语义表达的情感特征训练所得的模型对文字信息进行情感识别,根据情感识别所得的情感识别文字信息的重点内容,并对该重点内容进行标记。通过更新角色数据库以角色数据库,通过对会议音频数据转化的文字信息进行自然语言处理、重点内容标记处理和方案链接处理,丰富会议音频数据转化的文字信息,以便于用户对会议音频数据转化的文字信息的多方向阅览。The server marks the role name input by the user for the target role number on the voice template. The key-content marking may perform emotion recognition on the conference audio data of the target feature information corresponding to the target role number through a model combining spectrograms with a long short-term memory network, and perform emotion recognition on the text information through a model trained on the emotional features of semantic expression; the key content of the text information is then identified according to the recognized emotions and marked. By updating the role database, and by performing natural-language processing, key-content marking, and solution-link processing on the text converted from the conference audio data, the converted text information is enriched, facilitating multi-directional reading of it by users.
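The first-seen template storage and later name binding described above can be sketched as follows. This is a hedged illustration only; the dict-based store and its field names are assumptions, not the patent's actual database schema.

```python
def update_role_db(role_db, number, voice_template, role_name=None):
    """Store the first-seen voiceprint template under the role's number;
    once the user supplies a role name, bind it to that number.
    `role_db` is a plain dict standing in for the preset role database."""
    # setdefault keeps the first-seen template; later calls do not overwrite it
    entry = role_db.setdefault(number, {"template": voice_template, "name": None})
    if role_name is not None:
        entry["name"] = role_name  # mark the user-supplied name on the template
    return role_db

db = {}
update_role_db(db, 3, [0.3, 0.5, 0.2])           # first occurrence: number only
update_role_db(db, 3, [0.3, 0.5, 0.2], "Alice")  # user later names role No. 3
print(db[3]["name"])  # Alice
```

Until a name arrives, the role stays addressable by its number, which matches the fallback behavior of the role-matching step.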
本发明实施例,通过根据目标角色对应的编号更新角色数据库,以及将目标角色对应的候选音频数据转化为文字形式,得到会议记录角色信息,不仅实现便捷而有效地进行多人会议场景的语音角色分离,还能够获得具有可读性强和能多角度展现的会议记录角色信息。In this embodiment of the present invention, by updating the role database according to the number corresponding to the target role and converting the candidate audio data corresponding to the target role into text form to obtain the meeting-minutes role information, not only is convenient and effective voice role separation in multi-person conference scenes realized, but meeting-minutes role information that is highly readable and can be presented from multiple angles is also obtained.
上面对本发明实施例中智能会议角色分类的方法进行了描述,下面对本发明实施例中智能会议角色分类的装置进行描述,请参阅图3,本发明实施例中智能会议角色分类的装置的一个实施例包括:The method for intelligent conference role classification in the embodiments of the present invention has been described above; the apparatus for intelligent conference role classification in the embodiments of the present invention is described below. Referring to FIG. 3, an embodiment of the apparatus for intelligent conference role classification in the embodiments of the present invention includes:
分割单元301,用于获取会议音频数据,并对会议音频数据进行分割获得多个候选音频数据,多个候选音频数据中的每个候选音频数据对应包括一个编号;The segmentation unit 301 is configured to obtain conference audio data and divide the conference audio data to obtain multiple candidate audio data, where each of the multiple candidate audio data correspondingly includes a number;
识别单元302,用于对多个候选音频数据中的每个候选音频数据分别进行断点识别,获得目标时间节点;The identification unit 302 is configured to perform breakpoint identification on each of the multiple candidate audio data to obtain a target time node;
截取单元303,用于根据目标时间节点从多个候选音频数据中截取预设时段的第一音频数据和第二音频数据;The interception unit 303 is configured to intercept first audio data and second audio data of a preset period from the multiple candidate audio data according to the target time node;
提取单元304,用于分别对第一音频数据和第二音频数据进行特征参数提取,获得第一特征参数和第二特征参数;The extraction unit 304 is configured to perform feature parameter extraction on the first audio data and the second audio data respectively to obtain a first feature parameter and a second feature parameter;
对比分析单元305,用于将第一特征参数和第二特征参数进行说话人对比分析,获得目标特征参数;A comparative analysis unit 305, configured to perform speaker comparative analysis on the first feature parameter and the second feature parameter to obtain the target feature parameter;
获取单元306,用于根据预置的角色数据库和编号确定目标特征参数对应的目标角色。The obtaining unit 306 is configured to determine, according to a preset role database and the number, the target role corresponding to the target feature parameter.
上述智能会议角色分类的装置中各个单元的功能实现与上述智能会议角色分类的方法实施例中各步骤相对应,其功能和实现过程在此处不再一一赘述。The function implementation of each unit in the above apparatus for classifying roles of a smart conference corresponds to each step in the above method embodiment for classifying roles for a smart conference, and the functions and implementation processes thereof are not repeated here.
本发明实施例,通过自动比较音频数据断点的前后两个音频数据的特征参数,判断两个音频数据对应的说话人是否同一人,并根据判断说话人是否同一人所得的特征参数在角色数据库中进行角色匹配获得目标角色或者将音频数据对应的编号所作为目标角色,实现便捷而有效地进行多人会议场景的语音角色分离。In this embodiment of the present invention, by automatically comparing the feature parameters of the two audio data before and after a breakpoint in the audio data, it is determined whether the speakers corresponding to the two audio data are the same person; according to the feature parameters obtained from that determination, role matching is performed in the role database to obtain the target role, or the number corresponding to the audio data is taken as the target role, thereby realizing convenient and effective voice role separation in multi-person conference scenes.
请参阅图4,本发明实施例中智能会议角色分类的装置的另一个实施例包括:Referring to FIG. 4 , another embodiment of the apparatus for classifying roles in an intelligent conference according to an embodiment of the present invention includes:
分割单元301,用于获取会议音频数据,并对会议音频数据进行分割获得多个候选音频数据,多个候选音频数据包括多个编号;The segmentation unit 301 is configured to obtain conference audio data and divide the conference audio data to obtain multiple candidate audio data, where the multiple candidate audio data include multiple numbers;
识别单元302,用于对多个候选音频数据中的每个候选音频数据分别进行断点识别,获得目标时间节点;The identification unit 302 is configured to perform breakpoint identification on each of the multiple candidate audio data to obtain a target time node;
截取单元303,用于根据目标时间节点从多个候选音频数据中截取预设时段的第一音频数据和第二音频数据;The interception unit 303 is configured to intercept first audio data and second audio data of a preset period from the multiple candidate audio data according to the target time node;
提取单元304,用于分别对第一音频数据和第二音频数据进行特征参数提取,获得第一特征参数和第二特征参数;The extraction unit 304 is configured to perform feature parameter extraction on the first audio data and the second audio data respectively to obtain a first feature parameter and a second feature parameter;
对比分析单元305,用于将第一特征参数和第二特征参数进行说话人对比分析,获得目标特征参数;A comparative analysis unit 305, configured to perform speaker comparative analysis on the first feature parameter and the second feature parameter to obtain the target feature parameter;
获取单元306,用于根据预置的角色数据库和编号确定目标特征参数对应的目标角色;The obtaining unit 306 is configured to determine, according to a preset role database and the number, the target role corresponding to the target feature parameter;
更新单元307,用于获取目标角色对应的编号,并根据目标角色对应的编号更新角色数据库,以及将目标角色对应的候选音频数据转化为文字形式,得到会议记录角色信息。The updating unit 307 is configured to obtain the number corresponding to the target role, update the role database according to the number corresponding to the target role, and convert the candidate audio data corresponding to the target role into text form to obtain the meeting-minutes role information.
可选的,分割单元301具体用于:获取会议音频数据,并按照预设第一帧长对会议音频数据进行说话人声音识别,获得待分类数据;按照预设第二帧长对待分类数据进行说话人分类,得到分类数据;按照预设计算时间对分类数据进行分割,得到多个初始音频数据;对多个初始音频数据分别进行角色编号,得到多个候选音频数据。Optionally, the segmentation unit 301 is specifically configured to: obtain conference audio data, and perform speaker voice recognition on the conference audio data according to a preset first frame length to obtain data to be classified; perform speaker classification on the data to be classified according to a preset second frame length to obtain classified data; divide the classified data according to a preset computation time to obtain multiple initial audio data; and assign role numbers to the multiple initial audio data respectively to obtain multiple candidate audio data.
可选的,识别单元302具体用于:分别对每个候选音频数据进行重要点检测,获得分段数量;根据分段数量对多个候选音频数据进行分段,得到分段数据,并获取分段数据对应的音频曲线的转折点;获取转折点的左侧相邻点和右侧相邻点;根据转折点计算第一斜率和第二斜率,第一斜率为转折点与左侧相邻点连线的斜率,第二斜率为转折点与右侧相邻点连线的斜率;计算第一斜率与第二斜率的差值;将差值大于第一预设阈值的转折点作为多个候选音频数据中的目标时间节点。Optionally, the identification unit 302 is specifically configured to: perform important-point detection on each candidate audio data separately to obtain a number of segments; segment the multiple candidate audio data according to the number of segments to obtain segmented data, and obtain the turning points of the audio curve corresponding to the segmented data; obtain the left and right neighboring points of each turning point; compute a first slope and a second slope from the turning point, where the first slope is the slope of the line connecting the turning point with its left neighboring point and the second slope is the slope of the line connecting the turning point with its right neighboring point; compute the difference between the first slope and the second slope; and take the turning points whose difference is greater than a first preset threshold as target time nodes in the multiple candidate audio data.
可选的,截取单元303具体用于:将目标时间节点作为末端时间点,根据末端时间点从多个候选音频数据中截取预设时段的第一音频数据;将目标时间节点作为始发时间点,根据始发时间点从多个候选音频数据中截取预设时段的第二音频数据。Optionally, the interception unit 303 is specifically configured to: take the target time node as an end time point, and intercept first audio data of a preset period from the multiple candidate audio data according to the end time point; and take the target time node as a start time point, and intercept second audio data of a preset period from the multiple candidate audio data according to the start time point.
可选的,提取单元304具体用于:通过基于聚类算法的隐式马尔可夫模型分别对第一音频数据和第二音频数据进行声纹识别,得到第一声纹数据和第二声纹数据;对第一声纹数据和第二声纹数据进行数据处理和离散余弦变换,得到第一语音特征向量和第二语音特征向量;对第一语音特征向量和第二语音特征向量分别进行聚类算法分析,获得第一特征参数和第二特征参数。Optionally, the extraction unit 304 is specifically configured to: perform voiceprint recognition on the first audio data and the second audio data respectively through a hidden Markov model based on a clustering algorithm to obtain first voiceprint data and second voiceprint data; perform data processing and discrete cosine transform on the first voiceprint data and the second voiceprint data to obtain a first speech feature vector and a second speech feature vector; and perform clustering-algorithm analysis on the first speech feature vector and the second speech feature vector respectively to obtain the first feature parameter and the second feature parameter.
可选的,对比分析单元305包括:对第一特征参数和第二特征参数进行时间序列相似度的对比分析,得到初始特征参数,初始特征参数包括第一特征参数和/或第二特征参数;获取候选对比特征参数,候选对比特征参数对应的时间节点大于目标时间节点;计算候选对比特征参数与初始特征参数之间的相似度,获得对比特征参数,对比特征参数为相似度大于第二预设阈值的候选对比特征参数;对该对比特征参数和初始特征参数进行说话人识别处理,得到目标特征参数。Optionally, the comparative analysis unit 305 includes: performing a comparative analysis on the similarity of the time series on the first feature parameter and the second feature parameter to obtain an initial feature parameter, where the initial feature parameter includes the first feature parameter and/or the second feature parameter; Obtain candidate comparison feature parameters, the time node corresponding to the candidate comparison feature parameters is greater than the target time node; calculate the similarity between the candidate comparison feature parameters and the initial feature parameters, obtain the comparison feature parameters, and the comparison feature parameters are that the similarity is greater than the second preset. The candidate comparison feature parameter of the threshold; the speaker identification process is performed on the comparison feature parameter and the initial feature parameter to obtain the target feature parameter.
可选的,对比分析单元305还用于:通过动态时间规整算法计算第一特征参数和第二特征参数之间的规整路径距离;若规整路径距离小于预设值,则判断第一音频数据和第二音频数据对应的说话人为同一人,将特征参数数量多的第一特征参数或第二特征参数作为初始特征参数;若规整路径距离大于或等于预设值,则判断第一音频数据和第二音频数据对应的说话人非为同一人,将第一特征参数和第二特征参数作为初始特征参数。Optionally, the comparative analysis unit 305 is also used to: calculate the regular path distance between the first characteristic parameter and the second characteristic parameter by the dynamic time regularization algorithm; if the regular path distance is less than the preset value, then judge the first audio data and The speaker corresponding to the second audio data is the same person, and the first feature parameter or the second feature parameter with a large number of feature parameters is used as the initial feature parameter; if the regular path distance is greater than or equal to the preset value, then the first audio data and the The speakers corresponding to the two audio data are not the same person, and the first feature parameter and the second feature parameter are used as initial feature parameters.
可选的,获取单元306具体用于:将目标特征参数在预置的角色数据库中进行角色匹配;若在角色数据库中匹配到目标特征参数对应的角色,则将匹配到的角色作为目标角色;若在角色数据库中匹配不到目标特征参数对应的角色,则获取目标特征参数对应的候选音频数据的编号,将获取到的编号所对应的角色作为与目标特征参数对应的目标角色。Optionally, the obtaining unit 306 is specifically configured to: perform role matching on the target feature parameter in a preset role database; if a role corresponding to the target feature parameter is matched in the role database, take the matched role as the target role; and if no role corresponding to the target feature parameter is matched in the role database, obtain the number of the candidate audio data corresponding to the target feature parameter, and take the role corresponding to the obtained number as the target role corresponding to the target feature parameter.
可选的,更新单元307具体用于:获取目标角色编号对应的角色名称,以及制作目标角色编号对应的目标特征参数的语音模板,目标角色编号用于指示目标角色对应的编号;在语音模板上标记角色名称,并将语音模板存储在角色数据库中,得到更新后的角色数据库;通过基于预置的隐马尔可夫模型和深度神经网络将目标角色对应的候选音频数据转化为文字形式,得到文字信息;对文字信息进行自然语言处理、重点内容标记处理和方案链接处理,得到候选会议信息,方案链接处理用于将文字信息中的问题内容相似的历史策略或爬取所得的解决方案添加链接至文字信息;将候选会议信息按照会议音频数据的说话顺序进行排序,得到会议记录角色信息。Optionally, the updating unit 307 is specifically configured to: obtain the role name corresponding to a target role number, and create a voice template of the target feature parameter corresponding to the target role number, the target role number indicating the number corresponding to the target role; mark the role name on the voice template, and store the voice template in the role database to obtain an updated role database; convert the candidate audio data corresponding to the target role into text form through a preset hidden Markov model and deep neural network to obtain text information; perform natural-language processing, key-content marking, and solution-link processing on the text information to obtain candidate meeting information, the solution-link processing being used to add, to the text information, links to historical strategies with similar problem content or to crawled solutions; and sort the candidate meeting information according to the speaking order of the conference audio data to obtain the meeting-minutes role information.
上述智能会议角色分类的装置中各个单元的功能实现与上述智能会议角色分类的方法实施例中各步骤相对应,其功能和实现过程在此处不再一一赘述。The function implementation of each unit in the above apparatus for classifying roles of a smart conference corresponds to each step in the above method embodiment for classifying roles for a smart conference, and the functions and implementation processes thereof are not repeated here.
本发明实施例,通过自动比较音频数据断点的前后两个音频数据的特征参数,判断两个音频数据对应的说话人是否同一人,并根据判断说话人是否同一人所得的特征参数在角色数据库中进行角色匹配获得目标角色或者将音频数据对应的编号所作为目标角色,实现便捷而有效地进行多人会议场景的语音角色分离。In this embodiment of the present invention, by automatically comparing the feature parameters of the two audio data before and after a breakpoint in the audio data, it is determined whether the speakers corresponding to the two audio data are the same person; according to the feature parameters obtained from that determination, role matching is performed in the role database to obtain the target role, or the number corresponding to the audio data is taken as the target role, thereby realizing convenient and effective voice role separation in multi-person conference scenes.
上面图3至图4从模块化功能实体的角度对本发明实施例中的智能会议角色分类的装置进行详细描述,下面从硬件处理的角度对本发明实施例中智能会议角色分类的设备进行详细描述。Figures 3 to 4 above describe in detail the device for classifying intelligent conference roles in the embodiment of the present invention from the perspective of modular functional entities. The following describes the device for classifying roles for intelligent conference in the embodiment of the present invention in detail from the perspective of hardware processing.
图5是本发明实施例提供的一种智能会议角色分类的设备的结构示意图,该智能会议角色分类的设备500可因配置或性能不同而产生比较大的差异,可以包括一个或一个以上处理器(central processing units,CPU)501(例如,一个或一个以上处理器)和存储器509,一个或一个以上存储应用程序507或数据506的存储介质508(例如一个或一个以上海量存储装置)。其中,存储器509和存储介质508可以是短暂存储或持久存储。存储在存储介质508的程序可以包括一个或一个以上模块(图示没标出),每个模块可以包括对智能会议角色分类的设备中的一系列指令操作。更进一步地,处理器501可以设置为与存储介质508通信,在智能会议角色分类的设备500上执行存储介质508中的一系列指令操作。FIG. 5 is a schematic structural diagram of a device for intelligent conference role classification provided by an embodiment of the present invention. The device 500 for intelligent conference role classification may vary greatly with configuration or performance, and may include one or more processors (central processing units, CPU) 501 (for example, one or more processors), a memory 509, and one or more storage media 508 (for example, one or more mass storage devices) storing application programs 507 or data 506. The memory 509 and the storage medium 508 may provide transient or persistent storage. A program stored in the storage medium 508 may include one or more modules (not shown in the figure), and each module may include a series of instruction operations on the device. Further, the processor 501 may be configured to communicate with the storage medium 508 and to execute, on the device 500 for intelligent conference role classification, the series of instruction operations in the storage medium 508.
智能会议角色分类的设备500还可以包括一个或一个以上电源502,一个或一个以上有线或无线网络接口503,一个或一个以上输入输出接口504,和/或,一个或一个以上操作系统505,例如Windows Server,Mac OS X,Unix,Linux,FreeBSD等等。本领域技术人员可以理解,图5中示出的智能会议角色分类的设备结构并不构成对智能会议角色分类的设备的限定,可以包括比图示更多或更少的部件,或者组合某些部件,或者不同的部件布置。处理器501可以执行上述实施例中分割单元301、识别单元302、截取单元303、提取单元304、对比分析单元305、获取单元306和更新单元307的功能。The device 500 for intelligent conference role classification may further include one or more power supplies 502, one or more wired or wireless network interfaces 503, one or more input/output interfaces 504, and/or one or more operating systems 505, such as Windows Server, Mac OS X, Unix, Linux, FreeBSD, and so on. Those skilled in the art can understand that the device structure shown in FIG. 5 does not constitute a limitation on the device for intelligent conference role classification, which may include more or fewer components than shown, combine certain components, or adopt a different component arrangement. The processor 501 may perform the functions of the segmentation unit 301, identification unit 302, interception unit 303, extraction unit 304, comparative analysis unit 305, obtaining unit 306, and updating unit 307 in the above embodiments.
下面结合图5对智能会议角色分类的设备的各个构成部件进行具体的介绍:Below in conjunction with Fig. 5, each constituent component of the device of intelligent conference role classification will be introduced in detail:
处理器501是智能会议角色分类的设备的控制中心,可以按照智能会议角色分类的方法进行处理。处理器501利用各种接口和线路连接整个智能会议角色分类的设备的各个部分,通过运行或执行存储在存储器509内的软件程序和/或模块,以及调用存储在存储器509内的数据,执行智能会议角色分类的设备的各种功能和处理数据,从而实现便捷而有效地进行多人会议场景的语音角色分离的功能。存储介质508和存储器509都是存储数据的载体,本发明实施例中,存储介质508可以是指储存容量较小,但速度快的内存储器,而存储器509可以是储存容量大,但储存速度慢的外存储器。The processor 501 is the control center of the device for intelligent conference role classification and can perform processing according to the method for intelligent conference role classification. The processor 501 connects all parts of the entire device using various interfaces and lines, and executes the various functions of the device and processes data by running or executing the software programs and/or modules stored in the memory 509 and invoking the data stored in the memory 509, thereby realizing convenient and effective voice role separation in multi-person conference scenes. Both the storage medium 508 and the memory 509 are carriers for storing data; in this embodiment of the present invention, the storage medium 508 may refer to an internal memory with a small storage capacity but high speed, while the memory 509 may be an external memory with a large storage capacity but slower speed.
存储器509可用于存储软件程序以及模块,处理器501通过运行存储在存储器509的软件程序以及模块,从而执行智能会议角色分类的设备500的各种功能应用以及数据处理。存储器509可主要包括存储程序区和存储数据区,其中,存储程序区可存储操作系统、至少一个功能所需的应用程序(获取会议音频数据等)等;存储数据区可存储根据智能会议角色分类的设备的使用所创建的数据(根据目标时间节点从多个候选音频数据中截取预设时段的第一音频数据和第二音频数据等)等。此外,存储器509可以包括高速随机存取存储器,还可以包括非易失性存储器,例如至少一个磁盘存储器件、闪存器件、或其他非易失性固态存储器件。在本发明实施例中提供的智能会议角色分类的方法程序和接收到的数据流存储在存储器中,当需要使用时,处理器501从存储器509中调用。The memory 509 may be used to store software programs and modules, and the processor 501 executes the various functional applications and data processing of the device 500 for intelligent conference role classification by running the software programs and modules stored in the memory 509. The memory 509 may mainly include a program storage area and a data storage area, where the program storage area may store the operating system and the application programs required for at least one function (obtaining conference audio data, etc.), and the data storage area may store data created through the use of the device (first audio data and second audio data of a preset period intercepted from multiple candidate audio data according to the target time node, etc.). In addition, the memory 509 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. The method program for intelligent conference role classification provided in the embodiments of the present invention and the received data stream are stored in the memory, and the processor 501 invokes them from the memory 509 when needed.
在计算机上加载和执行所述计算机程序指令时,全部或部分地产生按照本发明实施例所述的流程或功能。所述计算机可以是通用计算机、专用计算机、计算机网络、或者其他可编程装置。所述计算机指令可以存储在计算机可读存储介质中,或者从一个计算机可读存储介质向另一计算机可读存储介质传输,例如,所述计算机指令可以从一个网站站点、计算机、服务器或数据中心通过有线(例如同轴电缆、光纤、双绞线)或无线(例如红外、无线、微波等)方式向另一个网站站点、计算机、服务器或数据中心进行传输。所述计算机可读存储介质可以是计算机能够存储的任何可用介质或者是包含一个或多个可用介质集成的服务器、数据中心等数据存储设备。所述可用介质可以是磁性介质,(例如,软盘、硬盘、磁带)、光介质(例如,光盘)、或者半导体介质(例如固态硬盘(solid state disk,SSD))等。When the computer program instructions are loaded and executed on a computer, all or part of the processes or functions described in the embodiments of the present invention are generated. The computer may be a general purpose computer, special purpose computer, computer network, or other programmable device. The computer instructions may be stored in or transmitted from one computer-readable storage medium to another computer-readable storage medium, for example, the computer instructions may be downloaded from a website site, computer, server, or data center Transmission to another website site, computer, server, or data center by wire (eg, coaxial cable, optical fiber, twisted pair) or wireless (eg, infrared, wireless, microwave, etc.). The computer-readable storage medium may be any available medium that can be stored by a computer, or a data storage device such as a server, data center, etc., which includes one or more available media integrated. The usable media may be magnetic media (eg, floppy disks, hard disks, magnetic tapes), optical media (eg, optical disks), or semiconductor media (eg, solid state disks (SSDs)), and the like.
所属领域的技术人员可以清楚地了解到,为描述的方便和简洁,上述描述的系统,装置和单元的具体工作过程,可以参考前述方法实施例中的对应过程,在此不再赘述。Those skilled in the art can clearly understand that, for the convenience and brevity of description, the specific working process of the system, device and unit described above may refer to the corresponding process in the foregoing method embodiments, which will not be repeated here.
在本发明所提供的几个实施例中,应该理解到,所揭露的系统,装置和方法,可以通过其它的方式实现。例如,以上所描述的装置实施例仅仅是示意性的,例如,所述单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,装置或单元的间接耦合或通信连接,可以是电性,机械或其它的形式。In the several embodiments provided by the present invention, it should be understood that the disclosed system, apparatus and method may be implemented in other manners. For example, the apparatus embodiments described above are only illustrative. For example, the division of the units is only a logical function division. In actual implementation, there may be other division methods. For example, multiple units or components may be combined or Can be integrated into another system, or some features can be ignored, or not implemented. On the other hand, the shown or discussed mutual coupling or direct coupling or communication connection may be through some interfaces, indirect coupling or communication connection of devices or units, and may be in electrical, mechanical or other forms.
The units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit. The above integrated units may be implemented in the form of hardware or in the form of software functional units.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the essence of the technical solution of the present invention, the part that contributes to the prior art, or all or part of the technical solution may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The above embodiments are only intended to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions described in the foregoing embodiments or make equivalent replacements of some of their technical features, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202010136440.5A (published as CN111462758A) | 2020-03-02 | 2020-03-02 | Method, device and equipment for intelligent conference role classification and storage medium |
| Publication Number | Publication Date |
|---|---|
| CN111462758A | 2020-07-28 |
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202010136440.5A (published as CN111462758A, pending) | Method, device and equipment for intelligent conference role classification and storage medium | 2020-03-02 | 2020-03-02 |
| Country | Link |
|---|---|
| CN (1) | CN111462758A (en) |
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN112233680A (en)* | 2020-09-27 | 2021-01-15 | 科大讯飞股份有限公司 | Speaker role identification method and device, electronic equipment and storage medium |
| CN113077784A (en)* | 2021-03-31 | 2021-07-06 | 重庆风云际会智慧科技有限公司 | Intelligent voice equipment for role recognition |
| CN113542810A (en)* | 2021-07-14 | 2021-10-22 | 上海眼控科技股份有限公司 | Video processing method and device, electronic equipment and storage medium |
| CN113596261A (en)* | 2021-07-19 | 2021-11-02 | 电信科学技术第十研究所有限公司 | Voice line detection method and device |
| CN113808578A (en)* | 2021-11-16 | 2021-12-17 | 阿里巴巴达摩院(杭州)科技有限公司 | Audio signal processing method, device, device and storage medium |
| CN114360484A (en)* | 2020-09-27 | 2022-04-15 | 华为技术有限公司 | An audio optimization method, device, system and medium |
| CN114465737A (en)* | 2022-04-13 | 2022-05-10 | 腾讯科技(深圳)有限公司 | Data processing method and device, computer equipment and storage medium |
| CN115174283A (en)* | 2022-06-30 | 2022-10-11 | 上海掌门科技有限公司 | Hosting authority configuration method and equipment |
| CN115941993A (en)* | 2022-12-14 | 2023-04-07 | 成都爱奇艺智能创新科技有限公司 | Role splitting method, device, electronic device and storage medium |
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN101625857A (en)* | 2008-07-10 | 2010-01-13 | 新奥特(北京)视频技术有限公司 | Self-adaptive voice endpoint detection method |
| US20110222782A1 (en)* | 2010-03-10 | 2011-09-15 | Sony Corporation | Information processing apparatus, information processing method, and program |
| CN104021789A (en)* | 2014-06-25 | 2014-09-03 | 厦门大学 | Self-adaption endpoint detection method using short-time time-frequency value |
| CN105161093A (en)* | 2015-10-14 | 2015-12-16 | 科大讯飞股份有限公司 | Method and system for determining the number of speakers |
| CN107393527A (en)* | 2017-07-17 | 2017-11-24 | 广东讯飞启明科技发展有限公司 | The determination methods of speaker's number |
| US20180197548A1 (en)* | 2017-01-09 | 2018-07-12 | Onu Technology Inc. | System and method for diarization of speech, automated generation of transcripts, and automatic information extraction |
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN114360484A (en)* | 2020-09-27 | 2022-04-15 | 华为技术有限公司 | An audio optimization method, device, system and medium |
| CN112233680A (en)* | 2020-09-27 | 2021-01-15 | 科大讯飞股份有限公司 | Speaker role identification method and device, electronic equipment and storage medium |
| CN112233680B (en)* | 2020-09-27 | 2024-02-13 | 科大讯飞股份有限公司 | Speaker character recognition method, speaker character recognition device, electronic equipment and storage medium |
| CN113077784A (en)* | 2021-03-31 | 2021-07-06 | 重庆风云际会智慧科技有限公司 | Intelligent voice equipment for role recognition |
| CN113077784B (en)* | 2021-03-31 | 2022-06-14 | 重庆风云际会智慧科技有限公司 | A character recognition intelligent voice device |
| CN113542810A (en)* | 2021-07-14 | 2021-10-22 | 上海眼控科技股份有限公司 | Video processing method and device, electronic equipment and storage medium |
| CN113596261A (en)* | 2021-07-19 | 2021-11-02 | 电信科学技术第十研究所有限公司 | Voice line detection method and device |
| CN113596261B (en)* | 2021-07-19 | 2024-01-05 | 电信科学技术第十研究所有限公司 | Voice line detection method and device |
| CN113808578B (en)* | 2021-11-16 | 2022-04-15 | 阿里巴巴达摩院(杭州)科技有限公司 | Audio signal processing method, device, equipment and storage medium |
| CN113808578A (en)* | 2021-11-16 | 2021-12-17 | 阿里巴巴达摩院(杭州)科技有限公司 | Audio signal processing method, device, device and storage medium |
| CN114465737A (en)* | 2022-04-13 | 2022-05-10 | 腾讯科技(深圳)有限公司 | Data processing method and device, computer equipment and storage medium |
| CN115174283A (en)* | 2022-06-30 | 2022-10-11 | 上海掌门科技有限公司 | Hosting authority configuration method and equipment |
| CN115174283B (en)* | 2022-06-30 | 2024-05-07 | 上海掌门科技有限公司 | Hosting authority configuration method and equipment |
| CN115941993A (en)* | 2022-12-14 | 2023-04-07 | 成都爱奇艺智能创新科技有限公司 | Role splitting method, device, electronic device and storage medium |
| Publication | Publication Date | Title |
|---|---|---|
| CN111462758A (en) | | Method, device and equipment for intelligent conference role classification and storage medium |
| CN110147726B (en) | | Service quality inspection method and device, storage medium and electronic device |
| CN111524527B (en) | | Speaker separation method, speaker separation device, electronic device and storage medium |
| WO2021128741A1 (en) | | Voice emotion fluctuation analysis method and apparatus, and computer device and storage medium |
| CN110910891B (en) | | Speaker segmentation labeling method based on long-time and short-time memory deep neural network |
| Maghilnan et al. | | Sentiment analysis on speaker specific speech data |
| CN103500579B (en) | | Audio recognition method, apparatus and system |
| CN110349564A (en) | | Cross-language speech recognition method and device |
| CN112151015A (en) | | Keyword detection method and device, electronic equipment and storage medium |
| CN107767881B (en) | | Method and device for acquiring satisfaction degree of voice information |
| CN112735385A (en) | | Voice endpoint detection method and device, computer equipment and storage medium |
| CN117157708A (en) | | Speaker log with support for episodic content |
| US20250131938A1 (en) | | Speech processing method, device and storage medium |
| Flamary et al. | | Spoken WordCloud: Clustering recurrent patterns in speech |
| CN113420178B (en) | | A data processing method and device |
| WO2017045429A1 (en) | | Audio data detection method and system and storage medium |
| CN110633475A (en) | | Natural language understanding method, device and system based on computer scene and storage medium |
| CN119673173A (en) | | A streaming speaker log method and system |
| CN117877510A (en) | | Voice automatic test method, device, electronic equipment and storage medium |
| CN106710588B (en) | | Speech data sentence recognition method, device and system |
| CN118312890B (en) | | Method for training keyword recognition model, method and device for recognizing keywords |
| CN119629636A (en) | | Spam call identification method, device, computer equipment and storage medium |
| CN114822557B (en) | | Methods, devices, equipment and storage media for distinguishing different sounds in the classroom |
| CN111640450A (en) | | Multi-person audio processing method, device, equipment and readable storage medium |
| CN113539269A (en) | | Audio information processing method, system and computer readable storage medium |
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 2020-07-28 |