CN114900590B - Tracking display method, teleprompter, electronic device, storage medium - Google Patents

Tracking display method, teleprompter, electronic device, storage medium
Info

Publication number
CN114900590B
Authority
CN
China
Prior art keywords
text
candidate
tracking
determining
segment
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210293071.XA
Other languages
Chinese (zh)
Other versions
CN114900590A (en)
Inventor
司马华鹏
李慧水
汤毅平
汪成
张亚萍
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Silicon based Intelligent Technology Group Co.,Ltd.
Original Assignee
Nanjing Silicon Intelligence Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Silicon Intelligence Technology Co Ltd
Priority to CN202210293071.XA
Publication of CN114900590A
Application granted
Publication of CN114900590B
Active
Anticipated expiration

Abstract

Translated from Chinese

The present application provides a tracking display method, a teleprompter, an electronic device, and a storage medium, addressing the technical problem in the related art that a teleprompter cannot achieve ideal voice tracking while operating. The proposed tracking display method applies to different languages and speaking contexts: the corresponding tracking display scheme and language logic are selected by first determining which case applies. The method combines positioning with voice tracking, screening candidate positions and comparing similarities. On the one hand, voice tracking and positioning therefore do not depend strictly on the quality of speech recognition. On the other hand, positioning screens the possible positions iteratively, so the comparison is never made against the entire text to be tracked. This shortens processing time and, even when speech recognition errs or the user misspeaks, still allows precise positioning based on what the user actually said, giving fast and accurate voice tracking.

Description

Tracking display method, teleprompter, electronic device, and storage medium

Technical field

The present application relates to the technical field of electronic devices, and in particular to a tracking display method, a teleprompter, an electronic device, and a storage medium.

Background

When recording or live streaming, a user needs an electronic device to prompt the relevant target text, which provides the teleprompting function and helps the user work smoothly. The teleprompting function can be provided by a standalone electronic device such as a teleprompter, or by an application running on an electronic device.

While the user uses the teleprompting function, the target text is displayed in real time on the teleprompter screen for the user to read. At the same time, based on the user's speech, the teleprompter marks the user's current speaking position in the displayed target text in real time, so that the user can find the prompt text needed next. Voice tracking is thus performed alongside teleprompting.

However, to implement voice tracking, a teleprompter usually uses an automatic speech recognition (ASR) model to perform speech recognition against the target text. Speech recognition is limited by the ASR model's own recognition errors: for example, when the user's speaking rate changes, or the user pauses or misspeaks, the user's speaking position cannot be identified accurately, which degrades the user experience. No effective solution has yet been proposed for the problem in the related art that a teleprompter cannot achieve ideal voice tracking while operating.

Summary of the invention

The embodiments of the present application provide a tracking display method, a teleprompter, an electronic device, and a storage medium, to at least solve the technical problem in the related art that ideal voice tracking cannot be achieved.

In one embodiment of the present application, a tracking display method is provided. The method includes: obtaining recognized text from the voice data input by the user at the current moment, and determining an initial position in the recognized text, where the initial position is the text position determined by voice tracking at the previous moment; determining, from the initial position, a preset total number of candidate positions, and a preset text length, a plurality of candidate positions and the candidate text segment corresponding to each candidate position, where the total number of candidate positions indicates how many candidate positions there are and the text length indicates the length of each candidate text segment; performing similarity matching between the candidate text segment of each candidate position and the recognized text to determine a text matching result; and determining the voice tracking position at the current moment from the text matching result and presenting the voice tracking position in a user interface.

In one embodiment of the present application, when the recognized text is of a first language type, determining the plurality of candidate positions and the candidate text segment corresponding to each candidate position from the initial position, the preset total number of candidate positions, and the preset text length includes: selecting a candidate character set according to the total number of candidate positions, the candidate character set including a plurality of first characters, where the first characters are consecutive characters following the initial position, the number of first characters equals the total number of candidate positions, and each first character corresponds to one candidate position; and determining the candidate text segments from the text length and the first characters, where each candidate text segment consists of as many consecutive characters as the text length specifies, and each first character is the last character of its corresponding candidate text segment.
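The character-based selection described above can be sketched as follows. This is a minimal illustration rather than the patented implementation: the function name `char_candidates`, its parameter names, the zero-based indexing, and the decision to shorten segments near the start of the text are all assumptions.

```python
def char_candidates(script, initial_pos, num_candidates, seg_len):
    """Return (candidate_position, segment) pairs for a character-based text.

    Each candidate position is one of the `num_candidates` characters that
    follow `initial_pos`. Its segment is the run of up to `seg_len`
    characters that ends at that position (assumed behaviour: the segment
    is simply shorter when it would start before the text begins).
    """
    candidates = []
    for offset in range(1, num_candidates + 1):
        pos = initial_pos + offset
        if pos >= len(script):
            break  # ran past the end of the text to be tracked
        start = max(0, pos - seg_len + 1)
        candidates.append((pos, script[start:pos + 1]))
    return candidates


if __name__ == "__main__":
    # Initial position 3 ("d"), three candidates, segments of four characters.
    for pos, segment in char_candidates("abcdefghij", 3, 3, 4):
        print(pos, segment)
```

Because every segment ends at its own candidate position, the position that wins the later similarity comparison can be used directly as the new tracking position.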

In one embodiment of the present application, determining the plurality of candidate positions and the candidate text segment corresponding to each candidate position from the initial position, the preset total number of candidate positions, and the preset text length includes: selecting a candidate character set according to the total number of candidate positions, the candidate character set including a plurality of first characters and second characters, where the first characters are consecutive characters following the initial position, the second characters include the character at the initial position and consecutive characters preceding the initial position, the combined number of first and second characters equals the total number of candidate positions, and each first character and each second character corresponds to one candidate position; and determining the candidate text segments from the text length, the first characters, and the second characters, where each candidate text segment consists of as many consecutive characters as the text length specifies, each first character is the last character of its corresponding candidate text segment, and each second character is likewise the last character of its corresponding candidate text segment.
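A hedged sketch of this look-back variant follows. The source does not state how the candidate total is divided between first and second characters, so the `num_back` parameter is hypothetical, as are the function names and zero-based indexing.

```python
def candidate_positions(initial_pos, total, num_back, text_len):
    """Candidate positions around `initial_pos`.

    `num_back` positions are "second characters": the initial position itself
    plus the characters just before it. The remaining `total - num_back`
    positions are "first characters" after the initial position.
    Positions falling outside the text are dropped (an assumption).
    """
    if not 1 <= num_back < total:
        raise ValueError("num_back must be at least 1 and smaller than total")
    first = initial_pos - num_back + 1
    return [p for p in range(first, first + total) if 0 <= p < text_len]


def segments_ending_at(script, positions, seg_len):
    """Pair each position with the segment of up to `seg_len` characters ending there."""
    return [(p, script[max(0, p - seg_len + 1):p + 1]) for p in positions]


if __name__ == "__main__":
    positions = candidate_positions(initial_pos=5, total=6, num_back=2, text_len=20)
    print(positions)  # two look-back positions, then four look-ahead positions
```

Including positions at or before the initial position lets the tracker stay put or step back, for instance when the user repeats a phrase.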

In one embodiment of the present application, performing similarity matching between the candidate text segment of each candidate position and the recognized text to determine the text matching result includes: determining a first text matching value for each candidate text segment from the similarity between that segment and the recognized text; determining a distance penalty value for each candidate text segment from the candidate positions, where the distance penalty value characterizes the text distance between the candidate position and the initial position; determining a second text matching value for each candidate text segment from its first text matching value and its distance penalty value; and determining the text matching result from the second text matching values of the candidate text segments.
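The two-stage scoring above can be illustrated with the sketch below. The source does not give the similarity measure or the penalty formula in this passage, so the sketch substitutes assumptions: Python's `difflib.SequenceMatcher` ratio stands in for the first text matching value, the penalty is linear in the distance, and the second value is the first value minus the penalty. The `penalty_weight` default is an arbitrary illustrative number.

```python
from difflib import SequenceMatcher


def match_candidates(candidates, recognized, initial_pos, penalty_weight=0.02):
    """Pick the candidate position with the highest penalised similarity.

    candidates: list of (position, segment) pairs.
    Returns (best_position, best_second_value).
    """
    if not candidates:
        raise ValueError("at least one candidate is required")
    best_pos, best_score = None, float("-inf")
    for pos, segment in candidates:
        # First text matching value: plain similarity (assumed measure).
        first_value = SequenceMatcher(None, segment, recognized).ratio()
        # Distance penalty: grows with the distance from the initial position.
        penalty = penalty_weight * abs(pos - initial_pos)
        # Second text matching value: similarity after the penalty.
        second_value = first_value - penalty
        if second_value > best_score:
            best_pos, best_score = pos, second_value
    return best_pos, best_score


if __name__ == "__main__":
    cands = [(4, "bcde"), (5, "cdef"), (6, "defg")]
    print(match_candidates(cands, "cdef", initial_pos=3))
```

The penalty matters mainly when the same phrase occurs more than once within the candidate window: equally similar segments are then separated by how far they are from the previous position.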

In one embodiment of the present application, the user interface displays the text to be tracked, and the voice tracking position is the position in the text to be tracked that corresponds to the last text of the candidate text segment. Presenting the voice tracking position in the user interface includes: marking the voice tracking position at a display tracking position in the text to be tracked, and presenting the display tracking position in the user interface, where the display tracking position is a text position after the voice tracking position.
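As a small illustration of the distinction between the two positions, the sketch below places the marker a fixed number of characters after the voice tracking position. The size of that offset is not given in the source, so `lookahead=1` is an assumption, as are the function names and the bracket marker.

```python
def display_tracking_position(voice_pos, text_len, lookahead=1):
    """Display position: `lookahead` characters after the voice tracking
    position, clamped to the last character of a non-empty text."""
    if text_len <= 0:
        raise ValueError("text must not be empty")
    return min(voice_pos + lookahead, text_len - 1)


def render(script, voice_pos, lookahead=1):
    """Return the text with the display tracking position wrapped in brackets."""
    pos = display_tracking_position(voice_pos, len(script), lookahead)
    return script[:pos] + "[" + script[pos] + "]" + script[pos + 1:]


if __name__ == "__main__":
    print(render("abcdef", voice_pos=2))
```

Marking the text just after what was spoken points the reader at what to say next rather than at what has already been said.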

In one embodiment of the present application, when the recognized text is of a second language type, determining the plurality of candidate positions and the candidate text segment corresponding to each candidate position from the initial position, the preset total number of candidate positions, and the preset text length includes: selecting a candidate word set according to the total number of candidate positions, the candidate word set including a plurality of first words, where the first words are consecutive words following the initial position, the number of first words equals the total number of candidate positions, and each first word corresponds to one candidate position; and determining the candidate text segments from the text length and the first words, where each candidate text segment consists of as many consecutive words as the text length specifies, and each first word is the last word of its corresponding candidate text segment.
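For the second language type the same idea applies with words as the unit. The sketch below assumes the text to be tracked has already been split on whitespace; the function name, parameters, and zero-based word indices are illustrative assumptions.

```python
def word_candidates(words, initial_idx, num_candidates, seg_len):
    """Return (candidate_index, segment) pairs for a word-based text.

    Each candidate index is one of the `num_candidates` words after
    `initial_idx`; its segment is the list of up to `seg_len` words
    ending at that index.
    """
    candidates = []
    last = min(initial_idx + num_candidates, len(words) - 1)
    for idx in range(initial_idx + 1, last + 1):
        start = max(0, idx - seg_len + 1)
        candidates.append((idx, words[start:idx + 1]))
    return candidates


if __name__ == "__main__":
    script_words = "the quick brown fox jumps over the lazy dog".split()
    for idx, segment in word_candidates(script_words, 1, 3, 2):
        print(idx, segment)
```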

In one embodiment of the present application, determining the plurality of candidate positions and the candidate text segment corresponding to each candidate position from the initial position, the preset total number of candidate positions, and the preset text length includes: selecting a candidate word set according to the total number of candidate positions, the candidate word set including a plurality of first words and second words, where the first words are consecutive words following the initial position, the second words include the word at the initial position and consecutive words preceding the initial position, the combined number of first and second words equals the total number of candidate positions, and each first word and each second word corresponds to one candidate position; and determining the candidate text segments from the text length, the first words, and the second words, where each candidate text segment consists of as many consecutive words as the text length specifies, each first word is the last word of its corresponding candidate text segment, and each second word is likewise the last word of its corresponding candidate text segment.

In one embodiment of the present application, performing similarity matching between the candidate text segment of each candidate position and the recognized text to determine the text matching result includes: determining a third text matching value for each candidate text segment from the similarity between that segment and the recognized text; determining, from the third text matching value, a first comparison value and a second comparison value for each candidate text segment, where the first comparison value characterizes, for the two words at the same position in the candidate text segment and the recognized text, the proportion of letter positions at which the letters are identical, and the second comparison value characterizes, for those two words, the proportion of identical letters among the total number of letters; determining a distance penalty value for each candidate text segment from the candidate positions, where the distance penalty value characterizes the text distance between the candidate position and the initial position; determining a fourth text matching value for each candidate text segment from its first comparison value, second comparison value, and distance penalty value; and determining the text matching result from the fourth text matching values of the candidate text segments.
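One possible reading of the two comparison values is sketched below. The exact formulas are not given in this passage, so everything here is an assumption: the first value is taken as the fraction of letter positions that agree, the second as the fraction of letters the two words share regardless of position, and the fourth text matching value as a weighted sum minus a linear distance penalty. The weights and function names are hypothetical.

```python
from collections import Counter


def positional_ratio(a, b):
    """First comparison value (assumed): share of letter positions that match."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / longest


def shared_letter_ratio(a, b):
    """Second comparison value (assumed): shared letters over all letters."""
    total = len(a) + len(b)
    if total == 0:
        return 0.0
    common = sum((Counter(a) & Counter(b)).values())
    return 2 * common / total


def fourth_match_value(candidate_words, recognized_words, distance,
                       w_pos=0.5, w_shared=0.5, penalty_weight=0.02):
    """Combine both comparison values, averaged over aligned word pairs,
    and subtract a linear distance penalty (all weights are illustrative)."""
    pairs = list(zip(candidate_words, recognized_words))
    if not pairs:
        return -penalty_weight * distance
    first = sum(positional_ratio(c.lower(), r.lower()) for c, r in pairs) / len(pairs)
    second = sum(shared_letter_ratio(c.lower(), r.lower()) for c, r in pairs) / len(pairs)
    return w_pos * first + w_shared * second - penalty_weight * distance


if __name__ == "__main__":
    print(positional_ratio("their", "there"), shared_letter_ratio("their", "there"))
```

Comparing at the letter level means a misrecognized homophone such as "their" for "there" still scores well, which is consistent with the stated goal of not depending strictly on recognition quality.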

In one embodiment of the present application, the method further includes: the total number of candidate positions set when the recognized text is of the first language type is smaller than the total number set when the recognized text is of the second language type or of a mixed type combining the first and second language types.

In one embodiment of the present application, before determining the initial position in the recognized text, the method further includes: detecting the language type of the recognized text, so that the corresponding initial position is determined according to the language type.

In one embodiment of the present application, a teleprompter is also provided. The teleprompter includes:

an ASR module configured to obtain recognized text from the voice data input by the user at the current moment;

a voice tracking module configured to: determine an initial position in the recognized text, where the initial position is the text position determined by voice tracking at the previous moment; determine, from the initial position, a preset total number of candidate positions, and a preset text length, a plurality of candidate positions and the candidate text segment corresponding to each candidate position, where the total number of candidate positions indicates how many candidate positions there are and the text length indicates the length of each candidate text segment; perform similarity matching between the candidate text segment of each candidate position and the recognized text to determine a text matching result; and determine the voice tracking position at the current moment from the text matching result;

a display module configured to present the voice tracking position in a user interface.

In one embodiment of the present application, the user interface displays the text to be tracked, and the voice tracking position is the position in the text to be tracked that corresponds to the last text of the candidate text segment. To present the voice tracking position in the user interface, the display module is further configured to:

mark the voice tracking position at a display tracking position in the text to be tracked, and present the display tracking position in the user interface, where the display tracking position is a text position after the voice tracking position.

In one embodiment of the present application, a computer-readable storage medium is also provided. A computer program is stored in the storage medium, and the computer program is configured to perform the steps of the above method when run.

In one embodiment of the present application, an electronic device is also provided, including a memory and a processor. A computer program is stored in the memory, and the processor is configured to run the computer program to perform the steps of the above method.

Through its embodiments, the present application provides a tracking display method, a teleprompter, an electronic device, and a storage medium, addressing the technical problem in the related art that a teleprompter cannot achieve ideal voice tracking while operating. The proposed tracking display method applies to different languages and speaking contexts: the corresponding tracking display scheme and language logic are selected by first determining which case applies. The method combines positioning with voice tracking, screening candidate positions and comparing similarities. On the one hand, voice tracking and positioning therefore do not depend strictly on the quality of speech recognition. On the other hand, positioning screens the possible positions iteratively, so the comparison is never made against the entire text to be tracked. This shortens processing time and, even when speech recognition errs or the user misspeaks, still allows precise positioning based on what the user actually said, giving fast and accurate voice tracking.

Brief description of the drawings

The drawings described here provide a further understanding of the present application and form part of it. The illustrative embodiments of the present application and their descriptions are used to explain the application and do not unduly limit it. In the drawings:

Fig. 1 is a flowchart of an optional tracking display method according to an embodiment of the present application;

Fig. 2 is a flowchart of an optional process for determining candidate text segments according to an embodiment of the present application;

Fig. 3 is a flowchart of an optional process for determining candidate text segments according to an embodiment of the present application;

Fig. 4 is a schematic structural diagram of an optional teleprompter according to an embodiment of the present application;

Fig. 5 is a schematic structural diagram of an optional electronic device according to an embodiment of the present application.

Detailed description

The present application is described in detail below with reference to the drawings and in combination with the embodiments. Note that, where no conflict arises, the embodiments of the present application and the features of those embodiments may be combined with one another.

Note that the terms "first", "second", and so on in the description, the claims, and the above drawings are used to distinguish similar objects and do not necessarily describe a particular order or sequence.

The application scenario of the embodiments is described first. When hosting, performing, recording, or live streaming, a user needs an electronic device to prompt the relevant target text, which provides the teleprompting function and helps the user carry out the work smoothly. The teleprompting function can be provided by a standalone electronic device such as a teleprompter, or by an application running on an electronic device.

While the user uses the teleprompting function, the target text is displayed in real time in the user interface on the teleprompter screen for the user to read. At the same time, based on the user's speech, the teleprompter marks the user's current speaking position in the displayed target text in real time, so that the user can find the prompt text needed next. Voice tracking is thus performed alongside teleprompting. For ease of description, the target text used by the user is referred to below as the text to be tracked.

Exemplarily, the text to be tracked is "我觉得你像一个一辈子都得走在路上的香客孜孜不倦地寻找一座完全不存在于这个世界的神庙……" ("I think you are like a pilgrim who must spend a lifetime on the road, tirelessly searching for a temple that does not exist in this world at all..."). The user reads along with the displayed text, and voice tracking marks are applied during reading either sentence by sentence or character by character. For example, when the user reaches the character "个" in "我觉得你像一个", the position of "个" is marked in the user interface in real time. Likewise, when the user reaches the character "庙" in "这个世界的神庙", the position of "庙" is marked in real time and tracking automatically continues with the next sentence. If the character the user has just read is the last one on the current page, the display switches to the next page of text and real-time marking continues.

In the related art, a teleprompter that implements voice tracking usually uses an automatic speech recognition (ASR) model to perform speech recognition against the target text. Such teleprompters also typically support only Chinese and only speech delivered at a constant rate. Speech recognition is limited by the ASR model's own recognition errors, and teleprompters usually scroll at a constant speed. As a result, when the user speaks a language other than Chinese, and/or the user's speaking rate changes, or the user pauses or misspeaks, the user's speaking position cannot be identified accurately and the tracking position cannot be updated in time, which degrades the user experience.

To address the above problems, the present application provides a tracking display method to at least solve the technical problem in the related art that ideal voice tracking cannot be achieved.

Fig. 1 is a flowchart of an optional tracking display method according to an embodiment of the present application. Referring to Fig. 1, the tracking display method provided by the present application proceeds as follows:

S1: Obtain recognized text from the voice data input by the user at the current moment, and determine an initial position in the recognized text, where the initial position is the text position determined by voice tracking at the previous moment.

In this embodiment, the recognized text is the text obtained by recognizing the voice data input by the user, that is, the content of what the user is currently saying. The text to be tracked is the text the user has written and set up, such as lines or a script used to prompt the user in the given scenario. When the user reads along with the displayed text to be tracked, the recognized text must be associated with the text to be tracked so that the position the user has reached can be tracked within the text to be tracked in real time, providing both voice tracking and teleprompting.

In this embodiment, the position located by voice tracking at the previous moment is defined as the initial position at the current moment. Note that the tracking display method provided in the present application advances continuously: the tracking result at the previous moment serves as the initial position at the current moment, the tracking result at the current moment in turn serves as the initial position at the next moment, and so on iteratively.
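The iteration described above, in which each tracking result becomes the next initial position, can be sketched as a simple loop. This is an illustrative sketch under stated assumptions: the names `locate` and `track` are hypothetical, `difflib.SequenceMatcher` stands in for the unspecified similarity measure, only the tail of each recognized chunk is compared, and the tracker holds its position when no candidate matches at all.

```python
from difflib import SequenceMatcher


def locate(script, recognized, initial_pos, num_candidates=8, seg_len=4):
    """Return the new tracking position for one chunk of recognized text.

    Only the `num_candidates` positions after `initial_pos` are examined,
    never the whole script.
    """
    tail = recognized[-seg_len:]
    best_pos, best_score = initial_pos, 0.0
    last = min(initial_pos + num_candidates, len(script) - 1)
    for pos in range(initial_pos + 1, last + 1):
        segment = script[max(0, pos - seg_len + 1):pos + 1]
        score = SequenceMatcher(None, segment, tail).ratio()
        if score > best_score:
            best_pos, best_score = pos, score
    return best_pos


def track(script, chunks, start_pos=-1):
    """Feed recognized chunks one by one; each result seeds the next step."""
    pos = start_pos  # -1 means "before the first character"
    history = []
    for recognized in chunks:
        pos = locate(script, recognized, pos)
        history.append(pos)
    return history


if __name__ == "__main__":
    text = "the quick brown fox jumps over the lazy dog"
    print(track(text, ["the qu", "ick br", "own fo"]))
```

Because each step searches only a small window after the previous result, the cost per step depends on the number of candidates rather than on the length of the text to be tracked.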

Exemplarily, if the last text the user spoke at the previous moment was the character "倦" in "我觉得你像一个一辈子都得走在路上的香客孜孜不倦", then the voice tracking result at the previous moment is the position of "倦", and the initial position at the current moment is therefore "倦".

In this embodiment, before determining the initial position in the recognized text, the method further includes: detecting the language type of the recognized text, so that the corresponding initial position is determined according to the language type.

In this embodiment, the tracking display method applies to different language types. Exemplarily, these include a first language type, a second language type, and a third language type. The first language type refers to languages such as Chinese, Japanese, and Korean, in which the character is the basic unit. The second language type refers to languages written with the Latin alphabet, such as English, French, and German, in which the letter is the basic unit. The third language type refers to mixed-language text, such as a mixture of the first and second language types, for example mixed Chinese and English. Note that the languages covered by each language type include but are not limited to those listed above and can be chosen according to the actual situation.
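A rough way to distinguish the three language types is to inspect the Unicode characters in the recognized text. The sketch below is only one possible heuristic and is not taken from the source: it treats CJK ideographs, Japanese kana, and Hangul syllables as the first type, letters whose Unicode name begins with "LATIN" as the second type, and text containing both as mixed. The function names and the returned labels are assumptions.

```python
import unicodedata


def _is_first_type(ch):
    """CJK Unified Ideographs, Hiragana/Katakana, or Hangul syllables."""
    code = ord(ch)
    return (0x4E00 <= code <= 0x9FFF
            or 0x3040 <= code <= 0x30FF
            or 0xAC00 <= code <= 0xD7AF)


def _is_second_type(ch):
    """Letters of the Latin script, including accented ones such as 'é'."""
    return ch.isalpha() and unicodedata.name(ch, "").startswith("LATIN")


def detect_language_type(text):
    """Return 'first', 'second', 'mixed', or 'unknown' for the given text."""
    has_first = any(_is_first_type(ch) for ch in text)
    has_second = any(_is_second_type(ch) for ch in text)
    if has_first and has_second:
        return "mixed"
    if has_first:
        return "first"
    if has_second:
        return "second"
    return "unknown"


if __name__ == "__main__":
    for sample in ["我觉得你像一个", "Bonjour, été", "我喜欢 Python", "123"]:
        print(sample, "->", detect_language_type(sample))
```

The detected type can then select character-based or word-based candidate generation and the matching rule that goes with it.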

S2: Determine, from the initial position, a preset total number of candidate positions, and a preset text length, a plurality of candidate positions and the candidate text segment corresponding to each candidate position, where the total number of candidate positions indicates how many candidate positions there are and the text length indicates the length of each candidate text segment.

In this embodiment, a candidate position list is built on the basis of the initial position. The candidate position list includes the total number of candidate positions, the preset text length, and the candidate text segments. Specifically, the total number of candidate positions represents how many candidate positions there are; each candidate position corresponds to one candidate text segment; and the text length indicates the length of each candidate text segment. Note that the present application does not specifically limit how the total number of candidate positions, the preset text length, and the candidate text segments are set; they can be chosen according to the actual situation. For example, a different N can be set for different text types. The value of N can also be adjusted automatically by determining the type of the upcoming text from the text to be tracked and the initial position.

Fig. 2 is a flowchart of an optional way of determining candidate text segments according to an embodiment of the present application. Referring to Fig. 2, the process is as follows:

When the recognized text is of the first language type, determining multiple candidate positions and the candidate text segment corresponding to each candidate position, according to the initial position, the preset total number of candidate positions, and the preset text length, includes:

S21. Select a candidate character set according to the total number of candidate positions. The candidate character set includes multiple first characters, which are consecutive characters after the initial position; the number of first characters equals the total number of candidate positions, and each first character corresponds to one candidate position.

S22. Determine the candidate text segments according to the text length and the first characters. Each candidate text segment consists of as many consecutive characters as the text length specifies, and each first character is the last character of its corresponding candidate text segment.

For example, let the total number of candidate positions N be 10 and the text length K of each candidate position be 5. Take the following text: “我觉得你像一个一辈子都得走在路上的香客孜孜不倦地寻找一座完全不存在于这个世界的神庙……”. The initial position is “倦”. Starting from “倦”, the 10 characters after it are selected as the first characters: “地、寻、找、一、座、完、全、不、存、在”. These 10 characters correspond to 10 candidate positions. Then, for each of these characters, the K-1 characters preceding it are taken one by one to obtain the corresponding candidate position.

That is, for each of “地、寻、找、一、座、完、全、不、存、在”, the 4 preceding characters are taken. For instance, the characters before “地” are “孜、孜、不、倦”, so the candidate position corresponding to “地” is “孜孜不倦地”; likewise, the characters before “座” are “地、寻、找、一”, so the candidate position corresponding to “座” is “地寻找一座”. Continuing in this way, the candidate positions are:

Candidate position 1: 孜孜不倦地;

Candidate position 2: 孜不倦地寻;

Candidate position 3: 不倦地寻找;

Candidate position 4: 倦地寻找一;

……

Candidate position 10: 完全不存在.

Each candidate position therefore corresponds to one candidate text segment, and all candidate text segments have the same text length.
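Steps S21 and S22 can be sketched as follows. This is a minimal illustration under stated assumptions rather than the implementation of the application: the function name `forward_candidates` and the representation of a candidate as an `(end_index, segment)` pair are hypothetical.

```python
def forward_candidates(text: str, init_idx: int, n: int, k: int):
    """Return up to n (end_index, segment) pairs.

    S21: the n characters after init_idx are the first characters.
    S22: each first character is the last character of a k-character segment.
    """
    candidates = []
    last_end = min(init_idx + n, len(text) - 1)  # do not run past the text
    for end in range(init_idx + 1, last_end + 1):
        start = max(0, end - k + 1)              # take k-1 preceding characters
        candidates.append((end, text[start:end + 1]))
    return candidates

text = "我觉得你像一个一辈子都得走在路上的香客孜孜不倦地寻找一座完全不存在于这个世界的神庙"
init_idx = text.index("倦")                      # initial position from the example
for number, (_, segment) in enumerate(forward_candidates(text, init_idx, n=10, k=5), start=1):
    print(f"candidate position {number}: {segment}")
```

With N = 10 and K = 5 this reproduces the list in the example, from “孜孜不倦地” through “完全不存在”.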

Note that in actual use of a teleprompter, some users repeat content they have already read. For this case, the application provides another way of determining multiple candidate positions and the candidate text segment corresponding to each candidate position. Fig. 3 is a flowchart of an optional way of determining candidate text segments according to an embodiment of the present application. Referring to Fig. 3, the process is as follows:

S31. Select a candidate character set according to the total number of candidate positions. The candidate character set includes multiple first characters and second characters. The first characters are consecutive characters after the initial position; the second characters are the character at the initial position together with consecutive characters before it. The total number of first and second characters equals the total number of candidate positions, and each first character and each second character corresponds to one candidate position.

S32. Determine the candidate text segments according to the text length, the first characters, and the second characters. Each candidate text segment consists of as many consecutive characters as the text length specifies; each first character, and likewise each second character, is the last character of its corresponding candidate text segment.

For example, let the total number of candidate positions N be 10 and the text length K of each candidate position be 5. Take the following text: “我觉得你像一个一辈子都得走在路上的香客孜孜不倦地寻找一座完全不存在于这个世界的神庙……”. The initial position is “倦”. Let the number of second characters be M; the number of first characters is then N-M. Continuing the example with M set to 4, and starting from “倦”, the second characters are “孜、孜、不、倦” (the initial position and the characters before it) and the first characters are “地、寻、找、一、座、完” (the characters after “倦”). The corresponding candidate position is then determined from each of these characters. In general, M is set much smaller than N-M.

Further, for each of these characters, the K-1 characters preceding it are taken one by one to obtain the corresponding candidate position.

Here the candidate character set is “孜、孜、不、倦、地、寻、找、一、座、完”, and for each of these characters the 4 preceding characters are taken. For instance, the characters before “地” are “孜、孜、不、倦”, so the candidate position corresponding to “地” is “孜孜不倦地”; likewise, the characters before “座” are “地、寻、找、一”, so the candidate position corresponding to “座” is “地寻找一座”. Continuing in this way, the candidate positions are:

Candidate position 1: 上的香客孜;

Candidate position 2: 的香客孜孜;

Candidate position 3: 香客孜孜不;

Candidate position 4: 客孜孜不倦;

……

Candidate position 10: 寻找一座完.
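The look-back variant (S31 and S32) can be sketched in the same way. The function name `candidates_with_lookback` and the `(end_index, segment)` representation are hypothetical; the only behaviour taken from the text is that M end characters are taken up to and including the initial position and N-M after it.

```python
def candidates_with_lookback(text: str, init_idx: int, n: int, k: int, m: int):
    """Return (end_index, segment) pairs for m second and n-m first characters.

    S31: the m second characters end at init_idx; the n-m first characters
         follow it.
    S32: each of these characters is the last character of a k-character
         segment.
    """
    first_end = max(0, init_idx - m + 1)             # earliest second character
    last_end = min(init_idx + (n - m), len(text) - 1)
    candidates = []
    for end in range(first_end, last_end + 1):
        start = max(0, end - k + 1)
        candidates.append((end, text[start:end + 1]))
    return candidates

text = "我觉得你像一个一辈子都得走在路上的香客孜孜不倦地寻找一座完全不存在于这个世界的神庙"
init_idx = text.index("倦")
lookback = candidates_with_lookback(text, init_idx, n=10, k=5, m=4)
print(lookback[0][1], lookback[-1][1])
```

With N = 10, K = 5, and M = 4 the first segment ends on the first “孜” and the last on “完”, so a user who re-reads a few characters can still be matched.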

When the recognized text is of the second language type, the application likewise provides two ways of determining multiple candidate positions and the candidate text segment corresponding to each candidate position. The first includes the following steps:

Determining multiple candidate positions and the candidate text segment corresponding to each candidate position, according to the initial position, the preset total number of candidate positions, and the preset text length, includes:

Selecting a candidate word set according to the total number of candidate positions. The candidate word set includes multiple first words, which are consecutive words after the initial position; the number of first words equals the total number of candidate positions, and each first word corresponds to one candidate position;

Determining the candidate text segments according to the text length and the first words. Each candidate text segment consists of as many consecutive words as the text length specifies, and each first word is the last word of its corresponding candidate text segment.

In this embodiment, the second way includes the following steps:

Selecting a candidate word set according to the total number of candidate positions. The candidate word set includes multiple first words and second words. The first words are consecutive words after the initial position; the second words are the word at the initial position together with consecutive words before it. The total number of first and second words equals the total number of candidate positions, and each first word and each second word corresponds to one candidate position;

Determining the candidate text segments according to the text length, the first words, and the second words. Each candidate text segment consists of as many consecutive words as the text length specifies; each first word, and likewise each second word, is the last word of its corresponding candidate text segment.

Note that when the recognized text is of the second language type, the initial position, the preset total number of candidate positions, the preset text length, and the candidate text segment corresponding to each candidate position are determined in the same way as for the first language type, and the details are not repeated here.
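For the second language type the same windowing applies with words as the unit. The sketch below is illustrative only: the function name `word_candidates`, the whitespace tokenization, and the English sample sentence are assumptions, not taken from the source. Setting `m=0` gives the first way (forward only) and `m>0` the second way (with look-back).

```python
def word_candidates(words, init_idx: int, n: int, k: int, m: int = 0):
    """Return (end_index, segment_words) pairs with words as the unit.

    m is the number of second words (the word at init_idx and words before
    it); the remaining n-m end words follow init_idx.
    """
    first_end = max(0, init_idx - m + 1)
    last_end = min(init_idx + (n - m), len(words) - 1)
    candidates = []
    for end in range(first_end, last_end + 1):
        start = max(0, end - k + 1)
        candidates.append((end, words[start:end + 1]))
    return candidates

# Hypothetical text to be tracked, tokenized on whitespace (an assumption).
words = "the quick brown fox jumps over the lazy dog".split()
init_idx = 3  # "fox" was the last tracked word

for _, segment in word_candidates(words, init_idx, n=4, k=3):
    print(" ".join(segment))
```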

In this embodiment, the total number of candidate positions set when the recognized text is of the first language type is smaller than the total set when the recognized text is of the second language type or a mixture of the first and second language types.

S3: Perform similarity matching between the candidate text segment corresponding to each candidate position and the recognized text, and determine a text matching result.

When the recognized text is of the first language type, performing similarity matching between the candidate text segment corresponding to each candidate position and the recognized text, and determining the text matching result, includes:

Determining a first text matching value for each candidate text segment according to the similarity between that candidate text segment and the recognized text;

Determining a distance penalty value for each candidate text segment according to the multiple candidate positions, where the distance penalty value characterizes the text distance between the candidate position and the initial position;

Determining a second text matching value for each candidate text segment according to its first text matching value and its distance penalty value;

Determining the text matching result according to the second text matching values of the multiple candidate text segments.

In a specific implementation, the first text matching value is obtained using a first formula. The terms of that formula denote, respectively: the pinyin of the i-th character in the recognized text; the pinyin of the i-th character in the j-th candidate position; and a comparison of the pinyin of the i-th character in the recognized text with the pinyin of the i-th character in the j-th candidate position, which outputs 1 if the two are the same and 0 otherwise.

In this embodiment, a distance penalty value is determined for each candidate text segment according to the multiple candidate positions; the distance penalty value characterizes the text distance between the candidate position and the initial position.

A user's speech is usually continuous, so the position being spoken at the current moment should in theory not be far from the initial position. Candidate positions nearer the front of the candidate position list (smaller j) are therefore given higher priority, and candidate positions nearer the end (larger j) are given lower priority. That is, for a candidate position with a smaller j the distance penalty value is smaller and affects the similarity result less; for a candidate position with a larger j the distance penalty value is larger and affects the similarity result more. Equivalently, the closer a candidate position is to the initial position A, the smaller its distance penalty value, and the farther away it is, the larger its distance penalty value.

In the embodiments of the present application, the second text matching value of each candidate text segment is determined from its first text matching value and its distance penalty value, using a second formula. In that formula, $j$ indexes the candidate positions from 0 to $j_{\max}$; $j_{\max}$ is the largest candidate position, that is, the last candidate position in the candidate position list; $k$ is the text length of the j-th candidate position; $\alpha$ is the maximum allowed penalty value, generally 0.1; and the remaining term is the distance penalty value.

In the embodiments of the present application, the text matching result is determined from the second text matching values of the multiple candidate text segments, using a third formula:

$j_{\text{best}} = \arg\max_{j}(\mathrm{sim}_j)$;

where $j_{\text{best}}$ denotes the best-matching candidate position, that is, the candidate position most similar to the recognized text.

In the embodiments of the present application, the pinyin of each character in the text of the j-th candidate position is compared with the pinyin of the character at the corresponding position in the recognized text; 1 is output if they are the same and 0 if they differ. A position weight is then applied to this output, assigned according to the character's position within the candidate text. Because the purpose is voice tracking, similarity at the end of the text matters more than similarity elsewhere. Specifically, the closer a character is to the end of the text, the higher its weight; the farther from the end, the lower its weight.

The index i above denotes the position of a character in the recognized text or the candidate text, counted from the end of the text. For example, for candidate position 1 “孜孜不倦地” in the example above, i for “地” is 1 and i for “不” is 3. Note that in every candidate text segment the position i is counted from the end of the text.
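The scoring for the first language type can be sketched end to end. The exact first and second formulas are not reproduced in this text, so several choices below are assumptions made only for illustration: the position weight is taken as 1/i, the first matching value is normalized by the sum of the weights, and the distance penalty is taken as linear, α·j/j_max. The `PINYIN` table is a hypothetical stand-in for a real pinyin lookup and uses toneless pinyin.

```python
# Hypothetical toneless pinyin table covering only the characters used below.
PINYIN = {
    "孜": "zi", "不": "bu", "倦": "juan", "卷": "juan",
    "地": "di", "寻": "xun", "找": "zhao", "一": "yi",
}

def first_match_value(recognized: str, segment: str) -> float:
    """Weighted pinyin agreement, aligned from the end of both texts (i = 1)."""
    weighted_hits = 0.0
    total_weight = 0.0
    pairs = zip(reversed(recognized), reversed(segment))
    for i, (rec_ch, cand_ch) in enumerate(pairs, start=1):
        weight = 1.0 / i  # assumed weight: larger near the end of the text
        same = PINYIN.get(rec_ch, rec_ch) == PINYIN.get(cand_ch, cand_ch)
        weighted_hits += weight if same else 0.0
        total_weight += weight
    return weighted_hits / total_weight if total_weight else 0.0

def best_candidate(recognized: str, segments, alpha: float = 0.1):
    """Return (j_best, scores), where scores[j] = match value - distance penalty."""
    j_max = len(segments) - 1
    scores = []
    for j, segment in enumerate(segments):
        penalty = alpha * j / j_max if j_max else 0.0  # assumed linear penalty
        scores.append(first_match_value(recognized, segment) - penalty)
    j_best = max(range(len(scores)), key=scores.__getitem__)  # argmax over j
    return j_best, scores

segments = ["孜孜不倦地", "孜不倦地寻", "不倦地寻找", "倦地寻找一"]
# The recognizer returned the homophone 卷 instead of 倦.
j_best, scores = best_candidate("不卷地寻找", segments)
print(j_best, segments[j_best])
```

Because the comparison is made on pinyin rather than on characters, the homophone error does not prevent “不倦地寻找” (index 2) from scoring highest.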

When the recognized text is of the second language type, performing similarity matching between the candidate text segment corresponding to each candidate position and the recognized text, and determining the text matching result, includes:

Determining a third text matching value for each candidate text segment according to the similarity between that candidate text segment and the recognized text;

Determining, according to the third text matching value, a first comparison value and a second comparison value for each candidate text segment. For the two words at the same position in the candidate text segment and the recognized text, the first comparison value characterizes the proportion of letter positions at which the two words have the same letter, and the second comparison value characterizes the proportion of letters shared by the two words in the total number of letters;

Determining a distance penalty value for each candidate text segment according to the multiple candidate positions, where the distance penalty value characterizes the text distance between the candidate position and the initial position;

Determining a fourth text matching value for each candidate text segment according to its first comparison value, second comparison value, and distance penalty value;

Determining the text matching result according to the fourth text matching values of the multiple candidate text segments.

In the embodiments of the present application, the second language type differs from the first in that the word is the smallest unit for positioning, segment extraction, and similarity comparison. Specifically, a third text matching value is determined for each candidate text segment according to the similarity between that candidate text segment and the recognized text.

In a specific implementation, the third text matching value is obtained using a fourth formula. The terms of that formula denote, respectively: the g-th letter of the i-th word in the j-th candidate position; and the g-th letter of the i-th word in the recognized text. Note that both the i-th character and the g-th letter are counted backward from the end of the text or word: the last character or letter has i or g equal to 1, the second to last has i or g equal to 2, and so on.

The comparison term of the formula compares the i-th word in the recognized text with the i-th word in the text of the j-th candidate position: it outputs 1 if the g-th letters of the two words are the same and 0 otherwise.

In the embodiments of the present application, a first comparison value and a second comparison value are determined for each candidate text segment according to the third text matching value. For the two words at the same position in the candidate text segment and the recognized text, the first comparison value characterizes the proportion of letter positions at which the two words have the same letter, and the second comparison value characterizes the proportion of letters shared by the two words in the total number of letters.

In a specific implementation, the first comparison value and the second comparison value are obtained using a fifth formula.

The terms of that formula are as follows. One term denotes the i-th word in the recognized text, with i and g counted from the end; another denotes the i-th word in the j-th candidate position. The first comparison value is computed, for the i-th word in the recognized text and the i-th word in the text of the j-th candidate position, as the proportion of positions g at which the two words have the same letter, where z is the total number of letters in the i-th word. The second comparison value is computed, for the same two words, as the proportion of shared letters in the total number of letters: the letters of each word are collected into a set, and the proportion of overlap between the two sets is taken. The intersection function gives the number of letters common to the letter sets of $w_1$ and $w_j$, and the len function gives the number of letters in the letter sets corresponding to $w_1$ and $w_j$.

In the embodiments of the present application, a distance penalty value is determined for each candidate text segment according to the multiple candidate positions; the distance penalty value characterizes the text distance between the candidate position and the initial position.

A fourth text matching value is then determined for each candidate text segment according to its first comparison value, second comparison value, and distance penalty value.

In a specific implementation, the fourth text matching value is obtained using a sixth formula.

In that formula, $j$ indexes the candidate positions from 0 to $j_{\max}$; $j_{\max}$ is the largest candidate position, that is, the last candidate position in the candidate position list, and takes a larger value when the upcoming text contains more English; $k$ is the text length (number of words) of the j-th candidate position; $\alpha$ is the maximum allowed penalty value, generally 0.1; and the remaining term is the distance penalty value.

In the embodiments of the present application, the text matching result is determined according to the fourth text matching values of the multiple candidate text segments.

In a specific implementation, the text matching result is obtained using a seventh formula:

$j_{\text{best}} = \arg\max_{j}(\mathrm{sim}_j)$;

where $j_{\text{best}}$ denotes the best-matching position, that is, the candidate position most similar to the recognized text.

Note that because English is affected by grammar such as tense and singular or plural forms, the following can happen when a teleprompter is used in practice: through a slip of the tongue, the user reads a word in the text to be tracked as a different word, so the recognized text differs from the text to be tracked. From the point of view of voice tracking, however, the position to be tracked is still the position of the corresponding word in the text to be tracked. For example, the user reads “throw” in the text to be tracked as “threw”. The words differ, but only because of the slip of the tongue, and the position to be tracked is still the position of “throw” in the text to be tracked.

For this reason, when comparing the letters of each word in the text of the j-th candidate position with the word at the corresponding position in the recognized text, the application does not rely on a simple one-to-one comparison but combines a sequential comparison with a non-sequential comparison. The sequential comparison is the first comparison value obtained from the formula above, and the non-sequential comparison is the second comparison value.

Specifically, in computing $l(w_1, w_j)$, the i-th word in the recognized text (word 1) and the i-th word in the text of the j-th candidate position (word 2) are first compared letter by letter in order: where word 1 and word 2 have the same letter at the same position the output is 1, and otherwise 0. Summing these outputs over all letters of word 1 and word 2 gives the similarity of the two words under sequential comparison. Next, the letters of word 1 and of word 2 are each collected into a set, and the proportion of letters shared by the two sets gives the similarity of the two words under non-sequential comparison. Finally, the two results are combined with their respective weights to give the similarity between word 1 and word 2.
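The combination of sequential and non-sequential comparison can be sketched for a single word pair. The fifth formula is not reproduced in this text, so the denominators and the weights below are assumptions for illustration: the sequential ratio divides by the length of the longer word, the set ratio divides by the size of the larger letter set, and the two are mixed with equal weight. All function names are hypothetical.

```python
def sequential_ratio(w1: str, wj: str) -> float:
    """Share of letter positions, counted from the word end, that agree."""
    z = max(len(w1), len(wj))  # assumed denominator: longer word's length
    same = sum(1 for a, b in zip(reversed(w1), reversed(wj)) if a == b)
    return same / z if z else 0.0

def set_overlap_ratio(w1: str, wj: str) -> float:
    """Share of distinct letters that the two words have in common."""
    letters_1, letters_j = set(w1), set(wj)
    denom = max(len(letters_1), len(letters_j))  # assumed denominator
    return len(letters_1 & letters_j) / denom if denom else 0.0

def word_similarity(w1: str, wj: str, seq_weight: float = 0.5) -> float:
    """Weighted mix of the sequential and non-sequential comparison values."""
    w1, wj = w1.lower(), wj.lower()
    return (seq_weight * sequential_ratio(w1, wj)
            + (1.0 - seq_weight) * set_overlap_ratio(w1, wj))

# Slip of the tongue from the text: "throw" read as "threw".
print(word_similarity("threw", "throw"))
print(word_similarity("threw", "table"))
```

Under these assumptions “threw” stays much closer to “throw” than to an unrelated word such as “table”, which is the behaviour the passage describes.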

The embodiment above uses the word as the smallest unit for positioning, segment extraction, and similarity comparison. The text length of the recognized text and of the candidate position text, that is, the number of words, is typically 4. The similarity of the two texts computed in this way is called the whole-word similarity.

In addition, the application can use the letter as the smallest unit for positioning, segment extraction, and similarity comparison. The calculation is the same as in the scheme for the first language type and is not repeated here. In this scheme the text length of the recognized text and of the candidate position text, that is, the number of letters, is typically 10 to 20. The similarity of the two texts computed in this way is called the overlap similarity.

Further, when the recognized text is of the second language type, the whole-word similarity and the overlap similarity can both be computed in practice, and their maximum or their mean is taken as the final similarity result.
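The final combination step can be sketched as below. The function name `combine_similarity` and the `mode` parameter are hypothetical; the source only states that either the maximum or the mean of the two similarities is used.

```python
def combine_similarity(whole_word: float, overlap: float, mode: str = "max") -> float:
    """Combine whole-word and overlap similarity by maximum or by mean."""
    if mode == "max":
        return max(whole_word, overlap)
    if mode == "mean":
        return (whole_word + overlap) / 2.0
    raise ValueError(f"unknown mode: {mode!r}")

print(combine_similarity(0.5, 1.0, mode="max"))
print(combine_similarity(0.5, 1.0, mode="mean"))
```

Taking the maximum favours whichever comparison granularity matched better, while the mean dampens a single spuriously high score; which is preferable depends on the recognizer's error profile.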

In the embodiments of the present application, when the language type is numeric, different kinds of numbers are spoken differently. For numbers with no notion of magnitude, order, or place value, such as telephone numbers, users generally read out each digit directly. For numbers that do have a notion of magnitude, order, or place value, such as amounts and counts, users generally add words such as “十、百、千、万” (ten, hundred, thousand, ten thousand) when speaking. The application can therefore run a preset NLP model over the text to be tracked in advance to identify the kind of number: for telephone-type numbers the second language type scheme is used, and for amount-type numbers the first language type scheme is used.

In the embodiments of the present application, when the recognized text is of the third language type, the first language type scheme and the second language type scheme can be used in combination. The similarity calculation, candidate positions, total number of candidate positions, text length, and candidate text segment corresponding to each candidate position are as described above and are not repeated here. The application does not specifically limit this, and it can be configured according to the actual situation. The application can thus track speech when the recognized text mixes, for example, Chinese and English, and suits scenarios in which the text to be tracked mixes Chinese and English. The corresponding calculation and adjustment improve the accuracy of the voice tracking display. In addition, the voice tracking in this application proceeds iteratively, which keeps overall positioning and tracking in a good state and avoids occasional positioning and tracking errors.

S4: Determine the voice tracking position at the current moment according to the text matching result, and present the voice tracking position in the user interface.

For example, if "candidate position 2: 的香客孜孜" is the most similar candidate position, the voice tracking position at the current moment is the last "孜" in "的香客孜孜", and this position is presented in the user interface. That is, it is that "孜" in the text to be tracked, "我觉得你像一个一辈子都得走在路上的香客孜孜不倦地寻找一座完全不存在于这个世界的神庙……" ("I think you are like a pilgrim who has to walk the road all his life, tirelessly searching for a temple that does not exist in this world at all...").

In embodiments of the present application, the user interface displays the text to be tracked, and the voice tracking position is the position in the text to be tracked that corresponds to the last character of the candidate text segment. Presenting the voice tracking position in the user interface further includes: marking the voice tracking position at a display tracking position in the text to be tracked, and presenting the display tracking position in the user interface, where the display tracking position is a text position after the voice tracking position.

In a specific implementation, once the voice tracking position has been determined, it must be taken into account that both speech recognition and the similarity calculation take the teleprompter some time. The user may keep speaking during that time, so when the voice tracking position is presented in the user interface, the mark is displayed after the voice tracking position. For example, suppose the voice tracking position is the last "孜" in "的香客孜孜". By the time the result has been computed, the user has very likely read further, so the final tracking position shown in the user interface is obtained by skipping forward a preset number of characters from that "孜". The preset number of characters may be set according to the actual situation. With this look-ahead mechanism, the user's actual speaking position does not run ahead of the tracking position displayed in the user interface during use, so the user does not perceive a delay.
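The look-ahead mechanism amounts to adding a fixed offset to the voice tracking position. The following is a minimal sketch under stated assumptions: the function name `display_tracking_position`, the default offset of 3 characters, and the clamping at the end of the text are illustrative choices, not values given in the patent.

```python
def display_tracking_position(voice_pos: int, script_len: int, lookahead: int = 3) -> int:
    """Shift the voice tracking position forward by a preset number of characters.

    voice_pos  -- index of the last matched character in the text to be tracked
    script_len -- length of the text to be tracked
    lookahead  -- preset number of characters to skip (assumed default)
    """
    if script_len <= 0:
        raise ValueError("the text to be tracked must not be empty")
    # Clamp so the mark never moves past the last character of the text.
    return min(voice_pos + lookahead, script_len - 1)
```

In practice the offset would be tuned to the combined latency of speech recognition and matching and to the speaker's reading speed.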

It should be understood that although the steps in the flowcharts of FIG. 1 to FIG. 3 are shown in the sequence indicated by the arrows, they are not necessarily executed in that sequence. Unless explicitly stated herein, the execution of these steps is not strictly ordered, and they may be executed in other orders. Moreover, at least some of the steps in FIG. 1 to FIG. 3 may include multiple sub-steps or stages. These sub-steps or stages need not be completed at the same moment and may be executed at different times; nor need they be executed one after another, as they may be executed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.

In some embodiments, FIG. 4 is a schematic structural diagram of an optional teleprompter according to an embodiment of the present application. Referring to FIG. 4, the teleprompter includes:

an ASR module configured to obtain recognized text from the voice data input by the user at the current moment;

a voice tracking module configured to: determine an initial position in the recognized text, where the initial position is the text position determined by voice tracking at the previous moment; determine, according to the initial position, a preset total number of candidate positions, and a preset text length, a plurality of candidate positions and the candidate text segment corresponding to each candidate position, where the total number of candidate positions indicates how many candidate positions there are and the text length indicates the length of each candidate text segment; perform similarity matching between the candidate text segment corresponding to each candidate position and the recognized text to determine a text matching result; and determine the voice tracking position at the current moment according to the text matching result;

a display module configured to present the voice tracking position in the user interface.

It should be noted that the user interface displays the text to be tracked, and the voice tracking position is the position in the text to be tracked that corresponds to the last character of the candidate text segment.

When presenting the voice tracking position in the user interface, the display module is further configured to:

mark the voice tracking position at a display tracking position in the text to be tracked, and present the display tracking position in the user interface, where the display tracking position is a text position after the voice tracking position.

Further, the teleprompter provided by the present application is also configured to execute all the steps of the tracking display method described above, which are not repeated here.
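The work of the voice tracking module can be sketched as follows. This is an illustration under stated assumptions rather than the patented computation: candidate positions are assumed to be the consecutive characters after the initial position, each candidate segment is assumed to end at its candidate position, only the tail of the recognized text is compared, and Python's `difflib.SequenceMatcher` stands in for the similarity measure described earlier in the application. The names `candidate_segments` and `track` and the default parameter values are hypothetical.

```python
from difflib import SequenceMatcher


def candidate_segments(script, initial_pos, total_candidates, text_length):
    """Yield (end_index, segment) for each candidate position after initial_pos.

    Assumption: candidate k ends k characters after initial_pos and spans
    the text_length characters up to and including that end index.
    """
    for offset in range(1, total_candidates + 1):
        end = initial_pos + offset
        if end >= len(script):
            break  # ran off the end of the text to be tracked
        start = max(0, end - text_length + 1)
        yield end, script[start:end + 1]


def track(script, recognized, initial_pos, total_candidates=10, text_length=5):
    """Return (voice_tracking_position, similarity) of the best candidate."""
    tail = recognized[-text_length:]  # compare against the most recent speech only
    best_pos, best_score = initial_pos, 0.0
    for end, segment in candidate_segments(script, initial_pos,
                                           total_candidates, text_length):
        # Stand-in similarity; the application describes its own weighted measure.
        score = SequenceMatcher(None, segment, tail).ratio()
        if score > best_score:  # ties keep the earlier candidate
            best_pos, best_score = end, score
    return best_pos, best_score


script = "我觉得你像一个一辈子都得走在路上的香客孜孜不倦地寻找一座完全不存在于这个世界的神庙"
pos, score = track(script, "走在路上的香客孜孜", initial_pos=15)
print(pos, script[pos], score)
```

Because only a bounded window of candidates after the previous position is scored, the cost per update does not grow with the length of the whole text, which matches the motivation given in the description. A plain sequence ratio can tie between neighbouring candidates when the recognized text contains errors, which is one reason a position-weighted similarity is preferable in practice.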

According to yet another aspect of the embodiments of the present application, an electronic device for implementing the tracking display method described above is also provided. The electronic device may be, but is not limited to being, applied in a server. As shown in FIG. 5, the electronic device includes a memory 502 and a processor 504. A computer program is stored in the memory 502, and the processor 504 is configured to execute, through the computer program, the steps of any one of the method embodiments described above.

Optionally, in this embodiment, the electronic device may be located in at least one of a plurality of network devices in a computer network.

It should be noted that, in this embodiment, the processor may be configured to execute, through a computer program, all the steps of the tracking display method described above, which are not repeated here.

Optionally, those of ordinary skill in the art will understand that the structure shown in FIG. 5 is only illustrative. The electronic device may also be a terminal device such as a smartphone (for example, an Android phone or an iOS phone), a tablet computer, a palmtop computer, a Mobile Internet Device (MID), or a PAD. FIG. 5 does not limit the structure of the electronic device. For example, the electronic device may include more or fewer components than shown in FIG. 5 (such as a network interface), or have a configuration different from that shown in FIG. 5.

The memory 502 may be used to store software programs and modules, such as the program instructions/modules corresponding to the voice tracking method and apparatus in embodiments of the present application. The processor 504 runs the software programs and modules stored in the memory 502 to execute various functional applications and data processing, that is, to implement the voice tracking method described above. The memory 502 may include high-speed random access memory and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 502 may further include memory located remotely from the processor 504, and such remote memory may be connected to the terminal through a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof. Specifically, the memory 502 may be used, without limitation, to store the program steps of the voice tracking method.

Optionally, the transmission device 506 is used to receive or send data via a network. Specific examples of the network include wired networks and wireless networks. In one example, the transmission device 506 includes a network adapter (Network Interface Controller, NIC), which can be connected to other network devices and a router by a network cable so as to communicate with the Internet or a local area network. In another example, the transmission device 506 is a Radio Frequency (RF) module, which communicates with the Internet wirelessly. In addition, the electronic device further includes a display 508 for displaying the voice tracking process, and a connection bus 510 for connecting the module components of the electronic device.

Embodiments of the present application also provide a computer-readable storage medium storing a computer program, where the computer program is configured to execute, when run, the steps of any one of the method embodiments described above.

In this embodiment, the storage medium may be configured to store a computer program for executing all the steps of the tracking display method described above, which are not repeated here.

Optionally, the storage medium is further configured to store a computer program for executing the steps included in the methods of the foregoing embodiments, which are not described again in this embodiment.

Optionally, in this embodiment, those of ordinary skill in the art will understand that all or some of the steps of the methods in the foregoing embodiments may be completed by a program instructing hardware associated with the terminal device. The program may be stored in a computer-readable storage medium, and the storage medium may include a flash drive, Read-Only Memory (ROM), Random Access Memory (RAM), a magnetic disk, an optical disc, and the like.

The present application provides a tracking display method, a teleprompter, an electronic device, and a storage medium that are applicable to different languages and contexts. The corresponding tracking display scheme and language logic are determined by detecting the language type. Voice tracking is implemented through positioning: candidate positions are screened and their similarities compared. On the one hand, tracking and positioning do not depend strictly on speech recognition quality, so that speech recognition errors do not degrade positioning accuracy. On the other hand, positioning screens the possible positions iteratively and does not compare against the entire text to be tracked. This shortens processing time and allows accurate positioning based on what the user actually said, even when speech recognition errors or slips of the tongue occur. Furthermore, the calculation method and formulas are optimized, and weights are assigned and adjusted according to factors such as character/letter position and distance. This yields fast and accurate voice tracking and avoids failures to locate the user's speaking position when the user changes speaking rate, pauses, or misspeaks.

The sequence numbers of the above embodiments of the present application are for description only and do not indicate the relative merits of the embodiments. If the integrated units in the above embodiments are implemented as software functional units and sold or used as independent products, they may be stored in the computer-readable storage medium described above. On this understanding, the technical solution of the present application, in essence or in the part that contributes over the prior art, or in whole or in part, may be embodied as a software product. The computer software product is stored in a storage medium and includes instructions that cause one or more computer devices (which may be personal computers, servers, network devices, or the like) to execute all or some of the steps of the methods of the embodiments of the present application.

In the above embodiments of the present application, the description of each embodiment has its own emphasis. For parts not detailed in one embodiment, refer to the relevant descriptions of the other embodiments.

In the several embodiments provided in the present application, it should be understood that the disclosed client may be implemented in other ways. The apparatus embodiments described above are only illustrative. For example, the division into units is only a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be indirect coupling or communication connection through interfaces, units, or modules, and may be electrical or take other forms.

The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, may each exist physically on their own, or may be integrated two or more to a unit. The integrated units may be implemented in hardware or as software functional units.

The above are only preferred embodiments of the present application. It should be noted that those of ordinary skill in the art may make improvements and refinements without departing from the principles of the present application, and such improvements and refinements shall also be regarded as falling within the scope of protection of the present application.

Claims (13)

a voice tracking module configured to detect a language type of the recognized text such that different initial positions are determined according to different language types; determining an initial position in the recognized text; wherein the user interface displays text to be tracked; the initial position is the text position in the text to be tracked that was determined by voice tracking at the previous moment; the text to be tracked comprises candidate text segments; determining a subsequent text type according to the text to be tracked and the initial position, and setting a total number of candidate positions according to the subsequent text type, wherein the subsequent text type is the language type of the candidate text segment after the initial position in the text to be tracked; the total number of candidate positions is used for indicating the number of candidate positions, and each candidate position corresponds to a candidate text segment; determining a plurality of candidate positions and the candidate text segment corresponding to each candidate position according to the initial position, the total number of candidate positions, and the preset text length; wherein the text length is used for indicating the text length of each candidate text segment; performing similarity matching between the candidate text segment corresponding to each candidate position and the recognized text, and determining a text matching result; determining a voice tracking position at the current moment according to the text matching result;

Priority Applications (1)

Application Number: CN202210293071.XA
Priority Date: 2022-03-23
Filing Date: 2022-03-23
Title: Tracking display method, teleprompter, electronic device, storage medium

Applications Claiming Priority (1)

Application Number: CN202210293071.XA
Priority Date: 2022-03-23
Filing Date: 2022-03-23
Title: Tracking display method, teleprompter, electronic device, storage medium

Publications (2)

Publication Number: CN114900590A, Publication Date: 2022-08-12
Publication Number: CN114900590B (en), Publication Date: 2023-08-29

Family

ID: 82715145

Family Applications (1)

Application Number: CN202210293071.XA
Status: Active
Publication: CN114900590B (en)
Title: Tracking display method, teleprompter, electronic device, storage medium
Priority Date: 2022-03-23
Filing Date: 2022-03-23

Country Status (1)

Country: CN (1)
Link: CN114900590B (en)

Legal Events

Code: PB01, Title: Publication
Code: SE01, Title: Entry into force of request for substantive examination
Code: GR01, Title: Patent grant
Code: CP03, Title: Change of name, title or address

Details of CP03:
Address after: 5th Floor, Building C, Wanbo Science and Technology Park, No. 20 Fengxin Road, Yuhuatai District, Nanjing City, Jiangsu Province, China 210012
Patentee after: Nanjing Silicon based Intelligent Technology Group Co.,Ltd.
Country or region after: China
Address before: 210012 4th floor, building C, Wanbo Science Park, 20 Fengxin Road, Yuhuatai District, Nanjing City, Jiangsu Province
Patentee before: NANJING SILICON INTELLIGENCE TECHNOLOGY Co.,Ltd.
Country or region before: China

