Technical Field

The present application relates to the technical field of natural language processing, and in particular to a speech sentence segmentation method, apparatus, and storage medium.
Background

Speech sentence segmentation is typically applied in scenarios where received real-time speech needs to be segmented into sentences. Segmenting speech accurately is a prerequisite for obtaining its accurate semantics. For example, a simultaneous interpretation system needs to segment speech acquired in real time so that the translation system can obtain the accurate semantics of the real-time speech and translate it correctly. At present, speech is usually segmented by first converting the speech into text, performing sentence segmentation on the text, and then segmenting the speech according to the segmentation result of the text.

In the prior art, text converted from speech is segmented as follows: the text corresponding to a complete piece of speech is obtained, and the sentence-break position of the text is determined according to its semantics. When this approach is applied in a simultaneous interpretation scenario, the complete speech must be obtained before the speech can be segmented, which causes a large delay.
Summary

The present application provides a speech sentence segmentation method, apparatus, and storage medium that can segment speech into sentences while reducing delay.

A first aspect of the present application provides a speech sentence segmentation method, including:

obtaining the text corresponding to the speech to be segmented;

determining, by using a sentence segmentation model, a sentence-break position of the text and the confidence of the sentence-break position, where the sentence segmentation model characterizes the correspondence between text and sentence-break positions and the confidence of those positions; and

if it is determined that the confidence of the sentence-break position of the text is greater than a threshold, segmenting the speech to be segmented according to the sentence-break position of the text.

Optionally, the speech to be segmented is a first speech, and the method further includes:

if it is determined that no sentence-break position exists in the text, or that the confidence of the sentence-break position of the text is less than the threshold, taking the first speech and a second speech following the first speech as the speech to be segmented, and performing the sentence segmentation operation on the speech to be segmented again, where the text corresponding to the second speech includes a preset number of words.

Optionally, the method further includes:

training a language model with text corresponding to historical speech to obtain the sentence segmentation model.

Optionally, obtaining the sentence segmentation model includes:

obtaining a training sentence sequence according to the text corresponding to the historical speech, where the training sentence sequence includes a plurality of training sentences and each subsequent training sentence contains the previous training sentence plus at least one additional word; and

training the language model according to each training sentence and the expected sentence-break position of each training sentence to obtain the sentence segmentation model, where the actual sentence-break position of each training sentence output by the sentence segmentation model is the same as the expected sentence-break position.

Optionally, the language model is obtained through training based on the BERT framework.

Optionally, the preset number of words is one word.

Optionally, the speech to be segmented is real-time speech acquired in a simultaneous interpretation scenario.
A second aspect of the present application provides a speech sentence segmentation apparatus, including:

a processing module, configured to obtain the text corresponding to the speech to be segmented; determine, by using a sentence segmentation model, a sentence-break position of the text and the confidence of the sentence-break position, where the sentence segmentation model characterizes the correspondence between text and sentence-break positions and the confidence of those positions; and, if it is determined that the confidence of the sentence-break position of the text is greater than a threshold, segment the speech to be segmented according to the sentence-break position of the text.

Optionally, the speech to be segmented is a first speech, and the processing module is further configured to: if it is determined that no sentence-break position exists in the text, or that the confidence of the sentence-break position of the text is less than the threshold, take the first speech and a second speech following the first speech as the speech to be segmented, and perform the sentence segmentation operation on the speech to be segmented again, where the text corresponding to the second speech includes a preset number of words.

Optionally, the apparatus further includes a training module.

The training module is configured to train a language model with text corresponding to historical speech to obtain the sentence segmentation model.

Optionally, the training module is specifically configured to: obtain a training sentence sequence according to the text corresponding to the historical speech, where the training sentence sequence includes a plurality of training sentences and each subsequent training sentence contains the previous training sentence plus at least one additional word; and train the language model according to each training sentence and the expected sentence-break position of each training sentence to obtain the sentence segmentation model, where the actual sentence-break position of each training sentence output by the sentence segmentation model is the same as the expected sentence-break position.

Optionally, the language model is obtained through training based on the BERT framework.

Optionally, the preset number of words is one word.

Optionally, the speech to be segmented is real-time speech acquired in a simultaneous interpretation scenario.
A third aspect of the present application provides a speech sentence segmentation apparatus, including at least one processor and a memory, where:

the memory stores computer-executable instructions; and

the at least one processor executes the computer-executable instructions stored in the memory, so that the speech sentence segmentation apparatus performs the above speech sentence segmentation method.

A fourth aspect of the present application provides a computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the above speech sentence segmentation method.

The present application provides a speech sentence segmentation method, apparatus, and storage medium. The method includes: obtaining the text corresponding to the speech to be segmented; determining, by using a sentence segmentation model, a sentence-break position of the text and the confidence of the sentence-break position, where the sentence segmentation model characterizes the correspondence between text and sentence-break positions and the confidence of those positions; and, if it is determined that the confidence of the sentence-break position of the text is greater than a threshold, segmenting the speech to be segmented according to the sentence-break position of the text. With the pre-trained sentence segmentation model, the speech to be segmented can be segmented in real time, which reduces delay.
Brief Description of the Drawings

FIG. 1 is a first schematic flowchart of the speech sentence segmentation method provided by the present application;

FIG. 2 is a second schematic flowchart of the speech sentence segmentation method provided by the present application;

FIG. 3 is a first schematic structural diagram of the speech sentence segmentation apparatus provided by the present application;

FIG. 4 is a second schematic structural diagram of the speech sentence segmentation apparatus provided by the present application.
Detailed Description

To make the objectives, technical solutions, and advantages of the present application clearer, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to those embodiments. Obviously, the described embodiments are only some of the embodiments of the present application, not all of them. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without creative effort fall within the protection scope of the present application.

To explain the speech sentence segmentation method provided by the present application more clearly, the prior-art approach to speech sentence segmentation is briefly introduced below, using a simultaneous interpretation scenario as an example. In a simultaneous interpretation scenario, the simultaneous interpretation device acquires the user's real-time speech, converts the real-time speech into text for sentence segmentation, segments the real-time speech according to the segmentation result of the text, and then translates the segmented speech.

For example, suppose the text corresponding to the real-time speech is "As you can see the images". Existing segmentation methods cannot segment this text based on its semantics; the text corresponding to the complete speech must be obtained first. For instance, the text corresponding to the user's subsequent speech, "are not really special", is acquired, so that the text corresponding to the complete speech is "As you can see the images are not really special". The complete text is then segmented as "As you can see, the images are not really special." It should be understood that segmenting a text means finding the best positions at which to insert punctuation marks. The above example uses English text, so the punctuation marks are those of English text; the speech sentence segmentation method provided by the present application does not limit the type of text to which it applies. In a simultaneous interpretation scenario, if the complete speech must be obtained before segmentation can be completed, the user has to wait a long time for the translation result.

To solve the above problem of large delay caused by speech sentence segmentation, the present application provides a speech sentence segmentation method in which a pre-trained sentence segmentation model determines the sentence-break position of the text corresponding to the speech in real time, thereby reducing the delay.
FIG. 1 is a first schematic flowchart of the speech sentence segmentation method provided by the present application. The method may be executed by a speech sentence segmentation apparatus, which may be implemented by any software and/or hardware. As shown in FIG. 1, the speech sentence segmentation method provided by this embodiment may include:

S101: Obtain the text corresponding to the speech to be segmented.

The speech to be segmented differs depending on the application scenario. The speech sentence segmentation method in this embodiment can be applied in a simultaneous interpretation scenario, in which case the speech to be segmented may be the speech received in real time by a speech sentence segmentation apparatus integrated in the simultaneous interpretation system. The method can also be applied in scenarios where pre-recorded speech is played back and translated in real time, in which case the speech sentence segmentation apparatus may be integrated in the translation system and the speech to be segmented may be the pre-recorded speech. It should be understood that the speech to be segmented may also be speech received in real time by the speech sentence segmentation apparatus in other scenarios where segmentation is needed, and that the speech sentence segmentation apparatus may also be deployed on its own; this embodiment does not limit these aspects.

In this embodiment, after the speech to be segmented is received, it needs to be recognized to obtain the corresponding text. Optionally, the speech to be segmented may be divided into frames, and a pre-trained acoustic model may be used to obtain the speech state corresponding to each frame, where the acoustic model characterizes the correspondence between the features of each speech frame and its speech state. Every three speech states can be combined into one phoneme, and several phonemes can be combined into one word. Accordingly, by inputting the speech to be segmented into the acoustic model, the text corresponding to the speech can be obtained. It should be understood that other methods may also be used in this embodiment to recognize the speech to be segmented and obtain the corresponding text.

It should be understood that in this embodiment the speech to be segmented is the real-time speech acquired in a simultaneous interpretation scenario. That is, as soon as the user starts speaking, the speech sentence segmentation apparatus obtains the text corresponding to the speech to be segmented. For example, when the user says "as", the text corresponding to the speech to be segmented is "as"; when the user continues with "as you", the text corresponding to the speech to be segmented is "as you".
S102: Determine, by using a sentence segmentation model, a sentence-break position of the text and the confidence of the sentence-break position, where the sentence segmentation model characterizes the correspondence between text and sentence-break positions and the confidence of those positions.

In this embodiment, the sentence segmentation model may be used to determine the sentence-break position of the text and the confidence of that position. The sentence segmentation model may be obtained based on methods such as neural networks, support vector machines, or Bayesian approaches. The model characterizes the correspondence between text and sentence-break positions and the confidence of those positions. In other words, the text corresponding to the speech to be segmented can be input into the sentence segmentation model, which processes the text and outputs the sentence-break position of the text and the confidence of that position.

For example, when the text corresponding to the speech to be segmented is "as", the sentence segmentation model is used to determine the sentence-break position of "as" and the confidence of that position; when the text corresponding to the speech to be segmented is "as you", the sentence segmentation model is used to determine the sentence-break position of "as you" and the confidence of that position.

S103: If it is determined that the confidence of the sentence-break position of the text is greater than a threshold, segment the speech to be segmented according to the sentence-break position of the text.

In this embodiment, a threshold is preset. After the confidence of the sentence-break position of the text is obtained, it can be compared with the threshold. If the confidence is greater than the threshold, it is determined that the text should be broken at that position. Accordingly, the speech to be segmented is segmented according to the sentence-break position of the text. Optionally, the break position of the speech may be determined from the correspondence between the text and the speech to be segmented, thereby segmenting the speech.

For example, suppose the text corresponding to the speech to be segmented is "as you can see". If the sentence-break position of the text is determined to be at the end of the text and the confidence of that position is greater than the threshold, then the break position of the corresponding speech is determined from the break position at the end of the text, namely after "see", and the speech to be segmented is segmented accordingly.
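As a rough illustration of S103, the following minimal sketch maps a text-level break (a word index) back onto the audio, assuming that the recognizer also provides per-word end times. The threshold value, timestamp format, and function names are assumptions made for the example and are not mandated by this embodiment.

```python
# Minimal sketch of S103: check the confidence, then map the text-level
# break position back to a cut point in the audio. The per-word end times
# are assumed to come from the speech recognizer.
from typing import List, Optional

THRESHOLD = 0.5  # assumed value; the embodiment only requires "a preset threshold"

def audio_cut_point(words: List[str],
                    word_end_times: List[float],
                    break_after_word: Optional[int],
                    confidence: float) -> Optional[float]:
    """Return the time (in seconds) at which to cut the audio, or None."""
    if break_after_word is None or confidence <= THRESHOLD:
        return None  # no break, or the break is not trusted: keep accumulating speech
    return word_end_times[break_after_word]

# Example: "as you can see" with a break after "see" (index 3), confidence 0.8.
print(audio_cut_point(["as", "you", "can", "see"],
                      [0.21, 0.40, 0.62, 0.95],
                      break_after_word=3,
                      confidence=0.8))   # -> 0.95
```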
Optionally, if the sentence segmentation model determines that no sentence-break position exists in the text, or that the confidence of the sentence-break position of the text is less than the threshold, then in this embodiment the current speech to be segmented is taken as a first speech, the first speech and a second speech following it are together taken as the new speech to be segmented, and the sentence segmentation operation is performed on the new speech to be segmented again.

For example, when the text corresponding to the speech to be segmented is "as you", if the sentence segmentation model determines that no sentence-break position exists in the text, or that the probability of a break at the end of the text, or of a break after "as", is less than the threshold, this speech is taken as the first speech and the second speech that follows it, such as "can see", continues to be acquired; the first speech and the second speech together form the new speech to be segmented. Accordingly, the text corresponding to the speech to be segmented becomes "as you can see", and the sentence segmentation model is applied again to this speech, i.e., to "as you can see".

The text corresponding to the second speech includes a preset number of words. This number can be set in advance and may be one word, two words, ..., N words, and so on. It should be understood that, in order to reduce the delay of segmenting the speech, the preset number of words should be kept within a range such that the user does not perceive a pause or delay. For example, the text corresponding to the first speech plus the second speech as the speech to be segmented may be "as you can" or "as you can see". It should also be understood that a word in this embodiment refers to a character in Chinese text or a word in English text.

Optionally, to minimize the delay, the preset number of words may be set to one word, i.e., the text corresponding to the second speech is one word. In other words, every time one word spoken by the user is received, the sentence segmentation model is applied to the corresponding speech to be segmented. It follows that the text corresponding to the speech to be segmented in a given segmentation attempt contains one more word than the text used in the previous attempt.

Correspondingly, when the first speech and the second speech following it are taken as the speech to be segmented, the process of segmenting the text corresponding to this speech is the same as the segmentation process described above for the original speech to be segmented. That is, if, with the first and second speech as the speech to be segmented, the sentence segmentation model determines that no sentence-break position exists in the corresponding text, or that the confidence of the sentence-break position is less than the threshold, a third speech following the second speech needs to be acquired, and the first, second, and third speech are taken together as the speech to be segmented, after which the segmentation operation continues. This is a cyclic process that repeats until the confidence of the sentence-break position in the text corresponding to the speech to be segmented is greater than the threshold.

For example, as shown in Table 1 below, the left column of Table 1 is the text corresponding to the speech to be segmented; it should be understood that these texts correspond to the speech to be segmented as determined in real time at different moments. The right column is the sentence-break position output by the sentence segmentation model and the confidence of that position.
Table 1

  Text corresponding to the speech to be segmented     Sentence-break position (confidence)
  as you                                               φ
  as you can                                           0 (0.3)
  as you can see                                       0 (0.8)
It should be noted that φ in Table 1 above indicates that no sentence-break position exists in the text corresponding to the speech to be segmented; 0 indicates that the sentence-break position is at the end of that text; and the number in parentheses is the confidence of the sentence-break position. It should be understood that the symbols used in the table above are examples, and other symbols may also be used.

Correspondingly, as shown in Table 1, if the text corresponding to the first speech is "as you" and the sentence segmentation model determines that no sentence-break position exists in this text, the second speech "can" continues to be acquired. At this point the text corresponding to the speech to be segmented is "as you can"; the sentence segmentation model determines that the sentence-break position is at the end of the text with a confidence of 0.3, which is below the threshold of 0.5, so the third speech "see" continues to be acquired. Now the text corresponding to the speech to be segmented is "as you can see"; the sentence segmentation model determines that the sentence-break position is at the end of the text with a confidence of 0.8, so the sentence-break position of the text is determined to be at the end of the text. Optionally, if the sentence segmentation model had instead determined that the sentence-break position of "as you can see" was after "see" with a confidence of 0.3, it would be necessary to continue acquiring speech until the confidence of the sentence-break position in the text corresponding to the speech to be segmented is greater than the threshold.

It should be understood that, after the confidence of the sentence-break position in the text corresponding to the speech to be segmented is determined to be greater than the threshold, if further speech to be segmented follows, the same method as above continues to be used to segment the new speech to be segmented.
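The cyclic procedure of S101-S103 described above can be summarized in the following sketch. The segmentation model is replaced here by a stub that reproduces the outputs of Table 1; in a real system it would be the trained model described in this application, and the threshold of 0.5 is simply the example value used above.

```python
# Sketch of the incremental segmentation loop: append one recognized word at
# a time, query the sentence segmentation model, and emit a sentence as soon
# as a break at the end of the text is predicted with sufficient confidence.
from typing import Iterable, List, Optional, Tuple

THRESHOLD = 0.5

def stub_model(text: str) -> Tuple[Optional[int], float]:
    """Stand-in for the trained model: returns (break position, confidence).
    0 means 'break at the end of the text'; None means 'no break'.
    The values reproduce the Table 1 example."""
    table = {"as you": (None, 0.0),
             "as you can": (0, 0.3),
             "as you can see": (0, 0.8)}
    return table.get(text, (None, 0.0))

def segment_stream(words: Iterable[str]) -> List[str]:
    sentences, buffer = [], []
    for word in words:                        # one word per step (preset number = 1)
        buffer.append(word)
        position, confidence = stub_model(" ".join(buffer))
        # For simplicity this sketch only handles a break at the end of the buffer.
        if position == 0 and confidence > THRESHOLD:
            sentences.append(" ".join(buffer))  # break accepted: emit the sentence
            buffer = []                         # start accumulating the next one
    return sentences

print(segment_stream(["as", "you", "can", "see"]))  # ['as you can see']
```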
The speech sentence segmentation method provided by this embodiment includes: obtaining the text corresponding to the speech to be segmented; determining, by using a sentence segmentation model, a sentence-break position of the text and the confidence of that position, where the sentence segmentation model characterizes the correspondence between text and sentence-break positions and the confidence of those positions; and, if the confidence of the sentence-break position of the text is greater than a threshold, segmenting the speech to be segmented according to the sentence-break position of the text. With the pre-trained sentence segmentation model, the speech to be segmented can be segmented in real time, which reduces delay. Further, if no sentence-break position exists in the text corresponding to the speech to be segmented, or the confidence of the sentence-break position of the text is less than the threshold, the first speech and the second speech following it are taken as the speech to be segmented and the segmentation operation is performed again, where the text corresponding to the second speech includes a preset number of words. In this way, the segmentation can be redone in real time with reference to the words that follow the speech to be segmented, which improves the accuracy of the segmentation.

On the basis of the above embodiment, the sentence segmentation model in the present application is described in detail below with reference to FIG. 2. FIG. 2 is a second schematic flowchart of the speech sentence segmentation method provided by the present application. As shown in FIG. 2, the speech sentence segmentation method provided by this embodiment may include:

S201: Train a language model with text corresponding to historical speech to obtain the sentence segmentation model.

Language models are widely used in various natural language processing problems, such as speech recognition, machine translation, word segmentation, part-of-speech tagging, and so on. Simply put, a language model can determine which candidate sentence is the more credible translation result, or, given several words, predict the most likely next word. For example, for the input pinyin string "nixianzaiganshenme", the corresponding output can take several forms, such as "你现在干什么" ("what are you doing now") or "你西安再赶什么" ("what are you rushing to in Xi'an"); with a language model, it can be determined that the probability of the former is greater than that of the latter, so converting the input pinyin string into the former is more reasonable in most cases. As another example, for the input Chinese sentence "李明正在家里看电视" ("Li Ming is watching TV at home"), the corresponding output can take several forms, such as "Li Ming is watching TV at home" or "Li Ming at home is watching TV"; similarly, with a language model it can be determined that the probability of the former is greater than that of the latter, so translating it into the former is more reasonable.
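To make the scoring idea concrete, the following toy example, which is an illustration only and not part of the claimed method, scores the two word orders above with a tiny bigram model whose probabilities are invented for the example, and keeps the more probable one; a real language model does the same thing at a much larger scale.

```python
# Toy bigram "language model": score two candidate word orders and keep the
# one with the higher probability. The bigram probabilities are invented.
from functools import reduce

bigram_prob = {
    ("li ming", "is"): 0.6, ("is", "watching"): 0.5, ("watching", "tv"): 0.7,
    ("tv", "at"): 0.4, ("at", "home"): 0.8,
    ("li ming", "at"): 0.1, ("home", "is"): 0.05,
}

def score(words):
    pairs = zip(words, words[1:])
    # multiply bigram probabilities, backing off to a small constant for unseen pairs
    return reduce(lambda p, pair: p * bigram_prob.get(pair, 1e-3), pairs, 1.0)

a = ["li ming", "is", "watching", "tv", "at", "home"]
b = ["li ming", "at", "home", "is", "watching", "tv"]
print(score(a) > score(b))  # True: the first word order is more probable
```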
The language model in this embodiment is obtained through training based on the BERT framework. The BERT framework is essentially a language encoder that converts an input sentence or paragraph into a feature vector (embedding). In this embodiment, the process of training the model based on the BERT framework is divided into two parts:

1. Pre-training. Pre-training is a transfer learning task whose purpose is to learn vector representations of the input sentences. In this embodiment, a large amount of language text may be fed into the BERT framework for training to obtain the language model, where the large amount of language text may be obtained by the speech sentence segmentation apparatus from other open-source servers or databases.

2. Fine-tuning. The language model is fine-tuned based on a small number of supervised learning samples to obtain the sentence segmentation model. Because the sentence segmentation model can be determined from a small number of samples, the training time is short and the model is obtained quickly. In this embodiment, the text corresponding to historical speech is used as the supervised learning samples to train the language model and obtain the sentence segmentation model. Optionally, the text corresponding to the historical speech may be the text corresponding to at least one piece of historical speech.
The following illustrates the process of training the language model with a single piece of text.

In this embodiment, a training sentence sequence is obtained according to the text corresponding to the historical speech. The training sentence sequence includes a plurality of training sentences, and each subsequent training sentence contains the previous training sentence plus at least one additional word. It should be understood that the training sentences may be arranged in order of increasing word count.

Optionally, in this embodiment the text corresponding to the historical speech may be divided into words: the first word forms the first training sentence, the first two words combined form the second training sentence, the first three words combined form the third training sentence, and so on, yielding the plurality of training sentences in the training sentence sequence.

Correspondingly, after the plurality of training sentences are obtained, the language model may be trained according to each training sentence and the expected sentence-break position of each training sentence to obtain the sentence segmentation model. The actual sentence-break position of each training sentence output by the finally trained sentence segmentation model is the same as the expected sentence-break position.

For example, suppose the text corresponding to the historical speech is "As you can see, the images are not really special. But combined they can create something like this." Correspondingly, the training sentences for this text may be, in order, "as", "as you", "as you can", ..., "as you can see, the images are not really special. But combined they can create something like this".

For example, the left column of Table 2 below contains the training sentences. In this embodiment, obtaining the sentence segmentation model is explained by taking the sentence-break position as the position where a period is placed in the text.
Table 2
In Table 2 above, 1 indicates that the sentence-break position is after the word one position before the end of the text corresponding to the speech to be segmented, 2 indicates that it is after the word two positions before the end, and 3 indicates that it is after the word three positions before the end.
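Under the labeling scheme just described, the (training sentence, expected break position) pairs for the example text could be built roughly as follows. This is a sketch only: the exact label encoding, including using None for "no period within the prefix yet", is an assumption made for illustration.

```python
# Build the incremental training sentences and their expected break labels.
# Label 0 = period at the end of the prefix, 1/2/3 = period one/two/three
# words before the end, None = no period within reach of the prefix yet.
def build_training_pairs(punctuated_text: str, max_offset: int = 3):
    words = punctuated_text.split()
    pairs = []
    for i in range(1, len(words) + 1):
        prefix_words = words[:i]
        label = None
        for offset in range(0, max_offset + 1):           # look back a few words
            idx = len(prefix_words) - 1 - offset
            if idx >= 0 and prefix_words[idx].endswith("."):
                label = offset
                break
        prefix = " ".join(prefix_words)
        if prefix.endswith("."):
            prefix = prefix[:-1]          # the final period is what the model must predict
        pairs.append((prefix, label))
    return pairs

text = ("As you can see, the images are not really special. "
        "But combined they can create something like this.")
for sentence, label in build_training_pairs(text)[:12]:
    print(label, "|", sentence)
```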
Correspondingly, in this embodiment the text corresponding to the historical speech and the corresponding information in Table 2 above may be input into the language model to train it and obtain the sentence segmentation model.
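The embodiment does not tie this fine-tuning step to any particular toolkit. Purely as an illustration, a BERT-based classifier over (training sentence, expected break position) pairs like those above could be fine-tuned with the open-source transformers library roughly as follows; the model name, the extra "no break" class, and the hyperparameters are assumptions of this sketch, not the claimed implementation.

```python
# Illustrative fine-tuning of a BERT classifier on training sentences, where
# labels 0-3 encode the break offset from Table 2 and 4 stands for "no break yet".
import torch
from transformers import BertTokenizerFast, BertForSequenceClassification

texts = ["as you", "as you can", "as you can see, the images are not really special"]
labels = torch.tensor([4, 4, 0])   # 4 = "no break yet", 0 = break at the end

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased",
                                                       num_labels=5)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
for _ in range(3):                 # a few passes over the small supervised set
    enc = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    loss = model(**enc, labels=labels).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```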
It should be understood that S201 in this embodiment is performed in advance; it is not a step that must be performed every time the speech to be segmented is segmented.

Optionally, in this embodiment, after the speech to be segmented has been segmented, the speech to be segmented, its corresponding text, and the segmentation result may be fed back into the sentence segmentation model to further optimize it, so that the output of the sentence segmentation model becomes more accurate.

S202: Obtain the text corresponding to the speech to be segmented.

S203: Determine, by using the sentence segmentation model, a sentence-break position of the text and the confidence of the sentence-break position.

S204: If it is determined that the confidence of the sentence-break position of the text is greater than the threshold, segment the speech to be segmented according to the sentence-break position of the text.

S205: If it is determined that no sentence-break position exists in the text, or that the confidence of the sentence-break position in the text is less than the threshold, take the first speech and a second speech following the first speech as the speech to be segmented, and perform S202-S204 again.

Here, the text corresponding to the second speech includes a preset number of words, and the first speech is the speech to be segmented in S202 above.

It should be understood that, in this embodiment, if it is determined that no sentence-break position exists in the text, or that the confidence of the sentence-break position in the text is less than the threshold, the corresponding operations may also be performed with reference to the relevant description in the above embodiment. Correspondingly, for S202-S204 in this embodiment, reference may be made to the relevant description in the above embodiment, which is not limited here. S204 and S205 have no prescribed order relative to each other; they are alternative steps, only one of which is executed.

In this embodiment, the sentence segmentation model is obtained by further training the language model that was obtained through training based on the BERT framework. Because the amount of training-sample data required under the BERT framework is small, training can be completed quickly, which increases the speed of obtaining the sentence segmentation model. Further, in this embodiment the text corresponding to the historical speech is split into a plurality of training sentences, where each subsequent training sentence contains the previous training sentence plus at least one additional word. A sentence segmentation model trained in this way can effectively determine the sentence-break position of the speech to be segmented, achieving the goal of segmenting speech in real time.
FIG. 3 is a first schematic structural diagram of the speech sentence segmentation apparatus provided by the present application. As shown in FIG. 3, the speech sentence segmentation apparatus 300 includes a processing module 301 and a training module 302.

The processing module 301 is configured to obtain the text corresponding to the speech to be segmented; determine, by using a sentence segmentation model, a sentence-break position of the text and the confidence of the sentence-break position, where the sentence segmentation model characterizes the correspondence between text and sentence-break positions and the confidence of those positions; and, if it is determined that the confidence of the sentence-break position of the text is greater than a threshold, segment the speech to be segmented according to the sentence-break position of the text.

Optionally, the speech to be segmented is a first speech, and the processing module 301 is further configured to: if it is determined that no sentence-break position exists in the text, or that the confidence of the sentence-break position of the text is less than the threshold, take the first speech and a second speech following the first speech as the speech to be segmented, and perform the sentence segmentation operation on the speech to be segmented again, where the text corresponding to the second speech includes a preset number of words.

The training module 302 is configured to train a language model with text corresponding to historical speech to obtain the sentence segmentation model.

Optionally, the training module 302 is specifically configured to: obtain a training sentence sequence according to the text corresponding to the historical speech, where the training sentence sequence includes a plurality of training sentences and each subsequent training sentence contains the previous training sentence plus at least one additional word; and train the language model according to each training sentence and the expected sentence-break position of each training sentence to obtain the sentence segmentation model, where the actual sentence-break position of each training sentence output by the sentence segmentation model is the same as the expected sentence-break position.

Optionally, the language model is obtained through training based on the BERT framework.

Optionally, the preset number of words is one word.

Optionally, the speech to be segmented is real-time speech acquired in a simultaneous interpretation scenario.

The implementation principles and technical effects of the speech sentence segmentation apparatus provided by this embodiment are similar to those of the above speech sentence segmentation method, and are not repeated here.
FIG. 4 is a second schematic structural diagram of the speech sentence segmentation apparatus provided by the present application. As shown in FIG. 4, the speech sentence segmentation apparatus 400 includes a memory 401 and at least one processor 402.

The memory 401 is configured to store program instructions.

The processor 402 is configured to implement the speech sentence segmentation method in this embodiment when the program instructions are executed. For the specific implementation principles, reference may be made to the above embodiments, which are not repeated here.

The speech sentence segmentation apparatus 400 may further include an input/output interface 403.

The input/output interface 403 may include an independent output interface and an independent input interface, or may be an integrated interface that integrates input and output. The output interface is used to output data, and the input interface is used to obtain input data.

The present application further provides a readable storage medium storing execution instructions. When at least one processor of the speech sentence segmentation apparatus executes the execution instructions, the speech sentence segmentation method in the above embodiments is implemented.

The present application further provides a program product including execution instructions stored in a readable storage medium. At least one processor of the speech sentence segmentation apparatus can read the execution instructions from the readable storage medium, and execution of those instructions by the at least one processor causes the speech sentence segmentation apparatus to implement the speech sentence segmentation methods provided by the various implementations above.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative. The division into modules is merely a division by logical function; in actual implementation there may be other divisions, for example, multiple modules or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses, or modules, and may be electrical, mechanical, or in other forms.

The modules described as separate components may or may not be physically separate, and the components shown as modules may or may not be physical modules; that is, they may be located in one place or distributed over multiple network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, the functional modules in the embodiments of the present application may be integrated into one processing module, or each module may exist physically on its own, or two or more modules may be integrated into one module. The above integrated modules may be implemented in the form of hardware, or in the form of hardware plus software functional modules.

The above integrated modules implemented in the form of software functional modules may be stored in a computer-readable storage medium. The software functional modules are stored in a storage medium and include several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to perform some of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

In the above embodiments of the network device or terminal device, it should be understood that the processing module may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), or the like. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The steps of the methods disclosed in the present application may be directly embodied as being performed by a hardware processor, or performed by a combination of hardware and software modules in the processor.

Finally, it should be noted that the above embodiments are merely intended to describe the technical solutions of the present application rather than to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be equivalently replaced, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present application.
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201910463478.0ACN110264997A (en) | 2019-05-30 | 2019-05-30 | The method, apparatus and storage medium of voice punctuate |
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201910463478.0ACN110264997A (en) | 2019-05-30 | 2019-05-30 | The method, apparatus and storage medium of voice punctuate |
| Publication Number | Publication Date |
|---|---|
| CN110264997Atrue CN110264997A (en) | 2019-09-20 |
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN201910463478.0APendingCN110264997A (en) | 2019-05-30 | 2019-05-30 | The method, apparatus and storage medium of voice punctuate |
| Country | Link |
|---|---|
| CN (1) | CN110264997A (en) |
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN111414481A (en)* | 2020-03-19 | 2020-07-14 | 哈尔滨理工大学 | Chinese semantic matching method based on pinyin and BERT embedding |
| CN111737991A (en)* | 2020-07-01 | 2020-10-02 | 携程计算机技术(上海)有限公司 | Text sentence break position identification method and system, electronic device and storage medium |
| CN112101003A (en)* | 2020-09-14 | 2020-12-18 | 深圳前海微众银行股份有限公司 | Sentence text segmentation method, apparatus, device and computer-readable storage medium |
| CN114065785A (en)* | 2021-11-19 | 2022-02-18 | 蜂后网络科技(深圳)有限公司 | Real-time online communication translation method and system |
| CN114265918A (en)* | 2021-12-01 | 2022-04-01 | 北京捷通华声科技股份有限公司 | Text segmentation method and device and electronic equipment |
| CN114420102A (en)* | 2022-01-04 | 2022-04-29 | 广州小鹏汽车科技有限公司 | Method and device for speech sentence-breaking, electronic equipment and storage medium |
| CN115579009A (en)* | 2022-12-06 | 2023-01-06 | 广州小鹏汽车科技有限公司 | Voice interaction method, server and computer readable storage medium |
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN1235312A (en)* | 1998-05-13 | 1999-11-17 | 国际商业机器公司 | Device and method for automatically generating punctuation marks in continuous speech recognition |
| US20140365202A1 (en)* | 2013-06-11 | 2014-12-11 | Facebook, Inc. | Translation and integration of presentation materials in cross-lingual lecture support |
| US20150019219A1 (en)* | 2013-07-10 | 2015-01-15 | GM Global Technology Operations LLC | Systems and methods for spoken dialog service arbitration |
| CN105096941A (en)* | 2015-09-02 | 2015-11-25 | 百度在线网络技术(北京)有限公司 | Voice recognition method and device |
| CN105244022A (en)* | 2015-09-28 | 2016-01-13 | 科大讯飞股份有限公司 | Audio and video subtitle generation method and apparatus |
| CN105427858A (en)* | 2015-11-06 | 2016-03-23 | 科大讯飞股份有限公司 | Method and system for achieving automatic voice classification |
| CN107247706A (en)* | 2017-06-16 | 2017-10-13 | 中国电子技术标准化研究院 | Text punctuate method for establishing model, punctuate method, device and computer equipment |
| CN107305541A (en)* | 2016-04-20 | 2017-10-31 | 科大讯飞股份有限公司 | Speech recognition text segmentation method and device |
| CN107632980A (en)* | 2017-08-03 | 2018-01-26 | 北京搜狗科技发展有限公司 | Voice translation method and device, the device for voiced translation |
| CN107679033A (en)* | 2017-09-11 | 2018-02-09 | 百度在线网络技术(北京)有限公司 | Text punctuate location recognition method and device |
| CN108073572A (en)* | 2016-11-16 | 2018-05-25 | 北京搜狗科技发展有限公司 | Information processing method and its device, simultaneous interpretation system |
| CN109710770A (en)* | 2019-01-31 | 2019-05-03 | 北京牡丹电子集团有限责任公司数字电视技术中心 | A kind of file classification method and device based on transfer learning |
Cited By

| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN111414481A (en)* | 2020-03-19 | 2020-07-14 | 哈尔滨理工大学 | Chinese semantic matching method based on pinyin and BERT embedding |
| CN111414481B (en)* | 2020-03-19 | 2023-09-26 | 哈尔滨理工大学 | Chinese semantic matching method based on pinyin and BERT embedding |
| CN111737991A (en)* | 2020-07-01 | 2020-10-02 | 携程计算机技术(上海)有限公司 | Text sentence break position identification method and system, electronic device and storage medium |
| CN111737991B (en)* | 2020-07-01 | 2023-12-12 | 携程计算机技术(上海)有限公司 | Text sentence breaking position identification method and system, electronic equipment and storage medium |
| CN112101003A (en)* | 2020-09-14 | 2020-12-18 | 深圳前海微众银行股份有限公司 | Sentence text segmentation method, apparatus, device and computer-readable storage medium |
| CN114065785A (en)* | 2021-11-19 | 2022-02-18 | 蜂后网络科技(深圳)有限公司 | Real-time online communication translation method and system |
| CN114265918A (en)* | 2021-12-01 | 2022-04-01 | 北京捷通华声科技股份有限公司 | Text segmentation method and device and electronic equipment |
| CN114420102A (en)* | 2022-01-04 | 2022-04-29 | 广州小鹏汽车科技有限公司 | Method and device for speech sentence-breaking, electronic equipment and storage medium |
| CN114420102B (en)* | 2022-01-04 | 2022-10-14 | 广州小鹏汽车科技有限公司 | Method and device for speech sentence-breaking, electronic equipment and storage medium |
| WO2023130951A1 (en)* | 2022-01-04 | 2023-07-13 | 广州小鹏汽车科技有限公司 | Speech sentence segmentation method and apparatus, electronic device, and storage medium |
| CN115579009A (en)* | 2022-12-06 | 2023-01-06 | 广州小鹏汽车科技有限公司 | Voice interaction method, server and computer readable storage medium |
| WO2024120450A1 (en)* | 2022-12-06 | 2024-06-13 | 广州小鹏汽车科技有限公司 | Voice interaction method, server, and computer-readable storage medium |
Similar Documents

| Publication | Publication Date | Title |
|---|---|---|
| CN113811946B (en) | | End-to-end automatic speech recognition of digital sequences |
| CN112599128B (en) | | Voice recognition method, device, equipment and storage medium |
| US11514891B2 (en) | | Named entity recognition method, named entity recognition equipment and medium |
| CN110264997A (en) | | The method, apparatus and storage medium of voice punctuate |
| JP7679468B2 (en) | | Transformer Transducer: A Model for Unifying Streaming and Non-Streaming Speech Recognition |
| CN108766414B (en) | | Method, apparatus, device and computer-readable storage medium for speech translation |
| CN112528637B (en) | | Text processing model training method, device, computer equipment and storage medium |
| CN111402861B (en) | | Voice recognition method, device, equipment and storage medium |
| US12154581B2 (en) | | Cascaded encoders for simplified streaming and non-streaming ASR |
| US9502036B2 (en) | | Correcting text with voice processing |
| CN116884391B (en) | | Multi-modal fusion audio generation method and device based on diffusion model |
| CN114299930B (en) | | End-to-end speech recognition model processing method, speech recognition method and related device |
| KR20240065125A (en) | | Large-scale language model data selection for rare word speech recognition |
| EP3405912A1 (en) | | Analyzing textual data |
| CN117043856A (en) | | End-to-end model on high-efficiency streaming non-recursive devices |
| CN108228574B (en) | | Text translation processing method and device |
| CN112397056B (en) | | Voice evaluation method and computer storage medium |
| CN111613215B (en) | | Voice recognition method and device |
| CN111539199A (en) | | Text error correction method, device, terminal, and storage medium |
| TW202232468A (en) | | Method and system for correcting speaker diarisation using speaker change detection based on text |
| CN112133285B (en) | | Speech recognition method, device, storage medium and electronic equipment |
| CN119547136A (en) | | Context-aware neural confidence estimation for rare word speech recognition |
| CN113793598A (en) | | Training method and data enhancement method, device and equipment for speech processing model |
| CN115150567A (en) | | A subtitle generation method, device, electronic device and readable storage medium |
| CN116189663A (en) | | Prosodic prediction model training method and device, human-computer interaction method and device |
Legal Events

| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | RJ01 | Rejection of invention patent application after publication | Application publication date: 2019-09-20 |