
CN110673748A - Method and device for providing candidate long sentences in input method - Google Patents


Info

Publication number
CN110673748A
Authority
CN
China
Prior art keywords
candidate
words
prediction model
long
long sentence
Legal status
Granted
Application number
CN201910927584.XA
Other languages
Chinese (zh)
Other versions
CN110673748B (en)
Inventor
龚建
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201910927584.XA
Publication of CN110673748A
Application granted
Publication of CN110673748B
Legal status: Active


Abstract

The present application provides a method and device for providing candidate long sentences in an input method. The method obtains the current input sequence entered by a user in an input method application, obtains candidate words matching the current input sequence, applies a pre-trained long sentence prediction model to the candidate words to obtain the corresponding candidate long sentences, and displays the candidate long sentences alongside the candidate words in the input method application. Because the pre-trained model quickly yields matching candidate long sentences, the user can complete a long sentence by selecting a candidate instead of typing it in full, which reduces the user's input cost and improves the user experience.

Description

Method and device for providing candidate long sentences in an input method

Technical Field

The present application relates to the field of artificial intelligence, and in particular to a method and device for providing candidate long sentences in an input method.

Background

At present, given the pinyin sequence entered by a user, an input method application can provide candidate words corresponding to that sequence, together with short follow-on text for those candidate words, such as the next character, word, or phrase. In practice, however, when a user wants to enter a complete sentence through the input method application, the user must enter the pinyin sequences for each part of that sentence one after another. Entering a complete sentence is therefore costly, and the user's input experience is unsatisfactory.

Summary of the Invention

The present application aims to solve, at least to some extent, one of the technical problems in the related art.

To this end, a first object of the present application is to propose a method for providing candidate long sentences in an input method.

A second object of the present application is to propose a device for providing candidate long sentences in an input method.

A third object of the present application is to propose an electronic device.

A fourth object of the present application is to propose a computer-readable storage medium.

A fifth object of the present application is to propose a computer program product.

To achieve the above objects, an embodiment of the first aspect of the present application proposes a method for providing candidate long sentences in an input method, including: obtaining the current input sequence entered by a user in an input method application; obtaining candidate words matching the current input sequence; obtaining, according to a pre-trained long sentence prediction model, candidate long sentences matching the candidate words; and displaying the candidate words and the candidate long sentences in the input method application.

With the method for providing candidate long sentences in an input method according to the embodiments of the present application, the current input sequence entered by the user in the input method application is obtained, candidate words matching that sequence are obtained, and the pre-trained long sentence prediction model is applied to the candidate words to obtain the corresponding candidate long sentences, which are displayed alongside the candidate words in the input method application. The pre-trained model thus quickly yields matching candidate long sentences, and offering them lets the user finish entering a long sentence quickly, which reduces the user's input cost and improves the user experience.

To achieve the above objects, an embodiment of the second aspect of the present application proposes a device for providing candidate long sentences in an input method, including: a first acquisition module configured to obtain the current input sequence entered by a user in an input method application; a second acquisition module configured to obtain candidate words matching the current input sequence; a third acquisition module configured to obtain, according to a pre-trained long sentence prediction model, candidate long sentences matching the candidate words; and a display module configured to display the candidate words and the candidate long sentences in the input method application.

With the device for providing candidate long sentences in an input method according to the embodiments of the present application, the current input sequence entered by the user in the input method application is obtained, candidate words matching that sequence are obtained, and the pre-trained long sentence prediction model is applied to the candidate words to obtain the corresponding candidate long sentences, which are displayed alongside the candidate words in the input method application. The pre-trained model thus quickly yields matching candidate long sentences, and offering them lets the user finish entering a long sentence quickly, which reduces the user's input cost and improves the user experience.

To achieve the above objects, an embodiment of the third aspect of the present application proposes an electronic device including a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor executes the program, it implements the method for providing candidate long sentences in an input method described above.

To achieve the above objects, an embodiment of the fourth aspect of the present application proposes a computer-readable storage medium; when the instructions in the storage medium are executed by a processor, the method for providing candidate long sentences in an input method described above is implemented.

To achieve the above objects, an embodiment of the fifth aspect of the present application proposes a computer program product; when the instructions in the computer program product are executed by a processor, the method for providing candidate long sentences in an input method described above is implemented.

Additional aspects and advantages of the present application will be given in part in the following description, will in part become apparent from it, or will be learned through practice of the present application.

Brief Description of the Drawings

The above and/or additional aspects and advantages of the present application will become apparent and readily understood from the following description of the embodiments taken in conjunction with the accompanying drawings, in which:

FIG. 1 is a schematic flowchart of a method for providing candidate long sentences in an input method according to an embodiment of the present application;

FIG. 2 is an example diagram of a user interface containing a candidate long sentence;

FIG. 3 is a first schematic flowchart detailing step 103 in the embodiment shown in FIG. 1;

FIG. 4 is a second schematic flowchart detailing step 103 in the embodiment shown in FIG. 1;

FIG. 5 is a schematic flowchart of a method for providing candidate long sentences in an input method according to another embodiment of the present application;

FIG. 6 is a schematic structural diagram of a device for providing candidate long sentences in an input method according to an embodiment of the present application;

FIG. 7 is a schematic structural diagram of a device for providing candidate long sentences in an input method according to another embodiment of the present application;

FIG. 8 is a schematic structural diagram of a device for providing candidate long sentences in an input method according to another embodiment of the present application;

FIG. 9 is a schematic structural diagram of a device for providing candidate long sentences in an input method according to another embodiment of the present application;

FIG. 10 is a schematic structural diagram of an electronic device according to an embodiment of the present application.

Detailed Description

The embodiments of the present application are described in detail below, and examples of them are illustrated in the accompanying drawings, where the same or similar reference numerals throughout denote the same or similar elements, or elements having the same or similar functions. The embodiments described below with reference to the drawings are exemplary and are intended to explain the present application; they should not be construed as limiting it.

The method, device, and electronic device for providing candidate long sentences in an input method according to the embodiments of the present application are described below with reference to the drawings.

FIG. 1 is a schematic flowchart of a method for providing candidate long sentences in an input method according to an embodiment of the present application. It should be noted that the method provided in this embodiment is executed by a device for providing candidate long sentences in an input method; the device may be deployed in an electronic device or in a cloud server, which this embodiment does not specifically limit.

As shown in FIG. 1, the method for providing candidate long sentences in an input method may include:

Step 101: obtain the current input sequence entered by the user in the input method application.

Step 102: obtain candidate words matching the current input sequence.

As an exemplary implementation, when the user needs to enter information through the input method application, the terminal device can obtain the current input sequence entered by the user and upload it to the cloud server, which converts the current input sequence into candidate words matching it.

It can be understood that obtaining the candidate words matching the current input sequence can also be performed by the terminal device instead of the cloud server. For example, the terminal device can convert the current input sequence entered by the user with a key model to obtain matching candidate words, and upload those candidate words to the cloud server; this embodiment is not limited in this respect.

The terminal device may include, but is not limited to, a personal computer, a tablet computer, a mobile phone, a smartphone, or any other hardware device with an input method application, which this embodiment does not specifically limit.

For example, if the current input sequence entered by the user in the input method application is "nihaozai", converting it yields "你好在" as the top candidate word.

As another example, if the current input sequence is "guonianhao", converting it yields "过年好" ("Happy New Year") as the top candidate word.
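The description leaves the conversion from input sequence to candidate words (done by the cloud server, or by a key model on the terminal) unspecified. The sketch below stands in for it with a hypothetical lookup table; `TOY_LEXICON` and `candidate_words` are illustrative names rather than part of the patent, and a real input method would use a trained conversion model instead of a dictionary.

```python
# Hypothetical toy lexicon: normalized pinyin sequence -> candidate words, best first.
# It stands in for the unspecified conversion model; it is not the patent's implementation.
TOY_LEXICON: dict[str, list[str]] = {
    "nihaozai": ["你好在", "你好再"],
    "guonianhao": ["过年好"],
}


def candidate_words(input_sequence: str, lexicon: dict[str, list[str]] = TOY_LEXICON) -> list[str]:
    """Return candidate words for a pinyin input sequence, top candidate first."""
    # Normalize raw keystrokes: lowercase, then drop spaces and syllable apostrophes.
    key = input_sequence.lower().replace(" ", "").replace("'", "")
    # Return a copy so callers cannot mutate the lexicon entry.
    return list(lexicon.get(key, []))
```

With this table, `candidate_words("nihaozai")[0]` is "你好在", matching the first example above, and an unknown sequence returns an empty list.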

Step 103: obtain, according to the pre-trained long sentence prediction model, candidate long sentences matching the candidate words.

Specifically, after the candidate words matching the current input sequence are obtained, the pre-trained long sentence prediction model can be used to obtain candidate long sentences matching those candidate words, so that the user can quickly enter a complete long sentence through the input method application.

It should be noted that the way candidate long sentences are obtained from the long sentence prediction model differs across application scenarios. For example, a candidate word can be fed directly into the long sentence prediction model, which has learned the correspondence between candidate words and candidate long sentences and directly outputs a candidate long sentence matching the candidate word.

Other ways of obtaining candidate long sentences matching the candidate words with the long sentence prediction model are described in subsequent embodiments.

It should be understood that a candidate long sentence consists of the candidate word followed by the suffix words that come after it.

In practice, different users have different phrasing habits. To make the candidate long sentences better fit the user's needs, as an exemplary implementation, after the candidate long sentences matching the candidate words are obtained, the user's sentence preference features can be obtained, the candidate long sentences can be adjusted according to those features, and the adjusted candidate long sentences can be returned to the terminal device.
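The description does not define the "sentence preference features" or how the adjustment is made. One plausible reading, sketched here purely as an assumption, is to re-score each candidate long sentence with a bonus for every phrase the user is known to favor; `preferred_phrases` and `bonus` are hypothetical parameters.

```python
def adjust_by_preference(
    candidates: list[tuple[str, float]],
    preferred_phrases: set[str],
    bonus: float = 1.0,
) -> list[tuple[str, float]]:
    """Re-score (sentence, score) pairs, rewarding sentences containing user-favored phrases.

    The preference representation (a set of phrases) and the additive bonus are
    illustrative assumptions, not the patent's definition of preference features.
    """
    adjusted = []
    for sentence, score in candidates:
        # Count how many favored phrases occur in this sentence.
        hits = sum(1 for phrase in preferred_phrases if phrase in sentence)
        adjusted.append((sentence, score + bonus * hits))
    # Highest adjusted score first; sorted() is stable, so ties keep the model's order.
    return sorted(adjusted, key=lambda pair: pair[1], reverse=True)
```

For instance, if a user habitually writes "怎么样", a bonus of 3.0 lifts "唉那怎么样" from 6.0 to 9.0, ahead of a candidate scored 8.0.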

Step 104: display the candidate words and the candidate long sentences in the input method application.

In this embodiment, one or more candidate long sentences may match a candidate word.

In this embodiment, to prevent sensitive words such as uncivil expressions from appearing in the candidate long sentences, each candidate long sentence, once obtained, can be checked for words in a blacklist word list: if it contains any such word, the candidate long sentence is filtered out; otherwise it is kept.

The blacklist word list stores preset uncivil words, illegal words, and the like.
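The blacklist check amounts to a substring test, since Chinese text has no spaces to split on. A minimal sketch, assuming the blacklist is held in memory as a set of strings (the placeholder entry "badword" in the example is hypothetical):

```python
def filter_blacklisted(sentences: list[str], blacklist: set[str]) -> list[str]:
    """Keep only candidate long sentences that contain no blacklisted word."""
    # Substring matching: a sentence is dropped if any blacklisted word occurs anywhere in it.
    return [s for s in sentences if not any(word in s for word in blacklist)]
```

For a large blacklist, a multi-pattern matcher such as Aho-Corasick avoids testing every word against every sentence; plain substring checks keep the sketch short.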

As an exemplary implementation, when multiple candidate long sentences match the candidate word, the score of each candidate long sentence can be obtained so that the most suitable one is offered to the user, and the candidate long sentence with the highest score is displayed in the input method application.

For example, suppose there are three candidate long sentences, "唉那怎么办" ("Oh, then what do we do"), "唉那怎么了" ("Oh, then what happened"), and "唉那怎么样" ("Oh, then how about it"), with scores of 8, 7, and 6 respectively. If none of them contains a word from the blacklist word list, the highest-scoring candidate long sentence is selected and returned to the terminal device for display.

As another exemplary implementation, when multiple candidate long sentences match the candidate word, the score of each candidate long sentence can be obtained, the candidate long sentences can be sorted in descending order of score, and the sorted candidate long sentences can be displayed in the input method application.
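Both display strategies reduce to sorting by score: the first keeps only the top entry, the second shows the whole ordered list. A minimal sketch, assuming scores arrive as a dict from sentence to score (the patent does not say how scores are computed):

```python
from typing import Optional


def rank_candidates(scores: dict[str, float], top_k: Optional[int] = None) -> list[str]:
    """Return candidate long sentences in descending score order, optionally truncated to top_k."""
    # sorted() is stable, so sentences with equal scores keep their original order.
    ordered = sorted(scores, key=lambda s: scores[s], reverse=True)
    return ordered if top_k is None else ordered[:top_k]
```

With the scores 8, 7, and 6 from the example, `rank_candidates(scores, top_k=1)` returns only "唉那怎么办", while `rank_candidates(scores)` returns all three in that order.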

In this embodiment, so as not to interfere with the user's entry of information through the input method, the candidate long sentences can be displayed at a position such as the upper-left or upper-right corner of the input method application; this embodiment does not specifically limit where the candidate long sentences are displayed.

For example, when the user's current input sequence in the input method application is "nihaozai", the input method application displays the candidate words together with the matching candidate long sentence "你好在吗,我找你有事" ("Hi, are you there? I need to talk to you about something."). The corresponding user interface is shown in FIG. 2, where the candidate long sentence is displayed in the upper-right corner as an example.
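Steps 101 to 104 can be tied together as below. This is a sketch only: the conversion function and the long sentence predictor are passed in as hypothetical callables, and feeding only the top candidate word to the predictor is an assumption, since the description does not say which candidate words are used.

```python
from typing import Callable


def provide_long_sentences(
    input_sequence: str,
    to_candidates: Callable[[str], list[str]],
    predict_long: Callable[[str], list[str]],
) -> dict[str, list[str]]:
    """Run steps 101-104 and return what the input method UI would display."""
    # Step 101: input_sequence is the user's current input sequence.
    # Step 102: convert it into candidate words, best first.
    words = to_candidates(input_sequence)
    # Step 103: predict candidate long sentences from the top candidate word (assumption).
    long_sentences = predict_long(words[0]) if words else []
    # Step 104: hand both lists to the UI layer for display.
    return {"candidate_words": words, "candidate_long_sentences": long_sentences}
```

Plugging in stubs for the FIG. 2 example, `provide_long_sentences("nihaozai", lambda s: ["你好在"], lambda w: [w + "吗,我找你有事"])` yields the candidate word "你好在" and the candidate long sentence "你好在吗,我找你有事".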

With the method for providing candidate long sentences in an input method according to the embodiments of the present application, the current input sequence entered by the user in the input method application is obtained, candidate words matching that sequence are obtained, and the pre-trained long sentence prediction model is applied to the candidate words to obtain the corresponding candidate long sentences, which are displayed alongside the candidate words in the input method application. The pre-trained model thus quickly yields matching candidate long sentences, and offering them lets the user finish entering a long sentence quickly, which reduces the user's input cost and improves the user experience.

As shown in FIG. 3, in one embodiment, step 103 may be implemented as follows:

Step 301: use the candidate word as the current input of the long sentence prediction model.

In this embodiment, so that the long sentence prediction model can accurately predict the next word following each word, the model can be trained on training corpus data before the current input is fed into it to obtain its current output.

The long sentence prediction model is trained as follows:

Step a: obtain training corpus data, which includes prefix sample words and the suffix sample words corresponding to them.

Here, a suffix sample word is a word that appears after the prefix sample words.

In this embodiment, the training corpus data can be constructed from a large amount of chat corpus collected in instant-messaging chat scenarios.

As an exemplary implementation, to ensure enough preceding context, chat sentences whose character count is greater than or equal to a preset character-count threshold can be selected as the chat corpus.

The preset character-count threshold is a critical value set in advance: if the number of characters in a chat sentence is equal to or greater than it, the chat sentence can be used to construct the training corpus. For example, if the threshold is 7 and the chat sentence is "我们晚上去哪吃饭" ("Where shall we go for dinner tonight?"), its 8 characters exceed the threshold, so the sentence can be used to construct the training corpus.

The training corpus data is constructed from the chat corpus roughly as follows: the chat sentences are split with a preset separator, and the training corpus data is built from the result. For each preset separator in a chat sentence, the words before it form the prefix sample words, and the word right after it is the suffix sample word.

The separator is preset; for example, it can be "|".

For example, the chat sentence "我们晚上去哪吃饭" is split with the separator "|" into "我们|晚上|去哪|吃饭". For the first separator, the prefix sample word is "我们" ("we") and the suffix sample word is "晚上" ("tonight"); for the second separator, the prefix sample words are "我们晚上" and the suffix sample word is "去哪" ("where to go"); for the third separator, the prefix sample words are "我们晚上去哪" and the suffix sample word is "吃饭" ("have dinner").
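The corpus filtering and prefix/suffix construction described above can be sketched as follows, assuming each chat sentence has already been word-segmented and joined with the separator "|" (the segmenter itself is not specified). `build_training_pairs` is a hypothetical name, and its `min_chars` default of 7 simply mirrors the threshold in the example.

```python
def build_training_pairs(segmented: str, sep: str = "|", min_chars: int = 7) -> list[tuple[str, str]]:
    """Turn one separator-joined chat sentence into (prefix, next word) training pairs."""
    segments = [s for s in segmented.split(sep) if s]
    # Keep only sentences whose character count (separators excluded) reaches the threshold.
    if len("".join(segments)) < min_chars:
        return []
    pairs = []
    for i in range(1, len(segments)):
        prefix = "".join(segments[:i])  # all words before the i-th separator
        suffix = segments[i]            # the single word right after it
        pairs.append((prefix, suffix))
    return pairs
```

Applied to "我们|晚上|去哪|吃饭", this reproduces the three pairs in the example; a short sentence such as "你好|吗" (3 characters) yields no pairs.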

Step b: train the long sentence prediction model on the prefix sample words and the suffix sample words.

Specifically, the prefix sample words are used as the input features and the suffix sample words as the output features of the long sentence prediction model during training.

For example, the long sentence prediction model can be trained with a recurrent neural network (RNN) on the prefix sample words and suffix sample words.

The RNN can use structures such as LSTM or GRNN; the input feature is a Chinese character and the output feature is the next Chinese character. The input first passes through an embedding layer and is then modeled by the RNN layer; finally, a hierarchical softmax output layer selects the corresponding Chinese character.

It should be noted that the hierarchical softmax output structure reduces the amount of classification computation, which makes model training more efficient.
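The saving from a hierarchical softmax comes from factoring the output distribution. The patent does not say which hierarchy it uses; the sketch below shows the simplest two-level (class-based) variant, where P(char | h) = P(class | h) * P(char | class, h), so scoring one character needs one softmax over the classes plus one over that class's members instead of one over the whole vocabulary. All weights and the class assignment are hypothetical inputs.

```python
import math


def softmax(logits: list[float]) -> list[float]:
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def two_level_prob(
    hidden: list[float],
    class_weights: list[list[float]],
    class_members: list[list[str]],
    word_to_class: dict[str, int],
    word_weights: dict[str, list[float]],
    word: str,
) -> float:
    """P(word | hidden) under a two-level (class-factored) softmax."""
    c = word_to_class[word]
    # Level 1: distribution over classes, one dot product per class.
    p_class = softmax([dot(w, hidden) for w in class_weights])[c]
    # Level 2: distribution over the members of class c only.
    members = class_members[c]
    p_in_class = softmax([dot(word_weights[m], hidden) for m in members])
    return p_class * p_in_class[members.index(word)]
```

With a vocabulary of V characters split into roughly sqrt(V) classes, each prediction costs on the order of 2 * sqrt(V) dot products rather than V, while the probabilities over the full vocabulary still sum to 1.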

In this embodiment, the long sentence prediction model has the following advantages: it outputs candidate long sentences efficiently, and it needs little storage space, which reduces the storage resources occupied by the model.

During training, the backpropagation (BP) algorithm can be used to optimize the model parameters and obtain the final long sentence prediction model.

It should be understood that the trained long sentence prediction model can accurately predict the next word that follows each input word.

Step 302: feed the current input into the long sentence prediction model to obtain its current output, which includes the next word after the current input.

Step 303: when the next word does not match the preset sentence termination information, update the current input of the long sentence prediction model from the current output and the current input, and obtain the current output for the updated input through the model; repeat until the current output of the long sentence prediction model matches the preset sentence termination information.

Step 304: when the current output of the long sentence prediction model matches the preset sentence termination information, generate the candidate long sentence matching the candidate word from the current input of the long sentence prediction model.

In other words, in this embodiment the long sentence prediction model predicts the next word after the candidate word; the candidate word and that next word then become the input for the model's next prediction, which predicts the following word, and so on, until the model outputs the sentence terminator.

The sentence termination information indicates that a sentence has ended and is set in advance. For example, it can be a sentence terminator such as NULL.

For example, let the sentence termination information be NULL, and suppose the candidate word obtained from the user's current input sequence is "我们晚上" ("we ... tonight"). This candidate word is used as the current input of the long sentence prediction model. If the model outputs "去哪" ("where to go"), that is, the next word after "我们晚上" is "去哪", the current output is not the sentence termination information, so it is appended to the current input, giving the updated current input "我们晚上去哪". The model's next output is "吃饭" ("have dinner"), which is again appended, giving "我们晚上去哪吃饭". The model's current output is now NULL, so its current input, "我们晚上去哪吃饭" ("Where shall we go for dinner tonight?"), is the candidate long sentence matching the candidate word.
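Steps 301 to 304 describe a greedy decoding loop. A minimal sketch, where `predict_next` is a hypothetical stand-in for the trained model that returns the next word, or `None` as the termination marker (the text's NULL); the `max_steps` cap is an added safeguard that the description does not mention.

```python
from typing import Callable, Optional


def complete_sentence(
    candidate: str,
    predict_next: Callable[[str], Optional[str]],
    max_steps: int = 20,
) -> str:
    """Extend a candidate word one predicted word at a time until the terminator appears."""
    current_input = candidate  # step 301: the candidate word is the first input
    for _ in range(max_steps):  # safety cap, not part of the patent
        next_word = predict_next(current_input)  # step 302: model's current output
        if next_word is None:  # output matches the sentence termination information
            break  # step 304: current_input is the candidate long sentence
        current_input += next_word  # step 303: append the output to the input and repeat
    return current_input
```

Using a dict as the toy model reproduces the worked example: with `toy = {"我们晚上": "去哪", "我们晚上去哪": "吃饭"}`, `complete_sentence("我们晚上", toy.get)` returns "我们晚上去哪吃饭", because `toy.get` returns `None` for the completed sentence.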

As shown in FIG. 4, in another embodiment, step 103 may be implemented as follows:

Step 401: determine, through the long sentence prediction model, the suffix words that match the candidate words. The long sentence prediction model has learned the correspondence between candidate words and suffix words.

In this embodiment, the long sentence prediction model may first be trained on training corpus data. This allows it to accurately predict the suffix words that match the candidate words before it is used to produce outputs.

The long sentence prediction model is trained as follows:

Step a: obtain training corpus data. The data includes prefix sample words and the suffix sample words corresponding to them. A prefix sample word and its suffix sample word together form a long sentence.

In this embodiment, the training corpus data can be constructed from a large chat corpus collected in instant messaging scenarios.

As an exemplary implementation, only chat sentences whose character count is greater than or equal to a preset threshold are selected as chat corpus. This ensures that each sentence carries enough preceding context.

The preset threshold is a predefined minimum character count. If a chat sentence has at least this many characters, it can be used to construct training corpus. For example, with a threshold of 7, the chat sentence "我们晚上去哪吃饭" ("where are we going to eat this evening") has 8 characters. It exceeds the threshold and can therefore be used to construct training corpus.
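The length filter can be sketched in a few lines. This sketch makes two assumptions:

- "Word count" for Chinese text is taken to mean the number of characters (字). Python's `len` counts these as code points.
- The default threshold of 7 simply follows the example above.

```python
from typing import Iterable, List


def select_chat_sentences(sentences: Iterable[str], min_chars: int = 7) -> List[str]:
    """Keep chat sentences whose character count reaches the preset threshold."""
    return [s for s in sentences if len(s.strip()) >= min_chars]


corpus = ["我们晚上去哪吃饭", "好的", "明天见"]
print(select_chat_sentences(corpus))  # ['我们晚上去哪吃饭']
```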

Training corpus data is built from the chat corpus roughly as follows. Chat sentences are split with a preset separator, and the training corpus data is determined from the split result. The words before a separator in a chat sentence are the prefix sample words, and the words after it are the suffix sample words.

The separator is preset; for example, it can be "|".

For example, take the chat sentence "我们晚上去哪吃饭". Splitting it with "|" as "我们|晚上|去哪吃饭" yields the prefix sample words "我们晚上" and the suffix sample words "去哪吃饭".

As another example, splitting the same sentence as "我们|晚上去哪吃饭" yields the prefix sample word "我们" and the suffix sample words "晚上去哪吃饭".
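The two examples above can both be produced by one routine that emits a (prefix, suffix) pair for every separator position. This enumeration is our reading of the examples, offered as a sketch rather than the applicant's exact procedure. `build_pairs` is a hypothetical helper name.

```python
from typing import List, Tuple

SEPARATOR = "|"  # the preset separator used in the examples


def build_pairs(segmented: str, sep: str = SEPARATOR) -> List[Tuple[str, str]]:
    """Turn one segmented chat sentence into (prefix, suffix) training pairs,
    one pair per separator position."""
    parts = segmented.split(sep)
    pairs = []
    for i in range(1, len(parts)):
        prefix = "".join(parts[:i])  # everything before this separator
        suffix = "".join(parts[i:])  # everything after it
        pairs.append((prefix, suffix))
    return pairs


print(build_pairs("我们|晚上|去哪吃饭"))
# [('我们', '晚上去哪吃饭'), ('我们晚上', '去哪吃饭')]
```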

Step b: train the long sentence prediction model on the prefix sample words and the suffix sample words.

In this embodiment, the long sentence prediction model can be trained using a typical sequence-to-sequence neural machine translation model. The prefix words serve as the source sequence and the suffix words as the target sequence.

Step 402: generate a candidate long sentence from the candidate words and the suffix words.

In this embodiment, once the candidate words and the suffix words are obtained, the suffix words are spliced after the candidate words to form the candidate long sentence.

For example, suppose the candidate words are "我们晚上" and the long sentence prediction model predicts the suffix words "去哪吃饭". The generated candidate long sentence is then "我们晚上去哪吃饭" ("where are we going to eat this evening").
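Steps b, 401 and 402 can be illustrated end to end with a deliberately simple stand-in model. The text trains a sequence-to-sequence neural network. The hypothetical `SuffixTable` below is not one: it only memorises which suffix followed each prefix most often. It is used here purely to show the learned prefix-to-suffix correspondence and the splicing step under that simplifying assumption.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Optional, Tuple


class SuffixTable:
    """Frequency-count stand-in for the trained prefix -> suffix model.
    Not a seq2seq network: it can only return suffixes seen in training."""

    def __init__(self) -> None:
        self._counts: Dict[str, Counter] = defaultdict(Counter)

    def train(self, pairs: Iterable[Tuple[str, str]]) -> None:
        """Step b (simplified): count each (prefix, suffix) training pair."""
        for prefix, suffix in pairs:
            self._counts[prefix][suffix] += 1

    def predict_suffix(self, prefix: str) -> Optional[str]:
        """Step 401 (simplified): most frequent suffix seen after the prefix."""
        counts = self._counts.get(prefix)  # .get avoids creating empty entries
        if not counts:
            return None
        return counts.most_common(1)[0][0]


def generate_long_sentence(model: SuffixTable, candidate: str) -> Optional[str]:
    """Step 402: splice the predicted suffix words after the candidate words."""
    suffix = model.predict_suffix(candidate)
    return candidate + suffix if suffix is not None else None


model = SuffixTable()
model.train([("我们晚上", "去哪吃饭"), ("我们晚上", "去哪吃饭"), ("我们晚上", "加班")])
print(generate_long_sentence(model, "我们晚上"))  # 我们晚上去哪吃饭
```

A neural model would generalise to prefixes it never saw. The table returns `None` for them, which a caller could treat as "no long sentence to show".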

In practice, the user may already have committed some words to the text edit box of the user interface through the input method before entering the current input sequence. To provide candidate long sentences more accurately, any of the above embodiments can be extended to determine the candidate long sentences from both these on-screen words and the current input sequence.

The method of this embodiment is further described below with reference to FIG. 5.

FIG. 5 is a schematic flowchart of a method for providing candidate long sentences in an input method according to another embodiment of the present application.

As shown in FIG. 5, the method may include:

Step 501: obtain the current input sequence entered by the user in the input method application.

Step 502: obtain the candidate words that match the current input sequence.

The foregoing explanations of steps 101 and 102 also apply to steps 501 and 502 of this embodiment and are not repeated here.

Step 503: obtain the on-screen words committed before the current input sequence.

Steps 502 and 503 in this embodiment may be executed in either order.

As an exemplary implementation, the text that the user has already entered in the text edit box of the user interface can be obtained. This text constitutes the on-screen words.

Step 504: use the pre-trained long sentence prediction model to obtain candidate long sentences that match the candidate words and the on-screen words.

The way in which the pre-trained long sentence prediction model obtains these candidate long sentences differs between application scenarios. Examples follow:

In a first implementation scenario, the candidate long sentence is generated iteratively:

1. The on-screen words and the candidate words are used as the current input of the long sentence prediction model.
2. The current input is fed into the model to obtain its current output, which includes the next word after the current input.
3. If the next word does not match the preset sentence termination information, the model's current input is updated from the current output and the current input. The model is then queried again for the current output corresponding to the updated input. This repeats until the current output matches the preset sentence termination information.
4. When the current output matches the termination information, the candidate long sentence matching the candidate words is generated from the model's current input.

As an example, the candidate words can be spliced after the on-screen words, and the spliced words used as the current input of the long sentence prediction model.

For example, suppose the on-screen word is "我们" and the candidate word for the current input sequence is "晚上". Splicing them gives "我们晚上" as the current input of the long sentence prediction model.

After "我们晚上" is fed to the model, suppose its current output is "去哪", meaning the next word after "我们晚上" is "去哪". This output is not the sentence terminator, so it is spliced after the current input. The updated current input is "我们晚上去哪".

The model's next output is "吃饭", which is again spliced after the current input, giving "我们晚上去哪吃饭". The model's current output is now NULL. The model's current input, "我们晚上去哪吃饭", is therefore the candidate long sentence that matches the candidate word.
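Including the on-screen words changes what the model sees, and therefore what it predicts. The sketch below assumes a hypothetical toy model (`TOY_MODEL`) in which the same candidate "晚上" continues as "晚上好" ("good evening") when typed on its own, but as "我们晚上去哪吃饭" when "我们" is already on screen. NULL is again modelled as `None`, and the loop cap is an added assumption.

```python
from typing import Callable, Dict, Optional


def complete_with_context(
    committed: str,
    candidate: str,
    predict_next: Callable[[str], Optional[str]],
    max_steps: int = 20,
) -> str:
    """First scenario: splice the on-screen words and the candidate word into
    the initial model input, then extend it one predicted word at a time
    until the model returns the termination marker (None here)."""
    current_input = committed + candidate
    for _ in range(max_steps):  # safety cap; an assumption, not in the text
        next_word = predict_next(current_input)
        if next_word is None:
            break
        current_input += next_word
    return current_input


# Hypothetical toy model: the continuation depends on the on-screen context.
TOY_MODEL: Dict[str, Optional[str]] = {
    "晚上": "好",
    "我们晚上": "去哪",
    "我们晚上去哪": "吃饭",
}


def toy_predict(text: str) -> Optional[str]:
    return TOY_MODEL.get(text)  # contexts not in the table end the sentence


print(complete_with_context("", "晚上", toy_predict))      # 晚上好
print(complete_with_context("我们", "晚上", toy_predict))  # 我们晚上去哪吃饭
```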

In a second implementation scenario, the long sentence prediction model determines the suffix words that match the candidate words and the on-screen words; the model has learned the correspondence between candidate words and suffix words. A candidate long sentence is then generated from the candidate words and the suffix words.

Specifically, the candidate words are spliced after the on-screen words to obtain spliced words. The spliced words are fed into the long sentence prediction model to obtain the corresponding suffix words. The suffix words are then spliced after the spliced words to obtain the candidate long sentence.

For example, suppose the on-screen word is "我们" and the candidate word for the current input sequence is "晚上". Splicing them gives the spliced words "我们晚上". Suppose the model predicts that the suffix words after them are "去哪吃饭", making "去哪吃饭" the suffix that matches the on-screen and candidate words. The generated candidate long sentence is then "我们晚上去哪吃饭".

In practice, different users have different sentence habits. To make the candidate long sentences better fit the user's needs, the following exemplary implementation can be used. After the candidate long sentences matching the candidate words are obtained, the user's sentence preference features are retrieved. The candidate long sentences are adjusted according to those features, and the adjusted candidate long sentences are fed back to the terminal device.
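The text does not specify what the sentence preference features are or how the adjustment is made. As one hypothetical illustration only, the features could be taken as how often the user has previously sent each sentence, and the adjustment as a re-ranking by that count. Both `rerank_by_preference` and `usage_counts` are assumptions, not part of the described method.

```python
from typing import Dict, List


def rerank_by_preference(candidates: List[str], usage_counts: Dict[str, int]) -> List[str]:
    """Put candidate long sentences the user has sent more often first.
    sorted() is stable, so ties keep the model's original order."""
    return sorted(candidates, key=lambda s: -usage_counts.get(s, 0))


cands = ["我们晚上去哪吃饭", "我们晚上去哪玩"]
history = {"我们晚上去哪玩": 3}  # hypothetical per-user history
print(rerank_by_preference(cands, history))  # ['我们晚上去哪玩', '我们晚上去哪吃饭']
```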

Step 505: display the candidate words and the candidate long sentences in the input method application.

The method of this embodiment works as follows:

- It obtains the current input sequence entered by the user in the input method application, the candidate words that match it, and the words already committed to the screen before it.
- It uses the pre-trained long sentence prediction model to obtain candidate long sentences corresponding to the candidate words and the on-screen words.
- It displays the candidate long sentences alongside the candidate words in the input method application.

Because a pre-trained model is used, matching candidate long sentences are obtained quickly and offered to the user. The user can then finish entering a long sentence quickly, which reduces input effort and improves the user experience.

FIG. 6 is a schematic structural diagram of an apparatus for providing candidate long sentences in an input method according to an embodiment of the present application.

As shown in FIG. 6, the apparatus includes a first acquisition module 110, a second acquisition module 120, a third acquisition module 130, and a display module 140, where:

The first acquisition module 110 is configured to obtain the current input sequence entered by the user in the input method application.

The second acquisition module 120 is configured to obtain the candidate words that match the current input sequence.

The third acquisition module 130 is configured to obtain, according to the pre-trained long sentence prediction model, the candidate long sentences that match the candidate words.

The display module 140 is configured to display the candidate words and the candidate long sentences in the input method application.
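The four modules can be pictured as four methods on one object. The sketch below is hypothetical in several respects:

- The class and method names are ours.
- The candidate lookup and the model are injected as plain functions.
- The pinyin key "womenwanshang" and the lookup tables are toy data.
- The display module is reduced to a text line standing in for the candidate bar.

```python
from typing import Callable, List


class LongSentenceProvider:
    """Hypothetical sketch of the FIG. 6 apparatus; one method per module."""

    def __init__(
        self,
        lookup_candidates: Callable[[str], List[str]],
        predict_long_sentence: Callable[[str], str],
    ) -> None:
        self._lookup_candidates = lookup_candidates          # e.g. a lexicon lookup
        self._predict_long_sentence = predict_long_sentence  # the trained model
        self._input_sequence = ""

    def acquire_input(self, keystrokes: str) -> None:
        """First acquisition module 110: record the current input sequence."""
        self._input_sequence = keystrokes

    def acquire_candidates(self) -> List[str]:
        """Second acquisition module 120: candidate words for the input."""
        return self._lookup_candidates(self._input_sequence)

    def acquire_long_sentences(self, candidates: List[str]) -> List[str]:
        """Third acquisition module 130: one long sentence per candidate."""
        return [self._predict_long_sentence(c) for c in candidates]

    def display(self, candidates: List[str], long_sentences: List[str]) -> str:
        """Display module 140: text stand-in for the candidate bar."""
        return " ".join(candidates) + " | " + " ".join(long_sentences)


lexicon = {"womenwanshang": ["我们晚上"]}
completions = {"我们晚上": "我们晚上去哪吃饭"}
provider = LongSentenceProvider(
    lambda seq: lexicon.get(seq, []),
    lambda word: completions.get(word, word),
)
provider.acquire_input("womenwanshang")
words = provider.acquire_candidates()
print(provider.display(words, provider.acquire_long_sentences(words)))
# 我们晚上 | 我们晚上去哪吃饭
```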

In an embodiment of the present application, the third acquisition module 130 is specifically configured to:

- use the candidate words as the current input of the long sentence prediction model;
- feed the current input into the model to obtain its current output, where the current output includes the next word after the current input;
- when the next word does not match the preset sentence termination information, update the model's current input from the current output and the current input, and obtain the current output corresponding to the updated input through the model, until the model's current output matches the preset sentence termination information;
- when the model's current output matches the preset sentence termination information, generate the candidate long sentence matching the candidate words from the model's current input.

In an embodiment of the present application, on the basis of the apparatus embodiment shown in FIG. 6, and as shown in FIG. 7, the apparatus may further include:

The fourth acquisition module 150 is configured to obtain training corpus data. The data includes prefix sample words and the suffix sample words corresponding to them, where the suffix sample words are words that appear after the prefix sample words.

The first training module 160 is configured to train the long sentence prediction model on the prefix sample words and the suffix sample words.

In an embodiment of the present application, the third acquisition module 130 is specifically configured to:

- determine, through the long sentence prediction model, the suffix words that match the candidate words, where the model has learned the correspondence between candidate words and suffix words;
- generate a candidate long sentence from the candidate words and the suffix words.

In an embodiment of the present application, on the basis of the apparatus embodiment shown in FIG. 6, and as shown in FIG. 8, the apparatus further includes:

The fifth acquisition module 170 is configured to obtain training corpus data. The data includes prefix sample words and the suffix sample words corresponding to them, where a prefix sample word and its suffix sample word together form a long sentence.

The second training module 180 is configured to train the long sentence prediction model on the prefix sample words and the suffix sample words.

In an embodiment of the present application, on the basis of the apparatus embodiment shown in FIG. 6, and as shown in FIG. 9, the apparatus may further include:

The sixth acquisition module 190 is configured to obtain the on-screen words committed before the current input sequence.

The third acquisition module 130 is specifically configured to use the pre-trained long sentence prediction model to obtain the candidate long sentences that match the candidate words and the on-screen words.

It should be understood that the sixth acquisition module 190 of the apparatus embodiment shown in FIG. 9 may also be included in the apparatus embodiments shown in FIG. 7 or FIG. 8. This embodiment does not limit this.

The foregoing explanations of the method embodiments also apply to the apparatus of this embodiment. The implementation principles are similar and are not repeated here.

The apparatus of this embodiment works as follows:

- It obtains the current input sequence entered by the user in the input method application and the candidate words that match it.
- It uses the pre-trained long sentence prediction model together with the candidate words to obtain the corresponding candidate long sentences.
- It displays these candidate long sentences alongside the candidate words in the input method application.

Because a pre-trained model is used, matching candidate long sentences are obtained quickly and offered to the user. The user can then finish entering a long sentence quickly, which reduces input effort and improves the user experience.

FIG. 10 is a schematic structural diagram of an electronic device according to an embodiment of the present application. The electronic device includes:

a memory 1001, a processor 1002, and a computer program stored in the memory 1001 and executable on the processor 1002.

When executing the program, the processor 1002 implements the method for providing candidate long sentences in an input method provided in the above embodiments.

Further, the electronic device also includes:

a communication interface 1003, used for communication between the memory 1001 and the processor 1002.

The memory 1001 is used to store the computer program executable on the processor 1002.

The memory 1001 may include high-speed RAM and may also include non-volatile memory, for example at least one disk memory.

The processor 1002 is configured to implement the method for providing candidate long sentences in an input method of the above embodiments when executing the program.

If the memory 1001, the processor 1002, and the communication interface 1003 are implemented independently, they can be connected to one another through a bus and communicate over it. The bus may be, for example, an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, or an Extended Industry Standard Architecture (EISA) bus. The bus can be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, FIG. 10 shows the bus as a single thick line, but this does not mean there is only one bus or one type of bus.

Optionally, if the memory 1001, the processor 1002, and the communication interface 1003 are integrated on a single chip, they can communicate with one another through an internal interface.

The processor 1002 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present application.

This embodiment also provides a computer-readable storage medium storing a computer program. When executed by a processor, the program implements the method for providing candidate long sentences in an input method described above.

In this specification, references to "one embodiment", "some embodiments", "example", "specific example", "some examples", and similar terms mean that a specific feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present application. Schematic uses of these terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, where they do not conflict, those skilled in the art may combine the different embodiments or examples described in this specification and their features.

In addition, the terms "first" and "second" are used for descriptive purposes only. They shall not be construed as indicating or implying relative importance, or as implicitly indicating the number of the technical features referred to. Thus, a feature qualified by "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present application, "a plurality of" means at least two, for example two or three, unless expressly and specifically defined otherwise.

Any process or method description in a flowchart, or otherwise described herein, may be understood as representing a module, segment, or portion of code that includes one or more executable instructions for implementing specific logical functions or steps of the process. The scope of the preferred embodiments of the present application includes additional implementations in which functions are performed out of the order shown or discussed. This includes performing them substantially concurrently or in reverse order, depending on the functions involved. This should be understood by those skilled in the art to which the embodiments of the present application belong.

The logic and/or steps represented in a flowchart or otherwise described herein may, for example, be regarded as an ordered list of executable instructions for implementing logical functions. They may be embodied in any computer-readable medium for use by, or in connection with, an instruction execution system, apparatus, or device. Such a system may be a computer-based system, a system including a processor, or another system that can fetch and execute instructions from an instruction execution system, apparatus, or device.

For the purposes of this specification, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport a program for use by, or in connection with, an instruction execution system, apparatus, or device. More specific examples of computer-readable media (a non-exhaustive list) include:

- an electrical connection (electronic device) having one or more wires;
- a portable computer diskette (magnetic device);
- random access memory (RAM);
- read-only memory (ROM);
- erasable programmable read-only memory (EPROM or flash memory);
- an optical fiber device;
- portable compact disc read-only memory (CD-ROM).

The computer-readable medium may even be paper or another suitable medium on which the program is printed. In that case, the program can be captured electronically, for example by optically scanning the paper or other medium. It can then be edited, interpreted, or otherwise processed in a suitable way if necessary, and stored in a computer memory.

It should be understood that parts of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, multiple steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, they may be implemented by any one or a combination of the following techniques known in the art:

- discrete logic circuits having logic gates for implementing logic functions on data signals;
- application-specific integrated circuits having suitable combinational logic gates;
- programmable gate arrays (PGA);
- field-programmable gate arrays (FPGA);
- and so on.

Those of ordinary skill in the art can understand that all or some of the steps of the methods in the above embodiments can be completed by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, performs one or a combination of the steps of the method embodiments.

In addition, the functional units in the embodiments of the present application may be integrated into one processing module. Alternatively, each unit may exist physically on its own, or two or more units may be integrated into one module. The integrated module may be implemented either in hardware or as a software functional module. If implemented as a software functional module and sold or used as an independent product, the integrated module may also be stored in a computer-readable storage medium.

The storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disc, or the like. Although the embodiments of the present application have been shown and described above, it should be understood that they are exemplary and are not to be construed as limiting the present application; those of ordinary skill in the art may make changes, modifications, substitutions, and variations to the above embodiments within the scope of the present application.

Claims (14)

1. A method for providing candidate long sentences in an input method, comprising:
obtaining a current input sequence entered by a user in an input method application;
obtaining candidate words matching the current input sequence;
obtaining, according to a pre-trained long sentence prediction model, candidate long sentences matching the candidate words; and
displaying the candidate words and the candidate long sentences on the input method application.

2. The method according to claim 1, wherein obtaining, according to the pre-trained long sentence prediction model, the candidate long sentences matching the candidate words comprises:
using the candidate word as a current input of the long sentence prediction model;
inputting the current input into the long sentence prediction model to obtain a current output of the long sentence prediction model, wherein the current output comprises the next word after the current input;
when it is determined that the next word does not match preset sentence termination information, updating the current input of the long sentence prediction model according to the current output and the current input, and obtaining, through the long sentence prediction model, the current output corresponding to the current input, until the current output of the long sentence prediction model matches the preset sentence termination information; and
when the current output of the long sentence prediction model matches the preset sentence termination information, generating, according to the current input of the long sentence prediction model, a candidate long sentence matching the candidate word.

3. The method according to claim 2, further comprising, before inputting the current input into the long sentence prediction model to obtain the current output of the long sentence prediction model:
acquiring training corpus data, the training corpus data comprising prefix sample words and suffix sample words corresponding to the prefix sample words, wherein the suffix sample words are words that appear after the prefix sample words; and
training the long sentence prediction model according to the prefix sample words and the suffix sample words.

4. The method according to claim 1, wherein obtaining, according to the pre-trained long sentence prediction model, the candidate long sentences matching the candidate words comprises:
determining, through the long sentence prediction model, suffix words matching the candidate words, wherein the long sentence prediction model has learned the correspondence between candidate words and suffix words; and
generating the candidate long sentences according to the candidate words and the suffix words.

5. The method according to claim 4, further comprising, before determining, through the long sentence prediction model, the suffix words matching the candidate words:
acquiring training corpus data, the training corpus data comprising prefix sample words and suffix sample words corresponding to the prefix sample words, wherein the prefix sample words and the suffix sample words can form a long sentence; and
training the long sentence prediction model according to the prefix sample words and the suffix sample words.

6. The method according to any one of claims 1-5, further comprising:
obtaining words already committed to the screen before the current input sequence;
wherein obtaining, according to the pre-trained long sentence prediction model, the candidate long sentences matching the candidate words comprises:
using the pre-trained long sentence prediction model to obtain candidate long sentences matching the candidate words and the on-screen words.

7. A device for providing candidate long sentences in an input method, comprising:
a first acquisition module, configured to acquire a current input sequence entered by a user in an input method application;
a second acquisition module, configured to acquire candidate words matching the current input sequence;
a third acquisition module, configured to acquire, according to a pre-trained long sentence prediction model, candidate long sentences matching the candidate words; and
a display module, configured to display the candidate words and the candidate long sentences on the input method application.

8. The device according to claim 7, wherein the third acquisition module is specifically configured to:
use the candidate word as a current input of the long sentence prediction model;
input the current input into the long sentence prediction model to obtain a current output of the long sentence prediction model, wherein the current output comprises the next word after the current input;
when it is determined that the next word does not match preset sentence termination information, update the current input of the long sentence prediction model according to the current output and the current input, and obtain, through the long sentence prediction model, the current output corresponding to the current input, until the current output of the long sentence prediction model matches the preset sentence termination information; and
when the current output of the long sentence prediction model matches the preset sentence termination information, generate, according to the current input of the long sentence prediction model, a candidate long sentence matching the candidate word.

9. The device according to claim 8, further comprising:
a fourth acquisition module, configured to acquire training corpus data, the training corpus data comprising prefix sample words and suffix sample words corresponding to the prefix sample words, wherein the suffix sample words are words that appear after the prefix sample words; and
a first training module, configured to train the long sentence prediction model according to the prefix sample words and the suffix sample words.

10. The device according to claim 7, wherein the third acquisition module is specifically configured to:
determine, through the long sentence prediction model, suffix words matching the candidate words, wherein the long sentence prediction model has learned the correspondence between candidate words and suffix words; and
generate the candidate long sentences according to the candidate words and the suffix words.

11. The device according to claim 10, further comprising:
a fifth acquisition module, configured to acquire training corpus data, the training corpus data comprising prefix sample words and suffix sample words corresponding to the prefix sample words, wherein the prefix sample words and the suffix sample words can form a long sentence; and
a second training module, configured to train the long sentence prediction model according to the prefix sample words and the suffix sample words.

12. The device according to any one of claims 7-11, further comprising:
a sixth acquisition module, configured to acquire words already committed to the screen before the current input sequence;
wherein the third acquisition module is specifically configured to:
use the pre-trained long sentence prediction model to acquire candidate long sentences matching the candidate words and the on-screen words.

13. An electronic device, comprising:
a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the method for providing candidate long sentences in an input method according to any one of claims 1-6.

14. A computer-readable storage medium having a computer program stored thereon, wherein the program, when executed by a processor, implements the method for providing candidate long sentences in an input method according to any one of claims 1-6.
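The iterative prediction loop recited in claims 2 and 3 can be illustrated with a short Python sketch. The claims do not specify the model, so everything here is an assumption: `ToyLongSentenceModel` is a hypothetical word-bigram counter trained on (prefix sample words, suffix sample words) pairs, the marker `</s>` stands in for the preset sentence termination information, and the `max_words` cap in the hypothetical `generate_long_sentence` function is a safeguard the claims do not mention. It is an illustration of the loop, not the applicant's implementation.

```python
from collections import Counter, defaultdict

# Assumed stand-in for the "preset sentence termination information".
END = "</s>"


class ToyLongSentenceModel:
    """Hypothetical stand-in for the pre-trained long sentence prediction model.

    A word-bigram counter: it predicts the word that most often followed the
    last word of the current input. The claims do not fix an architecture.
    """

    def __init__(self):
        # previous word -> Counter of words observed right after it
        self.next_counts = defaultdict(Counter)

    def train(self, pairs):
        # Each pair is (prefix sample words, suffix sample words), as in claim 3.
        # END is appended so the model learns where a long sentence stops.
        for prefix, suffix in pairs:
            words = list(prefix) + list(suffix) + [END]
            for prev, nxt in zip(words, words[1:]):
                self.next_counts[prev][nxt] += 1

    def predict_next(self, current_input):
        # "Current output": most frequent successor of the last input word,
        # or END if that word never appeared as a predecessor in training.
        counts = self.next_counts.get(current_input[-1])
        if not counts:
            return END
        return counts.most_common(1)[0][0]


def generate_long_sentence(model, candidate_word, max_words=20):
    """Claim-2 style loop: feed the input, read the next word, stop at END.

    max_words is an added safeguard (not in the claims) because a simple
    model can cycle without ever predicting END.
    """
    current_input = [candidate_word]  # the candidate word is the first input
    for _ in range(max_words):
        next_word = model.predict_next(current_input)
        if next_word == END:  # output matches termination info: stop
            break
        current_input.append(next_word)  # update the input with the new output
    return current_input  # the long sentence is generated from the final input


model = ToyLongSentenceModel()
model.train([
    (["明天"], ["一起", "去", "公园"]),          # "tomorrow" + "go to the park together"
    (["明天"], ["一起", "吃饭"]),                # "tomorrow" + "eat together"
    (["我们"], ["明天", "一起", "去", "公园"]),  # "we" + "tomorrow go to the park together"
])
print("".join(generate_long_sentence(model, "明天")))  # prints 明天一起去公园
```

A bigram counter only looks at the last word of the current input, so it cannot use the on-screen context of claim 6; a model that conditions on the whole input sequence, such as a neural language model, would be needed for that.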

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN201910927584.XA (CN110673748B) | 2019-09-27 | 2019-09-27 | Method and device for providing candidate long sentences in input method

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN201910927584.XA (CN110673748B) | 2019-09-27 | 2019-09-27 | Method and device for providing candidate long sentences in input method

Publications (2)

Publication Number | Publication Date
CN110673748A | 2020-01-10
CN110673748B | 2023-04-28

Family ID: 69079711

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN201910927584.XA (Active, CN110673748B) | Method and device for providing candidate long sentences in input method | 2019-09-27 | 2019-09-27

Country Status (1)

Country | Link
CN (1) | CN110673748B

Citations (10)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US20030234821A1* | 2002-03-25 | 2003-12-25 | Agere Systems Inc. | Method and apparatus for the prediction of a text message input
JP2007034871A* | 2005-07-29 | 2007-02-08 | Sanyo Electric Co Ltd | Character input apparatus and character input apparatus program
JP2011128958A* | 2009-12-18 | 2011-06-30 | Chiteki Mirai:Kk | Device, method and program for inputting sentence
CN102866782A* | 2011-07-06 | 2013-01-09 | Harbin Institute of Technology | Input method and input method system for improving sentence generating efficiency
CN105718070A* | 2016-01-16 | 2016-06-29 | Shanghai Gaoxin Computer System Co Ltd | Pinyin long sentence continuous type-in input method and Pinyin long sentence continuous type-in input system
CN105929979A* | 2016-06-29 | 2016-09-07 | Baidu Online Network Technology (Beijing) Co Ltd | Long-sentence input method and device
US20180302350A1* | 2016-08-03 | 2018-10-18 | Tencent Technology (Shenzhen) Company Limited | Method for determining candidate input, input prompting method and electronic device
US20190121533A1* | 2016-02-06 | 2019-04-25 | Shanghai Chule (Coo Tek) Information Technology Co., Ltd. | Method and device for secondary input of text
CN110187780A* | 2019-06-10 | 2019-08-30 | Beijing Baidu Netcom Science and Technology Co Ltd | Long text prediction method, device, equipment and storage medium
CN110286778A* | 2019-06-27 | 2019-09-27 | Beijing Kingsoft Security Software Co Ltd | A Chinese deep learning input method, device and electronic equipment

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
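Title
IUY: "A comprehensive test of pinyin input method lexicon breadth and word-selection accuracy" (拼音输入法词库广度及选词精度全测试)*
Yuan Zhe (袁哲): "Application of artificial intelligence in pinyin input methods" (人工智能在拼音输入法中的应用)*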

Cited By (21)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN113589952A* | 2020-04-30 | 2021-11-02 | Beijing Sogou Technology Development Co Ltd | Information display method and device and electronic equipment
CN113589947B* | 2020-04-30 | 2024-08-09 | Beijing Sogou Technology Development Co Ltd | Data processing method and device and electronic equipment
CN113589949B* | 2020-04-30 | 2025-08-01 | Beijing Sogou Technology Development Co Ltd | Input method and device and electronic equipment
CN113589952B* | 2020-04-30 | 2025-05-06 | Beijing Sogou Technology Development Co Ltd | Information display method, device and electronic equipment
CN113589946A* | 2020-04-30 | 2021-11-02 | Beijing Sogou Technology Development Co Ltd | Data processing method and device and electronic equipment
CN113589949A* | 2020-04-30 | 2021-11-02 | Beijing Sogou Technology Development Co Ltd | Input method and device and electronic equipment
CN113589953A* | 2020-04-30 | 2021-11-02 | Beijing Sogou Technology Development Co Ltd | Information display method and device and electronic equipment
CN113589954A* | 2020-04-30 | 2021-11-02 | Beijing Sogou Technology Development Co Ltd | Data processing method and device and electronic equipment
CN113589947A* | 2020-04-30 | 2021-11-02 | Beijing Sogou Technology Development Co Ltd | Data processing method and device and electronic equipment
CN113589948A* | 2020-04-30 | 2021-11-02 | Beijing Sogou Technology Development Co Ltd | Data processing method and device and electronic equipment
CN113589948B* | 2020-04-30 | 2024-10-29 | Beijing Sogou Technology Development Co Ltd | Data processing method and device and electronic equipment
CN113589954B* | 2020-04-30 | 2024-09-03 | Beijing Sogou Technology Development Co Ltd | Data processing method and device and electronic equipment
CN113589946B* | 2020-04-30 | 2024-07-26 | Beijing Sogou Technology Development Co Ltd | Data processing method and device and electronic equipment
CN112052649B* | 2020-10-12 | 2024-05-31 | Tencent Technology (Shenzhen) Co Ltd | Text generation method, device, electronic equipment and storage medium
CN112052649A* | 2020-10-12 | 2020-12-08 | Tencent Technology (Shenzhen) Co Ltd | Text generation method and device, electronic equipment and storage medium
CN112506359B* | 2020-12-21 | 2023-07-21 | Beijing Baidu Netcom Science and Technology Co Ltd | Method, device and electronic equipment for providing long sentence candidates in input method
CN112506359A* | 2020-12-21 | 2021-03-16 | Beijing Baidu Netcom Science and Technology Co Ltd | Method and device for providing candidate long sentences in input method and electronic equipment
CN112527127B* | 2020-12-23 | 2022-01-28 | Beijing Baidu Netcom Science and Technology Co Ltd | Training method and device for input method long sentence prediction model, electronic equipment and medium
CN112527127A* | 2020-12-23 | 2021-03-19 | Beijing Baidu Netcom Science and Technology Co Ltd | Training method and device for input method long sentence prediction model, electronic equipment and medium
CN113449515A* | 2021-01-27 | 2021-09-28 | Xinyi International Digital Medical System (Dalian) Co Ltd | Medical text prediction method and device and electronic equipment
CN113655893A* | 2021-07-08 | 2021-11-16 | Huawei Technologies Co Ltd | Word and sentence generation method, model training method and related equipment

Also Published As

Publication number | Publication date
CN110673748B | 2023-04-28

Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant
