Disclosure of Invention
The embodiment of the invention provides a method and a device for updating configuration information, computer equipment and a storage medium, so as to improve the updating efficiency of the configuration information.
In order to solve the foregoing technical problem, an embodiment of the present application provides an update method of configuration information, including:
receiving a configuration updating request, acquiring a service identifier and an initial statement contained in the configuration updating request, and matching a service scene according to the service identifier to obtain a target service scene;
acquiring preset scene words corresponding to the target service scene as candidate words, wherein the candidate words comprise logic words and service object words;
performing word segmentation processing on the initial sentence to obtain a basic word segmentation sequence;
sequentially identifying service object participles contained in the basic participle sequence as target object participles, identifying logic participles contained in the basic participle sequence as target logic participles, and sequencing the target object participles and the target logic participles according to the front-back sequence of positions to obtain a conversion participle sequence;
and performing expression conversion on the converted word segmentation sequence according to a preset grammar conversion rule to generate a target expression, and updating the configuration information based on the target expression.
Optionally, performing word segmentation processing on the initial sentence to obtain a basic word segmentation sequence includes:
acquiring a preset training corpus, and analyzing the preset training corpus by using an N-gram model to obtain word sequence data of the preset training corpus;
performing word segmentation analysis on the initial sentence to obtain M word segmentation sequences, wherein M is a positive integer;
aiming at each word segmentation sequence, calculating the occurrence probability of each word segmentation sequence according to word sequence data of a preset training corpus to obtain the occurrence probability of M word segmentation sequences;
selecting a word segmentation sequence corresponding to the occurrence probability reaching a preset probability threshold value from the occurrence probabilities of the M word segmentation sequences as a target word segmentation sequence;
and taking each participle in the target participle sequence as a basic participle contained in the initial sentence, and sequencing according to the front-back sequence of the basic participle position to obtain a basic participle sequence.
Optionally, the calculating, for each word segmentation sequence, the occurrence probability of each word segmentation sequence according to word sequence data of a preset training corpus, and obtaining the occurrence probabilities of the M word segmentation sequences includes:
for each word segmentation sequence, acquiring all the participles $a_1, a_2, \ldots, a_{n-1}, a_n$ in the word segmentation sequence, wherein n is a positive integer greater than 1;
according to the word sequence data, calculating, by using the following formula, the probability that the nth participle $a_n$ of the n participles appears after the word sequence $(a_1 a_2 \ldots a_{n-1})$, and taking the probability as the occurrence probability of the word segmentation sequence:
$$P(a_n \mid a_1 a_2 \ldots a_{n-1}) = \frac{C(a_1 a_2 \ldots a_{n-1} a_n)}{C(a_1 a_2 \ldots a_{n-1})}$$
wherein $P(a_n \mid a_1 a_2 \ldots a_{n-1})$ is the probability that the nth participle $a_n$ of the n participles appears after the word sequence $(a_1 a_2 \ldots a_{n-1})$, $C(a_1 a_2 \ldots a_{n-1} a_n)$ is the word sequence frequency of the word sequence $(a_1 a_2 \ldots a_{n-1} a_n)$, and $C(a_1 a_2 \ldots a_{n-1})$ is the word sequence frequency of the word sequence $(a_1 a_2 \ldots a_{n-1})$.
Optionally, before the obtaining a preset training corpus and analyzing the preset training corpus by using an N-gram model to obtain word sequence data of the preset training corpus, the method for updating configuration information further includes:
constructing a service scene information base;
generating a supplementary corpus based on the service scene information base;
and combining the supplementary corpus with a preset basic corpus to obtain the preset training corpus.
Optionally, the generating a supplementary corpus based on the service scenario information base includes:
extracting the service information in the service scene information base;
performing word segmentation processing on the service information to obtain key words;
and establishing a mapping relation between the key participles and the corresponding service information, and correspondingly storing the service information, the key participles and the mapping relation into the supplementary corpus.
Optionally, the performing expression transformation on the transformed participle sequence according to a preset grammar transformation rule, and generating a target expression includes:
respectively carrying out logic expression on each target object participle and each logic participle in the conversion participle sequence to obtain a logic expression object;
determining the association relation between adjacent logic expression objects according to the word segmentation of each service object in the conversion word segmentation sequence and the position information of the logic word segmentation;
generating a splicing instruction according to the incidence relation;
and generating the rule expression by combining the logic expression object and the splicing instruction according to a preset instruction writing rule.
In order to solve the foregoing technical problem, an embodiment of the present application further provides an apparatus for updating configuration information, including:
the request analysis module is used for receiving a configuration updating request, acquiring a service identifier and an initial statement contained in the configuration updating request, and matching a service scene according to the service identifier to obtain a target service scene;
the word stock selection module is used for acquiring preset scene words corresponding to the target service scene as candidate words, wherein the candidate words comprise logic words and service object words;
the sentence segmentation module is used for performing word segmentation processing on the initial sentence to obtain a basic word segmentation sequence;
the word segmentation conversion module is used for sequentially identifying service object word segmentation contained in the basic word segmentation sequence as target object word segmentation, identifying logic word segmentation contained in the basic word segmentation sequence as target logic word segmentation, and sequencing the target object word segmentation and the target logic word segmentation according to the front-back sequence of positions to obtain a conversion word segmentation sequence;
and the configuration updating module is used for performing expression conversion on the converted word segmentation sequence according to a preset grammar conversion rule to generate a target expression and updating configuration information based on the target expression.
Optionally, the sentence segmentation module includes:
a corpus acquisition unit, configured to acquire a preset training corpus and analyze the preset training corpus by using an N-gram model to obtain word sequence data of the preset training corpus;
the word segmentation analysis unit is used for carrying out word segmentation analysis on the initial sentence to obtain M word segmentation sequences, wherein M is a positive integer;
the probability calculation unit is used for calculating the occurrence probability of each word segmentation sequence according to word sequence data of a preset training corpus aiming at each word segmentation sequence to obtain the occurrence probability of M word segmentation sequences;
the sequence determination unit is used for selecting the word segmentation sequence corresponding to the occurrence probability reaching a preset probability threshold from the occurrence probabilities of the M word segmentation sequences as a target word segmentation sequence;
and the matching unit is used for taking each participle in the target participle sequence as a basic participle contained in the initial sentence, and sequencing the participles according to the front-back sequence of the basic participle position to obtain a basic participle sequence.
Optionally, the probability calculation unit includes:
a word segmentation obtaining subunit, configured to obtain, for each word segmentation sequence, all the participles $a_1, a_2, \ldots, a_{n-1}, a_n$ in the word segmentation sequence, wherein n is a positive integer greater than 1;
an occurrence probability calculating subunit, configured to calculate, according to the word sequence data and by using the following formula, the probability that the nth participle $a_n$ of the n participles appears after the word sequence $(a_1 a_2 \ldots a_{n-1})$, and to take the probability as the occurrence probability of the word segmentation sequence:
$$P(a_n \mid a_1 a_2 \ldots a_{n-1}) = \frac{C(a_1 a_2 \ldots a_{n-1} a_n)}{C(a_1 a_2 \ldots a_{n-1})}$$
wherein $P(a_n \mid a_1 a_2 \ldots a_{n-1})$ is the probability that the nth participle $a_n$ of the n participles appears after the word sequence $(a_1 a_2 \ldots a_{n-1})$, $C(a_1 a_2 \ldots a_{n-1} a_n)$ is the word sequence frequency of the word sequence $(a_1 a_2 \ldots a_{n-1} a_n)$, and $C(a_1 a_2 \ldots a_{n-1})$ is the word sequence frequency of the word sequence $(a_1 a_2 \ldots a_{n-1})$.
Optionally, the apparatus for updating configuration information further includes:
the scene information base construction module is used for constructing a service scene information base;
a supplementary corpus generation module, configured to generate a supplementary corpus based on the service scenario information base;
and the training corpus module is used for combining the supplementary corpus with a preset basic corpus to obtain the preset training corpus.
Optionally, the supplementary corpus generating module includes:
the information extraction unit is used for extracting the service information in the service scene information base;
the word segmentation unit is used for performing word segmentation processing on the service information to obtain key words;
and the supplementary corpus establishing unit is used for establishing a mapping relation between the key participles and the corresponding business information and correspondingly storing the business information, the key participles and the mapping relation into the supplementary corpus.
Optionally, the configuration update module includes:
the word segmentation conversion unit is used for respectively carrying out logic expression on each target object word segmentation and each logic word segmentation in the conversion word segmentation sequence to obtain a logic expression object;
the relation establishing unit is used for determining the association relation between adjacent logic expression objects according to the position information of each service object word segmentation and the logic word segmentation in the conversion word segmentation sequence;
the instruction splicing unit is used for generating a splicing instruction according to the incidence relation;
and the expression generating unit is used for generating the rule expression by combining the logic expression object and the splicing instruction according to a preset instruction writing rule.
In order to solve the above technical problem, an embodiment of the present application further provides a computer device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the steps of the above method for updating configuration information when executing the computer program.
In order to solve the above technical problem, an embodiment of the present application further provides a computer-readable storage medium, where a computer program is stored, and the computer program, when executed by a processor, implements the steps of the above method for updating configuration information.
According to the method, apparatus, computer device and storage medium for updating configuration information provided by the embodiments of the present invention, a configuration updating request is received, a service identifier and an initial sentence contained in the configuration updating request are acquired, and a service scene is matched according to the service identifier to obtain a target service scene. Preset scene participles corresponding to the target service scene are then acquired as candidate participles, wherein the candidate participles include logic participles and service object participles. Word segmentation processing is performed on the initial sentence to obtain a basic participle sequence; according to the candidate participles, the service object participles contained in the basic participle sequence are identified in turn as target object participles, the logic participles contained in the basic participle sequence are identified as target logic participles, and the target object participles and the target logic participles are ordered by position to obtain a conversion participle sequence. Finally, expression conversion is performed on the conversion participle sequence according to a preset grammar conversion rule to generate a target expression, and the configuration information is updated based on the target expression. In this way, the sentence in a client's configuration updating request is received, segmented and analyzed to generate a computer-readable target expression, and the configuration information is updated according to that expression, which improves the updating efficiency of the configuration information.
Detailed Description
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs; the terminology used in the description of the application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application; the terms "including" and "having," and any variations thereof, in the description and claims of this application and the description of the above figures are intended to cover non-exclusive inclusions. The terms "first," "second," and the like in the description and claims of this application or in the above-described drawings are used for distinguishing between different objects and not for describing a particular order.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1, a system architecture 100 may include terminal devices 101, 102, 103, a network 104 and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired links, wireless communication links, or fiber optic cables.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages and the like.
The terminal devices 101, 102, 103 may be various electronic devices having display screens and supporting web browsing, including but not limited to smart phones, tablet computers, e-book readers, MP3 players (Moving Picture Experts Group Audio Layer III), MP4 players (Moving Picture Experts Group Audio Layer IV), laptop computers, desktop computers, and the like.
The server 105 may be a server providing various services, such as a background server providing support for pages displayed on the terminal devices 101, 102, 103.
The configuration information updating method provided by the embodiment of the present application is executed by a server, and accordingly, an updating apparatus for configuration information is provided in the server.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. Any number of terminal devices, networks and servers may be provided according to implementation needs, and the terminal devices 101, 102 and 103 in this embodiment may specifically correspond to an application system in actual production.
Referring to fig. 2, fig. 2 shows a method for updating configuration information according to an embodiment of the present invention, which is described by taking the method applied to the server in fig. 1 as an example, and is detailed as follows:
S201: Receiving a configuration updating request, acquiring a service identifier and an initial statement contained in the configuration updating request, and matching a service scene according to the service identifier to obtain a target service scene.
Specifically, when performing rule configuration on a client, a user selects a service scene (each service scene corresponds to a unique service identifier) and then enters, in natural language, the configuration content to be updated or added. The client takes the identifier corresponding to the selected service scene as the service identifier and the entered configuration content as the initial statement, generates a configuration request from the service identifier and the initial statement, and sends the request to the server. The server receives the configuration request, acquires the service identifier and the initial statement contained in it, and matches the specified service scene according to the service identifier as the target service scene.
It should be noted that, in the present embodiment, the input configuration content may be a natural language, rather than a specific computer readable instruction or a combination thereof.
The specific way of matching the service scenario may be through traversal query, string matching, and the like, which is not limited herein.
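As an illustration of step S201, the following is a minimal sketch that assumes the configuration request arrives as a JSON object and that scenes are kept in an in-memory table. The field names `service_id` and `statement`, the table `SERVICE_SCENES` and the function `parse_update_request` are hypothetical and do not come from this application.

```python
import json
from typing import Tuple

# Hypothetical scene table: service identifier -> service scene name.
SERVICE_SCENES = {
    "S001": "salary_management",
    "S002": "attendance_management",
}


def parse_update_request(raw_request: str) -> Tuple[str, str]:
    """Extract the service identifier and initial statement, then match the scene.

    Returns (target_scene, initial_statement). Raises KeyError when the
    identifier matches no known scene.
    """
    request = json.loads(raw_request)
    service_id = request["service_id"]   # assumed field name
    statement = request["statement"]     # assumed field name
    target_scene = SERVICE_SCENES[service_id]
    return target_scene, statement


raw = json.dumps({"service_id": "S001",
                  "statement": "base salary plus performance score"})
scene, statement = parse_update_request(raw)
print(scene, "|", statement)
```

A dictionary lookup is only one possible form of the matching; a traversal query or string matching over a database table would serve the same purpose.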
S202: Acquiring preset scene words corresponding to the target service scene as candidate words, wherein the candidate words comprise logic words and service object words.
Specifically, after a target service scene is confirmed, a preset scene word corresponding to the target service scene is obtained from a database preset by a server and serves as a candidate word, wherein the candidate word comprises a logic word and a service object word.
It should be noted that different logic words and service object words are preset in different service scenarios according to actual requirements.
The logic words corresponding to a service include, but are not limited to, common operators, logic preassignors, brackets, numbers and the like. The service object words mainly refer to professional terms used in the service, such as base salary, performance score and attendance in a salary management service.
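One possible way to hold the preset scene words of step S202 is sketched below. It assumes a simple in-memory mapping in place of the server's preset database; the names `CandidateWords`, `SCENE_WORDS` and `get_candidate_words`, and the particular logic words listed, are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidateWords:
    """Candidate words of one service scene (hypothetical container)."""
    logic_words: frozenset
    object_words: frozenset


# Hypothetical preset data; a real deployment would read this from a database.
SCENE_WORDS = {
    "salary_management": CandidateWords(
        logic_words=frozenset({"plus", "minus", "times", "(", ")"}),
        object_words=frozenset({"base salary", "performance score", "attendance"}),
    ),
}


def get_candidate_words(target_scene: str) -> CandidateWords:
    """Return the logic words and service object words preset for a scene."""
    return SCENE_WORDS[target_scene]


words = get_candidate_words("salary_management")
print(sorted(words.object_words))
```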
S203: Performing word segmentation processing on the initial sentence to obtain a basic word segmentation sequence.
Specifically, the initial sentence is subjected to word segmentation processing through a preset word segmentation mode, so that each basic word segmentation is obtained.
The preset word segmentation mode includes but is not limited to: through a third-party word segmentation tool or a word segmentation algorithm, and the like.
Common third-party word segmentation tools include, but are not limited to, the Stanford NLP word segmenter, the ICTCLAS word segmentation system, the ansj word segmentation tool and the HanLP Chinese word segmentation tool.
The word segmentation algorithms include, but are not limited to, the forward Maximum Matching (MM) algorithm, the Reverse Maximum Matching (RMM) algorithm, the Bi-directional Maximum Matching (BM) algorithm, the Hidden Markov Model (HMM), the N-gram model, and the like.
It is easy to understand that extracting basic participles by word segmentation, on the one hand, filters out meaningless words in the initial sentence and, on the other hand, facilitates subsequent semantic recognition using the basic participles.
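Of the algorithms listed, forward maximum matching is the simplest to show. The sketch below is a generic greedy implementation, not code from this application; it uses an unspaced Latin string to stand in for text without word boundaries, and the vocabulary and `max_len` value are assumptions.

```python
def forward_max_match(sentence, vocabulary, max_len=5):
    """Greedy forward maximum matching over a string without word boundaries."""
    tokens = []
    i = 0
    while i < len(sentence):
        # Try the longest window first and shrink until a vocabulary hit;
        # a single character is always accepted as a fallback.
        for size in range(min(max_len, len(sentence) - i), 0, -1):
            piece = sentence[i:i + size]
            if size == 1 or piece in vocabulary:
                tokens.append(piece)
                i += size
                break
    return tokens


vocab = {"true", "hot", "today"}  # hypothetical dictionary
segments = forward_max_match("truehottoday", vocab)
print(segments)  # ['true', 'hot', 'today']
```

Reverse maximum matching follows the same idea but scans from the end of the sentence, and bi-directional matching compares the two results.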
S204: Sequentially identifying the service object participles contained in the basic participle sequence as target object participles, identifying the logic participles contained in the basic participle sequence as target logic participles, and sequencing the target object participles and the target logic participles according to the front-back sequence of the positions to obtain a conversion participle sequence.
Specifically, matching the candidate word with the word segmentation in the basic word segmentation sequence by adopting a similar word matching mode to obtain a target logical word segmentation and a target object word segmentation, and sequencing according to the corresponding position of each target logical word segmentation and each target object word segmentation in the basic word segmentation sequence to obtain a converted word segmentation sequence.
The similar word matching method specifically includes, but is not limited to: similarity value calculation, fuzzy matching, semantic recognition, word segmentation clustering and the like.
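Step S204 can be sketched with similarity value calculation as the matching method. The example assumes the standard-library `difflib.get_close_matches` as the similarity measure and a cutoff of 0.8; both choices, along with the function name `build_conversion_sequence` and the sample words, are illustrative assumptions rather than the method fixed by this application.

```python
from difflib import get_close_matches


def build_conversion_sequence(basic_sequence, logic_words, object_words, cutoff=0.8):
    """Keep tokens that match a candidate word, preserving their original order.

    Each result item is (kind, matched_candidate), kind being "object" or "logic".
    """
    converted = []
    for token in basic_sequence:
        obj = get_close_matches(token, object_words, n=1, cutoff=cutoff)
        if obj:
            converted.append(("object", obj[0]))
            continue
        logic = get_close_matches(token, logic_words, n=1, cutoff=cutoff)
        if logic:
            converted.append(("logic", logic[0]))
        # Tokens matching neither list are dropped as filler words.
    return converted


basic = ["please", "set", "base salary", "plus", "performance scores"]
conversion = build_conversion_sequence(
    basic,
    logic_words=["plus", "minus", "times"],
    object_words=["base salary", "performance score", "attendance"],
)
print(conversion)
```

Because the loop walks the basic sequence from front to back, the matched words keep their relative positions without a separate sorting pass.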
S205: Performing expression conversion on the converted word segmentation sequence according to a preset grammar conversion rule to generate a target expression, and updating the configuration information based on the target expression.
Specifically, the conversion word segmentation sequence is converted into a rule expression according to the preset grammar conversion rule to obtain the target expression. The target expression is a computer program that can be directly executed by a computer processor, so that background business rules can be configured rapidly through natural language, which improves configuration efficiency.
In this embodiment, natural semantics is converted into a computer program by constructing a mapping relationship between a logic word segmentation, a business object word segmentation and a function, and then splicing and repairing are performed through a grammar rule of the computer program.
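The mapping-then-splicing idea of step S205 can be sketched as follows. The mapping tables `LOGIC_TO_OPERATOR` and `OBJECT_TO_FIELD`, the space-joined output format and the function `to_target_expression` are hypothetical; the application does not specify the concrete grammar conversion rule.

```python
# Hypothetical mapping tables; their contents are assumptions for illustration.
LOGIC_TO_OPERATOR = {"plus": "+", "minus": "-", "times": "*", "(": "(", ")": ")"}
OBJECT_TO_FIELD = {
    "base salary": "base_salary",
    "performance score": "performance_score",
    "attendance": "attendance",
}


def to_target_expression(conversion_sequence):
    """Map each (kind, word) item to a program fragment and splice the fragments."""
    parts = []
    for kind, word in conversion_sequence:
        if kind == "logic":
            parts.append(LOGIC_TO_OPERATOR[word])
        elif kind == "object":
            parts.append(OBJECT_TO_FIELD[word])
        else:
            raise ValueError("unknown participle kind: %r" % (kind,))
    return " ".join(parts)


sequence = [
    ("object", "base salary"),
    ("logic", "plus"),
    ("object", "performance score"),
]
expression = to_target_expression(sequence)
print(expression)  # base_salary + performance_score
```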
In this embodiment, a configuration updating request is received, the service identifier and the initial sentence contained in the request are acquired, and a service scene is matched according to the service identifier to obtain a target service scene. Preset scene participles corresponding to the target service scene are then acquired as candidate participles, wherein the candidate participles include logic participles and service object participles. The initial sentence is segmented to obtain a basic participle sequence; according to the candidate participles, the service object participles contained in the basic participle sequence are identified in turn as target object participles, the logic participles contained in the basic participle sequence are identified as target logic participles, and the target object participles and the target logic participles are ordered by position to obtain a conversion participle sequence. Finally, expression conversion is performed on the conversion participle sequence according to a preset grammar conversion rule to generate a target expression, and the configuration information is updated based on the target expression. In this way, the sentence in a client's configuration updating request is received, segmented and analyzed to generate a computer-readable target expression, and the configuration information is updated according to that expression, which improves the updating efficiency of the configuration information.
In some optional implementation manners of this embodiment, in step S203, performing word segmentation processing on the initial sentence to obtain a basic word segmentation sequence includes:
acquiring a preset training corpus, and analyzing the preset training corpus by using an N-gram model to obtain word sequence data of the preset training corpus;
performing word segmentation analysis on the initial sentence to obtain M word segmentation sequences, wherein M is a positive integer;
aiming at each word segmentation sequence, calculating the occurrence probability of each word segmentation sequence according to word sequence data of a preset training corpus to obtain the occurrence probability of M word segmentation sequences;
selecting a word segmentation sequence corresponding to the occurrence probability reaching a preset probability threshold value from the occurrence probabilities of the M word segmentation sequences as a target word segmentation sequence;
and taking each participle in the target participle sequence as a basic participle contained in the initial sentence, and sequencing according to the front-back sequence of the position of the basic participle to obtain the basic participle sequence.
Specifically, the training corpus is a corpus built from relevant language material and used to evaluate initial sentences expressed in natural language. An N-gram model is used to perform statistical analysis on each corpus item in the preset training corpus to obtain the number of times that one corpus item H appears immediately after another corpus item I in the preset training corpus, so as to obtain word sequence data for the word sequence "corpus item I + corpus item H". The content of the training corpus in the embodiment of the present invention includes, but is not limited to: websites of the business scenario, consulting information, business corpora, general corpora, and the like.
A corpus refers to a large-scale electronic text library that has been scientifically sampled and processed. Corpora are a basic resource of linguistic research and the main resource of empirical language research methods; they are applied in lexicography, language teaching, traditional language research, and statistics-based or example-based research in natural language processing. A corpus item, that is, a piece of language material, is the object of linguistic research and the basic unit that makes up a corpus.
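The statistical analysis described above amounts to counting how often each run of adjacent items occurs. A minimal sketch, assuming the corpus has already been split into token lists and that runs of up to n = 2 items are counted; the function name `count_word_sequences` and the toy corpus are made up for illustration.

```python
from collections import Counter


def count_word_sequences(corpus_sentences, n=2):
    """Count every contiguous run of 1..n tokens in the tokenised sentences."""
    counts = Counter()
    for tokens in corpus_sentences:
        for size in range(1, n + 1):
            for start in range(len(tokens) - size + 1):
                counts[tuple(tokens[start:start + size])] += 1
    return counts


# Toy corpus, invented for this example.
corpus = [
    ["i", "love", "tomatoes"],
    ["i", "love", "rice"],
    ["we", "love", "tomatoes"],
]
counts = count_word_sequences(corpus)
print(counts[("love", "tomatoes")])  # 2
print(counts[("love",)])             # 3
```

Here `counts[("love", "tomatoes")]` is the number of times "tomatoes" appears directly after "love", which corresponds to the "corpus item I + corpus item H" count.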
For example, in one embodiment, the preset training corpus is a corpus obtained by crawling popular web topics and current news by means of web crawlers, and the corpus is in the field of "current news".
A word sequence refers to a sequence formed by combining at least two corpus items in a certain order. The word sequence frequency refers to the ratio of the number of occurrences of the word sequence to the number of occurrences of all participles in the whole corpus, where a participle (word segmentation) refers to a word sequence obtained by combining consecutive word sequences in a preset combination manner. For example, if the word sequence "love tomatoes" occurs 100 times in the whole corpus and the total number of occurrences of all participles in the whole corpus is 100000, the word sequence frequency of "love tomatoes" is 100/100000 = 0.001.
The N-gram model is a language model commonly used in large-vocabulary continuous semantic recognition. When continuous, space-free characters need to be converted into a Chinese character string (that is, a sentence), the model uses collocation information between adjacent words in the context to compute the sentence with the maximum probability, so that the conversion to Chinese characters is performed automatically without manual selection by the user, which improves the accuracy of word sequence determination.
Further, each initial sentence may be broken in different ways, and the sentences understood from different breaks may differ. To ensure that the sentence is understood correctly, after acquiring the initial sentence the server obtains the M word segmentation sequences of the initial sentence, where M is the total number of all possible word segmentation sequences.
Each word segmentation sequence is a result of dividing the initial sentence, and each resulting word sequence comprises at least two participles.
For example, in one embodiment, the initial sentence is "true hot today"; parsing the initial sentence yields word segmentation sequence A: "today", "true", "hot", and word segmentation sequence B: "today", "Tianzhen", "hot", and so on.
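Obtaining all M candidate segmentations can be sketched as a recursive enumeration against a dictionary. This is a generic technique, not code from this application; the unspaced Latin string and the vocabulary are assumptions chosen so that several splits exist.

```python
def all_segmentations(sentence, vocabulary):
    """Return every way to split `sentence` into words found in `vocabulary`."""
    if not sentence:
        return [[]]
    results = []
    for end in range(1, len(sentence) + 1):
        head = sentence[:end]
        if head in vocabulary:
            # Recurse on the remainder and prepend the matched head word.
            for tail in all_segmentations(sentence[end:], vocabulary):
                results.append([head] + tail)
    return results


vocab = {"no", "where", "now", "here", "nowhere"}  # hypothetical dictionary
candidates = all_segmentations("nowhere", vocab)
print(len(candidates))  # M = 3
for seq in candidates:
    print(seq)
```

The enumeration grows quickly with sentence length, so a practical system would prune or memoise; the sketch only shows where M comes from.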
Further, specifically, according to the word sequence data acquired in step S32, the occurrence probability of each word segmentation sequence is calculated, so as to obtain the occurrence probabilities of M word segmentation sequences.
The occurrence probability of the partial word sequence can be calculated by using a Markov hypothesis theory: the occurrence of the Y-th word is only related to the previous Y-1 words, but not to any other words, and the probability of the whole sentence is the product of the occurrence probabilities of the words. These probabilities can be obtained by counting the number of times that Y words occur simultaneously directly from the corpus. Namely:
$$P(T) = P(W_1 W_2 \ldots W_Y) = P(W_1)\,P(W_2 \mid W_1) \cdots P(W_Y \mid W_1 W_2 \ldots W_{Y-1}) \qquad \text{Formula (1)}$$
where $P(T)$ is the probability of the whole sentence appearing, and $P(W_Y \mid W_1 W_2 \ldots W_{Y-1})$ is the probability that the Y-th participle appears after the word sequence consisting of the preceding Y-1 participles.
For example, the sentence "the Chinese nation is a nation with a long civilization history" is divided into the word sequence "Chinese nation", "is", "one", "having", "long", "civilization", "history", "being", "nationality", in which 9 participles appear in total. With n = 9, the probability is calculated that the participle "nationality" appears after the word sequence formed by the preceding eight participles.
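The chain-rule product in Formula (1) can be evaluated directly from sequence counts. The sketch below assumes unsmoothed maximum-likelihood estimates, with each conditional probability taken as the count of the longer sequence divided by the count of its prefix; the counts, the `total_tokens` value and the function name `sequence_probability` are invented for illustration.

```python
from collections import Counter


def sequence_probability(tokens, counts, total_tokens):
    """Chain-rule probability of a token sequence from prefix counts (no smoothing)."""
    if not tokens or total_tokens <= 0:
        return 0.0
    # P(W_1): relative frequency of the first token.
    prob = counts[(tokens[0],)] / total_tokens
    for i in range(1, len(tokens)):
        prefix = tuple(tokens[:i])
        full = tuple(tokens[:i + 1])
        if counts[prefix] == 0:
            return 0.0  # unseen prefix: probability is zero without smoothing
        # P(W_{i+1} | W_1 ... W_i) = C(W_1 ... W_{i+1}) / C(W_1 ... W_i)
        prob *= counts[full] / counts[prefix]
    return prob


# Invented counts for the example.
counts = Counter({
    ("today",): 4,
    ("today", "really"): 2,
    ("today", "really", "hot"): 1,
})
p = sequence_probability(["today", "really", "hot"], counts, total_tokens=20)
print(p)  # 4/20 * 2/4 * 1/2
```

In practice, long prefixes are rarely observed, so implementations usually truncate the history to the last one or two words and apply smoothing; neither refinement is shown here.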
Further, specifically, for each word segmentation sequence, an occurrence probability is obtained through calculation, and the occurrence probabilities of M word segmentation sequences are obtained in total, the occurrence probabilities of the M word segmentation sequences are respectively compared with a preset probability threshold, the occurrence probability greater than or equal to the preset probability threshold is selected as an effective occurrence probability, word segmentation sequences corresponding to the effective occurrence probability are further found, and the word segmentation sequences are used as target word segmentation sequences.
By comparing with a preset probability threshold value, the word segmentation sequences with the occurrence probability not meeting the requirement are filtered, so that the selected target word segmentation sequences are closer to the meaning expressed in the natural language, and the accuracy of semantic recognition is improved.
Preferably, in this embodiment, the word segmentation sequence corresponding to the maximum occurrence probability is selected as the target word segmentation sequence, so as to reduce the subsequent computation and improve the word segmentation efficiency of the initial sentence.
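The two selection strategies described above (keeping every sequence that reaches the threshold, or keeping only the most probable one) could be sketched as follows. This is a minimal sketch: the function names, the candidate segmentations, the probability values and the threshold are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

Segmentation = Tuple[str, ...]


def select_by_threshold(
    probabilities: Dict[Segmentation, float], threshold: float
) -> List[Segmentation]:
    """Keep the segmentations whose occurrence probability is >= threshold."""
    return [seq for seq, p in probabilities.items() if p >= threshold]


def select_most_probable(probabilities: Dict[Segmentation, float]) -> Segmentation:
    """Keep only the segmentation with the maximum occurrence probability."""
    return max(probabilities, key=probabilities.get)


# Hypothetical occurrence probabilities for M = 3 candidate segmentations.
candidates = {
    ("I", "love", "China"): 0.012,
    ("I", "lo", "ve", "China"): 0.0004,
    ("Il", "ove", "China"): 0.00001,
}

valid = select_by_threshold(candidates, threshold=0.001)
best = select_most_probable(candidates)
```

Selecting only the maximum leaves a single sequence for the later steps, which matches the stated aim of reducing subsequent computation.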
It should be noted that, in order to improve the word segmentation efficiency of the initial sentence, in this embodiment the process of acquiring the preset training corpus and analyzing it by using the N-gram model to obtain the word sequence data may be performed before the initial sentence is recognized, and the obtained word sequence data is stored. When semantic recognition needs to be performed on the initial sentence, the stored word sequence data is called directly.
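One way to prepare the word sequence data in advance is to count every word sequence of up to N participles once and keep the counts in memory. The sketch below assumes the training corpus is already segmented into lists of participles. The function name, the toy corpus and the choice of N = 3 are assumptions made for illustration.

```python
from collections import Counter
from typing import Iterable, List


def count_word_sequences(corpus: Iterable[List[str]], max_n: int) -> Counter:
    """Count every contiguous word sequence of length 1..max_n in the corpus."""
    counts: Counter = Counter()
    for sentence in corpus:
        for n in range(1, max_n + 1):
            for start in range(len(sentence) - n + 1):
                counts[tuple(sentence[start:start + n])] += 1
    return counts


# Toy, already-segmented corpus (hypothetical).
corpus = [
    ["I", "love", "China"],
    ["I", "love", "music"],
]

# Computed once ahead of time and reused for every later initial sentence.
WORD_SEQUENCE_COUNTS = count_word_sequences(corpus, max_n=3)
```

Because `Counter` returns 0 for a sequence it has never seen, later lookups do not need a separate existence check.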
In this embodiment, the preset training corpus is acquired and analyzed by using the N-gram model to obtain the word sequence data of the preset training corpus, so that the word sequence data can be used directly when the occurrence probability is calculated subsequently, which saves calculation time and helps improve the recognition efficiency of the initial sentence. Meanwhile, word segmentation analysis is performed on the initial sentence to obtain M word segmentation sequences, and the occurrence probability of each word segmentation sequence is calculated according to the word sequence data of the preset training corpus to obtain the occurrence probabilities of the M word segmentation sequences. From these occurrence probabilities, the word segmentation sequence whose occurrence probability reaches the preset probability threshold is selected as the target word segmentation sequence, and each participle in the target word segmentation sequence is used as a basic participle contained in the initial sentence, which improves the accuracy of word segmentation.
In some optional implementation manners of this embodiment, for each word segmentation sequence, calculating an occurrence probability of each word segmentation sequence according to word sequence data of a preset training corpus, and obtaining the occurrence probabilities of the M word segmentation sequences includes:
for each word segmentation sequence, acquiring all participles $a_1, a_2, \ldots, a_{n-1}, a_n$ in the word segmentation sequence, wherein n is a positive integer greater than 1;
according to the word sequence data, calculating, by using the following formula, the probability that the n-th participle $a_n$ of the n participles appears after the word sequence $(a_1 a_2 \ldots a_{n-1})$, and taking this probability as the occurrence probability of the word segmentation sequence:
$$P(a_n \mid a_1 a_2 \ldots a_{n-1}) = \frac{C(a_1 a_2 \ldots a_{n-1} a_n)}{C(a_1 a_2 \ldots a_{n-1})} \qquad \text{Formula (2)}$$
wherein $P(a_n \mid a_1 a_2 \ldots a_{n-1})$ is the probability that the n-th participle $a_n$ of the n participles appears after the word sequence $(a_1 a_2 \ldots a_{n-1})$, $C(a_1 a_2 \ldots a_{n-1} a_n)$ is the word sequence frequency of the word sequence $(a_1 a_2 \ldots a_{n-1} a_n)$, and $C(a_1 a_2 \ldots a_{n-1})$ is the word sequence frequency of the word sequence $(a_1 a_2 \ldots a_{n-1})$.
Specifically, the participles in the word segmentation sequence are acquired in order from front to back. For example, for the word segmentation sequence "I love China", the participles are extracted in order from front to back to obtain the first participle "I", the second participle "love", and the third participle "China".
Further, the word sequence frequency of each word sequence is obtained by analyzing the training corpus through an N-gram model, and only calculation is performed according to formula (2) here.
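A minimal sketch of this calculation follows, assuming formula (2) is the usual count ratio C(a_1 … a_n) / C(a_1 … a_{n-1}), as the definitions of the word sequence frequency C suggest. The function name, the toy counts and the rule of returning 0 for an unseen prefix (no smoothing) are assumptions, not part of the described method.

```python
from collections import Counter
from typing import Sequence


def conditional_probability(counts: Counter, words: Sequence[str]) -> float:
    """Return C(a_1 ... a_n) / C(a_1 ... a_{n-1}) for the given participles."""
    if len(words) < 2:
        raise ValueError("n must be greater than 1")
    prefix_count = counts[tuple(words[:-1])]
    if prefix_count == 0:
        # Assumption: an unseen prefix gives probability 0 (no smoothing).
        return 0.0
    return counts[tuple(words)] / prefix_count


# Hypothetical word sequence frequencies obtained beforehand from the corpus.
counts = Counter({
    ("I", "love"): 4,
    ("I", "love", "China"): 1,
})

p = conditional_probability(counts, ["I", "love", "China"])  # 1 / 4
```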
It is worth noting that, because the training corpus used by the N-gram model is huge, data sparsity is severe, the time complexity is high, and the calculated occurrence probability values are small, the occurrence probability may also be calculated by using a bigram model.
The bigram model uses formula (2) to calculate the probability $A_1$ that the participle $a_2$ appears after the participle $a_1$, the probability $A_2$ that the participle $a_3$ appears after the participle $a_2$, ..., and the probability $A_{n-1}$ that the participle $a_n$ appears after the participle $a_{n-1}$, and then uses formula (3) to calculate the occurrence probability of the whole word sequence $(a_1 a_2 \ldots a_{n-1} a_n)$:
$$P(T') = A_1 A_2 \cdots A_{n-1} \qquad \text{Formula (3)}$$
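The bigram calculation can be sketched as a product of pairwise factors, each taken as C(a_i a_{i+1}) / C(a_i). The function name and the toy counts below are hypothetical, and a factor with an unseen first participle is assumed to be 0 (no smoothing).

```python
from collections import Counter
from math import prod
from typing import Sequence


def bigram_sequence_probability(counts: Counter, words: Sequence[str]) -> float:
    """Multiply A_i = C(a_i a_{i+1}) / C(a_i) over all adjacent pairs."""
    factors = []
    for previous, current in zip(words, words[1:]):
        previous_count = counts[(previous,)]
        # Assumption: an unseen participle contributes a factor of 0.
        factor = counts[(previous, current)] / previous_count if previous_count else 0.0
        factors.append(factor)
    return prod(factors)


# Hypothetical unigram and bigram frequencies.
counts = Counter({
    ("I",): 4,
    ("love",): 2,
    ("I", "love"): 2,
    ("love", "China"): 1,
})

p = bigram_sequence_probability(counts, ["I", "love", "China"])  # 0.5 * 0.5
```

Only unigram and bigram counts are needed here, which is why this variant is less affected by data sparsity than counting full-length sequences.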
In this embodiment, for each word segmentation sequence, all the participles in the word segmentation sequence are acquired, and the probability that the last participle appears after the word sequence formed by all the preceding participles is calculated to obtain the occurrence probability of the whole sentence. This probability is then used to evaluate whether the segmentation of the sentence is reasonable, so that the semantics contained in the natural-language information are identified, the relevant participles are obtained, and the accuracy of word segmentation is improved.
In some optional implementation manners of this embodiment, in step S203, before obtaining the preset training corpus and analyzing the preset training corpus by using the N-gram model to obtain word sequence data of the preset training corpus, the method for updating configuration information further includes:
constructing a service scene information base;
generating a supplementary corpus based on the service scene information base;
and combining the supplementary corpus with a preset basic corpus to obtain a preset training corpus.
Specifically, before word segmentation recognition is performed, a service scene information base containing comprehensive service-related information needs to be constructed in order to improve the accuracy of service-related word segmentation recognition. The service scene information base contains the word segmentation information of each service scene. It may be generated by using an existing general model, constructed by manually collecting points of interest, or constructed from service scenes acquired by a web crawler; the specific manner is not limited herein.
Preferably, the embodiment of the invention adopts a mode of using a web crawler to acquire the relevant information of the service scene, and constructs the information base of the service scene through the crawled data.
The preset basic corpus is selected according to actual needs, for example, news and events related to business in the same field in the last three years are selected, and the corpus generated through text cleaning and arrangement is used as the basic corpus.
In this embodiment, the service scene information base is constructed, the supplementary corpus is generated based on the service scene information base, and the supplementary corpus is combined with the preset basic corpus to obtain the training corpus, so that the training corpus for N-gram model analysis not only has the capability of evaluating whether sentences are reasonable, but also contains relevant information of a service scene, and thus, whether a sentence contains the service scene can be accurately evaluated, which is beneficial to improving the accuracy of information identification of natural language and the accuracy of information identification in the service scene.
In some optional implementations of this embodiment, generating the supplementary corpus based on the service scene information base includes:
extracting service information in a service scene information base;
performing word segmentation processing on the service information to obtain key words;
and establishing a mapping relation between the key participles and the corresponding service information, and correspondingly storing the service information, the key participles and the mapping relation into a supplementary corpus.
In the embodiment, the service information is subjected to word segmentation analysis to obtain the key word segmentation, a mapping relation between the key word segmentation and the corresponding service information is further established and stored in the supplementary corpus, and semantic recognition and word segmentation processing can be rapidly performed according to the mapping relation in the follow-up process, so that the efficiency and the accuracy of word segmentation are improved.
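The mapping between key participles and service information could, for instance, be kept as a dictionary from each key participle to the service information entries containing it. The sketch below is illustrative only: the function name, the sample service information strings and the use of whitespace splitting in place of a real word segmenter are all assumptions.

```python
from typing import Callable, Dict, Iterable, List


def build_supplementary_corpus(
    service_info: Iterable[str],
    segment: Callable[[str], List[str]],
) -> Dict[str, List[str]]:
    """Map each key participle to the service information entries it came from."""
    mapping: Dict[str, List[str]] = {}
    for info in service_info:
        for participle in segment(info):
            entries = mapping.setdefault(participle, [])
            if info not in entries:  # store each entry once per participle
                entries.append(info)
    return mapping


# Whitespace splitting stands in for a real word segmenter (assumption).
supplementary = build_supplementary_corpus(
    ["loan amount limit", "loan approval period"],
    segment=str.split,
)
```

Looking up a participle such as `supplementary["loan"]` then returns every related service information entry directly.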
In some optional implementation manners of this embodiment, in step S205, performing expression transformation on the transformed participle sequence according to a preset grammar transformation rule, and generating a target expression includes:
respectively carrying out logic expression on each target object participle and each logic participle in the conversion participle sequence to obtain a logic expression object;
determining the association relation between adjacent logic expression objects according to the word segmentation of each service object in the conversion word segmentation sequence and the position information of the logic word segmentation;
generating a splicing instruction according to the association relation;
and generating a rule expression by combining the logic expression object and the splicing instruction according to a preset instruction writing rule.
Specifically, the target object participles and the target logic participles are standardized words, and each standardized word is converted into a corresponding logic expression object; for example, a logic participle is converted into the logic expression "i++". Because the positions of the service object participles and the logic participles in the conversion participle sequence affect the logic that is executed, the association relationship between any two adjacent logic expression objects is determined from the position information of each service object participle and each logic participle. A splicing instruction is then generated according to the association relationship, and the logic expression objects and the splicing instruction are combined according to a preset instruction writing rule to generate a rule expression.
The association relationship specifically includes, but is not limited to: … Generating a splicing instruction according to the association relationship may specifically be implemented by presetting a corresponding splicing instruction for each association relationship.
In this embodiment, the participles in the conversion participle sequence are logically expressed to obtain logic expression objects, the association relationship between adjacent logic expression objects is determined, a splicing instruction is generated according to the association relationship, and the logic expression objects and the splicing instruction are combined according to a preset instruction writing rule to generate a rule expression. In this way, the standardized participles obtained by recognizing the natural language are converted into a rule expression that a computer can recognize, so that the configuration information can subsequently be updated accurately through the rule expression, which improves the accuracy and efficiency of updating the configuration information.
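The conversion steps can be sketched with lookup tables. In the sketch below, every table entry, the "object"/"logic" kind labels used as the association relationship, and the single-space splicing instruction are hypothetical choices made for illustration; the source does not specify the preset grammar conversion rule or the instruction writing rule.

```python
from typing import List, Tuple

# Hypothetical logic expressions for service object words and logic words.
OBJECT_EXPRESSIONS = {"loan amount": "loan.amount", "credit limit": "credit.limit"}
LOGIC_EXPRESSIONS = {"greater than": ">", "less than": "<", "and": "&&"}

# Hypothetical splicing instruction preset for each association relationship,
# expressed here as the kinds of the two adjacent logic expression objects.
SPLICE_INSTRUCTIONS = {("object", "logic"): " ", ("logic", "object"): " "}


def to_logic_objects(conversion_sequence: List[str]) -> List[Tuple[str, str]]:
    """Convert each participle into a (kind, logic expression) pair."""
    objects = []
    for participle in conversion_sequence:
        if participle in OBJECT_EXPRESSIONS:
            objects.append(("object", OBJECT_EXPRESSIONS[participle]))
        elif participle in LOGIC_EXPRESSIONS:
            objects.append(("logic", LOGIC_EXPRESSIONS[participle]))
        else:
            raise KeyError(f"no logic expression for {participle!r}")
    return objects


def build_rule_expression(conversion_sequence: List[str]) -> str:
    """Splice adjacent logic expression objects into one rule expression."""
    objects = to_logic_objects(conversion_sequence)
    if not objects:
        return ""
    parts = [objects[0][1]]
    for (left_kind, _), (right_kind, right_expr) in zip(objects, objects[1:]):
        parts.append(SPLICE_INSTRUCTIONS[(left_kind, right_kind)])
        parts.append(right_expr)
    return "".join(parts)


expression = build_rule_expression(["loan amount", "greater than", "credit limit"])
```

Keeping the splicing instructions in a table keyed by the relationship mirrors the idea of presetting one instruction per association relationship, and an unsupported adjacency fails loudly with a `KeyError`.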
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present invention.
Fig. 3 is a schematic block diagram of an apparatus for updating configuration information, which corresponds one-to-one to the method for updating configuration information of the above embodiment. As shown in Fig. 3, the apparatus for updating configuration information includes a request parsing module 31, a word bank selecting module 32, a sentence segmentation module 33, a word segmentation conversion module 34, and a configuration updating module 35. The functional modules are explained in detail as follows:
the request parsing module 31 is configured to receive the configuration update request, obtain a service identifier and an initial statement included in the configuration update request, and match a service scene according to the service identifier to obtain a target service scene;
the word bank selecting module 32 is configured to obtain a preset scene word corresponding to the target service scene as a candidate word, where the candidate word includes a logic word and a service object word;
the sentence segmentation module 33 is configured to perform word segmentation processing on the initial sentence to obtain a basic word segmentation sequence;
the word segmentation conversion module 34 is configured to sequentially identify service object participles included in the basic word segmentation sequence as target object participles, identify logic participles included in the basic word segmentation sequence as target logic participles, and sort the target object participles and the target logic participles according to the front-back order of their positions to obtain a conversion participle sequence;
and the configuration updating module 35 is configured to perform expression conversion on the conversion participle sequence according to a preset grammar conversion rule to generate a target expression, and update the configuration information based on the target expression.
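How the five modules hand data to one another can be sketched as a small pipeline. Everything below is a hypothetical illustration: the class and field names, the request keys `service_id` and `initial_sentence`, the stand-in callables, and the choice to store the target expression under the scene name are assumptions, not details given in the description.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Tuple


@dataclass
class ConfigurationUpdater:
    match_scene: Callable[[str], str]                             # part of module 31
    candidate_words: Callable[[str], Tuple[Set[str], Set[str]]]   # module 32
    segment: Callable[[str], List[str]]                           # module 33
    to_expression: Callable[[List[str]], str]                     # part of module 35

    def update(self, request: Dict[str, str], config: Dict[str, str]) -> Dict[str, str]:
        # Module 31: read the request and match the target service scene.
        scene = self.match_scene(request["service_id"])
        # Module 32: logic words and service object words for that scene.
        logic_words, object_words = self.candidate_words(scene)
        # Module 33: basic word segmentation sequence.
        basic_sequence = self.segment(request["initial_sentence"])
        # Module 34: keep candidate words only, preserving their order.
        conversion_sequence = [
            word for word in basic_sequence
            if word in logic_words or word in object_words
        ]
        # Module 35: generate the target expression and update a copy of the config.
        updated = dict(config)
        updated[scene] = self.to_expression(conversion_sequence)
        return updated


# Stand-in callables, for illustration only.
updater = ConfigurationUpdater(
    match_scene=lambda service_id: {"S01": "loan"}[service_id],
    candidate_words=lambda scene: ({">"}, {"amount", "limit"}),
    segment=str.split,
    to_expression=" ".join,
)
new_config = updater.update(
    {"service_id": "S01", "initial_sentence": "the amount > the limit"}, {}
)
```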
Optionally, the sentence segmentation module 33 includes:
a corpus acquisition unit, configured to acquire a preset training corpus and analyze the preset training corpus by using an N-gram model to obtain word sequence data of the preset training corpus;
the word segmentation analysis unit is used for carrying out word segmentation analysis on the initial sentence to obtain M word segmentation sequences, wherein M is a positive integer;
the probability calculation unit is used for calculating the occurrence probability of each word segmentation sequence according to word sequence data of a preset training corpus aiming at each word segmentation sequence to obtain the occurrence probability of M word segmentation sequences;
the sequence determination unit is used for selecting the word segmentation sequence corresponding to the occurrence probability reaching a preset probability threshold from the occurrence probabilities of the M word segmentation sequences as a target word segmentation sequence;
and the matching unit is used for taking each participle in the target participle sequence as a basic participle contained in the initial sentence, and sequencing the participles according to the front-back sequence of the basic participle position to obtain a basic participle sequence.
Optionally, the probability calculation unit includes:
a word segmentation obtaining subunit, configured to obtain, for each word segmentation sequence, all participles $a_1, a_2, \ldots, a_{n-1}, a_n$ in the word segmentation sequence, wherein n is a positive integer greater than 1;
an occurrence probability calculating subunit, configured to calculate, according to the word sequence data and by using the following formula, the probability that the n-th participle $a_n$ of the n participles appears after the word sequence $(a_1 a_2 \ldots a_{n-1})$, and to take this probability as the occurrence probability of the word segmentation sequence:
$$P(a_n \mid a_1 a_2 \ldots a_{n-1}) = \frac{C(a_1 a_2 \ldots a_{n-1} a_n)}{C(a_1 a_2 \ldots a_{n-1})}$$
wherein $P(a_n \mid a_1 a_2 \ldots a_{n-1})$ is the probability that the n-th participle $a_n$ of the n participles appears after the word sequence $(a_1 a_2 \ldots a_{n-1})$, $C(a_1 a_2 \ldots a_{n-1} a_n)$ is the word sequence frequency of the word sequence $(a_1 a_2 \ldots a_{n-1} a_n)$, and $C(a_1 a_2 \ldots a_{n-1})$ is the word sequence frequency of the word sequence $(a_1 a_2 \ldots a_{n-1})$.
Optionally, the apparatus for updating configuration information further includes:
the scene information base construction module is used for constructing a service scene information base;
the supplementary corpus generation module is used for generating a supplementary corpus based on the service scene information base;
the training corpus generation module is used for combining the supplementary corpus with a preset basic corpus to obtain a preset training corpus.
Optionally, the supplementary corpus generating module comprises:
the information extraction unit is used for extracting the service information in the service scene information base;
the word segmentation unit is used for performing word segmentation processing on the service information to obtain key words;
and the supplementary corpus establishing unit is used for establishing a mapping relation between the key participles and the corresponding business information and correspondingly storing the business information, the key participles and the mapping relation into the supplementary corpus.
Optionally, the configuration updating module 35 includes:
the word segmentation conversion unit is used for respectively carrying out logic expression on each target object word segmentation and each logic word segmentation in the conversion word segmentation sequence to obtain a logic expression object;
the relation establishing unit is used for determining the association relation between adjacent logic expression objects according to the participle of each service object in the conversion participle sequence and the position information of the logic participle;
the instruction splicing unit is used for generating a splicing instruction according to the association relation;
and the expression generating unit is used for generating a rule expression by combining the logic expression object and the splicing instruction according to a preset instruction writing rule.
For specific limitations of the configuration information updating apparatus, reference may be made to the above limitations of the configuration information updating method, which are not described herein again. The modules in the above configuration information updating device may be implemented wholly or partially by software, hardware and a combination thereof. The modules can be embedded in a hardware form or independent from a processor in the computer device, and can also be stored in a memory in the computer device in a software form, so that the processor can call and execute operations corresponding to the modules.
In order to solve the technical problem, an embodiment of the present application further provides a computer device. Referring to fig. 4, fig. 4 is a block diagram of a basic structure of a computer device according to the present embodiment.
The computer device 4 comprises a memory 41, a processor 42, and a network interface 43 communicatively connected to each other via a system bus. It is noted that only the computer device 4 having the components memory 41, processor 42 and network interface 43 is shown, but it is understood that not all of the shown components are required to be implemented, and that more or fewer components may be implemented instead. As will be understood by those skilled in the art, the computer device is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions, and its hardware includes, but is not limited to, a microprocessor, an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), an embedded device, and the like.
The computer device can be a desktop computer, a notebook, a palm computer, a cloud server and other computing devices. The computer equipment can carry out man-machine interaction with a user through a keyboard, a mouse, a remote controller, a touch panel or voice control equipment and the like.
The memory 41 includes at least one type of readable storage medium, including a flash memory, a hard disk, a multimedia card, a card-type memory (e.g., SD or DX memory, etc.), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Read Only Memory (ROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), a Programmable Read Only Memory (PROM), a magnetic memory, a magnetic disk, an optical disk, etc. In some embodiments, the memory 41 may be an internal storage unit of the computer device 4, such as a hard disk or a memory of the computer device 4. In other embodiments, the memory 41 may also be an external storage device of the computer device 4, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, provided on the computer device 4. Of course, the memory 41 may also include both internal and external storage devices of the computer device 4. In this embodiment, the memory 41 is generally used for storing an operating system installed in the computer device 4 and various types of application software, such as program codes for controlling electronic files. Further, the memory 41 may also be used to temporarily store various types of data that have been output or are to be output.
In some embodiments, the processor 42 may be a Central Processing Unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip. The processor 42 is typically used to control the overall operation of the computer device 4. In this embodiment, the processor 42 is configured to execute the program code stored in the memory 41 or to process data, such as program code for executing control of an electronic file.
The network interface 43 may comprise a wireless network interface or a wired network interface, and the network interface 43 is generally used for establishing a communication connection between the computer device 4 and other electronic devices.
The present application further provides another embodiment, which is to provide a computer-readable storage medium storing an interface display program, which is executable by at least one processor to cause the at least one processor to perform the steps of the method for updating configuration information as described above.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present application.
It is to be understood that the above-described embodiments are merely some, rather than all, of the embodiments of the present application, and that the appended drawings illustrate preferred embodiments of the present application without limiting its scope. The present application may be embodied in many different forms; these embodiments are provided so that the disclosure of the present application will be understood thoroughly and completely. Although the present application has been described in detail with reference to the foregoing embodiments, those skilled in the art may still modify the technical solutions described in the foregoing embodiments or replace some of their technical features with equivalents. All equivalent structures made by using the contents of the specification and the drawings of the present application, whether applied directly or indirectly in other related technical fields, fall within the protection scope of the present application.