Summary of the Invention
In view of the above technical problem, embodiments of the present invention provide a text semantic analysis method, a text semantic analysis terminal, and a storage medium. They are intended to solve the problem that existing human-machine dialogue cannot recognize the true meaning of the information a user expresses, which leads to errors in information transmission.
A first aspect of the embodiments of the present invention provides a text semantic analysis method, which comprises the following steps:
receiving text information input by a user, performing lexical analysis on the input text information, and separating the character string contained in the text information into independent words to obtain a word sequence;
performing syntactic analysis on the separated word sequence, determining whether the word sequence contains syntax errors, and filtering out any word, or phrase formed by adjacent words, that contains a syntax error;
converting the words contained in the word sequence into corresponding metadata, calculating the semantic similarity and feature item weights between the metadata, extracting the keyword feature items of the word sequence according to the calculated semantic similarity and feature item weights, obtaining the semantic marker text corresponding to each word according to the keyword feature items, and storing the semantic marker text in a text database;
matching the corresponding semantic marker texts from the text database one by one, in the order in which the words appear in the word sequence, and outputting and displaying the text information synthesized after sorting.
Optionally, the text information input by the user includes the user's identity information and the question information input by the user;
the identity information of the user includes a user ID field, a user name field, and a mobile phone number field.
Optionally, the step of separating the character string contained in the text information into independent words includes:
using the space as a separator, separating the character string contained in the text information into independent words, and setting for each word a uniquely corresponding number identifier and a pointer identifier to the next metadata.
Optionally, before receiving the text information input by the user, the method further includes:
creating a metadata repository for storing metadata, and establishing an association between a word catalogue and the metadata contained in the metadata repository;
in the step of converting the words contained in the word sequence into corresponding metadata, the metadata corresponding to each word is found through the association.
Optionally, the step of calculating the semantic similarity and feature item weights between the metadata, and extracting the keyword feature items of the word sequence according to the calculated semantic similarity and feature item weights, includes:
calculating the semantic similarity and feature item weights between the metadata using a corpus-based word similarity analysis method and a word vector space model.
A second aspect of the embodiments of the present invention provides a text semantic analysis terminal, which comprises a processor, a memory, and a text semantic analysis program stored in the memory and executable on the processor, wherein the following steps are implemented when the text semantic analysis program is executed by the processor:
receiving text information input by a user, performing lexical analysis on the input text information, and separating the character string contained in the text information into independent words to obtain a word sequence;
performing syntactic analysis on the separated word sequence, determining whether the word sequence contains syntax errors, and filtering out any word, or phrase formed by adjacent words, that contains a syntax error;
converting the words contained in the word sequence into corresponding metadata, calculating the semantic similarity and feature item weights between the metadata, extracting the keyword feature items of the word sequence according to the calculated semantic similarity and feature item weights, obtaining the semantic marker text corresponding to each word according to the keyword feature items, and storing the semantic marker text in a text database;
matching the corresponding semantic marker texts from the text database one by one, in the order in which the words appear in the word sequence, and outputting and displaying the text information synthesized after sorting.
Optionally, when the text semantic analysis program is executed by the processor, the following step is also implemented:
using the space as a separator, separating the character string contained in the text information into independent words, and setting for each word a uniquely corresponding number identifier and a pointer identifier to the next metadata.
Optionally, when the text semantic analysis program is executed by the processor, the following steps are also implemented:
creating a metadata repository for storing metadata, and establishing an association between a word catalogue and the metadata contained in the metadata repository;
in the step of converting the words contained in the word sequence into corresponding metadata, finding the metadata corresponding to each word through the association.
Optionally, when the text semantic analysis program is executed by the processor, the following step is also implemented:
calculating the semantic similarity and feature item weights between the metadata using a corpus-based word similarity analysis method and a word vector space model.
A third aspect of the embodiments of the present invention provides a computer-readable storage medium on which a text semantic analysis program is stored, wherein the text semantic analysis method described above is implemented when the text semantic analysis program is executed by a processor.
In the technical solution provided by the embodiments of the present invention, the information input by the user is stored in the form of metadata. The metadata can then be suitably analyzed and identified, and the result is fed back to the user through the metadata architecture. When feeding back, information irrelevant to the user is discarded and only the information the user cares about is pushed, making it easier for the user to obtain the information fed back by the machine and to understand and use it correctly.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, rather than all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
In computer languages, semantic analysis is a logical phase of the compilation process. Its task is to examine a structurally correct source program for context-sensitive properties and to perform type checking. A structurally incorrect source program cannot enter this checking phase, even though it might be correct in terms of context and types; the compiler simply reports an error. Semantic analysis checks the source program for semantic errors and collects type information for the code generation phase. For example, one job of semantic analysis is type checking: verifying that each operator has operands permitted by the language specification, and reporting an error when it does not. Some compilers, for instance, report an error when a real number is used as an array index. Conversely, if a language specification allows operands to be coerced, then when an operation is applied to an integer and a real operand, the compiler should convert the integer to a real rather than treat it as an error in the source program.
At present, communication between people relies mainly on spoken and written language as tools; only with them can communication proceed smoothly and the intended meaning be correctly understood. Dialogue between humans and machines is mostly text-based, yet a computer can only recognize the two digits "0" and "1", so human-machine dialogue must be conveyed through computer instructions. During this process, data such as these instructions is first entered into the computer through an input device, the processing results are stored in the computer, and finally the results are presented through the computer's output device for people to read or hear. During storage and transmission, however, the data must undergo a series of processing steps before people and machines can communicate smoothly and the exchange can be as accurate as communication between people. The present invention uses metadata management to provide both a guarantee and an implementation mechanism for this process.
Metadata is in essence an encoding scheme: it is data that describes other data. It is commonly used as an encoding scheme for describing digital information resources, especially network information resources, and it is also a kind of structured data. Metadata refers to structured data extracted from an information resource to describe its features and content, such as a course name, speaker, or duration, and is used to organize, retrieve, describe, preserve, and manage information and knowledge resources. For example, for the teaching information (the information resource) of a lecturer at an online club, the information that can be retrieved in the club's application might be: course name: Quality Management; speaker: Shi Wei; lecture date: June 21, 2017. Because a basic metadata record consists of a metadata item and metadata content, resources described with metadata can be effectively filtered and classified. Together with standard specifications for the metadata, this makes it possible to distinguish the valid content of a resource from the invalid content, and thus to express the correct meaning of the information. After years of development, metadata can be expressed in formats such as XML and HTML, which make it easy for people to define their own tags, that is, the metadata. With such tags, a user can look at the tags (metadata) first to find the information needed, and metadata can be extended through the use of attributes.
The present invention provides a semantic analysis method. As shown in Fig. 1, the analysis method comprises the following steps:
Step 101: receive text information input by a user, perform lexical analysis on the input text information, and separate the character string contained in the text information into independent words to obtain a word sequence.
In this step, the text information sent by the user through a client is received first. In a specific implementation, the user sends the text information through a client, for example an app on a mobile terminal, and the client then forwards the received text information to the server.
Specifically, the text information input by the user includes the user's identity information and the question information input by the user;
the identity information of the user includes a user ID field, a user name field, and a mobile phone number field.
It is conceivable that the user's identity information may be entered each time the user sends information, or it may be saved in advance; in the latter case, when the user needs to send information, the question information input by the user is transmitted together with the previously saved identity information.
In this step, the step of separating the character string contained in the text information into independent words includes:
using the space as a separator, separating the character string contained in the text information into independent words, and setting for each word a uniquely corresponding number identifier and a pointer identifier to the next metadata.
Because the information input by the user consists of characters, this step first performs lexical analysis on the input: the character string is separated word by word, the words contained in it are identified, and character combinations that cannot be recognized are removed.
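The space-based separation described in step 101 can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions: the `Token` class, its field names, and the use of consecutive integers as number identifiers are hypothetical and are not specified in the original text.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Token:
    word: str
    number: int                 # unique number identifier of this word (assumed: its position)
    next_number: Optional[int]  # pointer identifier to the next item; None for the last word


def split_words(text: str) -> List[Token]:
    """Separate a character string into words, using the space as the separator."""
    # Repeated spaces would produce empty strings, so those are filtered out.
    words = [w for w in text.split(" ") if w]
    tokens = []
    for i, word in enumerate(words):
        nxt = i + 1 if i + 1 < len(words) else None
        tokens.append(Token(word, i, nxt))
    return tokens


if __name__ == "__main__":
    for token in split_words("I am Chinese"):
        print(token)
```

Storing the pointer to the next item alongside each word is what later allows independently retrieved results to be put back in the original order.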
Step 102: perform syntactic analysis on the separated word sequence, determine whether the word sequence contains syntax errors, and filter out any word, or phrase formed by adjacent words, that contains a syntax error.
Syntactic analysis is performed on the separated word sequence to determine whether it contains word combinations that do not conform to the grammar. Attributes of each language construct are attached to the nonterminal symbol that represents the construct, and the attribute values are computed by semantic rules associated with the grammar productions. Code is thereby produced, syntax-directed translation is carried out, and semantic translation based on a context-free grammar is performed.
This step further includes: analyzing and evaluating the assignment statements, arithmetic expressions, and logical expressions in the word sequence, and filtering out phrases whose variable types are inconsistent.
Step 103: convert the words contained in the word sequence into corresponding metadata, calculate the semantic similarity and feature item weights between the metadata, extract the keyword feature items of the word sequence according to the calculated semantic similarity and feature item weights, obtain the semantic marker text corresponding to each word according to the keyword feature items, and store the semantic marker text in a text database.
Each word is converted into its corresponding metadata, and semantic analysis is performed on the information input by the user by establishing a metadata model, so as to obtain the original meaning of the information.
Before the step of receiving the text information input by the user, the method further includes:
creating a metadata repository for storing metadata, and establishing an association between a word catalogue and the metadata contained in the metadata repository;
in the step of converting the words contained in the word sequence into corresponding metadata, the metadata corresponding to each word is found through the association.
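As a minimal sketch of the association between the word catalogue and the metadata repository, the lookup can be modelled with two dictionaries. The names `METADATA_REPOSITORY`, `WORD_CATALOGUE`, and `to_metadata`, the record layout (metadata item plus metadata content), and the sample entries are assumptions made for illustration only; the original text does not define a storage format.

```python
from typing import Dict, List, Optional

# Hypothetical metadata repository: metadata id -> record (metadata item + content).
METADATA_REPOSITORY: Dict[str, Dict[str, str]] = {
    "m1": {"item": "course name", "content": "Quality Management"},
    "m2": {"item": "speaker", "content": "Shi Wei"},
}

# Hypothetical word catalogue: word -> id of the associated metadata record.
WORD_CATALOGUE: Dict[str, str] = {
    "course": "m1",
    "speaker": "m2",
    "lecturer": "m2",
}


def to_metadata(words: List[str]) -> List[Optional[Dict[str, str]]]:
    """Map each word to its metadata record through the association, or None if absent."""
    result = []
    for word in words:
        metadata_id = WORD_CATALOGUE.get(word.lower())
        record = METADATA_REPOSITORY.get(metadata_id) if metadata_id is not None else None
        result.append(record)
    return result


if __name__ == "__main__":
    print(to_metadata(["Speaker", "course", "unknown"]))
```

Keeping the catalogue separate from the repository lets several words point to the same metadata record, as "speaker" and "lecturer" do here.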
Specifically, text dialogue and semantic analysis of the user information are performed on the basis of metadata management. The semantic analysis calculates the semantic similarity and feature item weights between the metadata to obtain the key information of the question input by the user, and creates the semantic marker text of the question according to that key information. In other words, the semantic marking of the text dialogue is performed through semantic analysis, and the semantically marked text or files are stored in the labeled text database (metadata repository).
Preferably, the step of calculating the semantic similarity and feature item weights between the metadata, and extracting the keyword feature items of the word sequence according to the calculated semantic similarity and feature item weights, includes:
calculating the semantic similarity and feature item weights between the metadata using a corpus-based word similarity analysis method and a word vector space model.
Step 104: match the corresponding semantic marker texts from the text database one by one, in the order in which the words appear in the word sequence, and output and display the text information synthesized after sorting.
Because the semantic marker texts or files obtained for the word sequence are independent pieces of information that have not yet been combined into text information, in this step they are sorted according to the unique number identifier of each word and the pointer identifier to the metadata corresponding to the next word, and are then synthesized into text information for output. This text information is the correct expression of the question input by the user.
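The ordering in step 104 amounts to following the chain of pointer identifiers from the first word onward. The sketch below assumes, hypothetically, that each retrieved fragment is held as a pair of its semantic marker text and the number identifier of the next fragment, keyed by its own number identifier; the function name `assemble` and the space used to join fragments are illustrative choices.

```python
from typing import Dict, Optional, Tuple

# number identifier -> (semantic marker text, number identifier of the next fragment or None)
Fragment = Tuple[str, Optional[int]]


def assemble(fragments: Dict[int, Fragment], start: int) -> str:
    """Follow the next-pointers from `start` and join the fragments in that order."""
    parts = []
    seen = set()
    current: Optional[int] = start
    # Stop at the end of the chain, at a missing fragment, or if a pointer loops back.
    while current is not None and current in fragments and current not in seen:
        seen.add(current)
        text, next_number = fragments[current]
        parts.append(text)
        current = next_number
    return " ".join(parts)


if __name__ == "__main__":
    retrieved = {2: ("Chinese", None), 0: ("I", 1), 1: ("am", 2)}
    print(assemble(retrieved, start=0))
```

Because the order is carried by the identifiers rather than by storage position, the fragments can be fetched from the text database in any order.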
Fig. 2 is a schematic block diagram of the interaction flow of the metadata-management-based text dialogue semantic analysis provided by an embodiment of the present invention. For ease of description, the method of the present invention is further explained with reference to Fig. 3. The steps of a specific application embodiment of the method include:
Step H1: after opening a client or application on a mobile phone, the user inputs the relevant text information and sends a request to the terminal.
The request includes the user's identity information and the question information input by the user.
After the user inputs information through the application on the mobile phone, the application also saves the user's information and the information the user has input, which is to be stored in a database. At this point the application sends a request to the machine, and the content of the request contains the user information and the input information. In one specific implementation, the input information includes a user ID field, a user name field, a mobile phone number field, a title field, and a submission time field.
Step H2: the server terminal receives the request sent by the client and performs preliminary lexical analysis on the information input at the client.
When the server terminal receives the user input passed from the client, it forwards the data to the back-end server. While the data is being transmitted, the server performs preliminary preprocessing on the user's information, namely lexical analysis.
Specifically, the lexical analysis scans the information input by the user from left to right, identifies the various kinds of words according to the lexical rules of the language, and produces the attribute word for each word. That is, the character string input by the user is converted into a sequence of words (tokens). The identified words are then given a category and a fixed length.
By preprocessing the information input by the user, the words can then be classified. Take the input "I am Chinese" as an example. The computer does not know that this consists of words separated by spaces; it only knows that it is a character string made up of ordinary characters. The words can be split from the input string by some method (here, using the space as the separator). The result after splitting can be represented in XML as follows:
<sentence>
  <word>I</word>
  <word>am</word>
  <word>Chinese</word>
</sentence>
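A representation of this kind can be produced with Python's standard `xml.etree.ElementTree` module, as in the short sketch below. The function name `words_to_xml` is hypothetical; the element names `sentence` and `word` follow the example above.

```python
import xml.etree.ElementTree as ET


def words_to_xml(text: str) -> str:
    """Split the input on whitespace and wrap each word in a <word> element."""
    sentence = ET.Element("sentence")
    for word in text.split():
        ET.SubElement(sentence, "word").text = word
    # encoding="unicode" returns a str without an XML declaration.
    return ET.tostring(sentence, encoding="unicode")


if __name__ == "__main__":
    print(words_to_xml("I am Chinese"))
```

The string is emitted on a single line without indentation; the content is the same as in the example above.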
Step H3: perform syntactic analysis on the word sequence obtained in step H2, identify grammatical errors in the information, and filter them out.
Syntactic analysis is also a logical phase of the compilation process. Its task is to combine the word sequence, on the basis of lexical analysis, into various grammatical phrases, and then to judge whether the word sequence is structurally well-formed. The structure can be described by a context-free grammar.
Step H4: convert the words in the word sequence into metadata, perform semantic analysis on the metadata to obtain the semantic marker text corresponding to the information input by the user, and store the semantic marker text in the text database.
After lexical and syntactic analysis, the information is basically usable, but ambiguity and mismatched understanding still cannot be eliminated. At this point the data is classified and reorganized, converted into a metadata structure for storage, and then managed systematically, so that the data is processed as metadata. Semantic analysis is then performed to obtain the user's real purpose and intention. In other words, after the word sequence successively undergoes semantic expression, semantic organization, semantic storage, and disambiguation, it is converted into its corresponding metadata sequence.
The source program has by now passed through lexical analysis and syntactic analysis; the third phase is semantic analysis, which is the most substantial work of a compiler. The first two phases, lexical and syntactic analysis, both recognize and process the form of the source program, whereas semantic analysis interprets the meaning of the source program and brings about a qualitative change in it. Semantic analysis mainly involves syntax-directed translation, symbol tables, type checking, intermediate languages, and intermediate code generation. When the back-end server obtains the data passed from the front end, the machine performs semantic analysis on it. In the present invention, the semantic analysis operation packages this data into a metadata model. The semantic analysis module performs semantic similarity analysis and feature item weight calculation and extracts the keyword feature items of the user input, laying the foundation for text classification and text vectorization. The semantic analysis module contains an ontology and an entity dictionary. The ontology is used for semantic analysis of the text; its basic unit is the concept, concepts form concept trees, and concept trees form the ontology. Conceptualizing the text addresses the problems of one word having several meanings and several words sharing one meaning. The entity dictionary is used for entity extraction, discarding content in the text that carries no real meaning and reducing the computation required for subsequent text processing. Reasoning is performed through frame logic or description logic, data is collected from the information sources, and the schema information of each local database is stored in the metadata repository in a prescribed format. By analyzing the semantic relations between the metadata, a global ontology for the corresponding domain is established, the semantic marking of text documents is performed through semantic analysis, and the semantically marked text documents are stored in the labeled text document database.
Specifically, semantic similarity measures how similar two words are. It is mainly used in fields such as word sense disambiguation, information retrieval, information extraction, and machine translation. Because it is fairly subjective, semantic similarity cannot be analyzed in isolation from a specific application environment. There are currently two kinds of calculation methods in this field: one relies on a semantic dictionary and computes over the concepts of the words organized in a tree structure; the other uses the contextual information of the words and solves the problem statistically. Given the application scenario of the present invention, the algorithms used for calculating semantic similarity and feature item weights are existing, mature algorithms. The corpus-based word similarity analysis method uses the formula:
Sim(W1, W2) = a / (Dis(W1, W2) + a);
where Sim(W1, W2) is the similarity, Dis(W1, W2) is the distance between words W1 and W2, and a is an adjustable parameter equal to the word distance at which the similarity is 0.5. The feature item weight is calculated as w = tf × idf, where w is the weight of feature item t in document d, tf is the frequency with which t occurs in d, and idf is the inverse document frequency of t. The widely used word vector space model is adopted, which comprises the following steps: preprocessing -> text feature item selection -> weighting -> generating the vector space model, after which the cosine is calculated. The model selects a set of feature words in advance, then calculates the correlation between this set of feature words and each word to obtain a feature word vector for each word, and uses the similarity between these vectors as the similarity between the two words.
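The three calculations named above can be sketched in Python as follows. This is an illustrative sketch, not the patented implementation: the function names are hypothetical, tf is taken here as the relative frequency of a term in its document, and idf as log(N / df), which is one common variant that the original text does not specify.

```python
import math
from collections import Counter
from typing import Dict, List


def distance_similarity(distance: float, a: float) -> float:
    """Sim = a / (Dis + a); the result is 0.5 when the distance equals a."""
    return a / (distance + a)


def tf_idf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Weight w = tf * idf for every term of every tokenized document."""
    n = len(docs)
    # Document frequency: number of documents in which each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append(
            {t: (c / len(doc)) * math.log(n / df[t]) for t, c in counts.items()}
        )
    return weights


def cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Cosine of the angle between two sparse weight vectors."""
    dot = sum(value * v.get(term, 0.0) for term, value in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


if __name__ == "__main__":
    docs = [["quality", "management", "course"], ["quality", "speaker"]]
    vectors = tf_idf(docs)
    print(distance_similarity(2.0, a=2.0))
    print(cosine(vectors[0], vectors[1]))
```

With this idf variant a term that occurs in every document receives weight 0, so documents that share only such terms have cosine similarity 0.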
After the user data has been converted into metadata and semantically analyzed, the machine generates the corresponding correct answers from the data and stores them in the database as the information source for the output.
Step H5: after semantic analysis of the user data, the machine generates a knowledge base system for the application according to the corresponding standard, in which the features of each item of data are clearly identified. After the user inputs information, the knowledge database is searched and the matching data is returned as the response. In other words, the semantic analysis results are stored in a semantic knowledge base; after the user inputs information, the knowledge base is searched to obtain the matching knowledge, and the required analysis result is then found through semantic association.
Although the data has been converted into metadata and answers have been generated through semantic analysis based on the metadata structure, they still cannot be output to the user terminal immediately, because at this point the information is not yet coherent and remains isolated and scattered. The data must be processed further by establishing relationships between data items. Each item of metadata has a unique identifier, which contains the number identifier of the user input and the pointer identifier to the next metadata. Once the user inputs data, the question knowledge base is searched automatically for the corresponding answer texts, and the texts are combined to form the final result for the question input by the user. Only then does the machine feed the synthesized text back to the user as its response, thereby meeting the user's intent.
A second aspect of the embodiments of the present invention provides a text semantic analysis terminal. As shown in Fig. 3, the text semantic analysis terminal 10 comprises a processor 110, a memory 120, and a text semantic analysis program stored in the memory and executable on the processor, wherein the following steps are implemented when the text semantic analysis program is executed by the processor:
receiving text information input by a user, performing lexical analysis on the input text information, and separating the character string contained in the text information into independent words to obtain a word sequence;
performing syntactic analysis on the separated word sequence, determining whether the word sequence contains syntax errors, and filtering out any word, or phrase formed by adjacent words, that contains a syntax error;
converting the words contained in the word sequence into corresponding metadata, calculating the semantic similarity and feature item weights between the metadata, extracting the keyword feature items of the word sequence according to the calculated semantic similarity and feature item weights, obtaining the semantic marker text corresponding to each word according to the keyword feature items, and storing the semantic marker text in a text database;
matching the corresponding semantic marker texts from the text database one by one, in the order in which the words appear in the word sequence, and outputting and displaying the text information synthesized after sorting.
Further, when the text semantic analysis program is executed by the processor 110, the following step is also implemented:
using the space as a separator, separating the character string contained in the text information into independent words, and setting for each word a uniquely corresponding number identifier and a pointer identifier to the next metadata.
Preferably, when the text semantic analysis program is executed by the processor 110, the following steps are also implemented:
creating a metadata repository for storing metadata, and establishing an association between a word catalogue and the metadata contained in the metadata repository, wherein the catalogues contained in the metadata repository are organized into different levels according to metadata type, so that the corresponding metadata can be located more quickly through the catalogue;
in the step of converting the words contained in the word sequence into corresponding metadata, finding the metadata corresponding to each word through the association.
Preferably, when the text semantic analysis program is executed by the processor 110, the following step is also implemented:
calculating the semantic similarity and feature item weights between the metadata using a corpus-based word similarity analysis method and a word vector space model.
The memory 120, as a non-volatile computer-readable storage medium, can be used to store non-volatile software programs, non-volatile computer-executable programs, and modules. By running the non-volatile software programs, instructions, and modules stored in the memory 120, the processor 110 executes the various functional applications and data processing of the server, that is, implements the text semantic analysis method of the above method embodiments.
The memory 120 may include a program storage area and a data storage area, where the program storage area can store the operating system and the application programs required by at least one function, and the data storage area can store data created through use of the report automatic generation system, and the like. In addition, the memory 120 may include high-speed random access memory and may also include non-volatile memory, for example at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some embodiments, the memory 120 may optionally include memory located remotely from the processor 110, and such remote memory may be connected to the text semantic analysis terminal through a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The one or more modules are stored in the memory 120 and, when executed by the one or more processors 110, perform the text semantic analysis method of any of the above method embodiments.
The above product can perform the method provided by the embodiments of the present application, and has the functional modules and beneficial effects corresponding to performing the method. For technical details not described in detail in this embodiment, refer to the method provided by the embodiments of the present application.
A third aspect of the embodiments of the present invention provides a computer-readable storage medium on which a text semantic analysis program is stored, wherein the text semantic analysis method described above is implemented when the text semantic analysis program is executed by a processor.
From the above description of the embodiments, those of ordinary skill in the art can clearly understand that each embodiment can be implemented by software plus a general-purpose hardware platform, and of course also by hardware. Those of ordinary skill in the art can understand that all or part of the flows in the above method embodiments can be completed by a computer program instructing the relevant hardware. The program can be stored in a computer-readable storage medium and, when executed, may include the flows of the above method embodiments. The storage medium may be a magnetic disk, an optical disc, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), or the like.
In the present invention, when a user needs to obtain information resources, the user sends a corresponding instruction to the machine; the machine receives the instruction and saves the user's instruction information. In the present invention, data is stored in the form of metadata. Once the user's information resources are saved as metadata, the metadata can be suitably analyzed and identified, and the result is fed back to the user through the metadata architecture. When feeding back, information irrelevant to the user is discarded and only the information the user cares about is pushed, making it easier for the user to obtain the information fed back by the semantic analysis terminal and to understand and use it correctly.
It should be understood that those of ordinary skill in the art may make equivalent substitutions or changes according to the technical solution and inventive concept of the present invention, and all such changes or substitutions shall fall within the scope of protection of the appended claims of the present invention.