Embodiment
Hereinafter, embodiments of the invention are described in detail with reference to the accompanying drawings. It should be understood that the following embodiments are not intended to limit the present invention, and that not all combinations of the aspects described in the following embodiments are necessarily required for the means by which the present invention solves the problem. For brevity, identical structural parts or steps are denoted by identical reference signs or numerals, and their description is omitted.
[hardware configuration of inquiry answering device]
Fig. 1 is a diagram showing the hardware configuration of the inquiry answering device of the present invention. In the present embodiment, a smartphone is described as an example of the inquiry answering device. Note that although a smartphone is illustrated as the inquiry answering device 1000 in the present embodiment, the invention is clearly not limited to this. The inquiry answering device of the present invention may be any of various devices, such as a mobile terminal (smartphone, smart watch, smart band, or music player), a notebook computer, a tablet computer, a PDA (personal digital assistant), a facsimile device, a printer, or an Internet-connected device with an inquiry answering function (such as a digital camera, refrigerator, or television set).
First, the hardware configuration of the inquiry answering device 1000 (2000, 3000) is described with reference to the block diagram of Fig. 1. The following configuration is described as an example in this embodiment, but the inquiry answering device of the present invention is not limited to the configuration shown in Fig. 1.
The inquiry answering device 1000 includes an input interface 101, a CPU 102, a ROM 103, a RAM 105, a storage device 106, an output interface 104, a communication unit 107, a short-range wireless communication unit 108, and a display unit 109, which are connected to each other via a system bus. The input interface 101 is an interface for receiving data input and operation instructions from the user via an operating unit (not shown) such as a microphone, keys, buttons, or a touch screen. Note that the display unit 109 described later and the operating unit may be at least partly integrated; for example, screen output and reception of user operations may be performed on the same screen.
The CPU 102 is a system control unit and controls the inquiry answering device 1000 as a whole. For example, the CPU 102 performs display control of the display unit 109 of the inquiry answering device 1000. The ROM 103 stores fixed data such as data tables, as well as the control programs and operating system (OS) program executed by the CPU 102. In the present embodiment, each control program stored in the ROM 103 performs software execution control, such as scheduling, task switching, and interrupt handling, under the management of the OS stored in the ROM 103.
The RAM 105 is constructed from, for example, SRAM (static RAM) or DRAM that requires a backup power supply. In this case, the RAM 105 can store important data such as program control variables in a non-volatile manner. The RAM 105 also serves as the working memory and main memory of the CPU 102.
The storage device 106 stores pre-trained models (for example, a word error correction model, an entity model, a Rank model, and a semantic model), a database used for retrieval, an application program for executing the inquiry answering method according to the present invention, and the like. Note that the database may also be stored in an external device such as a server. The storage device 106 also stores various programs, such as an information transmission/reception control program for transmission to and reception from a communication device (not shown) via the communication unit 107, together with the various information these programs use. In addition, the storage device 106 stores the setting information and management data of the inquiry answering device 1000.
The output interface 104 is an interface for controlling the display unit 109 so that it displays information and application screens. The display unit 109 is constructed from, for example, an LCD (liquid crystal display). By arranging on the display unit 109 a soft keyboard with keys such as numeric input keys, a mode setting key, an enter key, a cancel key, and a power key, input from the user can be received via the display unit 109.
The inquiry answering device 1000 performs data communication with an external device (not shown) via the communication unit 107, using a wireless communication method such as Wi-Fi (Wireless Fidelity) or Bluetooth.
In addition, the inquiry answering device 1000 can also connect wirelessly to an external device or the like within a short range via the short-range wireless communication unit 108 and perform data communication. The short-range wireless communication unit 108 communicates using a communication method different from that of the communication unit 107. For example, Bluetooth Low Energy (BLE), whose communication range is shorter than that of the communication method of the communication unit 107, can be used as the communication method of the short-range wireless communication unit 108. NFC (near-field communication) or Wi-Fi Aware, for example, can also be used as the communication method of the short-range wireless communication unit 108.
[first embodiment]
[inquiry answering method according to the first embodiment]
The inquiry answering method according to the present invention can be realized by the CPU 102 of the inquiry answering device 1000 reading a control program stored in the ROM 103 or the storage device 106, or a control program downloaded via the communication unit 107 from a web server (not shown) connected to the inquiry answering device 1000 through a network.
Before the inquiry answering method according to the present invention is carried out, the models and the database must first be prepared. The specific flow is as follows:
(1) Crawling of related data: entity data and associated data such as labels are crawled. Here, entity data refers to entities in a certain field (such as the video field), for example the films "private savings of husbands" and "the Mi months pass", and a label is a word that describes the entity, such as "social forest" or "love".
(2) training of model:Word error correcting model:The mapping of the pinyin table and fuzzy phoneme of word is set up, passes through the instruction to language materialPractice, calculate the transition probability model between the probabilistic model of word and word;Physical model:Using language model to including solid dataLanguage material be trained, it is established that identification entity model;Rank models:Pass through ready data and feature extraction, trainingInto GBDT decision-tree model;Semantic model:By language model and training corpus, the model of semanteme can be extracted by being trained to.Sample needed for model above training process, can be crawled from public network.
(3) Index storage of the data: the field is modeled and, based on the existing data and models, the data is processed into structured data that supports retrieval and semantic understanding, and is then stored.
Next, the inquiry answering method according to the first embodiment of the present invention is described with reference to Figs. 2 to 4. Fig. 2 is a flowchart illustrating the inquiry answering method according to the first embodiment of the present invention; Fig. 3 is a flowchart illustrating the semantic processing step of the inquiry answering method according to the present invention; and Fig. 4 is a flowchart illustrating the ranking step of the inquiry answering method according to the present invention.
As shown in Fig. 2, first, in the semantic processing step S101, semantic processing is performed on the inquiry message (query) input by the user, to generate a user intent that reflects the purpose of the inquiry message and retrieval information used for retrieval according to the inquiry message. Preferably, as shown in Fig. 3, the semantic processing step S101 further comprises: a user intent recognition step S1011, in which user intent recognition is performed on the inquiry message to obtain the user intent corresponding to it; an entity recognition step S1012, in which entity data is identified from the inquiry message by the pre-trained entity model; and a semantic understanding step S1013, in which semantic understanding is performed on the inquiry message by the pre-trained semantic model to obtain the retrieval information. Here, the inquiry message is one of: a text message input by the user, for example via a keyboard; a text message generated by converting voice information input by the user, for example via a microphone; and a text message combining text input by the user with text converted from the user's voice information. For example, the user can input the inquiry message "I want to watch The Shawshank Redemption". In this case, the entity "The Shawshank Redemption" can be identified from the inquiry message by the entity recognition step, and retrieval information can be obtained by performing semantic understanding on the inquiry message in the semantic understanding step. The retrieval information takes the form of computer-readable slot-value pairs, for example "title=The Shawshank Redemption".
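The three sub-steps S1011 to S1013 can be sketched as follows. This is only a minimal illustration under stated assumptions: the keyword table and title lexicon are hypothetical stand-ins for the pre-trained models, and the names `SemanticResult` and `semantic_process` do not appear in the embodiment.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Hypothetical stand-ins for the pre-trained models: a tiny title lexicon
# replaces the entity model, and a keyword table replaces intent recognition.
KNOWN_TITLES = ["The Shawshank Redemption", "Calabash Brothers"]
INTENT_KEYWORDS = {"watch": "play_video", "see": "play_video"}


@dataclass
class SemanticResult:
    intent: str                     # user intent (step S1011)
    entity: Optional[str]           # recognized entity data (step S1012)
    slots: Dict[str, str] = field(default_factory=dict)  # retrieval info (S1013)


def semantic_process(query: str) -> SemanticResult:
    lowered = query.lower()
    words = lowered.split()
    # S1011: pick the first intent whose keyword occurs in the query.
    intent = next((v for k, v in INTENT_KEYWORDS.items() if k in words), "unknown")
    # S1012: pick the first known title contained in the query.
    entity = next((t for t in KNOWN_TITLES if t.lower() in lowered), None)
    # S1013: express the retrieval information as slot-value pairs.
    slots = {"title": entity} if entity else {}
    return SemanticResult(intent, entity, slots)
```

For the example inquiry above, the `slots` field would hold the single pair `title` / `The Shawshank Redemption`, which corresponds to the slot-value form "title=The Shawshank Redemption".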
Next, in the retrieval step S102, segmentation-based data retrieval is performed on the database based on the retrieval information, to obtain a list of candidate entity data. Here, the slot-value pairs obtained in the semantic understanding step S1013 are first converted into a retrievable statement (for example, "title=The Shawshank Redemption" is converted into "film title:The Shawshank Redemption"), and a retrieval request is then sent to the database, which returns a result list. During retrieval, the value in the retrieval information (such as "The Shawshank Redemption") can be segmented into words by a segmentation model prepared in advance. When the database is prepared, each piece of entity data in it is likewise segmented and stored in an inverted index, so that matching results can be found from the segmentation output. The advantage of segmentation-based retrieval is that, even if the user inputs an incorrect inquiry message because of a vague memory (for example "cucurbit baby brother"), the segmentation model splits the inquiry message into "cucurbit baby" and "brother", and the desired result (such as "Calabash Brothers") can still be retrieved from the database. Similarly, the user may input an incomplete inquiry message (such as "Xiao Shenke Redeem"); the segmentation model splits it into "Xiao Shenke" and "Redeem", and the desired result (such as "The Shawshank Redemption") can still be retrieved from the database.
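The following sketch illustrates why segmentation plus an inverted index tolerates inaccurate or incomplete titles. It is an assumption-laden illustration: `segment` is a hypothetical whitespace tokenizer standing in for the segmentation model, the database is a plain in-memory list, and candidates are ordered simply by the number of matched tokens.

```python
from collections import defaultdict
from typing import Dict, List, Set


def segment(text: str) -> List[str]:
    # Hypothetical stand-in for the segmentation model:
    # lowercase the text and split it on whitespace.
    return text.lower().split()


def build_inverted_index(entities: List[str]) -> Dict[str, Set[int]]:
    # Map each token to the set of entity positions that contain it.
    index: Dict[str, Set[int]] = defaultdict(set)
    for entity_id, title in enumerate(entities):
        for token in segment(title):
            index[token].add(entity_id)
    return index


def retrieve(index: Dict[str, Set[int]], entities: List[str],
             slots: Dict[str, str]) -> List[str]:
    # Count how many query tokens each entity matches.
    hits: Dict[int, int] = defaultdict(int)
    for token in segment(slots["title"]):
        for entity_id in index.get(token, ()):
            hits[entity_id] += 1
    # Most matched tokens first; ties keep database order.
    ordered = sorted(hits, key=lambda i: (-hits[i], i))
    return [entities[i] for i in ordered]
```

With a database containing "Calabash Brothers", a slot value such as "Calabash Baby Brothers" still matches on the tokens "calabash" and "brothers", even though the full string does not occur in the database.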
Next, in the ranking step S103, the candidate entity data is ranked based on the relevance between the candidate entity data and the user intent. Preferably, as shown in Fig. 4, the ranking step S103 further comprises: a relevance calculation step S1031, in which the relevance between the candidate entity data and the user intent is calculated according to a GBDT model; and a relevance ranking step S1032, in which the candidate entity data is ranked using the Rank model based on the calculated relevance. When the relevance between the candidate entity data and the user intent is calculated, feature values are first computed from the context state, the static information of the entity (such as label, name, and category), and the dynamic information of the entity (such as popularity, rating, and distance from the current time). All feature values are then passed through the GBDT model prepared in advance to compute the final relevance. The feature values of the static information are calculated from the degree of matching between that information and the inquiry message input by the user. This degree of matching can be determined by the edit distance of pinyin (including fuzzy sounds), the edit distance of characters, the semantic edit distance, and so on. The feature values of the dynamic information can be calculated by a given formula.
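As a rough sketch of the feature computation, the code below derives a static feature from the character-level edit distance and combines it with one dynamic feature. Several things here are assumptions rather than part of the embodiment: the pinyin and semantic distances are omitted, the `popularity` field is assumed to be pre-normalized to [0, 1], and a fixed weighted sum with made-up weights stands in for the pre-trained GBDT model.

```python
from typing import Dict, List


def edit_distance(a: str, b: str) -> int:
    # Classic Levenshtein distance computed row by row.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def match_degree(query: str, attribute: str) -> float:
    # Normalize the edit distance into a matching degree in [0, 1].
    longest = max(len(query), len(attribute))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(query, attribute) / longest


def relevance(query: str, candidate: Dict) -> float:
    # Hypothetical weights; a trained GBDT model would replace this sum.
    static_feature = match_degree(query, candidate["name"])
    dynamic_feature = candidate["popularity"]
    return 0.8 * static_feature + 0.2 * dynamic_feature


def rank(query: str, candidates: List[Dict]) -> List[Dict]:
    # Highest relevance first (relevance ranking step S1032).
    return sorted(candidates, key=lambda c: relevance(query, c), reverse=True)
```

The weighted sum is only a placeholder that keeps the example self-contained; it shows where the static and dynamic feature values enter the relevance calculation, not how the trained model combines them.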
Finally, in the first result determination step S104, the candidate entity data with the highest relevance in the list is determined as the response result for the user's inquiry message. Here, the candidate entity data with the highest relevance can be returned to the user as the optimal result through the display unit 109.
With the inquiry answering method according to the first embodiment of the present invention, segmentation-based data retrieval is performed on the database based on the retrieval information to obtain a list of candidate entity data, and the candidate entity data is ranked based on the relevance between the candidate entity data and the user intent. This yields the following technical effects: (a) even if the user inputs an incorrect or incomplete inquiry message because of a vague memory or an input error, a good result can still be retrieved; (b) the user can obtain retrieval results that are closer to the user's intent.
[software configuration of the inquiry answering device according to the first embodiment]
Fig. 5 is a block diagram illustrating the software configuration of the inquiry answering device according to the first embodiment. As shown in Fig. 5, the inquiry answering device 1000 includes a semantic processing unit 1101, a retrieval unit 1102, a ranking unit 1103, and a first result determining unit 1104.
Specifically, the semantic processing unit 1101 includes: a user intent recognition unit 11011, which performs user intent recognition on the inquiry message to obtain the user intent corresponding to it; an entity recognition unit 11012, which identifies entity data from the inquiry message by the pre-trained entity model; and a semantic understanding unit 11013, which performs semantic understanding on the inquiry message by the pre-trained semantic model to obtain the retrieval information. The retrieval unit 1102 performs segmentation-based data retrieval on the database based on the retrieval information to obtain a list of candidate entity data. The ranking unit 1103 includes: a relevance calculation unit 11031, which calculates the relevance between the candidate entity data and the user intent according to the GBDT model; and a relevance ranking unit 11032, which ranks the candidate entity data based on the calculated relevance. The first result determining unit 1104 determines the candidate entity data with the highest relevance in the list as the response result for the user's inquiry message.
[second embodiment]
[inquiry answering method according to the second embodiment]
The inquiry answering method according to the second embodiment of the present invention is described with reference to Fig. 6. Fig. 6 is a flowchart illustrating the inquiry answering method according to the second embodiment of the present invention.
As shown in Fig. 6, the inquiry answering method according to the second embodiment differs from the inquiry answering method according to the first embodiment in that a first judgment step S204, a second judgment step S205, and a second result determination step S206 are added.
Specifically, in the first judgment step S204, a first relevance between the inquiry message and the candidate entity data with the highest relevance in the list obtained in step S103 is calculated according to a similarity distance, and it is judged whether the first relevance is less than a first threshold. Here, the different attributes of the entity data correspond to the different slots of semantic understanding, and the relevance between an attribute and the user's inquiry message is determined by the similarity distance. The similarity distance includes the edit distance of pinyin (including fuzzy sounds), the edit distance of characters, the semantic edit distance, and so on; the edit distance of characters arises, for example, from characters of similar shape, homophones, and missing or extra characters. If the first relevance is less than the first threshold ("Yes" in step S204), the optimal result retrieved from the database differs greatly from the result the user expects. In this case, processing proceeds to the second result determination step S206, in which the entity data identified in the entity recognition step S1012 is determined as the response result and returned to the user, so that the user can obtain a good result even when the database does not contain the expected result. For example, if the user inputs the inquiry message "I want to watch dear Interpreter Officer" in step S101 and the database does not contain that film, the entity "dear Interpreter Officer" identified in the entity recognition step S1012 can be returned to the user.
On the other hand, if the first relevance is greater than or equal to the first threshold ("No" in step S204), processing proceeds to the second judgment step S205, in which it is judged whether the first relevance is greater than a second threshold. If the first relevance is greater than the second threshold ("Yes" in step S205), the optimal result retrieved from the database is consistent with the result the user expects, and processing proceeds to step S104, in which the optimal result is determined as the response result for the user's inquiry message. The user can thus obtain a satisfactory response result.
Conversely, if the first relevance is not greater than the second threshold ("No" in step S205), a difference still exists between the optimal result retrieved from the database and the result the user expects. In this case, processing proceeds to step S206, in which the entity data identified in the entity recognition step S1012 is determined as the response result and returned to the user.
Note that the first threshold and the second threshold above are determined in advance according to the performance of the models in training, validation, and testing, so as to ensure good performance in both recall and accuracy.
In addition, in the second judgment step S205 above, if the first relevance between the inquiry message and the candidate entity data with the highest relevance in the list is not greater than the second threshold, it can further be judged whether the relevance between the inquiry message and the candidate entity data with the second-highest relevance in the list is greater than the second threshold; if the judgment is "Yes", processing proceeds to step S104. This avoids missing the optimal response result because of a ranking error in step S103. Provided that processing speed is not appreciably affected, the relevance between the inquiry message and each of the top N (for example, N=3) candidate entity data in the list can be calculated in turn in step S205.
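The two-threshold decision of steps S204 to S206, including the optional top-N check, can be summarized in a short sketch. The function name, its parameters, and the handling of an empty candidate list are assumptions made for illustration; `similarity` is any caller-supplied function that returns the relevance between a candidate and the inquiry message.

```python
from typing import Callable, List


def choose_response(ranked: List[str],
                    recognized_entity: str,
                    similarity: Callable[[str], float],
                    first_threshold: float,
                    second_threshold: float,
                    top_n: int = 1) -> str:
    # Assumption: with no candidates at all, fall back to the recognized entity.
    if not ranked:
        return recognized_entity
    # S204: the best candidate is too far from the inquiry message.
    if similarity(ranked[0]) < first_threshold:
        return recognized_entity              # S206
    # S205: accept the first of the top-N candidates above the second threshold.
    for candidate in ranked[:top_n]:
        if similarity(candidate) > second_threshold:
            return candidate                  # S104
    return recognized_entity                  # S206
```

With `top_n=1` this reproduces the basic flow of Fig. 6; a larger `top_n` corresponds to the variant in which the candidates ranked second and later are also compared against the second threshold.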
With the inquiry answering method according to the second embodiment of the present invention, the response result returned to the user is determined by calculating the relevance between the optimal result and the inquiry message. This yields the following technical effect: the user can obtain a good result even when the database does not contain the expected result.
[software configuration of the inquiry answering device according to the second embodiment]
Fig. 7 is a block diagram illustrating the software configuration of the inquiry answering device according to the second embodiment. As shown in Fig. 7, the inquiry answering device 2000 according to the second embodiment differs from the inquiry answering device 1000 according to the first embodiment in that a first judging unit 1204, a second judging unit 1205, and a second result determining unit 1206 are added.
Specifically, the first judging unit calculates, according to the similarity distance, the first relevance between the inquiry message and the candidate entity data with the highest relevance in the list, and judges whether the first relevance is less than the first threshold. When the first judging unit judges that the first relevance is less than the first threshold, the second result determining unit determines the entity data identified by the entity recognition unit as the response result. The second judging unit judges whether the first relevance is greater than the second threshold. When the second judging unit judges that the first relevance is greater than the second threshold, the first result determining unit determines the candidate entity data with the highest relevance as the response result. The similarity distance includes at least one of the edit distance of pinyin, the edit distance of characters, and the semantic edit distance.
[preferred embodiment]
[inquiry answering method according to the preferred embodiment]
The inquiry answering method according to the preferred embodiment of the present invention is described with reference to Fig. 8. Fig. 8 is a flowchart illustrating the inquiry answering method according to the preferred embodiment of the present invention.
As shown in Fig. 8, the inquiry answering method according to the preferred embodiment differs from the inquiry answering method according to the first embodiment in that a preprocessing and error correction step S301 is added.
Specifically, in the preprocessing and error correction step S301, the inquiry message is preprocessed, and error correction is then performed on the preprocessed inquiry message by the pre-trained word error correction model. Here, the preprocessing includes deleting stop words and colloquial words contained in the inquiry message, converting the case of letters and numerals contained in the inquiry message, and the like. For example, when the inquiry message input by the user contains colloquial words, these words need to be removed before the semantic processing step S101 is carried out. For example, when the inquiry message input by the user is "I want to watch dear diplomat", the colloquial phrase "I want to watch" can first be deleted by preprocessing. The pre-trained word error correction model then corrects the preprocessed inquiry message "dear diplomat" to "dear Interpreter Officer". Subsequent processing is then performed on the preprocessed and corrected inquiry message. The user may also input an incorrect inquiry message because of a pronunciation error. For example, when the inquiry message input by the user is "Xiao Shengke's redeems", it can be corrected to "The Shawshank Redemption" by the fuzzy-sound correction in the word error correction process.
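A minimal sketch of step S301 follows. It is not the word error correction model of the embodiment, which relies on pinyin and fuzzy-sound mappings and on trained probability models; here the standard-library `difflib.get_close_matches` is swapped in as a simple string-similarity corrector, and the phrase list and lexicon are hypothetical.

```python
import difflib
from typing import List

# Hypothetical colloquial phrases and title lexicon, for illustration only.
COLLOQUIAL_PHRASES = ["i want to watch", "i want to see", "please play"]
LEXICON = ["the shawshank redemption", "calabash brothers"]


def preprocess(query: str) -> str:
    # Normalize case, drop colloquial phrases, and collapse whitespace.
    text = query.lower().strip()
    for phrase in COLLOQUIAL_PHRASES:
        text = text.replace(phrase, " ")
    return " ".join(text.split())


def correct(text: str, lexicon: List[str] = LEXICON, cutoff: float = 0.6) -> str:
    # Replace the text with its closest lexicon entry, if one is close enough;
    # otherwise leave the text unchanged.
    matches = difflib.get_close_matches(text, lexicon, n=1, cutoff=cutoff)
    return matches[0] if matches else text
```

Running `correct(preprocess(query))` before the semantic processing step mirrors the order described above: colloquial words are removed first, and the remaining text is then corrected.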
The inquiry answering method according to the preferred embodiment performs preprocessing and error correction before semantic processing, so that the inquiry message input by the user can be corrected and the accuracy of subsequent retrieval is improved.
[software configuration of the inquiry answering device according to the preferred embodiment]
Fig. 9 is a block diagram illustrating the software configuration of the inquiry answering device according to the preferred embodiment. As shown in Fig. 9, the inquiry answering device 3000 according to the preferred embodiment differs from the inquiry answering device 1000 according to the first embodiment in that a preprocessing and error correction unit 1301 is added.
Specifically, the preprocessing and error correction unit 1301 preprocesses the inquiry message and performs error correction on the preprocessed inquiry message by the pre-trained word error correction model.
In addition, the present invention also provides an inquiry response system based on semantic understanding. Fig. 10 is a schematic diagram illustrating the inquiry response system of the present invention. As shown in Fig. 10, the inquiry response system 100 includes a user terminal 1001 and a server 1002. The user terminal 1001 is connected to the server 1002 via a network 1003, which may be a wired network or a wireless network.
The user terminal 1001 includes an input receiving unit 10011, a semantic processing unit 10012, and a transmitting unit 10013. The server 1002 includes a receiving unit 10021, a retrieval unit 10022, a ranking unit 10023, and a result determining unit 10024.
Specifically, in the user terminal 1001, the input receiving unit 10011 receives the inquiry message input by the user; the semantic processing unit 10012 performs semantic processing on the inquiry message to generate a user intent that reflects the purpose of the inquiry message and retrieval information used for retrieval according to the inquiry message; and the transmitting unit 10013 sends the inquiry message, together with its associated user intent and retrieval information, to the server, and receives the response result for the inquiry message from the server.
In the server 1002, the receiving unit 10021 receives from the user terminal the inquiry message and the user intent and retrieval information associated with it; the retrieval unit 10022 performs segmentation-based data retrieval on the database based on the retrieval information to obtain a list of candidate entity data; the ranking unit 10023 ranks the candidate entity data based on the relevance between the candidate entity data and the user intent; and the result determining unit 10024 determines the candidate entity data with the highest relevance in the list as the response result for the user's inquiry message and sends the response result to the user terminal.
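One possible way to keep the inquiry message, user intent, and retrieval information associated while they travel from the user terminal to the server is a single serialized message, sketched below. The embodiment does not specify a wire format; the JSON encoding, the field names, and the `InquiryRequest` type are assumptions.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict


@dataclass
class InquiryRequest:
    inquiry: str                    # inquiry message input by the user
    intent: str                     # user intent from semantic processing
    retrieval_info: Dict[str, str]  # slot-value pairs used for retrieval


def encode_request(request: InquiryRequest) -> str:
    # Terminal side: bundle the three associated items into one JSON string.
    return json.dumps(asdict(request), ensure_ascii=False)


def decode_request(payload: str) -> InquiryRequest:
    # Server side: restore the associated items from the received string.
    return InquiryRequest(**json.loads(payload))
```

Bundling the three items in one message keeps them associated by construction, so the server does not need a separate key to match an intent to its inquiry message.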
Although the present invention has been described above with reference to exemplary embodiments, the above embodiments merely illustrate the technical concept and features of the present invention and are not intended to limit its scope. Any equivalent variation or modification made in accordance with the spirit of the present invention shall fall within the scope of protection of the present invention.