Method, system and storage medium for automatically constructing a medical aesthetics knowledge graph
Technical field
The present invention relates to the field of medical aesthetics, and more particularly to a method and system for automatically constructing a medical aesthetics knowledge graph, and to a computer-readable storage medium.
Background art
Knowledge graphs are now widely used in artificial intelligence, especially in dialogue systems. Their knowledge representation and reasoning capabilities can greatly improve the question-answering performance of a dialogue system. In terms of coverage, knowledge graphs fall broadly into general-purpose knowledge graphs and domain knowledge graphs. Most current research focuses on the general domain, for example Google's Knowledge Vault, Facebook's Social Graph and Baidu's knowledge graph. General-purpose knowledge graphs are mainly used in Internet business scenarios such as search, recommendation and question answering; because they emphasize breadth, they rarely go deep into a specific domain. Domain knowledge graphs, by contrast, target a specific industry and therefore have a certain depth and completeness. Very few domain knowledge graphs have been published to date, and for the medical aesthetics industry in particular there are almost none.
At present, the real data of every cosmetic surgery institution is unstructured text. Although cosmetic procedures are broadly similar across institutions, knowledge such as procedure package terms, prices, promotions, consultation scripts, techniques, instruments, materials and experts differs greatly. How to quickly construct knowledge from raw dialogue data so as to meet the customized needs of different hospitals is a problem that urgently needs to be solved.
Summary of the invention
In view of this, the present invention automatically extracts knowledge from real medical aesthetics customer-service dialogues and proposes a method, system and storage medium for automatically constructing a medical aesthetics knowledge graph.
To achieve the above purpose, the present invention adopts the following technical solution: a method for automatically constructing a medical aesthetics knowledge graph, comprising the following steps:
(1) Ontology construction step: construct the ontology of the medical aesthetics knowledge graph;
Ontology construction for the medical aesthetics knowledge graph comprises defining the ontology and the data schema layer of the knowledge graph;
Defining the ontology and the data schema layer comprises: defining the classes in the ontology, defining the relationships between the classes, and defining attribute slots;
A class represents a set of objects. The classes in the ontology include any one or more of: cosmetic procedure, technique, instrument, material, symptom, expert;
The relationships between classes in the ontology of the medical aesthetics knowledge graph are divided into object properties and data properties;
Object properties express relationships between entities, including between concept entities;
Data properties express relationships between an entity and its attributes;
The object properties include any one or more of: procedure-technique, procedure-instrument, procedure-material; where procedure-technique links a given procedure to a technique it uses, procedure-instrument links a given procedure to an instrument it uses, and procedure-material links a given procedure to a material it uses;
The data properties include any one or more of: name, expert information overview, introduction, advantages, price, discount;
Defining attribute slots means specifying the attribute values that may be filled into each slot, and filling in attribute values for each instance;
The schema layer of the data of the medical aesthetics knowledge graph is thereby defined;
(2) entity extraction step:
Entity extraction automatically identifies and extracts named entities from the raw medical aesthetics dialogue corpus;
The entities include any one or more of: cosmetic procedure, technique, instrument, material, symptom, price, expert, discount;
(3) attribute extraction step:
Attribute extraction extracts attributes for the entities obtained by entity extraction and builds an attribute list for each entity. It comprises two processes: object property value extraction and data property value extraction;
The object property value extraction process uses a named entity recognition algorithm to extract the entities in a dialogue;
The data property value extraction process uses an intent recognition algorithm to identify the intent of each utterance in the dialogue, maps each intent one-to-one to a data property, and then associates entities with intents, thereby assigning data property values to the entities.
Further, in the object property value extraction process, if a dialogue segment contains only one cosmetic procedure together with a technique, instrument or material, the corresponding object property values can be determined directly: (cosmetic procedure, procedure-technique, technique), (cosmetic procedure, procedure-instrument, instrument), (cosmetic procedure, procedure-material, material). If a dialogue segment contains several cosmetic procedures together with techniques, instruments and materials, the object property values are determined according to the nearest-position principle.
Further, Bi-LSTM+CRF is used to automatically identify entities in the raw medical aesthetics dialogue corpus.
Further, the XGBoost intent recognition algorithm is used to identify the intent of each utterance in the dialogue.
Further, the intents include any one or more of: expert information overview, price, discount, promotions, asking about symptoms, asking about medical history, introduction, advantages, treatment method, device information, whether it is painful, recovery period, duration of effect, preoperative precautions, postoperative care, consultation process, introduction of a specific expert, technique introduction, instrument introduction, material introduction.
Further, in the data property value extraction process, if multiple entities and data property values appear at the same time, the data property values are assigned according to the nearest-position principle: when multiple entities appear in the context of one data property value, the entity closest in position to that value is selected as its counterpart.
Further, in an implementation of the present invention, the method further comprises an entity alignment step.
The entity alignment step uses entity mapping: the standard entity names are determined first, and the different expressions of the same entity are then mapped to the standard name by a word similarity algorithm.
A word2vec model is trained on real medical aesthetics dialogue data; the similarity between each entity obtained by entity recognition and each standard entity is computed, and the standard entity with the highest similarity is selected as the final mapped entity.
Further, another embodiment of the present invention provides a system for automatically constructing a medical aesthetics knowledge graph, comprising a memory, a processor, and computer instructions stored in the memory and executable on the processor; when the computer instructions are executed by the processor, the steps of the above method for automatically constructing a medical aesthetics knowledge graph are performed.
Further, another embodiment of the present invention provides a computer-readable storage medium storing a computer program; when the computer program is executed by a processor, the steps of the above method for automatically constructing a medical aesthetics knowledge graph are performed.
Compared with the prior art, the method, system and storage medium disclosed by the present invention start directly from raw dialogue data and rapidly construct a cosmetic surgery domain knowledge graph, offering broad knowledge coverage while still meeting individual needs. Using the ontology construction, entity extraction, attribute extraction and entity alignment methods described above, a cosmetic surgery domain knowledge graph is built from real medical aesthetics customer-service dialogue data. Knowledge that does not conform to the ontology is corrected manually, and knowledge containing common errors such as ungrammatical sentences and typos is cleaned manually, yielding a high-quality knowledge graph. On this basis, a question-answering system converts a user's question into a structured query through natural language understanding, queries the knowledge graph and obtains an accurate answer, improving the question-answering performance of the dialogue system.
Brief description of the drawings
The drawings described here provide a further understanding of the invention and constitute a part of it; the illustrative embodiments of the invention and their description serve to explain the invention and do not unduly limit it. In the drawings:
Fig. 1 is a flow chart of the steps of the method for automatically constructing a medical aesthetics knowledge graph according to the invention;
Fig. 2 is a schematic diagram of the Bi-LSTM layer.
Detailed description of the embodiments
To make the technical problems to be solved, the technical solutions and the advantages clearer, the present invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here only serve to explain the invention and are not intended to limit it.
In natural language processing, research on knowledge graphs focuses mainly on two aspects: information extraction, which extracts structured knowledge from unstructured text to construct a knowledge graph; and semantic parsing, which converts a user's natural-language question into a structured query over the knowledge graph. The present invention focuses on the first aspect, constructing a cosmetic surgery domain knowledge graph that can then serve knowledge queries in dialogue systems for the cosmetic surgery industry and improve their question-answering performance. In an actual dialogue system, a visitor's question must be semantically parsed to identify the intent and the entities it contains; these are then assembled into a knowledge graph query statement, forming a structured query against the knowledge graph that retrieves the relevant knowledge.
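To make the assembly step concrete, the sketch below turns a recognized (entity, intent) pair into a structured query. It assumes, purely for illustration, that the knowledge graph lives in a graph database queried with Cypher, that procedures are nodes labelled `:Item` with a `name` property, and that intents map to the data properties of the ontology; `QUERY_PROPERTY` and `build_query` are hypothetical names.

```python
# Hypothetical mapping from a recognized intent to a data property of :Item nodes.
QUERY_PROPERTY = {
    "price": "price",
    "discount": "discount",
    "introduction": "introduce",
    "advantage": "advantage",
}


def build_query(entity_name: str, intent: str) -> tuple[str, dict]:
    """Turn a parsed (entity, intent) pair into a parameterized Cypher query."""
    prop = QUERY_PROPERTY.get(intent)
    if prop is None:
        raise ValueError(f"no data property mapped for intent {intent!r}")
    # The property name comes from the fixed whitelist above and is formatted in;
    # the user-supplied entity name is passed separately as the $name parameter.
    query = f"MATCH (i:Item {{name: $name}}) RETURN i.{prop} AS answer"
    return query, {"name": entity_name}


query, params = build_query("double eyelid surgery", "price")
print(query)   # MATCH (i:Item {name: $name}) RETURN i.price AS answer
print(params)  # {'name': 'double eyelid surgery'}
```

Keeping the entity name as a query parameter rather than pasting it into the query text avoids query injection from visitor input.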
There are two main ways to construct a knowledge graph: top-down and bottom-up. Top-down construction first defines the ontology and data schema of the knowledge graph and then adds entities to the knowledge base. The present invention constructs the medical aesthetics domain knowledge graph top-down, that is, it first defines the ontology and the data schema layer for the medical aesthetics domain knowledge graph and then adds entities to the knowledge base. Specific embodiments are described below.
Embodiment 1
As shown in Fig. 1, a method for automatically constructing a medical aesthetics knowledge graph comprises the following steps:
(1) Ontology construction step: construct the ontology of the medical aesthetics knowledge graph;
Ontology construction for the medical aesthetics knowledge graph comprises defining the ontology and the data schema layer of the knowledge graph;
Defining the ontology and the data schema layer comprises: defining the classes in the ontology, defining the relationships between the classes, and defining attribute slots;
A class represents a set of objects (instances). In embodiments of the present invention, preferably, the classes in the ontology include any one or more of:
cosmetic procedure (Item), technique (Technology), instrument (Instrument), material (Material), symptom (Symptom), expert (Expert).
The relationships between classes in the ontology of the medical aesthetics knowledge graph are divided into object properties and data properties;
Object properties express relationships between entities, including between concept entities;
Data properties express relationships between an entity and its attributes;
The object properties include any one or more of: procedure-technique (HASTECHNOLOGY), procedure-instrument (HASINSTRUMENT), procedure-material (HASMATERIAL);
Procedure-technique (HASTECHNOLOGY) links a given procedure to a technique it uses; in triple form it is (procedure (Item), procedure-technique (HASTECHNOLOGY), technique (Technology));
Procedure-instrument (HASINSTRUMENT) links a given procedure to an instrument it uses; in triple form it is (procedure (Item), procedure-instrument (HASINSTRUMENT), instrument (Instrument));
Procedure-material (HASMATERIAL) links a given procedure to a material it uses; in triple form it is (procedure (Item), procedure-material (HASMATERIAL), material (Material)).
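The classes and object properties above can be sketched as a small schema check. This is a minimal illustration, not the implementation described in this document: it assumes each object property has the (domain, range) implied by its triple form and rejects triples that violate it. `Entity` and `make_triple` are hypothetical names, and the example entity names are illustrative.

```python
from dataclasses import dataclass

# Ontology classes, named as in the embodiment.
CLASSES = {"Item", "Technology", "Instrument", "Material", "Symptom", "Expert"}

# Object properties and the (domain, range) classes implied by their triple forms.
OBJECT_PROPERTIES = {
    "HASTECHNOLOGY": ("Item", "Technology"),
    "HASINSTRUMENT": ("Item", "Instrument"),
    "HASMATERIAL": ("Item", "Material"),
}


@dataclass(frozen=True)
class Entity:
    name: str
    cls: str

    def __post_init__(self):
        # Every entity must be an instance of one of the ontology classes.
        if self.cls not in CLASSES:
            raise ValueError(f"unknown ontology class {self.cls!r}")


def make_triple(subj: Entity, prop: str, obj: Entity) -> tuple[str, str, str]:
    """Build a (subject, predicate, object) triple, rejecting ones that break the schema."""
    if prop not in OBJECT_PROPERTIES:
        raise ValueError(f"unknown object property {prop!r}")
    domain, range_ = OBJECT_PROPERTIES[prop]
    if (subj.cls, obj.cls) != (domain, range_):
        raise ValueError(f"{prop} expects ({domain}, {range_}), got ({subj.cls}, {obj.cls})")
    return (subj.name, prop, obj.name)


eyelid = Entity("double eyelid surgery", "Item")
suture = Entity("buried suture", "Technology")
print(make_triple(eyelid, "HASTECHNOLOGY", suture))
# ('double eyelid surgery', 'HASTECHNOLOGY', 'buried suture')
```

Checking domain and range at insertion time is one way to catch knowledge that "does not conform to the ontology" before it reaches manual review.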
The data properties include any one or more of:
name (name), expert information overview (expert_info), introduction (introduce), advantages (advantage), price (price), discount (discount).
Defining attribute slots means specifying the attribute values that may be filled into each slot, and filling in attribute values for each instance;
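Attribute slots can be pictured as per-class sets of allowed data properties that are filled in for each instance. In this sketch, which slots belong to the `Item` and `Expert` classes is an assumption (the text lists the data properties without tying them to classes), and `SLOTS` and `fill_slot` are hypothetical names.

```python
# Data-property slots per class. The class-to-slot assignment is an
# illustrative assumption; the embodiment only lists the data properties.
SLOTS = {
    "Item": {"name", "introduce", "advantage", "price", "discount"},
    "Expert": {"name", "expert_info"},
}


def fill_slot(instance: dict, cls: str, slot: str, value: str) -> None:
    """Fill one attribute slot of an instance, refusing slots its class does not define."""
    if slot not in SLOTS.get(cls, set()):
        raise KeyError(f"class {cls!r} has no slot {slot!r}")
    instance[slot] = value


eyelid = {}
fill_slot(eyelid, "Item", "name", "double eyelid surgery")
fill_slot(eyelid, "Item", "introduce", "Buried suture, 3-point, 6-point, incision and other methods.")
print(eyelid)
```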
(2) entity extraction step:
Entity extraction automatically identifies and extracts named entities from the raw medical aesthetics dialogue corpus;
The entities include any one or more of: cosmetic procedure, technique, instrument, material, symptom, price, expert, discount;
For example, entities such as cosmetic procedures, techniques, instruments and materials are extracted.
(3) attribute extraction step:
Attribute extraction extracts attributes for the entities obtained by entity extraction and builds an attribute list for each entity. It comprises two processes: object property value extraction and data property value extraction;
The object property value extraction process uses a named entity recognition algorithm to extract the entities in a dialogue;
The data property value extraction process uses an intent recognition algorithm to identify the intent of each utterance in the dialogue, maps each intent one-to-one to a data property, and then associates entities with intents, thereby assigning data property values to the entities.
In the embodiment of the present invention, optionally, the named entity recognition algorithm is the Bi-LSTM+CRF algorithm. For example, the attributes of a cosmetic procedure include the procedure name, procedure introduction, procedure advantages, procedure price and procedure discount; attribute value extraction then attaches values to these attributes of the entity.
If a dialogue segment contains only one cosmetic procedure together with a technique, instrument or material, the corresponding object property values can be determined directly: (cosmetic procedure, procedure-technique, technique), (cosmetic procedure, procedure-instrument, instrument), (cosmetic procedure, procedure-material, material).
If a dialogue segment contains several cosmetic procedures together with techniques, instruments and materials, the object property values are determined according to the nearest-position principle.
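A minimal sketch of the nearest-position principle for object properties follows. It assumes NER output arrives as (character offset, text, class) tuples for one dialogue segment and measures distance by character offset; the function name, the tie-break (the earlier procedure wins) and the second procedure and material in the example are illustrative assumptions.

```python
# Object property linking an Item to each other class.
PROPERTY_FOR_CLASS = {
    "Technology": "HASTECHNOLOGY",
    "Instrument": "HASINSTRUMENT",
    "Material": "HASMATERIAL",
}


def object_triples(mentions: list[tuple[int, str, str]]) -> list[tuple[str, str, str]]:
    """Link each technique/instrument/material mention to the nearest procedure.

    With a single procedure in the segment every mention links to it (the simple
    case); with several, the procedure at the smallest offset distance is chosen.
    """
    items = [(pos, text) for pos, text, cls in mentions if cls == "Item"]
    triples = []
    if not items:
        return triples
    for pos, text, cls in mentions:
        prop = PROPERTY_FOR_CLASS.get(cls)
        if prop is None:
            continue  # procedures, symptoms, etc. are not objects of these properties
        # min() returns the first of equally near items, i.e. the earlier procedure.
        _, item_text = min(items, key=lambda it: abs(it[0] - pos))
        triples.append((item_text, prop, text))
    return triples


segment = [
    (0, "double eyelid surgery", "Item"),
    (30, "buried suture", "Technology"),
    (80, "nose augmentation", "Item"),
    (110, "silicone implant", "Material"),
]
print(object_triples(segment))
```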
LSTM (Long Short-Term Memory) is a long short-term memory network, a type of recurrent neural network over time that is suited to processing and predicting important events separated by relatively long intervals and delays in a time series.
The CRF algorithm (Conditional Random Field) combines characteristics of the maximum entropy model and the hidden Markov model; it is a Markov random field over random variables Y conditioned on given random variables X. In the conditional probability model P(Y|X), Y is the output variable, representing the label sequence, and X is the input variable, representing the observation sequence. During training, the conditional probability model is obtained from the training data by maximum likelihood estimation, and the model is then used for prediction.
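For the linear-chain CRF placed on top of a Bi-LSTM, P(Y|X) is commonly written as below. Here n is the sentence length, E_{i,t} is the emission score of tag t at position i (produced by the Bi-LSTM), A_{p,t} is the learned score for moving from tag p to tag t, and y_0 is a special start tag; these symbols are introduced for illustration and are not defined in the original text. Maximum likelihood training maximizes log P(Y|X) over the labelled sentences, and prediction selects the highest-scoring sequence, typically with the Viterbi algorithm.

```latex
s(X, Y) = \sum_{i=1}^{n} \left( A_{y_{i-1},\, y_i} + E_{i,\, y_i} \right), \qquad y_0 = \text{START}

P(Y \mid X) = \frac{\exp\big(s(X, Y)\big)}{\sum_{Y'} \exp\big(s(X, Y')\big)}, \qquad
\hat{Y} = \arg\max_{Y'} \, s(X, Y')
```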
The Bi-LSTM+CRF architecture consists of an embedding layer, a bidirectional LSTM layer and a CRF layer:
Embedding layer: the vectors formed from the character embeddings or word embeddings of a sentence. The character embeddings are randomly initialized, while the word embeddings are obtained by training on data.
Bi-LSTM layer, as shown in Fig. 2: the embeddings serve as the input to the Bi-LSTM model, which can fully extract the contextual features of each character and so improves model accuracy.
CRF layer: connected to a linear projection of the Bi-LSTM output. The Bi-LSTM handles sequence feature extraction, while the CRF makes effective use of sentence-level label information. Under the Bi-LSTM+CRF model, the output is no longer a set of mutually independent labels but the optimal label sequence (the named entity recognition result).
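The decoding step of the CRF layer is usually performed with the Viterbi algorithm, which finds the label sequence maximizing the sum of emission and transition scores. The sketch below is a plain-Python illustration, not a trained model: the tag set (`O`, `B-ITEM`, `I-ITEM`), the emission scores standing in for Bi-LSTM outputs, and the transition scores standing in for learned CRF parameters are all made-up values. A per-token argmax would tag "double" as `I-ITEM` directly after `O`; the large negative O-to-I transition makes the decoder choose `B-ITEM` instead, which is the sentence-level label information the CRF layer contributes.

```python
def viterbi(emissions, transitions, start):
    """Return the highest-scoring tag-index sequence.

    emissions[i][t]:   score of tag t at position i (e.g. from the Bi-LSTM layer)
    transitions[p][t]: score of moving from tag p to tag t (learned by the CRF)
    start[t]:          score of starting the sequence with tag t
    """
    n_tags = len(start)
    score = [start[t] + emissions[0][t] for t in range(n_tags)]
    backptrs = []
    for emit in emissions[1:]:
        prev_best, new_score = [], []
        for t in range(n_tags):
            best_p = max(range(n_tags), key=lambda p: score[p] + transitions[p][t])
            prev_best.append(best_p)
            new_score.append(score[best_p] + transitions[best_p][t] + emit[t])
        backptrs.append(prev_best)
        score = new_score
    # Follow the back-pointers from the best final tag.
    path = [max(range(n_tags), key=lambda t: score[t])]
    for prev_best in reversed(backptrs):
        path.append(prev_best[path[-1]])
    path.reverse()
    return path


def bio_spans(tokens, tags):
    """Group B-/I- tags into (entity_text, entity_type) spans; O closes any open span."""
    spans, current, current_type = [], [], None
    for token, tag in zip(tokens, tags):
        starts_new = tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != current_type)
        if starts_new:
            if current:
                spans.append((" ".join(current), current_type))
            current, current_type = [token], tag[2:]
        elif tag.startswith("I-"):
            current.append(token)
        else:  # "O"
            if current:
                spans.append((" ".join(current), current_type))
            current, current_type = [], None
    if current:
        spans.append((" ".join(current), current_type))
    return spans


TAGS = ["O", "B-ITEM", "I-ITEM"]
tokens = ["I", "want", "double", "eyelid", "surgery"]
emissions = [[2.0, 0.0, 0.0], [2.0, 0.0, 0.0], [0.0, 1.0, 1.5], [0.0, 0.0, 2.0], [0.0, 0.0, 2.0]]
transitions = [[0.0, 0.0, -10.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]  # O -> I-ITEM is penalized
start = [0.0, 0.0, -10.0]  # a sentence may not start with I-ITEM

tags = [TAGS[i] for i in viterbi(emissions, transitions, start)]
print(tags)                     # ['O', 'O', 'B-ITEM', 'I-ITEM', 'I-ITEM']
print(bio_spans(tokens, tags))  # [('double eyelid surgery', 'ITEM')]
```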
The data property value extraction process uses the intent recognition algorithm to identify the intent of each utterance in the dialogue, maps each intent one-to-one to a data property, and then associates entities with intents, thereby assigning data property values to the entities. For example, suppose the named entity recognition algorithm identifies the procedure entity "double eyelid surgery", and the intent recognition algorithm identifies the following utterance as carrying the procedure-introduction intent: "There are several methods for double eyelids, such as buried suture, 3-point, 6-point and incision. We have dedicated aesthetic designers here who can design the double eyelids you want, or that suit your eye shape, based on your facial proportions, the basic condition of your eyes and the effect you hope to achieve." The value of the attribute "procedure introduction" of the entity "double eyelid surgery" is then determined to be this utterance.
Further, if multiple entities and data property values appear at the same time, the data property values are assigned according to the nearest-position principle: when multiple entities appear in the context of one data property value, the entity closest in position to that value is selected as its counterpart.
The intents include any one or more of: expert information overview, price, discount, promotions, asking about symptoms, asking about medical history, introduction, advantages, treatment method, device information, whether it is painful, recovery period, duration of effect, preoperative precautions, postoperative care, consultation process, introduction of a specific expert, technique introduction, instrument introduction, material introduction.
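The intent-to-property mapping and the nearest-position rule for data properties can be sketched together. All assumptions here are illustrative: utterances and entities are located by dialogue turn index, only some intents map to data properties (others, such as asking about symptoms, are skipped), and the mapping table, function name and example utterances are hypothetical.

```python
# Hypothetical one-to-one mapping from intent labels to Item data properties.
INTENT_TO_PROPERTY = {
    "introduction": "introduce",
    "advantage": "advantage",
    "price": "price",
    "discount": "discount",
}


def assign_data_properties(utterances, entities):
    """Attach each intent-labelled utterance to the nearest entity as a property value.

    utterances: list of (turn_index, intent, text) from the intent classifier
    entities:   list of (turn_index, entity_name) from NER
    Returns {entity_name: {property: value}}.
    """
    kg = {}
    for turn, intent, text in utterances:
        prop = INTENT_TO_PROPERTY.get(intent)
        if prop is None or not entities:
            continue  # this intent carries no data property value
        # Nearest-position principle: the entity with the smallest turn distance wins.
        _, name = min(entities, key=lambda e: abs(e[0] - turn))
        kg.setdefault(name, {})[prop] = text
    return kg


entities = [(0, "double eyelid surgery"), (4, "nose augmentation")]
utterances = [
    (1, "introduction", "There are several methods: buried suture, 3-point, 6-point, incision and so on."),
    (2, "ask symptom", "What bothers you about your eyes?"),
    (5, "introduction", "(reply introducing nose augmentation)"),
]
kg = assign_data_properties(utterances, entities)
print(kg)
```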
In the embodiment of the present invention, optionally, the intent recognition algorithm is the XGBoost algorithm.
XGBoost is a boosting algorithm. The idea of boosting is to combine many weak classifiers into one strong classifier. XGBoost is a boosted tree model: it integrates many tree models into a single, much stronger classifier. The algorithm keeps adding trees, growing each one by repeatedly splitting on features; adding a tree amounts to learning a new function that fits the residual of the previous prediction. After training yields k trees, predicting the score of a sample works as follows: according to its features, the sample falls on one leaf node in each tree, each leaf node holds a score, and the predicted value of the sample is simply the sum of the scores of the corresponding leaves across all trees.
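The additive prediction described above, where the score of a sample is the sum over the K trees of the leaf scores f_k(x) it falls into, can be illustrated with a toy boosting loop. This is a simplified squared-loss gradient boosting with depth-1 trees on a single numeric feature, not XGBoost itself: XGBoost additionally uses second-order gradient statistics, penalizes tree complexity, and for intent classification would use a multi-class objective over text features. The function names and data are hypothetical.

```python
def fit_stump(xs, residuals, learning_rate):
    """Fit a depth-1 regression tree to the residuals; its two leaves hold shrunken scores."""
    best = None
    for threshold in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_val = sum(left) / len(left)
        right_val = sum(right) / len(right) if right else 0.0
        err = sum((r - left_val) ** 2 for r in left) + sum((r - right_val) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, threshold, left_val, right_val)
    _, t, lv, rv = best
    return lambda x: learning_rate * (lv if x <= t else rv)


def boost(xs, ys, n_trees=20, learning_rate=0.5):
    """Add trees one by one, each fitted to the residual of the current ensemble."""
    trees, preds = [], [0.0] * len(xs)
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, preds)]
        tree = fit_stump(xs, residuals, learning_rate)
        trees.append(tree)
        preds = [p + tree(x) for p, x in zip(preds, xs)]
    # Prediction = sum over trees of the score of the leaf the sample falls into.
    return lambda x: sum(tree(x) for tree in trees)


model = boost([1, 2, 3, 4], [1.0, 1.0, 3.0, 3.0])
print(round(model(1), 3), round(model(4), 3))  # 1.0 3.0
```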
Schema layer construction is the core of the medical aesthetics knowledge graph; an ontology library is used to manage the schema layer on top of the data layer. Here the ontology is the concept template of the structured knowledge base, and a medical aesthetics knowledge base formed through an ontology library not only has a stronger hierarchical structure but also less redundancy.
Constructing the ontology of the medical aesthetics knowledge graph comprises the following:
defining the classes or concepts in the ontology, defining the relationships between those classes or concepts, and defining attribute slots;
defining attribute slots means specifying the attribute values that may be filled into each slot, and filling in attribute values for each instance.
As a further example, the specific procedure "double eyelid surgery" is an instance of the class "cosmetic procedure (Item)".
The relationships between classes in the ontology of the medical aesthetics knowledge graph are divided into object properties and data properties;
Object properties express relationships between entities, including between concept entities;
Data properties express relationships between an entity and its attributes.
Entity extraction is named entity recognition (Named Entity Recognition). Entities are the most basic elements of a knowledge graph, and the precision and recall of their extraction directly affect the quality of the constructed graph. Entity extraction is the most basic and most critical step in knowledge extraction.
The same entity extracted from real dialogues may have many expressions. For example, "double eyelid surgery", "artificial double-fold eyelid operation", "making double eyelids" and "double eyelid reshaping" all actually correspond to the same standard entity "double eyelid surgery", so the entities need to be merged and aligned. Further, optionally, the method for automatically constructing a medical aesthetics knowledge graph of the invention also comprises an entity alignment step.
The entity alignment step uses entity mapping: the standard entity names are determined first, and the different expressions of the same entity are then mapped to the standard name by a word similarity algorithm.
Further, a word2vec model is trained using real medical aesthetics dialogue data as training samples; the similarity between each entity obtained by entity recognition and each standard entity is computed, and the standard entity with the highest similarity is selected as the final mapped entity.
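A minimal sketch of the alignment step: represent each mention and each standard entity name by the average of its word vectors, then choose the standard entity with the highest cosine similarity. The tiny hand-written vectors stand in for a trained word2vec model (with gensim, a trained `Word2Vec` model's `wv` attribute would supply them). Averaging word vectors and splitting on whitespace are assumptions (Chinese text would first need word segmentation), and all names here are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def phrase_vector(phrase, vectors):
    """Average the vectors of the known words in a phrase; None if no word is known."""
    known = [vectors[w] for w in phrase.split() if w in vectors]
    if not known:
        return None
    return [sum(dim) / len(known) for dim in zip(*known)]


def align(mention, standard_names, vectors):
    """Map a mention to the most similar standard entity name."""
    mv = phrase_vector(mention, vectors)
    if mv is None:
        return None
    scored = []
    for name in standard_names:
        sv = phrase_vector(name, vectors)
        if sv is not None:
            scored.append((cosine(mv, sv), name))
    return max(scored)[1] if scored else None


# Toy vectors standing in for a trained word2vec model (made-up values).
VECTORS = {
    "double": [1.0, 0.0, 0.1], "eyelid": [0.9, 0.1, 0.0], "surgery": [0.2, 0.2, 0.2],
    "reshaping": [0.3, 0.1, 0.2], "nose": [0.0, 1.0, 0.1], "augmentation": [0.1, 0.9, 0.0],
}
STANDARD = ["double eyelid surgery", "nose augmentation"]
print(align("double eyelid reshaping", STANDARD, VECTORS))  # double eyelid surgery
```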
Word2vec is a group of related models used to generate word vectors. They are shallow, two-layer neural networks trained to reconstruct the linguistic contexts of words: the network is given a word and must guess the words at adjacent positions. Under word2vec's bag-of-words assumption, word order is unimportant. After training, the word2vec model maps each word to a vector that can represent the relationships between words; this vector is the hidden layer of the neural network.
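To make "guess the words at adjacent positions" concrete, the sketch below generates the (center word, context word) training pairs used by the skip-gram variant of word2vec; positions inside the window are not distinguished, in line with the bag-of-words assumption. It only prepares training pairs and does not train a network; in gensim, `sg=1` selects skip-gram and `window` sets the context size. `skipgram_pairs` is a hypothetical name.

```python
def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs: the network sees `center` and must predict `context`."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs


print(skipgram_pairs(["cut", "double", "eyelid"], window=1))
# [('cut', 'double'), ('double', 'cut'), ('double', 'eyelid'), ('eyelid', 'double')]
```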
Embodiment 2
An embodiment of the invention discloses a system for automatically constructing a medical aesthetics knowledge graph, comprising: a memory, a processor, and computer instructions stored in the memory and executable on the processor; when the computer instructions are executed by the processor, the steps of the method for automatically constructing a medical aesthetics knowledge graph are performed. The specific implementation of this method is the same as the method of Embodiment 1 and is not repeated here.
Embodiment 3
An embodiment of the invention discloses a computer-readable storage medium storing a computer program; when the computer program is executed by a processor, the steps of the method for automatically constructing a medical aesthetics knowledge graph are performed. This method is the method of Embodiment 1 and is not repeated here.
The computer-readable storage medium includes, but is not limited to, flash memory, hard disks, multimedia cards, card-type memory (for example SD or DX memory), random access memory (Random Access Memory, RAM), static random-access memory (Static Random-Access Memory, SRAM), read-only memory (Read-Only Memory, ROM), electrically erasable programmable read-only memory (Electrically Erasable Programmable Read-Only Memory, EEPROM), programmable read-only memory (Programmable Read-Only Memory, PROM), magnetic memory, magnetic disks, optical discs and other storage media.
The foregoing describes only one or more embodiments of this specification and is not intended to limit them. Those skilled in the art may make various modifications and variations to one or more embodiments of this specification. Any modification, equivalent replacement, improvement and the like made within the spirit and principles of the present application shall fall within the scope of the claims.
The above describes preferred embodiments of the present invention. It should be understood that the invention is not limited to these embodiments, and they should not be regarded as excluding other embodiments. Changes, modifications, replacements and variations made to these embodiments by those skilled in the art, in combination with known or prior-art knowledge and without departing from the principle and spirit of the invention, shall also be regarded as falling within the protection scope of the invention.