Data mining device for assisting a doctor in optimizing diagnosis and treatment
Technical Field
The invention belongs to the technical field of artificial intelligence, and particularly relates to a data mining device for assisting a doctor in optimizing diagnosis and treatment.
Background
Currently, hospitals generally use a CDSS (Clinical Decision Support System) to manage cases and patients. The CDSS mainly stores and manages the database fields of all case data, while the prediction of diagnosis and treatment processes and of various risks relies mainly on manually configured rules, which lack flexibility. Doctors can only view and edit the data.
However, because the CDSS is generally not interoperable between hospitals, each system can only manage the patient cases and data of its own hospital, so its flexibility is poor. In addition, current CDSS systems generally lack effective analysis of electronic data and therefore have a limited ability to predict diagnosis and treatment processes and risks.
With the development of science and technology, CDSS systems have begun to apply knowledge graph data to diagnosis and treatment optimization and risk prediction, predicting risks from the information stored in the knowledge graph (for example, operation A is associated with risk B and risk C).
Because constructing knowledge graph data is itself highly demanding, current CDSS systems often encounter problems such as the following: the knowledge graph is small in scale, so its application scenarios are limited in practice; the data contain too many errors to be used in actual medical work; or the data are abundant but lack scene customization (the same data are provided for different countries and regions), ignoring differences in external factors such as disease probability and treatment level among populations in different regions, so the data provide poor guidance in practical use.
In summary, there is as yet no targeted, practical, and efficient data mining device that acquires more accurate and more targeted data and thereby improves the effectiveness of the data in practical applications.
Disclosure of Invention
In order to solve the problems in the prior art, the invention provides a data mining device for assisting a doctor in optimizing diagnosis and treatment.
According to one aspect of the present invention, there is provided a data mining device for assisting a doctor in optimizing diagnosis and treatment, the data mining device comprising a data mining module and a deduplication module;
the data mining module is configured to perform data mining to obtain entities and the relationships between the entities from medical data;
the deduplication module is configured to perform a deduplication operation on entity concepts.
According to a specific embodiment of the present invention, the data mining module performs data mining by using a natural language processing technology or a grammar template-based method.
According to another embodiment of the present invention, the natural language processing technique comprises named entity recognition (NER) combined with a relation extraction algorithm.
According to yet another embodiment of the present invention, the deduplication module performs the deduplication operation on entity concepts by using an alias dictionary and/or relationship similarity.
According to another embodiment of the present invention, the data mining device further comprises a relationship probability acquisition module;
the relationship probability acquisition module is configured to obtain the association probability of a relationship by prior probability calculation.
According to yet another embodiment of the present invention, the relationship probability acquisition module is further configured to acquire the association probability of the relationship and to optimize the association probability according to granularity.
According to yet another embodiment of the invention, the granularity comprises: region, population, age and/or gender.
According to yet another embodiment of the invention, the relationship comprises: disease-symptom, population-probability of illness, age-risk of surgery.
The data mining device provided by the invention uses big data technology to analyze a large number of medical cases and literature data and mines useful medical knowledge from them. This knowledge includes: the treatments commonly used for a given disease, the disease probability for people of different regions and ages, the risk of sequelae of a given medication for people of different genders and ages, the influence of a patient's own condition on the risks of various operations, and the like. The data are highly accurate and targeted, and can help doctors optimize diagnosis and treatment processes and predict surgical and medication risks in a targeted manner, improving doctors' working efficiency.
Drawings
Other features, objects and advantages of the invention will become more apparent upon reading of the detailed description of non-limiting embodiments made with reference to the following drawings:
fig. 1 is a schematic structural diagram illustrating an embodiment of a data mining device for assisting a doctor in optimizing diagnosis and treatment according to the present invention.
The same or similar reference numbers in the drawings identify the same or similar elements.
Detailed Description
The following disclosure provides many different embodiments, or examples, for implementing different features of the invention. To simplify the disclosure of the present invention, the components and arrangements of specific examples are described below. Furthermore, the present invention may repeat reference numerals and/or letters in the various examples. This repetition is for the purpose of simplicity and clarity and does not in itself dictate a relationship between the various embodiments and/or configurations discussed. It should be noted that the components illustrated in the figures are not necessarily drawn to scale. Descriptions of well-known components and processing techniques and procedures are omitted so as to not unnecessarily limit the invention.
Referring to fig. 1, fig. 1 is a schematic structural diagram illustrating an embodiment of a data mining device for assisting a doctor in optimizing diagnosis and treatment according to the present invention. The data mining device comprises a data mining module 10 and a deduplication module 20.
The data mining module 10 is configured to perform data mining to obtain entities and the relationships between the entities from medical data. Preferably, the data mining module 10 uses natural language processing technology to perform named entity recognition and entity relationship analysis on the medical data, mining medical entities and the relationships among them. Further, data mining is carried out using an NER (named entity recognition) plus relation extraction algorithm and/or a grammar template-based method.
The entities include disease names, symptom descriptions, medications, operation names, medical equipment, and the like; the relationships include a disease being associated with a symptom, a device being used in an operation, a medication being unsuitable for a population, and the like. For example, relationships between entities take forms such as: disease -> symptom, population -> disease, age + surgery -> surgical risk.
The data mined by the data mining module 10 are voluminous, and some of them may be duplicated or aliased (the same concept appearing under different names). To improve the accuracy of data application and the speed of subsequent processing, the mined data need to be deduplicated.
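To illustrate the grammar template-based option, the following is a minimal Python sketch that matches hand-written sentence patterns and emits (head, relation, tail) triples of the kinds listed above. The templates, relation labels, and the names `Triple`, `TEMPLATES`, and `extract_triples` are hypothetical; the specification does not disclose its templates, and the NER plus relation extraction route would replace these regular expressions with trained models.

```python
import re
from typing import List, NamedTuple


class Triple(NamedTuple):
    head: str
    relation: str
    tail: str


# Hypothetical grammar templates: each regex captures a head entity and a tail
# entity and maps the match to a relation type. Illustrative only.
_END = r"(?:[.;,]|$)"  # a clause ends at punctuation or at end of text
TEMPLATES = [
    (re.compile(r"(?P<head>[\w ]+?) (?:presents with|causes) (?P<tail>[\w ]+?)" + _END),
     "disease->symptom"),
    (re.compile(r"(?P<head>[\w ]+?) is performed using (?P<tail>[\w ]+?)" + _END),
     "surgery->device"),
    (re.compile(r"(?P<head>[\w ]+?) is contraindicated in (?P<tail>[\w ]+?)" + _END),
     "medication->unsuitable_population"),
]


def extract_triples(sentence: str) -> List[Triple]:
    """Apply every template to the sentence and collect the matched triples."""
    triples: List[Triple] = []
    for pattern, relation in TEMPLATES:
        for match in pattern.finditer(sentence):
            triples.append(Triple(match.group("head").strip(), relation,
                                  match.group("tail").strip()))
    return triples
```

For instance, `extract_triples("Aspirin is contraindicated in children.")` yields a single medication->unsuitable_population triple. Templates are precise but brittle, which is why the text pairs them with the NLP route.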
The deduplication module 20 is configured to perform deduplication operations on entity concepts.
Preferably, the deduplication module 20 deduplicates entity concepts by using an alias dictionary and/or relationship similarity.
Alias dictionary: the aliases of the same concept are unified by defining one canonical name for that concept and mapping every alias to it.
Deduplication by relationship similarity: each concept forms relationships with various other concepts; by comparing all the relationship information of two concepts, the similarity between them can be computed. If the similarity is high, the two can be regarded as the same concept. For example, concept A forms relationships with concepts B, C, and D, while concept X forms relationships with concepts B, C, D, and E; concepts A and X therefore have a high relationship similarity.
The deduplicated data are more accurate and more concise, and can be applied more effectively in subsequent operations.
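The alias-dictionary approach can be sketched as a lookup from every known alias to one canonical name, applied to both entities of each mined triple before exact duplicates are dropped. This is a minimal sketch under stated assumptions: the dictionary entries and the names `ALIAS_DICT`, `canonicalize`, and `dedupe_triples` are hypothetical, and matching is done on lower-cased, whitespace-stripped names.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical alias dictionary: lower-cased alias -> canonical concept name.
ALIAS_DICT: Dict[str, str] = {
    "myocardial infarction": "myocardial infarction",
    "heart attack": "myocardial infarction",
    "mi": "myocardial infarction",
    "hypertension": "hypertension",
    "high blood pressure": "hypertension",
}


def canonicalize(name: str) -> str:
    """Return the canonical concept for a known alias; unknown names are only stripped."""
    return ALIAS_DICT.get(name.strip().lower(), name.strip())


def dedupe_triples(triples: Iterable[Tuple[str, str, str]]) -> List[Tuple[str, str, str]]:
    """Canonicalize head and tail of each triple, then drop exact duplicates in first-seen order."""
    seen = set()
    result: List[Tuple[str, str, str]] = []
    for head, relation, tail in triples:
        key = (canonicalize(head), relation, canonicalize(tail))
        if key not in seen:
            seen.add(key)
            result.append(key)
    return result
```

Because unknown names are not lower-cased, "Chest pain" and "chest pain" stay distinct here; a real system would add a normalization step or extend the dictionary.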
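The relationship-similarity test can be sketched as below, assuming Jaccard similarity over each concept's set of (direction, relation, other concept) features; the specification does not name a similarity measure, and the 0.7 threshold and all function names are hypothetical. On the example above, A has three relationships, X has four, and they share three, giving 3/4 = 0.75.

```python
from typing import Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail)


def relation_features(triples: Iterable[Triple], concept: str) -> Set[Tuple[str, str, str]]:
    """Collect every relationship the concept takes part in, tagged with its direction."""
    features: Set[Tuple[str, str, str]] = set()
    for head, relation, tail in triples:
        if head == concept:
            features.add(("out", relation, tail))
        if tail == concept:
            features.add(("in", relation, head))
    return features


def relation_similarity(triples: Iterable[Triple], a: str, b: str) -> float:
    """Jaccard similarity of the two concepts' relationship features."""
    items = list(triples)  # so a generator can be scanned twice
    fa = relation_features(items, a)
    fb = relation_features(items, b)
    union = fa | fb
    return len(fa & fb) / len(union) if union else 0.0


def is_same_concept(triples: Iterable[Triple], a: str, b: str, threshold: float = 0.7) -> bool:
    """Treat two concepts as duplicates when similarity reaches the (hypothetical) threshold."""
    return relation_similarity(triples, a, b) >= threshold


# The example from the text: A relates to B, C, D; X relates to B, C, D, E.
EXAMPLE: List[Triple] = ([("A", "related_to", c) for c in "BCD"]
                         + [("X", "related_to", c) for c in "BCDE"])
```

Concepts that merely share the same neighbours (here B, C, and D, which are all targets of A and X) would also score highly, so in practice such candidates are typically restricted to the same entity type before merging.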
In addition, the data mining device further comprises a relationship probability acquisition module 30, which is configured to obtain the association probability of a relationship by prior probability calculation. For example, if an association between disease A and symptom B appears in 80% of the literature, the association probability between these two entities can be taken to be 80%.
In addition, since the medical data carry source labels (i.e., indicating whether the data come from the world, a country, a region, etc.), the relationship probability acquisition module 30 may analyze relationship probabilities at different granularities to optimize them. For example, in the Beijing area the association probability of disease A and symptom B may be 80%, whereas in Losa it may be 50%. Preferably, the granularities include, but are not limited to, region, population, age, and/or gender.
The data mining device provided by the invention mines massive medical data in real time and optimizes the mined data according to different granularities, yielding data that are highly accurate, more valuable, and more targeted. The mined data can be widely applied to knowledge graphs to support various kinds of knowledge reasoning, giving the device strong practicability and applicability.
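One way to read the 80% example is as a frequency estimate: among the documents that mention disease A, the share that also report the A-B relationship. The sketch below assumes that denominator (the specification does not state it) and assumes each document has already been reduced to the set of triples mined from it; the names `mentions` and `association_probability` are hypothetical.

```python
from typing import Iterable, Set, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail)


def mentions(doc: Set[Triple], entity: str) -> bool:
    """True if the entity is the head or tail of any triple mined from the document."""
    return any(entity in (head, tail) for head, _, tail in doc)


def association_probability(docs: Iterable[Set[Triple]], triple: Triple) -> float:
    """Share of documents mentioning the triple's head entity that also contain the triple."""
    relevant = [doc for doc in docs if mentions(doc, triple[0])]
    if not relevant:
        return 0.0
    return sum(1 for doc in relevant if triple in doc) / len(relevant)
```

With five documents about disease A, four of which report symptom B, this returns 0.8, matching the example in the text.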
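Granularity-specific probabilities can be sketched by computing the same frequency separately for each value of a source label. This assumes each document carries a label dictionary such as `{"region": "Beijing", "gender": "F"}`; the label keys, the "unlabelled" fallback group, and the function name are hypothetical, and the counting rule is the same assumed denominator as in a plain prior estimate (documents mentioning the head entity).

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail)
# A document: its source labels plus the set of triples mined from it.
LabelledDoc = Tuple[Dict[str, str], Set[Triple]]


def probability_by_granularity(docs: Iterable[LabelledDoc], key: str,
                               triple: Triple) -> Dict[str, float]:
    """Association probability of `triple`, computed separately per value of label `key`."""
    relevant: Dict[str, int] = defaultdict(int)  # docs mentioning the head entity
    hits: Dict[str, int] = defaultdict(int)      # of those, docs containing the triple
    for labels, triples in docs:
        if not any(triple[0] in (head, tail) for head, _, tail in triples):
            continue
        group = labels.get(key, "unlabelled")
        relevant[group] += 1
        if triple in triples:
            hits[group] += 1
    return {group: hits[group] / n for group, n in relevant.items()}
```

Passing `key="region"` reproduces the Beijing-versus-other-region contrast; passing `"age"` or `"gender"` instead gives the other granularities listed above. Groups with very few documents yield unstable estimates, so a coarser granularity may be a sensible fallback.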
Although the present invention has been described in detail with respect to the exemplary embodiments and advantages thereof, it should be understood that various changes, substitutions, and alterations can be made hereto without departing from the spirit and scope of the invention as defined by the appended claims. In other examples, one of ordinary skill in the art will readily appreciate that the order of the process steps may be varied while remaining within the scope of the present invention.
Moreover, the scope of the present application is not intended to be limited to the particular embodiments of the process, machine, manufacture, composition of matter, means, methods and steps described in the specification. As one of ordinary skill in the art will readily appreciate from the disclosure of the present invention, processes, machines, manufacture, compositions of matter, means, methods, or steps, presently existing or later to be developed, that perform substantially the same function or achieve substantially the same result as the corresponding embodiments described herein may be utilized according to the present invention. Accordingly, the appended claims are intended to include within their scope such processes, machines, manufacture, compositions of matter, means, methods, or steps.