CN106844325B


Info

Publication number: CN106844325B
Application number: CN201510886242.XA
Authority: CN
Inventors: 王宏波
Original assignee: Peking University Medical Information Technology Co ltd
Current assignee: Peking University Medical Information Technology Co ltd
Priority date: 2015-12-04
Filing date: 2015-12-04
Publication date: 2022-01-25
Anticipated expiration: 2035-12-04
Also published as: CN106844325A

Abstract

The invention provides a medical information processing method and a medical information processing device. The medical information processing method comprises the following steps: performing word segmentation on a plurality of medical texts, and clustering the plurality of medical texts; determining the association degree of every two medical texts according to the words of every two medical texts in the medical texts of the same category; judging, according to the association degree of every two medical texts, whether the words of any two medical texts in the medical texts of the same category have an association relation; and when the judgment result is yes, storing the words having the association relation in association with each other. With the technical solution of the invention, associated words in medical texts can be mined more accurately and comprehensively, so that a medical word bank built from the associated words is more accurate and comprehensive.

Description

Medical information processing method and medical information processing apparatus

Technical Field

The present invention relates to the field of information processing technologies, and in particular, to a medical information processing method and a medical information processing apparatus.

Background

At present, the informatization of medical services is an international trend. With the rapid development of information technology, more and more hospitals in China are accelerating overall construction based on an informatization platform and a Hospital Information System (HIS) in order to improve their service level and core competitiveness. Medical informatization not only improves the working efficiency of doctors, giving them more time to serve patients, but also improves patient satisfaction and trust and implicitly establishes the scientific and technological image of the hospital. Therefore, the gradual integration of medical service applications with the basic network platform is becoming a new direction for the informatization of domestic hospitals, especially large and medium-sized ones.

In the process of medical informatization, building a medical word stock is important, foundational work: it facilitates the electronic storage of medical records, the analysis of the large volume of unstructured medical text on the Internet, and the intelligent analysis of patients' medical records. Although well-established medical word stock systems exist abroad, they are not suited to domestic use, where Chinese is the native language. English-Chinese parallel corpora, Chinese medicine and pharmacy lexicons and the like have also been built domestically; however, the words in domestic medical lexicons are not comprehensive and are not always correct.

Therefore, how to construct a more accurate and comprehensive medical word stock becomes a problem to be solved urgently.

Disclosure of Invention

Based on the problems, the invention provides a new technical scheme, which can more accurately and comprehensively dig out words with association relation in medical texts, so that a medical word bank constructed according to the words with association relation is more accurate and comprehensive.

In view of the above, an aspect of the present invention provides a medical information processing method, including: performing word segmentation on a plurality of medical texts, and clustering the plurality of medical texts; determining the association degree of every two medical texts according to the words of every two medical texts in the medical texts of the same category; judging whether words of any two medical texts in the medical texts of the same category have an association relation or not according to the association degree of every two medical texts; and when the judgment result is yes, performing association storage on the words with the association relation.

In this technical solution, the association degree of every two medical texts is determined from the words in each pair of medical texts of the same category; whether the words of any two medical texts of the same category are associated is then judged from those association degrees; and the associated words are stored in association with each other, for example in a medical word bank, so as to build a more complete medical word bank. For example, the words in medical text A are "cold" and "fever", the words in medical text B are "fever" and "cough", and the words in medical text C are "cough" and "chill". A and B contain similar words (the two fever terms), and the association degree between A and B is 30%; B and C contain the same word, "cough", and the association degree between B and C is 50%. A and C have no identical or similar words, but because A is associated with B and B is associated with C, it can be determined that A and C are associated, that is, that the words of A and the words of C are associated. The method and the device can therefore also mine words whose association is only implicit, so that associated words in medical texts are mined more accurately and comprehensively. Furthermore, a search engine for medical information can be built from the associated words, or automatic analysis of medical text information can be realized, making it easier for outpatient doctors and patients to look up diseases and symptoms.
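The transitive reasoning in the example above can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes that any association degree above a hypothetical `THRESHOLD` counts as a direct association and that texts reachable through a chain of direct associations are also treated as associated. The function name `associated_texts` and the data layout are assumptions.

```python
from collections import defaultdict, deque

# Pairwise association degrees from the example (A-B 30%, B-C 50%).
pair_degree = {("A", "B"): 0.30, ("B", "C"): 0.50}
THRESHOLD = 0.0  # hypothetical cutoff, not specified in the text


def associated_texts(pair_degree, threshold=THRESHOLD):
    """Return every pair of texts linked directly or through other texts."""
    graph = defaultdict(set)
    for (x, y), degree in pair_degree.items():
        if degree > threshold:
            graph[x].add(y)
            graph[y].add(x)
    linked = set()
    for start in list(graph):
        # Breadth-first search collects everything reachable from `start`.
        seen, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for nxt in graph[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        linked.update(frozenset((start, other)) for other in seen if other != start)
    return linked


links = associated_texts(pair_degree)
print(frozenset(("A", "C")) in links)  # A and C are linked through B
```

Under these assumptions A and C end up associated even though no degree was measured between them directly.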

Preferably, the plurality of medical texts may be electronic medical records in the medical system of a hospital, or may be obtained from medical professional websites by using a crawler program. Because the volume of medical texts is large, they can be stored in a distributed file system.

In the above technical solution, preferably, the step of performing association storage on the words with association relationship further includes: determining the association degree of words in any two medical texts according to the association degree of any two medical texts; and storing the association degree of the words in any two medical texts.

In this technical solution, the association degree of the words in any two medical texts is determined from the association degree of those two medical texts. Specifically, the association degree of the two medical texts can be used directly as the association degree of their words, or the association degree of the words can be calculated according to a preset algorithm, so that the stored value reflects the degree of association between the words more accurately and intuitively. For example, the words in medical text A are "cold" and "fever", and the words in medical text C are "cough" and "chill"; if the association degree between A and C is 10%, the association degree between "cold" and "cough" is 10%.
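The simpler of the two options described above, reusing the text-level association degree for the words, might look like the sketch below. The function name `word_association_degrees` and the use of `frozenset` keys are illustrative assumptions; the "preset algorithm" alternative is not specified in the text and is not modeled here.

```python
from itertools import product


def word_association_degrees(words_a, words_b, text_degree):
    """Give every cross-text word pair the association degree of the two texts."""
    return {
        frozenset((wa, wb)): text_degree
        for wa, wb in product(words_a, words_b)
        if wa != wb
    }


# Texts A and C from the example, with a text-level association degree of 10%.
degrees = word_association_degrees(["cold", "fever"], ["cough", "chill"], 0.10)
print(degrees[frozenset(("cold", "cough"))])
```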

In any one of the above technical solutions, preferably, the step of segmenting the plurality of medical texts specifically includes: and performing word segmentation on the medical texts according to the dictionary and the parts of speech of the words in the medical texts.

In this technical solution, the medical texts can be segmented according to the words in a dictionary (preferably a medical dictionary) and parts of speech. Specifically, the texts are segmented according to the words in the dictionary; if a word in a medical text is not in the dictionary, its part of speech is used to judge whether it is associated with the preceding and following words and whether they should be combined into a new word. This effectively avoids mis-segmented and omitted words and thus ensures accurate and comprehensive segmentation.
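A rough sketch of dictionary-based segmentation with a part-of-speech fallback is given below. The text does not name a matching strategy, so forward maximum matching is an assumption, as are the sample dictionary, the part-of-speech tags, and the rule that adjacent out-of-dictionary fragments with a combinable tag pair are merged into a new word.

```python
def segment(text, dictionary, max_len=4):
    """Forward maximum matching: take the longest dictionary word at each position."""
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # A single character is always accepted so the scan keeps moving.
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens


def merge_unknown(tokens, dictionary, pos_of, combinable):
    """Merge adjacent out-of-dictionary tokens whose part-of-speech pair is combinable."""
    merged = []
    for tok in tokens:
        prev = merged[-1] if merged else None
        if (prev is not None and tok not in dictionary and prev not in dictionary
                and (pos_of(prev), pos_of(tok)) in combinable):
            merged[-1] = prev + tok
        else:
            merged.append(tok)
    return merged


# Hypothetical data: a tiny medical dictionary and part-of-speech table.
dictionary = {"咽痛", "咽痒", "无痰"}        # sore throat, itchy throat, no phlegm
pos_table = {"腰": "n", "酸": "a"}           # noun, adjective
combinable = {("n", "a")}                    # assumed rule: noun + adjective forms a new word

tokens = segment("咽痛咽痒无痰腰酸", dictionary)
merged = merge_unknown(tokens, dictionary, lambda t: pos_table.get(t, "x"), combinable)
print(merged)
```

Here the out-of-dictionary fragments "腰" and "酸" are recombined into the candidate new word "腰酸" (waist soreness) instead of being left as stray characters.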

In any one of the above technical solutions, preferably, the step of clustering the plurality of medical texts specifically includes: clustering the plurality of medical texts according to international disease classification and K-means algorithm.

In this technical solution, the plurality of medical texts can be clustered according to the International Classification of Diseases (ICD) and a K-means algorithm. Because medical texts clustered into the same category relate to the same disease, their words are likely to be associated; restricting further processing to medical texts of the same category therefore keeps processing fast.
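The text does not say how the ICD and K-means are combined. One possible reading, sketched below purely as an assumption, groups records by an ICD code first and then splits each group with a plain K-means on word-count vectors. The record layout, the function names, and the deterministic seeding with the first `k` vectors are all illustrative choices.

```python
from collections import defaultdict


def kmeans(vectors, k, iterations=10):
    """Plain K-means with squared Euclidean distance; the first k vectors seed the centroids."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        for i, v in enumerate(vectors):
            labels[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centroids[c])),
            )
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def cluster_by_icd_then_kmeans(records, vocabulary, k=2):
    """Group records by ICD code, then split each group with K-means on word counts."""
    by_icd = defaultdict(list)
    for rec in records:
        by_icd[rec["icd"]].append(rec)
    clusters = defaultdict(list)
    for icd, group in by_icd.items():
        vectors = [[rec["words"].count(w) for w in vocabulary] for rec in group]
        labels = kmeans(vectors, min(k, len(group)))
        for rec, lab in zip(group, labels):
            clusters[(icd, lab)].append(rec["id"])
    return dict(clusters)


# Hypothetical records; J00 and K29 are ICD-10 codes used only as sample labels.
records = [
    {"id": 1, "icd": "J00", "words": ["cough", "fever"]},
    {"id": 2, "icd": "J00", "words": ["cough", "sore throat"]},
    {"id": 3, "icd": "J00", "words": ["cough", "fever"]},
    {"id": 4, "icd": "K29", "words": ["stomachache"]},
]
vocabulary = ["cough", "fever", "sore throat", "stomachache"]
clusters = cluster_by_icd_then_kmeans(records, vocabulary)
print(clusters)
```

Only texts that land in the same cluster would then be compared pairwise, which is what keeps the later association step from scaling with the whole corpus.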

In any one of the above technical solutions, preferably, the step of performing association storage on the words with association relations specifically includes: and storing the words with the association relation according to the attributes of the words with the association relation.

In this technical solution, the associated words are stored according to their attributes. For example, the attributes of a word may include: body parts (such as head and limbs), predicates (such as pain and strain), diseases (such as fever and heart disease), medicines (such as Gregorian tablets and glucose injection), treatment means (such as drip and anesthesia), and ignored words (such as home and patient, which do not contribute to information extraction), so that the associated words are stored in a more orderly manner.
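Attribute-based storage could be organized as in the following sketch. The nested-dictionary layout, the attribute keys, the function name `store_association`, and the sample association degree are hypothetical; the text only states that associated words are stored according to their attributes.

```python
from collections import defaultdict

# lexicon[attribute][word] maps each associated word to an association degree.
lexicon = defaultdict(lambda: defaultdict(dict))


def store_association(lexicon, word_a, attr_a, word_b, attr_b, degree):
    """File each word under its attribute and record the association in both directions."""
    lexicon[attr_a][word_a][word_b] = degree
    lexicon[attr_b][word_b][word_a] = degree


# Made-up sample: a disease word associated with a treatment word.
store_association(lexicon, "fever", "disease", "drip", "treatment", 0.3)
print(lexicon["disease"]["fever"])
```

Keeping one bucket per attribute means a lookup such as "all treatments associated with fever" only needs to scan the treatment bucket.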

Another aspect of the present invention provides a medical information processing apparatus including: the processing unit is used for segmenting a plurality of medical texts and clustering the medical texts; the first determination unit is used for determining the association degree of every two medical texts according to the words of every two medical texts in the medical texts of the same category; the judging unit is used for judging whether words of any two medical texts in the medical texts of the same category have an association relation or not according to the association degree of every two medical texts; and the storage unit is used for associating and storing the words with the association relation when the judgment result is yes.

In the above technical solution, preferably, the storage unit includes: the second determining unit is used for determining the association degree of the words in any two medical texts according to the association degree of any two medical texts; the storage unit is specifically configured to store the association degrees of the words in any two medical texts.

In any one of the above technical solutions, preferably, the processing unit includes: and the word cutting unit is used for cutting words of the medical texts according to the dictionary and the parts of speech of the words in the medical texts.

In any one of the above technical solutions, preferably, the processing unit includes: and the clustering unit is used for clustering the plurality of medical texts according to the international disease classification and the K-means algorithm.

In any of the foregoing technical solutions, preferably, the storage unit is specifically configured to store the words having an association relationship according to the attribute of the words having an association relationship.

With the technical solution of the invention, associated words in medical texts can be mined more accurately and comprehensively, so that a medical word bank built from the associated words is more accurate and comprehensive.

Drawings

Fig. 1 shows a flow diagram of a medical information processing method according to an embodiment of the invention;

Fig. 2 shows a schematic configuration diagram of a medical information processing apparatus according to an embodiment of the present invention;

Fig. 3 shows a schematic diagram of a medical information processing apparatus according to an embodiment of the invention.

Detailed Description

So that the manner in which the above recited objects, features and advantages of the present invention can be understood in detail, a more particular description of the invention, briefly summarized above, may be had by reference to the embodiments thereof which are illustrated in the appended drawings. It should be noted that the embodiments and features of the embodiments of the present application may be combined with each other without conflict.

In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, however, the present invention may be practiced in other ways than those specifically described herein, and therefore the scope of the present invention is not limited by the specific embodiments disclosed below.

Fig. 1 shows a flow diagram of a medical information processing method according to an embodiment of the present invention.

As shown in fig. 1, a medical information processing method according to an embodiment of the present invention includes:

Step 102, performing word segmentation on a plurality of medical texts, and clustering the plurality of medical texts;

Step 104, determining the association degree of every two medical texts according to the words of every two medical texts in the medical texts of the same category;

Step 106, judging whether words of any two medical texts in the medical texts of the same category have an association relation according to the association degree of every two medical texts; if so, proceeding to step 108; otherwise, ending the process;

Step 108, performing association storage on the words with the association relation.

In the above technical solution, preferably, step 108 further includes: determining the association degree of words in any two medical texts according to the association degree of any two medical texts; and storing the association degree of the words in any two medical texts.

In this technical solution, the medical texts can be segmented according to the words in a dictionary (preferably a medical dictionary) and parts of speech. Specifically, the texts are segmented according to the words in the dictionary; if a word in a medical text is not in the dictionary, its part of speech is used to judge whether it is associated with the preceding and following words and whether they should be combined into a new word. This effectively avoids mis-segmented and omitted words and thus ensures accurate and comprehensive segmentation. Preferably, the words obtained by segmenting the medical text are medical words, so that irrelevant words (such as "every day", "patient" and "home") do not interfere with determining the association degree of the medical texts.

In any of the above technical solutions, preferably, step 108 specifically includes: and storing the words with the association relation according to the attributes of the words with the association relation.

Fig. 2 shows a schematic configuration diagram of a medical information processing apparatus according to an embodiment of the present invention.

As shown in fig. 2, a medical information processing apparatus 200 according to an embodiment of the present invention includes: the processing unit 202 is configured to perform word segmentation on a plurality of medical texts and perform clustering on the plurality of medical texts; the first determining unit 204 is configured to determine, according to words of every two medical texts in the medical texts of the same category, a degree of association between every two medical texts; the judging unit 206 is configured to judge whether words of any two medical texts in the medical texts of the same category have an association relationship according to the association degree of each two medical texts; and a storage unit 208, configured to, if the determination result is yes, associate and store the words having an association relationship.

In this technical solution, the association degree of every two medical texts is determined from the words in each pair of medical texts of the same category; whether the words of any two medical texts of the same category are associated is then judged from those association degrees; and the associated words are stored in association with each other, for example in a medical word stock, so as to build a more complete medical word stock. For example, the words in medical text A are "cold" and "fever", the words in medical text B are "fever" and "cough", and the words in medical text C are "cough" and "chill". A and B contain similar words (the two fever terms), and the association degree between A and B is 30%; B and C contain the same word, "cough", and the association degree between B and C is 50%. A and C have no identical or similar words, but because A is associated with B and B is associated with C, it can be determined that A and C are associated, that is, that the words of A and the words of C are associated. The method and the device can therefore also mine words whose association is only implicit, so that associated words in medical texts are mined more accurately and comprehensively. Furthermore, a search engine for medical information can be built from the associated words, or automatic analysis of medical text information can be realized, making it easier for outpatient doctors and patients to look up diseases and symptoms.

In the above technical solution, preferably, the storage unit 208 includes: the second determining unit 2082, configured to determine association degrees of words in any two medical texts according to the association degrees of any two medical texts; the storage unit 208 is specifically configured to store the association degrees of the words in any two medical texts.

In any of the above technical solutions, preferably, the processing unit 202 includes: the word segmentation unit 2022 is configured to segment words of the plurality of medical texts according to the dictionary and parts of speech of the words in the plurality of medical texts.

In any of the above technical solutions, preferably, the processing unit 202 includes: a clustering unit 2024, configured to cluster the plurality of medical texts according to international disease classification and K-means algorithm.

In this technical solution, the plurality of medical texts can be clustered according to the International Classification of Diseases (ICD) and a K-means algorithm. Because medical texts clustered into the same category relate to the same disease, their words are likely to be associated; restricting further processing to medical texts of the same category therefore keeps processing fast.

In any of the foregoing technical solutions, preferably, the storage unit 208 is specifically configured to store the words having an association relationship according to the attribute of the words having an association relationship.

As shown in Fig. 3, the medical information processing apparatus 300 first obtains medical texts from medical professional websites by using crawler technology and obtains electronic medical records from the medical system of a hospital. Because the amount of information obtained from the websites and the medical system is large, the medical texts and the electronic medical records are stored in a distributed file system as the plurality of medical texts. Word segmentation and clustering are performed on the plurality of medical texts, and the association degree of every two medical texts of the same category is then calculated from their words by using the Jaccard method. For example, for two medical texts A and B, the words obtained by segmenting medical text A are "patient", "sore throat and itchy throat", "no phlegm", "stomach distension" and "lumbago", and the words obtained by segmenting medical text B are "dry cough", "sore throat and itchy throat", "no phlegm", "stomachache", "waist soreness" and "fear of cold". Calculation yields the identical word pairs "sore throat and itchy throat" / "sore throat and itchy throat" and "no phlegm" / "no phlegm", and the highly similar word pairs "stomach distension" / "stomachache" and "lumbago" / "waist soreness". A vector cosine method is then used to determine whether any two medical texts of the same category are associated, which yields word associations that cannot be obtained by calculating similarity with the Jaccard method alone. For example, take the two medical texts A and B and another medical text C. After word segmentation of C, calculation shows that medical records A and C are associated, so the words in A and C are associated, for example with the word "tonsil inflammation"; the associated words are then stored in a medical word stock, thereby building a medical word stock oriented to real medical scenarios.
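The two similarity measures named above can be illustrated on medical texts A and B from the example. This is a simplified sketch: it counts only exact word matches, whereas the embodiment also recognizes highly similar words, and the bag-of-words vectors used for the cosine are an assumption. Identical source terms are given one English label here.

```python
import math


def jaccard(words_a, words_b):
    """Jaccard coefficient: shared words divided by all distinct words."""
    a, b = set(words_a), set(words_b)
    return len(a & b) / len(a | b) if a | b else 0.0


def cosine(words_a, words_b):
    """Cosine of the angle between the word-count vectors of the two texts."""
    vocab = sorted(set(words_a) | set(words_b))
    va = [words_a.count(w) for w in vocab]
    vb = [words_b.count(w) for w in vocab]
    dot = sum(x * y for x, y in zip(va, vb))
    norm = math.sqrt(sum(x * x for x in va)) * math.sqrt(sum(y * y for y in vb))
    return dot / norm if norm else 0.0


text_a = ["patient", "sore throat and itchy throat", "no phlegm",
          "stomach distension", "lumbago"]
text_b = ["dry cough", "sore throat and itchy throat", "no phlegm",
          "stomachache", "waist soreness", "fear of cold"]

print(jaccard(text_a, text_b))  # 2 shared words out of 9 distinct words
print(cosine(text_a, text_b))   # 2 / sqrt(5 * 6)
```

With exact matching only, the two texts share 2 of 9 distinct words; treating "stomach distension" / "stomachache" and "lumbago" / "waist soreness" as matches, as the embodiment does, would raise both scores.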

The technical solution of the invention has been explained in detail above with reference to the accompanying drawings. By analyzing real data (i.e. medical records) in the medical system of a hospital together with medical texts from medical professional websites, associated words in medical texts can be mined more accurately and comprehensively, so that a medical word stock oriented to real medical scenarios is built.

The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims

1. A medical information processing method characterized by comprising:

performing word segmentation on a plurality of medical texts, and clustering the plurality of medical texts;

determining the association degree of every two medical texts according to the words of every two medical texts in the medical texts of the same category;

judging whether words of any two medical texts in the medical texts of the same category have an association relation or not according to the association degree of every two medical texts;

if so, performing association storage on the words with the association relation;

the step of performing association storage on the words with association relations specifically includes:

and storing the words with the association relation according to the attributes of the words with the association relation.

2. The medical information processing method according to claim 1, wherein the step of storing the words having the association relationship in association further includes:

determining the association degree of words in any two medical texts according to the association degree of any two medical texts;

and storing the association degree of the words in any two medical texts.

3. The medical information processing method according to claim 1, wherein the step of segmenting the plurality of medical texts specifically includes:

and performing word segmentation on the medical texts according to the dictionary and the parts of speech of the words in the medical texts.

4. The medical information processing method according to claim 1, wherein the step of clustering the plurality of medical texts specifically includes:

clustering the plurality of medical texts according to international disease classification and K-means algorithm.

5. A medical information processing apparatus characterized by comprising:

the processing unit is used for segmenting a plurality of medical texts and clustering the medical texts;

the first determination unit is used for determining the association degree of every two medical texts according to the words of every two medical texts in the medical texts of the same category;

the judging unit is used for judging whether words of any two medical texts in the medical texts of the same category have an association relation or not according to the association degree of every two medical texts;

the storage unit is used for performing association storage on the words with the association relation when the judgment result is yes;

the storage unit is specifically configured to store the words with the association relationship according to the attributes of the words with the association relationship.

6. The medical information processing apparatus according to claim 5, wherein the storage unit includes:

the second determining unit is used for determining the association degree of the words in any two medical texts according to the association degree of any two medical texts;

the storage unit is specifically configured to store the association degrees of the words in any two medical texts.

7. The medical information processing apparatus according to claim 5, wherein the processing unit includes:

and the word cutting unit is used for cutting words of the medical texts according to the dictionary and the parts of speech of the words in the medical texts.

8. The medical information processing apparatus according to claim 5, wherein the processing unit includes:

and the clustering unit is used for clustering the plurality of medical texts according to the international disease classification and the K-means algorithm.