CN118352012A

Info

Publication number: CN118352012A
Application number: CN202410796744.2A
Authority: CN
Inventors: 崔亮亮; 雷家铿; 韩刚; 郑锦阳; 刘昌�
Original assignee: Dayi Zhicheng High Tech Co ltd
Current assignee: Dayi Zhicheng High Tech Co ltd
Priority date: 2024-06-20
Filing date: 2024-06-20
Publication date: 2024-07-16
Anticipated expiration: 2044-06-20
Also published as: CN118352012B

Abstract

The application relates to the technical field of data processing, and in particular to a CDA document management method and system. The method comprises: acquiring a first document of a target patient and performing word segmentation on the first document to obtain a plurality of words; dividing the first document into a plurality of windows using a target window length; when a first word and a second word among the plurality of words co-occur in the same window, acquiring a first frequency of the first word, a second frequency of the second word, and a word distance between the first word and the second word, so as to determine a target evaluation value of the first word; and determining keywords from the plurality of words of the first document according to the target evaluation value of the first word, and determining a second document similar to the first document according to the keywords, so that the second document is archived as a recommended document of the target patient. With this technical scheme, the recommended documents of the target patient have a higher reference value, and CDA documents can be managed better.

Description

CDA document management method and system

Technical Field

The application relates to the technical field of data processing, in particular to a CDA document management method and system.

Background

CDA (Clinical Document Architecture) is a document type for recording patient information. For example, a CDA document can record a patient's medical history, examination results, treatment plans, drug prescriptions and the like. Recording patient information in CDA documents makes it convenient for patients to check information such as their own treatment status, and for medical staff to view and analyze the patient information, thereby improving the quality and efficiency of medical services.

CDA documents also enable the exchange of patient information among different medical institutions. For example, when a patient changes hospitals, the patient information recorded in a first hospital can serve as a reference for treatment in a second hospital, which reduces the communication cost between doctors and the patient and reduces repeated examinations. Effective management of CDA documents can therefore reduce patients' medical costs and improve the quality of medical services.

To manage CDA documents, the related art, for example the Chinese patent application with publication number CN117954030A, provides an interconnection and intercommunication method for electronic medical records that combines PDF and CDA: according to the patient's CDA document, shared PDF medical records of the same medical record category are provided to the patient for download.

However, in the related art the shared medical records are provided to the patient according to the patient's medical record category, so the volume of medical record data provided is large, it is difficult for the patient to find documents with reference value among the downloadable shared medical records, and the reference value of those records to the patient is low. The related art therefore cannot manage CDA documents effectively.

Disclosure of Invention

In order to solve the problem that the CDA document cannot be effectively managed in the related art, the application provides a CDA document management method and system.

According to a first aspect of an embodiment of the present application, there is provided a CDA document management method, including: acquiring a first document, corresponding to a target patient, whose document type is a CDA document, and performing word segmentation on the first document to obtain a plurality of words in the first document; dividing the first document into a plurality of windows using a target window length; when a first word and a second word among the plurality of words co-occur in the same window, acquiring a first frequency of occurrence of the first word and a second frequency of occurrence of the second word at the co-occurrence, and acquiring a word distance between the first word and the second word at the co-occurrence; determining a target evaluation value of the first word according to the first frequency, the second frequency and the word distance when the first word co-occurs with each of a plurality of second words, the target evaluation value representing the probability that the first word is a keyword; and determining keywords from the plurality of words of the first document according to the target evaluation value of the first word, and determining, according to the keywords, at least one second document similar to the first document from a plurality of CDA documents, so that the second document is archived as a recommended document of the target patient.

Therefore, the keywords of the CDA documents can better reflect the actual conditions of the patient compared with the medical record types of the patient, and therefore the second documents which are more matched with the actual conditions of the target patient can be screened out from a plurality of CDA documents and can be used as recommended documents of the target patient to be filed, so that the recommended documents of the target patient have more reference value, and better management of the CDA documents is realized.

Furthermore, the target evaluation value, which represents the probability that the first word is a keyword, is determined according to the first frequency of the first word and the second frequency of the second word when the two words co-occur, together with the word distance between them. Compared with the frequency with which a word occurs in the whole text of the first document, the frequencies and word distance of different words when they co-occur better reflect how important a word is in the document. The resulting target evaluation value therefore better represents the probability that the first word is a keyword of the first document, which helps archive, as recommended documents of the target patient, second documents that better match the patient's actual condition.

Optionally, determining the target evaluation value of the first word according to the first frequency, the second frequency and the word distance when the first word is co-present with the plurality of second words respectively includes: determining a first correlation coefficient between the first word and the second word according to the first frequency, the second frequency and the word distance when the first word and the second word coexist in the same window; determining a second correlation coefficient between the first word and the second word according to a first average value of a plurality of first correlation coefficients in which the first word and the second word coexist in a plurality of windows respectively; and determining the target evaluation value of the first word according to second average values of a plurality of second correlation coefficients between the first word and a plurality of second words.
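The two-level averaging described above can be sketched as follows. This is only an illustrative sketch: the formula for the first correlation coefficient is not given in this passage, so the per-window first correlation coefficients are taken as already computed inputs, and the function name and the example values are hypothetical.

```python
from statistics import mean


def target_evaluation_value(first_coeffs_by_second_word):
    """Average per-window coefficients, then average across second words.

    first_coeffs_by_second_word maps each second word to the list of first
    correlation coefficients, one per window in which that second word
    co-occurs with the first word (values assumed to be precomputed).
    """
    # Second correlation coefficient: first average value over the windows.
    second_coeffs = [
        mean(per_window)
        for per_window in first_coeffs_by_second_word.values()
        if per_window
    ]
    if not second_coeffs:
        return 0.0
    # Target evaluation value: second average value over all second words.
    return mean(second_coeffs)


# Hypothetical coefficients for one first word and two second words.
example = {"dizziness": [0.2, 0.4], "tinnitus": [0.6]}
score = target_evaluation_value(example)
```

Averaging within each word pair first keeps a pair that co-occurs in many windows from dominating the final value.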

Optionally, determining the target evaluation value of the first word according to the second average value of the plurality of second correlation coefficients between the first word and the plurality of second words, includes: inputting the first word into a pre-trained word vector model to obtain a first word vector; inputting the second word into a pre-trained word vector model to obtain a second word vector; taking the first similarity between the first word vector and the second word vector as a third correlation coefficient between the first word and the second word; and determining a third average value of a plurality of third correlation coefficients between the first word and a plurality of second words respectively, and adding the second average value and the third average value to obtain a target evaluation value of the first word.

In this way, the third average value of the plurality of third correlation coefficients between the first word and the plurality of second words can be used to comprehensively reflect the correlation between the first word and the other words than the first word as a whole, and the second average value and the third average value can be added to obtain the target evaluation value of the first word.
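A minimal sketch of this word-vector variant is shown below. It assumes cosine similarity as the "first similarity" (the text does not name the similarity measure) and uses a small hand-written dictionary of vectors in place of a pre-trained word vector model; all names and values are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def evaluation_with_word_vectors(first_word, second_words, vectors, second_average):
    """Add the third average value (mean vector similarity) to the second average."""
    # Third correlation coefficients: similarity of the first word to each second word.
    sims = [cosine_similarity(vectors[first_word], vectors[w]) for w in second_words]
    third_average = sum(sims) / len(sims) if sims else 0.0
    return second_average + third_average


# Toy vectors standing in for the output of a pre-trained model (assumption).
toy_vectors = {
    "pallor": [1.0, 0.0],
    "dizziness": [1.0, 0.0],
    "tinnitus": [0.0, 1.0],
}
value = evaluation_with_word_vectors("pallor", ["dizziness", "tinnitus"], toy_vectors, 0.45)
```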

Optionally, determining the target evaluation value of the first word according to the second average value of the plurality of second correlation coefficients between the first word and the plurality of second words, includes: acquiring a first frequency of occurrence of a first word in all words of a first document; acquiring the inverse frequency of a first word in CDA documents except the first document; and multiplying the first frequency, the inverse frequency and the second average value to obtain a target evaluation value of the first word.

In this way, the first frequency of the first word in all words of the first document can better reflect the importance degree of the first word in the first document, the inverse frequency of the first word in other CDA documents except the first document can better reflect the importance degree of the first word relative to other documents, and the second average value can better reflect the relevance of the first word and other words in the document on semantic information, so that the first frequency, the inverse frequency and the second average value are multiplied to obtain the target evaluation value of the first word, and the target evaluation value can better represent the probability that the first word is the keyword in the first document.
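The product of the first frequency, the inverse frequency and the second average value can be sketched as below. The text does not specify how the inverse frequency is computed, so a smoothed logarithmic inverse document frequency is assumed here; the function name and sample documents are hypothetical.

```python
import math


def tf_idf_evaluation(first_word, first_doc_words, other_docs_words, second_average):
    """Multiply term frequency, inverse frequency and the second average value."""
    if not first_doc_words:
        return 0.0
    # First frequency: share of the first document's words that are the first word.
    tf = first_doc_words.count(first_word) / len(first_doc_words)
    # Inverse frequency over the other CDA documents (smoothed log form, an assumption).
    containing = sum(1 for doc in other_docs_words if first_word in doc)
    idf = math.log((1 + len(other_docs_words)) / (1 + containing)) + 1.0
    return tf * idf * second_average


first_doc = ["pallor", "dizziness", "pallor", "tinnitus"]
other_docs = [["fever"], ["pallor", "cough"], ["cough"]]
value = tf_idf_evaluation("pallor", first_doc, other_docs, 0.45)
```

The smoothing keeps the inverse frequency positive and defined even when there are no other documents or when every other document contains the word.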

Optionally, the target window length is obtained by: arranging the words in the first document according to the sequence to obtain word strings which comprise all the words in the first document after arrangement; dividing the word strings according to preset window lengths in a plurality of different preset window lengths to obtain a plurality of divided word substrings; acquiring the word quantity of words contained in the word substring, and taking the variance of the word quantity of the word substrings as a target parameter value of a preset window length; and taking the preset window length with the minimum target parameter value in the multiple preset window lengths as the target window length.

Therefore, the preset window length with the minimum target parameter value in various preset window lengths is used as the target window length, so that the loss degree of semantic information caused by splitting the same word into different character substrings can be reduced as much as possible.

Optionally, acquiring the word distance between the first word and the second word when they co-occur among the plurality of words includes: when the first word and the second word co-occur, if a plurality of second words co-occur with the first word, acquiring a plurality of first distances between the first word and the plurality of second words; and taking the largest of the plurality of first distances as the word distance between the first word and the second word at the co-occurrence.

In this way, taking the largest of the first distances as the word distance avoids the effect on the word distance of different words lying close together merely because of window division, so that the word distance better represents the degree of association between different words and the target evaluation value of a word can be better determined.

Optionally, acquiring the word distance between the first word and the second word when they co-occur among the plurality of words includes: when the first word and the second word co-occur, if a plurality of second words co-occur with the first word, acquiring a plurality of first distances between the first word and the plurality of second words; and taking a fourth average value of the plurality of first distances as the word distance between the first word and the second word at the co-occurrence.

In this way, taking the average of the first distances as the word distance avoids an adverse effect on the degree of association between words caused by words lying close together merely because of window division, and also prevents the judgment of the degree of association from being distorted by words lying far apart.

Optionally, determining at least one second document similar to the first document from the plurality of CDA documents according to the keyword includes: determining, for a third document of the plurality of CDA documents, a second similarity between the third document and the first document according to the number of keywords contained in the third document; and taking the third document with the second similarity larger than the preset similarity threshold as a second document similar to the first document to obtain at least one second document similar to the first document.

In this way, a third document whose second similarity is greater than the preset similarity threshold is taken as a second document similar to the first document, yielding at least one second document that contains at least one of the keywords of the first document. Because the second similarity between the second document and the first document is greater than the preset similarity threshold, the second document can provide a valuable reference for the target patient's condition.
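The threshold-based selection can be illustrated with the sketch below. The text only states that the second similarity depends on the number of keywords contained in the third document; the sketch assumes it is the fraction of the first document's keywords found in the candidate, and the document identifiers and threshold are hypothetical.

```python
def keyword_similarity(keywords, doc_words):
    """Fraction of the keywords that appear in the candidate document (assumed measure)."""
    if not keywords:
        return 0.0
    vocabulary = set(doc_words)
    return sum(1 for keyword in keywords if keyword in vocabulary) / len(keywords)


def select_second_documents(keywords, candidate_docs, threshold):
    """Return ids of candidate (third) documents whose similarity exceeds the threshold."""
    return [
        doc_id
        for doc_id, words in candidate_docs.items()
        if keyword_similarity(keywords, words) > threshold
    ]


keywords = ["pallor", "dizziness", "tinnitus"]
candidates = {
    "doc_a": ["pallor", "dizziness", "fever"],
    "doc_b": ["cough"],
}
recommended = select_second_documents(keywords, candidates, threshold=0.5)
```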

Optionally, obtaining the first document with the document type corresponding to the target patient being the CDA document includes: acquiring an initial fourth document of which the document type of the target patient is a CDA document; and acquiring a key diagnosis information set, and processing the fourth document by using the key diagnosis information set to acquire a first document containing key diagnosis information in the key diagnosis information set.

In this way, the fourth document is processed through the key diagnosis information set to obtain the first document containing the key diagnosis information in the key diagnosis information set, so that the redundancy of data in the first document can be reduced, the calculation amount can be reduced under the condition that a recommended document with a reference value is recommended for a target patient, and the running memory pressure of the equipment is reduced.

According to a second aspect of an embodiment of the present application, there is provided a CDA document management system including: a processor and a memory storing computer program instructions which, when executed by the processor, implement the steps of the method for managing CDA documents provided in the first aspect of the present application.

The technical scheme provided by the embodiment of the application can comprise the following beneficial effects: because the second document is determined according to the keywords of the first document, compared with the medical record type of the patient, the keywords in the CDA document can better reflect the actual condition of the patient, so that the second document which is more matched with the actual condition of the target patient can be screened out from a plurality of CDA documents through the keywords of the CDA document, and the second document is filed as the recommended document of the target patient, so that the recommended document of the target patient has more reference value, and better management of the CDA document is realized.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application as claimed.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and together with the description, serve to explain the principles of the application.

FIG. 1 is a schematic diagram of an exemplary application scenario of an embodiment of the present application;

FIG. 2 is a flowchart illustrating a method of managing CDA documents according to an example embodiment;

FIG. 3 is a schematic diagram illustrating a structure of a CDA document management system according to an exemplary embodiment.

Detailed Description

Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples do not represent all implementations consistent with the application. Rather, they are merely examples of apparatus and methods consistent with aspects of the application as detailed in the accompanying claims.

First, the application scenario of the embodiments of the present application is briefly introduced. As a document recording patient information, a CDA document enables the exchange of patient information among different medical institutions. As shown in FIG. 1, when a patient visits a first hospital, the patient information can be recorded at a first terminal of the first hospital to generate a CDA document of the patient, and the first terminal can send the CDA document to a server; alternatively, the first terminal located at the first hospital may send the recorded patient information to the server, and the server generates the CDA document of the patient.

The second terminal may obtain the CDA document of the patient from the server so that the patient information can be learned from the CDA document. The second terminal may be, for example, a terminal of the patient; or it may be a terminal of a second hospital, different from the first hospital, that the patient visits, in which case medical staff in the second hospital may adopt a targeted medical solution for the patient according to the patient information recorded in the CDA document.

In view of the above technical problems, the embodiments of the present application provide a method and a system for managing CDA documents, in which, since a second document is determined according to a keyword of a first document, the keyword in the CDA document can better reflect an actual situation of a patient compared with a medical record type of the patient, so that the second document that is more matched with the actual situation of a target patient can be screened out from a plurality of CDA documents by the keyword of the CDA document, and the second document is archived as a recommended document of the target patient, so that the recommended document of the target patient has a more reference value, and better management of the CDA document is achieved.

It should be noted that, all actions of acquiring signals, information or data in the present application are performed under the condition of conforming to the corresponding data protection rule policy of the country of the location and obtaining the authorization given by the owner of the corresponding device.

FIG. 2 is a flowchart illustrating a method of managing CDA documents according to an exemplary embodiment. As shown in FIG. 2, the method includes the following steps.

In step S101, a first document of which the document type corresponding to the target patient is a CDA document is acquired, and word segmentation processing is performed on the first document, so as to obtain a plurality of words in the first document.

The CDA document of the target patient may record various information about the target patient, such as family history, historical disease symptoms, historical examination records, gender, age and name. The information recorded in the CDA document can be shared among different hospitals when the patient visits different hospitals, and among different departments when the patient visits different departments, so that a corresponding medical solution can be given according to the patient's actual condition and repeated examinations can be avoided, thereby reducing the patient's medical costs and improving the patient's experience.

The target patient may have corresponding identification information, for example, the identification information of the target patient may be personal information such as an identification card number or a mobile phone number of the patient, or the identification information of the target patient may be a number of a patient card of the target patient in a hospital; the CDA documents of the target patient may be obtained by inputting the identification information of the target patient into a target database, such as a server or local storage device, which may store a plurality of CDA documents of a plurality of patients, for example.

In one embodiment, obtaining a first document of which the document type corresponding to the target patient is a CDA document includes: acquiring an initial fourth document of which the document type of the target patient is a CDA document; and acquiring a key diagnosis information set, and processing the fourth document by using the key diagnosis information set to acquire a first document containing key diagnosis information in the key diagnosis information set.

For example, the key diagnosis information set may include the patient's family history, historical symptoms, historical examination records, gender and age.
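As a rough illustration of processing the fourth document with the key diagnosis information set, the sketch below treats a CDA document as a simple mapping from section name to text and keeps only the sections named in the set. This flat representation, the section names and the function name are assumptions made for illustration; a real CDA document is structured XML and would need to be parsed first.

```python
# Hypothetical key diagnosis information set, mirroring the example in the text.
KEY_DIAGNOSIS_INFO = {
    "family history",
    "historical symptoms",
    "historical examination records",
    "gender",
    "age",
}


def extract_first_document(fourth_document, key_info=KEY_DIAGNOSIS_INFO):
    """Keep only the sections of the initial document that are in the key set."""
    return {
        section: text
        for section, text in fourth_document.items()
        if section in key_info
    }


fourth_document = {
    "name": "(omitted)",
    "age": "52",
    "gender": "female",
    "family history": "hypertension",
    "billing notes": "(omitted)",
}
first_document = extract_first_document(fourth_document)
```

Dropping sections outside the set is what reduces redundant data before word segmentation and keyword scoring.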

The first document is subjected to word segmentation processing to obtain a plurality of words in the first document, and keywords in the plurality of words can be conveniently obtained, so that the CDA document matched with the actual situation of the target patient can be determined according to the keywords.

Word segmentation is a step in natural language processing that segments a continuous text sequence into a sequence of meaningful words. Common word segmentation approaches include methods based on character string matching, on understanding, on statistics, on deep learning, and on rules. For example, a rule-based method performs word segmentation using a predefined rule set, which may include vocabulary rules, syntax rules, semantic rules and the like; a deep-learning-based method may use a pre-trained Bi-LSTM (Bidirectional Long Short-Term Memory) network model, a CNN (Convolutional Neural Network) model, a Transformer model or the like to segment the document.

In step S102, window division is performed on the first document with the target window length, and a plurality of windows are obtained.

For example, for text nodes in a first document, text information contained in the text nodes generally has continuity, window division is performed on the first document by using a target window length, multiple sentences with a certain continuity can be divided into the same window, and thus multiple divided windows are obtained, and the text nodes can be text paragraphs, paragraph titles and the like in the document.

Dividing the first document into windows using the target window length therefore makes it easy to determine whether different words co-occur in the same window. Because the sentences within one window are usually logically related, words that co-occur in the same window are more likely to be related to each other, which helps determine the probability that a word is a keyword of the first document.
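Window division by a fixed length can be sketched as follows. The sketch assumes non-overlapping windows measured in characters; whether the windows overlap is not stated in this passage, and the function name is hypothetical.

```python
def divide_into_windows(text, window_length):
    """Split text into consecutive, non-overlapping windows of window_length characters."""
    if window_length <= 0:
        raise ValueError("window_length must be positive")
    return [text[i:i + window_length] for i in range(0, len(text), window_length)]


windows = divide_into_windows("abcdefghij", 4)
# The last window may be shorter than window_length.
```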

In one embodiment, the target window length may be obtained by: arranging the words in the first document according to the sequence to obtain word strings which comprise all the words in the first document after arrangement; dividing word strings according to preset window lengths in a plurality of different preset window lengths to obtain a plurality of divided word substrings; acquiring the word quantity of words contained in the word substring, and taking the variance of the word quantity of the word substrings as a target parameter value of a preset window length; and taking the preset window length with the minimum target parameter value in the multiple preset window lengths as the target window length.

For example, the preset window length may refer to a length corresponding to the number of characters, and the plurality of different preset window lengths may include, for example, a length corresponding to 10 characters, a length corresponding to 15 characters, a length corresponding to 20 characters, a length corresponding to 21 characters, and the like.

Arranging the words in the first document according to the sequence to obtain word strings which comprise all the words in the first document after arrangement, dividing the word strings according to the preset window length to obtain a plurality of divided word sub-strings, and dividing the word sub-strings with different lengths under different preset window lengths; for example, when a word string is divided by 10 character numbers, the number of characters of the obtained plurality of word substrings is 10; when the word string is divided by 15 character numbers, the number of characters of the obtained plurality of word substrings is 15.

The number of words contained in each word substring is obtained, and the variance of the word counts of the plurality of word substrings is taken as the target parameter value of the preset window length. Dividing the word string by a preset window length may leave the words in some of the resulting substrings without logical coherence. For example, a substring obtained by dividing at 10 characters is "blood system: no dizziness, pale", whereas the relatively complete information is "blood system: no dizziness, pale, debilitation history, no dim eyesight, tinnitus and the like". Here the two characters of the word for "pale" are split into two different substrings, one before and one after the division point, and the number of words in the substring is 2.

Under the same preset window length, different substrings contain different numbers of words, and some words lose their original semantic information because they are split across different substrings. Under different preset window lengths, different words are split; a word that is split under one preset window length may be kept within a single substring under another, so that its semantic information is fully preserved. The variance of the word counts of the plurality of word substrings reflects the overall degree of semantic information loss caused by splitting words across substrings. Therefore, taking the preset window length with the smallest target parameter value among the plurality of preset window lengths as the target window length reduces, as far as possible, the loss of semantic information caused by splitting a word across different substrings.
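The selection of the target window length by minimum variance can be sketched as below. Two assumptions are made that the text does not settle: a word is counted in a substring only if it lies wholly inside that substring, and the population variance is used. Function names and the sample words are hypothetical.

```python
from statistics import pvariance


def word_counts_per_window(words, window_length):
    """Count, for each fixed-length window, the words lying wholly inside it."""
    total_chars = sum(len(word) for word in words)
    n_windows = max(1, -(-total_chars // window_length))  # ceiling division
    counts = [0] * n_windows
    offset = 0
    for word in words:
        start, end = offset, offset + len(word)
        offset = end
        if not word:
            continue
        first_window = start // window_length
        last_window = (end - 1) // window_length
        # A word split across two windows is not counted (assumption).
        if first_window == last_window:
            counts[first_window] += 1
    return counts


def choose_target_window_length(words, candidate_lengths):
    """Pick the candidate length whose per-window word counts have minimum variance."""
    return min(
        candidate_lengths,
        key=lambda length: pvariance(word_counts_per_window(words, length)),
    )


words = ["ab", "cd", "efgh", "ij"]
target_length = choose_target_window_length(words, [2, 4, 5])
```

With these sample words, a length of 4 splits no word, while lengths 2 and 5 both cut "efgh" in two, which raises the variance of their word counts.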

In step S103, under the condition that the first word and the second word in the plurality of words coexist in the same window, a first frequency of occurrence of the first word and a second frequency of occurrence of the second word in the co-occurrence process are obtained, and a word distance between the first word and the second word in the co-occurrence process is obtained.

When a first word and a second word among the plurality of words co-occur in the same window, the two words are likely to be related. The first frequency of occurrence of the first word reflects the importance of the first word in the window in which it co-occurs with the second word: the greater the first frequency when the first word co-occurs with the second word, the more important the first word is in that window; conversely, the smaller the first frequency, the less important the first word is in that window.

Likewise, when the first word and the second word co-occur in the same window, the second frequency of occurrence of the second word reflects the importance of the second word in that window: the greater the second frequency when the second word co-occurs with the first word, the more important the second word is in that window; conversely, the smaller the second frequency, the less important the second word is in that window.
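Collecting the first and second frequencies for each window in which the two words co-occur might look like the sketch below. It assumes each window is already a list of segmented words and that "frequency" means the raw occurrence count within the window; both are assumptions, and the function name is hypothetical.

```python
from collections import Counter


def cooccurrence_frequencies(windows, first_word, second_word):
    """Return (window index, first frequency, second frequency) for co-occurrence windows."""
    results = []
    for index, window_words in enumerate(windows):
        counts = Counter(window_words)
        # Only windows containing both words count as co-occurrences.
        if counts[first_word] and counts[second_word]:
            results.append((index, counts[first_word], counts[second_word]))
    return results


windows = [
    ["pallor", "dizziness", "pallor"],
    ["tinnitus"],
    ["dizziness", "pallor"],
]
frequencies = cooccurrence_frequencies(windows, "pallor", "dizziness")
```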

In one embodiment, obtaining the word distance between the first word and the second word when the first word and the second word co-occur in the plurality of words comprises: under the condition that the first word and the second word coexist in the plurality of words, if the number of the second words which coexist with the first word is a plurality of, acquiring a plurality of first distances between the first word and a plurality of second words; and taking the largest one of the first distances as the word distance between the first word and the second word in the co-occurrence.

For example, because the window divides the text information in the document, words that are only weakly associated may be placed in the same window, so that two weakly associated words appear close together within the window. When a plurality of second words co-occur with the first word, taking the largest of the plurality of first distances as the word distance prevents such short distances, which are caused by window division, from distorting the word distance. The word distance can therefore better represent the degree of association between different words, and the target evaluation value of a word can be better determined.

In another embodiment, obtaining the word distance between the first word and the second word when the first word and the second word co-occur comprises: when the first word and the second word co-occur and a plurality of second words co-occur with the first word, acquiring a plurality of first distances between the first word and the plurality of second words; and taking a fourth average value of the plurality of first distances as the word distance between the first word and the second word when they co-occur.

For example, a first word and a plurality of second words are present in the same window; the first of the plurality of second words may be located at the start position of the window, and the last of the plurality of second words may be located at the end position of the window. Normally, the first word is neither adjacent to the second word nor separated from the second word by the whole window. Therefore, the fourth average value of the plurality of first distances is taken as the word distance between the first word and the second word when they co-occur, so that the word distance better reflects the degree of association between the first word and the second word.

Thus, taking the fourth average value of the plurality of first distances as the word distance between the first word and the second word when they co-occur avoids the adverse effect on the degree of association caused by words being placed relatively close together by window division, and likewise avoids misjudging the degree of association because the words are relatively far apart.
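The two ways of aggregating the first distances described above (largest value, or fourth average value) can be sketched as follows. This is only an illustration under stated assumptions: distance is measured as the difference in token positions within the window, and the first occurrence of the first word is used as the reference position. The names first_distances and word_distance and the mode parameter are hypothetical.

```python
from statistics import mean

def first_distances(window, first_word, second_word):
    """Position differences between the first word and every occurrence
    of the second word in the window.

    Assumption: the first occurrence of first_word is the reference point.
    """
    anchor = window.index(first_word)
    return [abs(i - anchor) for i, w in enumerate(window) if w == second_word]

def word_distance(window, first_word, second_word, mode="max"):
    """Aggregate the first distances: 'max' for the largest distance,
    'mean' for the average of all first distances."""
    distances = first_distances(window, first_word, second_word)
    if not distances:
        raise ValueError("the two words do not co-occur in this window")
    return max(distances) if mode == "max" else mean(distances)

# Hypothetical window: "pain" occurs three times around "thyroid"
window = ["pain", "swelling", "thyroid", "nodule", "pain", "fever", "pain"]
largest = word_distance(window, "thyroid", "pain", mode="max")
average = word_distance(window, "thyroid", "pain", mode="mean")
print(largest, average)
```

Here the first distances are 2, 2 and 4, so the largest-distance variant gives 4 while the average variant gives a value between the extremes.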

In step S104, a target evaluation value of the first word is determined according to the first frequency, the second frequency, and the word distance when the first word co-occurs with each of a plurality of second words.

The target evaluation value is used for representing the probability that the first word is a keyword; the larger the target evaluation value is, the larger the probability that the first word is a keyword in the first document is; conversely, the smaller the target evaluation value, the smaller the probability that the first word is a keyword in the first document.

In one embodiment, determining the target evaluation value of the first word according to the first frequency, the second frequency and the word distance when the first word co-occurs with each of the plurality of second words includes: determining a first correlation coefficient between the first word and the second word according to the first frequency, the second frequency and the word distance when the first word and the second word co-occur in the same window; determining a second correlation coefficient between the first word and the second word according to a first average value of a plurality of first correlation coefficients obtained when the first word and the second word co-occur in a plurality of windows respectively; and determining the target evaluation value of the first word according to a second average value of a plurality of second correlation coefficients between the first word and the plurality of second words.

For example, the first correlation coefficient is positively correlated with the first frequency, the first correlation coefficient is positively correlated with the second frequency, and the first correlation coefficient is negatively correlated with the word distance.

The following takes an exemplary calculation formula as an example to illustrate the process of obtaining the first correlation coefficient in the embodiment of the present application (the formula is omitted in the source text); wherein X₁ is the first correlation coefficient between the first word and the second word, n₁ is the first frequency of the first word in a window in which the first word and the second word co-occur, n₂ is the second frequency of the second word in the window in which the first word and the second word co-occur, norm is a normalization function, exp is the exponential function with the natural constant e as its base, and d is the word distance in the window in which the first word and the second word co-occur.

Here, the first correlation coefficient between the first word and the second word may also be determined in combination with the difference between the first frequency and the second frequency, the difference being negatively correlated with the first correlation coefficient; the smaller the difference between the first frequency and the second frequency, the higher the probability that the first word and the second word occur together and the greater the correlation between the first word and the second word, which helps to obtain a more accurate target evaluation value of the first word.
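Since the exemplary formula itself is not available in this text, the following is only a hypothetical form that satisfies the stated relationships: it increases with n₁ and with n₂ and decreases with d. The choice norm(x) = x / (1 + x), the product n₁ · n₂ and the name first_correlation are assumptions made for illustration; the sketch does not model the frequency-difference term and should not be read as the formula of the application.

```python
import math

def norm(x):
    """Hypothetical normalization: maps x >= 0 into [0, 1)."""
    return x / (1.0 + x)

def first_correlation(n1, n2, d):
    """Hypothetical X1: grows with n1 and n2, decays exponentially with d."""
    return norm(n1 * n2) * math.exp(-d)

# Higher frequencies raise the coefficient, larger distance lowers it
base = first_correlation(2, 2, 1)
more_frequent = first_correlation(3, 2, 1)
farther_apart = first_correlation(2, 2, 3)
print(base, more_frequent, farther_apart)
```

Any other monotone combination of the same three quantities would serve equally well as an illustration.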

According to a first average value of a plurality of first correlation coefficients in which the first word and the second word coexist in a plurality of windows, a second correlation coefficient between the first word and the second word can be determined; because the first document is divided into a plurality of windows, through the first average value of a plurality of first correlation coefficients in which the first word and the second word respectively coexist in the plurality of windows, the correlation of the first word and the second word in the plurality of windows can be comprehensively considered, so that the correlation of the first word and the second word in the whole is obtained.

Because the correlation between the first word and different second words differs, the target evaluation value of the first word can be determined according to the second average value of the plurality of second correlation coefficients between the first word and the plurality of second words. In this way the correlations between the first word and all of the second words are considered together, so that the target evaluation value of the first word better represents the probability that the first word is a keyword in the first document.

For example, the larger the correlation coefficient between the first word and other words than the first word as a whole, the more likely the first word is a keyword in the first document.
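The two averaging stages described above can be sketched as follows, assuming the first correlation coefficients have already been computed for each window in which a pair co-occurs. The dictionary layout, the sample values and the function names second_correlation and target_evaluation are hypothetical.

```python
from statistics import mean

def second_correlation(first_coeffs_per_window):
    """First average: mean of the first correlation coefficients over all
    windows in which the first word and one second word co-occur."""
    return mean(first_coeffs_per_window)

def target_evaluation(first_coeffs_by_second_word):
    """Second average: mean of the second correlation coefficients over
    all second words that co-occur with the first word."""
    second_coeffs = [
        second_correlation(values)
        for values in first_coeffs_by_second_word.values()
    ]
    return mean(second_coeffs)

# Hypothetical first correlation coefficients of "thyroid" with two second words
x1_values = {
    "nodule": [0.6, 0.8],      # co-occurs in two windows
    "pain": [0.2, 0.4, 0.6],   # co-occurs in three windows
}
score = target_evaluation(x1_values)
print(score)
```

Averaging per pair first and across pairs second means that a second word appearing in many windows does not outweigh one appearing in few.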

In one embodiment, determining the target evaluation value of the first word based on a second average of a plurality of second correlation coefficients between the first word and a plurality of second words, respectively, includes: inputting the first word into a pre-trained word vector model to obtain a first word vector; inputting the second word into a pre-trained word vector model to obtain a second word vector; taking the first similarity between the first word vector and the second word vector as a third correlation coefficient between the first word and the second word; and determining a third average value of a plurality of third correlation coefficients between the first word and a plurality of second words respectively, and adding the second average value and the third average value to obtain a target evaluation value of the first word.

The word vector model is a model used in natural language processing to convert words into vectors, and the vectors output by the word vector model capture semantic information of the words; the word vector model may be, for example, a Word2Vec model, a GloVe model, or a FastText model.

The first word is converted into a first word vector, the second word is converted into a second word vector, and the first similarity between the first word vector and the second word vector is computed, so that the first similarity better reflects the correlation between the first word and the second word. The similarity between the vectors may be, for example, the cosine similarity of the two vectors, or may be derived from the Euclidean distance between the two vectors in a coordinate system, a smaller distance indicating a greater similarity.

The third average value of the plurality of third correlation coefficients between the first word and the plurality of second words can comprehensively reflect the correlation between the first word and other words except the first word, and the second average value and the third average value are added to obtain the target evaluation value of the first word.
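A minimal sketch of this embodiment is given below, using cosine similarity as the first similarity. To keep the example self-contained, a small hand-written dictionary of two-dimensional vectors stands in for the output of a pre-trained word vector model; these vectors, the second average value 0.55 and the function names are hypothetical.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def target_with_vectors(first_word, second_words, vectors, second_average):
    """Add the third average value (mean vector similarity) to the
    second average value to obtain the target evaluation value."""
    third_coeffs = [
        cosine_similarity(vectors[first_word], vectors[w]) for w in second_words
    ]
    third_average = sum(third_coeffs) / len(third_coeffs)
    return second_average + third_average

# Hypothetical stand-in for vectors produced by a pre-trained model
vectors = {
    "thyroid": [1.0, 0.0],
    "nodule": [1.0, 0.0],
    "pain": [0.0, 1.0],
}
score = target_with_vectors("thyroid", ["nodule", "pain"], vectors, second_average=0.55)
print(score)
```

Because two averages are added, the result in this sketch is not confined to [0, 1], so any threshold applied to it would need to match the resulting range.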

In one embodiment, determining the target evaluation value of the first word based on a second average of a plurality of second correlation coefficients between the first word and a plurality of second words, respectively, includes: acquiring a first frequency of occurrence of a first word in all words of a first document; acquiring the inverse frequency of a first word in CDA documents except the first document; and multiplying the first frequency, the inverse frequency and the second average value to obtain a target evaluation value of the first word.

TF-IDF (Term Frequency-Inverse Document Frequency) is a statistical method commonly used in information retrieval and text mining to evaluate the importance of a word to a document in a document set or corpus. The higher the TF-IDF value, the more important the word is in the current document and the rarer it is in the whole document set; TF-IDF is composed of TF and IDF.

TF (Term Frequency) refers to the number of times a word appears in a document; a higher term frequency indicates that the word is more important in the document. However, using the raw term frequency alone may bias the result towards longer documents, so the term frequency is typically normalized, for example by dividing it by the total number of words in the document. IDF (Inverse Document Frequency) is obtained by dividing the total number of documents in the document set by the number of documents that contain the word and then taking the logarithm; the purpose of IDF is to reduce the weight of words that are prevalent in the document set while increasing the weight of words that are unusual in the document set.
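The multiplication of the term frequency, the inverse document frequency and the second average value can be sketched as follows. The smoothed IDF form log((1 + N) / (1 + n)), which avoids division by zero and negative values, is an assumption chosen for the example; the sample documents, the second average value 0.55 and the function names are hypothetical.

```python
import math

def term_frequency(word, document_words):
    """Occurrences of the word divided by the total word count of the document."""
    return document_words.count(word) / len(document_words)

def inverse_document_frequency(word, other_documents):
    """Smoothed IDF over the CDA documents other than the first document.

    Assumption: log((1 + N) / (1 + n)), where N is the number of documents
    and n is the number of documents containing the word.
    """
    containing = sum(1 for doc in other_documents if word in doc)
    return math.log((1 + len(other_documents)) / (1 + containing))

def target_with_tfidf(word, document_words, other_documents, second_average):
    """Product of TF, IDF and the second average value."""
    tf = term_frequency(word, document_words)
    idf = inverse_document_frequency(word, other_documents)
    return tf * idf * second_average

# Hypothetical segmented documents
first_document = ["thyroid", "nodule", "pain", "thyroid"]
other_documents = [
    ["thyroid", "ultrasound"],
    ["fever", "cough"],
    ["fracture", "pain"],
    ["cough", "headache"],
]
score = target_with_tfidf("thyroid", first_document, other_documents, second_average=0.55)
print(score)
```

A word that is frequent in the first document but present in few other documents therefore scales the second average value up relative to a word that is common everywhere.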

In step S105, a keyword is determined from among a plurality of words of the first document according to the target evaluation value of the first word, and at least one second document similar to the first document is determined from among the plurality of CDA documents according to the keyword to archive the second document as a recommended document of the target patient.

Determining a keyword from the plurality of words of the first document according to the target evaluation value of the first word may include, for example: taking a first word whose target evaluation value is greater than a preset evaluation threshold as a keyword in the first document.

The preset evaluation threshold may be determined according to the value range of the target evaluation value. For example, when the value range of the target evaluation value is [0, 1], the preset evaluation threshold may lie in the range [0.6, 0.9]; for example, the preset evaluation threshold may be 0.75.

Because the target evaluation value represents the probability that the first word is a keyword in the first document, the larger the target evaluation value of the first word, the more likely the first word is a keyword in the first document; conversely, the smaller the target evaluation value of the first word, the less likely the first word is a keyword in the first document. Therefore, taking a first word whose target evaluation value is greater than the preset evaluation threshold as a keyword in the first document allows the keywords in the first document to be screened out effectively.

In one possible implementation, a number of first words with the largest target evaluation values may be used as keywords in the first document; for example, the 5 words with the largest target evaluation values may be used as keywords in the first document.
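Both keyword selection strategies described above (threshold-based and largest-values-based) can be sketched as follows, assuming the target evaluation values are held in a dictionary keyed by word. The sample scores and the function names are hypothetical; the default threshold 0.75 and the default count 5 follow the examples given in the text.

```python
def keywords_by_threshold(scores, threshold=0.75):
    """Keep every word whose target evaluation value exceeds the threshold."""
    return [word for word, value in scores.items() if value > threshold]

def keywords_top_k(scores, k=5):
    """Keep the k words with the largest target evaluation values."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k]

# Hypothetical target evaluation values in the range [0, 1]
scores = {"thyroid": 0.91, "nodule": 0.82, "pain": 0.77, "patient": 0.30, "the": 0.05}
by_threshold = keywords_by_threshold(scores)
top_two = keywords_top_k(scores, k=2)
print(by_threshold, top_two)
```

The threshold variant returns a variable number of keywords per document, while the top-k variant always returns a fixed number regardless of how high the values are.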

In one embodiment, determining at least one second document from the plurality of CDA documents that is similar to the first document based on the keywords comprises: determining, for a third document of the plurality of CDA documents, a second similarity between the third document and the first document according to the number of keywords contained in the third document; and taking the third document with the second similarity larger than the preset similarity threshold as a second document similar to the first document to obtain at least one second document similar to the first document.

For example, if the keywords determined from the first document of the target patient are "thyroid", "pain", and "nodule", the second similarity between the third document and the first document may be determined according to the number of keywords included in the third document among the CDA documents, e.g., for the third document including all the keywords of the first document at the same time, the second similarity between the third document and the first document is 100%, and for the third document including no keyword among the keywords of the first document, the second similarity between the third document and the first document is 0.

By taking a third document whose second similarity is greater than the preset similarity threshold as a second document similar to the first document, at least one second document that contains at least one of the keywords of the first document is obtained. Because the second similarity between the second document and the first document is greater than the preset similarity threshold, the second document can provide a valuable reference for the condition of the target patient.
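The keyword-based document matching described above can be sketched as follows. The sketch assumes the second similarity is the fraction of the first document's keywords that appear in the third document, which is consistent with the 100% and 0 cases given in the example but is otherwise an assumption; the threshold 0.5, the document identifiers and the function names are hypothetical.

```python
def keyword_similarity(keywords, document_words):
    """Fraction of the keywords that occur in the document (0.0 to 1.0)."""
    present = set(document_words)
    return sum(1 for keyword in keywords if keyword in present) / len(keywords)

def similar_documents(keywords, documents, threshold=0.5):
    """Identifiers of documents whose similarity exceeds the threshold."""
    return [
        doc_id
        for doc_id, words in documents.items()
        if keyword_similarity(keywords, words) > threshold
    ]

keywords = ["thyroid", "pain", "nodule"]
# Hypothetical segmented CDA documents keyed by identifier
documents = {
    "A": ["thyroid", "nodule", "pain", "ultrasound"],
    "B": ["thyroid", "nodule", "biopsy"],
    "C": ["fracture", "cast"],
}
recommended = similar_documents(keywords, documents)
print(recommended)
```

Document A contains all three keywords, document B contains two of them and document C contains none, so only A and B exceed the assumed threshold.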

According to the CDA document management method provided by the embodiment of the application, since the second document is determined according to the keywords of the first document, compared with the medical record type of a patient, the keywords in the CDA document can better reflect the actual condition of the patient, so that the second document which is more matched with the actual condition of the target patient can be screened out from a plurality of CDA documents through the keywords of the CDA document, and the second document is used as the recommended document of the target patient for archiving, so that the recommended document of the target patient has more reference value, and better management of the CDA document is realized.

Fig. 3 illustrates a schematic structure of a CDA document management system 1000 according to an exemplary embodiment. Referring to fig. 3, the CDA document management system 1000 includes: a processor 1100 and a memory 1200, said memory 1200 storing computer program instructions which, when executed by said processor 1100, implement all or part of the steps of the CDA document management method of the present application.

For example, the processor 1100 may be configured to perform the steps of: acquiring a first document, corresponding to a target patient, whose document type is a CDA document, and performing word segmentation processing on the first document to acquire a plurality of words in the first document; performing window division on the first document using a target window length to obtain a plurality of windows; when a first word and a second word among the plurality of words co-occur in the same window, acquiring a first frequency of occurrence of the first word and a second frequency of occurrence of the second word when the first word and the second word co-occur, and acquiring a word distance between the first word and the second word when they co-occur; determining a target evaluation value of the first word according to the first frequency, the second frequency and the word distance when the first word co-occurs with each of a plurality of second words, the target evaluation value being used to represent the probability that the first word is a keyword; and determining, according to the target evaluation value of the first word, a keyword from the plurality of words of the first document, and determining, according to the keyword, at least one second document similar to the first document from a plurality of CDA documents, so as to archive the second document as a recommended document of the target patient.

According to the CDA document management system provided by the embodiment of the application, since the second document is determined according to the keywords of the first document, compared with the medical record type of a patient, the keywords in the CDA document can better reflect the actual condition of the patient, so that the second document which is more matched with the actual condition of the target patient can be screened out from a plurality of CDA documents through the keywords of the CDA document, and the second document is used as the recommended document of the target patient for archiving, so that the recommended document of the target patient has more reference value, and better management of the CDA document is realized.

It is to be understood that the features of the various embodiments of the application described herein may be combined with each other, unless specifically indicated otherwise.

Although terms such as "first," "second," and "third" may be used herein to describe various elements, components, regions, layers or sections, these elements, components, regions, layers or sections are not limited by these terms. Rather, these terms are only used to distinguish one component, part, region, layer or section from another component, part, region, layer or section. Thus, a first component, part, region, layer or section discussed in examples described herein could also be termed a second component, part, region, layer or section without departing from the teachings of the examples.

In addition, the terms "first," "second," are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include at least one such feature. In the description herein, the meaning of "plurality" means at least two, e.g., two, three, etc., unless specifically defined otherwise.

Furthermore, the word "exemplary" is used herein to mean serving as an example, instance, illustration. Any aspect or design described herein as "exemplary" is not necessarily to be construed as advantageous over other aspects or designs. Rather, the use of the word exemplary is intended to present concepts in a concrete fashion. As used herein, the term "or" is intended to mean an inclusive "or" rather than an exclusive "or".

Also, although the application has been shown and described with respect to one or more implementations, equivalent alterations and modifications will occur to others skilled in the art upon the reading and understanding of this specification and the annexed drawings. In particular regard to the various functions performed by the above described components (e.g., elements, resources, etc.), the terms used to describe such components are intended to correspond, unless otherwise indicated, to any component which performs the specified function of the described component (which is functionally equivalent), even though not structurally equivalent to the disclosed structure.

In addition, while a particular feature of the application may have been disclosed with respect to only one of several implementations, such feature may be combined with one or more other features of the other implementations as may be desired and advantageous for any given or particular application.

Other embodiments of the application will be apparent to those skilled in the art from consideration of the specification and practice of the application disclosed herein. This application is intended to cover any variations, uses, or adaptations of the application following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the application pertains and as may be applied to the instant claims. The description and examples are to be considered as exemplary only.

It is to be understood that the application is not limited to the precise arrangements and instrumentalities shown in the drawings, which have been described above, and that various modifications and changes may be effected without departing from the scope thereof.

Claims

1. A method for managing CDA documents, comprising:

acquiring a first document with a CDA document type corresponding to a target patient, and performing word segmentation processing on the first document to acquire a plurality of words in the first document;

performing window division on the first document using a target window length to obtain a plurality of windows;

when a first word and a second word among the plurality of words co-occur in the same window, acquiring a first frequency of occurrence of the first word and a second frequency of occurrence of the second word when the first word and the second word co-occur, and acquiring a word distance between the first word and the second word when the first word and the second word co-occur;

determining a target evaluation value of the first word according to the first frequency, the second frequency and the word distance when the first word co-occurs with each of a plurality of second words, the target evaluation value being used to represent the probability that the first word is a keyword;

determining, according to the target evaluation value of the first word, a keyword from the plurality of words of the first document, and determining, according to the keyword, at least one second document similar to the first document from a plurality of CDA documents, so as to archive the second document as a recommended document of the target patient.

2. The CDA document management method according to claim 1, wherein determining the target evaluation value of the first word based on the first frequency, the second frequency, and the word distance when the first word is co-present with the plurality of second words, respectively, comprises:

determining a first correlation coefficient between the first word and the second word according to the first frequency, the second frequency and the word distance when the first word and the second word coexist in the same window;

determining a second correlation coefficient between the first word and the second word according to a first average value of a plurality of first correlation coefficients in which the first word and the second word coexist in a plurality of windows respectively;

and determining the target evaluation value of the first word according to a second average value of a plurality of second correlation coefficients between the first word and the plurality of second words.

3. The CDA document management method according to claim 2, wherein determining the target evaluation value of the first word based on the second average of the plurality of second correlation coefficients between the first word and the plurality of second words, respectively, comprises:

inputting the first word into a pre-trained word vector model to obtain a first word vector;

inputting the second word into the pre-trained word vector model to obtain a second word vector;

taking the first similarity between the first word vector and the second word vector as a third correlation coefficient between the first word and the second word;

and determining a third average value of a plurality of third correlation coefficients between the first word and the plurality of second words respectively, and adding the second average value and the third average value to obtain the target evaluation value of the first word.

4. The CDA document management method according to claim 2, wherein determining the target evaluation value of the first word based on the second average of the plurality of second correlation coefficients between the first word and the plurality of second words, respectively, comprises:

acquiring a first frequency of occurrence of the first word in all words of the first document;

acquiring the inverse frequency of the first word in CDA documents other than the first document;

and multiplying the first frequency, the inverse frequency and the second average value to obtain the target evaluation value of the first word.

5. The CDA document management method according to claim 1, wherein the target window length is obtained by:

arranging the words in the first document in order to obtain a word string comprising all of the arranged words in the first document;

dividing the word string according to each of a plurality of different preset window lengths to obtain a plurality of divided word substrings;

acquiring the number of words contained in each word substring, and taking the variance of the numbers of words of the plurality of word substrings as a target parameter value of the preset window length;

and taking the preset window length with the smallest target parameter value among the plurality of preset window lengths as the target window length.

6. The method for managing a CDA document according to claim 1, wherein obtaining the word distance between the first word and the second word when the first word and the second word co-occur comprises:

when the first word and the second word co-occur and a plurality of second words co-occur with the first word, acquiring a plurality of first distances between the first word and the plurality of second words;

and taking the largest of the plurality of first distances as the word distance between the first word and the second word when the first word and the second word co-occur.

7. The method for managing a CDA document according to claim 1, wherein obtaining the word distance between the first word and the second word when the first word and the second word co-occur comprises:

and taking a fourth average value of the plurality of first distances as the word distance between the first word and the second word when the first word and the second word co-occur.

8. The method of managing CDA documents of claim 1, wherein determining at least one second document similar to the first document from among the plurality of CDA documents based on the keyword comprises:

determining, for a third document of the plurality of CDA documents, a second similarity between the third document and the first document according to the number of keywords contained in the third document;

and taking the third document with the second similarity larger than the preset similarity threshold as a second document similar to the first document to obtain at least one second document similar to the first document.

9. The method for managing CDA documents according to any one of claims 1 to 8, wherein acquiring the first document, corresponding to the target patient, whose document type is the CDA document comprises:

acquiring an initial fourth document of which the document type of the target patient is a CDA document;

and acquiring a key diagnosis information set, and processing the fourth document by using the key diagnosis information set to acquire a first document containing key diagnosis information in the key diagnosis information set.

10. A system for managing CDA documents, comprising: a processor and a memory storing computer program instructions that when executed by the processor implement a method of managing CDA documents according to any of claims 1 to 9.