CN110277149A


Info

Publication number: CN110277149A
Application number: CN201910579541.7A
Authority: CN
Inventors: 戴岱; 高原; 贾巍; 王圣; 肖欣延; 肖珺; 佟卓远; 石晓坤
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2019-06-28
Filing date: 2019-06-28
Publication date: 2019-09-24

Abstract

The present invention proposes a method, device and equipment for processing electronic medical records. The method includes: obtaining a medical record text to be processed; identifying medical entities and attribute information in the medical record text; determining the correspondence between the medical entities and the attribute information; and generating a structured medical record according to the correspondence. By identifying the medical entities in an electronic medical record together with the attribute information describing them, and generating a structured medical record, the method meets the need for structured medical records, improves efficiency, and reduces cost.

Description

Electronic medical record processing method, device and equipment

Technical field

The present invention relates to the technical field of medical text processing, and in particular to a method, device and equipment for processing electronic medical records.

Background

A medical record is the record made by medical personnel of the onset, progression and outcome of a patient's disease and of the examination, diagnosis, treatment and other medical activities carried out. Medical records contain a large amount of valuable information: they can help doctors study disease patterns and improve treatments, help pharmaceutical companies develop new drugs, and even help medical AI learn how to diagnose diseases.

With the development of hospital informatization, most hospitals are equipped with a hospital information system (HIS), so that medical records are now largely kept electronically. However, writing styles and wording vary greatly between doctors, different hospitals use different HIS products, and HIS versions change over time, so electronic medical records are difficult to exploit widely. Medical record structuring analyzes and identifies the important information in a medical record, builds features describing the record at multiple levels, and finally converts the unstructured natural-language text of the record into structured information that is easy for computers (and also for people) to understand.

In the related art, the important information in medical records is usually extracted and structured by medical personnel, which is inefficient and incurs high labor costs.

Summary of the invention

The present invention aims to solve, at least to some extent, one of the technical problems in the related art.

To this end, a first object of the present invention is to propose a method for processing electronic medical records that identifies the medical entities in an electronic medical record and the attribute information describing them, and generates a structured medical record, thereby meeting the need for structured medical records, improving efficiency, and reducing cost.

A second object of the present invention is to propose an electronic medical record processing device.

A third object of the present invention is to propose a computer device.

A fourth object of the present invention is to propose a computer-readable storage medium.

An embodiment of the first aspect of the present invention proposes a method for processing electronic medical records, including:

obtaining a medical record text to be processed;

identifying medical entities and attribute information in the medical record text;

determining the correspondence between the medical entities and the attribute information; and

generating a structured medical record according to the correspondence.

In the electronic medical record processing method of the embodiment of the present invention, the medical record text to be processed is obtained, and the medical entities and attribute information in it are identified. The correspondence between the medical entities and the attribute information is then determined, and a structured medical record is generated according to that correspondence. In this way, the medical entities in an electronic medical record and the attribute information describing them can be identified and represented in a structured form, which improves efficiency, reduces cost, and meets the structuring needs of practical applications. The method provides a general-purpose understanding of electronic medical records that covers the most important information in a record, such as medical entities and their attributes, and can help systems such as assisted diagnosis and medical record retrieval understand records at the semantic level and construct their semantic features.

In addition, the electronic medical record processing method according to the above embodiment of the present invention may also have the following additional technical features:

Optionally, identifying the medical entities in the medical record text includes: obtaining a preset medical vocabulary, where the medical vocabulary includes candidate medical entities; and matching the candidate medical entities against the medical record text to determine the medical entities in the medical record text.

Optionally, identifying the medical entities in the medical record text includes: inputting the medical record text into a pre-trained sequence labeling model for processing to obtain entity labeling information for the medical record text; and determining the medical entities in the medical record text according to the entity labeling information.

Optionally, identifying the attribute information in the medical record text includes: obtaining a preset attribute vocabulary, where the attribute vocabulary includes candidate attributes; and matching the candidate attributes against the medical record text to determine the attribute information in the medical record text.

Optionally, after the correspondence between the medical entities and the attribute information is determined, the method further includes: obtaining candidate categories of a medical entity, extracting the attributes of the medical entity separately for each candidate category, and taking the candidate category with the largest number of extracted attributes as the category of the medical entity; and/or determining the category of the medical entity according to the context information of the medical entity in the medical record text.

Optionally, obtaining the medical record text to be processed includes: obtaining a plain text medical record; matching a preset field name dictionary against the plain text medical record to determine the field names in the plain text medical record; and determining, according to the field names, the field content corresponding to each field name in the plain text medical record, and segmenting the plain text medical record according to the field names and the field content.

Optionally, after the field names in the plain text medical record are determined, the method further includes: mapping the field names to standard field names based on keyword matching and/or a semantic similarity model.

Optionally, after the plain text medical record is segmented according to the field names and the field content, the method further includes: judging whether field content is missing from the segmented plain text medical record; if so, obtaining the missing field content from the plain text medical record based on keyword matching and/or a text classification model; and adding the missing field content at the position of the target field name in the segmented plain text medical record.

Optionally, the method further includes: splitting the medical record text into sentences and determining candidate sentences from the medical record text according to preset rules; and extracting from the candidate sentences the output content corresponding to a preset output name, and generating customized structured information according to the output name and the output content.

An embodiment of the second aspect of the present invention proposes an electronic medical record processing device, including:

an obtaining module, configured to obtain a medical record text to be processed;

an identification module, configured to identify medical entities and attribute information in the medical record text;

a determining module, configured to determine the correspondence between the medical entities and the attribute information; and

a generating module, configured to generate a structured medical record according to the correspondence.

The electronic medical record processing device of the embodiment of the present invention can identify the medical entities in an electronic medical record and the attribute information describing them and represent them in a structured form, which improves efficiency, reduces cost, and meets the structuring needs of practical applications. The device provides a general-purpose understanding of electronic medical records that covers the most important information in a record, such as medical entities and their attributes, and can help systems such as assisted diagnosis and medical record retrieval understand records at the semantic level and construct their semantic features.

In addition, the electronic medical record processing device according to the above embodiment of the present invention may also have the following additional technical features:

Optionally, the identification module is specifically configured to: obtain a preset medical vocabulary, where the medical vocabulary includes candidate medical entities; and match the candidate medical entities against the medical record text to determine the medical entities in the medical record text.

Optionally, the identification module is specifically configured to: input the medical record text into a pre-trained sequence labeling model for processing to obtain entity labeling information for the medical record text; and determine the medical entities in the medical record text according to the entity labeling information.

Optionally, the identification module is specifically configured to: obtain a preset attribute vocabulary, where the attribute vocabulary includes candidate attributes; and match the candidate attributes against the medical record text to determine the attribute information in the medical record text.

Optionally, the device further includes a classification module, configured to: obtain candidate categories of a medical entity, extract the attributes of the medical entity separately for each candidate category, and take the candidate category with the largest number of extracted attributes as the category of the medical entity; and/or determine the category of the medical entity according to the context information of the medical entity in the medical record text.

Optionally, the obtaining module includes: an obtaining unit, configured to obtain a plain text medical record; a matching unit, configured to match a preset field name dictionary against the plain text medical record to determine the field names in the plain text medical record; and a segmentation unit, configured to determine, according to the field names, the field content corresponding to each field name in the plain text medical record, and to segment the plain text medical record according to the field names and the field content.

Optionally, the obtaining module further includes a mapping unit, configured to map the field names to standard field names based on keyword matching and/or a semantic similarity model.

Optionally, the obtaining module further includes a judging unit, configured to: judge whether field content is missing from the segmented plain text medical record; if so, obtain the missing field content from the plain text medical record based on keyword matching and/or a text classification model; and add the missing field content at the position of the target field name in the segmented plain text medical record.

Optionally, the device further includes a processing module, configured to: split the medical record text into sentences and determine candidate sentences from the medical record text according to preset rules; and extract from the candidate sentences the output content corresponding to a preset output name, and generate customized structured information according to the output name and the output content.

An embodiment of the third aspect of the present invention proposes a computer device, including a processor and a memory, where the processor reads executable program code stored in the memory and runs a program corresponding to the executable program code, so as to implement the electronic medical record processing method described in the embodiment of the first aspect.

An embodiment of the fourth aspect of the present invention proposes a computer-readable storage medium on which a computer program is stored; when the program is executed by a processor, the electronic medical record processing method described in the embodiment of the first aspect is implemented.

Additional aspects and advantages of the present invention will be given in part in the following description, and in part will become apparent from the following description or be learned through practice of the present invention.

Brief description of the drawings

FIG. 1 is a schematic flowchart of an electronic medical record processing method provided by an embodiment of the present invention;

FIG. 2 is a schematic flowchart of another electronic medical record processing method provided by an embodiment of the present invention;

FIG. 3 is a schematic flowchart of another electronic medical record processing method provided by an embodiment of the present invention;

FIG. 4 is a schematic structural diagram of an electronic medical record processing device provided by an embodiment of the present invention;

FIG. 5 is a schematic structural diagram of another electronic medical record processing device provided by an embodiment of the present invention;

FIG. 6 is a block diagram of an exemplary computer device suitable for implementing embodiments of the present invention.

Detailed description

Embodiments of the present invention are described in detail below. Examples of the embodiments are shown in the drawings, in which the same or similar reference numerals throughout denote the same or similar elements or elements having the same or similar functions. The embodiments described below with reference to the drawings are exemplary and are intended to explain the present invention; they should not be construed as limiting it.

The electronic medical record processing method, device and equipment of the embodiments of the present invention are described below with reference to the drawings.

FIG. 1 is a schematic flowchart of an electronic medical record processing method provided by an embodiment of the present invention. As shown in FIG. 1, the method includes:

Step 101: obtain the medical record text to be processed.

In this embodiment, in order to meet the structuring requirement for medical records, the medical record text to be processed is obtained first. For example, the electronic medical record text in a hospital information system (HIS) can be obtained as the medical record text to be processed. Here, the structuring requirement refers to identifying the medical entities in a medical record, together with the attribute information that describes, modifies or restricts those entities, so as to extract features that describe the semantics of the record.

Step 102: identify the medical entities and attribute information in the medical record text.

In this embodiment, the medical entities and the attribute information describing them are first identified from the medical record text. Medical entities include, but are not limited to, fever, cough and the like; the categories of medical entities include, but are not limited to, symptoms, signs, diseases, drugs, operations, examinations, laboratory tests, allergy history and life history. Attribute information describes a medical entity; for example, it includes polarity (negative or positive), time of onset, duration, degree, frequency, inducement, drug dosage, examination and test results, and allergens.

There are several ways to identify the medical entities in the medical record text.

In one embodiment of the present invention, a medical vocabulary can be set in advance and candidate medical entities stored in it. Optionally, the medical vocabulary can be mined from medical records, the Internet and medical books, so that the candidate medical entities in the vocabulary can be matched in the medical record text. The preset medical vocabulary is then obtained, the candidate medical entities are matched against the medical record text, and the medical entities in the medical record text are determined.

As an example, the medical entities can be identified from the medical record text by regular expression matching against the preset medical vocabulary.
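The vocabulary matching described above can be sketched as follows. This is a minimal illustration, not the implementation of the invention: the vocabulary entries, the English sample sentence and the function name `match_entities` are assumptions made for the sketch, and longest-first ordering is one possible way to resolve overlapping terms.

```python
import re

# Hypothetical vocabulary; a real one would be mined from medical records,
# the Internet and medical books, as described in the text.
MEDICAL_VOCABULARY = ["fever", "cough", "upper abdominal pain", "abdominal pain"]


def match_entities(text, vocabulary):
    """Return (start, end, term) for every vocabulary term found in text."""
    # Try longer terms first so "upper abdominal pain" wins over "abdominal pain".
    terms = sorted(vocabulary, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(term) for term in terms))
    return [(m.start(), m.end(), m.group()) for m in pattern.finditer(text)]


record = "severe upper abdominal pain with fever, no cough"
entities = match_entities(record, MEDICAL_VOCABULARY)
for start, end, term in entities:
    print(start, end, term)
```

Keeping the character offsets alongside each term is a design choice of this sketch; later steps that relate attributes to entities by distance need positions rather than bare strings.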

In one embodiment of the present invention, a sequence labeling model can be trained in advance. The medical record text is input into the pre-trained sequence labeling model for processing to obtain entity labeling information for the text, and the medical entities in the medical record text are determined according to the entity labeling information.

As an example, medical record texts can be collected in advance and the medical entities in them annotated; the processing parameters of a neural network are then trained on the annotated texts to generate a sequence labeling model whose input is text and whose output is labeled text. The medical record text to be processed is then obtained and input into the sequence labeling model, the labeled text is obtained, and the medical entities in the medical record text are determined from the labeled text.

Optionally, the sequence labeling model can be implemented as a bidirectional long short-term memory network with a conditional random field (Bi-LSTM-CRF). When training the sequence labeling model, a domain-oriented few-shot learning engine can be used: a pre-trained neural network language model reduces the number of manually labeled samples the model needs, and an active learning mechanism improves the efficiency of manual labeling.
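The last step above, determining the medical entities from the entity labeling information, can be sketched as follows. The text does not specify a tagging scheme; this sketch assumes the common BIO scheme (B- begins an entity, I- continues it, O is outside), and the tags are written by hand here instead of being produced by a trained model. The function name `decode_bio` and the tag label `SYMPTOM` are hypothetical.

```python
def decode_bio(tokens, tags):
    """Collect (entity_text, category) pairs from BIO tags (assumed scheme)."""
    entities, current, category = [], [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new entity starts; flush the one in progress, if any.
            if current:
                entities.append((" ".join(current), category))
            current, category = [token], tag[2:]
        elif tag.startswith("I-") and current and tag[2:] == category:
            current.append(token)
        else:
            # "O" or an inconsistent "I-" tag ends the entity in progress.
            if current:
                entities.append((" ".join(current), category))
            current, category = [], None
    if current:
        entities.append((" ".join(current), category))
    return entities


# Hand-written tags standing in for the output of a sequence labeling model.
tokens = ["severe", "upper", "abdominal", "pain", "with", "fever"]
tags = ["O", "B-SYMPTOM", "I-SYMPTOM", "I-SYMPTOM", "O", "B-SYMPTOM"]
entities = decode_bio(tokens, tags)
print(entities)
```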

It should be noted that the above ways of identifying the medical entities in the medical record text are only examples; one or more of them can be selected as needed, and no limitation is imposed here.

The way in which attribute information is obtained from the medical record text is described below.

In one embodiment of the present invention, an attribute vocabulary can be set in advance and candidate attributes stored in it. The preset attribute vocabulary is then obtained, the candidate attributes in it are matched against the medical record text, and the attribute information in the medical record text is determined. Optionally, the attribute information can be identified from the medical record text by regular expression matching against the preset attribute vocabulary. As an example, when the medical record text contains "no cough", the word "no" is matched from the vocabulary, and the attribute information is therefore determined to include negative polarity (that is, the symptom, sign or disease in question did not occur in the patient).
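A minimal sketch of the attribute vocabulary matching, using the "no cough" example from the text. The vocabulary contents, the attribute names (`polarity`, `degree`, `frequency`) and the function name `match_attributes` are assumptions made for illustration; the English cue words stand in for the Chinese cues of the original.

```python
import re

# Hypothetical attribute vocabulary: surface cue -> (attribute name, value).
ATTRIBUTE_VOCABULARY = {
    "no": ("polarity", "negative"),
    "denies": ("polarity", "negative"),
    "severe": ("degree", "severe"),
    "paroxysmal": ("frequency", "paroxysmal"),
}


def match_attributes(text, vocabulary):
    """Return (position, cue, attribute_name, value) for each cue found."""
    # Word boundaries keep "no" from matching inside words such as "normal".
    alternatives = "|".join(re.escape(cue) for cue in vocabulary)
    pattern = re.compile(r"\b(" + alternatives + r")\b")
    return [
        (m.start(), m.group(), *vocabulary[m.group()])
        for m in pattern.finditer(text)
    ]


attributes = match_attributes("no cough", ATTRIBUTE_VOCABULARY)
print(attributes)
```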

Step 103: determine the correspondence between the medical entities and the attribute information.

In this embodiment, after the medical entities and attribute information in the medical record text have been obtained, the correspondence between them can be further determined, so as to establish which medical entity each item of attribute information describes.

In one embodiment of the present invention, the medical entity closest to a target item of attribute information in the medical record text can be obtained, and a preset vocabulary is then used to check whether the context of the target attribute information and that medical entity contains a specific feature word or pattern. If a specific feature word or pattern is matched, it is determined that the target attribute information corresponds to that medical entity; if not, other medical entities are selected in turn and matched against the target attribute information.
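The nearest-entity procedure above can be sketched as follows. This is an illustration under stated assumptions: the two link patterns, the character-distance measure and the function name `link_attribute` are hypothetical, and a real system would use a much richer preset vocabulary of feature words and patterns.

```python
import re

# Hypothetical link patterns; "{a}" is the attribute text, "{e}" the entity text.
LINK_PATTERNS = [r"{a}\s+{e}", r"{e}\s+(?:of\s+)?{a}"]


def link_attribute(text, attribute, entities):
    """attribute and entities are (start, end) spans; return the linked entity or None."""

    def distance(entity):
        # Character gap between the two spans (0 if they touch or overlap).
        return max(entity[0] - attribute[1], attribute[0] - entity[1], 0)

    a_text = re.escape(text[attribute[0]:attribute[1]])
    for entity in sorted(entities, key=distance):  # nearest candidate first
        e_text = re.escape(text[entity[0]:entity[1]])
        lo = min(attribute[0], entity[0])
        hi = max(attribute[1], entity[1])
        context = text[lo:hi]
        for template in LINK_PATTERNS:
            if re.fullmatch(template.format(a=a_text, e=e_text), context):
                return entity
        # No pattern matched: fall through to the next-nearest entity.
    return None


text = "fever of 38.7 degrees, no cough"
entity_spans = [(0, 5), (26, 31)]                         # "fever", "cough"
value_link = link_attribute(text, (9, 21), entity_spans)  # "38.7 degrees"
negation_link = link_attribute(text, (23, 25), entity_spans)  # "no"
print(text[value_link[0]:value_link[1]], text[negation_link[0]:negation_link[1]])
```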

In one embodiment of the present invention, a relation classification model can also be trained. Each combination of an item of attribute information and a medical entity is input into the relation classification model for processing, and a classification result is obtained. The classification result is either that a correspondence exists or that no correspondence exists.

As an example, samples of medical entities and attribute information labeled as having a correspondence, and samples labeled as having no correspondence, can be collected, and the processing parameters of a convolutional neural network (CNN) are trained on these samples to generate the relation classification model. The identified medical entities and attribute information are then classified by the relation classification model to judge whether a relation exists between a medical entity and an item of attribute information.

In one embodiment of the present invention, the category of a medical entity in the medical record text can also be determined; for example, the category can be obtained from a preset mapping table between medical entities and categories. In practice, the same surface form of a medical entity may correspond to several categories, so category disambiguation also needs to be performed on medical entities.

As one possible implementation, because the different categories of a medical entity with the same surface form have different attributes, the number of attributes extracted differs from category to category. The candidate categories of the medical entity can be obtained, the attributes of the medical entity extracted separately for each candidate category, and the candidate category with the largest number of extracted attributes taken as the category of the medical entity. For example, the possible categories of the medical entity "LDH" are disease and test. For Example 1, "Massive right pleural effusion. Serum LDH: 628U/L.", the attribute "test result value: 628U/L" can be extracted for the category test, while no attribute is extracted for the category disease, so the category of the medical entity "LDH" is determined to be test.
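The attribute-count rule can be sketched with the LDH example from the text. The two extractor functions, their regular expressions and the attribute names are hypothetical stand-ins for whatever per-category attribute extraction a real system would use; ties are resolved here in favor of the first candidate listed, which is an arbitrary choice of this sketch.

```python
import re


# Hypothetical per-category extractors; each returns a dict of extracted attributes.
def extract_test_attributes(entity, text):
    # A laboratory test is typically followed by a numeric result with a unit.
    m = re.search(re.escape(entity) + r"\s*[:：]\s*(\d+(?:\.\d+)?\s*[A-Za-z/%]+)", text)
    return {"result_value": m.group(1)} if m else {}


def extract_disease_attributes(entity, text):
    # A disease mention may be preceded by a course descriptor.
    m = re.search(r"(acute|chronic)\s+" + re.escape(entity), text)
    return {"course": m.group(1)} if m else {}


EXTRACTORS = {"test": extract_test_attributes, "disease": extract_disease_attributes}


def disambiguate_by_attributes(entity, text, candidate_categories):
    """Pick the candidate category for which the most attributes are extracted."""
    counts = {c: len(EXTRACTORS[c](entity, text)) for c in candidate_categories}
    return max(candidate_categories, key=lambda c: counts[c]), counts


text = "Massive right pleural effusion. Serum LDH: 628U/L."
category, counts = disambiguate_by_attributes("LDH", text, ["disease", "test"])
print(category, counts)
```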

As another possible implementation, the category of a medical entity can be determined according to the context information of the medical entity in the medical record text. For example, the possible categories of the medical entity "LDH" are disease and test. For Example 2, "Preliminary diagnosis: LDH", the context information of LDH in the medical record text includes the field "Preliminary diagnosis", so the category of the medical entity "LDH" is determined to be disease.

Optionally, the expression of a medical entity can also be replaced according to its category. For example, according to a pre-stored correspondence between categories and names, LDH as a test is replaced with the name lactate dehydrogenase, and LDH as a disease is replaced with lumbar disc herniation.

It should be noted that one of these approaches, or a combination of several, can be selected for entity category disambiguation according to actual needs, and no limitation is imposed here.
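Context-based disambiguation followed by the optional name replacement can be sketched as follows. The two lookup tables and both function names are assumptions made for illustration; here the only context signal used is the name of the field in which the entity appears, as in Example 2.

```python
# Hypothetical tables: the category implied by a field name, and the full
# name to use for each (abbreviation, category) pair.
FIELD_TO_CATEGORY = {"Preliminary diagnosis": "disease", "Laboratory tests": "test"}
STANDARD_NAMES = {
    ("LDH", "test"): "lactate dehydrogenase",
    ("LDH", "disease"): "lumbar disc herniation",
}


def disambiguate_by_context(field_name, candidate_categories):
    """Return the category implied by the enclosing field, or None if undecided."""
    category = FIELD_TO_CATEGORY.get(field_name)
    return category if category in candidate_categories else None


def expand_name(entity, category):
    """Replace the surface form with the pre-stored name for its category, if any."""
    return STANDARD_NAMES.get((entity, category), entity)


category = disambiguate_by_context("Preliminary diagnosis", ["disease", "test"])
full_name = expand_name("LDH", category)
print(category, full_name)
```

Returning `None` when the field gives no usable signal leaves room to fall back on another strategy, such as the attribute-count rule.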

In one embodiment of the present invention, the identified medical entities and attribute information can also be standardized, so that the names of medical entities and attribute information are normalized to standard or more common synonymous expressions, or the units of numerical values and times in the attribute information are standardized. Optionally, the names of medical entities and attribute information can be matched against a preset synonym table, and each matched name mapped to a preset standard name.

As an example, medical entities can also be normalized on the basis of attributes: each medical entity is decomposed into its attributes, and the decomposed attributes are compared with the attributes of a preset medical entity name. If the number of identical attributes is greater than a preset threshold, the two medical entities are determined to be synonymous and the name of the medical entity is normalized.

As another example, the similarity between a medical entity and a preset entity can be calculated with a semantic similarity model based on skip-thoughts. If the similarity is greater than a preset threshold, the medical entities are determined to be similar and the name of the medical entity is normalized.
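The attribute-based normalization can be sketched as follows. The decomposition of names into `site` and `symptom` attributes, the preset entity table, the threshold value and the function name are all assumptions made for this sketch; how entities are decomposed into attributes in practice is not specified in the text.

```python
# Hypothetical preset entities, each already decomposed into attributes.
PRESET_ENTITIES = {
    "upper abdominal pain": {"site": "upper abdomen", "symptom": "pain"},
}


def normalize_by_attributes(name, attributes, preset_entities, threshold=1):
    """Map name to a preset name when more than `threshold` attributes agree."""
    for standard_name, standard_attrs in preset_entities.items():
        shared = sum(
            1 for key, value in attributes.items() if standard_attrs.get(key) == value
        )
        if shared > threshold:
            return standard_name
    return name  # no sufficiently similar preset entity: keep the original name


normalized = normalize_by_attributes(
    "epigastric pain", {"site": "upper abdomen", "symptom": "pain"}, PRESET_ENTITIES
)
unchanged = normalize_by_attributes(
    "chest pain", {"site": "chest", "symptom": "pain"}, PRESET_ENTITIES
)
print(normalized, "|", unchanged)
```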

Step 104: generate a structured medical record according to the correspondence.

In this embodiment, once the correspondence between medical entities and attribute information has been determined, each medical entity and the attribute information describing it are known, so that a structured medical record can be generated and the semantic features of the medical record constructed.

As an example, the medical record text to be processed is "The patient reports paroxysmal severe upper abdominal pain that began 2 days ago without obvious cause, accompanied by a fever of 38.7 degrees, with no cough or sputum, etc.", and the structured medical record finally generated is shown in the following table.

Based on the above embodiments, the electronic medical record may further be normalized when it is obtained. A normalized electronic medical record includes preset fields, for example the fields known as "five histories and one complaint" (chief complaint, history of present illness, past history, personal history, marital and reproductive history, family history).

FIG. 2 is a schematic flowchart of another electronic medical record processing method provided by an embodiment of the present invention. As shown in FIG. 2, the method includes:

Step 201: obtain a plain-text medical record.

In practice, because records are not kept consistently, some electronic medical records do not store each field (such as chief complaint or history of present illness) separately in field name-field content form, but are instead stored as a single plain-text file. In this embodiment, such a plain-text medical record is obtained so that it can be normalized.

Step 202: match a preset field name dictionary against the plain-text medical record to determine the field names in the plain-text medical record.

In this embodiment, a field name dictionary may be preset. The dictionary includes field names, such as chief complaint and history of present illness, together with the prompts corresponding to each field name.

As an example, for the field name "主诉" (chief complaint), the dictionary includes the corresponding prompts "主诉：", "主诉", "【主诉】", and so on. The prompts in the dictionary are matched against the plain-text medical record to find the prompts that occur in it. Optionally, the prompts may be identified in the medical record text by regular-expression matching against the preset field name dictionary. The field names in the plain-text medical record are then determined from the identified prompts.
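The regular-expression matching of prompts could look roughly like the sketch below. The dictionary contents beyond the "主诉" prompts quoted in the text, and the names `FIELD_PROMPTS` and `find_field_names`, are assumptions made for illustration.

```python
import re

# Hypothetical field name dictionary: standard field name -> prompts seen in records.
# The "现病史" prompts are assumed by analogy with the "主诉" prompts in the text.
FIELD_PROMPTS = {
    "主诉": ["主诉：", "主诉", "【主诉】"],
    "现病史": ["现病史：", "现病史", "【现病史】"],
}


def find_field_names(text):
    """Return (start, end, field_name) for each prompt found, ordered by position."""
    hits = []
    for field, prompts in FIELD_PROMPTS.items():
        # Try longer prompts first so "【主诉】" is preferred over the bare "主诉".
        ordered = sorted(prompts, key=len, reverse=True)
        pattern = "|".join(re.escape(p) for p in ordered)
        for match in re.finditer(pattern, text):
            hits.append((match.start(), match.end(), field))
    return sorted(hits)


record = "【主诉】腹痛2日。现病史：患者2日前出现上腹痛。"
found = find_field_names(record)
```

Escaping each prompt with `re.escape` keeps bracket characters in prompts from being read as regular-expression syntax.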

In one embodiment of the present invention, because field names may be recorded inaccurately, for example "现病史" (history of present illness) recorded as "病史" (medical history), field names may also be mapped to standard field names. As an example, the mapping may be based on keyword matching: "病史" is matched in the medical record text by keyword and then changed to "现病史".

As another example, field names may be mapped to standard field names with a semantic similarity model. The model calculates the similarity between each preset standard field name and a field name identified in the text, and the field name in the text is replaced with the standard field name that has the highest similarity.

Step 203: determine, according to the field names, the field content corresponding to each field name in the plain-text medical record, and segment the plain-text medical record according to the field names and field content.

In this embodiment, after the field names have been identified, the field content corresponding to each field name is determined. For example, the content between a target field name and the next field name may be taken as the field content of the target field name. Alternatively, the content between the target field name and the end-of-paragraph marker may be taken as its field content.
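The first rule above (content runs from a field name up to the next field name) can be sketched as follows. The function name and the `(start, end, field_name)` hit format are assumptions; the hits are taken as already located by some earlier matching step.

```python
def segment_by_field_names(text, hits):
    """Split text into field content using located field-name prompts.

    hits: list of (start, end, field_name) tuples giving each prompt's span.
    The content of a field runs from the end of its prompt to the start of
    the next prompt, or to the end of the text for the last field.
    """
    ordered = sorted(hits)
    fields = {}
    for i, (_, end, name) in enumerate(ordered):
        next_start = ordered[i + 1][0] if i + 1 < len(ordered) else len(text)
        fields[name] = text[end:next_start].strip()
    return fields


record = "主诉：腹痛2日。现病史：患者2日前出现上腹痛。"
s1 = record.index("主诉：")
s2 = record.index("现病史：")
hits = [(s1, s1 + len("主诉："), "主诉"), (s2, s2 + len("现病史："), "现病史")]
segmented = segment_by_field_names(record, hits)
```

The second rule in the text would replace `next_start` with the position of the paragraph terminator following the prompt.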

Optionally, for paragraphs that cannot be segmented by the above methods, a text classification model may be trained based on a convolutional neural network. The remaining paragraphs are input into the text classification model, which outputs the corresponding field name for each, so that the plain-text medical record is segmented according to field names and field content.

Step 204: determine whether any field content is missing from the segmented plain-text medical record.

Step 205: if so, obtain the missing field content from the plain-text medical record based on keyword matching and/or a text classification model.

Step 206: add the missing field content at the position of the target field name in the segmented plain-text medical record.

In practice, inconsistent record keeping may cause some fields to be merged into others. For example, the chief complaint may be recorded within the history of present illness, or the marital and reproductive history within the personal history.

In this embodiment, it is first determined whether the segmented plain-text medical record is missing any field content; if so, an attempt is made to extract the missing field content from the existing field content.

As an example, keywords may be preset for each field name, and the missing field content is obtained from the plain-text medical record by keyword matching. For example, the pattern "因***入院" ("admitted due to ***") is matched in the content of the history of present illness to extract a missing chief complaint; likewise, patterns such as "已婚" (married) or "未婚" (unmarried) are matched in the personal history, and the corresponding sentences are extracted and filled into the marital and reproductive history.
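A minimal sketch of the keyword-pattern approach for one field is given below. It reads the "因***入院" pattern as a regular expression with a non-greedy capture; that reading, and the names `ADMISSION_PATTERN` and `fill_missing_chief_complaint`, are assumptions for illustration only.

```python
import re

# Assumed regular-expression form of the "因***入院" pattern from the text.
ADMISSION_PATTERN = re.compile(r"因(.+?)入院")


def fill_missing_chief_complaint(fields):
    """Return a copy of fields with "主诉" filled from "现病史" when it is missing.

    fields: dict mapping standard field names to their content.
    An existing, non-empty chief complaint is never overwritten.
    """
    filled = dict(fields)
    if not filled.get("主诉"):
        match = ADMISSION_PATTERN.search(filled.get("现病史", ""))
        if match:
            filled["主诉"] = match.group(1)
    return filled


result = fill_missing_chief_complaint({"现病史": "患者因上腹痛2日入院，伴发热。"})
```

The marital and reproductive history could be handled the same way, with a pattern for "已婚" or "未婚" applied to the personal history.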

As another example, the missing field content may be obtained from the plain-text medical record with a text classification model. A text classification model is trained based on a convolutional neural network and used to predict the field name for each sentence in a preset subset of fields, thereby determining whether each sentence can be filled into a missing field.

In one embodiment of the present invention, if the medical record is semi-structured but non-standard, that is, it has a field name-field content structure but some field names are recorded inaccurately, the field names may be mapped to standard field names based on keyword matching and/or a semantic similarity model. Missing fields in the medical record text are then filled to obtain a normalized electronic medical record, which facilitates subsequent processing and improves accuracy.

The electronic medical record processing method of this embodiment normalizes electronic medical records by segmenting and classifying the content of the medical record text and filling in missing fields, so that medical records from different versions of a hospital information system (HIS) can be used uniformly and more effectively. It can also supply important fields missing from medical records, helping hospitals build a more standardized electronic medical record archive.

Based on the above embodiments, important information in the medical record may further be extracted in a customized manner.

FIG. 3 is a schematic flowchart of another electronic medical record processing method provided by an embodiment of the present invention. As shown in FIG. 3, the method includes:

Step 301: split the medical record text into sentences, and determine candidate sentences from the medical record text according to preset rules.

In this embodiment, the data to be extracted may be defined in advance. It may be set as needed, for example according to (but not limited to) disease type, document type, or the requirements of the party requesting the customization.

Optionally, the medical record text may be preprocessed to identify the medical entities and attribute information in it.

As an example, candidate sentence locating rules may be preset according to the data to be extracted, and a keyword list for candidate sentences determined. The medical record text is then split into sentences; under the candidate sentence locating rules, if a sentence matches an entry in the keyword list, that sentence is determined to be a candidate sentence.
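The sentence splitting and keyword-based candidate location could be sketched as below. Splitting only on Chinese sentence-ending punctuation is an assumption (it avoids breaking decimals such as "0.2cm"), and the function names and keyword list are hypothetical.

```python
import re


def split_sentences(text):
    """Split after Chinese sentence-ending punctuation, keeping the punctuation."""
    parts = re.split(r"(?<=[。！？；])", text)
    return [p.strip() for p in parts if p.strip()]


def find_candidate_sentences(text, keywords):
    """Return the sentences that contain at least one entry of the keyword list."""
    return [s for s in split_sentences(text) if any(k in s for k in keywords)]


report = "右上肺部分肺叶组织，大小12x11x4cm。可见一灰白结节，直径4cm。结论：腺癌。"
candidates = find_candidate_sentences(report, ["结节", "腺癌"])
```

The zero-width lookbehind split relies on `re.split` handling empty matches, which Python supports from version 3.7.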

Step 302: extract, from the candidate sentences, the output content corresponding to each preset output name, and generate customized structured information according to the output names and output content.

In this embodiment, the data to be extracted may take the form output name-output content.

As an example, output content extraction rules may be preset according to the data to be extracted, and a keyword list for the output content determined. For example, for the output name "病变类型" (lesion type), a series of lesion types such as the keyword "腺癌" (adenocarcinoma) may be preset in the list, and the candidate sentences are then searched for these keywords. If "腺癌" is matched, that word is taken as the output content corresponding to the output name "病变类型".
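A minimal sketch of this keyword-list extraction follows. The dictionary `OUTPUT_KEYWORDS`, the function name, and the second keyword "鳞癌" are assumptions added for illustration; only "腺癌" comes from the text.

```python
# Hypothetical keyword list per output name; only "腺癌" is taken from the text.
OUTPUT_KEYWORDS = {
    "病变类型": ["腺癌", "鳞癌"],
}


def extract_outputs(candidate_sentences, output_keywords=OUTPUT_KEYWORDS):
    """Map each output name to the first preset keyword found in the candidates."""
    result = {}
    for name, keywords in output_keywords.items():
        for sentence in candidate_sentences:
            matched = next((k for k in keywords if k in sentence), None)
            if matched is not None:
                result[name] = matched
                break
    return result


extracted = extract_outputs(["结论：腺癌，中-高分化。"])
```

Output names with no matching keyword are simply absent from the result, which leaves the decision about defaults to the caller.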

Optionally, the vocabulary in the above matching rules may be expanded with a semantic similarity model to improve the generalization of the rules.

For example, the medical record text is: "Medical record details: (partial lobe of right upper lung) partial lung lobe tissue, size 12x11x4cm. A gray-white nodule is visible about 0.2cm from the bronchial resection margin, 4cm in diameter, friable and well demarcated; the remainder is gray-red and of medium consistency. Conclusion: (tissue remaining after frozen section, right upper lung) adenocarcinoma, moderately to well differentiated, predominantly lepidic growth (lepidic type 60% + acinar type 40%)."

The customized structured information is as follows:

"Specimen site: upper lobe of right lung

Lesion type: adenocarcinoma

Lesion subtype: [lepidic adenocarcinoma, acinar adenocarcinoma]

Specimen size: {num:[12,11,4],unit:cm}

Mass size: {num:4,unit:cm}"
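The size entries above pair one or more numbers with a unit. One way such values might be parsed from a candidate sentence is sketched below; the regular expression, the supported units, and the function name are assumptions rather than the patent's extraction rule.

```python
import re

# Assumed pattern: numbers joined by "x", followed by a length unit.
SIZE_PATTERN = re.compile(r"(\d+(?:\.\d+)?(?:x\d+(?:\.\d+)?)*)\s*(cm|mm)")


def parse_size(text):
    """Return {"num": ..., "unit": ...} for the first size found, else None.

    "num" is a list when several dimensions are given and a single number otherwise.
    """
    match = SIZE_PATTERN.search(text)
    if match is None:
        return None
    numbers = [float(n) if "." in n else int(n) for n in match.group(1).split("x")]
    return {"num": numbers if len(numbers) > 1 else numbers[0], "unit": match.group(2)}


specimen = parse_size("大小12x11x4cm")
mass = parse_size("直径4cm")
```

Keeping the unit as a separate key makes later unit standardization a simple lookup on that field.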

Optionally, to make rule customization easier, medical practitioners may be brought in to carry out the above rule configuration and data annotation. Based on the data annotated on the platform, a deep learning model for text structuring is used to improve the generalization ability of the rule system.

Optionally, a general, patient-centered structured medical record data set may be constructed for the "five histories and one complaint" fields. The data set includes a timeline for the patient; each time point includes the patient's symptoms, signs and other conditions, the examinations and tests performed, the diseases diagnosed, the treatment given (surgery or medication), the therapeutic effect, the prognosis, and so on. In this way, a patient's medical record information from multiple periods can be integrated and presented clearly, in chronological order, to the relevant personnel.

The electronic medical record processing method of this embodiment can extract specific important structured information from medical records in a manner customized to user needs, thereby achieving a deeper understanding of the medical records and effectively helping doctors build patient databases or research data sets.

To implement the above embodiments, the present invention further proposes an electronic medical record processing device.

FIG. 4 is a schematic structural diagram of an electronic medical record processing device provided by an embodiment of the present invention. As shown in FIG. 4, the device includes an acquisition module 100, an identification module 200, a determination module 300, and a generation module 400.

The acquisition module 100 is configured to obtain the medical record text to be processed.

The identification module 200 is configured to identify medical entities and attribute information in the medical record text.

The determination module 300 is configured to determine the correspondence between the medical entities and the attribute information.

The generation module 400 is configured to generate a structured medical record according to the correspondence.

In one embodiment of the present invention, the identification module 200 is specifically configured to: obtain a preset medical vocabulary that includes candidate medical entities; and match the candidate medical entities against the medical record text to determine the medical entities in the medical record text.

In one embodiment of the present invention, the identification module 200 is specifically configured to: input the medical record text into a pre-trained sequence labeling model for processing to obtain entity labeling information for the medical record text; and determine the medical entities in the medical record text according to the entity labeling information.

In one embodiment of the present invention, the identification module 200 is specifically configured to: obtain a preset attribute vocabulary that includes candidate attributes; and match the candidate attributes against the medical record text to determine the attribute information in the medical record text.

On the basis of FIG. 4, the device shown in FIG. 5 further includes a classification module 500 and a processing module 600.

The classification module 500 is configured to obtain the candidate categories of a medical entity, extract the attributes of the medical entity under each candidate category, and take the candidate category with the largest number of extracted attributes as the category of the medical entity; and/or to determine the category of the medical entity according to the context information of the medical entity in the medical record text.

The processing module 600 is configured to split the medical record text into sentences and determine candidate sentences from the medical record text according to preset rules; and to extract, from the candidate sentences, the output content corresponding to each preset output name and generate customized structured information according to the output names and output content.
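The first strategy of the classification module 500 (choose the candidate category under which the most attributes are extracted) can be sketched as follows. The input format, the function name, and the example attribute values are hypothetical; how attributes are extracted under each category is outside this sketch.

```python
def pick_category(attrs_by_category):
    """Return the candidate category with the most extracted attributes.

    attrs_by_category: dict mapping each candidate category to the attributes
    extracted for the entity under that category. Returns None if it is empty.
    """
    if not attrs_by_category:
        return None
    return max(attrs_by_category, key=lambda c: len(attrs_by_category[c]))


# Hypothetical extraction results for the entity "LDH" under two categories.
ldh_attrs = {
    "test": {"value": "245", "unit": "U/L"},
    "disease": {},
}
ldh_category = pick_category(ldh_attrs)
```

When two categories tie, `max` keeps the first one encountered; a context-based check, as the text also allows, would be a natural tie-breaker.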

In one embodiment of the present invention, the acquisition module 100 includes: an acquisition unit configured to obtain a plain-text medical record; a matching unit configured to match a preset field name dictionary against the plain-text medical record to determine the field names in the plain-text medical record; and a segmentation unit configured to determine, according to the field names, the field content corresponding to each field name in the plain-text medical record, and to segment the plain-text medical record according to the field names and field content.

Optionally, the acquisition module 100 further includes a mapping unit configured to map field names to standard field names based on keyword matching and/or a semantic similarity model.

Optionally, the acquisition module 100 further includes a judging unit configured to determine whether any field content is missing from the segmented plain-text medical record; if so, to obtain the missing field content from the plain-text medical record based on keyword matching and/or a text classification model; and to add the missing field content at the position of the target field name in the segmented plain-text medical record.

It should be noted that the explanations of the electronic medical record processing method in the foregoing embodiments also apply to the electronic medical record processing device of this embodiment and are not repeated here.

The electronic medical record processing device of this embodiment obtains the medical record text to be processed and identifies the medical entities and attribute information in it. It then determines the correspondence between the medical entities and the attribute information and generates a structured medical record according to the correspondence. The device can thus identify the medical entities in an electronic medical record and the attribute information describing them and represent them in structured form, which improves efficiency, reduces cost, and meets the structuring requirements of practical applications. It provides a general-purpose interpretation of electronic medical records that covers the most important medical entities and attributes, which can help systems such as diagnostic assistance and medical record retrieval understand medical records at the semantic level and construct their semantic features.

To implement the above embodiments, the present invention further proposes a computer device including a processor and a memory. The processor reads executable program code stored in the memory and runs the program corresponding to that code, so as to implement the electronic medical record processing method described in any of the foregoing embodiments.

To implement the above embodiments, the present invention further proposes a computer program product. When the instructions in the computer program product are executed by a processor, the electronic medical record processing method described in any of the foregoing embodiments is implemented.

To implement the above embodiments, the present invention further proposes a computer-readable storage medium on which a computer program is stored. When the program is executed by a processor, the electronic medical record processing method described in any of the foregoing embodiments is implemented.

FIG. 6 shows a block diagram of an exemplary computer device suitable for implementing embodiments of the present invention. The computer device 12 shown in FIG. 6 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present invention.

As shown in FIG. 6, the computer device 12 takes the form of a general-purpose computing device. The components of the computer device 12 may include, but are not limited to: one or more processors or processing units 16, a system memory 28, and a bus 18 connecting the different system components (including the system memory 28 and the processing unit 16).

The bus 18 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. For example, these architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.

The computer device 12 typically includes a variety of computer-system-readable media. These media may be any available media that can be accessed by the computer device 12, including volatile and non-volatile media, and removable and non-removable media.

The memory 28 may include computer-system-readable media in the form of volatile memory, such as random access memory (RAM) 30 and/or cache memory 32. The computer device 12 may further include other removable/non-removable, volatile/non-volatile computer system storage media. By way of example only, the storage system 34 may be used to read from and write to non-removable, non-volatile magnetic media (not shown in FIG. 6, commonly called a "hard disk drive"). Although not shown in FIG. 6, a magnetic disk drive for reading from and writing to a removable non-volatile magnetic disk (for example, a "floppy disk") may be provided, as well as an optical disc drive for reading from and writing to a removable non-volatile optical disc (for example, a Compact Disc Read Only Memory (CD-ROM), a Digital Video Disc Read Only Memory (DVD-ROM), or other optical media). In these cases, each drive may be connected to the bus 18 through one or more data media interfaces. The memory 28 may include at least one program product having a set of (for example, at least one) program modules configured to perform the functions of the embodiments of the present application.

A program/utility 40 having a set of (at least one) program modules 42 may be stored, for example, in the memory 28. Such program modules 42 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination of them, may include an implementation of a network environment. The program modules 42 generally perform the functions and/or methods of the embodiments described in the present application.

The computer device 12 may also communicate with one or more external devices 14 (for example, a keyboard, a pointing device, a display 24, etc.), with one or more devices that enable a user to interact with the computer system/server 12, and/or with any device (for example, a network card, a modem, etc.) that enables the computer system/server 12 to communicate with one or more other computing devices. Such communication may take place through an input/output (I/O) interface 22. The computer device 12 may also communicate with one or more networks (for example, a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) through a network adapter 20. As shown in the figure, the network adapter 20 communicates with the other modules of the computer device 12 through the bus 18. It should be understood that, although not shown in the figure, other hardware and/or software modules may be used in conjunction with the computer device 12, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.

The processing unit 16 runs the programs stored in the system memory 28 to execute various functional applications and data processing, for example to implement the methods mentioned in the foregoing embodiments.

In the description of the present invention, it should be understood that the terms "first" and "second" are used for descriptive purposes only and are not to be understood as indicating or implying relative importance or implicitly indicating the number of the technical features referred to. A feature qualified by "first" or "second" may therefore explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality of" means at least two, for example two or three, unless specifically defined otherwise.

In the description of this specification, a description referring to the terms "one embodiment", "some embodiments", "example", "specific example", or "some examples" means that the specific features, structures, materials, or characteristics described in connection with that embodiment or example are included in at least one embodiment or example of the present invention. In this specification, such terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, provided they do not contradict each other, those skilled in the art may combine the different embodiments or examples described in this specification and the features of those embodiments or examples.

Although embodiments of the present invention have been shown and described above, it should be understood that these embodiments are exemplary and are not to be construed as limiting the present invention. Those of ordinary skill in the art may make changes, modifications, substitutions, and variations to the above embodiments within the scope of the present invention.