CN117292782A


Info

Publication number: CN117292782A
Application number: CN202310675843.0A
Authority: CN
Inventors: 王其景
Original assignee: Individual
Current assignee: Individual
Priority date: 2023-06-08
Filing date: 2023-06-08
Publication date: 2023-12-26
Anticipated expiration: 2043-06-08
Also published as: CN117292782B

Abstract

The invention provides a method and a system for automatically generating electronic reports, and belongs to the technical field of automatic report generation from electronic data. The method comprises the following steps: step 1, obtaining diagnosis and treatment image data through detection equipment; step 2, constructing a report generation model and receiving the diagnosis and treatment image data; step 3, having the report generation model analyze the received diagnosis and treatment image data to generate an electronic report; and step 4, outputting the electronic report. Automatically generating the electronic report replaces manual entry, which shortens report conversion time and improves report generation efficiency. In addition, the model is trained and optimized using weighted rewards, which reduces the amount of computation while improving the accuracy, and hence the quality, of the electronic report.

Description

Translated from Chinese

A method and system for automatically generating electronic reports

Technical field

The present invention relates to the technical field of automatically generating reports from electronic data, and in particular to a method and system for automatically generating electronic reports.

Background

Advances in science and technology not only open up unexplored fields but also drive social progress, and the spread of intelligent technology has improved working efficiency across industries. In medical care, likewise, intelligent technology offers more convenient working methods for day-to-day consultations.

Daily medical activities such as consultations, meetings, and ward rounds produce a stream of activity data. Recording this data effectively, so that it is retained in a usable form and can serve as auxiliary source data for later diagnosis and treatment, can greatly improve the efficiency of care. In the prior art, the corresponding report records are still generally produced by manual editing. This merely converts paper records into electronic ones: the data is stored effectively, but the labor and time costs remain.

Summary of the invention

Purpose of the invention: To propose a method and system for automatically generating electronic reports that solve the above problems in the prior art. Replacing manual entry with automatically generated electronic reports shortens report conversion time and improves report generation efficiency.

Technical solution: In a first aspect, a method for automatically generating electronic reports is proposed, comprising the following steps:

Step 1. Obtain diagnosis and treatment image data through detection equipment;

Step 2. Construct a report generation model and receive the diagnosis and treatment image data;

Step 3. The report generation model analyzes the received diagnosis and treatment image data to generate an electronic report. The report generation model comprises an image report model and a text report model.

When the diagnosis and treatment image data is medical imaging data, the image report model analyzes it to generate the electronic report; when the diagnosis and treatment image data is a handwritten report, the text report model analyzes it to generate the electronic report;

Step 4. Output the electronic report.
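The routing in step 3 can be sketched as a small dispatcher. This is a minimal illustration only: the `ImageInput` type, the `kind` labels, and the two placeholder model functions are hypothetical names introduced here, and the source does not say how the input type is determined.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class ImageInput:
    kind: str       # assumed label: "medical_image" or "handwritten_report"
    payload: bytes  # raw image bytes from the detection or capture device


def image_report_model(data: ImageInput) -> str:
    # Placeholder for the encoder / decoder / reinforcement pipeline.
    return f"[image report from {len(data.payload)} bytes]"


def text_report_model(data: ImageInput) -> str:
    # Placeholder for OCR, sentence splitting and error correction.
    return f"[text report from {len(data.payload)} bytes]"


# Map each supported input kind to the model that handles it.
MODELS: Dict[str, Callable[[ImageInput], str]] = {
    "medical_image": image_report_model,
    "handwritten_report": text_report_model,
}


def generate_report(data: ImageInput) -> str:
    """Route the input to the matching report model (step 3) and return the report (step 4)."""
    try:
        model = MODELS[data.kind]
    except KeyError:
        raise ValueError(f"unsupported input kind: {data.kind!r}") from None
    return model(data)
```

Keeping the routing in a table means a further input type could be supported by adding one entry, without touching `generate_report`.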

In some implementations of the first aspect, the image report model comprises an encoder, a decoder, and a reinforcement module. The encoder receives the diagnosis and treatment image data, performs image encoding to obtain the corresponding feature data, and passes it to the decoder. The decoder decodes the received features, generates the corresponding text description, and outputs it to the reinforcement module. The reinforcement module generates a first-draft electronic report from the received text description and performs reinforcement learning on that draft; once reinforcement learning is complete, the final electronic report is obtained.

The encoder comprises a residual network, a feature connection module, and a first attention module. The residual network performs feature extraction and encoding on the received diagnosis and treatment image data, obtains visual feature data, and passes it to the feature connection module. The feature connection module concatenates the received visual feature data and passes the resulting concatenated features to the first attention module, which mines the relationships among the features and establishes the connections between them.

The decoder comprises an LSTM network, a second attention module, a GLU function module, a linear module, and a softmax function module. The LSTM network receives the feature data output by the encoder and, after analysis, passes it together with the features output by the decoder to the second attention module. The output of the second attention module then passes in sequence through the GLU function module, the linear module, and the softmax function module to obtain the predicted word.

In some implementations of the first aspect, data processing is performed by iterating in a loop within a preset range. During encoding, after the first attention module completes the current feature extraction, the extracted features are fed back into the first attention module for as long as the preset loop range allows, and the final encoding result is obtained when the loop ends.

During decoding, after the LSTM network receives the encoding result output by the encoder, it processes the data and passes its output hidden state, together with the encoded data from the encoder, to the second attention module. Within the loop range, the final decoded data is obtained and the corresponding text description is output.

In some implementations of the first aspect, the text report model analyzes the received diagnosis and treatment image data to generate an electronic report through the following steps: first, read the paper data to be analyzed to obtain initial data; then, construct a sentence conversion model and use it to run detection on the initial data, obtaining the corresponding initial sentence set; finally, construct an error correction model and use it to correct the initial sentence set, obtaining the optimized electronic data.

In a second aspect, a system for automatically generating electronic reports is proposed, which implements the method for automatically generating electronic reports. The system comprises a data acquisition module, a data analysis module, a report generation module, and a data output module.

The data acquisition module collects the examination data of the patient and generates diagnosis and treatment image data; the data analysis module analyzes the received diagnosis and treatment image data; the report generation module generates an electronic report from the analysis results of the data analysis module; and the data output module outputs the electronic report generated by the report generation module.

In a third aspect, a device for automatically generating electronic reports is proposed, comprising a processor and a memory storing computer program instructions. When the processor reads and executes the computer program instructions, the method for automatically generating electronic reports is implemented.

In a fourth aspect, a computer-readable storage medium is proposed, on which computer program instructions are stored. When the computer program instructions are executed by a processor, the method for automatically generating electronic reports is implemented.

Beneficial effects: The present invention proposes a method and system for automatically generating electronic reports. Replacing manual entry with automatically generated electronic reports shortens report conversion time and improves report generation efficiency. In addition, the model is trained and optimized using weighted rewards, which reduces the amount of computation while improving the accuracy, and hence the quality, of the electronic reports.

Description of the drawings

Figure 1 is a data processing flow chart of the present invention.

Detailed description

In the following description, numerous specific details are given to provide a more thorough understanding of the invention. It will nevertheless be apparent to those skilled in the art that the invention may be practiced without one or more of these details. In other instances, some technical features well known in the art are not described, to avoid obscuring the invention.

Embodiment 1

In one embodiment: as science and technology have advanced, medical diagnosis and treatment have come to rely on a large number of intelligent detection devices. Despite the move toward electronic record keeping, producing an electronic report from the results of such devices often still depends on medical staff typing a paragraph-style examination report at a keyboard. To improve the efficiency of electronic report generation and to generate high-quality electronic reports automatically from examination results, this embodiment proposes a method for automatically generating electronic reports. As shown in Figure 1, the method comprises the following steps:

Step 1. Obtain diagnosis and treatment image data through detection equipment;

Step 2. Construct a report generation model and receive the diagnosis and treatment image data;

Step 3. The report generation model analyzes the received diagnosis and treatment image data to generate an electronic report;

Step 4. Output the electronic report.

The report generation model comprises an image report model and a text report model, and the image report model comprises an encoder, a decoder, and a reinforcement module. The encoder first encodes the diagnosis and treatment image data with a convolutional neural network, obtains the corresponding features, and passes them to the decoder. The decoder then decodes the received features with a recurrent neural network, generates the corresponding text description, and passes it to the reinforcement module. Finally, the reinforcement module performs reinforcement learning to obtain the final electronic report.

Optionally, the encoder comprises a residual network, a feature connection module, and a first attention module; the decoder comprises an LSTM (long short-term memory) network, a second attention module, a GLU function module, a linear module, and a softmax function module. During data processing, the residual network performs feature extraction and encoding on the received diagnosis and treatment image data, obtains visual feature data, and passes it to the feature connection module. The feature connection module then concatenates the received visual feature data and passes the resulting concatenated features to the first attention module, which mines the relationships among the features and establishes the connections between them. The LSTM network receives the feature data output by the encoder and, after analysis, passes it together with the features output by the decoder to the second attention module. The output of the second attention module then passes in sequence through the GLU function module, the linear module, and the softmax function module to obtain the predicted word.
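The last three decoder stages (GLU, linear, softmax) can be illustrated with a dependency-free sketch. The vector sizes, weights, and vocabulary below are illustrative assumptions, and the GLU is the common definition that splits its input into halves a and b and returns a * sigmoid(b); the source does not specify which variant is used.

```python
import math
from typing import List, Sequence, Tuple


def glu(x: Sequence[float]) -> List[float]:
    """Gated linear unit: split x into halves (a, b) and return a * sigmoid(b)."""
    if len(x) % 2:
        raise ValueError("GLU input length must be even")
    half = len(x) // 2
    a, b = x[:half], x[half:]
    return [ai / (1.0 + math.exp(-bi)) for ai, bi in zip(a, b)]


def linear(x: Sequence[float], weight: Sequence[Sequence[float]],
           bias: Sequence[float]) -> List[float]:
    """Affine map: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weight, bias)]


def softmax(z: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def predict_word(attended: Sequence[float], weight, bias,
                 vocab: Sequence[str]) -> Tuple[str, List[float]]:
    """Run GLU -> linear -> softmax and return the most probable word."""
    probs = softmax(linear(glu(attended), weight, bias))
    best = max(range(len(vocab)), key=probs.__getitem__)
    return vocab[best], probs
```

Here `attended` stands in for the output of the second attention module; in a trained model `weight` and `bias` would be learned parameters rather than hand-set values.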

Preferably, the image report model processes and analyzes the diagnosis and treatment image data to generate an electronic report as follows. First, the encoder encodes the diagnosis and treatment image data taken from different angles, and the encoding results are concatenated to obtain concatenated image data. Second, visual features are extracted from the concatenated image data. Third, the extracted visual features are fed to the decoder for decoding, yielding the predicted words used to generate the electronic report. Fourth, the reinforcement module generates the electronic report from those predicted words. Finally, reinforcement learning is performed on the generated electronic report to obtain the optimized electronic report.

Optionally, the gradient of the optimization objective used during reinforcement learning is expressed in terms of the following quantities (the formula itself is not reproduced in this text): a Monte Carlo sample; the report generation model; the current reward; and the reward obtained at inference time.
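The four quantities named here match the self-critical policy-gradient estimator commonly used for sequence generation. As a hedged reconstruction only, and assuming the notation $w^s$ for the Monte Carlo sample, $\hat{w}$ for the sequence decoded at inference time, $p_\theta$ for the report generation model, and $r(\cdot)$ for the reward (none of these symbols survive in this text), the gradient would take the form:

```latex
\nabla_\theta L(\theta) \approx
  -\bigl(r(w^{s}) - r(\hat{w})\bigr)\,\nabla_\theta \log p_\theta(w^{s})
```

Under this reading, the inference-time reward $r(\hat{w})$ acts as a baseline: sampled reports that score better than the model's own greedy output are made more likely, and those that score worse are made less likely.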

Optionally, to address the problem that optimization objectives in reinforcement-learning-based methods are not comprehensive, this embodiment further proposes a reinforcement learning algorithm based on mixed rewards, which uses a weighted mixture of several evaluation metrics as the reinforcement learning reward so that the model is optimized comprehensively. Preferably, using non-differentiable natural language evaluation metrics as the reward allows those metrics to be optimized directly, so that word-level and paragraph-level optimization happen at the same time. This embodiment also provides a search method for the optimal reward weights whose computational cost is linear, to reduce the amount of computation.

Specifically, obtaining the optimal weights for the weighted mixed reward comprises the following steps:

In the preferred embodiment, pre-selected natural language processing evaluation metrics are used as the reinforcement learning reward. Every metric is first given the same weight of 1, and the weights are then increased by 1 one metric at a time, which identifies the most influential metric. Next, within the search space, the weight of the most influential metric is increased by 1 at a time to find its optimal weight. That weight is then fixed, and the search is repeated until all influential metrics have been found, which yields an optimal combination. The optimized objective gradient (the formula itself is not reproduced in this text) is expressed in terms of the following quantities: the number of selected metrics; the weight of the currently selected metric; the words obtained by Monte Carlo sampling from the report generation model; the report generation model; the current reward; and the reward obtained at inference time.
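The weight search can be sketched as a greedy coordinate search. This is one possible reading of the procedure, not the patented implementation: `evaluate` is a hypothetical callback that would train or fine-tune the model with the mixed reward (the weighted sum of the metric rewards) and return a validation score, and `max_weight` is an assumed bound standing in for the search space, whose limits are not given in this text.

```python
from typing import Callable, Dict, List


def search_reward_weights(
    metrics: List[str],
    evaluate: Callable[[Dict[str, int]], float],
    max_weight: int = 5,
) -> Dict[str, int]:
    """Greedily pick integer reward weights, one metric at a time."""
    weights = {m: 1 for m in metrics}  # every metric starts at weight 1
    best_score = evaluate(weights)
    remaining = set(metrics)

    while remaining:
        # Probe: raise each remaining metric by 1 and record the score.
        probes = {m: evaluate({**weights, m: weights[m] + 1}) for m in remaining}
        top = max(sorted(probes), key=probes.get)  # sorted() makes ties deterministic
        if probes[top] <= best_score:
            break  # no remaining metric is influential

        # Keep raising the most influential metric while the score improves.
        while weights[top] < max_weight:
            trial = {**weights, top: weights[top] + 1}
            score = evaluate(trial)
            if score <= best_score:
                break
            weights, best_score = trial, score

        remaining.remove(top)  # fix this weight and move on

    return weights
```

Because each metric is tuned once and then fixed, the number of calls to `evaluate` stays far below that of an exhaustive grid over all weight combinations.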

This embodiment builds an automatic report generation model to recognize and analyze the examination data from intelligent detection devices, which improves report generation efficiency. The model is also trained and optimized using weighted rewards, which reduces the amount of computation while improving the accuracy, and hence the quality, of the electronic reports.

Embodiment 2

In a further embodiment based on Embodiment 1, electronic report generation covers not only diagnostic reports produced from data generated during diagnosis and treatment but also the completion of electronic medical records. To make electronic medical records faster to produce, relation inference is used to establish the links between diseases and treatment plans, so that relevant diseases and treatment plans can be retrieved and then shown in prompt boxes that assist medical staff in filling in the record.

Optionally, a knowledge graph is constructed from the examination data, and relevant diseases and treatment plans are inferred from it. These are then shown in prompt boxes to assist medical staff in filling in the electronic medical record, improving the efficiency with which records are produced.

Specifically, each node in the knowledge graph is an entity, and an edge between two entities represents the semantic relation between them. A knowledge graph is usually made up of (entity 1, relation, entity 2) triples and (entity, attribute, attribute value) triples; an (entity 1, relation, entity 2) triple consists of two entities and the edge between them. Knowledge graphs are stored as graphs: the associations between vertices are irregular, and both the number of relations in the graph and the distance spanned by a relation are indeterminate. Unlike an image, a graph is hard to represent as a regular matrix, because each node in a knowledge graph has a different number of edges and each edge connects nodes of different types.
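A minimal sketch of triple storage as an adjacency list, which accommodates the irregular, variable-degree structure described above. The class name, method names, and example triples are illustrative assumptions and are not taken from the source.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


class KnowledgeGraph:
    """Stores (head, relation, tail) triples as an adjacency list."""

    def __init__(self) -> None:
        # head entity -> list of (relation, tail); each node may have any number of edges
        self._out: Dict[str, List[Tuple[str, str]]] = defaultdict(list)

    def add(self, head: str, relation: str, tail: str) -> None:
        """Add one triple; tail may be an entity or an attribute value."""
        self._out[head].append((relation, tail))

    def neighbors(self, head: str, relation: Optional[str] = None) -> List[str]:
        """Return tails reachable from head, optionally filtered by relation."""
        return [t for r, t in self._out.get(head, [])
                if relation is None or r == relation]


# Illustrative data only.
kg = KnowledgeGraph()
kg.add("pneumonia", "has_symptom", "cough")           # (entity 1, relation, entity 2)
kg.add("pneumonia", "has_symptom", "fever")
kg.add("pneumonia", "treated_with", "antibiotics")
kg.add("antibiotics", "route", "oral")                # (entity, attribute, attribute value)
```

An adjacency list is used instead of a dense matrix because, as the text notes, the number of edges per node is not fixed.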

Optionally, a knowledge base is constructed to support information extraction and prediction over text, yielding a knowledge base that contains evidence for judging multiple kinds of relations.

Specifically, predicting relations between entities with the knowledge base comprises the following steps:

Step 1. Construct a three-stage analysis structure. The first stage is global associative recall; the second stage is hypothesis formation and representation; the third stage is decision prediction. The first stage has two branches: the first is a conditional probability branch and the second is a Contextualized GCN branch.

Step 2. Using the three-stage analysis structure, obtain node information from the text library and the corpus, and derive the optimal node sequence through processing and analysis. The optimal node sequence consists of the first K nodes when the nodes are sorted in descending order of relationship closeness, that is, the K most closely related nodes.

Step 3. Based on the optimal node sequence, set the value range and predict the relations between entities by combining the closed-world assumption and the open-world assumption.

Step 4. Construct a scoring function to evaluate the predicted relations between entities, thereby obtaining a confidence for each predicted relation.

Step 5. Construct a knowledge-aware attention mechanism, which receives the encoder-encoded data and the confidences of the predicted relations.

Step 6. Through the knowledge-aware attention mechanism, output the final prediction: the complete set of predicted relations for entity pairs together with their confidence scores.

Optionally, depending on the application, the top N predictions ranked by confidence are selected as recommendations, where N is a positive integer.
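Selecting the top N predictions by confidence can be sketched with the standard library. The `Prediction` tuple layout and the sample values in the usage note are assumptions for illustration.

```python
import heapq
from typing import List, Tuple

# (head entity, relation, tail entity, confidence); layout assumed for illustration
Prediction = Tuple[str, str, str, float]


def top_n(predictions: List[Prediction], n: int) -> List[Prediction]:
    """Return the n predictions with the highest confidence, best first."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    return heapq.nlargest(n, predictions, key=lambda p: p[3])
```

For example, `top_n([("a", "r", "b", 0.2), ("a", "r", "c", 0.9)], 1)` keeps only the prediction with confidence 0.9. The same helper would cover the top-K node selection in step 2 if closeness scores are stored in the last field.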

In a further embodiment, once the second stage (hypothesis formation and representation) has pooled all hypothesis representations, the third stage (decision prediction) begins. In this stage, a new scoring function evaluates the accuracy of each hypothesis representation. After the whole model has been fully trained, the triples are sorted and the top K triples are selected as theorems and fed into the knowledge-aware attention mechanism. At the same time, the text from the original text library and corpus is encoded by an entity encoder and also fed into the knowledge-aware attention mechanism, which finally outputs the top K triples together with their confidence as theorems.

Optionally, the third stage proceeds as follows. First, the explicit hypothesis representations from the second-stage closed-world assumption (CWA) are pooled with the implicit hypothesis representations from the open-world assumption (OWA). Second, instance embedding is performed on the instance bag for the given entity pair. Third, the embeddings are grouped by type. Fourth, the grouped embeddings are fed into the knowledge-aware attention mechanism to obtain the textual relations. Finally, relation scores are computed to complete relation prediction and confidence scoring for the entity pair.

The formulas for this stage are not reproduced in this text; the quantities they involve are described below.

The pooled hypothesis set combines the explicit hypothesis representations with the implicit hypothesis representations.

In a further embodiment, given an entity pair (h, t) and its instance bag, a sentence encoder produces the instance embeddings. The embeddings are grouped according to their type, and the knowledge-aware attention mechanism is then applied to them to obtain the textual relation representation. This embodiment computes, for each instance feature vector, an entity representation that carries an attention weight (a similarity or relevance). The computation uses the vertical concatenation of the two vectors being compared, a weight matrix, and a bias.

Attention is then computed over the target entity pair to obtain the corresponding textual relation representation. This computation involves a weight matrix, a relation function satisfied between the query entities, the attention operation, and a score for how well the input textual relation representation matches the predicted relation r.

The textual relation representations from the different layers are concatenated to form the final representation, where N is the number of layers. The final representation is used to compute the conditional probability of each relation from the scores of all relations, and those scores are computed with a representation matrix M. The attention weights are obtained from the outputs of the second-stage CWA and OWA, which can supply more informative parameters than purely data-driven learning. This yields the complete relation predictions for entity pairs together with their confidence scores.
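The final scoring step reads like the softmax classifier commonly used in bag-level relation extraction. As a hedged sketch only, and assuming the notation $s$ for the final concatenated representation, $o$ for the vector of relation scores, $d$ for an optional bias, and $n_r$ for the number of relations (only $M$, $r$, $h$, and $t$ appear in this text), the computation could take the form:

```latex
o = M s + d, \qquad
P(r \mid h, t) = \frac{\exp(o_r)}{\sum_{k=1}^{n_r} \exp(o_k)}
```

Under this reading, the confidence attached to a predicted relation is its softmax probability, so the confidences over all candidate relations for one entity pair sum to 1.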

Embodiment 3

In a further embodiment based on Embodiment 1: modern technology is advancing rapidly and paper data is gradually being replaced by electronic data, but people's readiness to adopt new things tends to fade over time. In medical diagnosis and treatment, experienced specialists still often prefer to write medical records by hand on paper during consultations, because it is faster for them. Therefore, to accommodate the working habits of every member of the medical staff while still keeping a synchronized electronic copy of the consultation data, a conversion from paper reports to electronic reports is performed before the automatic generation of electronic reports in Embodiment 1.

Specifically, after a senior specialist has completed a consultation report by hand, data entry staff convert and archive it electronically. In the prior art, this conversion is usually done by data entry staff typing the contents of the paper report at a keyboard. To improve conversion efficiency and reduce redundant work, the paper report to be converted is first photographed with an image acquisition device and the image data is uploaded to the text conversion model. The text conversion model then extracts the text in the image data and converts it into the corresponding electronic report. Finally, the data entry staff check the resulting electronic report and archive it if no errors are found.

Optionally, the conversion from paper data to electronic data comprises the following steps:

Step 1. Read the paper data to be analyzed and obtain initial data;

Specifically, after the paper data to be analyzed has been read, OCR is first performed on it to obtain first data; a preprocessing operation is then applied to the first data to obtain the initial data. Optionally, preprocessing converts half-width characters to full-width and removes illegal characters from the text, which overcomes the sentence segmentation problems caused by line breaks, tables, and the like.
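The preprocessing step can be sketched as follows. Two assumptions are made here that the source does not state: "illegal characters" is taken to mean Unicode control and format characters (which includes the line breaks and tabs left behind by OCR), and half-width to full-width conversion is applied to the printable ASCII range.

```python
import unicodedata


def to_fullwidth(text: str) -> str:
    """Convert half-width printable ASCII to its full-width form."""
    out = []
    for ch in text:
        code = ord(ch)
        if ch == " ":
            out.append("\u3000")            # ideographic (full-width) space
        elif 0x21 <= code <= 0x7E:
            out.append(chr(code + 0xFEE0))  # '!'..'~' map to U+FF01..U+FF5E
        else:
            out.append(ch)
    return "".join(out)


def strip_illegal(text: str) -> str:
    """Drop characters in Unicode category C* (control, format, unassigned, ...)."""
    return "".join(ch for ch in text if unicodedata.category(ch)[0] != "C")


def preprocess(ocr_text: str) -> str:
    """Clean raw OCR output (the 'first data') into the 'initial data'."""
    return to_fullwidth(strip_illegal(ocr_text))
```

Removing line breaks before sentence splitting means a sentence that OCR wrapped across two lines is rejoined, which is the segmentation problem the text mentions.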

步骤2、构建语句转化模型，并利用文字转化模型对初始数据进行检测，获得对应的初始语句集合；Step 2: Construct a sentence conversion model, and use the text conversion model to detect the initial data to obtain the corresponding initial sentence set;

Specifically, the initial data is passed to the constructed sentence conversion model, which performs boundary detection on it using preset identifiers to obtain second data. The second data is then divided into separate sentences at the end-of-segment identifiers, yielding the corresponding initial sentence set.
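As a rough illustration of splitting at end-of-segment identifiers, the sketch below uses a rule-based stand-in for the sentence conversion model. The set of identifiers in `END_MARKERS` and the function name `split_sentences` are assumptions; the source does not say which identifiers are preset or how the model detects boundaries.

```python
import re

# Assumption: the preset end-of-segment identifiers are full-width
# sentence-final punctuation marks.
END_MARKERS = "。！？；"


def split_sentences(initial_data: str, end_markers: str = END_MARKERS) -> list[str]:
    """Split text into sentences, keeping each end identifier with its sentence."""
    cls = re.escape(end_markers)
    # One run of non-marker characters, optionally followed by one marker.
    pattern = re.compile(f"[^{cls}]+[{cls}]?")
    sentences = (m.group().strip() for m in pattern.finditer(initial_data))
    return [s for s in sentences if s]


if __name__ == "__main__":
    for sentence in split_sentences("血压正常。心率偏快！建议复查"):
        print(sentence)
```

Keeping the identifier attached to its sentence means the original text can be rebuilt by concatenating the set in order.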

Step 3: Construct an error correction model and use it to correct the initial sentence set, obtaining optimized electronic data.

Specifically, the sentences in the initial sentence set are read in order and corrected one at a time by the error correction model. First, the sentence that was read is segmented into words, giving segmentation results in order of appearance. Next, a first candidate sentence is built from the segmentation results in that order, and its perplexity is computed. A perplexity threshold is then set and the perplexity of the first candidate sentence is compared against it: if the perplexity is below the threshold, the candidate sentence needs no correction and is added to the data sentence set. Finally, the data sentence set is output as the final electronic document.

If the perplexity of the first candidate sentence is not below the threshold, the candidate sentence needs correction. In that case, word segments in the current candidate sentence are replaced using a pre-built word replacement mapping to obtain a second candidate sentence, and the perplexity comparison is repeated on the second candidate sentence. If the comparison result meets the requirement, the candidate sentence is added to the data sentence set; otherwise, correction of the candidate sentence under analysis continues using the pre-built word replacement mapping.
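The perplexity-gated correction loop can be sketched as below. Several parts are assumptions, because the source does not specify them: the language model is a toy add-one-smoothed unigram model (`UnigramLM`), the input is already segmented into words, each round applies the single replacement that lowers perplexity most, and `max_rounds` caps the loop so that it always terminates. Perplexity is computed as the exponential of the negative mean log-probability of the tokens.

```python
import math
from collections import Counter


class UnigramLM:
    """Toy add-one-smoothed unigram model, standing in for the real model."""

    def __init__(self, corpus_tokens):
        self.counts = Counter(corpus_tokens)
        self.total = sum(self.counts.values())
        self.vocab = len(self.counts) + 1  # one extra slot for unseen words

    def perplexity(self, tokens):
        if not tokens:
            raise ValueError("cannot score an empty sentence")
        log_p = sum(
            math.log((self.counts[t] + 1) / (self.total + self.vocab))
            for t in tokens
        )
        return math.exp(-log_p / len(tokens))


def correct(tokens, lm, replacements, threshold, max_rounds=5):
    """Replace word segments until perplexity drops below the threshold."""
    candidate = list(tokens)
    for _ in range(max_rounds):  # assumption: bounded number of rounds
        current = lm.perplexity(candidate)
        if current < threshold:
            break  # no correction needed
        best, best_ppl = None, current
        for i, tok in enumerate(candidate):
            for alt in replacements.get(tok, []):
                trial = candidate[:i] + [alt] + candidate[i + 1:]
                ppl = lm.perplexity(trial)
                if ppl < best_ppl:
                    best, best_ppl = trial, ppl
        if best is None:
            break  # the mapping offers no improving replacement
        candidate = best
    return candidate


if __name__ == "__main__":
    lm = UnigramLM(["肺部", "未见", "异常"] * 10)
    mapping = {"未贝": ["未见"]}  # hypothetical OCR confusion pair
    print(correct(["肺部", "未贝", "异常"], lm, mapping, threshold=5.0))
```

In practice a stronger language model would replace the unigram model; the control flow around the threshold would stay the same.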

This embodiment addresses the paper data that arises in actual report generation and provides a process for converting paper data into electronic data, so that it better fits practical needs. In addition, the further error correction applied to the converted sentences effectively improves the accuracy of the conversion results.

Embodiment

In one embodiment, a system for automatically generating electronic reports is proposed, which implements the method for automatically generating electronic reports proposed in Embodiment 1. The system includes a data collection module, a data analysis module, a report generation module, and a data output module.

Specifically, the data collection module includes various medical terminal detection devices, which collect examination data from patients and generate diagnosis and treatment image data. Optionally, the image data includes, but is not limited to, chest X-rays, heart-rate charts, ultrasound images, computed tomography images, magnetic resonance images, and positron emission tomography images.

The data analysis module performs data analysis on the received diagnosis and treatment image data and comprises a model building module, a data receiving module, and a detection module. The model building module constructs the report generation model; the data receiving module receives the diagnosis and treatment image data generated by the data collection module; and the detection module uses the report generation model to analyze the received image data.

The report generation module generates the electronic report from the analysis results of the data analysis module.

The data output module outputs the electronic report generated by the report generation module.
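How the four modules hand data to one another can be sketched as a simple pipeline. The class name `ReportSystem`, its field names, and the callable signatures are hypothetical and are used only to show the data flow; the stub functions in the usage example stand in for real detection equipment and a real report generation model.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ReportSystem:
    acquire: Callable[[], bytes]       # data collection module
    analyze: Callable[[bytes], dict]   # data analysis module (wraps the model)
    render: Callable[[dict], str]      # report generation module
    emit: Callable[[str], None]        # data output module

    def run(self) -> str:
        image = self.acquire()           # diagnosis and treatment image data
        findings = self.analyze(image)   # data analysis results
        report = self.render(findings)   # electronic report
        self.emit(report)                # deliver the report
        return report


if __name__ == "__main__":
    system = ReportSystem(
        acquire=lambda: b"\x00\x01\x02",                  # stub image bytes
        analyze=lambda img: {"bytes_received": len(img)},  # stub analysis
        render=lambda f: f"Report: {f['bytes_received']} bytes analyzed",
        emit=print,
    )
    system.run()
```

Passing each module in as a callable keeps the modules independently replaceable, which matches the modular description above.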

In a further embodiment, to improve the performance of the report generation model, the system for automatically generating electronic reports further includes a performance optimization module, which optimizes the performance of the report generation model.

Embodiment

In one embodiment, a device for automatically generating electronic reports is proposed. The device includes a processor and a memory storing computer program instructions; the processor reads and executes the computer program instructions to implement the method for automatically generating electronic reports proposed in Embodiment 1.

Specifically, the electronic device may be a computer of any form, optionally including, but not limited to, laptop computers, desktop computers, workstations, personal digital assistants, servers, blade servers, mainframe computers, and other suitable computers. The electronic device may also take the form of a mobile device, such as a cellular phone, a smartphone, a wearable device, or a similar computing device.

Optionally, the device for automatically generating electronic reports further includes components that implement different functions, all connected to an I/O interface through a bus. These functional components include, but are not limited to, an input unit, an output unit, a storage unit, and a communication unit. Examples of input units are a mouse and a keyboard; of output units, various types of displays and speakers; of storage units, magnetic disks and optical discs; and of communication units, network cards, modems, and wireless communication transceivers.

The communication unit handles data transmission between the electronic device and other target objects and, optionally, exchanges information or data with other devices over a computer network such as the Internet and/or various telecommunications networks.

Embodiment

In one embodiment, a computer-readable storage medium is proposed, on which computer program instructions are stored. When executed by a processor, the computer program instructions implement the method for automatically generating electronic reports proposed in Embodiment 1.

Specifically, the computer program instructions stored on the readable storage medium are implemented in one or more programming languages according to actual needs. Optionally, the computer program instructions can be loaded into the device for automatically generating electronic reports via the memory or the communication unit and, when triggered, carry out the method for automatically generating electronic reports.

Optionally, when the computer-readable storage medium is a tangible medium, it includes machine-readable signal media and machine-readable storage media. Machine-readable signal media include, but are not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatuses, or devices, or any suitable combination of these. More specific examples of machine-readable storage media include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of these.

As stated above, although the present invention has been shown and described with reference to specific preferred embodiments, this shall not be construed as limiting the invention itself. Various changes in form and detail may be made without departing from the spirit and scope of the invention as defined by the appended claims.