Detailed Description
The technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the accompanying drawings. It is evident that the described embodiments are some, but not all, of the embodiments of the application. All other embodiments obtained by those skilled in the art based on the embodiments of the application without inventive effort fall within the protection scope of the application.
The flow diagrams depicted in the figures are merely illustrative; not all of the elements and operations/steps shown need be included, nor need they be performed in the order described. For example, some operations/steps may be further divided, combined, or partially merged, so the order of actual execution may change according to the actual situation. In addition, although the apparatus schematic divides the functionality into modules, in some cases the division of the modules may differ from that in the apparatus schematic.
The term "and/or" as used in the present specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.
Some embodiments of the present application are described in detail below with reference to the accompanying drawings. The following embodiments and features of the embodiments may be combined with each other without conflict.
As shown in fig. 1, the picture-based information extraction method provided by the embodiment of the application can be applied to the application environment shown in fig. 1. The application environment includes a terminal device 110 and a server 120, where the terminal device 110 may communicate with the server 120 through a network. Specifically, the server 120 can obtain a target picture to be extracted and obtain a target service requirement; perform a text recognition operation on the target picture to obtain a corresponding target text; further, perform a feature extraction operation on the target text to obtain a target feature vector corresponding to the target text; and finally, analyze the target feature vector through the target deep learning model based on the target service requirement to obtain target information corresponding to the target service requirement, and send the target information to the terminal device 110. The server 120 may be an independent server, or may be a cloud server that provides cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, content delivery networks (Content Delivery Network, CDN), and basic cloud computing services such as big data and artificial intelligence platforms. The terminal device 110 may be, but is not limited to, a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, a smart watch, etc. The terminal and the server may be directly or indirectly connected through wired or wireless communication, which is not limited in the present application.
Referring to fig. 2, fig. 2 is a schematic diagram illustrating steps of a method for extracting information based on a picture according to an embodiment of the present application. The picture-based information extraction method can be applied to computer equipment, so that information extraction is realized.
As shown in fig. 2, the picture-based information extraction method includes steps S11 to S14.
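As a non-limiting illustration, the flow of steps S11 to S14 can be sketched as follows; every function body is a hypothetical placeholder standing in for the operations described in the steps below, not an actual implementation of the application.

```python
# Hypothetical sketch of steps S11-S14; each helper is a stand-in
# placeholder, not the actual implementation described in this application.

def obtain_inputs():
    # S11: obtain the target picture and the target service requirement.
    return "medical_record.png", "named_entity_recognition"

def recognize_text(picture):
    # S12: text recognition (e.g. OCR) over the picture; stubbed here.
    return "Patient: Zhang San  Hospital: First Hospital  Date: 2023-01-05"

def extract_features(text):
    # S13: feature extraction; stubbed as a trivial token list.
    return text.split()

def analyze(features, requirement):
    # S14: analysis by the target deep learning model; stubbed as a summary.
    return {"requirement": requirement, "tokens": len(features)}

picture, requirement = obtain_inputs()
text = recognize_text(picture)
features = extract_features(text)
target_info = analyze(features, requirement)
print(target_info)
```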
Step S11: and obtaining a target picture to be extracted and obtaining a target service requirement.
The target picture may be a medical picture, such as a medical record sheet or an examination sheet uploaded by a patient, or may be a picture in another field, which is not limited in the present application; the following description takes a medical picture as an example.
Furthermore, the target service requirement may be a requirement of performing named entity recognition, relation extraction, text classification, or the like on the medical picture, and may be set according to the needs of the user, which is not limited in the present application.
In the embodiment of the application, the medical picture from which information is to be extracted and the specific service requirement can be obtained, so that the corresponding information extraction can be performed on the medical picture based on the specific service requirement.
Step S12: and performing text recognition operation on the target picture to obtain a corresponding target text.
The target text is a text corresponding to the medical picture.
Further, the text recognition operation is an operation of recognizing the information on the picture as a corresponding text, and the method of the text recognition operation is not limited; for example, the medical picture can be recognized as a corresponding text by an OCR method.
OCR (Optical Character Recognition) refers to the recognition of optical characters by image processing and pattern recognition techniques. OCR is one of the branches of the field of computer vision research and is an important component of computer science. The main metrics for measuring the performance of an OCR system are: rejection rate, false recognition rate, recognition speed, user interface friendliness, product stability, usability, feasibility, and the like. According to the application scenario, OCR can be roughly classified into dedicated OCR for a specific scene and general OCR for various scenes. Certificate recognition and license plate recognition are typical cases of dedicated OCR, while general OCR can be used in more varied and complex scenes and has better universality. Therefore, the medical picture can be recognized to obtain the corresponding text based on a general OCR method.
Optionally, performing text recognition operation on the target picture to obtain a corresponding target text, including: preprocessing the target picture to be extracted to obtain a preprocessed target picture; and performing text recognition operation on the preprocessed target picture to obtain a corresponding target text.
Wherein the preprocessing operation comprises at least one of resizing, normalization processing, and center cropping.
Optionally, after obtaining the corresponding target text, the method further includes: and performing data cleaning operation on the target text to obtain the target text after data cleaning, wherein the data cleaning operation comprises at least one of stop word removal operation, punctuation removal operation and special character removal operation.
Specifically, in order to improve the recognition effect when recognizing the medical picture to obtain the corresponding text, preprocessing operations such as resizing, normalization processing, and center cropping can be performed on the medical picture, so that the key characteristics of the medical picture are emphasized and a more accurate corresponding text can be obtained.
Furthermore, at least one of removing stop words, punctuation marks, and special characters can be performed on the recognized text to obtain the data-cleaned text, so that invalid data is prevented from affecting the subsequent feature extraction of the text.
In the embodiment of the application, the medical picture can be preprocessed, and then the medical picture is converted into a relatively accurate corresponding text by an OCR (optical character recognition) method. In addition, the data cleaning operation can be performed on the text obtained through recognition, so that invalid data is prevented from affecting the feature extraction of the subsequent text.
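A minimal sketch of such a data cleaning operation is shown below; the stop-word list and the character classes used are illustrative assumptions, not prescribed by the application.

```python
import re

# Non-limiting sketch of the data cleaning operation; the stop-word
# list and character classes below are illustrative assumptions.
STOP_WORDS = {"the", "a", "an", "of", "on", "is"}

def clean_text(text):
    # Remove punctuation marks and special characters.
    text = re.sub(r"[^\w\s]", " ", text)
    # Remove stop words.
    tokens = [t for t in text.split() if t.lower() not in STOP_WORDS]
    return " ".join(tokens)

cleaned = clean_text("Diagnosis: flu, on 2023-01-05!!")
print(cleaned)
```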
Step S13: and carrying out feature extraction operation on the target text to obtain a target feature vector corresponding to the target text.
It should be noted that feature extraction refers to extracting feature vectors from an image or text to represent its information. The method for performing feature extraction on the target text is not limited; for example, features can be extracted from the target text through methods such as the term frequency-inverse document frequency algorithm, the word2vec model, the text frequency method, the one-hot encoding algorithm, and mutual information.
Optionally, on the basis of the foregoing embodiment, performing the feature extraction operation on the target text to obtain the target feature vector corresponding to the target text further includes: performing the feature extraction operation on the data-cleaned target text based on one of the term frequency-inverse document frequency algorithm, the word2vec model, and the one-hot encoding algorithm to obtain the target feature vector.
The term frequency-inverse document frequency (TF-IDF) algorithm is a weighting algorithm commonly used in information retrieval and text mining, and can be used to evaluate the importance of a word to a document in a document set or corpus. The importance of a word increases proportionally with the number of times it appears in the document, but at the same time decreases inversely with the frequency with which it appears in the corpus. If a word is rare overall but appears multiple times in one document, the word is likely to reflect the characteristics of that document. Therefore, the feature extraction operation can be performed on the target text based on the term frequency-inverse document frequency algorithm to obtain the corresponding feature vector.
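The TF-IDF weighting described above can be sketched over a toy corpus as follows; the corpus and the un-smoothed idf = log(N / df) convention are illustrative assumptions (libraries often use smoothed variants).

```python
import math

# Minimal, non-limiting TF-IDF sketch; corpus and idf convention are
# illustrative assumptions.
corpus = [
    ["fever", "cough", "fever"],
    ["cough", "headache"],
    ["fever", "headache", "nausea"],
]

def tf_idf(term, doc, docs):
    tf = doc.count(term) / len(doc)          # term frequency in this document
    df = sum(1 for d in docs if term in d)   # documents containing the term
    idf = math.log(len(docs) / df)           # rarer terms get a larger idf
    return tf * idf

# "nausea" appears in only one document, so it characterizes that document
# more strongly than the more common "headache".
print(tf_idf("nausea", corpus[2], corpus))
print(tf_idf("headache", corpus[2], corpus))
```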
The word2vec model is a model that converts words into vectors and contains two algorithms, skip-gram and CBOW. The greatest difference between them is that skip-gram predicts the surrounding words from the center word, while CBOW predicts the center word from the surrounding words. Therefore, the feature extraction operation can be performed on the target text through the word2vec model to obtain the corresponding feature vector.
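The difference between the two algorithms can be illustrated by how each frames the same context window as training pairs; the pair generation below is a sketch only (the neural network itself is omitted), and the sample sentence is an illustrative assumption.

```python
# Sketch of how skip-gram and CBOW frame the same window differently.
def skipgram_pairs(tokens, window=1):
    # (center, context) pairs: the center word predicts each surrounding word.
    pairs = []
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

def cbow_pairs(tokens, window=1):
    # (context_list, center) pairs: the surrounding words predict the center.
    pairs = []
    for i, center in enumerate(tokens):
        context = [tokens[j]
                   for j in range(max(0, i - window),
                                  min(len(tokens), i + window + 1))
                   if j != i]
        pairs.append((context, center))
    return pairs

tokens = ["patient", "has", "fever"]
print(skipgram_pairs(tokens))
print(cbow_pairs(tokens))
```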
The one-hot encoding algorithm is an algorithm for computing feature vectors based on measurement in a vector space, and can expand the values of a discrete feature into Euclidean space, so that each value of the discrete feature corresponds to a point in Euclidean space; that is, it makes the calculation of distances between features more reasonable. After a discrete feature is one-hot encoded, the feature vector in each dimension can be regarded as a continuous feature, so that normalization of the feature vector is realized. Therefore, the feature extraction operation can be performed on the target text through the one-hot encoding algorithm to obtain the corresponding feature vector.
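A minimal one-hot encoding sketch follows; the vocabulary is an illustrative assumption. It also shows the property described above: any two distinct categories end up at the same Euclidean distance.

```python
import math

# Non-limiting one-hot sketch: each discrete value maps to one axis of
# Euclidean space; the vocabulary below is an illustrative assumption.
def one_hot(value, vocabulary):
    return [1 if v == value else 0 for v in vocabulary]

def distance(a, b):
    # Euclidean distance between two encoded vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

vocab = ["fever", "cough", "headache"]
print(one_hot("cough", vocab))
```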
In the embodiment of the application, the feature extraction operation can be performed on the target text by one of the term frequency-inverse document frequency algorithm, the word2vec model, and the one-hot encoding algorithm, so as to obtain a target feature vector capable of representing the semantic information of the target text.
Step S14: and analyzing the target feature vector through a target deep learning model based on the target service requirement to obtain target information corresponding to the target service requirement.
Specifically, after the target service requirement is obtained, the target feature vector can be input into the target deep learning model based on the specific service requirement, so that target information corresponding to the target service requirement is output and obtained.
Optionally, the target deep learning model includes a Prompt mechanism, the service requirement includes a named entity recognition requirement, and analyzing the target feature vector through the target deep learning model based on the target service requirement to obtain the target information corresponding to the target service requirement includes: performing named entity analysis on the target feature vector through the Prompt mechanism based on the named entity recognition requirement to obtain the target information.
Note that the Prompt (text prompt) mechanism is a mechanism that can align the target of a downstream task with the pre-training target. The Prompt mechanism can realize open-domain information extraction and supports zero-sample and few-sample extraction; it unifies information extraction tasks such as named entity recognition, relation extraction, and text classification, reducing cost and improving efficiency. Therefore, the task target can be aligned with the target of the model based on the Prompt mechanism; that is, the extraction of the information corresponding to the service requirement can be realized based on the Prompt mechanism.
Specifically, when the service requirement is a named entity recognition requirement, named entity analysis can be performed on the target feature vector through the Prompt mechanism to obtain the target information.
The target information may be composed of named entities such as patient name, hospital, department, time, and content of diagnosis, which is not limited in the present application.
Optionally, the service requirement further includes a requirement of relation extraction or a requirement of text classification, and after obtaining the target information, the method further includes: based on the relation extraction requirement, carrying out relation analysis on a plurality of named entities through a Prompt mechanism to obtain a relation analysis result; or based on the requirement of text classification, carrying out text classification on a plurality of named entities through a Prompt mechanism to obtain a text classification result.
The relationship result may be, for example, the visit hospital, visit department, visit content, etc. corresponding to each visit time; the text classification result may be a classification based on the visit hospital or the visit time, which is not limited in the present application.
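As a non-limiting sketch of how a Prompt mechanism could align the three service requirements above with a single model, prompt templates of the following kind could be built; the wording of each template is purely an illustrative assumption, not the actual prompt used by the model.

```python
# Hypothetical prompt templates; the wording of each template is an
# illustrative assumption, not the actual Prompt used by the model.
TEMPLATES = {
    "named_entity_recognition":
        "Extract the patient name, hospital, department, and time from: {}",
    "relation_extraction":
        "For each visit time, list the corresponding hospital, department, and content in: {}",
    "text_classification":
        "Classify the following record by visit hospital: {}",
}

def build_prompt(service_requirement, text):
    # One model, one mechanism: the service requirement only changes the prompt.
    return TEMPLATES[service_requirement].format(text)

prompt = build_prompt("named_entity_recognition",
                      "Zhang San, First Hospital, Cardiology, 2023-01-05")
print(prompt)
```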
In the embodiment of the application, the information corresponding to the service requirement can be obtained by analyzing the target feature vector based on the Prompt mechanism in the target deep learning model, so that the corresponding information is extracted from the medical picture based on different service requirements, and the accuracy and efficiency of information extraction are improved.
According to the picture-based information extraction method, information extraction apparatus, computer device, and computer-readable storage medium disclosed by the embodiments of the application, the medical picture to be extracted and the target service requirement can be obtained, and the text recognition operation can be performed on the medical picture to obtain the corresponding medical text. Furthermore, the feature extraction operation can be performed on the medical text to obtain the corresponding target feature vector. Thus, the target feature vector is analyzed through the target deep learning model based on the target service requirement, and the target information corresponding to the target service requirement is obtained. According to the application, the target feature vector can be directly analyzed through the target deep learning model to obtain the corresponding target information, which saves labor and makes information extraction more efficient. In addition, the application can realize information extraction through the target deep learning model alone, without repeatedly constructing other models, which is beneficial to information sharing and reduces development cost and machine cost.
Referring to fig. 3, fig. 3 is a flowchart illustrating a process of obtaining a target deep learning model according to an embodiment of the application. As shown in fig. 3, the target deep learning model may be obtained through steps S21 to S23.
Step S21: a training dataset and an initial deep learning model are acquired.
Wherein the training dataset comprises several pictures, the initial deep learning model comprises an ERNIE-Layout model, and the ERNIE-Layout model comprises a Prompt mechanism.
The pictures in the training data set may be medical pictures, which is not limited in the present application.
The ERNIE-Layout model is built on the multilingual text model ERNIE and performs cross-modal joint modeling by fusing information such as text, image, and layout. The ERNIE-Layout model also introduces layout knowledge enhancement to realize self-supervised pre-training tasks such as reading-order prediction and fine-grained image-text matching, supports up to 96 languages, and is well suited to industries such as finance, insurance, energy, logistics, and medical treatment. Therefore, the embodiment of the application can realize information extraction of the medical picture based on the ERNIE-Layout model.
Step S22: and (5) performing sequence labeling on the training data set through a Prompt mechanism to obtain a sequence labeling result.
Step S23: and training the ERNIE-Layout model through the training data set and the sequence labeling result to obtain a target deep learning model.
The sequence labeling result comprises information corresponding to each picture.
Specifically, the plurality of medical pictures are subjected to sequence labeling through the Prompt mechanism to obtain a sequence labeling result comprising the information of each medical picture. Then, the sequence labeling result is taken as the label of the corresponding group of input data, each group of training data carrying labels is input into the initial deep learning model for supervised learning, and training ends when a training end condition is met, for example, when the number of training iterations reaches a count threshold or the output precision of the model reaches a precision threshold, so as to obtain the trained target deep learning model.
In the embodiment of the application, the training data set and the sequence labeling result can be input into the initial deep learning model for supervised learning, so that the target deep learning model is obtained through training. Therefore, the information corresponding to the medical picture can be output based on the target deep learning model.
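The supervised training loop of steps S21 to S23 can be sketched with a deliberately tiny stand-in model; the one-weight "model", the toy labeled data, and all numeric thresholds below are illustrative assumptions, not the actual ERNIE-Layout training.

```python
# Toy supervised-training sketch: the "model" is a single weight, the
# labels stand in for the sequence labeling results, and the stopping
# rule mirrors "training-count threshold or precision threshold".
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # (input, label) pairs

weight = 0.0
learning_rate = 0.05
max_epochs = 500          # training-count threshold
loss_threshold = 1e-6     # precision threshold

for epoch in range(max_epochs):
    # Mean squared error between model output and labels.
    loss = sum((weight * x - y) ** 2 for x, y in data) / len(data)
    if loss < loss_threshold:
        break
    # Gradient of the loss with respect to the weight.
    grad = sum(2 * (weight * x - y) * x for x, y in data) / len(data)
    weight -= learning_rate * grad

print(round(weight, 3))
```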
With continued reference to fig. 4, fig. 4 is a schematic flow chart of obtaining an iterated target deep learning model according to an embodiment of the present application. As shown in fig. 4, the iterative target deep learning model may be obtained through steps S24 to S26.
Step S24: and carrying out iterative training on the target deep learning model to extract data characteristics, and calculating to obtain a loss function.
Step S25: and carrying out iterative training on the loss function by using a preset method with the aim of reducing the loss function value until the expected threshold value specification is met.
Step S26: and obtaining an iterated target deep learning model based on the iterated loss function.
It can be understood that, in order to train the target deep learning model with higher accuracy, the loss function is continuously reduced by repeatedly and iteratively training the target deep learning model until the loss function meets the expected threshold specification, so that more accurate information corresponding to the medical picture can be obtained based on the iterated target deep learning model.
The preset method and the expected threshold value are not limited, and for example, the preset method may be a gradient descent algorithm, a batch gradient descent algorithm, a random gradient descent algorithm, or the like.
The purpose of the gradient descent algorithm is to find the minimum of the loss function, or to converge to the minimum, in an iterative manner. Geometrically, the gradient points in the direction in which the function changes most rapidly, so the function decreases most rapidly along the opposite direction of that vector, making the function minimum easier to find. Based on this, in the embodiment of the application, the target deep learning model can be repeatedly and iteratively trained by means of the gradient descent algorithm so as to continuously reduce the loss function, thereby reducing the error of the calculation result.
In the embodiment of the application, the loss function is continuously reduced by adopting the mode of repeatedly and iteratively training the target deep learning model by adopting the gradient descent algorithm so as to obtain the iterated target deep learning model, and further, the information corresponding to the more accurate medical picture can be obtained based on the iterated target deep learning model.
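The iteration of steps S24 to S26 can be sketched as a minimal gradient descent; the one-parameter loss f(w) = (w - 3)^2, the step size, and the threshold are purely illustrative assumptions.

```python
# Non-limiting gradient-descent sketch: iterate until the loss value
# meets an expected threshold; the loss function is purely illustrative.
def loss(w):
    return (w - 3.0) ** 2

def gradient(w):
    return 2.0 * (w - 3.0)

w = 0.0
step = 0.1
expected_threshold = 1e-8

while loss(w) > expected_threshold:
    # Move against the gradient: the direction of fastest decrease.
    w -= step * gradient(w)

print(round(w, 4))
```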
Referring to fig. 5, fig. 5 is a schematic block diagram of an information extraction apparatus according to an embodiment of the present application. The information extraction means may be arranged in a server for performing the aforementioned picture-based information extraction method.
As shown in fig. 5, the information extraction apparatus 200 includes: an acquisition module 201, a text recognition module 202, a feature extraction module 203, and an information extraction module 204.
An obtaining module 201, configured to obtain a target picture to be extracted, and obtain a target service requirement;
the text recognition module 202 is configured to perform a text recognition operation on the target picture to obtain a corresponding target text;
the feature extraction module 203 is configured to perform feature extraction operation on the target text to obtain a target feature vector corresponding to the target text;
the information extraction module 204 is configured to analyze the target feature vector through the target deep learning model based on the target service requirement, so as to obtain target information corresponding to the target service requirement.
The text recognition module 202 is further configured to perform a preprocessing operation on the target picture to be extracted to obtain a preprocessed target picture, where the preprocessing operation includes at least one of resizing, normalization processing, and center cropping; perform the text recognition operation on the preprocessed target picture to obtain the corresponding target text; and perform the data cleaning operation on the target text to obtain the data-cleaned target text, where the data cleaning operation includes at least one of a stop word removal operation, a punctuation mark removal operation, and a special character removal operation.
The feature extraction module 203 is further configured to perform the feature extraction operation on the data-cleaned target text based on one of the term frequency-inverse document frequency algorithm, the word2vec model, and the one-hot encoding algorithm, so as to obtain the target feature vector.
The information extraction module 204 is further configured to perform named entity analysis on the target feature vector through the Prompt mechanism based on the named entity recognition requirement, so as to obtain the target information, where the target information is composed of a plurality of named entities.
The information extraction module 204 is further configured to perform relationship analysis on the plurality of named entities through the Prompt mechanism based on the relation extraction requirement, so as to obtain a relationship analysis result; or perform text classification on the plurality of named entities through the Prompt mechanism based on the text classification requirement, so as to obtain a text classification result.
The obtaining module 201 is further configured to obtain a training data set and an initial deep learning model, where the training data set includes a plurality of pictures, the initial deep learning model includes an ERNIE-Layout model, and the ERNIE-Layout model includes the Prompt mechanism; performing sequence labeling on the training data set through the Prompt mechanism to obtain a sequence labeling result, wherein the sequence labeling result comprises information corresponding to each picture; and training the ERNIE-Layout model through the training data set and the sequence labeling result to obtain the target deep learning model.
The acquisition module 201 is further configured to perform iterative training on the target deep learning model to extract data features, and calculate a loss function; performing iterative training on the loss function by using a preset method with the aim of reducing the loss function value until the expected threshold value specification is met; and obtaining the target deep learning model after iteration based on the loss function after iteration training.
It should be noted that, for convenience and brevity of description, specific working processes of the above-described apparatus and each module, unit may refer to corresponding processes in the foregoing method embodiments, which are not repeated herein.
The methods and apparatus of the present application are operational with numerous general purpose or special purpose computing system environments or configurations. For example: personal computers, server computers, hand-held or portable devices, tablet devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like.
By way of example, the methods, apparatus described above may be implemented in the form of a computer program that is executable on a computer device as shown in fig. 6.
Referring to fig. 6, fig. 6 is a schematic diagram of a computer device according to an embodiment of the application. The computer device may be a server.
As shown in fig. 6, the computer device includes a processor, a memory, and a network interface connected by a system bus, wherein the memory may include a volatile storage medium, a non-volatile storage medium, and an internal memory.
The non-volatile storage medium may store an operating system and a computer program. The computer program comprises program instructions that, when executed, cause a processor to perform any one of the picture-based information extraction methods.
The processor is used to provide computing and control capabilities to support the operation of the entire computer device.
The internal memory provides an environment for the execution of the computer program in the non-volatile storage medium; the computer program, when executed by the processor, causes the processor to perform any one of the picture-based information extraction methods.
The network interface is used for network communication, such as transmitting assigned tasks. It will be appreciated by those skilled in the art that the illustrated architecture is merely a block diagram of some of the structures associated with the present application and does not limit the computer device to which the present application is applied; a particular computer device may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
It should be appreciated that the processor may be a central processing unit (Central Processing Unit, CPU), but may also be other general purpose processors, digital signal processors (Digital Signal Processor, DSP), application specific integrated circuits (Application Specific Integrated Circuit, ASIC), field-programmable gate arrays (Field-Programmable Gate Array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, or the like. Wherein the general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
Wherein in some embodiments the processor is configured to run a computer program stored in the memory to implement the steps of: acquiring a target picture to be extracted and acquiring a target service requirement; performing text recognition operation on the target picture to obtain a corresponding target text; performing feature extraction operation on the target text to obtain a target feature vector corresponding to the target text; and analyzing the target feature vector through the target deep learning model based on the target service requirement to obtain target information corresponding to the target service requirement.
In some embodiments, the processor is further configured to perform a preprocessing operation on the target picture to be extracted, to obtain a preprocessed target picture, where the preprocessing operation includes at least one of resizing, normalizing, and center cropping; performing text recognition operation on the preprocessed target picture to obtain a corresponding target text; and performing data cleaning operation on the target text to obtain the target text after data cleaning, wherein the data cleaning operation comprises at least one of stop word removal operation, punctuation mark removal operation and special character removal operation.
In some embodiments, the processor is further configured to perform a feature extraction operation on the target text after the data cleaning based on one of a word frequency-inverse document frequency algorithm, a word2vec model, and a one-hot encoding algorithm, to obtain the target feature vector.
In some embodiments, the processor is further configured to perform named entity analysis on the target feature vector through the Prompt mechanism based on the requirement identified by the named entity, to obtain the target information, where the target information is composed of a plurality of named entities.
In some embodiments, the processor is further configured to perform relationship analysis on the plurality of named entities through the Prompt mechanism based on the relation extraction requirement, to obtain a relationship analysis result; or perform text classification on the plurality of named entities through the Prompt mechanism based on the text classification requirement, to obtain a text classification result.
In some embodiments, the processor is further configured to obtain a training dataset and an initial deep learning model, wherein the training dataset comprises a number of pictures, the initial deep learning model comprises an ERNIE-Layout model, the ERNIE-Layout model comprising the Prompt mechanism; performing sequence labeling on the training data set through the Prompt mechanism to obtain a sequence labeling result, wherein the sequence labeling result comprises information corresponding to each picture; and training the ERNIE-Layout model through the training data set and the sequence labeling result to obtain the target deep learning model.
In some embodiments, the processor is further configured to iteratively train the target deep learning model to extract data features and calculate a loss function; performing iterative training on the loss function by using a preset method with the aim of reducing the loss function value until the expected threshold value specification is met; and obtaining the target deep learning model after iteration based on the loss function after iteration training.
The embodiment of the application also provides a computer-readable storage medium, on which a computer program is stored, where the computer program comprises program instructions, and the program instructions, when executed, implement any one of the picture-based information extraction methods provided by the embodiments of the application.
The computer readable storage medium may be an internal storage unit of the computer device according to the foregoing embodiment, for example, a hard disk or a memory of the computer device. The computer readable storage medium may also be an external storage device of the computer device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), or the like, which are provided on the computer device.
Further, the computer-readable storage medium may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function, and the like.
While the application has been described with reference to certain preferred embodiments, it will be understood by those skilled in the art that various changes and equivalent substitutions may be made without departing from the scope of the application. Therefore, the protection scope of the application is subject to the protection scope of the claims.