Disclosure of Invention
The invention provides an OCR (optical character recognition) form semantic recognition method and device based on a graph neural network, which can effectively recognize the keys and values present in a form and find the correspondences among them, thereby meeting actual industrial requirements such as automatic form auditing.
In order to solve the above technical problems, an embodiment of the present invention provides an OCR form semantic recognition method based on a graph neural network, including:
acquiring a first PNG table picture to be identified, wherein the first PNG table picture is obtained by preprocessing a PDF table;
Inputting the first PNG table picture into a trained GKVR recognition model, so that the GKVR recognition model carries out OCR (optical character recognition) on the first PNG table picture to obtain first text information, table frame information and position information of each text node, generating sentence vector features corresponding to the first text information through a gated recurrent unit (GRU) network according to the first text information and a preset vocabulary, converting the table frame information into node image features through a convolutional neural network and a grid_simple algorithm, carrying out normalization processing on the position information of each text node to obtain position features, finally respectively inputting the sentence vector features corresponding to the first text information and the position features into a graph attention network, and, after splicing with the node image features, outputting a key value information set corresponding to the first PNG table picture through a multi-layer perceptron (MLP), wherein the key value information set comprises a key information set and a value information set;
And traversing and matching the key value information set according to a preset division rule tree, and outputting each key value pair in the key value information set.
As a preferable scheme, the trained GKVR recognition model comprises a sentence vector feature extraction module;
The training process of the sentence vector feature extraction module specifically comprises the following steps:
According to a preset vocabulary, carrying out vocabulary recognition on the text content of each text node in a training sample to generate character strings, carrying out one-hot encoding on each character string, and then carrying out word embedding through a single-layer feed-forward network to obtain a word sequence corresponding to each text node;
and learning the semantics in each word sequence through the GRU network to generate sentence vector characteristics of each text node.
As a preferable scheme, the trained GKVR identification model comprises a node image feature extraction module;
the training process of the node image feature extraction module specifically comprises the following steps:
acquiring a plurality of pieces of table frame information in a training sample, and extracting picture structure information of each piece of table frame information through a convolutional neural network to obtain a plurality of first feature maps;
and scaling the plurality of first feature maps into grids by bilinear interpolation through a grid_simple algorithm, and taking the grid features at the coordinates corresponding to each text node as the node image features of that text node.
Preferably, the trained GKVR recognition model comprises a position feature extraction module;
the training process of the position feature extraction module specifically comprises the following steps:
Acquiring position information of each text node in a training sample;
and carrying out coordinate conversion on each piece of position information, normalizing the coordinate system into the [-1,1] interval, and outputting the position features corresponding to each text node.
As a preferable scheme, the training process of the trained recognition model specifically comprises the following steps:
Sentence vector features, node image features and position features corresponding to each text node in the training sample are used as input of the GKVR recognition model, and key information and value information corresponding to each text node are used as output of the GKVR recognition model;
and for each text node, respectively inputting the sentence vector features and the position features into a graph attention network, splicing them with the node image features to form the node features of each text node, and training the graph attention network and the multi-layer perceptron MLP in combination with the outputs of the GKVR recognition model.
As a preferred solution, the first PNG table picture is obtained by preprocessing a PDF table, specifically:
and acquiring a PDF document to be processed, intercepting a form part from the PDF document through a KVLabel tool, and generating the first PNG form picture.
Preferably, the KVLabel tool is further configured to preprocess training samples of the GKVR recognition model, specifically:
And selecting a form frame of the PDF document in each initial sample through the KVLabel tool, applying key-value labels and key-value-pair labels to each text node in the form frame, generating the PNG form picture corresponding to each initial sample, and taking all the PNG form pictures, key-value labels and key-value-pair labels as the training sample.
As a preferred solution, the traversing matching is performed on the key value information set according to a preset partition rule tree, and each key value pair in the key value information set is output, which specifically includes:
Gradually dividing the key information set by traversing the dividing rule tree breadth-first, and selecting values in the value information set when a leaf node is reached, to generate a plurality of key value pairs.
Preferably, the division rule tree is set in the GKVR identification model.
The invention correspondingly provides an OCR form semantic recognition device based on a graph neural network, which comprises an acquisition unit, a recognition unit and an output unit;
the acquisition unit is used for acquiring a first PNG table picture to be identified, wherein the first PNG table picture is obtained by preprocessing a PDF table;
The recognition unit is used for inputting the first PNG table picture into a trained GKVR recognition model, so that the GKVR recognition model carries out OCR (optical character recognition) on the first PNG table picture to obtain first text information, table frame information and position information of each text node, sentence vector features corresponding to the first text information are generated through a gated recurrent unit (GRU) network according to the first text information and a preset vocabulary, the table frame information is converted into node image features through a convolutional neural network and a grid_simple algorithm, the position information of each text node is normalized to obtain position features, the sentence vector features corresponding to the first text information and the position features are finally respectively input into a graph attention network and, after splicing with the node image features, a key value information set corresponding to the first PNG table picture is output through a multi-layer perceptron (MLP), wherein the key value information set comprises a key information set and a value information set;
the output unit is used for performing traversal matching on the key value information set according to a preset division rule tree, and outputting each key value pair in the key value information set.
Compared with the prior art, the embodiment of the invention has the following beneficial effects:
the invention provides an OCR (optical character recognition) form semantic recognition method and device based on a graph neural network, in which PNG form pictures are input into a trained GKVR recognition model, the attribute of a form node can be accurately judged to be a key or a value through the sentence vector features, node image features and position features of the text nodes in the model, and matching between keys and values is realized by setting a division rule tree, so that the capability of recognizing the relations between the keys and values of a form can be improved. Compared with the prior art, in which tables embedded in portable document formats and images are difficult to extract directly, the invention combines deep learning network structures such as a graph neural network and a gated recurrent unit, provides a GKVR network model for table key value recognition, can realize one-click recognition, and meets actual industrial requirements such as automatic form auditing.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Example 1
Referring to fig. 1, a schematic flow chart of an embodiment of an OCR form semantic recognition method based on a graph neural network according to an embodiment of the present invention is shown, where the method includes steps 101 to 103, and the steps are as follows:
step101, acquiring a first PNG table picture to be identified, wherein the first PNG table picture is obtained by preprocessing a PDF table.
In this embodiment, the first PNG table picture is obtained by preprocessing a PDF table, specifically, a PDF document to be processed is obtained, and a table portion is intercepted from the PDF document by a KVLabel tool to generate the first PNG table picture.
Specifically, KVLabel is a tool for marking PDF documents, and can realize functions such as region information marking, node attribute marking (for example, a certain node is a key or a value), node key value pair relation marking and the like. For a PDF document to be identified, a table part of the PDF document is firstly intercepted by a KVLabel tool, and then converted into a first PNG table picture.
Step 102, inputting the first PNG table picture into a trained GKVR recognition model, so that the GKVR recognition model carries out OCR (optical character recognition) on the first PNG table picture to obtain first text information, table frame information and position information of each text node, generating sentence vector features corresponding to the first text information through a gated recurrent unit (GRU) network according to the first text information and a preset vocabulary, converting the table frame information into node image features through a convolutional neural network and a grid_simple algorithm, carrying out normalization processing on the position information of each text node to obtain position features, finally inputting the sentence vector features corresponding to the first text information and the position features into a graph attention network respectively, and, after splicing with the node image features, outputting a key value information set corresponding to the first PNG table picture through a multi-layer perceptron (MLP), wherein the key value information set comprises a key information set and a value information set.
In this embodiment, before executing step 102, the GKVR recognition model must be trained with training samples; the first PNG table picture is then recognized by the trained GKVR recognition model, and a corresponding key value information set is output. The key value information set comprises a key information set, a value information set and an other-information set. The other-information set contains content other than keys and values, such as the header of a table.
In this embodiment, a plurality of table data may be extracted from the SciTSR dataset and then preprocessed by the KVLabel tool to obtain the training-sample dataset SciTSR-Key-Value. Preprocessing the training samples of the GKVR recognition model specifically means selecting the form frame of the PDF document in each initial sample through the KVLabel tool, applying key-value labels and key-value-pair labels to each text node in the form frame, generating the PNG form picture corresponding to each initial sample, and taking all the PNG form pictures, key-value labels and key-value-pair labels as the training sample.
In this embodiment, in contrast to the prior art, which directly converts the PDF document in the SciTSR dataset into a PNG file and then performs table framing according to the rectangular labels of the picture, this embodiment first intercepts the table frame corresponding to the table from the PDF document, then converts the intercepted table frame into a PNG table picture, and finally applies the key-value labels and key-value-pair labels. This ensures that the coordinate information of the table frame stays consistent with the picture, avoiding the misalignment or mismatch caused by converting the file format from PDF to PNG. As shown in fig. 2 and 3, fig. 2 is a PNG table diagram obtained by the technical means of this embodiment, and fig. 3 is a PNG table diagram exhibiting the misalignment or mismatch problem caused by the prior art.
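As a minimal sketch of this crop-then-convert order, the following assumes the PyMuPDF (fitz) library as a stand-in for the rasterization step; the file name and table rectangle are hypothetical, and the actual KVLabel tool selects the region interactively rather than from hard-coded coordinates.

```python
# Crop the table region from the PDF first, then rasterize only that region,
# so the box coordinates stay consistent with the resulting PNG picture.
import fitz  # PyMuPDF (an assumed stand-in; the patent does not name a library)

doc = fitz.open("form.pdf")                     # hypothetical input document
page = doc[0]
table_rect = fitz.Rect(72, 144, 540, 400)       # hypothetical table bounding box

pix = page.get_pixmap(clip=table_rect, dpi=200)
pix.save("first_png_table_picture.png")         # the "first PNG table picture"
```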
As an example of the present embodiment, the key value information of the picture marked with rectangular frames and the coordinate information of the rectangular frames are stored in the chunk data. A specific illustration of the chunk data may be, but is not limited to, that shown in fig. 4. As shown in fig. 4, the chunk data also holds the text node positions, with the text nodes indexed in the list by their node numbers.
As an example of this embodiment, when applying the key-value labels and key-value-pair labels, a text node can be distinguished as a Key, a Value or Other information by labeling the type attribute of the text node. The type attribute of each text node is stored in the info data, where the dictionary key is the node number and the value is the attribute. In addition, the key-value-pair labels are stored in the pair data, which records the key-value-pair relations of the text nodes; each element reads [key node number, value node number].
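A hypothetical illustration of these three annotation structures is given below; the exact field layout of the real SciTSR-Key-Value files may differ.

```python
# chunk: per-node text plus rectangular-box coordinates; text nodes are
# indexed in the list by their node numbers.
chunk = [
    {"text": "Name",  "pos": [40, 130, 20, 44]},   # node 0
    {"text": "Alice", "pos": [140, 230, 20, 44]},  # node 1
]

# info: type attribute of each text node; the dictionary key is the node
# number and the value is the attribute (Key / Value / Other).
info = {0: "Key", 1: "Value"}

# pair: key-value-pair labels; each element reads
# [key node number, value node number].
pair = [[0, 1]]
```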
Therefore, by means of the KVLabel tool developed by the embodiment of the invention, functions of importing a data set to be marked, selecting the data to be marked, performing rectangular box selection marking, setting attributes of rectangular nodes selected by boxes, setting key value relations among the nodes and the like can be realized.
In the present embodiment, for the form key value recognition task, the form used as input data is in fact strongly structured; after OCR recognition, the position information and text information of each text area in the form are obtained, which can be regarded as graph data. Moreover, because OCR technology is mature and the characters are easy to recognize, this embodiment of the invention is mainly aimed at recognizing the key-value category of the text nodes in the table.
Before each module is trained, the data required for training are prepared as shown in the table below, and a plurality of perturbation modes can be added; the default perturbation modes include color space conversion (cvtColor), blurring (blur), jitter (jitter), Gaussian noise (Gauss noise), random crop (random crop), perspective (perspective), color inversion (reverse) and the like.
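A minimal sketch of these default perturbations, using OpenCV and NumPy, might look as follows; the kernel size, noise scale, crop margin and warp magnitude are hypothetical choices rather than values from the patent.

```python
import cv2
import numpy as np

def perturb(img: np.ndarray) -> np.ndarray:
    img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)            # color space conversion
    img = cv2.blur(img, (3, 3))                            # blurring
    img = np.clip(img.astype(np.int16)
                  + np.random.randint(-10, 11), 0, 255).astype(np.uint8)  # jitter
    noise = np.random.normal(0, 5, img.shape)              # Gaussian noise
    img = np.clip(img + noise, 0, 255).astype(np.uint8)
    h, w = img.shape[:2]
    img = img[4:h - 4, 4:w - 4]                            # crop (fixed margin for brevity)
    h, w = img.shape[:2]
    src = np.float32([[0, 0], [w - 1, 0], [0, h - 1], [w - 1, h - 1]])
    dst = (src + np.random.uniform(-3, 3, src.shape)).astype(np.float32)
    img = cv2.warpPerspective(img,
                              cv2.getPerspectiveTransform(src, dst), (w, h))  # perspective
    return 255 - img                                       # color inversion
```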
in the embodiment, the trained GKVR recognition model comprises a sentence vector feature extraction module, a node image feature extraction module and a position feature extraction module.
The training process of the sentence vector feature extraction module comprises: carrying out vocabulary recognition on the text content of each text node in the training sample according to a preset vocabulary to generate character strings, carrying out one-hot encoding on each character string, then carrying out word embedding through a single-layer feed-forward network to obtain the word sequence corresponding to each text node, and learning the semantics in each word sequence through the GRU network to generate the sentence vector features of each text node.
In order for the model to obtain the semantic information of the form text, a common text-processing technique from the field of natural language processing is used here: a vocabulary vocab is first established. Vocab consists of the digits 0-9, the 26 lowercase letters, common symbols, capital letters, Roman numerals and the like, stored as a character string. Characters not present in vocab are then converted into a dedicated vocab word representing unknown symbols.
Then, one-Hot encoding is performed on each character and word embedding (word embedding) is performed by applying a layer of single feed forward network (not described in detail in the prior art) to represent, and One-Hot encoding uses an N-bit state register to encode N states, each state is represented by its independent register bit, and only One bit is valid at any time. One-Hot encoding first requires mapping the classification value to the entire value. Each integer value is represented as a binary vector, which is zero except for the index of the integer, labeled 1.
Finally, the GRU is used to learn the semantic information in the word sequences, obtaining the sentence vector features that represent the text information of the graph nodes. The process is as follows:
word_vector_i[j] = embedding(one_hot(text_i[j]))
sentence_feature_i[j] = GRU(word_vector_i[j])
where word_vector_i[j] represents the set of character embeddings of the j-th node's text in the i-th table, and sentence_feature_i[j] is the sentence vector feature. The network parameters of the sentence vector feature representation are shown in the following table:
| Parameter | Value |
| Vocabulary size | 105 |
| Post-Embedding word vector size | 64 |
| Sentence vector size | 64 |
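A PyTorch sketch of this module follows, using the parameters from the table above (vocabulary size 105, word vector and sentence vector size 64); the class name and tensor layout are illustrative assumptions, not the patent's code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SentenceFeature(nn.Module):
    def __init__(self, vocab_size: int = 105, dim: int = 64):
        super().__init__()
        self.vocab_size = vocab_size
        # One-hot followed by a single linear (feed-forward) layer is the
        # word-embedding step described in the text.
        self.embedding = nn.Linear(vocab_size, dim, bias=False)
        self.gru = nn.GRU(dim, dim, batch_first=True)

    def forward(self, char_ids: torch.Tensor) -> torch.Tensor:
        # char_ids: (num_nodes, seq_len) vocabulary indices; characters absent
        # from vocab are assumed mapped to a reserved "unknown" index.
        one_hot = F.one_hot(char_ids, num_classes=self.vocab_size).float()
        word_vector = self.embedding(one_hot)   # word_vector_i[j]
        _, h_n = self.gru(word_vector)          # final hidden state per node
        return h_n.squeeze(0)                   # sentence_feature_i[j], (num_nodes, 64)
```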
The training process of the node image feature extraction module comprises: obtaining a plurality of pieces of table frame information in the training sample, extracting the picture structure information of each piece of table frame information through a convolutional neural network to obtain a plurality of first feature maps, scaling the plurality of first feature maps into grids by bilinear interpolation through the grid_simple algorithm, and taking the grid features at the coordinates corresponding to each text node as the node image features of that text node.
For a table, besides the position and text information of each text node, the table frame information also has value for key value recognition; the differences between the table frame structures of key nodes and value nodes show that this information has reference value. In order to better complete the key value recognition task, this embodiment of the invention uses a convolutional neural network (Convolutional Neural Networks, CNN) to extract the picture structure information of the table, then scales the feature map obtained through the convolutional network into a grid by bilinear interpolation, and obtains, through the grid_simple algorithm, the grid feature at each text node's corresponding coordinates as the image feature of that text node. The detailed process is as follows:
img_feature_map_i = CNNs(img_i)
img_feature_box_i[j] = grid_simple(img_feature_map_i, pos_i[j])
where img_feature_map_i is the picture feature map of the i-th table, and img_feature_box_i[j] is the node image feature of the j-th text node in the i-th table. The CNN network parameters are shown in the following table:
| Layer | In channel | Out channel | Kernel size | Stride | Padding | Activation function |
| 0 | 1 | 64 | 3x3 | 1 | 1 | ReLU |
| 1 | 64 | 64 | 3x3 | 1 | 1 | ReLU |
| 2 | 64 | 64 | 3x3 | 1 | 1 | ReLU |
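A sketch of this extractor follows: the three-layer CNN uses the parameters from the table above, and the grid_simple step is assumed here to behave like PyTorch's bilinear F.grid_sample; this is an interpretation, not the patent's exact code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

cnn = nn.Sequential(                            # layers 0-2 from the table above
    nn.Conv2d(1, 64, kernel_size=3, stride=1, padding=1), nn.ReLU(),
    nn.Conv2d(64, 64, kernel_size=3, stride=1, padding=1), nn.ReLU(),
    nn.Conv2d(64, 64, kernel_size=3, stride=1, padding=1), nn.ReLU(),
)

def node_image_features(img: torch.Tensor, pos: torch.Tensor) -> torch.Tensor:
    # img: (1, 1, H, W) table picture; pos: (num_nodes, 2) node centers in [-1, 1].
    feature_map = cnn(img)                      # img_feature_map_i, (1, 64, H, W)
    grid = pos.view(1, -1, 1, 2)                # one sampling point per text node
    sampled = F.grid_sample(feature_map, grid, mode="bilinear", align_corners=False)
    return sampled.view(64, -1).t()             # img_feature_box_i, (num_nodes, 64)
```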
In this embodiment, the training process of the position feature extraction module specifically comprises: obtaining the position information of each text node in the training sample, carrying out coordinate conversion on each piece of position information, normalizing the coordinate system into the [-1,1] interval, and outputting the position features corresponding to each text node.
In table data, because tables differ in size, the absolute positions of table nodes with similar structures can differ greatly; if absolute positions are used directly as network input, learning efficiency may be low. To avoid this problem and enable the network to learn the table structure better, the absolute position information of the nodes is converted into relative position information, and the coordinate system is normalized to the interval [-1, 1]. The process is as follows:
min_x_i = min(X_i)
min_y_i = min(Y_i)
where X_i and Y_i represent the set of x coordinate values and the set of y coordinate values of the i-th table, respectively; the absolute position of a node j is written {(x_1, y_1), (x_2, y_2)}; table_width represents the table width, and table_height represents the table height.
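Since the normalization formula itself is not reproduced above, the sketch below assumes the standard linear mapping of each coordinate into [-1, 1] using min_x_i, min_y_i and the table width/height.

```python
import torch

def normalize_positions(pos: torch.Tensor) -> torch.Tensor:
    # pos: (num_nodes, 4) absolute boxes (x1, y1, x2, y2) of one table.
    pos = pos.float().clone()
    xs, ys = pos[:, 0::2], pos[:, 1::2]
    min_x, min_y = xs.min(), ys.min()
    table_width, table_height = xs.max() - min_x, ys.max() - min_y
    # Assumed mapping: shift by the minimum, scale by table size, center on 0.
    pos[:, 0::2] = 2 * (xs - min_x) / table_width - 1    # x into [-1, 1]
    pos[:, 1::2] = 2 * (ys - min_y) / table_height - 1   # y into [-1, 1]
    return pos
```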
In this embodiment, the sentence vector features, node image features and position features corresponding to each text node in the training sample are used as the input of the GKVR recognition model, and the key information and value information corresponding to each text node are used as the output of the GKVR recognition model; for each text node, the sentence vector features and position features are respectively input into a graph attention network, spliced with the node image features to form the node features of each text node, and the graph attention network and the multi-layer perceptron MLP are trained in combination with the output of the GKVR recognition model.
The graph attention network (GAT) determines, through a self-attention mechanism, the weight each neighbor node's features carry during aggregation, so as to adapt the weights to different neighbor nodes and avoid the number of neighbor nodes influencing the output features. Since the dataset cannot provide edge information between nodes, constructing the edge set with full connection would reach complexity O(|N|²); to reduce the complexity, and considering that neighboring table nodes have similar positions, a nearest neighbor algorithm (K Nearest Neighbor, KNN) is adopted to generate the edge set of the table graph, reducing the complexity to O(k·|N|).
In the overall flow, the nearest neighbor algorithm used to reduce complexity corresponds to the portion from the "pos of Node" section to grid_simple in FIG. 5. Each node in the graph has its own relative position attribute; the k nodes nearest to a given node are selected by the nearest neighbor algorithm, and an edge is set between that node and each of these k nodes. With the edge set and the node set, graph convolution can then be performed.
The calculation process is as follows:
edges_i[j] = KNN(pos_i[j])
pos_h_feature_i[j] = GAT_θ1(normalized_pos_i[j], edges_i[j])
text_h_feature_i[j] = GAT_θ2(sentence_feature_i[j], edges_i[j])
h_f_i[j] = concat(pos_h_feature_i[j], text_h_feature_i[j], img_feature_box_i[j])
prediction_i[j] = Softmax(MLP(h_f_i[j]))
where edges_i denotes the edges of the i-th table obtained by the KNN algorithm, and edges_i[j] the edges of node j of the i-th table; pos_i[j] represents the absolute position of node j, normalized_pos_i[j] its relative position, and sentence_feature_i[j] the sentence vector of node j's text information; pos_h_feature_i[j] represents the position feature of the j-th text node of the i-th table, text_h_feature_i[j] its text feature, and img_feature_box_i[j] its image feature; h_f_i[j] represents the feature information of node j, and prediction_i[j] the prediction result for node j's category. GAT_θ1 and GAT_θ2 denote two graph attention networks with separate parameters, used for the position features and the sentence features respectively.
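A sketch of this classification stage follows, assuming PyTorch Geometric's GATConv and knn_graph as stand-ins for the two GAT networks and the KNN edge construction; the dimensions and the value of k are illustrative assumptions.

```python
import torch
import torch.nn as nn
from torch_geometric.nn import GATConv, knn_graph

class GKVRHead(nn.Module):
    def __init__(self, pos_dim=4, sent_dim=64, img_dim=64, hidden=64, classes=3):
        super().__init__()
        self.gat_pos = GATConv(pos_dim, hidden)    # GAT_θ1: position features
        self.gat_text = GATConv(sent_dim, hidden)  # GAT_θ2: sentence features
        self.mlp = nn.Sequential(nn.Linear(hidden * 2 + img_dim, hidden),
                                 nn.ReLU(), nn.Linear(hidden, classes))

    def forward(self, normalized_pos, sentence_feature, img_feature_box, k: int = 6):
        edges = knn_graph(normalized_pos, k=k)             # KNN edge set
        pos_h = self.gat_pos(normalized_pos, edges)        # pos_h_feature
        text_h = self.gat_text(sentence_feature, edges)    # text_h_feature
        h_f = torch.cat([pos_h, text_h, img_feature_box], dim=-1)
        return torch.softmax(self.mlp(h_f), dim=-1)        # Key / Value / Other
```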
To better illustrate the benefits of this embodiment, comparative experiments based on GCN and GAT can be used for verification. In the model design, the graph neural network is used so that nodes on the graph incorporate information from nearby nodes and the node types can be inferred more accurately. In GFTE, a graph-neural-network-based row-column relationship derivation model, GCN is used as the underlying network for node information aggregation and works well in that task. However, GCN's fusion of neighbor nodes is influenced by the neighbor nodes' degrees and cannot generate weights according to the different feature values of different nodes. In the table key value inference task, the influence of a neighbor node on the central node should depend on the neighbor node's feature values, so GAT is adopted as the underlying network for node aggregation, which better improves the accuracy and convergence stability of the model in the key value recognition task.
As shown in fig. 6, the Loss convergence trend of the GCN-based GKVR model on the training set is substantially consistent with that of the GAT-based GKVR model, but the former converges to a larger minimum. On the test set, the GCN-based GKVR model exhibits strong Loss jitter, so it can be seen that using GAT as the underlying network for node information aggregation improves the convergence stability of the GKVR model.
As shown in fig. 7, the GAT-based GKVR model performs significantly better than the GCN-based GKVR model in recognition accuracy: its highest accuracy is 6 percentage points higher on the training set and 7 percentage points higher on the test set. Replacing GCN with GAT is therefore a reasonable choice for table node key value identification.
In this embodiment, after the GKVR recognition model is trained, the first PNG table picture is input, and the corresponding first text information, table frame information and position information of each text node are extracted. The first text information mainly comprises the text content of the table; the sentence vector features corresponding to the first text information are generated through the sentence vector feature extraction module, the table frame information is converted into node image features through the node image feature extraction module, and the position information of each text node is normalized through the position feature extraction module to obtain the position features. Finally, the sentence vector features corresponding to the first text information and the position features are respectively input into the graph attention network, spliced with the node image features, and passed through the multi-layer perceptron MLP to output the key value information set corresponding to the first PNG table picture.
Step 103, performing traversal matching on the key value information set according to a preset division rule tree, and outputting each key value pair in the key value information set.
In this embodiment, step 103 includes gradually dividing the key information set by traversing the dividing rule tree with breadth first, and selecting values in the value information set when the leaf node is reached, to generate a plurality of key value pairs.
After the task of identifying the node key-value attributes in step 102, the table node set can be divided into the sets Key = {k_1, k_2, …, k_n}, Value = {v_1, v_2, …, v_m} and Other = {o_1, o_2, …, o_k}. What is mainly discussed here is how to obtain the correspondence between the elements of the Key set and the Value set of the table.
One approach regards whether a key-value-pair relation exists between two nodes as a binary classification: node features are extracted on the graph through the graph neural network, and the problem is converted into binary classification to predict whether a key-value-pair relation exists between nodes. To find all key-value-pair relations, a seemingly reasonable design is to construct the Key set and Value set into a complete bipartite graph and predict the category of each edge <Node_1, Node_2> on the bipartite graph. This scheme is prior art, and a binary classification neural network designed this way faces an extremely unbalanced label distribution. Experimentally, this imbalance leads the model to classify the edges between all nodes into the non-key-value-pair category to achieve higher accuracy, but the confusion matrix reveals that such a model cannot identify the key-value-pair relations that actually exist in the graph.
In this embodiment, obvious prior knowledge exists in table key value matching: for example, a Key node and its Value node lie in the same row or the same column, and the Value node has the smallest distance along some coordinate axis, or the smallest Euclidean distance, to its corresponding Key node. To introduce this prior knowledge into the key value matching problem, the partition rule tree PT is defined here as follows:
1. PT is not null.
2. If a node i in PT is not a leaf node, it contains a partition rule p_i.
3. If a node i in PT is not a leaf node, its number of child nodes equals the number of subsets into which p_i divides the set.
4. If a node i in PT is a leaf node, it contains a selection rule s_i.
After the key values have been identified, the rule tree algorithm above is used to explore the key-value-pair relations: p_i is a partition rule set on the rule tree, dividing the key set into subsets; s_i is a selection rule defined at a leaf of the rule tree, matching the key-value pairs that conform to the same rule.
By traversing the partition rule tree PT breadth-first, the Key set is divided step by step; when a leaf node is reached, values are selected and key value pairs are generated, so that keys conforming to the same rule tree are matched with their values. In other words, the breadth-first traversal takes subsets of the key set according to the rules in the rule tree nodes, and finally finds, for each value, the key it corresponds to, generating the key value pairs.
As an example of this embodiment, a partition-based key-value matching algorithm may be, but is not limited to, that shown in the following table.
To better explain the application of the partition-based key value matching algorithm of this embodiment, the following example is described with reference to fig. 8. The corresponding partition rule tree PT is defined as shown in fig. 8: the root node holds a partition rule that splits the set into a horizontal-direction set and a vertical-direction set, each given an angular interval as its scope of action. The left child node handles the horizontal set elements and the right child node the vertical set elements; both conform to the nearest-neighbor principle for key value pairs. Finally, key value pair matching through this rule tree identifies the key value pairs in the SciTSR-Key-Value dataset well. Here D(x, y) is the angle between the x axis and the edge connecting node x and node y.
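A minimal sketch of the partition-based matching follows, built from the PT definition and the FIG. 8 example above: an inner node partitions the key set (here by the edge angle D), and each leaf selects, for every key in its subset, the nearest value satisfying its rule. The tree shape, angle thresholds and node representation are hypothetical, and a non-empty Value set is assumed.

```python
import math
from collections import deque

def D(a, b):
    # Angle between the x axis and the edge from node a to node b, in degrees.
    return abs(math.degrees(math.atan2(b["y"] - a["y"], b["x"] - a["x"])))

def dist(a, b):
    return math.hypot(a["x"] - b["x"], a["y"] - b["y"])

def match(keys, values, pt):
    pairs, queue = [], deque([(pt, keys)])
    while queue:                                    # breadth-first traversal of PT
        node, subset = queue.popleft()
        if "select" in node:                        # leaf: selection rule s_i
            for k in subset:
                cands = [v for v in values if node["select"](k, v)]
                if cands:
                    pairs.append((k, min(cands, key=lambda v: dist(k, v))))
        else:                                       # inner node: partition rule p_i
            for child, part in zip(node["children"], node["partition"](subset)):
                queue.append((child, part))
    return pairs

def make_pt(values):
    # Root partitions keys into a horizontal and a vertical set by the angle
    # to their nearest value; leaves apply the nearest-neighbor principle.
    def nearest_angle(k):
        return D(k, min(values, key=lambda v: dist(k, v)))
    return {
        "partition": lambda ks: ([k for k in ks if nearest_angle(k) < 45],
                                 [k for k in ks if nearest_angle(k) >= 45]),
        "children": [
            {"select": lambda k, v: D(k, v) < 10},            # roughly same row
            {"select": lambda k, v: abs(D(k, v) - 90) < 10},  # roughly same column
        ],
    }
```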
As an example of this embodiment, a division rule tree is set in the GKVR recognition model. In the example, the division rule tree is also integrated in the GKVR identification model, so that the operation is simplified, and the efficiency is improved.
On the other hand, the embodiment of the invention provides an OCR table semantic recognition device based on a graph neural network, which comprises an acquisition unit, a recognition unit and an output unit;
the acquisition unit is used for acquiring a first PNG table picture to be identified, wherein the first PNG table picture is obtained by preprocessing a PDF table;
The recognition unit is used for inputting the first PNG table picture into a trained GKVR recognition model, so that the GKVR recognition model carries out OCR (optical character recognition) on the first PNG table picture to obtain first text information, table frame information and position information of each text node, sentence vector features corresponding to the first text information are generated through a gated recurrent unit (GRU) network according to the first text information and a preset vocabulary, the table frame information is converted into node image features through a convolutional neural network and a grid_simple algorithm, the position information of each text node is normalized to obtain position features, the sentence vector features corresponding to the first text information and the position features are finally respectively input into a graph attention network and, after splicing with the node image features, a key value information set corresponding to the first PNG table picture is output through a multi-layer perceptron (MLP), wherein the key value information set comprises a key information set and a value information set;
the output unit is used for performing traversal matching on the key value information set according to a preset division rule tree, and outputting each key value pair in the key value information set.
The more detailed working principle and flow of the device can refer to, but are not limited to, the related description of the method embodiment above.
From the above, the embodiment of the invention provides an OCR (optical character recognition) table semantic recognition method and device based on a graph neural network, in which PNG table pictures are input into a trained GKVR recognition model, the attribute of a table node is accurately judged to be a key or a value through the sentence vector features, node image features and position features of the text nodes in the model, and matching between keys and values is realized by setting a division rule tree, improving the capability of recognizing the relations between the keys and values of a table. Compared with the prior art, in which tables embedded in portable document formats and images are difficult to extract directly, the invention combines deep learning network structures such as a graph neural network and a gated recurrent unit, provides a GKVR network model for table key value recognition, can realize one-click recognition, is an important supplement to the existing, traditional and widely applied table recognition methods, and meets actual industrial requirements such as automatic table auditing.
Furthermore, in the prior art that applies graph convolutional neural networks, the fusion of neighbor nodes is influenced by the neighbor nodes' degrees and cannot generate weights according to the different feature values of different nodes. In the table key value inference task, because the influence of a neighbor node on the central node should depend on the neighbor node's feature values, the invention adopts the graph attention network as the underlying network for node aggregation, improving the accuracy and convergence stability of the model in the key value recognition task.
Further, while some approaches have been explored for identifying the key values present in tables, tables often exist in portable document formats and images from which they are difficult to extract directly. The invention combines deep learning network structures such as a graph neural network and a gated recurrent unit, and provides a network model GKVR (Graph-based Key and Value Recognition) for table key value recognition; the model can classify a node in a table as key or value by using the text information, position information and picture information of the text in the table together with the picture information of the table picture, thereby improving the capability of recognizing the relations between the keys and values of a table.
The foregoing embodiments have been provided for the purpose of illustrating the general principles of the present invention, and are not to be construed as limiting the scope of the invention. It should be noted that any modifications, equivalent substitutions, improvements, etc. made by those skilled in the art without departing from the spirit and principles of the present invention are intended to be included in the scope of the present invention.