








Technical Field
The present invention relates to the fields of artificial intelligence and computing, and in particular to a method, system, device, and storage medium for table structure recognition.
Background
Tables are a form of data organization commonly used in daily life, and they appear in a wide variety of settings. These include electronic documents such as PDF files, Excel files, and scanned documents, as well as printed documents such as receipts and bills. There are also more complex natural scenes, such as food packaging and street posters. Tables photographed in such scenes often exhibit severe structural deformation, which makes them one of the more challenging cases for table structure recognition.
At present, mainstream table structure recognition methods, whether based on traditional image processing or on deep learning, rely on a strong structural prior, namely that the table is rectangular. When a table is deformed as described above, this structural prior no longer holds and recognition accuracy drops sharply.
Summary of the Invention
To solve the technical problems in the prior art, the present invention provides a method, system, device, and storage medium for table structure recognition, which can address the effects of table structure deformation in table structure recognition scenarios and improve recognition accuracy.
The first object of the present invention is to provide a method for table structure recognition.
The second object of the present invention is to provide a system for table structure recognition.
The third object of the present invention is to provide a computer device.
The fourth object of the present invention is to provide a storage medium.
The first object of the present invention can be achieved by the following technical solution:
A method for table structure recognition, the method comprising:
S1. Use a pre-trained semantic segmentation model to semantically segment the cell region, the table border line region, and the table region of the image to be recognized, obtaining a cell region segmentation map, a table border line region segmentation map, and a table region segmentation map;
S2. Fuse the cell region segmentation map and the table border line region segmentation map to obtain a fused cell region segmentation map;
S3. Extract control points from the table region segmentation map to obtain a rectification transformation;
S4. Rectify the fused cell region segmentation map using the rectification transformation to obtain an aligned cell region segmentation map;
S5. Perform connected component extraction and analysis on the aligned cell region segmentation map, and obtain table structure information according to matching conditions.
Specifically, the pre-trained semantic segmentation model is the neural network deeplabV3+, and step S1 specifically comprises:
changing the final multi-class output layer of deeplabV3+ into three binary classification outputs, computing a loss for each output against the annotated images of the cell region, the table border line region, and the table region respectively, and back-propagating the gradients;
during the forward pass of deeplabV3+, outputting the semantic segmentation maps of the cell region, the table border line region, and the table region simultaneously.
Further, step S2 specifically comprises:
S21. Perform a pixel-wise inversion on the table border line region segmentation map to obtain an inverted table border line region segmentation map;
S22. Perform a pixel-wise AND operation on the cell region segmentation map and the inverted table border line region segmentation map to obtain the fused cell region segmentation map.
The second object of the present invention can be achieved by the following technical solution:
A system for table structure recognition, the system comprising:
a semantic segmentation module, configured to use a pre-trained semantic segmentation model to semantically segment the cell region, the table border line region, and the table region of the image to be recognized, obtaining a cell region segmentation map, a table border line region segmentation map, and a table region segmentation map;
a segmentation map fusion module, configured to fuse the cell region segmentation map and the table border line region segmentation map to obtain a fused cell region segmentation map;
a control point extraction module, configured to extract control points from the table region segmentation map to obtain a rectification transformation;
a segmentation map rectification module, configured to rectify the fused cell region segmentation map using the rectification transformation to obtain an aligned cell region segmentation map;
an extraction and analysis module, configured to perform connected component extraction and analysis on the aligned cell region segmentation map and obtain table structure information according to matching conditions.
The third object of the present invention can be achieved by the following technical solution:
A computer device, comprising a processor and a memory for storing a program executable by the processor, wherein the processor implements the above method for table structure recognition when executing the program stored in the memory.
The fourth object of the present invention can be achieved by the following technical solution:
A storage medium storing a program, wherein the program implements the above method for table structure recognition when executed by a processor.
Compared with the prior art, the present invention has the following advantages and beneficial effects:
The present invention fuses the segmentation maps of the cell region and the table border line region to obtain a fused cell region segmentation map, so that the two maps compensate for each other's weaknesses and recognition accuracy improves. It extracts control points from the table region segmentation map to obtain a rectification transformation that restores the structural prior, which benefits subsequent processing and further improves accuracy. It then rectifies the fused cell region segmentation map with the rectification transformation to obtain an aligned cell region segmentation map, performs connected component extraction and analysis on that map, and obtains table structure information according to matching conditions. The invention can thus address the effects of table structure deformation in table structure recognition scenarios and improve recognition accuracy.
Brief Description of the Drawings
To explain the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. The drawings described below are merely some embodiments of the present invention; those of ordinary skill in the art can derive other drawings from the structures shown in them without creative effort.
Fig. 1 is an overall flowchart of the table structure recognition method in Embodiment 1 of the present invention;
Fig. 2 is a schematic diagram of the cell region segmentation map, the table border line region segmentation map, and the table region segmentation map in Embodiment 1 of the present invention;
Fig. 3 is a flowchart of generating the fused cell region segmentation map in Embodiment 1 of the present invention;
Fig. 4 shows an image to be recognized in Embodiment 1 of the present invention;
Fig. 5 is the cell region segmentation map obtained in Embodiment 1 of the present invention;
Fig. 6 is the table border line region segmentation map obtained in Embodiment 1 of the present invention;
Fig. 7 is a schematic diagram of the fused cell region segmentation map obtained in Embodiment 1 of the present invention;
Fig. 8 is a schematic diagram of the cell region segmentation map before and after rectification in Embodiment 1 of the present invention;
Fig. 9 is a detailed flowchart of obtaining the table structure information in Embodiment 1 of the present invention.
Detailed Description
The technical solutions of the present invention are described in further detail below with reference to the drawings and embodiments. The described embodiments are only some, not all, of the embodiments of the present invention, and the implementation of the present invention is not limited to them. All other embodiments obtained by those of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.
Embodiment 1
As shown in Fig. 1, this embodiment provides a table structure recognition method comprising the following steps:
S1. Use a pre-trained semantic segmentation model to semantically segment the cell region, the table border line region, and the table region of the image to be recognized, obtaining a cell region segmentation map, a table border line region segmentation map, and a table region segmentation map.
The image to be recognized in this embodiment is a scene image containing a table. This includes photographs of tables taken with a camera, such as printed receipts, bills, food packaging, and street posters, as well as electronic documents such as PDF files, Excel files, and scanned documents.
The pre-trained semantic segmentation model in this embodiment is deeplabV3+, a deep-learning neural network that classifies and labels an image at the pixel level.
In this embodiment, the training data for the pre-trained semantic segmentation model is obtained from existing public datasets.
Further, semantically segmenting the cell region, the table border line region, and the table region of the image to be recognized at the same time specifically comprises:
Repurposing the final multi-class output layer of deeplabV3+ as three binary classification outputs. A loss is computed for each output against the annotated images of the cell region, the table border line region, and the table region respectively, and the gradients are back-propagated.
During the forward pass, the deeplabV3+ network outputs the semantic segmentation maps of the cell region, the table border line region, and the table region simultaneously.
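The three-output arrangement can be illustrated with a minimal sketch. The text only states that each binary output has its own loss against its own annotation; the use of binary cross-entropy, the unweighted sum of the three losses, the 0.5 decision threshold, and all function and key names below are assumptions made for illustration, with per-pixel logits represented as flat Python lists rather than network tensors.

```python
import math

HEADS = ("cell", "border", "table")  # hypothetical names for the three outputs


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def bce(logits, labels):
    """Mean binary cross-entropy over a flat list of pixels (assumed loss)."""
    eps = 1e-12
    total = 0.0
    for z, y in zip(logits, labels):
        p = sigmoid(z)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(logits)


def three_head_loss(logits_by_head, labels_by_head):
    """Each head is scored against its own annotation; losses are summed (assumption)."""
    return sum(bce(logits_by_head[h], labels_by_head[h]) for h in HEADS)


def predict_masks(logits_by_head, threshold=0.5):
    """One forward result yields all three binary maps at once."""
    return {
        h: [1 if sigmoid(z) >= threshold else 0 for z in logits_by_head[h]]
        for h in HEADS
    }
```

Because the heads are independent binary outputs rather than one multi-class output, a pixel may belong to more than one map at once, for example to both the cell region and the table region.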
Fig. 2 shows, from left to right, the cell region segmentation map, the table border line region segmentation map, and the table region segmentation map.
S2. Fuse the cell region segmentation map and the table border line region segmentation map to obtain a fused cell region segmentation map.
As shown in Fig. 3, fusing the cell region segmentation map with the table border line region segmentation map specifically comprises the following steps:
S21. Perform a pixel-wise inversion on the table border line region segmentation map to obtain an inverted table border line region segmentation map;
S22. Perform a pixel-wise AND operation on the cell region segmentation map and the inverted table border line region segmentation map to obtain the fused cell region segmentation map.
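Steps S21 and S22 can be sketched as follows, assuming both segmentation maps are binary masks of equal size stored as nested lists of 0 and 1. The function names are hypothetical.

```python
def invert_mask(mask):
    """S21: pixel-wise inversion of a binary mask."""
    return [[1 - px for px in row] for row in mask]


def fuse_masks(cell_mask, border_mask):
    """S22: pixel-wise AND of the cell mask with the inverted border mask."""
    inverted = invert_mask(border_mask)
    return [
        [c & b for c, b in zip(cell_row, inv_row)]
        for cell_row, inv_row in zip(cell_mask, inverted)
    ]


# Two cells that stick together in the cell mask are separated
# wherever the border mask detected the line between them.
cell = [[1, 1, 1, 1, 1]]
border = [[0, 0, 1, 0, 0]]
fused = fuse_masks(cell, border)
```

In this toy row the single merged run of cell pixels is split at the border pixel, which is the effect the fusion is intended to have on cells that adhere to each other.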
As shown in Figs. 4 to 7, steps S1 and S2 are applied to an image to be recognized to obtain the fused cell region segmentation map. Fig. 4 shows the image to be recognized. As shown in Fig. 5, because the cells of a table are packed very densely, adjacent cells can stick together in the cell region segmentation map. Cells that stick together are treated as a single cell in subsequent operations, which lowers table structure recognition accuracy.
Meanwhile, extracting the cell structure from the table border line region segmentation map requires each cell area to be completely enclosed by border line labels. As shown in Fig. 6, because border lines are long and thin, the output of the segmentation model tends to contain gaps. These gaps break the enclosure, so cells are lost, which also lowers table structure recognition accuracy.
Fig. 7 is a schematic diagram of the fused cell region segmentation map. Fusing the cell region segmentation map with the table border line region segmentation map lets the two maps compensate for each other's weaknesses and improves table structure recognition accuracy.
S3. Extract control points from the table region segmentation map to obtain a rectification transformation.
In practical application scenarios, the table to be detected is often tilted or distorted, so the original structural prior is broken. The table region then needs to be rectified to restore the structural prior, which benefits subsequent processing and improves table structure recognition accuracy.
Extracting control points from the table region segmentation map to obtain the rectification transformation specifically comprises the following steps:
S31. Use a polygon fitting algorithm to extract the four vertices of the quadrilateral circumscribing the table region in the table region segmentation map.
S32. Determine the center point, width, and height of the table region from the four vertices, and locate intermediate control points according to a preset ratio; the intermediate control points and the four vertices form the first control point set.
S33. Generate the second control point set from the width and height of the table region and the preset ratio.
S34. From the first and second control point sets, generate the rectification transformation using the thin-plate spline interpolation algorithm.
The rectification transformation in this embodiment is computed by thin-plate spline interpolation, a two-dimensional interpolation method used for image rectification and image registration. It takes a pair of control point sets: the first describes the outline of the table region, and the second describes the target of the rectification. Because the second set describes a rectangle, the transformation rectifies the distorted table region into a rectangle, restoring the structural prior.
Given the first and second control point sets, the rectification transformation can be obtained with the publicly known general thin-plate spline interpolation formulas.
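For illustration, a minimal pure-Python sketch of fitting a thin-plate spline from a first control point set to a second one is given below. It uses the standard formulation with the radial basis U(r) = r^2 log r plus an affine term, solved by Gaussian elimination. The function names and the example control points are hypothetical; the preset ratio and the exact control point layout of steps S32 and S33 are not specified in the text and are not modeled here.

```python
import math


def _U(r2):
    """TPS radial basis U(r) = r^2 * log(r), expressed in terms of r^2."""
    return 0.0 if r2 == 0.0 else 0.5 * r2 * math.log(r2)


def _solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / M[r][r]
    return x


def fit_tps(src, dst):
    """Return a function mapping (x, y) so that each src point lands on its dst point."""
    n = len(src)
    A = [[0.0] * (n + 3) for _ in range(n + 3)]
    for i, (xi, yi) in enumerate(src):
        for j, (xj, yj) in enumerate(src):
            A[i][j] = _U((xi - xj) ** 2 + (yi - yj) ** 2)
        A[i][n:n + 3] = [1.0, xi, yi]
        A[n][i], A[n + 1][i], A[n + 2][i] = 1.0, xi, yi
    # One set of weights per output coordinate (x', then y').
    params = [_solve(A, [p[k] for p in dst] + [0.0, 0.0, 0.0]) for k in range(2)]

    def transform(x, y):
        out = []
        for w in params:
            v = w[n] + w[n + 1] * x + w[n + 2] * y
            v += sum(w[i] * _U((x - px) ** 2 + (y - py) ** 2)
                     for i, (px, py) in enumerate(src))
            out.append(v)
        return tuple(out)

    return transform


# Hypothetical control points: a table whose top and bottom edges bulge downward,
# and the rectangle it should be rectified to.
first_set = [(0, 0), (50, 5), (100, 0), (0, 40), (50, 48), (100, 40)]
second_set = [(0, 0), (50, 0), (100, 0), (0, 40), (50, 40), (100, 40)]
rectify = fit_tps(first_set, second_set)
```

This sketch fits the forward mapping from distorted coordinates to rectified coordinates. When resampling an image, implementations commonly fit the mapping in the opposite direction so that every output pixel can look up its source location; which direction is used here is not stated in the text.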
S4. Rectify the fused cell region segmentation map using the rectification transformation to obtain an aligned cell region segmentation map.
In step S3, the rectification transformation is derived from the table region segmentation map. The fused cell region segmentation map has the same overall shape as the table region segmentation map but is finer-grained, in that it separates the individual cells within the table region. Applying the rectification transformation to the fused cell region segmentation map therefore yields an aligned cell region segmentation map rectified into a rectangle. As shown in Fig. 8, the curved fused cell region segmentation map on the left is rectified into the aligned cell region segmentation map on the right.
S5. Perform connected component extraction and analysis on the aligned cell region segmentation map, and obtain table structure information according to matching conditions.
As shown in Fig. 9, performing connected component extraction and analysis on the aligned cell region segmentation map specifically comprises the following steps:
S51. Use a contour-based labeling algorithm to perform connected component analysis on the aligned cell region segmentation map, obtaining multiple connected regions.
S52. Convert each connected region into a bounding box.
A bounding box is the circumscribed rectangle of a connected region, described by the coordinates of its top-left corner together with its width and height, that is, [x, y, w, h].
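The conversion from connected regions to [x, y, w, h] boxes can be sketched as below. Note that this sketch swaps in a simple breadth-first flood fill with 4-connectivity instead of the contour-based labeling algorithm named in S51; the choice of connectivity and the function name are assumptions.

```python
from collections import deque


def connected_boxes(mask):
    """Return one [x, y, w, h] box per 4-connected region of 1-pixels, in scan order."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y0 in range(h):
        for x0 in range(w):
            if mask[y0][x0] == 0 or seen[y0][x0]:
                continue
            # Flood-fill one region while tracking its extent.
            queue = deque([(x0, y0)])
            seen[y0][x0] = True
            xmin = xmax = x0
            ymin = ymax = y0
            while queue:
                x, y = queue.popleft()
                xmin, xmax = min(xmin, x), max(xmax, x)
                ymin, ymax = min(ymin, y), max(ymax, y)
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    if 0 <= nx < w and 0 <= ny < h and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            boxes.append([xmin, ymin, xmax - xmin + 1, ymax - ymin + 1])
    return boxes
```

Because the input has already been rectified in step S4, the axis-aligned circumscribed rectangle is a reasonable stand-in for the cell itself.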
S53. Filter the bounding boxes using preset adaptive conditions, removing every bounding box that satisfies a condition.
The preset adaptive conditions are:
1) the bounding box lies outside the extent of the table region segmentation map;
2) the area of the bounding box is smaller than a preset threshold.
In this embodiment, the threshold is computed from the width and height of the table region and the number of bounding boxes, using the formula:
threshold = W * H / N / 20
where threshold is the area threshold, W is the width of the table region, H is the height of the table region, and N is the number of bounding boxes.
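A minimal sketch of the filter in S53 follows. It assumes that the table region is given as its own [x, y, w, h] rectangle, that "outside" means not fully contained in that rectangle, and that N counts the boxes before filtering; none of these details are fixed by the text, and the function name is hypothetical.

```python
def filter_boxes(boxes, table_box):
    """Drop boxes outside the table rectangle or smaller than W*H/N/20."""
    if not boxes:
        return []
    tx, ty, W, H = table_box
    N = len(boxes)  # assumption: counted before filtering
    threshold = W * H / N / 20
    kept = []
    for x, y, w, h in boxes:
        inside = tx <= x and ty <= y and x + w <= tx + W and y + h <= ty + H
        if inside and w * h >= threshold:
            kept.append([x, y, w, h])
    return kept
```

For a 100 by 40 table with five candidate boxes the threshold is 100 * 40 / 5 / 20 = 40 pixels, so a 2 by 2 noise blob is discarded while real cells are kept.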
S54. Perform horizontal matching and vertical matching on the bounding boxes; each kind of matching produces a series of match lists. The matching specifically comprises:
a pair of bounding boxes is a successful horizontal match if the difference between the midpoints of their y-extents does not exceed the adaptive threshold;
a pair of bounding boxes is a successful vertical match if the difference between the midpoints of their x-extents does not exceed the adaptive threshold.
Further, during matching the pair is divided into a source bounding box and a target bounding box, and the adaptive threshold is determined from the target bounding box. For horizontal matching, the adaptive threshold is 1/2 of the height of the target bounding box.
For vertical matching, the adaptive threshold is 1/2 of the width of the target bounding box.
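The matching rule can be sketched as follows, with boxes in [x, y, w, h] form. Including the source box in its own match list and the function names are assumptions made for illustration.

```python
def is_match(source, target, direction):
    """Midpoint difference must not exceed half the target's height (or width)."""
    sx, sy, sw, sh = source
    tx, ty, tw, th = target
    if direction == "horizontal":
        return abs((sy + sh / 2) - (ty + th / 2)) <= th / 2
    return abs((sx + sw / 2) - (tx + tw / 2)) <= tw / 2


def build_match_lists(boxes, direction):
    """Each box acts once as the source and is matched against all other boxes."""
    lists = []
    for i, source in enumerate(boxes):
        matched = [i] + [
            j for j, target in enumerate(boxes)
            if j != i and is_match(source, target, direction)
        ]
        lists.append(sorted(matched))
    return lists
```

Since the threshold depends on the target box, the test is not symmetric: a narrow cell can match a wide spanning cell below it as target even when the reverse comparison fails.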
S55. Deduplicate the match lists, then count how many times each bounding box appears in the horizontal and vertical match lists to obtain the number of rows and columns that each bounding box spans.
To ensure that matching is complete, every bounding box is used in turn as the source bounding box and matched against all other bounding boxes. The resulting series of match lists therefore contains duplicates. These must be removed so that only one copy of each list remains, which yields match lists that correctly describe the table structure.
A bounding box that spans multiple rows or columns appears in several different row or column match lists at once. The number of lists it appears in is the number of rows or columns that the bounding box spans.
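Deduplication and span counting can be sketched as below, treating two match lists as duplicates when they contain the same box indices. The input lists are hypothetical hand-written examples, not the output of a real matching run, and the function names are assumptions.

```python
from collections import Counter


def dedupe(match_lists):
    """Keep one copy of each distinct match list, preserving first-seen order."""
    seen, unique = set(), []
    for lst in match_lists:
        key = tuple(sorted(lst))
        if key not in seen:
            seen.add(key)
            unique.append(list(key))
    return unique


def span_counts(unique_lists):
    """Number of distinct lists each box index appears in."""
    return Counter(i for lst in unique_lists for i in lst)


# Hypothetical row match lists: box 2 appears in two different rows.
row_lists = [[0, 1, 2], [1, 0, 2], [2, 3], [3, 2]]
rows = dedupe(row_lists)
counts = span_counts(rows)
```

Here box 2 is counted twice because it belongs to both remaining row lists, so it would be recorded as spanning two rows.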
S56. Sort the row match lists relative to one another, and sort the bounding boxes within each row match list, to obtain the table structure information.
Across row match lists, the lists are sorted in ascending order of the mean vertical midpoint of the bounding boxes they contain. Within a row match list, the bounding boxes are sorted in ascending order of the horizontal coordinate of their top-left corners. The table structure information is then obtained by reading the row match lists from top to bottom and the boxes within each list from left to right, combined with the number of rows and columns that each bounding box spans.
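The two sort keys described above can be sketched as follows, with boxes in [x, y, w, h] form and row match lists holding box indices. The function name is hypothetical.

```python
def order_rows(row_lists, boxes):
    """Sort rows top to bottom by mean vertical midpoint, then boxes left to right by x."""
    def mean_mid_y(lst):
        return sum(boxes[i][1] + boxes[i][3] / 2 for i in lst) / len(lst)

    rows = sorted(row_lists, key=mean_mid_y)
    return [sorted(lst, key=lambda i: boxes[i][0]) for lst in rows]


# Four cells of a 2 x 2 table, listed out of order.
boxes = [[52, 0, 48, 18], [0, 0, 48, 18], [0, 22, 48, 18], [52, 22, 48, 18]]
ordered = order_rows([[2, 3], [0, 1]], boxes)
```

The result lists box indices in reading order, which can then be combined with the span counts from S55 to describe the table.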
Embodiment 2
This embodiment provides a system for table structure recognition. The system comprises a semantic segmentation module, a segmentation map fusion module, a control point extraction module, a segmentation map rectification module, and an extraction and analysis module. The specific functions of the modules are as follows:
the semantic segmentation module is configured to use a pre-trained semantic segmentation model to semantically segment the cell region, the table border line region, and the table region of the image to be recognized, obtaining a cell region segmentation map, a table border line region segmentation map, and a table region segmentation map;
the segmentation map fusion module is configured to fuse the cell region segmentation map and the table border line region segmentation map to obtain a fused cell region segmentation map;
the control point extraction module is configured to extract control points from the table region segmentation map to obtain a rectification transformation;
the segmentation map rectification module is configured to rectify the fused cell region segmentation map using the rectification transformation to obtain an aligned cell region segmentation map;
the extraction and analysis module is configured to perform connected component extraction and analysis on the aligned cell region segmentation map and obtain table structure information according to matching conditions.
Embodiment 3
This embodiment provides a computer device, which may be a server, a computer, or the like. It comprises a processor, a memory, an input device, a display, and a network interface connected via a system bus. The processor provides computing and control capabilities. The memory comprises a non-volatile storage medium and an internal memory; the non-volatile storage medium stores an operating system, a computer program, and a database, and the internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. When the processor executes the computer program stored in the memory, it implements the table structure recognition method of Embodiment 1, as follows:
using a pre-trained semantic segmentation model to semantically segment the cell region, the table border line region, and the table region of the image to be recognized, obtaining a cell region segmentation map, a table border line region segmentation map, and a table region segmentation map;
fusing the cell region segmentation map and the table border line region segmentation map to obtain a fused cell region segmentation map;
extracting control points from the table region segmentation map to obtain a rectification transformation;
rectifying the fused cell region segmentation map using the rectification transformation to obtain an aligned cell region segmentation map;
performing connected component extraction and analysis on the aligned cell region segmentation map, and obtaining table structure information according to matching conditions.
Embodiment 4
This embodiment provides a storage medium, which is a computer-readable storage medium storing a computer program. When the program is executed by a processor, it implements the table structure recognition method of Embodiment 1, as follows:
using a pre-trained semantic segmentation model to semantically segment the cell region, the table border line region, and the table region of the image to be recognized, obtaining a cell region segmentation map, a table border line region segmentation map, and a table region segmentation map;
fusing the cell region segmentation map and the table border line region segmentation map to obtain a fused cell region segmentation map;
extracting control points from the table region segmentation map to obtain a rectification transformation;
rectifying the fused cell region segmentation map using the rectification transformation to obtain an aligned cell region segmentation map;
performing connected component extraction and analysis on the aligned cell region segmentation map, and obtaining table structure information according to matching conditions.
The storage medium in this embodiment may be a magnetic disk, an optical disc, computer memory, read-only memory (ROM), random access memory (RAM), a USB flash drive, a removable hard disk, or a similar medium.
In summary, the present invention uses a pre-trained semantic segmentation model to semantically segment the cell region, the table border line region, and the table region of the image to be recognized at the same time. It fuses the segmentation maps of the cell region and the table border line region to obtain a fused cell region segmentation map, so that the two maps compensate for each other's weaknesses and recognition accuracy improves. It extracts control points from the table region segmentation map to obtain a rectification transformation that restores the structural prior, which benefits subsequent processing and further improves accuracy. It then rectifies the fused cell region segmentation map with the rectification transformation to obtain an aligned cell region segmentation map, performs connected component extraction and analysis on that map, and obtains table structure information according to matching conditions. The invention can thus address the effects of table structure deformation in table structure recognition scenarios and improve recognition accuracy.
The above embodiments are preferred embodiments of the present invention, but the implementation of the present invention is not limited to them. Any other change, modification, substitution, combination, or simplification made without departing from the spirit and principles of the present invention is an equivalent replacement and falls within the protection scope of the present invention.
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202111563490.2A (CN114419643B) | 2021-12-20 | 2021-12-20 | A method, system, device and storage medium for table structure recognition |

| Publication Number | Publication Date |
|---|---|
| CN114419643A (en) | 2022-04-29 |
| CN114419643B (en) | 2024-12-03 |
| Application Number | Status | Publication Number | Priority Date | Filing Date | Title |
|---|---|---|---|---|---|
| CN202111563490.2A | Active | CN114419643B (en) | 2021-12-20 | 2021-12-20 | A method, system, device and storage medium for table structure recognition |
| Country | Link |
|---|---|
| CN (1) | CN114419643B (en) |
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN117540715A (en)* | 2023-12-14 | 2024-02-09 | 创云融达信息技术(天津)股份有限公司 | Table identification method and system based on deep learning and computer vision |
| CN120544214A (en)* | 2025-07-25 | 2025-08-26 | 上海致宇信息技术有限公司 | A method and system for segmenting deformable table cell instances |
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN112528863A (en)* | 2020-12-14 | 2021-03-19 | 中国平安人寿保险股份有限公司 | Identification method and device of table structure, electronic equipment and storage medium |
| WO2021147221A1 (en)* | 2020-01-22 | 2021-07-29 | 平安科技(深圳)有限公司 | Text recognition method and apparatus, and electronic device and storage medium |
| CN113221743A (en)* | 2021-05-12 | 2021-08-06 | 北京百度网讯科技有限公司 | Table analysis method and device, electronic equipment and storage medium |
| Publication | Title |
|---|---|
| CN112686812B (en) | Bank card tilt correction detection method, device, readable storage medium and terminal |
| CN113673338B (en) | Method, system and medium for weakly supervised automatic annotation of character pixels in natural scene text images |
| CN115331245B (en) | Table structure identification method based on image instance segmentation |
| CN110443235B (en) | A method and system for identifying the total score of an intelligent paper test paper |
| CN112712273B (en) | Handwriting Chinese character aesthetic degree judging method based on skeleton similarity |
| CN112528845B (en) | A deep learning-based physical circuit diagram recognition method and its application |
| CN107239777B (en) | A method of tableware detection and recognition based on multi-view graph model |
| CN111626292B (en) | Text recognition method of building indication mark based on deep learning technology |
| CN115063802A (en) | PSENet-based circular seal identification method, device and medium |
| CN112257665A (en) | Image content recognition method, image recognition model training method and medium |
| CN114581928B (en) | A table recognition method and system |
| Hossain et al. | Recognition and solution for handwritten equation using convolutional neural network |
| CN112446259A (en) | Image processing method, device, terminal and computer readable storage medium |
| CN114821620B (en) | Text content extraction and recognition method based on vertical merging of line text boxes |
| CN114419643B (en) | A method, system, device and storage medium for table structure recognition |
| CN112749606A (en) | Text positioning method and device |
| CN110399882A (en) | A text detection method based on deformable convolutional neural network |
| CN115909378A (en) | Training method of receipt text detection model and receipt text detection method |
| CN110659637A (en) | Electric energy meter number and label automatic identification method combining deep neural network and SIFT features |
| CN115273115A (en) | Document element labeling method and device, electronic equipment and storage medium |
| CN117636379A (en) | A table recognition method based on deep learning |
| CN118968537A (en) | Bill scene recognition method, device, equipment and storage medium |
| CN118135584A (en) | Automatic handwriting form recognition method and system based on deep learning |
| CN112200789B (en) | Image recognition method and device, electronic equipment and storage medium |
| CN118865426A (en) | A method for extracting key information from airport luggage tags |
| Code | Title |
|---|---|
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |
| GR01 | Patent grant |