CN116311298B - Information generation method, information processing method, device, electronic device, and medium - Google Patents


Info

Publication number
CN116311298B
CN116311298B (application CN202310023539.8A)
Authority
CN
China
Prior art keywords
information
text
level
feature
text recognition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310023539.8A
Other languages
Chinese (zh)
Other versions
CN116311298A (en)
Inventor
于海鹏
李煜林
钦夏孟
姚锟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202310023539.8A
Publication of CN116311298A
Application granted
Publication of CN116311298B
Legal status: Active (current)
Anticipated expiration

Abstract

Translated from Chinese

The present disclosure provides an information generation method, an information processing method, an apparatus, an electronic device, and a medium, which relate to the fields of artificial intelligence technology, in particular to the fields of deep learning technology, image processing technology, and computer vision technology, and can be applied to scenarios such as OCR (Optical Character Recognition). The specific implementation scheme is as follows: performing text detection on a text image to obtain detection information, the detection information including category information and position information of each of a plurality of text regions; obtaining text region images corresponding to each of the plurality of text regions based on the position information and the text image; performing text recognition on the text region images to obtain recognition information, the recognition information including text recognition information of each of the plurality of text region images; determining semantic relationship information based on the recognition information, the semantic relationship information including the semantic relationship between a plurality of text recognition information; generating structured information of the text image based on the category information, the semantic relationship information, and the recognition information.

Description

Information generation method, information processing method, apparatus, electronic device, and medium
Technical Field
The present disclosure relates to the field of artificial intelligence, and more particularly to the fields of deep learning, image processing, and computer vision, and may be applied to scenarios such as OCR (optical character recognition). In particular, it relates to an information generation method, an information processing method, an apparatus, an electronic device, and a medium.
Background
With the development of computer technology, artificial intelligence technology has also been developed. For example, artificial intelligence techniques may be utilized to perform processes such as entity recognition and relationship extraction on images containing text data to obtain textual structured information in the images.
Disclosure of Invention
The disclosure provides an information generation method, an information processing method, an apparatus, an electronic device, and a medium.
According to one aspect of the disclosure, an information generation method is provided, which includes: performing text detection on a text image to obtain detection information, wherein the detection information includes category information and position information of each of a plurality of text regions; acquiring text region images corresponding to each of the plurality of text regions according to the position information and the text image; performing text recognition on the text region images to obtain recognition information, wherein the recognition information includes text recognition information of each of the plurality of text region images; determining semantic relationship information according to the recognition information, wherein the semantic relationship information includes semantic relationships among the plurality of text recognition information; and generating structured information of the text image according to the category information, the semantic relationship information, and the recognition information.
According to another aspect of the present disclosure, there is provided an information processing method including: processing a text image to be processed using the information generation method to acquire structured information of the text image to be processed, and performing information processing using the structured information of the text image to be processed.
According to another aspect of the disclosure, an information generating apparatus is provided, which includes: a text detection module configured to perform text detection on a text image to obtain detection information, where the detection information includes category information and location information of each of a plurality of text regions; a first acquisition module configured to acquire a text region image corresponding to each of the plurality of text regions according to the location information and the text image; a text recognition module configured to perform text recognition on the text region image to obtain recognition information, where the recognition information includes text recognition information of each of the plurality of text region images; a determination module configured to determine semantic relationship information according to the recognition information, where the semantic relationship information includes a semantic relationship between the plurality of text recognition information; and a generation module configured to generate structured information of the text image according to the category information, the semantic relationship information, and the recognition information.
According to another aspect of the disclosure, there is provided an information processing apparatus including a second acquisition module configured to process a text image to be processed using the information generating apparatus, acquire structured information of the text image to be processed, and an information processing module configured to perform information processing using the structured information of the text image to be processed.
According to another aspect of the present disclosure, there is provided an electronic device comprising at least one processor, and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method as described in the present disclosure.
According to another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing the computer to perform the method as described in the present disclosure.
According to another aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements a method as described in the present disclosure.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 schematically illustrates an exemplary system architecture to which information generation methods, information processing methods, and apparatus may be applied, according to embodiments of the present disclosure;
FIG. 2 schematically illustrates a flow chart of an information generation method according to an embodiment of the disclosure;
FIG. 3 schematically illustrates a flow chart of a method for text detection of a text image to obtain detection information, according to an embodiment of the disclosure;
FIG. 4 schematically illustrates an example schematic diagram of a text detection of a text image to obtain detection information according to an embodiment of the present disclosure;
FIG. 5A schematically illustrates a flowchart of a method of determining semantic relationship information according to identification information according to an embodiment of the present disclosure;
FIG. 5B schematically illustrates an example schematic diagram of a process for determining semantic relationship information based on identification information according to an embodiment of the present disclosure;
FIG. 5C schematically illustrates an example schematic diagram of a process for determining semantic relationship information based on identification information according to another embodiment of the present disclosure;
FIG. 5D schematically illustrates a flowchart of a method of determining semantic relationship information according to identification information according to another embodiment of the present disclosure;
FIG. 5E schematically illustrates an example schematic diagram of a process for determining semantic relationship information based on identification information according to another embodiment of the present disclosure;
FIG. 5F schematically illustrates an example schematic diagram of a process for determining semantic relationship information based on identification information according to another embodiment of the present disclosure;
FIG. 5G schematically illustrates a flowchart of a method of determining semantic relationship information according to identification information according to another embodiment of the present disclosure;
FIG. 6 schematically illustrates a flow chart of a method of text recognition of a text region image to obtain recognition information, in accordance with an embodiment of the present disclosure;
FIG. 7 schematically illustrates a flow chart of a method of generating structured information of a text image from category information, semantic relationship information, and identification information according to an embodiment of the present disclosure;
FIG. 8 schematically illustrates an example schematic diagram of an information generation process according to an embodiment of the disclosure;
FIG. 9 schematically illustrates a flow chart of an information processing method according to an embodiment of the present disclosure;
Fig. 10 schematically shows a block diagram of an information generating apparatus according to an embodiment of the present disclosure;
FIG. 11 schematically shows a block diagram of an information processing apparatus according to an embodiment of the present disclosure, and
Fig. 12 schematically shows a block diagram of an electronic device adapted to implement the information generating method, the information processing method according to an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
In the technical solution of the present disclosure, the collection, storage, use, processing, transmission, provision, and disclosure of users' personal information comply with the provisions of relevant laws and regulations and do not violate public order and good morals.
In the technical scheme of the disclosure, the authorization or consent of the user is obtained before the personal information of the user is obtained or acquired.
Fig. 1 schematically illustrates an exemplary system architecture to which information generating methods, information processing methods, and apparatuses may be applied according to embodiments of the present disclosure.
It should be noted that fig. 1 is only an example of a system architecture to which embodiments of the present disclosure may be applied to assist those skilled in the art in understanding the technical content of the present disclosure, but does not mean that embodiments of the present disclosure may not be used in other devices, systems, environments, or scenarios. For example, in another embodiment, an exemplary system architecture to which the information generating method, the information processing method, and the apparatus may be applied may include a terminal device, but the terminal device may implement the information generating method, the information processing method, and the apparatus provided by the embodiments of the present disclosure without interacting with a server.
As shown in fig. 1, a system architecture 100 according to this embodiment may include a first terminal device 101, a second terminal device 102, a third terminal device 103, a network 104, and a server 105. The network 104 is a medium used to provide communication links between the first terminal device 101, the second terminal device 102, the third terminal device 103, and the server 105. The network 104 may include various connection types, such as wired and/or wireless communication links. The terminal device may include at least one of the first terminal device 101, the second terminal device 102, and the third terminal device 103.
The user may interact with the server 105 through the network 104 using at least one of the first terminal device 101, the second terminal device 102, and the third terminal device 103 to receive or send messages or the like. At least one of the first terminal device 101, the second terminal device 102, and the third terminal device 103 may be installed with various communication client applications. For example, at least one of a knowledge reading class application, a web browser application, a search class application, an instant messaging tool, a mailbox client and social platform software, and the like.
The first terminal device 101, the second terminal device 102, and the third terminal device 103 may be various electronic devices having a display screen and supporting web browsing. For example, the electronic device may include at least one of a smart phone, a tablet computer, a laptop portable computer, a desktop computer, and the like.
The server 105 may be a server providing various services. For example, the server 105 may be a cloud server, also called a cloud computing server or cloud host, which is a host product in a cloud computing service system that overcomes the drawbacks of difficult management and weak service scalability found in conventional physical hosts and VPS (Virtual Private Server) services.
Note that the information generating method and the information processing method provided by the embodiments of the present disclosure may generally be performed by one of the first terminal device 101, the second terminal device 102, and the third terminal device 103. Accordingly, the information generating apparatus and the information processing apparatus provided by the embodiments of the present disclosure may also be provided in one of the first terminal device 101, the second terminal device 102, and the third terminal device 103.
Alternatively, the information generating method and the information processing method provided by the embodiments of the present disclosure may also be generally performed by the server 105. Accordingly, the information generating apparatus and the information processing apparatus provided by the embodiments of the present disclosure may be generally provided in the server 105. The information generating method and the information processing method provided by the embodiments of the present disclosure may also be performed by a server or a server cluster that is different from the server 105 and is capable of communicating with at least one of the first terminal device 101, the second terminal device 102, the third terminal device 103, and the server 105. Accordingly, the information generating apparatus and the information processing apparatus provided by the embodiments of the present disclosure may also be provided in a server or a server cluster that is different from the server 105 and is capable of communicating with at least one of the first terminal apparatus 101, the second terminal apparatus 102, the third terminal apparatus 103, and the server 105.
It should be understood that the numbers of first terminal devices, second terminal devices, third terminal devices, networks, and servers in fig. 1 are merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
It should be noted that the sequence numbers of the respective operations in the following methods are merely representative of the operations for the purpose of description, and should not be construed as representing the order of execution of the respective operations. The method need not be performed in the exact order shown unless explicitly stated.
Fig. 2 schematically shows a flowchart of an information generation method according to an embodiment of the present disclosure.
As shown in FIG. 2, the method 200 includes operations S210-S250.
In operation S210, text detection is performed on the text image to obtain detection information.
In operation S220, text region images corresponding to the respective text regions are acquired based on the position information and the text images.
In operation S230, text recognition is performed on the text region image, resulting in recognition information.
In operation S240, semantic relationship information is determined according to the identification information.
In operation S250, structured information of the text image is generated according to the category information, the semantic relationship information, and the identification information.
According to an embodiment of the present disclosure, the detection information may include category information and location information of each of the plurality of text regions. The identification information may include text identification information of each of the plurality of text region images. The semantic relationship information may include semantic relationships between a plurality of text recognition information.
According to embodiments of the present disclosure, a text image may refer to an image that includes text content. The text content in the text image belongs to unstructured information, and the unstructured text content in the text image can be extracted according to the information generation method provided by the embodiment of the disclosure so as to generate the structured information of the text image.
According to embodiments of the present disclosure, text images may be of various types; for example, they may include medical text images, merchandise list text images, financial text images, or the like. File formats of the text image may include JPG (Joint Photographic Experts Group), TIFF (Tag Image File Format), PNG (Portable Network Graphics), PDF (Portable Document Format), GIF (Graphics Interchange Format), and the like. The embodiments of the present disclosure do not limit the file format of the text image.
According to embodiments of the present disclosure, the text image may be acquired through real-time acquisition, for example, the text image may be acquired through photographing or scanning of the entity text, or the like. Alternatively, the text image may be pre-stored in a database, for example, for an electronic document including text information, the text image may be obtained by capturing a screenshot of the document. Alternatively, the text image may be received from other terminal devices. The embodiment of the disclosure does not limit the acquisition mode of the text image.
According to the embodiment of the disclosure, after the text image is obtained, the text image may be subjected to text detection by using the text detection model, so as to obtain detection information corresponding to the text image. The text detection model may be trained on a first predetermined model using a first training sample set and a first tag set. The first predetermined model may include a deep learning model or a conventional model. The deep learning model may include a text detection model based on candidate boxes, a text detection model based on segmentation, or a text detection model based on a mixture of both, etc. Conventional models may include a text detection model based on SWT (Stroke Width Transform) or a text detection model based on EdgeBox (edge boxes), etc.
According to an embodiment of the present disclosure, the detection information may include category information and location information of each of the plurality of text regions. The category information may characterize a category of text content included in the text region. The category information may include at least one of a keyword category or a numeric category. The keyword category may characterize a category attribute of text content included in the text region. The numeric category may characterize a content attribute of the text content included in the text region.
For example, if the text content included in one text region is "A city center hospital", the category information of that text region is the numeric category. If a text region includes the text content "name", its category information is the keyword category. If a text region includes the text content "Zhang San", its category information is the numeric category.
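As an illustration, the two category types can be modeled as plain records and separated by category. This is a sketch only; the record layout and field names are assumptions for illustration, not part of the patent:

```python
def split_by_category(detections):
    """Separate keyword-category regions (key-type fields) from
    numeric-category regions (content-type fields)."""
    keywords = [d for d in detections if d["category"] == "keyword"]
    numerics = [d for d in detections if d["category"] == "numeric"]
    return keywords, numerics

# Records mirroring the examples in the text above.
detections = [
    {"text": "name", "category": "keyword"},
    {"text": "Zhang San", "category": "numeric"},
    {"text": "A city center hospital", "category": "numeric"},
]
keywords, numerics = split_by_category(detections)
```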
According to embodiments of the present disclosure, the position information may characterize the location where the text region is situated. The position information may be used as a basis for extracting the text region image corresponding to the text region from the text image. The position information may be characterized using a text detection box. The text detection box may be a four-corner box, i.e., the position information may be characterized using four corner coordinates.
According to the embodiment of the present disclosure, after the position information of each of the plurality of text regions is obtained, the text image may be subjected to image segmentation processing according to the position information using a predetermined image segmentation method, resulting in a text region image corresponding to each of the plurality of text regions. The predetermined image segmentation method may include at least one of a threshold-based image segmentation method, a region-based image segmentation method, an edge-based image segmentation method, a theory-specific image segmentation method, a gene-coding-based image segmentation method, a wavelet-transformation-based image segmentation method, and a neural network-based image segmentation method.
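As a minimal sketch of this step, a text region image can be cut out of the text image using the axis-aligned bounding rectangle of a four-corner box. The image is modeled here as a nested list of pixel values; a production system would apply one of the segmentation methods listed above and would also rectify rotated boxes:

```python
def crop_text_region(image, box):
    """Crop the axis-aligned bounding rectangle of a four-corner text box.
    `image` is a 2-D list of pixel rows; `box` is four (x, y) corners."""
    xs = [x for x, _ in box]
    ys = [y for _, y in box]
    return [row[min(xs):max(xs) + 1] for row in image[min(ys):max(ys) + 1]]

# A 6x10 synthetic "image" whose pixel value encodes its coordinates.
image = [[r * 10 + c for c in range(10)] for r in range(6)]
region = crop_text_region(image, [(2, 1), (5, 1), (5, 3), (2, 3)])
```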
According to the embodiment of the disclosure, after obtaining the text region image corresponding to each of the plurality of text regions, the text region image may be text-recognized using the text recognition model to obtain the recognition information. The identification information may include text identification information of each of the plurality of text region images. Text identifying information may be used to characterize the text content corresponding to a field of consecutive text in the text region image. The text recognition model may be trained using a second training sample set and a second label set for a second predetermined model. The second predetermined model may include a pattern matching model, a machine learning model, or a deep learning model. The deep learning model may include a text recognition model based on single character recognition or a text recognition model based on overall recognition.
According to the embodiment of the disclosure, after the identification information is obtained, text identification information of each of the plurality of text region images can be processed by using the text classification model to obtain semantic relationship information. For example, semantic features included in the text recognition information of each of the plurality of text region images are extracted using the text classification model, and semantic relationships between the plurality of text recognition information are determined based on the semantic features. The text classification model may be trained using a third set of training samples on a third predetermined model. The third predetermined model may include a machine learning model or a deep learning model. The machine learning model may include a naive bayes algorithm based text classification model or a decision tree based text classification model.
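A toy stand-in for this step is sketched below: each keyword-category field is linked to the next value in reading order. The real scheme uses the learned text classification model described above; this rule-based pairing and all names in it are illustrative assumptions:

```python
def link_key_value(recognized):
    """Pair each keyword-category item with the next numeric-category
    item in reading order, yielding (key, value) semantic relations."""
    relations = []
    pending_key = None
    for item in recognized:
        if item["category"] == "keyword":
            pending_key = item["text"]
        elif pending_key is not None:
            relations.append((pending_key, item["text"]))
            pending_key = None
    return relations

recognized = [
    {"text": "name", "category": "keyword"},
    {"text": "Zhang San", "category": "numeric"},
    {"text": "age", "category": "keyword"},
    {"text": "42", "category": "numeric"},
]
relations = link_key_value(recognized)
```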
According to embodiments of the present disclosure, after obtaining the semantic relationship information, structured information of the text image may be generated according to the category information, the semantic relationship information, and the identification information. The structured information of the text image may include a value corresponding to the keyword category and a value corresponding to the numeric category. The value corresponding to the keyword category may include semantic relationship information. The value corresponding to the numeric category may include identification information.
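The assembly described above can be sketched as follows: values reached through semantic relations fill the keyword-category entries, and any remaining numeric-category recognition results are retained on their own. The record layout is an assumption for illustration:

```python
import json

def assemble_structured_info(items, relations):
    """Build a structured record: (key, value) semantic relations become
    named entries; numeric-category items not consumed by any relation
    are kept under a catch-all entry."""
    record = dict(relations)
    linked = set(record.values())
    record["unlinked"] = [i["text"] for i in items
                          if i["category"] == "numeric" and i["text"] not in linked]
    return record

items = [
    {"text": "name", "category": "keyword"},
    {"text": "Zhang San", "category": "numeric"},
    {"text": "A city center hospital", "category": "numeric"},
]
record = assemble_structured_info(items, [("name", "Zhang San")])
structured_json = json.dumps(record, ensure_ascii=False)
```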
According to an embodiment of the present disclosure, operations S210 to S250 may be performed by an electronic device. The electronic device may comprise a server or a terminal device. The server may be the server 105 in fig. 1. The terminal devices may be the first terminal device 101, the second terminal device 102, and the third terminal device 103 in fig. 1.
According to the embodiment of the present disclosure, performing text detection on the text image yields the category information and position information of each of the plurality of text regions, and because the text region images corresponding to those text regions are acquired according to the position information and the text image, performing text recognition on those images yields recognition information that includes the text recognition information of each of the text region images. On this basis, since the structured information of the text image is generated according to the category information, the semantic relationship information, and the recognition information, and since the semantic relationship information is determined according to the recognition information and includes the semantic relationships among the plurality of text recognition information, the structured information is generated using both semantic relationship information and visual information, which improves its accuracy.
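The five operations S210 to S250 can be tied together in a short orchestration sketch, with the detection, recognition, and relation models passed in as callables. All stubs and names here are illustrative assumptions, not the patent's reference implementation:

```python
def generate_structured_info(text_image, detect, recognize, relate):
    """Sketch of operations S210-S250 with pluggable model callables."""
    detections = detect(text_image)                                 # S210
    for d in detections:
        xs = [x for x, _ in d["box"]]
        ys = [y for _, y in d["box"]]
        d["region"] = [row[min(xs):max(xs) + 1]
                       for row in text_image[min(ys):max(ys) + 1]]  # S220
        d["text"] = recognize(d["region"])                          # S230
    relations = relate(detections)                                  # S240
    return dict(relations)                                          # S250

# Trivial stand-in "models" for demonstration only.
fake_image = [[0] * 8 for _ in range(4)]

def detect_stub(img):
    return [{"box": [(0, 0), (3, 0), (3, 1), (0, 1)], "category": "keyword"},
            {"box": [(4, 2), (7, 2), (7, 3), (4, 3)], "category": "numeric"}]

_texts = iter(["name", "Zhang San"])

def recognize_stub(region):
    return next(_texts)

def relate_stub(detections):
    return [(detections[0]["text"], detections[1]["text"])]

info = generate_structured_info(fake_image, detect_stub, recognize_stub, relate_stub)
```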
The above are merely exemplary embodiments, but the disclosure is not limited thereto; other information generation methods known in the art may also be included, as long as the accuracy of the structured information can be improved.
The method of fig. 2 is further described with reference to fig. 3, 4, 5A, 5B, 5C, 5D, 5E, 5F, 5G, 6, 7, and 8 in conjunction with embodiments.
According to an embodiment of the present disclosure, the text image comprises a medical text image.
According to the embodiment of the disclosure, medical text is an important carrier of information in medical scenarios. Medical text contains a great deal of structured information about users, and acquiring this structured information helps in understanding users' health conditions so that targeted analysis and processing can be performed; it also supports building a complete database and user profiles. Medical text can exist in the form of an image, and extracting the required structured information from a medical text image is a technical difficulty in medical scenarios that can be addressed using the information generation scheme provided by the embodiments of the present disclosure.
Fig. 3 schematically illustrates a flowchart of a method for text detection of a text image to obtain detection information according to an embodiment of the present disclosure.
As shown in FIG. 3, the method 300 further refines operation S210 of FIG. 2, and may include operations S311 to S315.
In operation S311, feature extraction is performed on the text image, and a first feature map of at least one scale is obtained.
In operation S312, a second feature map is acquired from the first feature map of the at least one scale.
In operation S313, a third feature map is acquired from the first feature map of the at least one scale.
In operation S314, category information of each of the plurality of text regions is acquired according to the second feature map.
In operation S315, respective position information of a plurality of text regions is acquired according to the third feature map.
According to embodiments of the present disclosure, scale may refer to image resolution. Each scale may have at least one first feature map corresponding to the scale.
According to embodiments of the present disclosure, a text image may be processed based on a single-stage tandem method, resulting in a first feature map of at least one scale. Alternatively, the text image may be processed based on a multi-stage tandem method, resulting in a first feature map of at least one scale. Alternatively, the text image may be processed based on a multi-stage parallel method, resulting in a first feature map of at least one scale.
According to an embodiment of the present disclosure, after the first feature map of at least one scale is obtained, a second feature map may be obtained from the first feature map of at least one scale. For example, the first feature map of at least one scale may be fused to obtain a first fused feature map. And acquiring a second feature map according to the first fusion feature map. For example, the first fused feature map may be determined as the second feature map. Alternatively, the first fused feature map may be processed to obtain the second feature map.
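One simple fusion strategy consistent with this description is nearest-neighbour upsampling of each scale to the largest resolution followed by element-wise summation. This is an assumed strategy for illustration, not necessarily the fusion used in the patent:

```python
def fuse_feature_maps(feature_maps):
    """Fuse single-channel feature maps of different scales: upsample
    each map (nearest neighbour) to the largest height/width, then sum."""
    target_h = max(len(m) for m in feature_maps)
    target_w = max(len(m[0]) for m in feature_maps)
    fused = [[0.0] * target_w for _ in range(target_h)]
    for m in feature_maps:
        sy = len(m) / target_h
        sx = len(m[0]) / target_w
        for y in range(target_h):
            for x in range(target_w):
                fused[y][x] += m[int(y * sy)][int(x * sx)]
    return fused

fused = fuse_feature_maps([
    [[1.0, 2.0], [3.0, 4.0]],  # 2x2 first feature map
    [[10.0]],                  # 1x1 first feature map, upsampled to 2x2
])
```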
According to an embodiment of the present disclosure, after the at least one scale first feature map is obtained, a third feature map may be obtained according to the at least one scale first feature map. For example, the first feature map of at least one scale may be fused to obtain a second fused feature map. And acquiring a third feature map according to the second fusion feature map. For example, the second fused feature map may be determined as the third feature map. Alternatively, the second fused feature map may be processed to obtain a third feature map.
According to the embodiment of the disclosure, after the second feature map is obtained, the category information of each of the plurality of text regions may be obtained according to the second feature map. For example, a heat map of each of the plurality of text regions may be obtained from the second feature map, and the category information of each of the plurality of text regions may be determined according to the heat maps of the plurality of text regions. Alternatively, the second feature map may be processed based on a regression-based method to obtain the respective category information of the plurality of text regions.
According to the embodiment of the present disclosure, after the third feature map is obtained, the position information of each of the plurality of text regions may be obtained according to the third feature map. For example, a heat map of each of the plurality of text regions may be obtained from the third feature map, and the position information of each of the plurality of text regions may be determined according to the heat maps of the plurality of text regions. Alternatively, the third feature map may be processed based on a regression-based method to obtain the position information of each of the plurality of text regions.
According to an embodiment of the present disclosure, operations S311 to S315 may be performed by an electronic device. The electronic device may comprise a server or a terminal device. The server may be the server 105 in fig. 1. The terminal devices may be the first terminal device 101, the second terminal device 102, and the third terminal device 103 in fig. 1.
According to the embodiment of the disclosure, the first feature map of at least one scale can provide richer information. Therefore, obtaining the category information and the position information of each of the plurality of text regions using the first feature map of at least one scale improves the accuracy of the category information and the position information of each of the text regions.
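As an illustrative sketch (not the disclosed implementation), the two detection outputs described above, category information from a per-category score map and position information from a region heat map, could be approximated in NumPy as follows. The function names and the mask/threshold conventions are hypothetical:

```python
import numpy as np

def region_category(class_map: np.ndarray, region_mask: np.ndarray) -> int:
    """Pick the category whose mean score inside the region is highest.

    class_map: (C, H, W) per-category score map (the "second feature map" path).
    region_mask: (H, W) boolean mask of one text region.
    """
    scores = class_map[:, region_mask].mean(axis=1)  # (C,) mean score per category
    return int(np.argmax(scores))

def region_box(heat_map: np.ndarray, thresh: float = 0.5):
    """Bounding box (x0, y0, x1, y1) of the thresholded region heat map
    (the "third feature map" path)."""
    ys, xs = np.where(heat_map > thresh)
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())
```

A region whose mask covers high scores in channel c is assigned category c, and the box is the tight axis-aligned rectangle around the above-threshold heat-map pixels.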
According to an embodiment of the present disclosure, operation S311 may include the following operations.
Feature extraction of M stages is performed on the text image to obtain at least one first feature map corresponding to the M-th stage. A first feature map of at least one scale is then obtained according to the at least one first feature map corresponding to the M-th stage.
According to an embodiment of the disclosure, the mth stage has Tm parallel levels, the image resolutions of the first feature maps of the same parallel level are the same, and the image resolutions of the first feature maps of different parallel levels are different.
According to an embodiment of the present disclosure, M is an integer greater than or equal to 1. M is an integer greater than or equal to 1 and less than or equal to M. Tm is an integer greater than or equal to 1.
According to embodiments of the present disclosure, the M stages may include an input stage, an intermediate stage, and an output stage. The input stage may refer to the 1st stage. The output stage may refer to the M-th stage. The intermediate stages may refer to the 2nd through (M-1)-th stages. The number of parallel levels of each stage may be the same or different. In the 1st to (M-1)-th stages, the current stage may have at least one more parallel level than the previous stage. The M-th stage may have the same number of parallel levels as the (M-1)-th stage. M may be configured according to actual service requirements, which is not limited herein. For example, M = 4. In the 1st to 3rd stages, the current stage has at least one more parallel level than the previous stage: the 1st stage has T1 = 2 parallel levels, the 2nd stage has T2 = 3 parallel levels, the 3rd stage has T3 = 4 parallel levels, and the 4th stage has T4 = 4 parallel levels.
According to an embodiment of the present disclosure, the image resolution of the first feature map of the same parallel hierarchy is the same. The image resolution of the first feature map of the different parallel levels is different, e.g. the image resolution of the first feature map of the current parallel level is smaller than the image resolution of the first feature map of the upper parallel level. The image resolution of the first feature map of the current parallel hierarchy of the current stage may be determined from the image resolution of the first feature map of the upper parallel hierarchy of the previous stage. For example, the image resolution of the first feature map of the current stage may be obtained by downsampling the image resolution of the first feature map of the upper parallel hierarchy of the previous stage.
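The resolution relationship between parallel levels can be sketched as follows; this assumes, purely for illustration, that each new parallel level halves the previous level's resolution (one downsampling step), which is one common choice and not stated as mandatory by the disclosure:

```python
def level_resolutions(h: int, w: int, num_levels: int):
    """Resolutions of the parallel levels, assuming each lower level is
    obtained by downsampling the level above it by a factor of 2."""
    res = [(h, w)]
    for _ in range(num_levels - 1):
        h, w = h // 2, w // 2
        res.append((h, w))
    return res
```

For example, with a 64x64 top-level feature map and four parallel levels, the levels would run 64x64, 32x32, 16x16, 8x8.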
According to the embodiment of the disclosure, in the case where M > 1, performing feature extraction of M stages on the text image to obtain the at least one first feature map corresponding to the M-th stage may include the following. In response to m = 1, feature extraction is performed on the text image to obtain at least one intermediate first feature map corresponding to the 1st stage, and a first feature map of at least one scale corresponding to the 1st stage is obtained according to the intermediate first feature map of at least one scale corresponding to the 1st stage. In response to 1 < m ≤ M, feature extraction is performed on the first feature map of at least one scale corresponding to the (m-1)-th stage to obtain an intermediate first feature map of at least one scale corresponding to the m-th stage, and a first feature map of at least one scale corresponding to the m-th stage is obtained according to the intermediate first feature map of at least one scale corresponding to the m-th stage.
According to the embodiment of the disclosure, in the case where M = 1, performing feature extraction of M stages on the text image to obtain the at least one first feature map corresponding to the M-th stage may include performing feature extraction on the text image to obtain at least one intermediate first feature map corresponding to the 1st stage, and obtaining a first feature map of at least one scale corresponding to the 1st stage according to the intermediate first feature map of at least one scale corresponding to the 1st stage.
According to an embodiment of the disclosure, obtaining the at least one scale first feature map from the at least one first feature map corresponding to the mth stage may include determining the at least one first feature map corresponding to the mth stage as the at least one scale first feature map.
According to the embodiment of the disclosure, since the image resolutions of the first feature maps of the same parallel level are the same and the image resolutions of the first feature maps of different parallel levels are different, a high-resolution feature representation can be maintained throughout the feature extraction process, while parallel levels from high resolution to low resolution are gradually added. Deep semantic information is extracted directly on the high-resolution feature representation, rather than serving merely as a supplement to the low-level feature information of the image, so that the representation has sufficient classification capability while avoiding the loss of effective spatial resolution. The at least one parallel level can capture context information and acquire rich global and local information. In addition, information is repeatedly exchanged across the parallel levels to realize multi-scale fusion of features, and more accurate position information can be obtained, thereby improving the accuracy of the category information and the position information of each of the text regions.
According to an embodiment of the present disclosure, in a case where M is an integer greater than 1, feature extraction of M stages is performed on a text image, and at least one first feature map corresponding to an mth stage is obtained, which may include the following operations.
Convolution processing is performed on the at least one first feature map corresponding to the (m-1)-th stage to obtain at least one intermediate first feature map corresponding to the m-th stage. Feature fusion is performed on the at least one intermediate first feature map corresponding to the m-th stage to obtain at least one first feature map corresponding to the m-th stage.
According to an embodiment of the present disclosure, M is an integer greater than 1 and less than or equal to M.
According to an embodiment of the present disclosure, for the m-1 th stage, for a first feature map of the at least one first feature map, convolution processing may be performed on the first feature map to obtain an intermediate first feature map of the m-th stage, so that at least one intermediate first feature map of the m-th stage may be obtained.
According to the embodiment of the disclosure, performing feature fusion on the at least one intermediate first feature map corresponding to the m-th stage to obtain the at least one first feature map corresponding to the m-th stage may include: for an intermediate first feature map among the at least one intermediate first feature map corresponding to the m-th stage, fusing that intermediate first feature map of the m-th stage with the intermediate first feature maps of parallel levels other than the parallel level where it is located, to obtain the first feature map corresponding to that intermediate first feature map of the m-th stage. The other parallel levels may refer to at least some parallel levels of the m-th stage other than the parallel level at which the intermediate first feature map is located.
According to an embodiment of the present disclosure, feature fusion is performed on at least one intermediate first feature map corresponding to the mth stage, to obtain at least one first feature map corresponding to the mth stage, which may include the following operations.
For the i-th parallel level among the Tm parallel levels, a first feature map corresponding to the i-th parallel level is obtained according to the other intermediate first feature maps corresponding to the i-th parallel level and the intermediate first feature map corresponding to the i-th parallel level.
According to an embodiment of the present disclosure, the other intermediate first feature maps corresponding to the i-th parallel level may be intermediate first feature maps corresponding to at least part of the Tm parallel levels other than the i-th parallel level. i may be an integer greater than or equal to 1 and less than or equal to Tm.
According to an embodiment of the disclosure, in the case where 1 < i < Tm, up-sampling is performed on at least one first other intermediate first feature map to obtain an up-sampled first feature map corresponding to the at least one first other intermediate first feature map, and down-sampling is performed on at least one second other intermediate first feature map to obtain a down-sampled first feature map corresponding to the at least one second other intermediate first feature map. A first other intermediate first feature map may refer to an other intermediate first feature map at a parallel level greater than the i-th among the Tm parallel levels. A second other intermediate first feature map may refer to an other intermediate first feature map at a parallel level less than the i-th among the Tm parallel levels. The image resolution of the up-sampled first feature map is the same as the resolution of the intermediate first feature map of the i-th parallel level. The resolution of the down-sampled first feature map is the same as the resolution of the intermediate first feature map of the i-th parallel level.
According to an embodiment of the present disclosure, in the case where i = 1, at least one first other intermediate first feature map is up-sampled to obtain an up-sampled first feature map corresponding to the at least one first other intermediate first feature map. A first other intermediate first feature map may refer to an other intermediate first feature map at a parallel level greater than the 1st among the Tm parallel levels. The image resolution of the up-sampled first feature map is the same as the resolution of the intermediate first feature map of the 1st parallel level.
According to an embodiment of the present disclosure, in the case where i = Tm, at least one second other intermediate first feature map is down-sampled to obtain a down-sampled first feature map corresponding to the at least one second other intermediate first feature map. A second other intermediate first feature map may refer to an other intermediate first feature map at a parallel level less than the Tm-th among the Tm parallel levels. The resolution of the down-sampled first feature map is the same as the resolution of the intermediate first feature map of the Tm-th parallel level.
According to an embodiment of the disclosure, the first feature map corresponding to the i-th parallel level is obtained from the up-sampled first feature map corresponding to the at least one first other intermediate first feature map, the down-sampled first feature map corresponding to the at least one second other intermediate first feature map, and the intermediate first feature map of the i-th parallel level. For example, the up-sampled first feature map corresponding to the at least one first other intermediate first feature map, the down-sampled first feature map corresponding to the at least one second other intermediate first feature map, and the intermediate first feature map of the i-th parallel level may be fused to obtain the first feature map corresponding to the i-th parallel level. The fusing may include at least one of concatenation and addition.
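The multi-scale fusion described above can be sketched in NumPy as follows. This is a minimal illustration only: nearest-neighbour resizing stands in for the up-sampling and down-sampling operators, additive fusion is assumed, and all names are hypothetical:

```python
import numpy as np

def resize_nearest(x: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Nearest-neighbour resize, standing in for up-/down-sampling."""
    h, w = x.shape
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return x[rows][:, cols]

def fuse_level(maps, i):
    """Fuse the intermediate feature maps of all parallel levels into the
    resolution of level i by resizing and adding (additive fusion)."""
    h, w = maps[i].shape
    out = np.zeros((h, w))
    for j, m in enumerate(maps):
        out += m if j == i else resize_nearest(m, h, w)
    return out
```

Maps at levels above i are effectively down-sampled and maps at levels below i up-sampled, so every output level aggregates information from all resolutions.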
Fig. 4 schematically illustrates an example schematic diagram of a process of performing feature extraction of M stages on a text image to obtain at least one first feature map corresponding to an mth stage according to an embodiment of the present disclosure.
As shown in fig. 4, in 400, M = 4: a 1st stage 401, a 2nd stage 402, a 3rd stage 403, and a 4th stage 404. The 1st stage 401 has two parallel levels, e.g., a 1st parallel level 405 and a 2nd parallel level 406. The 2nd stage 402 has three parallel levels, e.g., the 1st parallel level 405, the 2nd parallel level 406, and a 3rd parallel level 407. The 3rd stage 403 has four parallel levels, e.g., the 1st parallel level 405, the 2nd parallel level 406, the 3rd parallel level 407, and a 4th parallel level 408.
The at least one first feature map corresponding to the 4th stage may include a first feature map 409, a first feature map 410, a first feature map 411, and a first feature map 412. Furthermore, the "up-right arrow" between the last two columns of each stage in fig. 4 denotes "up-sampling", and the "lower-left arrow" denotes "down-sampling".
According to an embodiment of the present disclosure, operation S311 may include the following operations.
Feature extraction of N cascade levels is performed on the text image to obtain a first feature map of at least one scale.
According to an embodiment of the present disclosure, N is an integer greater than 1. N may be configured according to actual service requirements, which is not limited herein. For example, n=4.
According to the embodiment of the disclosure, feature extraction of N cascade levels may be performed on the text image to obtain at least one first feature map corresponding to the N cascade levels. A first feature map of at least one scale is then obtained according to the at least one first feature map corresponding to the N cascade levels. For example, for an n-th cascade level among the N cascade levels, a first feature map of a scale corresponding to the n-th cascade level is obtained from the first feature maps of other cascade levels and the first feature map corresponding to the n-th cascade level. The other cascade levels may refer to at least some of the N cascade levels other than the n-th cascade level.
According to the embodiment of the disclosure, since the first feature map of at least one scale can provide richer information, the accuracy of the category information and the position information of each of the plurality of text regions can be improved by determining the category information and the position information of each of the plurality of text regions according to the first feature map of at least one scale.
Fig. 5A schematically illustrates a flowchart of a method of determining semantic relationship information based on identification information according to an embodiment of the present disclosure.
As shown in fig. 5A, the method 500A is a further limitation of operation S240 in fig. 2, and the method 500A may include operation S541.
In operation S541, semantic relationship information is determined based on the auxiliary information and the identification information.
According to an embodiment of the present disclosure, the auxiliary information may include at least one of a second feature map and location information.
According to an embodiment of the present disclosure, the second feature map may be obtained from the first fused feature map. The first fused feature map may be obtained by fusing the first feature map of at least one scale. The location information may characterize the location where the text region is located.
According to an embodiment of the present disclosure, for example, in the case where the auxiliary information includes the second feature map, the semantic relationship information may be determined according to the second feature map and the identification information. Alternatively, in the case where the auxiliary information includes location information, the semantic relationship information may be determined from the location information and the identification information. Alternatively, in the case where the auxiliary information includes the second feature map and the position information, the semantic relationship information may be determined from the second feature map, the position information, and the identification information.
According to the embodiment of the disclosure, since the semantic relationship information is determined from both the auxiliary information, which includes at least one of the second feature map and the position information, and the identification information, the accuracy of the semantic relationship information is improved.
In accordance with an embodiment of the present disclosure, in the case where the auxiliary information includes the second feature map, operation S541 may include the following operations.
The second feature map and a fourth feature map corresponding to the identification information are fused to obtain a fused feature map. Semantic relationship information is determined according to the fused feature map.
According to the embodiment of the disclosure, in the case where the auxiliary information includes the second feature map, feature extraction may be performed on the text identification information of each of the plurality of text region images, to obtain a fourth feature map corresponding to the identification information. After the fourth feature map is obtained, the second feature map and the fourth feature map corresponding to the identification information may be fused to obtain a fused feature map. After the fusion feature map is obtained, the fusion feature map can be processed by using a text classification model to obtain semantic relation information.
According to embodiments of the present disclosure, the text classification model may include a deep learning model or a machine learning model. A third predetermined model may be trained using a third training sample set, which may include a plurality of training texts, and a third label set, which may include a third label corresponding to each training text, to obtain a text classification model.
According to an embodiment of the present disclosure, training a third predetermined model using a third training sample set and a third tag set to obtain a text classification model may include inputting each training text of a plurality of training texts into the third predetermined model to obtain a semantic category result corresponding to each training text. And inputting the semantic category result and the third label corresponding to each training text into a first loss function to obtain a first output value. And adjusting model parameters of the third preset model according to the first output value until the first output value converges. A third predetermined model obtained in a case where convergence of the first output value is satisfied is determined as a text classification model.
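The "adjust model parameters from the loss value until the output converges" loop described above can be illustrated with a deliberately tiny stand-in model, a one-dimensional logistic classifier trained by gradient descent. It is not the third predetermined model of the disclosure, only a sketch of the convergence-driven training pattern; all names and hyperparameters are hypothetical:

```python
import numpy as np

def train_until_converged(x, y, lr=0.5, tol=1e-6, max_steps=10000):
    """Gradient-descent loop: compute a loss ("first output value"),
    adjust parameters from it, and stop when the loss stops improving."""
    w, b = 0.0, 0.0
    prev = float("inf")
    loss = prev
    for _ in range(max_steps):
        z = w * x + b
        p = 1.0 / (1.0 + np.exp(-z))                       # model output
        loss = -np.mean(y * np.log(p + 1e-12)
                        + (1 - y) * np.log(1 - p + 1e-12))  # cross-entropy loss
        if abs(prev - loss) < tol:                          # convergence check
            break
        prev = loss
        grad = p - y                                        # dLoss/dz
        w -= lr * np.mean(grad * x)
        b -= lr * np.mean(grad)
    return w, b, loss
```

The same pattern, forward pass, loss, parameter update, convergence test, applies whatever the underlying model is.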
In accordance with an embodiment of the present disclosure, in the case where the auxiliary information further includes location information, operation S541 may include the following operations.
Semantic relationship information is determined according to the fused feature map and the position information.
According to the embodiment of the disclosure, in the case that the auxiliary information further includes the position information, feature extraction may be performed on the text identification information of each of the plurality of text region images, so as to obtain a fourth feature map corresponding to the identification information. After the fourth feature map is obtained, the second feature map and the fourth feature map corresponding to the identification information may be fused to obtain a fused feature map. After the fused feature map is obtained, semantic relationship information can be determined according to the fused feature map and the position information.
Fig. 5B schematically illustrates an example schematic diagram of a process of determining semantic relationship information according to identification information according to an embodiment of the present disclosure.
As shown in fig. 5B, in the diagram 500B, in the case where the auxiliary information 501 includes the second feature map 5011, the fourth feature map 503 corresponding to the identification information 502 can be determined from the identification information 502.
After the fourth feature map 503 corresponding to the identification information 502 is obtained, the second feature map 5011 and the fourth feature map 503 corresponding to the identification information 502 may be fused to obtain a fused feature map 504.
After the fused feature map 504 is obtained, semantic relationship information 505 may be determined from the fused feature map 504.
Fig. 5C schematically illustrates an example schematic diagram of a process of determining semantic relationship information according to identification information according to another embodiment of the present disclosure.
As shown in fig. 5C, in 500C, in the case where the auxiliary information 506 includes the second feature map 5061 and the position information 5062, a fourth feature map 508 corresponding to the identification information 507 may be determined from the identification information 507.
After the fourth feature map 508 corresponding to the identification information 507 is obtained, the second feature map 5061 and the fourth feature map 508 corresponding to the identification information 507 may be fused to obtain a fused feature map 509.
After the fused feature map 509 is obtained, semantic relationship information 510 may be determined from the fused feature map 509 and the location information 5062.
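One simple way to let the relation classifier see both the fused features and the position information, as in the FIG. 5C flow, is to append normalised box coordinates to each region's feature vector. This is an illustrative assumption, not the disclosed mechanism; the function name and coordinate convention are hypothetical:

```python
import numpy as np

def with_position(fused_feat: np.ndarray, box, img_w: int, img_h: int) -> np.ndarray:
    """Concatenate a per-region fused feature vector with its normalised
    bounding-box coordinates (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = box
    pos = np.array([x0 / img_w, y0 / img_h, x1 / img_w, y1 / img_h])
    return np.concatenate([fused_feat, pos])
```

Normalising by image size keeps the position channels in the same numeric range regardless of input resolution.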
Fig. 5D schematically illustrates a flowchart of a method of determining semantic relationship information according to identification information according to another embodiment of the present disclosure.
As shown in fig. 5D, the method 500D is further defined with respect to operation S240 in fig. 2, and the method 500D may include operations S542-S543.
In operation S542, global feature extraction is performed on the identification information to obtain global feature information.
In operation S543, semantic relationship information is determined from the global feature information.
According to the embodiment of the disclosure, the text identification information of each of the plurality of text region images can be input into the global feature extraction model to obtain global feature information. After the global feature information is obtained, a semantic relationship between the plurality of text recognition information may be determined based on the global feature information. The global feature extraction model may be derived by training a fourth predetermined model using a fourth set of training samples. The fourth predetermined model may include a recurrent neural network (Recurrent Neural Networks, RNN) model, a Long Short-Term Memory (LSTM) model, or a Transformer model. The fourth predetermined model may be configured according to actual service requirements, as long as it can implement the global feature extraction function, which is not limited herein.
For example, the fourth predetermined model may comprise at least one model structure. The model structure may comprise at least one model substructure and the connection relationships of the respective model substructures to each other. The model structure may be a structure obtained by connecting at least one model substructure based on the connection relationships between the model substructures. The at least one model substructure comprised by the model structure may be a structure from at least one operation layer. For example, the model structure may be a structure obtained by connecting at least one model substructure from at least one operation layer based on the connection relationships between model substructures. For example, the at least one operation layer may include at least one of an input layer, a convolution layer, a hidden layer, a transcription layer, a pooling layer, an unpooling layer, a deconvolution layer, a feedforward neural network layer, an attention layer, a residual layer, a fully connected layer, a batch normalization layer, a linear embedding (i.e., Linear Embedding) layer, a non-linear layer, and the like.
According to the embodiment of the disclosure, since the global feature information is obtained by performing global feature extraction on the identification information, the global feature information can characterize the global characteristics of the identification information. On this basis, because the semantic relationship information is determined according to the global feature information, the semantic relationship information can more comprehensively represent the semantic relationships among the plurality of pieces of text recognition information, further improving the accuracy of the semantic relationship information.
According to an embodiment of the present disclosure, operation S542 may include the following operations.
The identification information is processed based on an attention policy to obtain global feature information.
According to embodiments of the present disclosure, an attention policy may be used to achieve focusing of important information with high weight, ignoring non-important information with low weight, and enabling information exchange with other information by sharing important information, thereby achieving transfer of important information. In the embodiment of the disclosure, the attention strategy can extract information among the text identification information of each of the text region images so as to better complete the information generation of the text images. The attention policy may include one of a self-attention policy and a mutual-attention policy.
According to embodiments of the present disclosure, the text identification information may be used to determine a first key matrix, a first value matrix, and a first query matrix. For example, where the attention policy is a self-attention policy, the text identification information may be used as the first key matrix, the first value matrix, and the first query matrix. The key matrix, value matrix, and query matrix may be matrices in the attention mechanism.
According to an embodiment of the present disclosure, in a case where the attention policy may be a self-attention policy, text identification information corresponding to the plurality of text region images for use as the first key matrix, the first value matrix, and the first query matrix may be processed based on the self-attention policy, resulting in global feature information corresponding to each of the plurality of text region images. For example, the attention unit may be determined according to a self-attention policy. And processing text identification information corresponding to the plurality of text region images and serving as a first key matrix, a first value matrix and a first query matrix by using an attention unit to obtain global feature information corresponding to each of the plurality of text region images.
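The self-attention computation with Q = K = V derived from the text recognition features can be sketched as plain scaled dot-product attention in NumPy. A minimal single-head sketch, with no learned projections, purely to show the mechanism the paragraph describes:

```python
import numpy as np

def self_attention(x: np.ndarray) -> np.ndarray:
    """Scaled dot-product self-attention with Q = K = V = x.

    x: (n, d) matrix, one row per text region's recognition feature."""
    d = x.shape[1]
    scores = x @ x.T / np.sqrt(d)                 # (n, n) pairwise affinities
    scores -= scores.max(axis=1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True) # softmax over each row
    return weights @ x                            # each row mixes global context
```

Each output row is a convex combination of all input rows, which is how a region's feature comes to carry information about every other region.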
According to the embodiment of the disclosure, the global feature information is obtained by processing the text identification information corresponding to each of the plurality of text region images by using the attention strategy, and the attention strategy can extract semantic information between the text region images and other text region images, so that the accuracy of generating the structured information of the text images is improved.
According to an embodiment of the present disclosure, processing the identification information based on the attention policy, resulting in global feature information, may include the following operations.
U-level processing is performed on the identification information based on the self-attention policy to obtain global feature information.
According to an embodiment of the present disclosure, U is an integer greater than or equal to 1. U may be configured according to actual service requirements, which is not limited herein. For example, u=4.
According to the embodiment of the disclosure, for the text region images in the plurality of text region images, text identification information of each of the plurality of text region images can be processed based on the attention policy to obtain global feature information. For example, the text identification information of each of the plurality of text region images may be subjected to U-level processing based on a self-attention policy, to obtain global feature information.
According to an embodiment of the present disclosure, in the case where U is an integer greater than 1 and 1 < u ≤ U, the identification information is subjected to U-level processing based on the self-attention policy to obtain global feature information, which may include the following operations.
Second intermediate feature information of the u-th level corresponding to each of the plurality of text recognition information is obtained according to the first intermediate feature information of the (u-1)-th level corresponding to each of the plurality of text recognition information. First intermediate feature information of the u-th level corresponding to the plurality of text recognition information is obtained according to the second intermediate feature information of the u-th level corresponding to the plurality of text recognition information and the first intermediate feature information of the (u-1)-th level corresponding to the plurality of text recognition information. Global feature information is obtained according to the first intermediate feature information of the R-th level corresponding to each of the plurality of text recognition information.
According to embodiments of the present disclosure, the first intermediate feature information may be used to determine a first query matrix, a first key matrix, and a first value matrix.
According to embodiments of the present disclosure, u may be an integer greater than or equal to 1 and less than or equal to U, i.e., u ∈ {1, 2, …, U-1, U}. R may be an integer greater than or equal to 1 and less than or equal to U.
According to the embodiment of the disclosure, under the condition that 1 < u ≤ U, the first intermediate feature information of the u-1-th level corresponding to each of the plurality of text recognition information is processed based on the self-attention strategy to obtain the second intermediate feature information of the u-th level corresponding to each of the plurality of text recognition information. The first intermediate feature information of the u-th level corresponding to each of the plurality of text recognition information may be used as the first key matrix, the first value matrix, and the first query matrix of the u+1-th level. And fusing the second intermediate feature information of the u-th level corresponding to each of the plurality of text recognition information with the first intermediate feature information of the u-1-th level corresponding to each of the plurality of text recognition information to obtain fourth intermediate feature information of the u-th level corresponding to each of the plurality of text recognition information. And obtaining the first intermediate feature information of the u-th level corresponding to each of the plurality of text recognition information according to the fourth intermediate feature information of the u-th level corresponding to each of the plurality of text recognition information. And obtaining global feature information according to the first intermediate feature information of the R-th level corresponding to each of the plurality of text identification information. The fusing may include one of adding and concatenating.
According to the embodiment of the disclosure, obtaining the first intermediate feature information of the u-th level corresponding to each of the plurality of text recognition information according to the fourth intermediate feature information of the u-th level may include performing multi-layer perceptron processing on the fourth intermediate feature information of the u-th level to obtain fifth intermediate feature information of the u-th level corresponding to each of the plurality of text recognition information, and obtaining the first intermediate feature information of the u-th level according to the fifth intermediate feature information of the u-th level. For example, the sixth intermediate feature information of the u-th level corresponding to each of the plurality of text recognition information may be normalized to obtain the fourth intermediate feature information of the u-th level. Normalization may include one of batch normalization (Batch Normalization, BN) and layer normalization (Layer Normalization, LN). For example, the sixth intermediate feature information of the u-th level may be subjected to batch normalization to obtain the fourth intermediate feature information of the u-th level corresponding to each of the plurality of text recognition information.
According to the embodiment of the disclosure, processing the first intermediate feature information of the u-1-th level corresponding to each of the plurality of text recognition information based on the self-attention policy to obtain the second intermediate feature information of the u-th level may include obtaining eighth intermediate feature information of the u-1-th level corresponding to each of the plurality of text recognition information according to seventh intermediate feature information of the u-1-th level corresponding to each of the plurality of text recognition information. For example, the seventh intermediate feature information of the u-1-th level is normalized to obtain the eighth intermediate feature information of the u-1-th level. And processing the eighth intermediate feature information of the u-1-th level based on the self-attention strategy to obtain the second intermediate feature information of the u-th level corresponding to each of the plurality of text recognition information.
According to an embodiment of the present disclosure, obtaining second intermediate feature information of the u-th hierarchy corresponding to each of the plurality of text recognition information from the first intermediate feature information of the u-1-th hierarchy corresponding to each of the plurality of text recognition information may include the following operations.
And determining a plurality of first matrix sets of the u-th level corresponding to the plurality of text identification information according to the first intermediate feature information of the u-1-th level corresponding to each of the plurality of text identification information. And, for each text recognition information among the plurality of text recognition information of the u-th level and for each first matrix set among the plurality of first matrix sets, obtaining a first attention matrix corresponding to the text recognition information of the u-th level according to the first query matrix corresponding to the text recognition information of the u-th level and the first key matrices corresponding to each of the plurality of text recognition information of the u-th level. And obtaining third intermediate feature information corresponding to the text identification information of the u-th level according to the first attention matrix corresponding to the text identification information of the u-th level and the first value matrix corresponding to the text identification information of the u-th level. And obtaining the second intermediate feature information corresponding to the text identification information of the u-th level according to the plurality of third intermediate feature information corresponding to the text identification information of the u-th level.
According to an embodiment of the present disclosure, the first matrix set may include a first query matrix, a first key matrix, and a first value matrix.
According to embodiments of the present disclosure, the self-attention policy may include a multi-headed self-attention policy. Determining the plurality of first matrix sets of the u-th level corresponding to the plurality of text identifying information according to the first intermediate feature information of the u-1-th level may include: for each text identifying information among the plurality of text identifying information, determining the first matrix set of the u-th level corresponding to the text identifying information according to the first intermediate feature information of the u-1-th level corresponding to the text identifying information. The first matrix set may include a first key matrix, a first value matrix, and a first query matrix.
According to the embodiment of the disclosure, obtaining the second intermediate feature information corresponding to the text identification information of the u-th level according to the third intermediate feature information may include obtaining ninth intermediate feature information corresponding to the text identification information of the u-th level according to the plurality of third intermediate feature information corresponding to the text identification information of the u-th level. For example, the plurality of third intermediate feature information corresponding to the text recognition information of the u-th level may be fused to obtain the plurality of ninth intermediate feature information corresponding to the text recognition information of the u-th level. The fusing may include at least one of concatenating and adding. And obtaining the second intermediate feature information corresponding to the text identification information of the u-th level according to the plurality of ninth intermediate feature information. For example, the plurality of ninth intermediate feature information corresponding to the text recognition information of the u-th level may be linearly transformed to obtain the second intermediate feature information corresponding to the text recognition information of the u-th level.
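The per-level computation described above (a first matrix set per attention head, a first attention matrix obtained from the query and key matrices, value weighting, concatenation of the per-head results, and residual fusion with the previous level) follows the shape of a standard multi-head self-attention encoder layer. The following is a minimal NumPy sketch under illustrative assumptions; the function name, weight shapes, and the ReLU stand-in for the multi-layer perceptron are hypothetical and only echo the "intermediate feature information" terminology of the text.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def encoder_level(first_prev, head_weights):
    """One u-th level update. first_prev: (n, d) array of the (u-1)-th level
    first intermediate feature information for n text recognition results.
    head_weights: one hypothetical first matrix set (Wq, Wk, Wv) per head,
    each matrix of shape (d, d // num_heads)."""
    d = first_prev.shape[1]
    dh = d // len(head_weights)
    thirds = []  # per-head outputs ("third intermediate feature information")
    for wq, wk, wv in head_weights:
        q, k, v = first_prev @ wq, first_prev @ wk, first_prev @ wv
        attn = softmax(q @ k.T / np.sqrt(dh))      # "first attention matrix"
        thirds.append(attn @ v)
    second = np.concatenate(thirds, axis=-1)       # fuse heads (concatenation)
    fourth = second + first_prev                   # residual fusion (adding)
    fifth = np.maximum(fourth, 0.0)                # MLP stand-in (assumption)
    mu = fifth.mean(axis=-1, keepdims=True)        # layer normalization
    sd = fifth.std(axis=-1, keepdims=True)
    return (fifth - mu) / (sd + 1e-6)              # u-th level first features
```

Stacking U such levels and taking the R-th level output would then yield the global feature information described above.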
According to an embodiment of the present disclosure, in the case where u=1, performing U-level processing on the identification information based on the self-attention policy, obtaining global feature information may include the following operations.
And obtaining second intermediate characteristic information corresponding to each of the text identification information of the level 2 according to the global characteristic information corresponding to each of the text identification information of the level 1. And obtaining the first intermediate characteristic information corresponding to the text recognition information of the 2 nd level according to the second intermediate characteristic information corresponding to the text recognition information of the 2 nd level and the global characteristic information corresponding to the text recognition information of the 1 st level.
According to embodiments of the present disclosure, the global feature information may be used to determine a second query matrix, a second key matrix, and a second value matrix.
According to the embodiment of the disclosure, global feature information corresponding to each of the plurality of text recognition information of the 1 st level is processed based on the self-attention policy, and second intermediate feature information corresponding to each of the plurality of text recognition information of the 2 nd level is obtained. The global feature information of the level 1 corresponding to each of the plurality of text recognition information may be used as a first query matrix (i.e., a second query matrix), a first key matrix (i.e., a second key matrix), and a first value matrix (i.e., a second value matrix) of the level 2. And fusing the second intermediate characteristic information corresponding to each of the text identification information of the 2 nd level with the global characteristic information corresponding to each of the text identification information of the 1 st level to obtain fourth intermediate characteristic information corresponding to each of the text identification information of the 2 nd level. And obtaining the first intermediate characteristic information corresponding to the text identification information of the 2 nd level according to the fourth intermediate characteristic information corresponding to the text identification information of the 2 nd level.
According to the embodiment of the disclosure, obtaining the first intermediate feature information of the 2nd level corresponding to each of the plurality of text recognition information according to the fourth intermediate feature information of the 2nd level may include performing multi-layer perceptron processing on the fourth intermediate feature information of the 2nd level to obtain fifth intermediate feature information of the 2nd level corresponding to each of the plurality of text recognition information, and obtaining the first intermediate feature information of the 2nd level according to the fifth intermediate feature information of the 2nd level. For example, the sixth intermediate feature information of the 2nd level corresponding to each of the plurality of text recognition information may be normalized to obtain the fourth intermediate feature information of the 2nd level. The normalization may include one of batch normalization and layer normalization. For example, the sixth intermediate feature information of the 2nd level may be subjected to batch normalization to obtain the fourth intermediate feature information of the 2nd level corresponding to each of the plurality of text recognition information.
According to the embodiment of the disclosure, processing the global feature information of the 1st level corresponding to each of the plurality of text recognition information based on the self-attention policy to obtain the second intermediate feature information of the 2nd level may include obtaining eighth intermediate feature information of the 2nd level corresponding to each of the plurality of text recognition information according to seventh intermediate feature information of the 1st level corresponding to each of the plurality of text recognition information. For example, the seventh intermediate feature information of the 1st level is normalized to obtain the eighth intermediate feature information of the 2nd level. And processing the eighth intermediate feature information of the 2nd level based on the self-attention strategy to obtain the second intermediate feature information of the 2nd level corresponding to each of the plurality of text recognition information.
According to an embodiment of the present disclosure, obtaining the second intermediate feature information of the level 2 corresponding to each of the plurality of text recognition information according to the global feature information of the level 1 corresponding to each of the plurality of text recognition information may include the following operations.
And determining a plurality of second matrix sets of the 2nd level corresponding to the plurality of text identification information according to the global feature information of the 1st level corresponding to each of the plurality of text identification information. And, for each text recognition information among the plurality of text recognition information of the 2nd level and for each second matrix set among the plurality of second matrix sets, obtaining a second attention matrix corresponding to the text recognition information of the 2nd level according to the second query matrix corresponding to the text recognition information of the 2nd level and the second key matrices corresponding to each of the plurality of text recognition information of the 2nd level. And obtaining third intermediate feature information corresponding to the text identification information of the 2nd level according to the second attention matrix and the second value matrix corresponding to the text identification information of the 2nd level. And obtaining the second intermediate feature information corresponding to the text identification information of the 2nd level according to the plurality of third intermediate feature information corresponding to the text identification information of the 2nd level.
According to an embodiment of the present disclosure, the second set of matrices may include a second query matrix, a second key matrix, and a second value matrix.
According to embodiments of the present disclosure, the self-attention policy may include a multi-headed self-attention policy. Determining the plurality of second matrix sets of the 2nd level corresponding to the plurality of text identifying information according to the global feature information of the 1st level may include: for each text identifying information among the plurality of text identifying information, determining the second matrix set of the 2nd level corresponding to the text identifying information according to the global feature information of the 1st level corresponding to the text identifying information. The second matrix set may include a second key matrix, a second value matrix, and a second query matrix.
According to the embodiment of the disclosure, obtaining the second intermediate feature information corresponding to the text identification information of the 2nd level according to the third intermediate feature information may include obtaining ninth intermediate feature information corresponding to the text identification information of the 2nd level according to the plurality of third intermediate feature information corresponding to the text identification information of the 2nd level. For example, the plurality of third intermediate feature information corresponding to the text recognition information of the 2nd level may be fused to obtain the plurality of ninth intermediate feature information corresponding to the text recognition information of the 2nd level. The fusing may include at least one of concatenating and adding. And obtaining the second intermediate feature information corresponding to the text identification information of the 2nd level according to the plurality of ninth intermediate feature information. For example, the plurality of ninth intermediate feature information may be linearly transformed to obtain the second intermediate feature information corresponding to the text recognition information of the 2nd level.
Fig. 5E schematically illustrates an example schematic diagram of a process of determining semantic relationship information according to identification information according to another embodiment of the present disclosure.
As shown in fig. 5E, in 500E, a plurality of first matrix sets 512 of the u-th hierarchy corresponding to the plurality of text recognition information may be determined according to the first intermediate feature information 511 of the u-1-th hierarchy corresponding to the plurality of text recognition information. The first matrix set 512 may include a first query matrix 512_1, a first key matrix 512_2, and a first value matrix 512_3.
After obtaining the plurality of first matrix sets 512 of the u-th level corresponding to the plurality of text recognition information, the first attention matrix 513 of the u-th level corresponding to the text recognition information may be obtained according to the first query matrix 512_1 of the u-th level corresponding to the text recognition information and the first key matrix 512_2 of the u-th level corresponding to the plurality of text recognition information.
After the first attention matrix 513 is obtained, third intermediate feature information 514 corresponding to the text recognition information of the u-th level may be obtained from the first attention matrix 513 corresponding to the text recognition information of the u-th level and the first value matrix 512_3 corresponding to the text recognition information of the u-th level.
After the third intermediate feature information 514 is obtained, second intermediate feature information 515 corresponding to the text recognition information of the u-th hierarchy may be obtained from the plurality of third intermediate feature information 514 corresponding to the text recognition information of the u-th hierarchy.
After the second intermediate feature information 515 corresponding to the text recognition information of the u-th hierarchy is obtained, the first intermediate feature information 516 corresponding to the plurality of text recognition information of the u-th hierarchy may be obtained from the second intermediate feature information 515 corresponding to each of the plurality of text recognition information of the u-th hierarchy and the first intermediate feature information 511 corresponding to each of the plurality of text recognition information of the u-1-th hierarchy.
After the first intermediate feature information 516 corresponding to the plurality of text recognition information of the u-th hierarchy is obtained, global feature information 517 may be obtained according to the first intermediate feature information corresponding to each of the plurality of text recognition information of the R-th hierarchy.
After global feature information 517 is obtained, semantic relationship information 518 may be determined from global feature information 517.
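As illustrated by Fig. 5E, the levels are chained: the first intermediate feature information of one level becomes the input of the next, and the R-th level output serves as the global feature information. Below is a minimal single-head sketch of this stacking (assuming R = U and illustrative names throughout; the residual fusion uses adding, one of the fusing options named above).

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def global_features(level1_feats, num_levels=4):
    """Chain num_levels (= U) single-head self-attention updates over the
    per-text features (n, d); the final level's first intermediate feature
    information plays the role of the global feature information."""
    feats = np.asarray(level1_feats, dtype=float)
    for _ in range(num_levels - 1):
        # attention matrix over all text recognition results of this level
        attn = softmax(feats @ feats.T / np.sqrt(feats.shape[1]))
        feats = attn @ feats + feats   # attention output fused by adding
    return feats
```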
Fig. 5F schematically illustrates an example schematic diagram of a process of determining semantic relationship information according to identification information according to another embodiment of the present disclosure.
As shown in fig. 5F, in 500F, a plurality of second matrix sets 520 corresponding to each of the plurality of text recognition information of level 2 may be determined from the global feature information 519 corresponding to each of the plurality of text recognition information of level 1. The second matrix set 520 may include a second query matrix 520_1, a second key matrix 520_2, and a second value matrix 520_3.
After obtaining the plurality of second matrix sets 520 corresponding to each of the plurality of text recognition information of level 2, the second attention matrix 521 corresponding to the text recognition information of level 2 may be obtained according to the second query matrix 520_1 corresponding to the text recognition information of level 2 and the second key matrix 520_2 corresponding to the plurality of text recognition information of level 2.
After the second attention matrix 521 is obtained, third intermediate feature information 522 corresponding to the text recognition information at level 2 may be obtained from the second attention matrix 521 corresponding to the text recognition information at level 2 and the second value matrix 520_3 corresponding to the text recognition information at level 2.
After the third intermediate feature information 522 is obtained, the second intermediate feature information 523 corresponding to the text recognition information of the level 2 may be obtained from the plurality of third intermediate feature information 522 corresponding to the text recognition information of the level 2.
After obtaining the second intermediate feature information 523 corresponding to the text recognition information of the level 2, the first intermediate feature information 524 corresponding to the text recognition information of the level 2 may be obtained according to the second intermediate feature information 523 corresponding to the text recognition information of the level 2 and the global feature information 519 corresponding to the text recognition information of the level 1.
After obtaining the first intermediate feature information 524 corresponding to each of the plurality of text recognition information of level 2, global feature information 525 may be obtained according to the first intermediate feature information 524 corresponding to each of the plurality of text recognition information of level 2.
After global feature information 525 is obtained, semantic relationship information 526 may be determined from the global feature information 525.
Fig. 5G schematically illustrates a flowchart of a method of determining semantic relationship information according to identification information according to another embodiment of the present disclosure.
As shown in fig. 5G, the method 500G is a further definition of operation S240 in fig. 2, and the method 500G may include operation S544.
In operation S544, the identification information is processed using the text semantic relationship model to obtain semantic relationship information.
According to embodiments of the present disclosure, the text semantic relationship model may be derived by training a first deep learning model using a plurality of positive sample pairs and a plurality of negative sample pairs. The number of positive and negative pairs of samples may satisfy a predetermined equalization condition. The positive sample pair may include two sample texts with a key-value relationship therebetween. The negative sample pair may include two sample texts with a non-key-value relationship therebetween.
According to embodiments of the present disclosure, a positive sample pair may refer to two sample texts having a key-value relationship. A negative sample pair may refer to two sample texts having a non-key-value relationship. For example, the sample texts may include "name", "age", and "gender", whose category information is the keyword category, and "Zhang San", "42 years old", and "male", whose category information is the numeric category.
In this case, it can be determined that the positive sample pairs include "name" and "Zhang San", "age" and "42 years old", and "gender" and "male". In addition, the candidate negative sample pairs may be determined to include "name" and "42 years old", "name" and "male", "age" and "Zhang San", "age" and "male", "gender" and "Zhang San", and "gender" and "42 years old".
According to an embodiment of the present disclosure, after determining the number of positive sample pairs and the number of candidate negative sample pairs, the positive sample pairs and the candidate negative sample pairs may be subjected to a screening process according to a predetermined equalization condition such that the numbers of positive sample pairs and negative sample pairs satisfy the predetermined equalization condition. The predetermined equalization conditions may be configured according to actual service requirements, and are not limited herein. For example, the predetermined equalization condition may be configured such that the number of positive sample pairs and the number of negative sample pairs are equal. Alternatively, the predetermined equalization condition may be configured such that a difference between the number of positive sample pairs and the number of negative sample pairs is less than or equal to a predetermined threshold. The predetermined threshold may be set to 1.
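The construction above (enumerating true key-value pairs as positives, all other key/value combinations as candidate negatives, and screening so the counts satisfy a predetermined equalization condition) can be sketched as follows. The function name and the equal-count balancing rule are assumptions for illustration; the example texts are taken from the passage.

```python
import itertools
import random

def build_sample_pairs(keys, values, kv_map, seed=0):
    """Assemble positive pairs (true key-value relations) and negative pairs
    (all other key/value combinations), screened so that the two counts are
    equal (one possible predetermined equalization condition)."""
    positives = [(k, kv_map[k]) for k in keys]
    candidates = [(k, v) for k, v in itertools.product(keys, values)
                  if kv_map[k] != v]          # candidate negative sample pairs
    random.Random(seed).shuffle(candidates)   # screening by random selection
    negatives = candidates[:len(positives)]   # enforce equal numbers
    return positives, negatives

pos, neg = build_sample_pairs(
    ["name", "age", "gender"],
    ["Zhang San", "42 years old", "male"],
    {"name": "Zhang San", "age": "42 years old", "gender": "male"})
```

With the three keyword texts of the example, this yields three positive pairs and three of the six candidate negative pairs.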
According to an embodiment of the present disclosure, operations S541 to S544 may be performed by an electronic device. The electronic device may comprise a server or a terminal device. The server may be the server 105 in fig. 1. The terminal devices may be the first terminal device 101, the second terminal device 102, and the third terminal device 103 in fig. 1.
According to the embodiment of the disclosure, since a positive sample pair comprises two sample texts with a key-value relationship and a negative sample pair comprises two sample texts with a non-key-value relationship, training the first deep learning model with a plurality of positive sample pairs and a plurality of negative sample pairs enables the obtained text semantic relationship model to automatically classify semantic relationships. On this basis, processing the identification information with the text semantic relationship model to obtain the semantic relationship information improves the accuracy of the semantic relationship information.
According to an embodiment of the present disclosure, the plurality of negative sample pairs are determined from the plurality of candidate negative sample pairs based on a negative sample pruning strategy.
According to embodiments of the present disclosure, a negative-sample pruning strategy may be used to characterize conditions for determining negative-sample pairs from a plurality of candidate negative-sample pairs. For example, the negative sample pruning policy may include a positional relationship condition between candidate sample texts, in which case a negative sample pair may be determined from a plurality of candidate negative sample pairs according to a positional relationship between each of the plurality of candidate sample texts. Alternatively, the negative sample pruning policy may include a similarity condition between candidate sample texts, in which case negative sample pairs may be determined from a plurality of candidate negative sample pairs according to the degree of similarity between each of the plurality of candidate sample texts.
According to an embodiment of the present disclosure, the plurality of negative sample pairs are determined from the plurality of candidate negative sample pairs based on a negative sample pruning strategy, and may include the following operations.
The plurality of negative sample pairs are determined from the plurality of candidate negative sample pairs based on a positional relationship between each of the plurality of candidate sample texts.
According to embodiments of the present disclosure, the negative-sample pruning strategy may include at least one of a short text position relationship strategy and a long text position relationship strategy. The short text positional relationship policy may be used to characterize positional relationship conditions that need to be satisfied in the case where two sample texts with non-key-value relationships belong to a short text. The long text positional relationship policy may be used to characterize positional relationship conditions that need to be satisfied in the case where two sample texts with non-key-value relationships belong to a long text.
For example, the short text positional relationship policy may include that the sample text belonging to the keyword category is located to the left of the sample text belonging to the numeric category. In this case, a plurality of negative sample pairs satisfying this condition may be determined from the plurality of candidate negative sample pairs according to the short text positional relationship policy and the positional relationship between each of the plurality of candidate sample texts.
For example, the long text positional relationship policy may include that the sample text belonging to the keyword category is located above the sample text belonging to the numeric category. In this case, a plurality of negative sample pairs satisfying this condition may be determined from the plurality of candidate negative sample pairs according to the long text positional relationship policy and the positional relationship between each of the plurality of candidate sample texts.
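The positional pruning described above can be sketched as a simple filter over candidate negative pairs. The box representation (a top-left corner coordinate per sample text) and the function name are assumptions; the left-of rule for short text and the above rule for long text follow the two policies named in the text.

```python
def prune_negative_pairs(candidates, boxes, is_long_text):
    """Filter candidate negative pairs by the positional relationship
    conditions: for short text, the keyword-category text must lie to the
    left of the numeric-category text; for long text, it must lie above it.
    boxes maps each sample text to the (x, y) of its top-left corner
    (a hypothetical stand-in for the detected position information)."""
    kept = []
    for key_text, value_text in candidates:
        kx, ky = boxes[key_text]
        vx, vy = boxes[value_text]
        ok = ky < vy if is_long_text else kx < vx
        if ok:
            kept.append((key_text, value_text))
    return kept
```

A candidate pair that violates the active positional condition is simply dropped, so the surviving pairs form the plurality of negative sample pairs used for training.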
Fig. 6 schematically illustrates a flowchart of a method for text recognition of a text region image to obtain recognition information according to an embodiment of the present disclosure.
As shown in fig. 6, the method 600 is a further limitation of operation S220 in fig. 2, and the method 600 may include operations S621-S622.
In operation S621, position information is converted into target position information using affine transformation.
In operation S622, images corresponding to a plurality of text regions are extracted from the text image according to the target position information, resulting in text region images corresponding to the plurality of text regions.
According to embodiments of the present disclosure, an affine transformation is a linear transformation from two-dimensional coordinates to two-dimensional coordinates that preserves the "straightness" and "parallelism" of a two-dimensional figure. Straightness means that a straight line remains a straight line and an arc remains an arc after the transformation, without bending. Parallelism means that the relative positional relationship between two-dimensional figures remains unchanged: parallel lines remain parallel, and the included angle of intersecting straight lines remains unchanged. An affine transformation may be realized by at least one of translation, scaling, flipping, rotation, shearing, and the like.
According to an embodiment of the present disclosure, converting the position information corresponding to a text region into the target position information using an affine transformation may include: converting the text region in the form of a quadrilateral box into a text region in the form of a rectangular box using the affine transformation, and determining the position information corresponding to the text region in the form of the rectangular box as the target position information, so that the text region images corresponding to the plurality of text regions can be extracted from the text image according to the target position information.
For example, a text region may be a quadrilateral box characterized by {P1, P2, P3, P4}, where P1 is the top-left corner of the box, P2 the top-right corner, P3 the bottom-left corner, and P4 the bottom-right corner. The coordinates of P1, P2, P3, and P4 may be characterized as (x1, y1), (x2, y2), (x3, y3), and (x4, y4), respectively. Using an affine transformation, P1→P′1, P2→P′2, P3→P′3, P4→P′4, a rectangular box {P′1, P′2, P′3, P′4} is obtained, whose corner coordinates may be characterized as (x′1, y′1), (x′2, y′2), (x′3, y′3), and (x′4, y′4), respectively.
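As an illustrative sketch (not the disclosed implementation), the 2x3 affine matrix mapping a quadrilateral box toward a rectangular box can be estimated from three corner correspondences and then applied to all four corners; in practice a library routine such as OpenCV's `getAffineTransform`/`warpAffine` would typically be used:

```python
import numpy as np

def estimate_affine(src, dst):
    """Solve for the 2x3 affine matrix A such that A @ [x, y, 1]^T = [x', y']^T
    holds for three (src, dst) point correspondences."""
    rows, rhs = [], []
    for (x, y), (xp, yp) in zip(src, dst):
        rows.append([x, y, 1, 0, 0, 0]); rhs.append(xp)
        rows.append([0, 0, 0, x, y, 1]); rhs.append(yp)
    params = np.linalg.solve(np.array(rows, dtype=float),
                             np.array(rhs, dtype=float))
    return params.reshape(2, 3)

def apply_affine(A, points):
    """Apply the 2x3 affine matrix A to a list of (x, y) points."""
    pts = np.hstack([np.asarray(points, dtype=float),
                     np.ones((len(points), 1))])
    return pts @ A.T
```

For instance, estimating A from P1, P2, P3 mapped to the top-left, top-right, and bottom-left corners of the target rectangle, then applying A to P4, yields the rectified bottom-right corner.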
Operations S621-S622 may be performed by an electronic device according to an embodiment of the present disclosure. The electronic device may comprise a server or a terminal device. The server may be the server 105 in fig. 1. The terminal devices may be the first terminal device 101, the second terminal device 102, and the third terminal device 103 in fig. 1.
According to the embodiment of the present disclosure, the target position information is obtained by converting the position information by affine transformation, so that the target position information can be determined automatically, and the efficiency and accuracy of determining the target position information are improved. On this basis, the text region images are obtained by extracting images corresponding to the plurality of text regions from the text image according to the target position information, so that the text region images can be extracted automatically, and the efficiency and accuracy of obtaining the text region images are improved.
Fig. 7 schematically illustrates a flowchart of a method for generating structured information of a text image from category information, semantic relationship information, and identification information according to an embodiment of the present disclosure.
As shown in fig. 7, the method 700 is a further limitation of operation S250 of fig. 2, and the method 700 may include operations S751-S752.
In operation S751, a plurality of target text recognition information is determined from among the plurality of text recognition information based on the semantic relationship information.
In operation S752, the structured information of the text image is generated according to the category information, the key value relationship, and the target text recognition information.
According to an embodiment of the present disclosure, the category information may include one of a keyword category and a numeric category. The semantic relationship may include one of a key-value relationship and a non-key-value relationship.
According to embodiments of the present disclosure, the semantic relationship corresponding to the text recognition information may be a key-value relationship. The structured information may include a keyword category, target text identification information corresponding to the keyword category, a numeric category, and target text identification information corresponding to the numeric category.
According to embodiments of the present disclosure, the target text recognition information may be used to characterize two pieces of text recognition information that have a key-value relationship. The plurality of target text recognition information may be determined from the plurality of text recognition information according to the key-value relationships among the plurality of text recognition information. Alternatively, the target text recognition information may be used to characterize two pieces of text recognition information that have a non-key-value relationship. In that case, the plurality of target text recognition information may be determined from the plurality of text recognition information according to the non-key-value relationships among the plurality of text recognition information.
For example, the text recognition information may include "name", "Zhang San", "age", "42 years", "sex", and "male". Based on the key-value relationships among the plurality of text recognition information, the target text recognition information "name", "age", and "sex", whose category information is the keyword category, and the corresponding target text recognition information "Zhang San", "42 years", and "male", whose category information is the numeric category, may be determined.
According to the embodiment of the present disclosure, after obtaining a plurality of target text recognition information, the structured information of the text image may be generated according to the keyword category, the target text recognition information corresponding to the keyword category, the numerical category, and the target text recognition information corresponding to the numerical category.
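A minimal sketch of assembling the structured information from predicted key-value pairs; the dictionary-based layout and the labels "keyword"/"numeric" are assumptions for illustration, not the disclosed data format:

```python
def build_structured_info(recognitions, categories, kv_pairs):
    """recognitions: id -> recognized text; categories: id -> 'keyword' or
    'numeric'; kv_pairs: (key_id, value_id) pairs predicted to have a
    key-value relationship. Returns keyword -> value structured info."""
    structured = {}
    for key_id, value_id in kv_pairs:
        if categories[key_id] == "keyword" and categories[value_id] == "numeric":
            # strip a trailing colon from the recognized keyword, if present
            structured[recognitions[key_id].rstrip(":")] = recognitions[value_id]
    return structured
```

With the "name"/"age"/"sex" example above, this would yield entries such as "name: Zhang San".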
Operations S751-S752 may be performed by an electronic device according to an embodiment of the present disclosure. The electronic device may comprise a server or a terminal device. The server may be the server 105 in fig. 1. The terminal devices may be the first terminal device 101, the second terminal device 102, and the third terminal device 103 in fig. 1.
According to the embodiment of the present disclosure, since the plurality of target text recognition information is determined from the plurality of text recognition information according to the semantic relationship information, the target text recognition information can characterize two pieces of text recognition information having a key-value relationship. On this basis, the structured information of the text image is generated according to the category information, the key-value relationship, and the target text recognition information, so that the accuracy of generating the structured information of the text image is improved.
Fig. 8 schematically illustrates an example schematic diagram of an information generation process according to an embodiment of the present disclosure.
As shown in fig. 8, in the information generation process 800, the text detection model 802 may perform text detection on the text image 801 to obtain category information and position information of each of a plurality of text regions corresponding to the text image 801. The plurality of text regions may include a text region 803_1, a text region 803_2, a text region 803_3, a text region 803_4, a text region 803_5, a text region 803_6, a text region 803_7, and a text region 803_8.
For example, the category information may include a keyword category or a numeric category. The category information of the text area 803_1, the text area 803_3, the text area 803_5, and the text area 803_7 may be keyword categories. The category information of the text area 803_2, the text area 803_4, the text area 803_6, and the text area 803_8 is a numeric category.
After obtaining the category information and the position information of each of the plurality of text areas, a text area image corresponding to each of the plurality of text areas may be obtained based on the position information of each of the plurality of text areas and the text image 801. The text region image corresponding to each of the plurality of text regions may include a text region image 804_1 (i.e., a text region image including text content "name:"), a text region image 804_2 (i.e., a text region image including text content "Zhang San", etc.), a text region image 804_3 (i.e., a text region image including text content "sex:"), a text region image 804_4 (i.e., a text region image including text content "man"), a text region image 804_5 (i.e., a text region image including text content "age:"), a text region image 804_6 (i.e., a text region image including text content "42 years"), a text region image 804_7 (i.e., a text region image including text content "detection result:"), and a text region image 804_8 (i.e., a text region image including text content "XX").
For example, a text region image 804_1 corresponding to the text region 803_1 may be acquired from the position information of the text image 801 and the text region 803_1. Based on the positional information of the text image 801 and the text region 803_2, a text region image 804_2 corresponding to the text region 803_2 is acquired. By analogy, a text region image 804_8 corresponding to the text region 803_8 is acquired from the position information of the text image 801 and the text region 803_8.
After obtaining the text region images corresponding to the respective text regions, the text recognition model 805 may perform text recognition on the plurality of text region images to obtain text recognition information of the respective text region images. The text recognition information of each of the plurality of text region images may include text recognition information 806_1 (i.e., "name:"), text recognition information 806_2 (i.e., "Zhang San"), text recognition information 806_3 (i.e., "sex:"), text recognition information 806_4 (i.e., "man"), text recognition information 806_5 (i.e., "age:"), text recognition information 806_6 (i.e., "42 years"), text recognition information 806_7 (i.e., "detection result:"), and text recognition information 806_8 (i.e., "XX").
For example, text recognition of text region image 804_1 may be performed using text recognition model 805 to obtain text recognition information 806_1 for text region image 804_1. Text recognition is performed on the text region image 804_2 using the text recognition model 805, resulting in text recognition information 806_2 of the text region image 804_2. By analogy, text recognition is performed on the text region image 804_8 by using the text recognition model 805, and text recognition information 806_8 of the text region image 804_8 is obtained.
After obtaining the text recognition information of each of the plurality of text region images, the text classification model 807 may process the text recognition information of each of the plurality of text region images to obtain semantic relationship information.
For example, the text classification model 807 may be used to process the text recognition information of each of the plurality of text region images to obtain the semantic relationships between: the text region image 804_1 (i.e., the text region image including the text content "name:"), whose category information is the keyword category, and the text region image 804_2 (i.e., the text region image including the text content "Zhang San"), whose category information is the numeric category; the text region image 804_3 (i.e., "sex:", keyword category) and the text region image 804_4 (i.e., "man", numeric category); the text region image 804_5 (i.e., "age:", keyword category) and the text region image 804_6 (i.e., "42 years", numeric category); and the text region image 804_7 (i.e., "detection result:", keyword category) and the text region image 804_8 (i.e., "XX", numeric category).
After obtaining the semantic relationships between the plurality of text recognition information, the structured information of the text image may be generated based on the category information, the semantic relationship information, and the recognition information. The structured information of the text image may include structured information 808_1 (i.e., "name: Zhang San"), structured information 808_2 (i.e., "sex: man"), structured information 808_3 (i.e., "age: 42 years"), and structured information 808_4 (i.e., "detection result: XX").
Fig. 9 schematically shows a flowchart of an information processing method according to an embodiment of the present disclosure.
As shown in FIG. 9, the method includes operations S910-S920.
In operation S910, the text image to be processed is processed using the information generating method 200, and the structured information of the text image to be processed is acquired.
In operation S920, information processing is performed using the structured information of the text image to be processed.
According to an embodiment of the present disclosure, the structured information of the text image to be processed may be determined using the information generating method according to the embodiment of the present disclosure.
According to the embodiment of the present disclosure, the text image to be processed may be processed by using the information generation method to obtain the structured information of the text image to be processed, and information processing may then be performed by using the structured information of the text image to be processed.
Operations S910 to S920 may be performed by an electronic device according to an embodiment of the present disclosure. The electronic device may comprise a server or a terminal device. The server may be the server 105 in fig. 1. The terminal devices may be the first terminal device 101, the second terminal device 102, and the third terminal device 103 in fig. 1.
According to the embodiment of the present disclosure, the text image to be processed is processed by using the information generation method to obtain the structured information of the text image to be processed. Since the structured information is generated by using the semantic relationship information and the visual information, the accuracy of the structured information is improved. On this basis, information processing is performed by using the structured information of the text image to be processed, so that the accuracy of the information processing is improved.
The above is merely an exemplary embodiment, but the present disclosure is not limited thereto; other information processing methods known in the art may also be included, as long as the accuracy of information processing can be improved.
Fig. 10 schematically shows a block diagram of an information generating apparatus according to an embodiment of the present disclosure.
As shown in fig. 10, the information generating apparatus 1000 may include a text detection module 1010, a first acquisition module 1020, a text recognition module 1030, a determination module 1040, and a generation module 1050.
The text detection module 1010 is configured to perform text detection on the text image to obtain detection information, where the detection information includes category information and location information of each of the plurality of text regions.
The first obtaining module 1020 is configured to obtain a text region image corresponding to each of the plurality of text regions according to the location information and the text image.
The text recognition module 1030 is configured to perform text recognition on the text region images to obtain recognition information, where the recognition information includes text recognition information of each of the plurality of text region images.
A determining module 1040, configured to determine semantic relationship information according to the identification information, where the semantic relationship information includes semantic relationships between a plurality of text identification information.
The generating module 1050 is configured to generate the structured information of the text image according to the category information, the semantic relationship information, and the identification information.
According to an embodiment of the present disclosure, the text detection module 1010 may include a feature extraction sub-module, a first acquisition sub-module, a second acquisition sub-module, a third acquisition sub-module, and a fourth acquisition sub-module.
And the feature extraction sub-module is used for carrying out feature extraction on the text image to obtain a first feature map with at least one scale.
The first acquisition sub-module is used for acquiring a second characteristic diagram according to the first characteristic diagram of at least one scale.
And the second acquisition sub-module is used for acquiring a third characteristic diagram according to the first characteristic diagram of at least one scale.
And the third acquisition sub-module is used for acquiring the respective category information of the text areas according to the second feature map.
And the fourth acquisition sub-module is used for acquiring the position information of each of the text areas according to the third feature map.
According to an embodiment of the present disclosure, the feature extraction sub-module may include a first feature extraction unit and a first obtaining unit.
And the first feature extraction unit is used for carrying out feature extraction of M stages on the text image to obtain at least one first feature map corresponding to the M-th stage.
The first obtaining unit is used for obtaining a first characteristic diagram of at least one scale according to at least one first characteristic diagram corresponding to the M-th stage.
According to an embodiment of the disclosure, the mth stage has Tm parallel levels, the image resolutions of the first feature maps of the same parallel level are the same, and the image resolutions of the first feature maps of different parallel levels are different.
According to an embodiment of the present disclosure, M is an integer greater than or equal to 1 and less than or equal to M, and Tm is an integer greater than or equal to 1.
According to an embodiment of the present disclosure, in a case where M is an integer greater than 1, the first feature extraction unit may include a convolution processing subunit and a feature fusion subunit.
And the convolution processing subunit is used for performing convolution processing on at least one first feature map corresponding to the (m-1)-th stage to obtain at least one intermediate first feature map corresponding to the m-th stage.
And the feature fusion subunit is used for carrying out feature fusion on at least one intermediate first feature map corresponding to the mth stage to obtain at least one first feature map corresponding to the mth stage.
According to an embodiment of the present disclosure, M is an integer greater than 1 and less than or equal to M.
According to an embodiment of the present disclosure, the feature fusion subunit may be configured to:
for the i-th parallel level of the Tm parallel levels,
obtain a first feature map corresponding to the i-th parallel level according to the other intermediate first feature maps corresponding to the i-th parallel level and the intermediate first feature map corresponding to the i-th parallel level.
According to an embodiment of the present disclosure, the other intermediate first feature maps corresponding to the i-th parallel level are intermediate first feature maps corresponding to at least part of the Tm parallel levels other than the i-th parallel level, i being an integer greater than or equal to 1 and less than or equal to Tm.
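A hedged sketch of this kind of parallel-level fusion, assuming a simple nearest-neighbour resampling and element-wise summation as the fusion operator (the disclosure does not fix the fusion operator, so this is illustrative only):

```python
import numpy as np

def fuse_parallel_levels(feature_maps):
    """Fuse intermediate first feature maps across parallel levels: each
    level's map is summed with every other level's map after resampling
    the other map to this level's resolution (nearest neighbour)."""
    maps = [np.asarray(m, dtype=float) for m in feature_maps]
    fused = []
    for i, target in enumerate(maps):
        h, w = target.shape
        acc = target.copy()
        for j, other in enumerate(maps):
            if j == i:
                continue
            rows = np.arange(h) * other.shape[0] // h  # nearest-neighbour row index
            cols = np.arange(w) * other.shape[1] // w  # nearest-neighbour col index
            acc += other[np.ix_(rows, cols)]
        fused.append(acc)
    return fused
```

Each output map keeps its own parallel level's resolution, consistent with the same-resolution constraint stated above.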
According to an embodiment of the present disclosure, the feature extraction sub-module may include a second feature extraction unit.
And the second feature extraction unit is used for carrying out feature extraction of N cascade levels on the text image to obtain a first feature map with at least one scale, wherein N is an integer greater than 1.
According to an embodiment of the present disclosure, the determination module 1040 may include a first determination sub-module.
And the first determining submodule is used for determining semantic relation information according to the auxiliary information and the identification information, wherein the auxiliary information comprises at least one of a second characteristic diagram and position information.
According to an embodiment of the present disclosure, in the case where the auxiliary information includes the second feature map, the first determination sub-module may include a fusion unit and a first determination unit.
And the fusion unit is used for fusing the second characteristic diagram and the fourth characteristic diagram corresponding to the identification information to obtain a fusion characteristic diagram.
And the first determining unit is used for determining semantic relation information according to the fusion feature map.
According to an embodiment of the present disclosure, in case the auxiliary information further includes location information, the first determining unit may include a first determining subunit.
And the first determining subunit is used for determining semantic relation information according to the fusion feature map and the position information.
According to an embodiment of the present disclosure, the determination module 1040 may include a global feature extraction sub-module and a second determination sub-module.
And the global feature extraction sub-module is used for carrying out global feature extraction on the identification information to obtain global feature information.
And the second determining submodule is used for determining semantic relation information according to the global characteristic information.
According to an embodiment of the present disclosure, the global feature extraction sub-module may include a processing unit.
And the processing unit is used for processing the identification information based on the attention strategy to obtain global characteristic information.
According to an embodiment of the present disclosure, the processing unit may comprise a processing subunit.
And the processing subunit is used for carrying out U-level processing on the identification information based on the self-attention strategy to obtain global characteristic information, wherein U is an integer greater than or equal to 1.
According to an embodiment of the present disclosure, in a case where U is an integer greater than 1 and 1 < u ≤ U, the processing subunit may include:
Obtaining second intermediate feature information of the u-th level corresponding to each of the plurality of text recognition information according to first intermediate feature information of the (u-1)-th level corresponding to each of the plurality of text recognition information, wherein the first intermediate feature information is used for determining a first query matrix, a first key matrix, and a first value matrix;
Obtaining the first intermediate feature information corresponding to the plurality of text recognition information of the u-th level according to the second intermediate feature information corresponding to the plurality of text recognition information of the u-th level and the first intermediate feature information corresponding to the plurality of text recognition information of the u-1-th level respectively, and
And obtaining global feature information according to the first intermediate feature information of the R-th level corresponding to each of the plurality of text recognition information.
According to an embodiment of the present disclosure, U is an integer greater than or equal to 1 and less than or equal to U, and R is an integer greater than or equal to 1 and less than or equal to U.
According to an embodiment of the present disclosure, obtaining second intermediate feature information of the u-th level corresponding to each of the plurality of text recognition information according to first intermediate feature information of the u-1-th level corresponding to each of the plurality of text recognition information may include:
Determining a plurality of first matrix sets corresponding to the text recognition information of the u-1 level according to the first intermediate feature information corresponding to the text recognition information of the u-1 level, wherein the first matrix sets comprise a first query matrix, a first key matrix and a first value matrix, and
For text recognition information of the plurality of text recognition information of the u-th hierarchy,
For a first set of matrices of the plurality of first sets of matrices corresponding to the text recognition information,
Obtaining a first attention matrix corresponding to the text recognition information of the u-th level according to the first query matrix corresponding to the text recognition information of the u-th level and the first key matrix corresponding to each of the text recognition information of the u-th level;
Obtaining third intermediate feature information corresponding to the text identification information of the u-th level according to the first attention matrix corresponding to the text identification information of the u-th level and the first value matrix corresponding to the text identification information of the u-th level;
and obtaining the second intermediate characteristic information corresponding to the text identification information of the u-th level according to the plurality of third intermediate characteristic information corresponding to the text identification information of the u-th level.
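A minimal numerical sketch of one such level, assuming identity Q/K/V projections and a residual connection (the learned projection matrices and multi-head details are omitted for illustration; this is not the disclosed implementation):

```python
import numpy as np

def attention_level(prev_features):
    """One level of the self-attention processing sketched above: Q, K, V
    are (illustratively) identity projections of the previous level's
    first intermediate feature information; the attention output (the
    "second intermediate feature information") is added back as a
    residual to give the next level's first intermediate features."""
    X = np.asarray(prev_features, dtype=float)      # (num_texts, dim)
    Q, K, V = X, X, X                               # placeholder projections
    scores = Q @ K.T / np.sqrt(X.shape[1])          # scaled dot-product scores
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)   # softmax -> attention matrix
    second = weights @ V                            # second intermediate features
    return second + X                               # residual connection
```

Stacking U such levels and reading off the R-th level's output corresponds to the global feature information described above.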
According to an embodiment of the present disclosure, in the case of u=1, the processing subunit may include:
obtaining second intermediate feature information corresponding to the text recognition information of the level 2 according to the global feature information corresponding to the text recognition information of the level 1, wherein the global feature information is used for determining a second query matrix, a second key matrix and a second value matrix, and
And obtaining the first intermediate characteristic information corresponding to the text recognition information of the 2 nd level according to the second intermediate characteristic information corresponding to the text recognition information of the 2 nd level and the global characteristic information corresponding to the text recognition information of the 1 st level.
According to an embodiment of the present disclosure, obtaining second intermediate feature information of level 2 corresponding to each of the plurality of text recognition information according to global feature information of level 1 corresponding to each of the plurality of text recognition information may include:
determining a plurality of second matrix sets corresponding to the text recognition information according to the global feature information corresponding to the text recognition information of the 1 st level, wherein the second matrix sets comprise a second query matrix, a second key matrix and a second value matrix, and
For text recognition information of the plurality of text recognition information of the level 2,
For a second set of the plurality of second sets of matrices corresponding to the text recognition information,
Obtaining a second attention matrix corresponding to the text recognition information of the 2 nd level according to a second query matrix corresponding to the text recognition information of the 2 nd level and a second key matrix corresponding to a plurality of text recognition information of the 2 nd level;
obtaining third intermediate feature information corresponding to the text identification information of the 2 nd level according to the second attention matrix corresponding to the text identification information of the 2 nd level and the second value matrix corresponding to the text identification information of the 2 nd level;
and obtaining the second intermediate characteristic information corresponding to the text identification information of the 2nd level according to the third intermediate characteristic information corresponding to the text identification information of the 2nd level.
According to an embodiment of the present disclosure, the semantic relationship includes one of a key-value relationship and a non-key-value relationship.
According to an embodiment of the present disclosure, the determination module 1040 may include a first processing sub-module.
The first processing sub-module is used for processing the identification information by using a text semantic relation model to obtain semantic relation information, wherein the text semantic relation model is obtained by training a first deep learning model by using a plurality of positive sample pairs and a plurality of negative sample pairs, the number of the positive sample pairs and the negative sample pairs meet a preset balance condition, a key value relation is arranged between two sample texts included in the positive sample pairs, and a non-key value relation is arranged between two sample texts included in the negative sample pairs.
According to an embodiment of the present disclosure, the plurality of negative sample pairs are determined from the plurality of candidate negative sample pairs based on a negative sample pruning strategy.
According to an embodiment of the present disclosure, the plurality of negative sample pairs are determined from the plurality of candidate negative sample pairs based on a negative sample pruning strategy, and may include:
The plurality of negative sample pairs are determined from the plurality of candidate negative sample pairs based on a positional relationship between each of the plurality of candidate sample texts.
According to embodiments of the present disclosure, the text recognition module 1030 may include a conversion sub-module and an extraction sub-module.
And a conversion sub-module for converting the position information into target position information by affine transformation.
And the extraction sub-module is used for extracting images corresponding to the text areas from the text image according to the target position information to obtain text area images corresponding to the text areas.
According to an embodiment of the present disclosure, the category information includes one of a keyword category and a numeric category, and the semantic relationship includes one of a key-value relationship and a non-key-value relationship.
According to embodiments of the present disclosure, the generation module 1050 may include a third determination sub-module and a generation sub-module.
And the third determining sub-module is used for determining a plurality of target text recognition information from the plurality of text recognition information according to the semantic relation information, wherein the semantic relation corresponding to the target text recognition information is the key-value relation.
And the generation sub-module is used for generating structured information of the text image according to the category information, the key-value relation, and the target text recognition information, wherein the structured information includes the keyword category, target text recognition information corresponding to the keyword category, the value category, and target text recognition information corresponding to the value category.
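As an illustration only (the `key`/`value` labels, the data shapes, and the function name are assumptions, not the patent's implementation), assembling structured information from category labels and key-value relations might look like:

```python
def build_structured_info(texts, categories, key_value_pairs):
    """texts: region id -> recognized string; categories: region id -> 'key' or 'value';
    key_value_pairs: (key_id, value_id) pairs whose semantic relation is key-value."""
    return {texts[k]: texts[v] for k, v in key_value_pairs
            if categories.get(k) == "key" and categories.get(v) == "value"}
```

Pairs whose category labels disagree with the claimed relation (e.g. two keyword regions) are dropped, so the output contains only keyword-to-value entries.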
According to an embodiment of the present disclosure, the text image comprises a medical text image.
Fig. 11 schematically shows a block diagram of an information processing apparatus according to an embodiment of the present disclosure.
As shown in fig. 11, the information processing apparatus 1100 may include a second acquisition module 1110 and an information processing module 1120.
The second obtaining module 1110 is configured to process a text image to be processed with the information generating apparatus 1000 to obtain structured information of the text image to be processed.
The information processing module 1120 is used for performing information processing by using the structured information of the text image to be processed.
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.
According to an embodiment of the present disclosure, an electronic device includes at least one processor and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a method as described in the present disclosure.
According to an embodiment of the present disclosure, a non-transitory computer-readable storage medium stores computer instructions for causing a computer to perform a method as described in the present disclosure.
According to an embodiment of the present disclosure, a computer program product includes a computer program which, when executed by a processor, implements a method as described in the present disclosure.
Fig. 12 schematically shows a block diagram of an electronic device adapted to implement the information generation method and the information processing method according to an embodiment of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 12, the device 1200 includes a computing unit 1201, which may perform various appropriate actions and processes according to a computer program stored in a Read-Only Memory (ROM) 1202 or a computer program loaded from the storage unit 1208 into a Random Access Memory (RAM) 1203. In the RAM 1203, various programs and data required for the operation of the device 1200 may also be stored. The computing unit 1201, the ROM 1202, and the RAM 1203 are connected to each other via a bus 1204. An input/output (I/O) interface 1205 is also connected to the bus 1204.
Various components in the device 1200 are connected to the I/O interface 1205, including an input unit 1206, such as a keyboard, mouse, etc., an output unit 1207, such as various types of displays, speakers, etc., a storage unit 1208, such as a magnetic disk, optical disk, etc., and a communication unit 1209, such as a network card, modem, wireless communication transceiver, etc. The communication unit 1209 allows the device 1200 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunications networks.
The computing unit 1201 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 1201 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, Digital Signal Processors (DSPs), and any suitable processor, controller, microcontroller, etc. The computing unit 1201 performs the respective methods and processes described above, such as the information generation method and the information processing method. For example, in some embodiments, the information generation method and the information processing method may be implemented as computer software programs tangibly embodied on a machine-readable medium, such as the storage unit 1208. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 1200 via the ROM 1202 and/or the communication unit 1209. When a computer program is loaded into the RAM 1203 and executed by the computing unit 1201, one or more steps of the information generation method and the information processing method described above may be performed. Alternatively, in other embodiments, the computing unit 1201 may be configured to perform the information generation method and the information processing method by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuit systems, Field-Programmable Gate Arrays (FPGAs), Application-Specific Integrated Circuits (ASICs), Application-Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may be implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor, operable to receive data and instructions from, and to transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user and a keyboard and a pointing device (e.g., a mouse or a trackball) by which the user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user, for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback), and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a background component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such background, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a Local Area Network (LAN), a Wide Area Network (WAN), and the Internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server incorporating a blockchain.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps recited in the present disclosure may be performed in parallel or sequentially or in a different order, provided that the desired results of the technical solutions of the present disclosure are achieved, and are not limited herein.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (23)

Translated fromChinese
1.一种信息生成方法,包括:1. A method for generating information, comprising:对文本图像进行文本检测,得到检测信息,其中,所述检测信息包括多个文本区域各自的类别信息和位置信息;Performing text detection on the text image to obtain detection information, wherein the detection information includes category information and position information of each of the plurality of text regions;根据所述位置信息和所述文本图像,获取与所述多个文本区域各自对应的文本区域图像;acquiring text region images corresponding to respective ones of the plurality of text regions according to the position information and the text image;对所述文本区域图像进行文本识别,得到识别信息,其中,所述识别信息包括多个所述文本区域图像各自的文本识别信息;Performing text recognition on the text region image to obtain recognition information, wherein the recognition information includes text recognition information of each of the plurality of text region images;基于自注意力策略对所述识别信息进行U层级处理,得到全局特征信息,U是大于或等于1的整数;Performing U-level processing on the recognition information based on a self-attention strategy to obtain global feature information, where U is an integer greater than or equal to 1;根据所述全局特征信息,确定语义关系信息,其中,所述语义关系信息包括多个所述文本识别信息之间的语义关系;以及Determining semantic relationship information based on the global feature information, wherein the semantic relationship information includes semantic relationships between the plurality of text recognition information; and根据所述类别信息、所述语义关系信息和所述识别信息,生成所述文本图像的结构化信息;generating structured information of the text image according to the category information, the semantic relationship information, and the identification information;所述得到全局特征信息包括:在1<u≤U的情况下,The obtaining of global feature information includes: in the case of 1<u≤U,根据第u-1层级的与多个文本识别信息各自对应的第一中间特征信息,确定第u层级的与多个文本识别信息各自对应的多个第一矩阵集,其中,所述第一矩阵集包括第一查询矩阵、第一键矩阵和第一值矩阵;Determining, based on the first intermediate feature information corresponding to each of the plurality of text recognition information at the u-1th level, a plurality of first matrix sets corresponding to each of the plurality of text recognition information at the u-th level, wherein the first matrix sets include a first 
query matrix, a first key matrix, and a first value matrix;针对第u层级的文本识别信息,根据与文本识别信息对应的第一查询矩阵和与多个文本识别信息各自对应的第一键矩阵,得到第一注意力矩阵,根据第一注意力矩阵和与文本识别信息对应的第一值矩阵,得到与文本识别信息对应的第三中间特征信息,根据多个第三中间特征信息,得到与文本识别信息对应的第二中间特征信息;For the text recognition information at the u-th level, obtain a first attention matrix based on a first query matrix corresponding to the text recognition information and first key matrices corresponding to each of the plurality of text recognition information; obtain third intermediate feature information corresponding to the text recognition information based on the first attention matrix and a first value matrix corresponding to the text recognition information; and obtain second intermediate feature information corresponding to the text recognition information based on the plurality of third intermediate feature information;根据第u层级的与多个文本识别信息各自对应的第二中间特征信息和第u-1层级的与多个文本识别信息各自对应的第一中间特征信息,得到第u层级的与多个文本识别信息对应的第一中间特征信息;以及Obtaining first intermediate feature information corresponding to the plurality of text recognition information at the u-th level based on the second intermediate feature information corresponding to each of the plurality of text recognition information at the u-th level and the first intermediate feature information corresponding to each of the plurality of text recognition information at the u-1-th level; and根据第R层级的与多个文本识别信息各自对应的第一中间特征信息,得到所述全局特征信息;Obtaining the global feature information according to the first intermediate feature information corresponding to each of the plurality of text recognition information at the R-th level;其中,u是大于或等于1且小于或等于U的整数,R是大于或等于1且小于或等于U的整数。Here, u is an integer greater than or equal to 1 and less than or equal to U, and R is an integer greater than or equal to 1 and less than or equal to U.2.根据权利要求1所述的方法,其中,所述对文本图像进行文本检测,得到检测信息,包括:2. 
The method according to claim 1, wherein the performing text detection on the text image to obtain detection information comprises:对所述文本图像进行特征提取,得到至少一个尺度的第一特征图;Performing feature extraction on the text image to obtain a first feature map of at least one scale;根据所述至少一个尺度的第一特征图,获取第二特征图;Acquire a second feature map according to the first feature map of the at least one scale;根据所述至少一个尺度的第一特征图,获取第三特征图;Acquire a third feature map according to the first feature map of the at least one scale;根据所述第二特征图,获取所述多个文本区域各自的类别信息;以及acquiring category information of each of the plurality of text regions according to the second feature map; and根据所述第三特征图,获取所述多个文本区域各自的位置信息。According to the third feature map, position information of each of the multiple text regions is obtained.3.根据权利要求2所述的方法,其中,所述对所述文本图像进行特征提取,得到至少一个尺度的第一特征图,包括:3. The method according to claim 2, wherein the step of extracting features from the text image to obtain a first feature map at at least one scale comprises:对所述文本图像进行M个阶段的特征提取,得到与第M阶段对应的至少一个第一特征图;以及Performing M stages of feature extraction on the text image to obtain at least one first feature map corresponding to the M-th stage; and根据与所述第M阶段对应的至少一个第一特征图,得到所述至少一个尺度的第一特征图;Obtaining the at least one first feature map at the at least one scale according to the at least one first feature map corresponding to the M-th stage;其中,第m阶段具有Tm个并联层级,同一并联层级的第一特征图的图像分辨率相同,不同并联层级的第一特征图的图像分辨率不同;The mth stage has Tm parallel levels, the image resolution of the first feature maps of the same parallel level is the same, and the image resolution of the first feature maps of different parallel levels is different;其中,M是大于1或等于1的整数,m是大于或等于1且小于或等于M的整数,Tm是大于或等于1的整数。Wherein, M is an integer greater than or equal to 1, m is an integer greater than or equal to 1 and less than or equal to M, andTm is an integer greater than or equal to 1.4.根据权利要求3所述的方法,其中,在M是大于1的整数的情况下,所述对所述文本图像进行M个阶段的特征提取,得到与第M阶段对应的至少一个第一特征图,包括:4. 
The method according to claim 3, wherein, when M is an integer greater than 1, performing M stages of feature extraction on the text image to obtain at least one first feature map corresponding to the Mth stage comprises:对与第m-1阶段对应的至少一个第一特征图进行卷积处理,得到与第m阶段对应的至少一个中间第一特征图;以及Performing convolution processing on at least one first feature map corresponding to the (m-1)th stage to obtain at least one intermediate first feature map corresponding to the (m)th stage; and对与所述第m阶段对应的至少一个中间第一特征图进行特征融合,得到与第m阶段对应的至少一个第一特征图;Performing feature fusion on at least one intermediate first feature map corresponding to the m-th stage to obtain at least one first feature map corresponding to the m-th stage;其中,m是大于1且小于或等于M的整数。Here, m is an integer greater than 1 and less than or equal to M.5.根据权利要求4所述的方法,其中,所述对与所述第m阶段对应的至少一个中间第一特征图进行特征融合,得到与第m阶段对应的至少一个第一特征图,包括:5. The method according to claim 4, wherein the performing feature fusion on the at least one intermediate first feature map corresponding to the m-th stage to obtain the at least one first feature map corresponding to the m-th stage comprises:针对所述Tm个并联层级中的第i个并联层级,For the i-th parallel level among the Tm parallel levels,根据与所述第i个并联层级对应的其他中间第一特征图和与所述第i个并联层级对应的中间第一特征图,得到与所述第i个并联层级对应的第一特征图;Obtaining a first feature map corresponding to the i-th parallel level according to other intermediate first feature maps corresponding to the i-th parallel level and the intermediate first feature map corresponding to the i-th parallel level;其中,与所述第i个并联层级对应的其他中间第一特征图是与所述Tm个并联层级中除所述第i个并联层级以外的至少部分并联层级对应的中间第一特征图,i是大于或等于1且小于或等于Tm的整数。Among them, the other intermediate first characteristic graphs corresponding to the i-th parallel level are intermediate first characteristic graphs corresponding to at least some of the Tm parallel levels except the i-th parallel level, and i is an integer greater than or equal to 1 and less than or equal to Tm .6.根据权利要求2所述的方法,其中,所述对所述文本图像进行特征提取,得到至少一个尺度的第一特征图,包括:6. 
The method according to claim 2, wherein the step of extracting features from the text image to obtain a first feature map at at least one scale comprises:对所述文本图像进行N个级联层级的特征提取,得到所述至少一个尺度的第一特征图,其中,N是大于1的整数。Performing N cascade-level feature extraction on the text image to obtain a first feature map of the at least one scale, where N is an integer greater than 1.7.根据权利要求2~6中任一项所述的方法,其中,所述确定语义关系信息,包括:7. The method according to any one of claims 2 to 6, wherein determining the semantic relationship information comprises:根据辅助信息和所述识别信息,确定所述语义关系信息,其中,所述辅助信息包括以下至少之一:所述第二特征图和所述位置信息。The semantic relationship information is determined based on auxiliary information and the identification information, wherein the auxiliary information includes at least one of the following: the second feature map and the position information.8.根据权利要求7所述的方法,其中,在所述辅助信息包括所述第二特征图的情况下,所述根据辅助信息和所述识别信息,确定所述语义关系信息,包括:8. The method according to claim 7, wherein, when the auxiliary information includes the second feature map, determining the semantic relationship information based on the auxiliary information and the identification information comprises:将所述第二特征图和与所述识别信息对应的第四特征图进行融合,得到融合特征图;以及Fusing the second feature map with a fourth feature map corresponding to the identification information to obtain a fused feature map; and根据所述融合特征图,确定所述语义关系信息。The semantic relationship information is determined according to the fused feature map.9.根据权利要求8所述的方法,其中,在所述辅助信息还包括所述位置信息的情况下,所述根据所述融合特征图,确定所述语义关系信息,包括:9. The method according to claim 8, wherein, when the auxiliary information further includes the position information, determining the semantic relationship information based on the fused feature map comprises:根据所述融合特征图和所述位置信息,确定所述语义关系信息。The semantic relationship information is determined according to the fused feature map and the position information.10.根据权利要求1所述的方法,还包括:10. 
The method according to claim 1, further comprising:在u=1的情况下,In the case of u=1,根据第1层级的与多个所述文本识别信息各自对应的全局特征信息,得到第2层级的与多个所述文本识别信息各自对应的第二中间特征信息,其中,所述全局特征信息用于确定第二查询矩阵、第二键矩阵和第二值矩阵;以及Obtaining second intermediate feature information corresponding to each of the plurality of text recognition information at a second level based on the first-level global feature information corresponding to each of the plurality of text recognition information, wherein the global feature information is used to determine a second query matrix, a second key matrix, and a second value matrix; and根据所述第2层级的与多个所述文本识别信息各自对应的第二中间特征信息和所述第1层级的与多个所述文本识别信息各自对应的全局特征信息,得到第2层级的与多个所述文本识别信息各自对应的第一中间特征信息。The first intermediate feature information of the second level corresponding to each of the plurality of text recognition information is obtained based on the second intermediate feature information of the second level corresponding to each of the plurality of text recognition information and the global feature information of the first level corresponding to each of the plurality of text recognition information.11.根据权利要求10所述的方法,其中,所述根据第1层级的与多个所述文本识别信息各自对应的全局特征信息,得到第2层级的与多个所述文本识别信息各自对应的第二中间特征信息,包括:11. 
The method according to claim 10, wherein obtaining second intermediate feature information corresponding to each of the plurality of text recognition information at a second level based on the first-level global feature information corresponding to each of the plurality of text recognition information comprises:根据所述第1层级的与多个所述文本识别信息各自对应全局特征信息,确定所述第2层级的与多个所述文本识别信息各自对应的多个第二矩阵集,其中,所述第二矩阵集包括所述第二查询矩阵、所述第二键矩阵和所述第二值矩阵;以及Determining, based on the global feature information of the first level corresponding to each of the plurality of text recognition information, a plurality of second matrix sets of the second level corresponding to each of the plurality of text recognition information, wherein the second matrix sets include the second query matrix, the second key matrix, and the second value matrix; and针对所述第2层级的多个所述文本识别信息中的文本识别信息,For the text identification information in the plurality of text identification information in the second level,针对与所述文本识别信息对应的多个第二矩阵集中的第二矩阵集,For a second matrix set among a plurality of second matrix sets corresponding to the text recognition information,根据所述第2层级的与所述文本识别信息对应的第二查询矩阵和所述第2层级的与多个所述文本识别信息对应的第二键矩阵,得到所述第2层级的与所述文本识别信息对应的第二注意力矩阵;Obtaining a second attention matrix corresponding to the text recognition information at the second level according to the second query matrix corresponding to the text recognition information at the second level and the second key matrix corresponding to the plurality of text recognition information at the second level;根据所述第2层级的与所述文本识别信息对应的第二注意力矩阵和所述第2层级的与所述文本识别信息对应的第二值矩阵,得到所述第2层级的与所述文本识别信息对应的第三中间特征信息;Obtaining third intermediate feature information corresponding to the text recognition information at the second level according to the second attention matrix corresponding to the text recognition information at the second level and the second value matrix corresponding to the text recognition information at the second level;根据所述第2层级的与所述文本识别信息对应的多个第三中间特征信息,得到所述第2层级的与所述文本识别信息对应的第二中间特征信息。The second intermediate 
feature information corresponding to the text recognition information at the second level is obtained according to the plurality of third intermediate feature information corresponding to the text recognition information at the second level.12.根据权利要求1~6中任一项所述的方法,其中,所述语义关系包括键值关系和非键值关系中的之一;12. The method according to any one of claims 1 to 6, wherein the semantic relationship comprises one of a key-value relationship and a non-key-value relationship;其中,所述确定语义关系信息,包括:The determining of semantic relationship information includes:利用文本语义关系模型处理所述识别信息,得到所述语义关系信息,其中,所述文本语义关系模型是利用多个正样本对和多个负样本对训练第一深度学习模型得到的,所述正样本对和所述负样本对的数目满足预定均衡条件,所述正样本对包括的两个样本文本之间具有键值关系,所述负样本对包括的两个样本文本之间具有非键值关系。The recognition information is processed using a text semantic relationship model to obtain the semantic relationship information, wherein the text semantic relationship model is obtained by training a first deep learning model using multiple positive sample pairs and multiple negative sample pairs, the number of the positive sample pairs and the number of the negative sample pairs meet a predetermined balance condition, the two sample texts included in the positive sample pair have a key-value relationship, and the two sample texts included in the negative sample pair have a non-key-value relationship.13.根据权利要求12所述的方法,其中,所述多个负样本对是基于负样本剪枝策略,从多个候选负样本对中确定的。13. The method according to claim 12, wherein the multiple negative sample pairs are determined from multiple candidate negative sample pairs based on a negative sample pruning strategy.14.根据权利要求13所述的方法,其中,所述多个负样本对是基于负样本剪枝策略,从多个候选负样本对中确定的,包括:14. 
The method according to claim 13, wherein the multiple negative sample pairs are determined from multiple candidate negative sample pairs based on a negative sample pruning strategy, comprising:所述多个负样本对是根据多个候选样本文本各自之间的位置关系,从所述多个候选负样本对中确定的。The multiple negative sample pairs are determined from the multiple candidate negative sample pairs according to the positional relationship between the multiple candidate sample texts.15.根据权利要求1所述的方法,其中,所述根据所述位置信息和所述文本图像,获取与所述多个文本区域各自对应的文本区域图像,包括:15. The method according to claim 1, wherein acquiring text region images corresponding to each of the plurality of text regions based on the position information and the text image comprises:利用仿射变换将所述位置信息转换为目标位置信息;以及Converting the position information into target position information using affine transformation; and根据所述目标位置信息,从所述文本图像中提取与所述多个文本区域对应的图像,得到与所述多个文本区域对应的文本区域图像。Images corresponding to the multiple text regions are extracted from the text image according to the target position information to obtain text region images corresponding to the multiple text regions.16.根据权利要求1所述的方法,其中,所述类别信息包括关键字类别和数值类别中的之一,所述语义关系包括键值关系和非键值关系中的之一;16. 
The method according to claim 1, wherein the category information comprises one of a keyword category and a value category, and the semantic relationship comprises one of a key-value relationship and a non-key-value relationship;其中,所述根据所述类别信息、所述语义关系信息和所述识别信息,生成所述文本图像的结构化信息,包括:The step of generating the structured information of the text image according to the category information, the semantic relationship information, and the identification information includes:根据所述语义关系信息,从多个所述文本识别信息中确定多个目标文本识别信息,其中,与所述文本识别信息对应的语义关系为所述键值关系;以及Determining a plurality of target text recognition information from the plurality of text recognition information according to the semantic relationship information, wherein the semantic relationship corresponding to the text recognition information is the key-value relationship; and根据所述类别信息、所述键值关系和所述目标文本识别信息,生成所述文本图像的结构化信息,其中,所述结构化信息包括所述关键字类别、与所述关键字类别对应的目标文本识别信息、所述数值类别和与所述数值类别对应的目标文本识别信息。Based on the category information, the key-value relationship and the target text recognition information, structured information of the text image is generated, wherein the structured information includes the keyword category, the target text recognition information corresponding to the keyword category, the numerical category and the target text recognition information corresponding to the numerical category.17.根据权利要求1所述的方法,其中,所述文本图像包括医疗文本图像。The method of claim 1 , wherein the text image comprises a medical text image.18.一种信息处理方法,包括:18. An information processing method, comprising:利用根据权利要求1~17中任一项所述的方法处理待处理文本图像,获取所述待处理文本图像的结构化信息;以及Processing the text image to be processed using the method according to any one of claims 1 to 17 to obtain structural information of the text image to be processed; and利用所述待处理文本图像的结构化信息进行信息处理。Information processing is performed using the structured information of the text image to be processed.19.一种信息生成装置,包括:19. 
An information generating device, comprising:文本检测模块,用于对文本图像进行文本检测,得到检测信息,其中,所述检测信息包括多个文本区域各自的类别信息和位置信息;a text detection module, configured to perform text detection on a text image to obtain detection information, wherein the detection information includes category information and position information of each of a plurality of text regions;第一获取模块,用于根据所述位置信息和所述文本图像,获取与所述多个文本区域各自对应的文本区域图像;A first acquisition module is configured to acquire a text region image corresponding to each of the plurality of text regions according to the position information and the text image;文本识别模块,用于对所述文本区域图像进行文本识别,得到识别信息,其中,所述识别信息包括多个所述文本区域图像各自的文本识别信息;a text recognition module, configured to perform text recognition on the text region image to obtain recognition information, wherein the recognition information includes text recognition information of each of the plurality of text region images;全局特征信息确定模块,用于基于自注意力策略对所述识别信息进行U层级处理,得到全局特征信息,U是大于或等于1的整数;a global feature information determination module, configured to perform U-level processing on the recognition information based on a self-attention strategy to obtain global feature information, where U is an integer greater than or equal to 1;语义关系确定模块,用于根据所述全局特征信息,确定语义关系信息;以及a semantic relationship determination module, configured to determine semantic relationship information based on the global feature information; and生成模块,用于根据所述类别信息、所述语义关系信息和所述识别信息,生成所述文本图像的结构化信息;a generating module, configured to generate structured information of the text image based on the category information, the semantic relationship information, and the identification 
information; the global feature information determination module is configured to: in the case where 1 < u ≤ U, determine, based on the first intermediate feature information corresponding to each of the plurality of pieces of text recognition information at level u-1, a plurality of first matrix sets corresponding to each of the plurality of pieces of text recognition information at level u, wherein each first matrix set includes a first query matrix, a first key matrix, and a first value matrix; for a piece of text recognition information at level u, obtain a first attention matrix based on the first query matrix corresponding to that text recognition information and the first key matrices corresponding to each of the plurality of pieces of text recognition information, obtain third intermediate feature information corresponding to the text recognition information based on the first attention matrix and the first value matrix corresponding to the text recognition information, and obtain second intermediate feature information corresponding to the text recognition information based on the plurality of pieces of third intermediate feature information; obtain the first intermediate feature information corresponding to the plurality of pieces of text recognition information at level u based on the second intermediate feature information corresponding to each of the plurality of pieces of text recognition information at level u and the first intermediate feature information corresponding to each of the plurality of pieces of text recognition information at level u-1; and obtain the global feature information based on the first intermediate feature information corresponding to each of the plurality of pieces of text recognition information at level R; wherein u is an integer greater than or equal to 1 and less than or equal to U, and R is an integer greater than or equal to 1 and less than or equal to U.

20. An information processing apparatus, comprising:

a second acquisition module configured to process a text image to be processed using the apparatus according to claim 19, to acquire structured information of the text image to be processed; and

an information processing module configured to perform information processing using the structured information of the text image to be processed.

21. An electronic device, comprising:

at least one processor; and

a memory communicatively connected to the at least one processor; wherein

the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the method according to any one of claims 1 to 17 or claim 18.

22. A non-transitory computer-readable storage medium storing computer instructions, wherein the computer instructions are used to cause a computer to execute the method according to any one of claims 1 to 17 or claim 18.

23. A computer program product, comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1 to 17 or claim 18.
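The stacked attention computation recited in claim 19 (per-level query/key/value projections, an attention matrix relating each piece of text recognition information to all others, multiple third intermediate features combined into a second intermediate feature, and a residual combination with the previous level's first intermediate features) can be sketched in NumPy. This is an illustrative reading, not the patented implementation: the random projection weights, the head count, and the mean pooling into the global feature are all assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def global_feature(feats, num_levels=3, num_heads=2):
    """feats: (N, d) level-1 first intermediate features, one row per
    piece of text recognition information. Returns a global feature vector."""
    n, d = feats.shape
    dh = d // num_heads
    h = feats
    for u in range(2, num_levels + 1):  # levels u with 1 < u <= U
        heads = []
        for _ in range(num_heads):
            # hypothetical per-level projections yielding the first
            # query/key/value matrices from the level u-1 features
            Wq, Wk, Wv = (rng.standard_normal((d, dh)) / np.sqrt(d)
                          for _ in range(3))
            Q, K, V = h @ Wq, h @ Wk, h @ Wv
            attn = softmax(Q @ K.T / np.sqrt(dh))  # first attention matrix
            heads.append(attn @ V)                 # third intermediate features
        second = np.concatenate(heads, axis=-1)    # combine -> second features
        h = second + h                             # residual with level u-1
    return h.mean(axis=0)                          # pool -> global feature info

g = global_feature(rng.standard_normal((4, 8)))
```

With N = 4 text regions and d = 8 feature dimensions, `g` is an 8-dimensional vector summarizing all regions, playing the role of the claimed global feature information.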
CN202310023539.8A | 2023-01-06 | 2023-01-06 | Information generation method, information processing method, device, electronic device, and medium | Active | CN116311298B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202310023539.8A | CN116311298B (en) | 2023-01-06 | 2023-01-06 | Information generation method, information processing method, device, electronic device, and medium

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202310023539.8A | CN116311298B (en) | 2023-01-06 | 2023-01-06 | Information generation method, information processing method, device, electronic device, and medium

Publications (2)

Publication Number | Publication Date
CN116311298A (en) | 2023-06-23
CN116311298B (en) | 2025-08-26

Family

ID=86824754

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN202310023539.8A | Active | CN116311298B (en) | 2023-01-06 | 2023-01-06 | Information generation method, information processing method, device, electronic device, and medium

Country Status (1)

Country | Link
CN (1) | CN116311298B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN117275005B (en)* | 2023-09-21 | 2024-08-09 | Beijing Baidu Netcom Science and Technology Co., Ltd. | Text detection, text detection model optimization and data annotation method and device
WO2025065335A1 (en)* | 2023-09-27 | 2025-04-03 | BOE Technology Group Co., Ltd. | Image inpainting method, model training method, electronic device, and storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN110889416A (en)* | 2019-12-13 | 2020-03-17 | Nankai University | A salient object detection method based on cascade improved network
CN114495119A (en)* | 2021-12-01 | 2022-05-13 | Zhejiang University | Real-time irregular text recognition method under complex scene

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN111047602A (en)* | 2019-11-26 | 2020-04-21 | Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences | Image segmentation method and device and terminal equipment
US12205317B2 (en)* | 2020-02-13 | 2025-01-21 | Northeastern University | Light-weight pose estimation network with multi-scale heatmap fusion
CN113627439B (en)* | 2021-08-11 | 2024-10-18 | Guangzhou Jicheng Information Technology Co., Ltd. | Text structuring processing method, processing device, electronic equipment and storage medium
CN114202648B (en)* | 2021-12-08 | 2024-04-16 | Beijing Baidu Netcom Science and Technology Co., Ltd. | Text image correction method, training method and device, electronic equipment and medium
CN114880427B (en)* | 2022-04-20 | 2025-08-29 | Mairong Intelligent Technology (Shanghai) Co., Ltd. | Model, event argument extraction method and system based on multi-level attention mechanism
CN114912433B (en)* | 2022-05-25 | 2024-07-02 | AsiaInfo Technologies (China) Inc. | Text-level multi-label classification method, apparatus, electronic device and storage medium

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN110889416A (en)* | 2019-12-13 | 2020-03-17 | Nankai University | A salient object detection method based on cascade improved network
CN114495119A (en)* | 2021-12-01 | 2022-05-13 | Zhejiang University | Real-time irregular text recognition method under complex scene

Also Published As

Publication number | Publication date
CN116311298A (en) | 2023-06-23

Similar Documents

Publication | Publication Date | Title
US20220253631A1 (en) | Image processing method, electronic device and storage medium
CN113627439B (en) | Text structuring processing method, processing device, electronic equipment and storage medium
CN113343982B (en) | Entity relation extraction method, device and equipment for multi-modal feature fusion
WO2023024614A1 (en) | Document classification method and apparatus, electronic device and storage medium
CN113255694A (en) | Training image feature extraction model and method and device for extracting image features
CN116311298B (en) | Information generation method, information processing method, device, electronic device, and medium
CN113065614B (en) | Training method of classification model and method for classifying target object
CN113255824B (en) | Method and apparatus for training classification model and data classification
CN114612743A (en) | Deep learning model training method, target object identification method and device
CN114882321A (en) | Deep learning model training method, target object detection method and device
CN113221918B (en) | Target detection method, training method and device of target detection model
CN114494784A (en) | Training methods, image processing methods and object recognition methods of deep learning models
CN114724156B (en) | Form identification method and device and electronic equipment
CN113343981A (en) | Visual feature enhanced character recognition method, device and equipment
CN114429633B (en) | Text recognition method, training method and device of model, electronic equipment and medium
CN112101360A (en) | Target detection method and device and computer readable storage medium
CN114360027A (en) | Training method and device for feature extraction network, and electronic device
CN114495113A (en) | Text classification method and training method and device for text classification model
CN116824609B (en) | Document format detection method and device and electronic equipment
CN113343979A (en) | Method, apparatus, device, medium and program product for training a model
CN114419327B (en) | Image detection method and training method and device of image detection model
CN116244447A (en) | Multi-mode map construction and information processing method and device, electronic equipment and medium
CN116662589A (en) | Image matching method, device, electronic equipment and storage medium
CN115205555A (en) | Method for determining similar images, training method, information determination method and device
CN114565760A (en) | Image segmentation method, model training method, device, electronic device, and medium

Legal Events

Date | Code | Title | Description
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant
