CN110188750A - A text recognition method for natural scene pictures based on deep learning - Google Patents

A text recognition method for natural scene pictures based on deep learning

Info

Publication number
CN110188750A
CN110188750A (application CN201910406709.4A)
Authority
CN
China
Prior art keywords
image
deep learning
natural scene
recognition method
processing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910406709.4A
Other languages
Chinese (zh)
Inventor
赵春阳
章奇
陈晓飞
欧杨磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Electronic Science and Technology University
Original Assignee
Hangzhou Electronic Science and Technology University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Electronic Science and Technology University
Priority to CN201910406709.4A, patent CN110188750A/en
Publication of CN110188750A, patent CN110188750A/en
Legal status: Pending (current)

Abstract

Translated from Chinese

The present invention provides a deep-learning-based method for recognizing text in natural scene pictures. The method comprises the following steps: S1: build a text image database and classify it; S2: capture images with an image collector and preprocess them; S3: extract features from the preprocessed images with a deep neural network and classify the extracted features with a statistical classifier; S4: use hand-designed features and template matching for auxiliary training. Through contrast processing, noise-suppression processing, and magnify-and-split processing, the method enhances image contrast effectively, yielding higher-quality images, while also improving the recognition rate, reducing the influence of external environmental factors, and raising the feature-extraction rate.

Description

Translated from Chinese
A text recognition method for natural scene pictures based on deep learning

Technical field

The present invention relates to the field of character recognition, and in particular to a deep-learning-based method for recognizing text in natural scene pictures.

Background

Text recognition is an important application area of pattern recognition. General text recognition methods were first explored in the 1950s, when optical character recognizers were developed. Practical machines using magnetic ink and special fonts appeared in the 1960s, and machines recognizing multiple fonts and handwriting emerged in the late 1960s, though their recognition accuracy and performance were poor. In the 1970s, research focused on the basic theory of character recognition and on building high-performance recognition machines, with particular emphasis on Chinese character recognition. Text recognition technology has improved substantially since then.

Current deep-learning-based methods for recognizing text in natural scene pictures cannot guarantee the accuracy of recognition and feature extraction, which hinders their wide adoption.

Therefore, a deep-learning-based text recognition method for natural scene pictures is needed to solve the above technical problems.

Summary of the invention

The present invention provides a deep-learning-based method for recognizing text in natural scene pictures, solving the problem that current methods cannot guarantee the accuracy of recognition and feature extraction.

To solve the above technical problems, the deep-learning-based natural scene picture text recognition method provided by the present invention comprises the following steps:

S1: build a text image database and classify it;

S2: capture images with an image collector and preprocess the captured images;

S3: extract features from the preprocessed images with a deep neural network, and classify the extracted features with a statistical classifier;

S4: use hand-designed features and template matching for auxiliary training, and perform matching.
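The four steps above reduce to a small amount of glue code. The sketch below is illustrative only: every stage callable (preprocessor, feature extractor, classifier, template matcher) is an injected stub, not the patent's actual implementation.

```python
def recognize(image, preprocess, extract_features, classify, match_template=None):
    """Glue for steps S2-S4: preprocess the captured image, extract features
    with the deep network, classify with the statistical classifier, and
    optionally refine the result with template matching (S4)."""
    features = extract_features(preprocess(image))   # S2, then S3 feature extraction
    label = classify(features)                       # S3 statistical classification
    if match_template is not None:                   # S4 auxiliary matching
        label = match_template(features, label)
    return label
```

With toy callables, `recognize("img", str.upper, len, lambda f: "class%d" % f)` runs the whole chain end to end.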

Preferably, the text image database uses regular-script characters, oracle-bone characters, and the like as index entries; the shapes of the characters from specified sources are scanned or photographed to obtain standard shapes, which are then given classification labels.

Preferably, the image preprocessing in S2 comprises contrast processing, denoising, split-and-strip processing, and database augmentation.

Preferably, the contrast processing: process the captured image to obtain a full-image attention weight map, which contains one mapping point, with a corresponding weight value, for each pixel of the captured image. Then divide the captured image into multiple original blocks and apply histogram equalization to each block using the attention weight map and a preset gray-scale conversion formula: for each pixel of each original block, compute from the weight map and the conversion formula the converted gray level corresponding to its original gray level, and replace the original gray level with the converted one, forming multiple converted blocks. Finally, stitch the converted blocks together to form the processed image.
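A minimal numpy sketch of this block-wise, attention-weighted histogram equalization. It assumes the full-image attention weight map is supplied externally, and a weighted cumulative histogram stands in for the patent's unspecified gray-scale conversion formula:

```python
import numpy as np

def weighted_block_equalize(img, weights, block=32, levels=256):
    """Block-wise histogram equalization in which each pixel's histogram
    contribution is scaled by its attention weight; blocks are equalized
    independently and stitched back together."""
    out = np.empty_like(img)
    h, w = img.shape
    for y in range(0, h, block):
        for x in range(0, w, block):
            tile = img[y:y + block, x:x + block]
            wt = weights[y:y + block, x:x + block]
            # Weighted histogram: high-attention pixels pull the mapping.
            hist = np.bincount(tile.ravel(), weights=wt.ravel(), minlength=levels)
            cdf = np.cumsum(hist)
            cdf = cdf / cdf[-1] if cdf[-1] > 0 else cdf
            # Gray-scale conversion table: original level -> converted level.
            lut = np.round(cdf * (levels - 1)).astype(img.dtype)
            out[y:y + block, x:x + block] = lut[tile]  # replace and stitch
    return out
```

With uniform weights this reduces to ordinary per-block histogram equalization; a non-uniform map biases the contrast stretch toward visually salient regions.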

Preferably, the noise-suppression processing: select a certain number of target test samples; label them manually and divide the labeled sample set into development samples and a first training sample set. By analyzing the noise models and distortion characteristics that may appear in the captured images, design a random sample generator that, starting from the standard forms of the selected font, automatically generates a large number of second training samples for neural network training. The automatically generated second training sample set contains various complex noises and distortions and can satisfy the needs of recognizing various complex characters. The first and second training sample sets are mixed and fed into the deep neural network.
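The random sample generator can be sketched as follows. The specific corruption models (Gaussian noise plus a small random translation) are illustrative stand-ins, since the patent only says the generator is designed from an analysis of the captured images:

```python
import numpy as np

rng = np.random.default_rng(0)

def synthesize_samples(glyph, n, rng=rng):
    """Generate n training variants of a clean standard-font glyph by
    adding noise and a small geometric perturbation, mimicking the
    patent's automatic second-training-sample generator."""
    out = []
    for _ in range(n):
        noisy = glyph.astype(np.float32) + rng.normal(0.0, 12.0, glyph.shape)
        dy, dx = rng.integers(-2, 3, size=2)              # small random shift
        shifted = np.roll(np.roll(noisy, dy, axis=0), dx, axis=1)
        out.append(np.clip(shifted, 0, 255).astype(np.uint8))
    return out
```

In practice one would mix these synthetic samples with the manually labeled first training set before feeding the network, as the text describes.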

Preferably, the magnify-and-split processing: divide the captured image into nine regions, magnify the nine regions by factors of 5, 10, 15, 20, 25, 30, 40, and 50 in turn to obtain the magnified images, and combine the magnified images into an image group.
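A sketch of the magnify-and-split step, using nearest-neighbour magnification via `np.repeat`. Note that the patent lists eight magnification factors for nine regions; this sketch reuses the last factor for the ninth region, which is an assumption on my part:

```python
import numpy as np

FACTORS = (5, 10, 15, 20, 25, 30, 40, 50)  # eight factors given for nine regions

def split_and_magnify(img, factors=FACTORS):
    """Split an image into a 3x3 grid and magnify each region by the next
    factor in sequence (nearest-neighbour), returning the image group."""
    h, w = img.shape
    ys = [0, h // 3, 2 * h // 3, h]
    xs = [0, w // 3, 2 * w // 3, w]
    group = []
    for i in range(3):
        for j in range(3):
            f = factors[min(3 * i + j, len(factors) - 1)]
            region = img[ys[i]:ys[i + 1], xs[j]:xs[j + 1]]
            group.append(np.repeat(np.repeat(region, f, axis=0), f, axis=1))
    return group
```

The resulting image group is what gets fed to the network for feature extraction.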

Preferably, feature extraction from the preprocessed images is performed by a deep neural network and comprises: using Inception_V3 structural units to compress the image in parallel; using multi-layer pooling units to compress the image in parallel and integrate features in parallel, extracting translation-invariant features to the greatest extent; using stacks of small filters in place of large filters; and using batch normalization to standardize the data internally, normalizing the outputs to a normal distribution between 0 and 1.
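The batch-normalization step can be illustrated in a few lines of numpy. This is per-feature standardization over a batch; the learnable scale/shift parameters and running statistics of a full implementation are omitted:

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    """Standardize each feature column of a batch to zero mean and
    (approximately) unit variance, as used to keep activations in a
    well-conditioned range during training."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    return (x - mean) / np.sqrt(var + eps)
```

Keeping activations standardized like this is what lets the network train at a higher learning rate without gradients exploding or vanishing, as the text notes below.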

Preferably, classifying the extracted features with the statistical classifier comprises: passing the extracted features through the classifier to recognize the evolution of character forms across different periods, using the softmax function as the statistical classifier. The predicted probability output by the model is

p_k = exp(s_k(x)) / Σ_{j=1..n} exp(s_j(x))

where p_k is the probability that the current instance belongs to class k, n is the total number of classes, s_k(x) is the score of the current instance x for class k, exp(·) exponentiates its argument, and the denominator is the sum of the exponentiated scores of instance x over all classes from 1 to n; k and j each range from 1 to n.
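A direct implementation of the softmax classifier described above, with the usual max-subtraction for numerical stability (which leaves the probabilities unchanged):

```python
import numpy as np

def softmax(scores):
    """Convert class scores s_k(x) into probabilities
    p_k = exp(s_k) / sum_j exp(s_j)."""
    z = np.asarray(scores, dtype=float)
    z = z - z.max()          # stability shift; cancels in the ratio
    e = np.exp(z)
    return e / e.sum()
```

For scores [1, 2, 3] the highest-scoring class receives the highest probability, and the probabilities sum to one.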

Compared with the related art, the deep-learning-based natural scene picture text recognition method provided by the present invention has the following beneficial effects:

The present invention provides a deep-learning-based method for recognizing text in natural scene pictures. Contrast processing is applied to the captured image to obtain a full-image attention weight map containing one mapping point, with a corresponding weight value, for each pixel; the image is divided into original blocks, histogram equalization using the weight map and a preset gray-scale conversion formula computes a converted gray level for each pixel and replaces the original one, and the converted blocks are stitched into the processed image. Noise-suppression processing selects a certain number of target test samples, labels them manually, and divides them into development samples and a first training sample set; a random sample generator, designed by analyzing the noise models and distortion characteristics that may appear in captured images, automatically generates from the standard forms of the selected font a large number of second training samples containing various complex noises and distortions, satisfying the needs of recognizing various complex characters, and the two sample sets are mixed and fed into the deep neural network. The captured image is also divided into nine regions, which are magnified by factors of 5, 10, 15, 20, 25, 30, 40, and 50 in turn and combined into an image group. Together, the contrast, noise-suppression, and magnify-and-split processing enhance the image contrast effectively, yield higher-quality images, improve the recognition rate, reduce the influence of external environmental factors, and raise the feature-extraction rate.

Brief description of the drawings

Fig. 1 is a schematic block diagram of a preferred embodiment of the deep-learning-based natural scene picture text recognition method provided by the present invention;

Fig. 2 is a schematic block diagram of the image preprocessing of the deep-learning-based natural scene picture text recognition method provided by the present invention.

Detailed description

The present invention is further described below with reference to the accompanying drawings and embodiments.

Referring to Fig. 1 and Fig. 2: Fig. 1 is a schematic block diagram of a preferred embodiment of the deep-learning-based natural scene picture text recognition method provided by the present invention, and Fig. 2 is a schematic block diagram of its image preprocessing. The method comprises the following steps:

S1: build a text image database and classify it;

S2: capture images with an image collector and preprocess the captured images;

S3: extract features from the preprocessed images with a deep neural network, and classify the extracted features with a statistical classifier;

S4: use hand-designed features and template matching for auxiliary training, and perform matching; template matching methods mainly include cosine similarity and Euclidean distance.
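The two template-matching measures named in S4 can be sketched directly. The feature vectors and template dictionary here are illustrative placeholders:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (1.0 = identical direction)."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def euclidean_distance(a, b):
    """Straight-line distance between two feature vectors (0.0 = identical)."""
    return float(np.linalg.norm(np.asarray(a, float) - np.asarray(b, float)))

def match_template(feature, templates):
    """Return the label of the template most similar to the feature vector,
    using cosine similarity as the matching score."""
    return max(templates, key=lambda k: cosine_similarity(feature, templates[k]))
```

Cosine similarity ranks candidates by direction of the feature vector, while Euclidean distance also penalizes magnitude differences; either can drive the auxiliary matching step.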

The text image database uses regular-script characters, oracle-bone characters, and the like as index entries; the shapes of the characters from specified sources are scanned or photographed to obtain standard shapes, which are then given classification labels.

The image preprocessing in S2 comprises contrast processing, denoising, split-and-strip processing, and database augmentation.

The contrast processing: process the captured image to obtain a full-image attention weight map containing one mapping point, with a corresponding weight value, for each pixel of the captured image. Then divide the captured image into multiple original blocks and apply histogram equalization to each block using the attention weight map and a preset gray-scale conversion formula: for each pixel of each original block, compute from the weight map and the conversion formula the converted gray level corresponding to its original gray level and replace the original gray level with the converted one, forming multiple converted blocks; finally, stitch the converted blocks together to form the processed image. This effectively avoids the poor contrast enhancement that results when large areas of similar pixels occupy a wide range of gray values, and it dynamically boosts the contrast of each original block according to the degree of human visual attention, so the final processed image shows a better contrast improvement and higher quality.

The noise-suppression processing: select a certain number of target test samples (for example, 1000 pictures); label them manually and divide the labeled sample set into development samples and a first training sample set (for example, 30% of the labeled samples as development samples and 70% as first training samples). By analyzing the noise models and distortion characteristics that may appear in the captured images, design a random sample generator that, starting from the standard forms of the selected font, automatically generates a large number of second training samples for neural network training; the automatically generated second training sample set contains various complex noises and distortions and can satisfy the needs of recognizing various complex characters. The first and second training sample sets are mixed and fed into the deep neural network, which learns to recognize the various noise and distortion characteristics. This solves the problem that recognizing text with a deep neural network requires a large amount of manual labeling and, while retaining the noise, distortion, and other complexities of the original pictures, uses a state-of-the-art deep neural network for automated classification.

The magnify-and-split processing: divide the captured image into nine regions, magnify the nine regions by factors of 5, 10, 15, 20, 25, 30, 40, and 50 in turn to obtain the magnified images, and combine the magnified images into an image group. The image group is fed into the deep neural network for feature extraction, which effectively improves the accuracy of feature extraction.

Feature extraction from the preprocessed images is performed by a deep neural network and comprises: using Inception_V3 structural units to compress the image in parallel; using multi-layer pooling units to compress the image in parallel and integrate features in parallel, extracting translation-invariant features to the greatest extent; using stacks of small filters in place of large filters; and using batch normalization to standardize the data internally, normalizing the outputs to a normal distribution between 0 and 1, which allows the network to train at a higher learning rate and prevents gradient explosion or vanishing.

Classifying the extracted features with the statistical classifier comprises: passing the extracted features through the classifier to recognize the evolution of character forms across different periods, using the softmax function as the statistical classifier. The predicted probability output by the model is

p_k = exp(s_k(x)) / Σ_{j=1..n} exp(s_j(x))

where p_k is the probability that the current instance belongs to class k, n is the total number of classes, s_k(x) is the score of the current instance x for class k, exp(·) exponentiates its argument, and the denominator is the sum of the exponentiated scores of instance x over all classes from 1 to n; k and j each range from 1 to n. Specifically, each input image used for prediction is an instance: the image currently input to the system passes through the feature-extraction layers of the network to the final layer, the softmax classification layer, where its probability of belonging to each class is computed. The total number of classes is known once the classification labels have been produced.

Compared with the related art, the deep-learning-based natural scene picture text recognition method provided by the present invention has the following beneficial effects:

By applying contrast processing to the captured image, a full-image attention weight map is obtained, containing one mapping point, with a corresponding weight value, for each pixel; the image is divided into original blocks, histogram equalization using the weight map and a preset gray-scale conversion formula computes a converted gray level for each pixel and replaces the original one, and the converted blocks are stitched into the processed image. By applying noise-suppression processing, a certain number of target test samples are selected, labeled manually, and divided into development samples and a first training sample set; a random sample generator, designed by analyzing the noise models and distortion characteristics that may appear in captured images, automatically generates from the standard forms of the selected font a large number of second training samples containing various complex noises and distortions, satisfying the needs of recognizing various complex characters, and the two sample sets are mixed and fed into the deep neural network. The captured image is also divided into nine regions, which are magnified by factors of 5, 10, 15, 20, 25, 30, 40, and 50 in turn, and the magnified images are combined into an image group. Together, the contrast, noise-suppression, and magnify-and-split processing enhance the image contrast effectively, yield higher-quality images, improve the recognition rate, reduce the influence of external environmental factors, and raise the feature-extraction rate.

The above is only an embodiment of the present invention and does not limit the scope of its patent. Any equivalent structure or equivalent process transformation made using the contents of the description and drawings of the present invention, or any direct or indirect application in other related technical fields, is likewise included within the scope of patent protection of the present invention.

Claims (8)

4. The deep-learning-based natural scene picture text recognition method of claim 3, wherein the contrast processing comprises: processing the acquired image to obtain a full-image attention weight map comprising a plurality of mapping points corresponding respectively to a plurality of pixels of the acquired image, each mapping point having a corresponding weight value; dividing the acquired image into a plurality of original blocks; performing histogram equalization on the plurality of original blocks using the full-image attention weight map and a preset gray-scale conversion formula to form a plurality of converted blocks, specifically by calculating, from the weight map and the conversion formula, the converted gray level corresponding to the original gray level of each pixel in each original block and replacing the original gray level with the converted gray level; and finally splicing the plurality of converted blocks to form a processed image.
5. The deep-learning-based natural scene picture text recognition method of claim 3, wherein the anti-noise processing comprises: selecting a certain number of target test samples; labeling them manually and dividing the labeled sample set into development samples and a first training sample set; designing a random sample generator by analyzing the noise models and distortion characteristics that may appear in the acquired images; automatically generating, on the basis of the standard characters of the selected font, a large number of second training samples usable for neural network training, the automatically generated second training sample set containing various complex noises and distortions and meeting the requirements of recognizing various complex characters; and mixing the first and second training sample sets and inputting them into the deep neural network.
CN201910406709.4A · 2019-05-16 · 2019-05-16 · A text recognition method for natural scene pictures based on deep learning · Pending · CN110188750A (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN201910406709.4A (CN110188750A (en)) | 2019-05-16 | 2019-05-16 | A text recognition method for natural scene pictures based on deep learning

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN201910406709.4A (CN110188750A (en)) | 2019-05-16 | 2019-05-16 | A text recognition method for natural scene pictures based on deep learning

Publications (1)

Publication Number | Publication Date
CN110188750A (en) | 2019-08-30

Family

ID=67716483

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN201910406709.4A (Pending, CN110188750A (en)) | A text recognition method for natural scene pictures based on deep learning | 2019-05-16 | 2019-05-16

Country Status (1)

Country | Link
CN (1) | CN110188750A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN111242024A (en)* | 2020-01-11 | 2020-06-05 | 北京中科辅龙科技股份有限公司 | Method and system for recognizing legends and characters in drawings based on machine learning
CN111539437A (en)* | 2020-04-27 | 2020-08-14 | 西南大学 | Detection and recognition method of oracle bone radicals based on deep learning
CN112508845A (en)* | 2020-10-15 | 2021-03-16 | 福州大学 | Depth learning-based automatic osd menu language detection method and system
CN113420647A (en)* | 2021-06-22 | 2021-09-21 | 南开大学 | Method for creating new style font by expanding and deforming Chinese character center of gravity outwards
CN117496531A (en)* | 2023-11-02 | 2024-02-02 | 四川轻化工大学 | Construction method of convolution self-encoder capable of reducing Chinese character recognition resource overhead

Citations (4)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US20150213313A1 (en)* | 2014-01-30 | 2015-07-30 | Abbyy Development Llc | Methods and systems for efficient automated symbol recognition using multiple clusters of symbol patterns
CN104966097A (en)* | 2015-06-12 | 2015-10-07 | 成都数联铭品科技有限公司 | Complex character recognition method based on deep learning
CN108664996A (en)* | 2018-04-19 | 2018-10-16 | 厦门大学 | A kind of ancient writing recognition methods and system based on deep learning
CN109658364A (en)* | 2018-11-29 | 2019-04-19 | 深圳市华星光电半导体显示技术有限公司 | Image processing method


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Xing Linjie: "Research on Text Image Analysis Methods Based on Deep Learning", China Master's Theses Full-text Database, Information Science and Technology *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN111242024A (en) * | 2020-01-11 | 2020-06-05 | Beijing Zhongke Fulong Technology Co., Ltd. | Method and system for recognizing legends and characters in drawings based on machine learning
CN111539437A (en) * | 2020-04-27 | 2020-08-14 | Southwest University | Detection and recognition method of oracle bone radicals based on deep learning
CN111539437B (en) * | 2020-04-27 | 2022-06-28 | Southwest University | Detection and identification method of oracle-bone inscription components based on deep learning
CN112508845A (en) * | 2020-10-15 | 2021-03-16 | Fuzhou University | Deep-learning-based automatic OSD menu language detection method and system
CN113420647A (en) * | 2021-06-22 | 2021-09-21 | Nankai University | Method for creating new-style fonts by outward expansion and deformation around the center of gravity of Chinese characters
CN113420647B (en) * | 2021-06-22 | 2022-05-20 | Nankai University | Method for creating new-style fonts by outward expansion and deformation around the center of gravity of Chinese characters
CN117496531A (en) * | 2023-11-02 | 2024-02-02 | Sichuan University of Science and Engineering | Construction method of a convolutional autoencoder that reduces the resource overhead of Chinese character recognition
CN117496531B (en) * | 2023-11-02 | 2024-05-24 | Sichuan University of Science and Engineering | A convolutional autoencoder construction method that reduces resource overhead for Chinese character recognition

Similar Documents

Publication | Title
CN108664996B (en) | A method and system for ancient text recognition based on deep learning
CN114038037B (en) | Expression label correction and identification method based on a separable residual attention network
CN104463195B (en) | Printed digit recognition method based on template matching
CN110188750A (en) | A text recognition method for natural scene pictures based on deep learning
CN114359998B (en) | Identification method for face masks in the wearing state
CN111126240A (en) | A three-channel feature fusion face recognition method
CN112069900A (en) | Bill character recognition method and system based on a convolutional neural network
CN113901952A (en) | Character recognition method separating printed and handwritten text based on deep learning
CN105608454A (en) | Text detection method and system based on a text structure component detection neural network
CN107545243A (en) | Face identification method based on a deep convolution model
CN104239872A (en) | Abnormal Chinese character identification method
CN108805223A (en) | Seal-script character recognition method and system based on Incep-CapsNet networks
Meng et al. | Ancient Asian character recognition for literature preservation and understanding
Li et al. | HEp-2 specimen classification via deep CNNs and pattern histogram
Zeng et al. | Zero-shot Chinese character recognition with stroke- and radical-level decompositions
Inunganbi et al. | Recognition of handwritten Meitei Mayek script based on texture feature
CN108280417 (en) | A fast finger-vein recognition method
CN102592149B (en) | Computer-aided periodization method for oracle bone rubbings
CN106203414B (en) | A scene picture text detection method based on discriminative dictionary learning and sparse representation
CN118429980A (en) | Evaluation method of calligraphy copying effect based on deep learning
CN106650629A (en) | Fast remote sensing target detection and recognition method based on kernel sparse representation
Bhatt et al. | Text extraction and recognition from visiting cards
CN114155613B (en) | Offline signature comparison method based on convenient sample acquisition
CN112115949B (en) | Optical character recognition method for tobacco certificates and orders
Gao | An enhanced neural network model based on VGG-16 for accurate recognition of oracle

Legal Events

Date | Code | Title | Description
PB01 | Publication
SE01 | Entry into force of request for substantive examination
RJ01 | Rejection of invention patent application after publication | Application publication date: 2019-08-30

