CN110188750A - A text recognition method for natural scene pictures based on deep learning - Google Patents

A text recognition method for natural scene pictures based on deep learning

Info

Publication number
CN110188750A
CN110188750A (application CN201910406709.4A)
Authority
CN
China
Prior art keywords
image
deep learning
natural scene
recognition method
processing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910406709.4A
Other languages
Chinese (zh)
Inventor
赵春阳
章奇
陈晓飞
欧杨磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Electronic Science and Technology University
Original Assignee
Hangzhou Electronic Science and Technology University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Electronic Science and Technology University
Priority to CN201910406709.4A, patent CN110188750A/en
Publication of CN110188750A, patent CN110188750A/en
Legal status: Pending (current)

Abstract

Translated from Chinese

The present invention provides a deep-learning-based method for recognizing text in natural scene pictures. The method comprises the following steps: S1: build a text image database and classify it; S2: capture images with an image collector and preprocess them; S3: extract features from the preprocessed images with a deep neural network and classify the extracted features with a statistical classifier; S4: use hand-designed features and template matching for auxiliary training. Through contrast processing, noise-suppression processing, and magnify-and-split processing, the method enhances image contrast effectively, yielding higher-quality images, while also improving the recognition rate, reducing the influence of external environmental factors, and raising the feature-extraction rate.

Description

Translated from Chinese
A text recognition method for natural scene pictures based on deep learning

Technical field

The present invention relates to the field of character recognition, and in particular to a deep-learning-based method for recognizing text in natural scene pictures.

Background

Text recognition is an important application area of pattern recognition. General text recognition methods were first explored in the 1950s, when optical character recognizers were developed. Practical machines using magnetic ink and special fonts appeared in the 1960s, and machines recognizing multiple fonts and handwriting emerged in the late 1960s, though their recognition accuracy and performance were poor. In the 1970s, research focused on the basic theory of character recognition and on building high-performance recognition machines, with particular emphasis on Chinese character recognition. Text recognition technology has improved substantially since then.

Current deep-learning-based methods for recognizing text in natural scene pictures cannot guarantee the accuracy of recognition and feature extraction, which hinders their wide adoption.

Therefore, a deep-learning-based text recognition method for natural scene pictures is needed to solve the above technical problems.

Summary of the invention

The present invention provides a deep-learning-based method for recognizing text in natural scene pictures, solving the problem that current methods cannot guarantee the accuracy of recognition and feature extraction.

To solve the above technical problems, the deep-learning-based natural scene picture text recognition method provided by the present invention comprises the following steps:

S1: build a text image database and classify it;

S2: capture images with an image collector and preprocess the captured images;

S3: extract features from the preprocessed images with a deep neural network, and classify the extracted features with a statistical classifier;

S4: use hand-designed features and template matching for auxiliary training, and perform matching.
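The four steps above reduce to a small amount of glue code. The sketch below is illustrative only: every stage callable (preprocessor, feature extractor, classifier, template matcher) is an injected stub, not the patent's actual implementation.

```python
def recognize(image, preprocess, extract_features, classify, match_template=None):
    """Glue for steps S2-S4: preprocess the captured image, extract features
    with the deep network, classify with the statistical classifier, and
    optionally refine the result with template matching (S4)."""
    features = extract_features(preprocess(image))   # S2, then S3 feature extraction
    label = classify(features)                       # S3 statistical classification
    if match_template is not None:                   # S4 auxiliary matching
        label = match_template(features, label)
    return label
```

With toy callables, `recognize("img", str.upper, len, lambda f: "class%d" % f)` runs the whole chain end to end.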

Preferably, the text image database uses regular-script characters, oracle-bone characters, and the like as index entries; the shapes of the characters from specified sources are scanned or photographed to obtain standard shapes, which are then given classification labels.

Preferably, the image preprocessing in S2 comprises contrast processing, denoising, split-and-strip processing, and database augmentation.

Preferably, the contrast processing: process the captured image to obtain a full-image attention weight map, which contains one mapping point, with a corresponding weight value, for each pixel of the captured image. Then divide the captured image into multiple original blocks and apply histogram equalization to each block using the attention weight map and a preset gray-scale conversion formula: for each pixel of each original block, compute from the weight map and the conversion formula the converted gray level corresponding to its original gray level, and replace the original gray level with the converted one, forming multiple converted blocks. Finally, stitch the converted blocks together to form the processed image.
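A minimal numpy sketch of this block-wise, attention-weighted histogram equalization. It assumes the full-image attention weight map is supplied externally, and a weighted cumulative histogram stands in for the patent's unspecified gray-scale conversion formula:

```python
import numpy as np

def weighted_block_equalize(img, weights, block=32, levels=256):
    """Block-wise histogram equalization in which each pixel's histogram
    contribution is scaled by its attention weight; blocks are equalized
    independently and stitched back together."""
    out = np.empty_like(img)
    h, w = img.shape
    for y in range(0, h, block):
        for x in range(0, w, block):
            tile = img[y:y + block, x:x + block]
            wt = weights[y:y + block, x:x + block]
            # Weighted histogram: high-attention pixels pull the mapping.
            hist = np.bincount(tile.ravel(), weights=wt.ravel(), minlength=levels)
            cdf = np.cumsum(hist)
            cdf = cdf / cdf[-1] if cdf[-1] > 0 else cdf
            # Gray-scale conversion table: original level -> converted level.
            lut = np.round(cdf * (levels - 1)).astype(img.dtype)
            out[y:y + block, x:x + block] = lut[tile]  # replace and stitch
    return out
```

With uniform weights this reduces to ordinary per-block histogram equalization; a non-uniform map biases the contrast stretch toward visually salient regions.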

Preferably, the noise-suppression processing: select a certain number of target test samples; label them manually and divide the labeled sample set into development samples and a first training sample set. By analyzing the noise models and distortion characteristics that may appear in the captured images, design a random sample generator that, starting from the standard forms of the selected font, automatically generates a large number of second training samples for neural network training. The automatically generated second training sample set contains various complex noises and distortions and can satisfy the needs of recognizing various complex characters. The first and second training sample sets are mixed and fed into the deep neural network.
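The random sample generator can be sketched as follows. The specific corruption models (Gaussian noise plus a small random translation) are illustrative stand-ins, since the patent only says the generator is designed from an analysis of the captured images:

```python
import numpy as np

rng = np.random.default_rng(0)

def synthesize_samples(glyph, n, rng=rng):
    """Generate n training variants of a clean standard-font glyph by
    adding noise and a small geometric perturbation, mimicking the
    patent's automatic second-training-sample generator."""
    out = []
    for _ in range(n):
        noisy = glyph.astype(np.float32) + rng.normal(0.0, 12.0, glyph.shape)
        dy, dx = rng.integers(-2, 3, size=2)              # small random shift
        shifted = np.roll(np.roll(noisy, dy, axis=0), dx, axis=1)
        out.append(np.clip(shifted, 0, 255).astype(np.uint8))
    return out
```

In practice one would mix these synthetic samples with the manually labeled first training set before feeding the network, as the text describes.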

Preferably, the magnify-and-split processing: divide the captured image into nine regions, magnify the nine regions by factors of 5, 10, 15, 20, 25, 30, 40, and 50 in turn to obtain the magnified images, and combine the magnified images into an image group.
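A sketch of the magnify-and-split step, using nearest-neighbour magnification via `np.repeat`. Note that the patent lists eight magnification factors for nine regions; this sketch reuses the last factor for the ninth region, which is an assumption on my part:

```python
import numpy as np

FACTORS = (5, 10, 15, 20, 25, 30, 40, 50)  # eight factors given for nine regions

def split_and_magnify(img, factors=FACTORS):
    """Split an image into a 3x3 grid and magnify each region by the next
    factor in sequence (nearest-neighbour), returning the image group."""
    h, w = img.shape
    ys = [0, h // 3, 2 * h // 3, h]
    xs = [0, w // 3, 2 * w // 3, w]
    group = []
    for i in range(3):
        for j in range(3):
            f = factors[min(3 * i + j, len(factors) - 1)]
            region = img[ys[i]:ys[i + 1], xs[j]:xs[j + 1]]
            group.append(np.repeat(np.repeat(region, f, axis=0), f, axis=1))
    return group
```

The resulting image group is what gets fed to the network for feature extraction.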

Preferably, feature extraction from the preprocessed images is performed by a deep neural network and comprises: using Inception_V3 structural units to compress the image in parallel; using multi-layer pooling units to compress the image in parallel and integrate features in parallel, extracting translation-invariant features to the greatest extent; using stacks of small filters in place of large filters; and using batch normalization to standardize the data internally, normalizing the outputs to a normal distribution between 0 and 1.
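The batch-normalization step can be illustrated in a few lines of numpy. This is per-feature standardization over a batch; the learnable scale/shift parameters and running statistics of a full implementation are omitted:

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    """Standardize each feature column of a batch to zero mean and
    (approximately) unit variance, as used to keep activations in a
    well-conditioned range during training."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    return (x - mean) / np.sqrt(var + eps)
```

Keeping activations standardized like this is what lets the network train at a higher learning rate without gradients exploding or vanishing, as the text notes below.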

Preferably, classifying the extracted features with the statistical classifier comprises: passing the extracted features through the classifier to recognize the evolution of character forms across different periods, using the softmax function as the statistical classifier. The predicted probability output by the model is

p_k = exp(s_k(x)) / Σ_{j=1..n} exp(s_j(x))

where p_k is the probability that the current instance belongs to class k, n is the total number of classes, s_k(x) is the score of the current instance x for class k, exp(·) exponentiates its argument, and the denominator is the sum of the exponentiated scores of instance x over all classes from 1 to n; k and j each range from 1 to n.
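A direct implementation of the softmax classifier described above, with the usual max-subtraction for numerical stability (which leaves the probabilities unchanged):

```python
import numpy as np

def softmax(scores):
    """Convert class scores s_k(x) into probabilities
    p_k = exp(s_k) / sum_j exp(s_j)."""
    z = np.asarray(scores, dtype=float)
    z = z - z.max()          # stability shift; cancels in the ratio
    e = np.exp(z)
    return e / e.sum()
```

For scores [1, 2, 3] the highest-scoring class receives the highest probability, and the probabilities sum to one.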

Compared with the related art, the deep-learning-based natural scene picture text recognition method provided by the present invention has the following beneficial effects:

The present invention provides a deep-learning-based method for recognizing text in natural scene pictures. Contrast processing is applied to the captured image to obtain a full-image attention weight map containing one mapping point, with a corresponding weight value, for each pixel; the image is divided into original blocks, histogram equalization using the weight map and a preset gray-scale conversion formula computes a converted gray level for each pixel and replaces the original one, and the converted blocks are stitched into the processed image. Noise-suppression processing selects a certain number of target test samples, labels them manually, and divides them into development samples and a first training sample set; a random sample generator, designed by analyzing the noise models and distortion characteristics that may appear in captured images, automatically generates from the standard forms of the selected font a large number of second training samples containing various complex noises and distortions, satisfying the needs of recognizing various complex characters, and the two sample sets are mixed and fed into the deep neural network. The captured image is also divided into nine regions, which are magnified by factors of 5, 10, 15, 20, 25, 30, 40, and 50 in turn and combined into an image group. Together, the contrast, noise-suppression, and magnify-and-split processing enhance the image contrast effectively, yield higher-quality images, improve the recognition rate, reduce the influence of external environmental factors, and raise the feature-extraction rate.

Brief description of the drawings

Fig. 1 is a schematic block diagram of a preferred embodiment of the deep-learning-based natural scene picture text recognition method provided by the present invention;

Fig. 2 is a schematic block diagram of the image preprocessing of the deep-learning-based natural scene picture text recognition method provided by the present invention.

Detailed description

The present invention is further described below with reference to the accompanying drawings and embodiments.

Referring to Fig. 1 and Fig. 2: Fig. 1 is a schematic block diagram of a preferred embodiment of the deep-learning-based natural scene picture text recognition method provided by the present invention, and Fig. 2 is a schematic block diagram of its image preprocessing. The method comprises the following steps:

S1: build a text image database and classify it;

S2: capture images with an image collector and preprocess the captured images;

S3: extract features from the preprocessed images with a deep neural network, and classify the extracted features with a statistical classifier;

S4: use hand-designed features and template matching for auxiliary training, and perform matching; template matching methods mainly include cosine similarity and Euclidean distance.
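The two template-matching measures named in S4 can be sketched directly. The feature vectors and template dictionary here are illustrative placeholders:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (1.0 = identical direction)."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def euclidean_distance(a, b):
    """Straight-line distance between two feature vectors (0.0 = identical)."""
    return float(np.linalg.norm(np.asarray(a, float) - np.asarray(b, float)))

def match_template(feature, templates):
    """Return the label of the template most similar to the feature vector,
    using cosine similarity as the matching score."""
    return max(templates, key=lambda k: cosine_similarity(feature, templates[k]))
```

Cosine similarity ranks candidates by direction of the feature vector, while Euclidean distance also penalizes magnitude differences; either can drive the auxiliary matching step.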

The text image database uses regular-script characters, oracle-bone characters, and the like as index entries; the shapes of the characters from specified sources are scanned or photographed to obtain standard shapes, which are then given classification labels.

The image preprocessing in S2 comprises contrast processing, denoising, split-and-strip processing, and database augmentation.

The contrast processing: process the captured image to obtain a full-image attention weight map containing one mapping point, with a corresponding weight value, for each pixel of the captured image. Then divide the captured image into multiple original blocks and apply histogram equalization to each block using the attention weight map and a preset gray-scale conversion formula: for each pixel of each original block, compute from the weight map and the conversion formula the converted gray level corresponding to its original gray level and replace the original gray level with the converted one, forming multiple converted blocks; finally, stitch the converted blocks together to form the processed image. This effectively avoids the poor contrast enhancement that results when large areas of similar pixels occupy a wide range of gray values, and it dynamically boosts the contrast of each original block according to the degree of human visual attention, so the final processed image shows a better contrast improvement and higher quality.

The noise-suppression processing: select a certain number of target test samples (for example, 1000 pictures); label them manually and divide the labeled sample set into development samples and a first training sample set (for example, 30% of the labeled samples as development samples and 70% as first training samples). By analyzing the noise models and distortion characteristics that may appear in the captured images, design a random sample generator that, starting from the standard forms of the selected font, automatically generates a large number of second training samples for neural network training; the automatically generated second training sample set contains various complex noises and distortions and can satisfy the needs of recognizing various complex characters. The first and second training sample sets are mixed and fed into the deep neural network, which learns to recognize the various noise and distortion characteristics. This solves the problem that recognizing text with a deep neural network requires a large amount of manual labeling and, while retaining the noise, distortion, and other complexities of the original pictures, uses a state-of-the-art deep neural network for automated classification.

The magnify-and-split processing: divide the captured image into nine regions, magnify the nine regions by factors of 5, 10, 15, 20, 25, 30, 40, and 50 in turn to obtain the magnified images, and combine the magnified images into an image group. The image group is fed into the deep neural network for feature extraction, which effectively improves the accuracy of feature extraction.

Feature extraction from the preprocessed images is performed by a deep neural network and comprises: using Inception_V3 structural units to compress the image in parallel; using multi-layer pooling units to compress the image in parallel and integrate features in parallel, extracting translation-invariant features to the greatest extent; using stacks of small filters in place of large filters; and using batch normalization to standardize the data internally, normalizing the outputs to a normal distribution between 0 and 1, which allows the network to train at a higher learning rate and prevents gradient explosion or vanishing.

Classifying the extracted features with the statistical classifier comprises: passing the extracted features through the classifier to recognize the evolution of character forms across different periods, using the softmax function as the statistical classifier. The predicted probability output by the model is

p_k = exp(s_k(x)) / Σ_{j=1..n} exp(s_j(x))

where p_k is the probability that the current instance belongs to class k, n is the total number of classes, s_k(x) is the score of the current instance x for class k, exp(·) exponentiates its argument, and the denominator is the sum of the exponentiated scores of instance x over all classes from 1 to n; k and j each range from 1 to n. Specifically, each input image used for prediction is an instance: the image currently input to the system passes through the feature-extraction layers of the network to the final layer, the softmax classification layer, where its probability of belonging to each class is computed. The total number of classes is known once the classification labels have been produced.

Compared with the related art, the deep-learning-based natural scene picture text recognition method provided by the present invention has the following beneficial effects:

By applying contrast processing to the captured image, a full-image attention weight map is obtained, containing one mapping point, with a corresponding weight value, for each pixel; the image is divided into original blocks, histogram equalization using the weight map and a preset gray-scale conversion formula computes a converted gray level for each pixel and replaces the original one, and the converted blocks are stitched into the processed image. By applying noise-suppression processing, a certain number of target test samples are selected, labeled manually, and divided into development samples and a first training sample set; a random sample generator, designed by analyzing the noise models and distortion characteristics that may appear in captured images, automatically generates from the standard forms of the selected font a large number of second training samples containing various complex noises and distortions, satisfying the needs of recognizing various complex characters, and the two sample sets are mixed and fed into the deep neural network. The captured image is also divided into nine regions, which are magnified by factors of 5, 10, 15, 20, 25, 30, 40, and 50 in turn, and the magnified images are combined into an image group. Together, the contrast, noise-suppression, and magnify-and-split processing enhance the image contrast effectively, yield higher-quality images, improve the recognition rate, reduce the influence of external environmental factors, and raise the feature-extraction rate.

The above is only an embodiment of the present invention and does not limit the scope of its patent. Any equivalent structure or equivalent process transformation made using the contents of the description and drawings of the present invention, or any direct or indirect application in other related technical fields, is likewise included within the scope of patent protection of the present invention.

Claims (8)

4. The deep-learning-based natural scene picture text recognition method of claim 3, wherein the contrast processing comprises: processing the acquired image to obtain a full-image attention weight map comprising a plurality of mapping points corresponding respectively to a plurality of pixels of the acquired image, each mapping point having a corresponding weight value; dividing the acquired image into a plurality of original blocks; performing histogram equalization on the plurality of original blocks using the full-image attention weight map and a preset gray-scale conversion formula to form a plurality of converted blocks, specifically by calculating, from the weight map and the conversion formula, the converted gray level corresponding to the original gray level of each pixel in each original block and replacing the original gray level with the converted gray level; and finally splicing the plurality of converted blocks to form a processed image.
5. The deep-learning-based natural scene picture text recognition method of claim 3, wherein the anti-noise processing comprises: selecting a certain number of target test samples; labeling them manually and dividing the labeled sample set into development samples and a first training sample set; designing a random sample generator by analyzing the noise models and distortion characteristics that may appear in the acquired images; automatically generating, on the basis of the standard characters of the selected font, a large number of second training samples usable for neural network training, the automatically generated second training sample set containing various complex noises and distortions and meeting the requirements of recognizing various complex characters; and mixing the first and second training sample sets and inputting them into the deep neural network.
CN201910406709.4A · 2019-05-16 · 2019-05-16 · A text recognition method for natural scene pictures based on deep learning · Pending · CN110188750A (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN201910406709.4A (CN110188750A (en)) | 2019-05-16 | 2019-05-16 | A text recognition method for natural scene pictures based on deep learning

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN201910406709.4A (CN110188750A (en)) | 2019-05-16 | 2019-05-16 | A text recognition method for natural scene pictures based on deep learning

Publications (1)

Publication Number | Publication Date
CN110188750A (en) | 2019-08-30

Family

ID=67716483

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN201910406709.4A (Pending, CN110188750A (en)) | A text recognition method for natural scene pictures based on deep learning | 2019-05-16 | 2019-05-16

Country Status (1)

Country | Link
CN (1) | CN110188750A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN111242024A (en)* | 2020-01-11 | 2020-06-05 | 北京中科辅龙科技股份有限公司 | Method and system for recognizing legends and characters in drawings based on machine learning
CN111539437A (en)* | 2020-04-27 | 2020-08-14 | 西南大学 | Detection and recognition method of oracle bone radicals based on deep learning
CN112508845A (en)* | 2020-10-15 | 2021-03-16 | 福州大学 | Depth learning-based automatic osd menu language detection method and system
CN113420647A (en)* | 2021-06-22 | 2021-09-21 | 南开大学 | Method for creating new style font by expanding and deforming Chinese character center of gravity outwards
CN117496531A (en)* | 2023-11-02 | 2024-02-02 | 四川轻化工大学 | Construction method of convolution self-encoder capable of reducing Chinese character recognition resource overhead

Citations (4)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US20150213313A1 (en)* | 2014-01-30 | 2015-07-30 | Abbyy Development Llc | Methods and systems for efficient automated symbol recognition using multiple clusters of symbol patterns
CN104966097A (en)* | 2015-06-12 | 2015-10-07 | 成都数联铭品科技有限公司 | Complex character recognition method based on deep learning
CN108664996A (en)* | 2018-04-19 | 2018-10-16 | 厦门大学 | A kind of ancient writing recognition methods and system based on deep learning
CN109658364A (en)* | 2018-11-29 | 2019-04-19 | 深圳市华星光电半导体显示技术有限公司 | Image processing method


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Xing Linjie: "Research on Text Image Analysis Methods Based on Deep Learning", China Master's Theses Full-text Database, Information Science and Technology *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN111242024A (en) * | 2020-01-11 | 2020-06-05 | Beijing Zhongke Fulong Technology Co., Ltd. | Method and system for recognizing legends and characters in drawings based on machine learning
CN111539437A (en) * | 2020-04-27 | 2020-08-14 | Southwest University | Detection and recognition method of oracle bone radicals based on deep learning
CN111539437B (en) * | 2020-04-27 | 2022-06-28 | Southwest University | Detection and identification method of oracle-bone inscription components based on deep learning
CN112508845A (en) * | 2020-10-15 | 2021-03-16 | Fuzhou University | Deep-learning-based automatic OSD menu language detection method and system
CN113420647A (en) * | 2021-06-22 | 2021-09-21 | Nankai University | Method for creating new-style fonts by outward expansion and deformation around the center of gravity of Chinese characters
CN113420647B (en) * | 2021-06-22 | 2022-05-20 | Nankai University | Method for creating new-style fonts by outward expansion and deformation around the center of gravity of Chinese characters
CN117496531A (en) * | 2023-11-02 | 2024-02-02 | Sichuan University of Science and Engineering | Construction method of a convolutional autoencoder that reduces the resource overhead of Chinese character recognition
CN117496531B (en) * | 2023-11-02 | 2024-05-24 | Sichuan University of Science and Engineering | A convolutional autoencoder construction method that reduces resource overhead for Chinese character recognition

Similar Documents

Publication | Title
CN108664996B (en) | A method and system for ancient text recognition based on deep learning
CN114038037B (en) | Expression label correction and identification method based on a separable residual attention network
CN104463195B (en) | Printed digit recognition method based on template matching
CN110188750A (en) | A text recognition method for natural scene pictures based on deep learning
CN114359998B (en) | Identification method for face masks in the wearing state
CN111126240A (en) | A three-channel feature fusion face recognition method
CN112069900A (en) | Bill character recognition method and system based on a convolutional neural network
CN113901952A (en) | Character recognition method separating printed and handwritten text based on deep learning
CN105608454A (en) | Text detection method and system based on a text structure component detection neural network
CN107545243A (en) | Face identification method based on a deep convolution model
CN104239872A (en) | Abnormal Chinese character identification method
CN108805223A (en) | Seal-script character recognition method and system based on Incep-CapsNet networks
Meng et al. | Ancient Asian character recognition for literature preservation and understanding
Li et al. | HEp-2 specimen classification via deep CNNs and pattern histogram
Zeng et al. | Zero-shot Chinese character recognition with stroke- and radical-level decompositions
Inunganbi et al. | Recognition of handwritten Meitei Mayek script based on texture feature
CN108280417 (en) | A fast finger-vein recognition method
CN102592149B (en) | Computer-aided periodization method for oracle bone rubbings
CN106203414B (en) | A scene picture text detection method based on discriminative dictionary learning and sparse representation
CN118429980A (en) | Evaluation method of calligraphy copying effect based on deep learning
CN106650629A (en) | Fast remote sensing target detection and recognition method based on kernel sparse representation
Bhatt et al. | Text extraction and recognition from visiting cards
CN114155613B (en) | Offline signature comparison method based on convenient sample acquisition
CN112115949B (en) | Optical character recognition method for tobacco certificates and orders
Gao | An enhanced neural network model based on VGG-16 for accurate recognition of oracle

Legal Events

Date | Code | Title | Description
PB01 | Publication
SE01 | Entry into force of request for substantive examination
RJ01 | Rejection of invention patent application after publication | Application publication date: 2019-08-30

