CN111314697B


Info

Publication number: CN111314697B
Application number: CN202010116219.3A
Authority: CN
Inventors: 张昊; 傅枧根; 钟培雄
Original assignee: Central South University
Current assignee: Central South University
Priority date: 2020-02-25
Filing date: 2020-02-25
Publication date: 2021-10-15
Anticipated expiration: 2040-02-25
Also published as: CN111314697A

Abstract

The invention discloses a code rate setting method, device and storage medium for optical character recognition. The method first down-samples a picture; second, encodes the down-sampled picture multiple times to find the optimal QP/Rate value (the lowest code rate value/quantization coefficient value) at which the down-sampled picture can still be correctly recognized; then obtains a code rate increment M/quantization coefficient increment N from a confidence neural network; and finally quickly finds the optimal encoding value for the original picture, which is the lowest code rate value/lowest quantization coefficient value that leaves the optical character recognition accuracy of the original picture unaffected. Compared with the prior art, the invention not only reduces network transmission bandwidth but also, for collections of hundreds of millions of pictures, reduces storage space and therefore cost. The invention also covers the application scenarios of mainstream hybrid coding architectures and can use any image coding standard or the intra-frame coding mode of any video coding standard, so it is widely applicable.

Description

Code rate setting method, equipment and storage medium for optical character recognition

Technical Field

The invention relates to the technical field of video coding and deep learning, in particular to a code rate setting method, equipment and a storage medium for optical character recognition.

Background

With the continuous development of artificial intelligence technology, it has become common to collect data on a mobile terminal, perform simple processing there, and then transmit the data onward for intelligent analysis. Face recognition and optical character recognition (OCR) in particular are widely used. Transmitting large numbers of images consumes a large amount of bandwidth. To save data-network bandwidth, the code rate (Rate) of the image data needs to be set so that it is as low as possible (minimizing the bandwidth consumed) while affecting image quality little enough that OCR still performs well. In addition, even in application scenarios where OCR is performed directly in the cloud or on a local server without network transmission, hundreds of millions of pictures occupy a large amount of storage space. To reduce picture storage and cost, the picture size likewise needs to be controlled by fast encoding, using a code rate as low as possible (i.e. a picture as small as possible) without affecting the OCR result.

Common conventional image encoding methods include JPEG and JPEG2000. In recent years, the intra-frame coding modes of video coding standards have also been used for image coding, achieving better coding efficiency than conventional methods such as JPEG. Standards such as H.264, HEVC, VVC, AVS2, AVS3 and AV1 adopt a hybrid coding architecture aimed mainly at video coding, but their intra-frame coding is gradually being applied to image coding as well. How to reduce the picture code rate as much as possible across these coding standards while preserving optical character recognition accuracy remains an open problem.

Disclosure of Invention

The present invention aims to solve at least one of the technical problems in the prior art, and to this end provides a code rate setting method, device and storage medium for optical character recognition.

According to an embodiment of the present invention, there is provided a code rate setting method for optical character recognition, including the steps of:

setting a Rate interval for an original picture, and down-sampling the original picture to obtain a down-sampled picture;

calculating an optimal Rate value of the down-sampled picture within the Rate interval, wherein the optimal Rate value is the minimum value in the Rate interval that satisfies the following condition: the down-sampled picture, encoded at that Rate value and then decoded, is correctly recognized;

inputting the down-sampled picture into a confidence neural network and performing confidence prediction to obtain a code rate increment M;

setting the optimal Rate value plus n code rate increments M as the optimal coding value of the original picture, wherein n satisfies the following conditions: the original picture encoded at the optimal Rate value plus n code rate increments M is correctly recognized after decoding, and the original picture encoded at the optimal Rate value plus n+1 code rate increments M is not correctly recognized after decoding.

The code rate setting method for optical character recognition provided by the embodiment of the invention at least has the following beneficial effects:

(1) The method first down-samples the picture; second, encodes the down-sampled picture multiple times to obtain the optimal Rate value (the lowest code rate value) at which the down-sampled picture is correctly recognized; third, obtains a code rate increment M from a confidence neural network; and finally uses the optimal Rate value and the code rate increment M to quickly find the optimal coding value for the original picture, which is the lowest code rate value that leaves the optical character recognition accuracy of the original picture unaffected. Compared with the prior art, the method reduces the time spent on encoding.

(2) The method not only reduces network transmission bandwidth but also, for collections of hundreds of millions of pictures, reduces storage space and therefore cost.

(3) The method covers the application scenarios of mainstream hybrid coding architectures and can use any image coding standard or the intra-frame coding mode of any video coding standard, so it is widely applicable.

According to the code rate setting method for optical character recognition in the embodiment of the invention, the optimal Rate value of the down-sampled picture is obtained by the bisection method.

According to the code rate setting method for optical character recognition in the embodiment of the invention, setting the Rate interval of the original picture includes:

setting the Rate interval of the original picture according to the coding standard to be used, or according to the size of the original picture or the bandwidth.

According to the code rate setting method for optical character recognition in the embodiment of the invention, the down-sampling factor applied to the original picture is 0.25.

According to an embodiment of the present invention, there is further provided a code rate setting method for optical character recognition, including the steps of:

setting a QP interval for an original picture, and down-sampling the original picture to obtain a down-sampled picture;

calculating an optimal QP value of the down-sampled picture within the QP interval, wherein the optimal QP value is the minimum value in the QP interval that satisfies the following condition: the down-sampled picture, encoded at that QP value and then decoded, is correctly recognized;

inputting the down-sampled picture into a confidence neural network and performing confidence prediction to obtain a quantization coefficient increment N;

setting the optimal QP value plus n quantization coefficient increments N as the optimal coding value of the original picture, wherein n satisfies the following conditions: the original picture encoded at the optimal QP value plus n quantization coefficient increments N is correctly recognized after decoding, and the original picture encoded at the optimal QP value plus n+1 quantization coefficient increments N is not correctly recognized after decoding.

This code rate setting method for optical character recognition at least has the following beneficial effects:

(1) The method first down-samples the picture; second, encodes the down-sampled picture multiple times to obtain the optimal QP value (the lowest quantization coefficient value) at which the down-sampled picture is correctly recognized; third, obtains a quantization coefficient increment N from a confidence neural network; and finally uses the optimal QP value and the quantization coefficient increment N to quickly find the optimal coding value for the original picture, which is the lowest quantization coefficient value that leaves the optical character recognition accuracy of the original picture unaffected. Compared with the prior art, the method reduces the time spent on encoding.

According to the code rate setting method for optical character recognition in the embodiment of the invention, the optimal QP value of the down-sampled picture is obtained by the bisection method.

According to the code rate setting method for optical character recognition in the embodiment of the invention, setting the QP interval of the original picture includes:

setting the QP interval of the original picture according to the coding standard to be used, or according to the size of the original picture or the bandwidth.

According to an embodiment of the present invention, there is provided a code rate setting device for optical character recognition, including: at least one control processor and a memory communicatively connected to the at least one control processor; the memory stores instructions executable by the at least one control processor to enable the at least one control processor to perform the code rate setting method for optical character recognition described above.

According to an embodiment of the present invention, there is provided a computer-readable storage medium storing computer-executable instructions for causing a computer to execute a code rate setting method for optical character recognition as described above.

Drawings

The invention is further described below with reference to the accompanying drawings and embodiments:

FIG. 1 is a schematic flowchart of a code rate setting method for optical character recognition according to a first embodiment of the present invention;

FIG. 2 is a detailed flowchart of step S102 in FIG. 1;

FIG. 3 is a schematic flowchart of a code rate setting method for optical character recognition according to a second embodiment of the present invention;

FIG. 4 is a detailed flowchart of step S202 in FIG. 3;

FIG. 5 is a schematic structural diagram of a code rate setting device for optical character recognition according to a fifth embodiment of the present invention.

Detailed Description

The technical solutions of the embodiments of the present disclosure will be described clearly and completely with reference to the accompanying drawings, and it is to be understood that the described embodiments are only a part of the embodiments of the present disclosure, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the disclosure without making any creative effort, shall fall within the protection scope of the disclosure. It should be noted that the features of the embodiments and examples of the present disclosure may be combined with each other without conflict. In addition, the purpose of the drawings is to graphically supplement the description in the written portion of the specification so that a person can intuitively and visually understand each technical feature and the whole technical solution of the present disclosure, but it should not be construed as limiting the scope of the present disclosure.

Referring to FIG. 1 and FIG. 2, a first embodiment of the present invention provides a code rate setting method for optical character recognition, including the following steps:

S101, setting a Rate interval for an original picture, and down-sampling the original picture to obtain a down-sampled picture;

As an alternative implementation, the Rate interval of the original picture may be set according to the coding standard to be used, or according to the size of the original picture or the bandwidth, and may be adjusted to suit the actual situation. For example, when the picture is encoded with the H.264 coding standard software JM, the Rate interval may be set to [100, 5000].

As an alternative implementation, the original picture is down-sampled by a factor of 0.25. Compared with down-sampling by a factor of 0.1 or 0.5, a factor of 0.25 is preferred because it reduces the picture size substantially while still avoiding blurring of the picture.

S102, obtaining an optimal Rate value of the down-sampled picture by the bisection method, wherein the optimal Rate value is the minimum value in the Rate interval that satisfies the following condition: the down-sampled picture, encoded at that Rate value and then decoded, is correctly recognized;

It should be noted that the optimal Rate value of the down-sampled picture may also be obtained by successive coding. This embodiment prefers the bisection method because it finds the optimal Rate value faster, and the gain in efficiency is especially large when the Rate interval is wide.

The specific steps for obtaining the optimal Rate value of the down-sampled picture by the bisection method are as follows:

S1021, encoding the down-sampled picture at the middle value of the Rate interval;

S1022, decoding the encoded down-sampled picture and then performing optical character recognition (recognition can be performed on the mobile terminal using an optical character recognition model);

S1023, if a correct recognition result is obtained, updating the Rate interval by taking its middle value as the new right endpoint; if a correct recognition result is not obtained, updating the Rate interval by taking its middle value as the new left endpoint;

S1024, if the difference between the Rate values at the left and right endpoints of the updated Rate interval is greater than 1, returning to step S1021; if the difference is less than or equal to 1, going to step S1025;

S1025, if the right endpoint of the Rate interval yields a correct recognition result, adopting the Rate value of the right endpoint as the optimal Rate value; otherwise, adopting the Rate value of the left endpoint as the optimal Rate value;

S103, inputting the down-sampled picture into a confidence neural network and performing confidence prediction to obtain a code rate increment M;

In this step, the down-sampled picture is input into a deep-learning confidence neural network to obtain a confidence value that the down-sampled picture can be correctly recognized, and confidence prediction then yields the code rate increment M. The confidence prediction is computed by a function whose input is the confidence value and whose output is the code rate increment M. For example, a confidence value of 90 gives M = 5, and a confidence value of 80 gives M = 4. The function can be set according to actual conditions.

S104, setting the optimal Rate value plus n code rate increments M as the optimal coding value of the original picture, wherein n satisfies the following conditions: the original picture encoded at the optimal Rate value plus n code rate increments M is correctly recognized after decoding, and the original picture encoded at the optimal Rate value plus n+1 code rate increments M is not correctly recognized after decoding.
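
The bisection of steps S1021 to S1025 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the invention: the `recognized` callback is a hypothetical stand-in for "encode the down-sampled picture at this Rate, decode it, run OCR and check the result", the function name is illustrative, and Rate values are assumed to be integers.

```python
from typing import Callable


def find_optimal_rate(low: int, high: int,
                      recognized: Callable[[int], bool]) -> int:
    """Bisection over the integer Rate interval [low, high] (S1021-S1025).

    `recognized(rate)` is a hypothetical callback: it should encode the
    down-sampled picture at `rate`, decode it, run OCR, and return True
    when the recognition result is correct.
    """
    if low > high:
        raise ValueError("low must not exceed high")
    while high - low > 1:           # S1024: stop once the width is <= 1
        mid = (low + high) // 2     # S1021: middle value of the interval
        if recognized(mid):         # S1022: encode, decode, recognize
            high = mid              # S1023: middle becomes right endpoint
        else:
            low = mid               # S1023: middle becomes left endpoint
    # S1025: prefer the right endpoint if it is recognized, else the left.
    return high if recognized(high) else low


# Toy stand-in for the codec and OCR: pretend recognition succeeds
# exactly when the Rate is at least 1234.
best = find_optimal_rate(100, 5000, lambda rate: rate >= 1234)
```

As in step S1025, the left endpoint is returned without a further check when the right endpoint fails, so a caller may want to verify the returned value separately.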

In the method provided by this embodiment, the original picture is first down-sampled; next, a bisection search over Rate values is performed on the down-sampled picture to quickly obtain the optimal Rate value; the search then continues on the original picture, starting from that optimal Rate value, to obtain the optimal coding value, which is the lowest code rate value at which the original picture still meets the required optical character recognition accuracy. The method not only reduces network transmission bandwidth but also, for collections of hundreds of millions of pictures, reduces storage space and therefore cost. It can also be applied to any current image coding standard or to the intra-frame coding mode of any video coding standard, so it is widely applicable.
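
Step S103 leaves open the function that maps the confidence value to the code rate increment M and gives only two sample points (a confidence of 90 gives M = 5, a confidence of 80 gives M = 4). One hypothetical choice that reproduces those two points is a step function with one unit of M per ten confidence points. The formula, the lower bound of 1, the assumed 0 to 100 confidence scale and the name `rate_increment` are all assumptions made for illustration.

```python
def rate_increment(confidence: float) -> int:
    """Map a confidence value to a code rate increment M.

    Assumption: confidence lies on a 0..100 scale, as the sample values
    90 and 80 in step S103 suggest.
    """
    if not 0 <= confidence <= 100:
        raise ValueError("confidence is assumed to lie in [0, 100]")
    # Assumed formula: one unit of M per 10 confidence points, offset so
    # that 90 -> 5 and 80 -> 4, matching the two examples in the text.
    m = int(confidence // 10) - 4
    # Assumed lower bound of 1 so that a later stepping search always
    # moves forward.
    return max(1, m)
```

Any other function fitted to real data could be substituted; the text only requires that the confidence value is the input and the increment is the output.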

Referring to FIG. 3 and FIG. 4, a second embodiment of the present invention provides a code rate setting method for optical character recognition, including the following steps:

S201, setting a QP (quantization coefficient) interval for an original picture, and down-sampling the original picture to obtain a down-sampled picture;

As an alternative implementation, the QP interval of the original picture may be set according to the coding standard to be used, or according to the size of the original picture or the bandwidth, and may be adjusted to suit the actual situation. For example, when the picture is encoded with the H.264 coding standard software JM, the QP interval may be set to [10, 40].

S202, obtaining an optimal QP value of the down-sampled picture by the bisection method, wherein the optimal QP value is the minimum value in the QP interval that satisfies the following condition: the down-sampled picture, encoded at that QP value and then decoded, is correctly recognized;

It should be noted that the optimal QP value of the down-sampled picture may also be obtained by successive coding. This embodiment prefers the bisection method because it finds the optimal QP value faster, and the gain in efficiency is especially large when the QP interval is wide.

The specific steps for obtaining the optimal QP value of the down-sampled picture by the bisection method are as follows:

S2021, encoding the down-sampled picture at the middle value of the QP interval;

S2022, decoding the encoded down-sampled picture and then performing optical character recognition (recognition can be performed on the mobile terminal using an optical character recognition model);

S2023, if a correct recognition result is obtained, updating the QP interval by taking its middle value as the new right endpoint; if a correct recognition result is not obtained, updating the QP interval by taking its middle value as the new left endpoint;

S2024, if the difference between the QP values at the left and right endpoints of the updated QP interval is greater than 1, returning to step S2021; if the difference is less than or equal to 1, going to step S2025;

S2025, if the right endpoint of the QP interval yields a correct recognition result, adopting the QP value of the right endpoint as the optimal QP value; otherwise, adopting the QP value of the left endpoint as the optimal QP value;

S203, inputting the down-sampled picture into a confidence neural network and performing confidence prediction to obtain a quantization coefficient increment N;

In this step, the down-sampled picture is input into a deep-learning confidence neural network to obtain a confidence value that the down-sampled picture can be correctly recognized, and confidence prediction then yields the quantization coefficient increment N. The confidence prediction is again computed by a function that can be set according to the actual situation; it is set on the same principle as in the first embodiment and is not described again here.

S204, setting the optimal QP value plus n quantization coefficient increments N as the optimal coding value of the original picture, wherein n satisfies the following conditions: the original picture encoded at the optimal QP value plus n quantization coefficient increments N is correctly recognized after decoding, and the original picture encoded at the optimal QP value plus n+1 quantization coefficient increments N is not correctly recognized after decoding.

In the method provided by this embodiment, the original picture is first down-sampled; next, a bisection search over QP values is performed on the down-sampled picture to quickly obtain the optimal QP value; the search then continues on the original picture, starting from that optimal QP value, to obtain the optimal coding value, which is the lowest quantization coefficient value at which the original picture still meets the required optical character recognition accuracy. The method not only reduces network transmission bandwidth but also, for collections of hundreds of millions of pictures, reduces storage space and therefore cost. It can also be applied to any current image coding standard or to the intra-frame coding mode of any video coding standard, so it is widely applicable.
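
The efficiency gain of bisection over successive coding can be estimated with a short calculation. The estimate below is an illustration rather than a figure from the source; it assumes integer candidate values in an interval [a, b] and one encode, decode and recognize round per tested value, and it counts worst-case rounds:

```latex
\begin{aligned}
T_{\text{successive}} &= b - a + 1, \\
T_{\text{bisection}}  &\le \lceil \log_2 (b - a) \rceil + 1 .
\end{aligned}
```

The extra round in the bisection bound is the final check of the right endpoint in step S1025 or S2025. For the example QP interval [10, 40] this gives at most 5 + 1 = 6 rounds instead of 31, and for the example Rate interval [100, 5000] at most 13 + 1 = 14 rounds instead of 4901.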

It should be noted that, since the QP value and the Rate value are mutually convertible in the art, the second embodiment is based on the same inventive concept as the first embodiment.

A third embodiment of the present invention provides a code rate setting method for optical character recognition, using the H.264 coding standard software JM as an example for encoding a picture, and includes the following specific steps:

(1) Setting the Rate interval for picture coding as required, for example requiring the Rate to be greater than 100 and less than 5000; the interval may also be set according to conditions such as the picture size or the bandwidth;

(2) Down-sampling the original picture by a factor of 1/4;

(3) Encoding the down-sampled picture at the middle value of the Rate interval;

(4) Decoding the down-sampled picture and then performing optical character recognition (recognition can be performed on the mobile terminal using an optical character recognition model);

(5) If a correct recognition result is obtained at the middle Rate value, taking the middle value as the updated right endpoint and recalculating the middle value of the new interval; otherwise, taking the middle value as the updated left endpoint;

(6) Repeating steps (3), (4) and (5) until the interval can no longer be split (the difference between the Rate values at the left and right endpoints is less than or equal to 1), at which point the search ends; if the Rate at the right endpoint yields a correct recognition result, using the right-endpoint Rate for coding; otherwise, using the left-endpoint Rate for coding;

(7) Inputting the down-sampled picture into a deep-learning confidence neural network to obtain a confidence value that the picture can be correctly recognized, and obtaining a code rate increment M from the confidence value;

(8) Taking the Rate value of the down-sampled picture finally obtained in step (6) as the initial Rate value of the original picture;

(9) Adding the code rate increment M to the current Rate value of the original picture to obtain a new Rate value;

(10) Encoding the original picture at the new Rate value, and performing optical character recognition after decoding (recognition can be performed on the mobile terminal using an optical character recognition model);

(11) If the original picture yields a correct recognition result at the current Rate value, repeating steps (9) and (10); otherwise, leaving the Rate value unchanged and ending the update of the Rate value;

(12) Subtracting a fixed value from the Rate value at this point and using the result as the Rate value for coding the original picture.

The method provided by this embodiment not only reduces network transmission bandwidth but also, for collections of hundreds of millions of pictures, reduces storage space and therefore cost. It should be noted that this embodiment uses the H.264 coding standard only as an example; the invention is applicable to any coding standard and method.
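
Step (2) of this embodiment down-samples the original picture by a factor of 1/4. The source does not specify the resampling filter or whether the factor applies to each dimension or to the area. The sketch below assumes a factor of 1/4 per dimension, implemented as 4x4 block averaging on a grayscale picture stored as a list of rows; the function name and the data layout are illustrative.

```python
from typing import List


def downsample_quarter(pixels: List[List[int]]) -> List[List[int]]:
    """Shrink a grayscale picture to 1/4 of its width and height.

    Assumption: each output pixel is the integer mean of one 4x4 block
    of input pixels; rows and columns that do not fill a whole block
    are dropped.
    """
    block = 4
    rows = len(pixels) // block
    cols = len(pixels[0]) // block if pixels else 0
    out: List[List[int]] = []
    for r in range(rows):
        out_row: List[int] = []
        for c in range(cols):
            total = sum(
                pixels[r * block + i][c * block + j]
                for i in range(block)
                for j in range(block)
            )
            out_row.append(total // (block * block))
        out.append(out_row)
    return out


# An 8x8 picture whose left half is 0 and right half is 160.
picture = [[0] * 4 + [160] * 4 for _ in range(8)]
small = downsample_quarter(picture)
```

In practice the resize routine of an image library would normally replace this loop; the pure-Python version is only meant to make the size reduction explicit.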

A fourth embodiment of the present invention provides a code rate setting method for optical character recognition, using the H.264 coding standard software JM as an example for encoding a picture, and includes the following specific steps:

(1) Setting the QP interval for picture coding as required, for example requiring the QP to be greater than 10 and less than 40; the interval may also be set according to conditions such as the picture size or the bandwidth.

(2) Down-sampling the original picture by a factor of 1/4.

(3) Encoding the down-sampled picture at the middle value of the QP interval.

(4) Decoding the down-sampled picture and then performing optical character recognition (recognition can be performed on the mobile terminal using an optical character recognition model).

(5) If a correct recognition result is obtained at the middle QP value, taking the middle value as the updated right endpoint and recalculating the middle value of the new interval; otherwise, taking the middle value as the updated left endpoint.

(6) Repeating steps (3), (4) and (5) until the interval can no longer be split (the difference between the QP values at the left and right endpoints is less than or equal to 1), at which point the search ends; if the QP at the right endpoint yields a correct recognition result, using the right-endpoint QP for coding; otherwise, using the left-endpoint QP for coding.

(7) Inputting the down-sampled picture into a deep-learning confidence neural network to obtain a confidence value that the picture can be correctly recognized, and obtaining a quantization coefficient increment N from the confidence value.

(8) Taking the QP value of the down-sampled picture finally obtained in step (6) as the initial QP value of the original picture.

(9) Adding the quantization coefficient increment N to the current QP value of the original picture to obtain a new QP value.

(10) Encoding the original picture at the new QP value, and performing optical character recognition after decoding (recognition can be performed on the mobile terminal using an optical character recognition model).

(11) If the original picture yields a correct recognition result at the current QP value, repeating steps (9) and (10); otherwise, leaving the QP value unchanged and ending the update of the QP value.

(12) Encoding the original picture using the QP value at this point minus a fixed value as the QP value.
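
Steps (8) to (12) above can be sketched as a simple stepping loop. This is a minimal illustration that follows the wording of the steps, not the implementation of the invention. The `recognized` callback is a hypothetical stand-in for "encode the original picture at this QP, decode it, run OCR and check the result". The upper bound `qp_max` and the parameter `backoff` (the "fixed value" of step (12), whose size the text does not give) are assumptions, as is the function name.

```python
from typing import Callable


def refine_qp(start_qp: int, step_n: int, qp_max: int, backoff: int,
              recognized: Callable[[int], bool]) -> int:
    """Step the QP of the original picture upward in increments of N.

    `start_qp` is the optimal QP found on the down-sampled picture
    (step 8) and `step_n` is the quantization coefficient increment N.
    """
    if step_n <= 0:
        raise ValueError("step_n must be positive")
    qp = start_qp                       # step (8): initial QP value
    while qp + step_n <= qp_max:        # assumed bound, not in the source
        qp += step_n                    # step (9): add the increment N
        if not recognized(qp):          # steps (10)-(11): encode and test
            return qp - backoff         # step (12): subtract a fixed value
    # Assumption: if the bound is reached without a failed recognition,
    # the last tested QP is returned unchanged.
    return qp


# Toy stand-in for the codec and OCR: recognition succeeds up to QP 30.
final_qp = refine_qp(22, 3, 51, 3, lambda qp: qp <= 30)
```

Choosing `backoff` equal to `step_n`, as in the usage line, returns the last QP that was still recognized; other choices are possible because the text leaves the fixed value open.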

Referring to fig. 5, a fifth embodiment of the present invention further provides a code rate setting device for optical character recognition, where the code rate setting device for optical character recognition may be any type of intelligent terminal, such as a mobile phone, a tablet computer, a personal computer, and the like.

Specifically, the code rate setting device for optical character recognition includes: one or more control processors and memory, one control processor being exemplified in fig. 5. The control processor and the memory may be connected by a bus or other means, as exemplified by the bus connection in fig. 5.

The memory, which is a non-transitory computer-readable storage medium, may be used to store a non-transitory software program, a non-transitory computer-executable program, and a module, such as program instructions/modules corresponding to the code rate setting device for optical character recognition in the embodiments of the present invention, and the control processor implements the code rate setting method for optical character recognition by operating the non-transitory software program, instructions, and modules stored in the memory.

The memory may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store the generated data. Further, the memory may include high speed random access memory, and may also include non-transitory memory, such as at least one disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory optionally includes memory remotely located from the control processor, and the remote memory may be connected to the code rate setting device for optical character recognition over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The one or more modules are stored in the memory and, when executed by the one or more control processors, perform a code rate setting method for optical character recognition in the above-described method embodiments, for example, perform the above-described method steps S101 to S104 in fig. 1 or the method steps S201 to S204 in fig. 3.

Embodiments of the present invention also provide a computer-readable storage medium, where the computer-readable storage medium stores computer-executable instructions, which are executed by one or more control processors, for example, by one of the control processors in fig. 5, and may cause the one or more control processors to perform the code rate setting method for optical character recognition in the above method embodiment, for example, perform the above-described method steps S101 to S104 in fig. 1, or perform the method steps S201 to S204 in fig. 3.

The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, may be located in one place, or may be distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.

Through the above description of the embodiments, those skilled in the art can clearly understand that the embodiments can be implemented by software plus a general hardware platform. Those skilled in the art will appreciate that all or part of the processes of the methods of the above embodiments may be implemented by hardware related to instructions of a computer program, which may be stored in a computer readable storage medium, and when executed, may include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read Only Memory (ROM), a Random Access Memory (RAM), or the like.

The embodiments of the present invention have been described in detail with reference to the accompanying drawings, but the present invention is not limited to the above embodiments, and various changes can be made within the knowledge of those skilled in the art without departing from the gist of the present invention.