CN111314697B - A code rate setting method, device and storage medium for optical character recognition - Google Patents


Info

Publication number
CN111314697B
Authority
CN
China
Prior art keywords
value
rate
picture
interval
optimal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010116219.3A
Other languages
Chinese (zh)
Other versions
CN111314697A (en
Inventor
张昊
傅枧根
钟培雄
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Central South University
Original Assignee
Central South University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Central South University
Priority to CN202010116219.3A
Publication of CN111314697A
Application granted
Publication of CN111314697B
Active
Anticipated expiration


Abstract

Translated from Chinese

The invention discloses a code rate setting method, device and storage medium for optical character recognition. The method first down-samples a picture, then encodes the down-sampled picture multiple times to find the optimal QP/Rate value (the lowest code rate value or quantization coefficient value) at which the down-sampled picture can still be correctly recognized. It then obtains a code rate increment M or a quantization coefficient increment N from a confidence neural network, and finally uses that increment to quickly find the optimal encoding value for the original picture, namely the lowest code rate value or lowest quantization coefficient value at which the optical character recognition accuracy of the original picture is unaffected. Compared with the prior art, the invention reduces network transmission bandwidth and, for collections of hundreds of millions of pictures, also reduces storage occupation and cost. The invention covers the application scenarios of mainstream hybrid coding architectures and can use any image coding standard or the intra-frame coding mode of any video coding standard, so it is widely applicable.

Figure 202010116219

Description

Code rate setting method, device and storage medium for optical character recognition
Technical Field
The invention relates to the technical fields of video coding and deep learning, and in particular to a code rate setting method, device and storage medium for optical character recognition.
Background
With the continued progress of artificial intelligence technology, it has become common to collect data on a mobile terminal, perform simple processing there, and then transmit the data for intelligent analysis. Face recognition and optical character recognition (OCR) are two widely used examples. Transmitting large numbers of images consumes a great deal of bandwidth. To save network bandwidth, a code rate (Rate) must be set for the image data so that the code rate is as low as possible (minimizing the bandwidth consumed) while the loss of image quality stays small enough that OCR still performs well. Moreover, even in application scenarios where OCR runs directly in the cloud or on a local server with no network transmission, hundreds of millions of pictures occupy a large amount of storage space. To reduce storage and cost, the picture size must likewise be controlled through fast encoding, using a code rate that is as low as possible (that is, a picture that is as small as possible) without affecting the OCR result.
Common conventional image coding methods include JPEG and JPEG2000. In recent years, the intra-frame coding modes of video coding standards have also been used for image coding, achieving better coding efficiency than conventional methods such as JPEG. Standards such as H.264, HEVC, VVC, AVS2, AVS3 and AV1 adopt a hybrid coding architecture aimed mainly at video coding, but their intra-frame coding is increasingly applied to image coding. How to reduce the picture code rate as far as possible while preserving OCR accuracy across these coding standards remains an open problem.
Disclosure of Invention
The present invention aims to solve at least one of the technical problems in the prior art, and provides a code rate setting method, device and storage medium for optical character recognition.
According to an embodiment of the present invention, there is provided a code rate setting method for optical character recognition, including the steps of:
setting a Rate interval for an original picture, and down-sampling the original picture to obtain a down-sampled picture;
calculating an optimal Rate value of the down-sampled picture within the Rate interval, wherein the optimal Rate value is the minimum value in the Rate interval that satisfies the following condition: the down-sampled picture encoded at that value is correctly recognized after decoding;
inputting the down-sampled picture into a confidence neural network and performing confidence prediction to obtain a code rate increment M;
setting the optimal Rate value plus n code rate increments M as the optimal coding value of the original picture, wherein n satisfies the following condition: the original picture encoded at the optimal Rate value plus n increments M is correctly recognized after decoding, whereas the original picture encoded at the optimal Rate value plus (n+1) increments M is not correctly recognized after decoding.
The code rate setting method for optical character recognition provided by this embodiment has at least the following beneficial effects:
(1) The method first down-samples a picture, then encodes the down-sampled picture multiple times to obtain the optimal Rate value (the lowest code rate value) at which the down-sampled picture is correctly recognized, then obtains a code rate increment M from a confidence neural network, and finally uses the optimal Rate value and the increment M to quickly find the optimal coding value for the original picture, namely the lowest code rate value at which the optical character recognition accuracy of the original picture is unaffected. Compared with the prior art, the method reduces encoding time.
(2) The method reduces network transmission bandwidth and, for collections of hundreds of millions of pictures, also reduces storage occupation and cost.
(3) The method covers the application scenarios of mainstream hybrid coding architectures and can use any image coding standard or the intra-frame coding mode of any video coding standard, so it is widely applicable.
According to the code rate setting method for optical character recognition of this embodiment, the optimal Rate value of the down-sampled picture is obtained by bisection.
According to the code rate setting method for optical character recognition of this embodiment, setting the Rate interval of the original picture comprises:
setting the Rate interval according to the coding standard to be used, or according to the size of the original picture or the bandwidth.
According to the code rate setting method for optical character recognition of this embodiment, the down-sampling factor for the original picture is 0.25.
According to an embodiment of the present invention, there is provided a code rate setting method for optical character recognition, including the steps of:
setting a QP interval for an original picture, and down-sampling the original picture to obtain a down-sampled picture;
calculating an optimal QP value of the down-sampled picture within the QP interval, wherein the optimal QP value is the minimum value in the QP interval that satisfies the following condition: the down-sampled picture encoded at that value is correctly recognized after decoding;
inputting the down-sampled picture into a confidence neural network and performing confidence prediction to obtain a quantization coefficient increment N;
setting the optimal QP value plus n quantization coefficient increments N as the optimal coding value of the original picture, wherein n satisfies the following condition: the original picture encoded at the optimal QP value plus n increments N is correctly recognized after decoding, whereas the original picture encoded at the optimal QP value plus (n+1) increments N is not correctly recognized after decoding.
The code rate setting method for optical character recognition provided by this embodiment has at least the following beneficial effects:
(1) The method first down-samples a picture, then encodes the down-sampled picture multiple times to obtain the optimal QP value (the lowest quantization coefficient value) at which the down-sampled picture is correctly recognized, then obtains a quantization coefficient increment N from a confidence neural network, and finally uses the optimal QP value and the increment N to quickly find the optimal coding value for the original picture, namely the lowest quantization coefficient value at which the optical character recognition accuracy of the original picture is unaffected. Compared with the prior art, the method reduces encoding time.
(2) The method reduces network transmission bandwidth and, for collections of hundreds of millions of pictures, also reduces storage occupation and cost.
(3) The method covers the application scenarios of mainstream hybrid coding architectures and can use any image coding standard or the intra-frame coding mode of any video coding standard, so it is widely applicable.
According to the code rate setting method for optical character recognition of this embodiment, the optimal QP value of the down-sampled picture is obtained by bisection.
According to the code rate setting method for optical character recognition of this embodiment, setting the QP interval of the original picture comprises:
setting the QP interval according to the coding standard to be used, or according to the size of the original picture or the bandwidth.
According to the code rate setting method for optical character recognition of this embodiment, the down-sampling factor for the original picture is 0.25.
According to an embodiment of the present invention, there is provided a code rate setting apparatus for optical character recognition, including: at least one control processor and a memory for communicative connection with the at least one control processor; the memory stores instructions executable by the at least one control processor to enable the at least one control processor to perform a code rate setting method for optical character recognition as described above.
According to an embodiment of the present invention, there is provided a computer-readable storage medium storing computer-executable instructions for causing a computer to execute a code rate setting method for optical character recognition as described above.
Drawings
The invention is further described below with reference to the accompanying drawings and examples:
FIG. 1 is a schematic flowchart of a code rate setting method for optical character recognition according to a first embodiment of the present invention;
FIG. 2 is a detailed flowchart of step S102 in FIG. 1;
FIG. 3 is a schematic flowchart of a code rate setting method for optical character recognition according to a second embodiment of the present invention;
FIG. 4 is a detailed flowchart of step S202 in FIG. 3;
FIG. 5 is a schematic structural diagram of a code rate setting device for optical character recognition according to a fifth embodiment of the present invention.
Detailed Description
The technical solutions of the embodiments of the present disclosure will be described clearly and completely with reference to the accompanying drawings, and it is to be understood that the described embodiments are only a part of the embodiments of the present disclosure, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the disclosure without making any creative effort, shall fall within the protection scope of the disclosure. It should be noted that the features of the embodiments and examples of the present disclosure may be combined with each other without conflict. In addition, the purpose of the drawings is to graphically supplement the description in the written portion of the specification so that a person can intuitively and visually understand each technical feature and the whole technical solution of the present disclosure, but it should not be construed as limiting the scope of the present disclosure.
Referring to FIG. 1 and FIG. 2, a first embodiment of the present invention provides a code rate setting method for optical character recognition, including the following steps:
S101, setting a Rate interval for the original picture, and down-sampling the original picture to obtain a down-sampled picture;
As an optional implementation, the Rate interval of the original picture may be set according to the coding standard to be used, or according to the picture size or the bandwidth, and may be adjusted to suit the actual situation. For example, when the picture is encoded with the H.264 coding standard software JM, the Rate interval is set to [100, 5000].
As an optional implementation, the original picture is down-sampled by a factor of 0.25. Compared with factors of 0.1 and 0.5, a factor of 0.25 is preferred because it reduces the picture size substantially while still avoiding blurring of the picture.
S102, obtaining the optimal Rate value of the down-sampled picture by bisection, wherein the optimal Rate value is the minimum value in the Rate interval that satisfies the following condition: the down-sampled picture encoded at that value is correctly recognized after decoding;
It should be noted that the optimal Rate value of the down-sampled picture may also be obtained by encoding at successive values one after another. This embodiment prefers bisection because it finds the optimal Rate value faster, and the gain in efficiency is especially large when the Rate interval is wide.
The specific steps for obtaining the optimal Rate value of the down-sampled picture by bisection are as follows:
S1021, encoding the down-sampled picture at the middle value of the Rate interval;
S1022, decoding the encoded down-sampled picture and then performing optical character recognition (recognition based on an optical character recognition model can be performed on the mobile terminal);
S1023, if a correct recognition result is obtained, updating the Rate interval by taking its middle value as the new right endpoint; if a correct recognition result is not obtained, updating the Rate interval by taking its middle value as the new left endpoint;
S1024, if the difference between the Rate values at the left and right endpoints of the updated Rate interval is greater than 1, returning to step S1021; if the difference is less than or equal to 1, proceeding to step S1025;
S1025, if a correct recognition result is obtained at the right endpoint of the Rate interval, taking the Rate value of the right endpoint as the optimal Rate value; otherwise, taking the Rate value of the left endpoint as the optimal Rate value;
S103, inputting the down-sampled picture into a confidence neural network and performing confidence prediction to obtain a code rate increment M;
In this step, the down-sampled picture is input into a deep-learning confidence neural network to obtain a confidence value that the down-sampled picture can be correctly recognized, and confidence prediction is then performed to obtain the code rate increment M. Here, confidence prediction is a function whose input is the confidence value and whose output is the code rate increment M. For example, when the confidence value is 90, the function gives M = 5; when the confidence value is 80, the function gives M = 4. The function can be set according to actual conditions.
S104, setting the optimal Rate value plus n code rate increments M as the optimal coding value of the original picture, wherein n satisfies the following condition: the original picture encoded at the optimal Rate value plus n increments M is correctly recognized after decoding, whereas the original picture encoded at the optimal Rate value plus (n+1) increments M is not correctly recognized after decoding.
In the method provided by this embodiment, the original picture is first down-sampled; next, a Rate search by bisection on the down-sampled picture quickly yields the optimal Rate value; the search then continues on the original picture, starting from that optimal Rate value, to obtain the optimal coding value, which is the lowest code rate value at which the original picture still meets the required optical character recognition accuracy. The method reduces network transmission bandwidth and, for collections of hundreds of millions of pictures, also reduces storage occupation and therefore cost. It can be applied to any current image coding standard or to the intra-frame coding mode of any video coding standard, so it is widely applicable.
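The bisection in steps S1021 to S1025 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `bisect_optimal_value` and the `recognized` callback are hypothetical, and the callback stands in for the whole "encode at this value, decode, run OCR, compare with the expected text" cycle, which the text does not specify in code form.

```python
from typing import Callable


def bisect_optimal_value(lo: int, hi: int, recognized: Callable[[int], bool]) -> int:
    """Narrow [lo, hi] as in steps S1021-S1025 and return the chosen endpoint."""
    while hi - lo > 1:            # S1024: keep going while the endpoints differ by more than 1
        mid = (lo + hi) // 2      # S1021: encode at the middle value of the interval
        if recognized(mid):       # S1022: decode and run OCR (hypothetical callback)
            hi = mid              # S1023: success, so the middle becomes the right endpoint
        else:
            lo = mid              # S1023: failure, so the middle becomes the left endpoint
    # S1025: use the right endpoint if it is recognized, otherwise the left endpoint
    return hi if recognized(hi) else lo


# Toy stand-in for the encoder and OCR model: pretend any Rate >= 1234 is readable.
def toy_recognizer(rate: int) -> bool:
    return rate >= 1234


best_rate = bisect_optimal_value(100, 5000, toy_recognizer)
print(best_rate)
```

With the example interval [100, 5000] the loop halves the interval about 13 times, instead of trying thousands of values one by one. Note that in this sketch the left endpoint is only a fallback and is returned without being tested itself.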
Referring to FIG. 3 and FIG. 4, a second embodiment of the present invention provides a code rate setting method for optical character recognition, including the following steps:
S201, setting a QP interval for the original picture (QP is the quantization parameter, referred to in this document as the quantization coefficient), and down-sampling the original picture to obtain a down-sampled picture;
As an optional implementation, the QP interval of the original picture may be set according to the coding standard to be used, or according to the picture size or the bandwidth, and may be adjusted to suit the actual situation. For example, when the picture is encoded with the H.264 coding standard software JM, the QP interval is set to [10, 40].
As an optional implementation, the original picture is down-sampled by a factor of 0.25. Compared with factors of 0.1 and 0.5, a factor of 0.25 is preferred because it reduces the picture size substantially while still avoiding blurring of the picture.
S202, obtaining the optimal QP value of the down-sampled picture by bisection, wherein the optimal QP value is the minimum value in the QP interval that satisfies the following condition: the down-sampled picture encoded at that value is correctly recognized after decoding;
It should be noted that the optimal QP value of the down-sampled picture may also be obtained by encoding at successive values one after another. This embodiment prefers bisection because it finds the optimal QP value faster, and the gain in efficiency is especially large when the QP interval is wide.
The specific steps for obtaining the optimal QP value of the down-sampled picture by bisection are as follows:
S2021, encoding the down-sampled picture at the middle value of the QP interval;
S2022, decoding the encoded down-sampled picture and then performing optical character recognition (recognition based on an optical character recognition model can be performed on the mobile terminal);
S2023, if a correct recognition result is obtained, updating the QP interval by taking its middle value as the new right endpoint; if a correct recognition result is not obtained, updating the QP interval by taking its middle value as the new left endpoint;
S2024, if the difference between the QP values at the left and right endpoints of the updated QP interval is greater than 1, returning to step S2021; if the difference is less than or equal to 1, proceeding to step S2025;
S2025, if a correct recognition result is obtained at the right endpoint of the QP interval, taking the QP value of the right endpoint as the optimal QP value; otherwise, taking the QP value of the left endpoint as the optimal QP value;
S203, inputting the down-sampled picture into a confidence neural network and performing confidence prediction to obtain a quantization coefficient increment N;
In this step, the down-sampled picture is input into a deep-learning confidence neural network to obtain a confidence value that the down-sampled picture can be correctly recognized, and confidence prediction is then performed to obtain the quantization coefficient increment N. The confidence prediction here is a function that can be set according to the actual situation; the principle is the same as in the first embodiment and is not described again here.
S204, setting the optimal QP value plus n quantization coefficient increments N as the optimal coding value of the original picture, wherein n satisfies the following condition: the original picture encoded at the optimal QP value plus n increments N is correctly recognized after decoding, whereas the original picture encoded at the optimal QP value plus (n+1) increments N is not correctly recognized after decoding.
In the method provided by this embodiment, the original picture is first down-sampled; next, a QP search by bisection on the down-sampled picture quickly yields the optimal QP value; the search then continues on the original picture, starting from that optimal QP value, to obtain the optimal coding value, which is the lowest quantization coefficient value at which the original picture still meets the required optical character recognition accuracy. The method reduces network transmission bandwidth and, for collections of hundreds of millions of pictures, also reduces storage occupation and therefore cost. It can be applied to any current image coding standard or to the intra-frame coding mode of any video coding standard, so it is widely applicable.
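The confidence prediction in steps S103 and S203 maps a confidence value to an increment. The text gives only two sample points (confidence 90 gives 5, confidence 80 gives 4) and leaves the function open, so the sketch below is one possible choice rather than the function used by the inventors: a linear mapping through those two points, with a hypothetical floor of 1 so that the later search always advances. The name `increment_from_confidence` and the `minimum` parameter are assumptions.

```python
def increment_from_confidence(confidence: float, minimum: int = 1) -> int:
    """Hypothetical linear mapping chosen to pass through (90, 5) and (80, 4)."""
    step = round(confidence / 10) - 4   # 90 -> 5, 80 -> 4
    return max(minimum, step)           # assumed floor; not stated in the text


for confidence in (90, 80, 20):
    print(confidence, increment_from_confidence(confidence))
```

A higher confidence yields a larger step, so a picture the network considers easy to recognize is searched in coarser increments.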
It should be noted that, since the QP value and the Rate value are mutually convertible in the art, the second embodiment is based on the same inventive concept as the first embodiment.
A third embodiment of the present invention provides a code rate setting method for optical character recognition, taking the encoding of a picture with the H.264 coding standard software JM as an example. The specific steps are as follows:
(1) The interval for the picture coding Rate is set as required, for example greater than 100 and less than 5000; the interval may be set according to conditions such as the picture size or the bandwidth;
(2) The original picture is down-sampled by a factor of 1/4;
(3) The down-sampled picture is encoded at the middle value of the Rate interval;
(4) The down-sampled picture is decoded and optical character recognition is performed (recognition based on an optical character recognition model can be performed on the mobile terminal);
(5) If a correct recognition result is obtained at the middle Rate value, the middle value becomes the updated right endpoint and the middle value of the new interval is recalculated; otherwise, the middle value becomes the updated left endpoint;
(6) Steps (3), (4) and (5) are repeated until the interval can no longer be divided (the difference between the Rate values at the left and right endpoints is less than or equal to 1), at which point the search ends; if a correct recognition result is obtained at the Rate of the right endpoint, the Rate of the right endpoint is used for encoding; otherwise, the Rate of the left endpoint is used;
(7) The down-sampled picture is input into a deep-learning confidence neural network to obtain a confidence value that the picture can be correctly recognized, and the code rate increment M is obtained from the confidence value;
(8) The Rate value of the down-sampled picture finally obtained in step (6) is taken as the initial Rate value of the original picture;
(9) The code rate increment M is added to the current Rate value of the original picture to give a new Rate value;
(10) The original picture is encoded at the new Rate value, decoded, and then subjected to optical character recognition (recognition based on an optical character recognition model can be performed on the mobile terminal);
(11) If a correct recognition result is obtained for the original picture at the current Rate value, steps (9) and (10) are repeated; otherwise, the Rate value is left unchanged and the update of the Rate value ends;
(12) A fixed value is subtracted from the Rate value at this point, and the result is used as the Rate value for encoding the original picture.
The method provided by this embodiment reduces network transmission bandwidth and, for collections of hundreds of millions of pictures, also reduces storage occupation and therefore cost. It should be noted that this embodiment uses the H.264 coding standard as an example, but the present invention is applicable to any coding standard and method.
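Steps (8) to (12) amount to a stepwise search on the original picture. The sketch below follows those steps under several stated assumptions: the "fixed value" subtracted in step (12) is taken to equal the increment, so that the result matches the "n increments recognized, (n+1) increments not recognized" condition; an `upper` bound is added to guarantee termination; and `recognized` is a hypothetical callback standing in for encoding the original picture, decoding it and running OCR. The toy recognizer that fails above a threshold is chosen only so that the loop terminates in the demonstration.

```python
from typing import Callable


def stepwise_search(start: int, increment: int, upper: int,
                    recognized: Callable[[int], bool]) -> int:
    """Add `increment` until recognition fails, then step back once."""
    if increment <= 0:
        raise ValueError("increment must be positive")
    value = start                                    # step (8): start from the down-sampled result
    while True:
        value += increment                           # step (9): add the increment
        if value > upper or not recognized(value):   # steps (10)-(11): encode, decode, OCR
            break
    return value - increment                         # step (12), assuming fixed value == increment


# Toy stand-in for the encoder and OCR model: readable up to and including 30.
def toy_recognizer(value: int) -> bool:
    return value <= 30


final_value = stepwise_search(start=22, increment=4, upper=40, recognized=toy_recognizer)
print(final_value)
```

In this run the values 26 and 30 are recognized and 34 is not, so the search stops with n = 2 and returns 30.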
A fourth embodiment of the present invention provides a code rate setting method for optical character recognition, taking the encoding of a picture with the H.264 coding standard software JM as an example. The specific steps are as follows:
(1) The QP interval for picture coding is set as required, for example greater than 10 and less than 40; the interval may be set according to conditions such as the picture size or the bandwidth.
(2) The original picture is down-sampled by a factor of 1/4.
(3) The down-sampled picture is encoded at the middle value of the QP interval.
(4) The down-sampled picture is decoded and optical character recognition is performed (recognition based on an optical character recognition model can be performed on the mobile terminal).
(5) If a correct recognition result is obtained at the middle QP value, the middle value becomes the updated right endpoint and the middle value of the new interval is recalculated; otherwise, the middle value becomes the updated left endpoint.
(6) Steps (3), (4) and (5) are repeated until the interval can no longer be divided (the difference between the QP values at the left and right endpoints is less than or equal to 1), at which point the search ends; if a correct recognition result is obtained at the QP of the right endpoint, the QP of the right endpoint is used for encoding; otherwise, the QP of the left endpoint is used.
(7) The down-sampled picture is input into a deep-learning confidence neural network to obtain a confidence value that the picture can be correctly recognized, and the quantization coefficient increment N is obtained from the confidence value.
(8) The QP value of the down-sampled picture finally obtained in step (6) is taken as the initial QP value of the original picture.
(9) The quantization coefficient increment N is added to the current QP value of the original picture to give a new QP value.
(10) The original picture is encoded at the new QP value, decoded, and then subjected to optical character recognition (recognition based on an optical character recognition model can be performed on the mobile terminal).
(11) If a correct recognition result is obtained for the original picture at the current QP value, steps (9) and (10) are repeated; otherwise, the QP value is left unchanged and the update of the QP value ends.
(12) A fixed value is subtracted from the QP value at this point, and the result is used as the QP value for encoding the original picture.
The method provided by this embodiment reduces the network transmission bandwidth and, for collections of hundreds of millions of pictures, also reduces the storage space occupied and therefore the cost. Note that this embodiment uses the H.264 coding standard as an example, but the present invention is applicable to any coding standard and method.
Referring to Fig. 5, a fifth embodiment of the present invention further provides a code rate setting device for optical character recognition, which may be any type of intelligent terminal, such as a mobile phone, a tablet computer, or a personal computer.
Specifically, the code rate setting device for optical character recognition includes one or more control processors and a memory; one control processor is shown as an example in Fig. 5. The control processor and the memory may be connected by a bus or by other means; Fig. 5 shows a bus connection as an example.
The memory, as a non-transitory computer-readable storage medium, may be used to store non-transitory software programs, non-transitory computer-executable programs, and modules, such as the program instructions/modules corresponding to the code rate setting device for optical character recognition in the embodiments of the present invention. The control processor implements the code rate setting method for optical character recognition by running the non-transitory software programs, instructions, and modules stored in the memory.
The memory may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store the generated data. Further, the memory may include high speed random access memory, and may also include non-transitory memory, such as at least one disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory optionally includes memory remotely located from the control processor, and the remote memory may be connected to the code rate setting device for optical character recognition over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The one or more modules are stored in the memory and, when executed by the one or more control processors, perform the code rate setting method for optical character recognition of the above method embodiments, for example method steps S101 to S104 in Fig. 1 or method steps S201 to S204 in Fig. 3.
Embodiments of the present invention also provide a computer-readable storage medium storing computer-executable instructions. When executed by one or more control processors, for example by the control processor in Fig. 5, the instructions cause the one or more control processors to perform the code rate setting method for optical character recognition of the above method embodiments, for example method steps S101 to S104 in Fig. 1 or method steps S201 to S204 in Fig. 3.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, may be located in one place, or may be distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
From the above description of the embodiments, those skilled in the art can clearly understand that the embodiments can be implemented by software plus a general-purpose hardware platform. Those skilled in the art will also appreciate that all or part of the processes of the above method embodiments may be implemented by a computer program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, may include the processes of the method embodiments described above. The storage medium may be a magnetic disk, an optical disk, a Read Only Memory (ROM), a Random Access Memory (RAM), or the like.
The embodiments of the present invention have been described in detail with reference to the accompanying drawings, but the present invention is not limited to the above embodiments, and various changes can be made within the knowledge of those skilled in the art without departing from the gist of the present invention.

Claims (10)

Translated from Chinese
1. A code rate setting method for optical character recognition, characterized by comprising the following steps:
setting a Rate interval for an original picture, and down-sampling the original picture to obtain a down-sampled picture, wherein the Rate denotes the code rate of the original picture;
obtaining, within the Rate interval, an optimal Rate value of the down-sampled picture, the optimal Rate value being the minimum value in the Rate interval that satisfies the following condition: the down-sampled picture, encoded based on the optimal Rate value, can be correctly recognized after decoding;
inputting the down-sampled picture into a confidence neural network and performing confidence prediction to obtain a code rate increment M;
setting the optimal Rate value plus n code rate increments M as the optimal encoding value of the original picture, wherein n satisfies the following condition: the original picture, encoded based on the optimal Rate value plus n code rate increments M, can be correctly recognized after decoding, and the original picture, encoded based on the optimal Rate value plus n+1 code rate increments M, cannot be correctly recognized after decoding.
2. The code rate setting method for optical character recognition according to claim 1, characterized in that the optimal Rate value of the down-sampled picture is obtained by bisection, wherein obtaining the optimal Rate value of the down-sampled picture by bisection comprises the following steps:
S1021: encoding the down-sampled picture based on the middle value of the Rate interval;
S1022: decoding the encoded down-sampled picture and then performing optical character recognition;
S1023: if a correct recognition result is obtained, taking the middle value of the Rate interval as the updated right endpoint and updating the Rate interval; if a correct recognition result is not obtained, taking the middle value of the Rate interval as the updated left endpoint and updating the Rate interval;
S1024: if the difference between the Rate values at the left and right endpoints of the updated Rate interval is greater than 1, jumping to step S1021; if the difference is less than or equal to 1, proceeding to step S1025;
S1025: if the right endpoint of the updated Rate interval yields a correct recognition result, using the Rate value of the right endpoint as the optimal Rate value; if it does not, using the Rate value of the left endpoint as the optimal Rate value.
3. The code rate setting method for optical character recognition according to claim 2, characterized in that setting the Rate interval of the original picture comprises:
setting the Rate interval of the original picture according to the coding standard to be used, or setting the Rate interval of the original picture according to the size or bandwidth of the original picture.
4. The code rate setting method for optical character recognition according to any one of claims 1 to 3, characterized in that the down-sampling multiple applied to the original picture is 0.25.
5. A quantization coefficient setting method for optical character recognition, characterized by comprising the following steps:
setting a QP interval for an original picture, and down-sampling the original picture to obtain a down-sampled picture, wherein the QP denotes the quantization coefficient of the original picture;
obtaining, within the QP interval, an optimal QP value of the down-sampled picture, the optimal QP value being the minimum of all values in the QP interval that satisfy the following condition: the down-sampled picture, encoded based on the optimal QP value, can be correctly recognized after decoding;
inputting the down-sampled picture into a confidence neural network and performing confidence prediction to obtain a quantization coefficient increment N;
setting the optimal QP value plus n quantization coefficient increments N as the optimal encoding value of the original picture, wherein n satisfies the following condition: the original picture, encoded based on the optimal QP value plus n quantization coefficient increments N, can be correctly recognized after decoding, and the original picture, encoded based on the optimal QP value plus n+1 quantization coefficient increments N, cannot be correctly recognized after decoding.
6. The quantization coefficient setting method for optical character recognition according to claim 5, characterized in that the optimal QP value of the down-sampled picture is obtained by bisection, wherein obtaining the optimal QP value of the down-sampled picture by bisection comprises the following steps:
S2021: encoding the down-sampled picture based on the middle value of the QP interval;
S2022: decoding the encoded down-sampled picture and then performing optical character recognition;
S2023: if a correct recognition result is obtained, taking the middle value of the QP interval as the updated right endpoint and updating the QP interval; if a correct recognition result is not obtained, taking the middle value of the QP interval as the updated left endpoint and updating the QP interval;
S2024: if the difference between the QP values at the left and right endpoints of the updated QP interval is greater than 1, jumping to step S2021; if the difference is less than or equal to 1, proceeding to step S2025;
S2025: if the right endpoint of the updated QP interval yields a correct recognition result, using the QP value of the right endpoint as the optimal QP value; if it does not, using the QP value of the left endpoint as the optimal QP value.
7. The quantization coefficient setting method for optical character recognition according to claim 5, characterized in that setting the QP interval of the original picture comprises:
setting the QP interval of the original picture according to the coding standard to be used, or setting the QP interval of the original picture according to the size or bandwidth of the original picture.
8. The quantization coefficient setting method for optical character recognition according to any one of claims 5 to 7, characterized in that the down-sampling multiple applied to the original picture is 0.25.
9. A code rate setting device for optical character recognition, characterized by comprising: at least one control processor and a memory communicatively connected to the at least one control processor; the memory stores instructions executable by the at least one control processor, and the instructions are executed by the at least one control processor to enable the at least one control processor to perform the code rate setting method for optical character recognition according to any one of claims 1 to 4 and the quantization coefficient setting method for optical character recognition according to any one of claims 5 to 8.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores computer-executable instructions for causing a computer to perform the code rate setting method for optical character recognition according to any one of claims 1 to 4 and the quantization coefficient setting method for optical character recognition according to any one of claims 5 to 8.
CN202010116219.3A | 2020-02-25 | 2020-02-25 | A code rate setting method, device and storage medium for optical character recognition | Active | CN111314697B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202010116219.3A / CN111314697B (en) | 2020-02-25 | 2020-02-25 | A code rate setting method, device and storage medium for optical character recognition

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202010116219.3A / CN111314697B (en) | 2020-02-25 | 2020-02-25 | A code rate setting method, device and storage medium for optical character recognition

Publications (2)

Publication Number | Publication Date
CN111314697A (en) | 2020-06-19
CN111314697B (en) | 2021-10-15

Family

ID=71147740

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN202010116219.3A / Active / CN111314697B (en) | A code rate setting method, device and storage medium for optical character recognition | 2020-02-25 | 2020-02-25

Country Status (1)

Country | Link
CN (1) | CN111314697B (en)

Also Published As

Publication number | Publication date
CN111314697A (en) | 2020-06-19

Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant
