Disclosure of Invention
The present invention solves at least one of the technical problems in the prior art by providing a code rate setting method, device, and storage medium for optical character recognition.
According to an embodiment of the present invention, there is provided a code rate setting method for optical character recognition, including the steps of:
setting a Rate interval for an original picture, and down-sampling the original picture to obtain a down-sampled picture;
determining an optimal Rate value of the down-sampled picture within the Rate interval, wherein the optimal Rate value is the minimum value in the Rate interval that meets the following condition: the down-sampled picture, when encoded at the optimal Rate value and then decoded, is correctly recognized;
inputting the down-sampled picture into a confidence neural network and performing confidence prediction to obtain a code rate increment M;
setting the optimal Rate value plus n code rate increments M as the optimal coding value of the original picture, wherein n meets the following condition: the original picture, when encoded at the optimal Rate value plus n code rate increments M and then decoded, is correctly recognized, whereas the original picture, when encoded at the optimal Rate value plus n+1 code rate increments M and then decoded, is not correctly recognized.
The code rate setting method for optical character recognition provided by the embodiment of the invention at least has the following beneficial effects:
(1) The method first down-samples the picture; then encodes the down-sampled picture multiple times to obtain the optimal Rate value, that is, the lowest code rate value at which the down-sampled picture is correctly recognized; then obtains a code rate increment M from a confidence neural network; and finally uses the optimal Rate value and the code rate increment M to quickly find the optimal coding value for the original picture, which is the lowest code rate value that does not affect the optical character recognition accuracy of the original picture. Compared with the prior art, the method reduces the time spent on encoding.
(2) For very large picture collections, on the order of hundreds of millions of pictures, the method reduces not only network transmission bandwidth but also storage occupancy and cost.
(3) The method covers the application scenarios of the mainstream hybrid coding architecture and can use any image coding standard, or the intra-frame coding mode of any video coding standard, so it is widely applicable.
According to the code rate setting method for optical character recognition of the embodiment of the present invention, the optimal Rate value of the down-sampled picture is obtained by the bisection method.
According to the code rate setting method for optical character recognition of the embodiment of the present invention, setting the Rate interval of the original picture comprises:
setting the Rate interval of the original picture according to the coding standard to be used, or according to the size of the original picture or the bandwidth.
According to the code rate setting method for optical character recognition of the embodiment of the present invention, the down-sampling factor applied to the original picture is 0.25.
According to an embodiment of the present invention, there is provided a code rate setting method for optical character recognition, including the steps of:
setting a QP interval for an original picture, and down-sampling the original picture to obtain a down-sampled picture;
determining an optimal QP value of the down-sampled picture within the QP interval, wherein the optimal QP value is the minimum of all values in the QP interval that meet the following condition: the down-sampled picture, when encoded at the optimal QP value and then decoded, is correctly recognized;
inputting the down-sampled picture into a confidence neural network and performing confidence prediction to obtain a quantization coefficient increment N;
setting the optimal QP value plus n quantization coefficient increments N as the optimal coding value of the original picture, wherein n meets the following condition: the original picture, when encoded at the optimal QP value plus n quantization coefficient increments N and then decoded, is correctly recognized, whereas the original picture, when encoded at the optimal QP value plus n+1 quantization coefficient increments N and then decoded, is not correctly recognized.
The code rate setting method for optical character recognition provided by the embodiment of the invention at least has the following beneficial effects:
(1) The method first down-samples the picture; then encodes the down-sampled picture multiple times to obtain the optimal QP value, that is, the lowest quantization coefficient value at which the down-sampled picture is correctly recognized; then obtains a quantization coefficient increment N from a confidence neural network; and finally uses the optimal QP value and the quantization coefficient increment N to quickly find the optimal coding value for the original picture, which is the lowest quantization coefficient value that does not affect the optical character recognition accuracy of the original picture. Compared with the prior art, the method reduces the time spent on encoding.
(2) For very large picture collections, on the order of hundreds of millions of pictures, the method reduces not only network transmission bandwidth but also storage occupancy and cost.
(3) The method covers the application scenarios of the mainstream hybrid coding architecture and can use any image coding standard, or the intra-frame coding mode of any video coding standard, so it is widely applicable.
According to the code rate setting method for optical character recognition of the embodiment of the present invention, the optimal QP value of the down-sampled picture is obtained by the bisection method.
According to the code rate setting method for optical character recognition of the embodiment of the present invention, setting the QP interval of the original picture comprises:
setting the QP interval of the original picture according to the coding standard to be used, or according to the size of the original picture or the bandwidth.
According to the code rate setting method for optical character recognition of the embodiment of the present invention, the down-sampling factor applied to the original picture is 0.25.
According to an embodiment of the present invention, there is provided a code rate setting apparatus for optical character recognition, including: at least one control processor and a memory communicatively connected to the at least one control processor; the memory stores instructions executable by the at least one control processor to enable the at least one control processor to perform the code rate setting method for optical character recognition described above.
According to an embodiment of the present invention, there is provided a computer-readable storage medium storing computer-executable instructions for causing a computer to execute a code rate setting method for optical character recognition as described above.
Detailed Description
The technical solutions of the embodiments of the present disclosure will be described clearly and completely with reference to the accompanying drawings, and it is to be understood that the described embodiments are only a part of the embodiments of the present disclosure, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the disclosure without making any creative effort, shall fall within the protection scope of the disclosure. It should be noted that the features of the embodiments and examples of the present disclosure may be combined with each other without conflict. In addition, the purpose of the drawings is to graphically supplement the description in the written portion of the specification so that a person can intuitively and visually understand each technical feature and the whole technical solution of the present disclosure, but it should not be construed as limiting the scope of the present disclosure.
Referring to fig. 1 and 2, a first embodiment of the present invention provides a code rate setting method for optical character recognition, including the following steps:
S101, setting a Rate interval for an original picture, and down-sampling the original picture to obtain a down-sampled picture;
As an optional implementation, the Rate interval of the original picture may be set according to the coding standard to be used, or according to the size of the original picture or the bandwidth, and may be adjusted to suit the actual situation. For example, when the picture is encoded with JM, the H.264 coding standard software, the Rate interval may be set to [100, 5000].
As an optional implementation, the original picture is down-sampled by a factor of 0.25. Compared with 0.1x and 0.5x down-sampling, 0.25x down-sampling is preferred because it reduces the picture size substantially while avoiding blurring of the picture.
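As a minimal sketch of the 0.25x down-sampling described above, the following Python function averages each 4 x 4 block of a grayscale picture into one pixel. The function name `downsample_quarter`, the list-of-lists picture representation, the block-averaging filter, and the reading of 0.25 as a per-dimension factor are all assumptions made for illustration; the text does not specify a resampling filter.

```python
from typing import List


def downsample_quarter(pixels: List[List[int]], block: int = 4) -> List[List[int]]:
    """Down-sample a grayscale picture by 1/block per dimension (assumed box filter).

    `pixels` is a list of rows of integer gray levels. Rows or columns that do
    not fill a complete block at the bottom or right edge are dropped.
    """
    height, width = len(pixels), len(pixels[0])
    result = []
    for top in range(0, height - block + 1, block):
        row = []
        for left in range(0, width - block + 1, block):
            # Average all pixels inside the current block.
            total = sum(
                pixels[y][x]
                for y in range(top, top + block)
                for x in range(left, left + block)
            )
            row.append(total // (block * block))
        result.append(row)
    return result


# An 8 x 8 toy picture made of four flat 4 x 4 tiles becomes a 2 x 2 picture.
toy_picture = [[(x // 4) * 100 + (y // 4) * 10 for x in range(8)] for y in range(8)]
toy_small = downsample_quarter(toy_picture)
```

In practice an image library's resize routine would likely be used instead; the sketch only makes the size reduction concrete.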
S102, determining an optimal Rate value of the down-sampled picture by the bisection method, wherein the optimal Rate value is the minimum value in the Rate interval that meets the following condition: the down-sampled picture, when encoded at the optimal Rate value and then decoded, is correctly recognized;
It should be noted that the optimal Rate value of the down-sampled picture may also be obtained by encoding at successive Rate values one by one. In this embodiment, the bisection method is preferred because it obtains the optimal Rate value relatively quickly; the efficiency gain is especially large when the Rate interval is wide.
The specific steps of solving the optimal Rate value of the downsampled picture based on the bisection method are as follows:
S1021, encoding the down-sampled picture at the middle value of the Rate interval;
S1022, decoding the encoded down-sampled picture and then performing optical character recognition (recognition may be performed on a mobile terminal based on an optical character recognition model);
S1023, if a correct recognition result is obtained, updating the Rate interval by taking its middle value as the new right endpoint; if a correct recognition result is not obtained, updating the Rate interval by taking its middle value as the new left endpoint;
S1024, if the difference between the Rate values of the left and right endpoints of the updated Rate interval is greater than 1, returning to step S1021; if the difference is less than or equal to 1, proceeding to step S1025;
S1025, if the right endpoint of the Rate interval yields a correct recognition result, taking the Rate value of the right endpoint as the optimal Rate value; otherwise, taking the Rate value of the left endpoint as the optimal Rate value;
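Steps S1021 to S1025 can be sketched in Python as follows. This is an illustrative sketch rather than the implementation of the embodiment: the function name `find_optimal_rate` and the callback `recognizes` are hypothetical, and the callback stands in for the whole chain of encoding the down-sampled picture at a given Rate, decoding it, and checking the optical character recognition result. Integer Rate values are assumed.

```python
from typing import Callable


def find_optimal_rate(left: int, right: int,
                      recognizes: Callable[[int], bool]) -> int:
    """Bisection over an integer Rate interval, following S1021 to S1025.

    `recognizes(rate)` is a hypothetical callback meaning: encode the
    down-sampled picture at `rate`, decode it, run OCR, and report whether
    the recognition result is correct.
    """
    while right - left > 1:            # S1024: continue while endpoints differ by > 1
        middle = (left + right) // 2   # S1021: encode at the middle value
        if recognizes(middle):         # S1022: decode and recognize
            right = middle             # S1023: success moves the right endpoint
        else:
            left = middle              # S1023: failure moves the left endpoint
    # S1025: use the right endpoint if it is recognized, otherwise the left one.
    return right if recognizes(right) else left


# Toy stand-in for the encoder and OCR model (an assumption for this demo):
# recognition succeeds exactly when the Rate is at least 1234.
tested_rates = []


def toy_recognizes(rate: int) -> bool:
    tested_rates.append(rate)
    return rate >= 1234


optimal_rate = find_optimal_rate(100, 5000, toy_recognizes)
```

With the toy predicate, the search over [100, 5000] needs at most 14 trial encodings, whereas trying the Rate values one by one from the left endpoint would take more than a thousand.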
S103, inputting the down-sampled picture into a confidence neural network and performing confidence prediction to obtain a code rate increment M;
In this step, the down-sampled picture is input into a deep-learning confidence neural network to obtain a confidence value that the down-sampled picture can be correctly recognized, and confidence prediction is then performed to obtain the code rate increment M. Here, confidence prediction is computed by a function whose input is the confidence value and whose output is the code rate increment M. For example, when the confidence value is 90, the function yields M = 5; when the confidence value is 80, the function yields M = 4. It is understood that the function can be set according to actual conditions.
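One possible form of the confidence-to-increment function is sketched below. Only the two sample points (90 gives 5, 80 gives 4) come from the text; the linear rule, the clamp to a minimum of 1, the assumed confidence range of 0 to 100, and the name `code_rate_increment` are assumptions chosen to reproduce those two points.

```python
def code_rate_increment(confidence: float) -> int:
    """Map a confidence value in [0, 100] to a code rate increment M.

    Assumed rule: M = confidence // 10 - 4, clamped to at least 1.
    Only the points 90 -> 5 and 80 -> 4 are taken from the text.
    """
    if not 0 <= confidence <= 100:
        raise ValueError("confidence must be in [0, 100]")
    return max(1, int(confidence // 10) - 4)


increment_at_90 = code_rate_increment(90)
increment_at_80 = code_rate_increment(80)
```

Any other function that fits the application can be substituted, as the text notes.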
S104, setting the optimal Rate value plus n code rate increments M as the optimal coding value of the original picture, wherein n meets the following condition: the original picture, when encoded at the optimal Rate value plus n code rate increments M and then decoded, is correctly recognized, whereas the original picture, when encoded at the optimal Rate value plus n+1 code rate increments M and then decoded, is not correctly recognized.
In the method provided by this embodiment, the original picture is first down-sampled; a bisection search over Rate values is then performed on the down-sampled picture to quickly obtain the optimal Rate value; the search then continues on the original picture, starting from the obtained optimal Rate value, to obtain the optimal coding value, which is the lowest code rate value at which the original picture meets the optical character recognition accuracy. For hundreds of millions of pictures, the method reduces not only network transmission bandwidth but also storage occupancy and therefore cost. The method can also be applied to any current image coding standard or to the intra-frame coding mode of any video coding standard, and is therefore widely applicable.
Referring to fig. 3 and 4, a second embodiment of the present invention provides a code rate setting method for optical character recognition, including the following steps:
S201, setting a QP (quantization coefficient) interval for an original picture, and down-sampling the original picture to obtain a down-sampled picture;
As an optional implementation, the QP interval of the original picture may be set according to the coding standard to be used, or according to the size of the original picture or the bandwidth, and may be adjusted to suit the actual situation. For example, when the picture is encoded with JM, the H.264 coding standard software, the QP interval may be set to [10, 40].
As an optional implementation, the original picture is down-sampled by a factor of 0.25. Compared with 0.1x and 0.5x down-sampling, 0.25x down-sampling is preferred because it reduces the picture size substantially while avoiding blurring of the picture.
S202, determining an optimal QP value of the down-sampled picture by the bisection method, wherein the optimal QP value is the minimum of all values in the QP interval that meet the following condition: the down-sampled picture, when encoded at the optimal QP value and then decoded, is correctly recognized;
It should be noted that the optimal QP value of the down-sampled picture may also be obtained by encoding at successive QP values one by one. In this embodiment, the bisection method is preferred because it obtains the optimal QP value relatively quickly; the efficiency gain is especially large when the QP interval is wide.
The specific steps of solving the optimal QP value of the downsampled picture based on the bisection method are as follows:
S2021, encoding the down-sampled picture at the middle value of the QP interval;
S2022, decoding the encoded down-sampled picture and then performing optical character recognition (recognition may be performed on a mobile terminal based on an optical character recognition model);
S2023, if a correct recognition result is obtained, updating the QP interval by taking its middle value as the new right endpoint; if a correct recognition result is not obtained, updating the QP interval by taking its middle value as the new left endpoint;
S2024, if the difference between the QP values of the left and right endpoints of the updated QP interval is greater than 1, returning to step S2021; if the difference is less than or equal to 1, proceeding to step S2025;
S2025, if the right endpoint of the QP interval yields a correct recognition result, taking the QP value of the right endpoint as the optimal QP value; otherwise, taking the QP value of the left endpoint as the optimal QP value;
S203, inputting the down-sampled picture into a confidence neural network and performing confidence prediction to obtain a quantization coefficient increment N;
In this step, the down-sampled picture is input into a deep-learning confidence neural network to obtain a confidence value that the down-sampled picture can be correctly recognized, and confidence prediction is then performed to obtain the quantization coefficient increment N. It should be noted that confidence prediction here is computed by a function that can be set according to actual conditions; the setting principle is the same as in the first embodiment and is not described again here.
S204, setting the optimal QP value plus n quantization coefficient increments N as the optimal coding value of the original picture, wherein n meets the following condition: the original picture, when encoded at the optimal QP value plus n quantization coefficient increments N and then decoded, is correctly recognized, whereas the original picture, when encoded at the optimal QP value plus n+1 quantization coefficient increments N and then decoded, is not correctly recognized.
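The stopping condition in S204 can be read as a simple stepwise search on the original picture, sketched below. The names `count_increments` and `recognizes_original` are hypothetical, and the `upper_qp` bound is an added safety guard that does not appear in the text. The sketch also assumes, without checking, that the original picture is correctly recognized at the starting QP value itself.

```python
from typing import Callable


def count_increments(optimal_qp: int, increment: int, upper_qp: int,
                     recognizes_original: Callable[[int], bool]) -> int:
    """Return the step count such that optimal_qp + count * increment is still
    recognized and the next step is not (or would exceed `upper_qp`).

    `recognizes_original(qp)` is a hypothetical callback meaning: encode the
    original picture at `qp`, decode it, run OCR, and report correctness.
    """
    if increment <= 0:
        raise ValueError("increment must be positive")
    count = 0
    while True:
        candidate = optimal_qp + (count + 1) * increment
        # Stop at the first candidate that fails or leaves the allowed range.
        if candidate > upper_qp or not recognizes_original(candidate):
            return count
        count += 1


# Toy stand-in for encoder plus OCR (an assumption for this demo):
# the original picture is recognized for QP values up to 31.
step_count = count_increments(22, 4, 40, lambda qp: qp <= 31)
optimal_coding_value = 22 + step_count * 4
```

In the toy run, QP 26 and QP 30 are recognized and QP 34 is not, so the step count is 2 and the resulting coding value is 30. The same loop applies to the first embodiment with Rate values and the code rate increment M.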
In the method provided by this embodiment, the original picture is first down-sampled; a bisection search over QP values is then performed on the down-sampled picture to quickly obtain the optimal QP value; the search then continues on the original picture, starting from the obtained optimal QP value, to obtain the optimal coding value, which is the lowest quantization coefficient value at which the original picture meets the optical character recognition accuracy. For hundreds of millions of pictures, the method reduces not only network transmission bandwidth but also storage occupancy and therefore cost. The method can also be applied to any current image coding standard or to the intra-frame coding mode of any video coding standard, and is therefore widely applicable.
It should be noted that, since the QP value and the Rate value are mutually convertible in the art, the second embodiment is based on the same inventive concept as the first embodiment.
A third embodiment of the present invention provides a code rate setting method for optical character recognition. Taking the encoding of a picture with JM, the H.264 coding standard software, as an example, the specific steps are as follows:
(1) set the Rate interval for picture encoding automatically as required, for example, Rate greater than 100 and less than 5000; the interval may be set according to conditions such as the picture size or the bandwidth;
(2) down-sample the original picture by a factor of 1/4;
(3) encode the down-sampled picture at the middle value of its Rate interval;
(4) decode the down-sampled picture and then perform optical character recognition (recognition may be performed on a mobile terminal based on an optical character recognition model);
(5) if a correct recognition result is obtained at the middle Rate value, take the middle value as the updated right endpoint and recompute the middle value for the new interval; otherwise, take the middle value as the updated left endpoint;
(6) repeat steps (3), (4), and (5) until the interval can no longer be divided (the difference between the Rate values of the left and right endpoints is less than or equal to 1), at which point the search ends; if the right-endpoint Rate yields a correct recognition result, use the right-endpoint Rate for encoding; otherwise, use the left-endpoint Rate;
(7) input the down-sampled picture into a deep-learning confidence neural network to obtain a confidence value that the picture can be correctly recognized, and derive the code rate increment M from the confidence value;
(8) take the Rate value of the down-sampled picture finally obtained in step (6) as the initial Rate value of the original picture;
(9) add the code rate increment M to the current Rate value of the original picture to obtain a new Rate value;
(10) encode the original picture at the new Rate value, decode it, and then perform optical character recognition (recognition may be performed on a mobile terminal based on an optical character recognition model);
(11) if the original picture yields a correct recognition result at the current Rate value, repeat steps (9) and (10); otherwise, leave the Rate value unchanged and end the update of the Rate value;
(12) subtract a fixed value from the Rate value at this point and use the result as the Rate value for encoding the original picture.
The method provided by this embodiment reduces not only network transmission bandwidth but also, for hundreds of millions of pictures, storage occupancy and therefore cost. It should be noted that this embodiment uses the H.264 coding standard as an example, but the present invention is applicable to any coding standard and method.
A fourth embodiment of the present invention provides a code rate setting method for optical character recognition. Taking the encoding of a picture with JM, the H.264 coding standard software, as an example, the specific steps are as follows:
(1) Set the QP interval for picture encoding automatically as required, for example, QP greater than 10 and less than 40; the interval may be set according to conditions such as the picture size or the bandwidth.
(2) Down-sample the original picture by a factor of 1/4.
(3) Encode the down-sampled picture at the middle value of its QP interval.
(4) Decode the down-sampled picture and then perform optical character recognition (recognition may be performed on a mobile terminal based on an optical character recognition model).
(5) If a correct recognition result is obtained at the middle QP value, take the middle value as the updated right endpoint and recompute the middle value for the new interval; otherwise, take the middle value as the updated left endpoint.
(6) Repeat steps (3), (4), and (5) until the interval can no longer be divided (the difference between the QP values of the left and right endpoints is less than or equal to 1), at which point the search ends; if the right-endpoint QP yields a correct recognition result, use the right-endpoint QP for encoding; otherwise, use the left-endpoint QP.
(7) Input the down-sampled picture into a deep-learning confidence neural network to obtain a confidence value that the picture can be correctly recognized, and derive the quantization coefficient increment N from the confidence value.
(8) Take the QP value of the down-sampled picture finally obtained in step (6) as the initial QP value of the original picture.
(9) Add the quantization coefficient increment N to the current QP value of the original picture to obtain a new QP value.
(10) Encode the original picture at the new QP value, decode it, and then perform optical character recognition (recognition may be performed on a mobile terminal based on an optical character recognition model).
(11) If the original picture yields a correct recognition result at the current QP value, repeat steps (9) and (10); otherwise, leave the QP value unchanged and end the update of the QP value.
(12) Subtract a fixed value from the QP value at this point and use the result as the QP value for encoding the original picture.
The method provided by this embodiment reduces not only network transmission bandwidth but also, for hundreds of millions of pictures, storage occupancy and therefore cost. It should be noted that this embodiment uses the H.264 coding standard as an example, but the present invention is applicable to any coding standard and method.
Referring to fig. 5, a fifth embodiment of the present invention further provides a code rate setting device for optical character recognition, where the code rate setting device for optical character recognition may be any type of intelligent terminal, such as a mobile phone, a tablet computer, a personal computer, and the like.
Specifically, the code rate setting device for optical character recognition includes: one or more control processors and memory, one control processor being exemplified in fig. 5. The control processor and the memory may be connected by a bus or other means, as exemplified by the bus connection in fig. 5.
The memory, as a non-transitory computer-readable storage medium, may be used to store non-transitory software programs, non-transitory computer-executable programs, and modules, such as the program instructions/modules corresponding to the code rate setting device for optical character recognition in the embodiments of the present invention. The control processor implements the code rate setting method for optical character recognition by running the non-transitory software programs, instructions, and modules stored in the memory.
The memory may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store the generated data. Further, the memory may include high speed random access memory, and may also include non-transitory memory, such as at least one disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory optionally includes memory remotely located from the control processor, and the remote memory may be connected to the code rate setting device for optical character recognition over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The one or more modules are stored in the memory and, when executed by the one or more control processors, perform a code rate setting method for optical character recognition in the above-described method embodiments, for example, perform the above-described method steps S101 to S104 in fig. 1 or the method steps S201 to S204 in fig. 3.
Embodiments of the present invention also provide a computer-readable storage medium, where the computer-readable storage medium stores computer-executable instructions, which are executed by one or more control processors, for example, by one of the control processors in fig. 5, and may cause the one or more control processors to perform the code rate setting method for optical character recognition in the above method embodiment, for example, perform the above-described method steps S101 to S104 in fig. 1, or perform the method steps S201 to S204 in fig. 3.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, may be located in one place, or may be distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
Through the above description of the embodiments, those skilled in the art can clearly understand that the embodiments can be implemented by software plus a general hardware platform. Those skilled in the art will appreciate that all or part of the processes of the methods of the above embodiments may be implemented by hardware related to instructions of a computer program, which may be stored in a computer readable storage medium, and when executed, may include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read Only Memory (ROM), a Random Access Memory (RAM), or the like.
The embodiments of the present invention have been described in detail with reference to the accompanying drawings, but the present invention is not limited to the above embodiments, and various changes can be made within the knowledge of those skilled in the art without departing from the gist of the present invention.