Detailed Description
It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
An embodiment of the present application provides an optical character recognition model training method. The executing entity of the optical character recognition model training method includes, but is not limited to, at least one of a server, a terminal, and the like that can be configured to execute the method provided by the embodiment of the application. In other words, the optical character recognition model training method may be performed by software or hardware installed in a terminal device or a server device, and the software may be a blockchain platform. The server side includes, but is not limited to, a single server, a server cluster, a cloud server, a cloud server cluster, and the like.
Referring to FIG. 1, a flowchart of an optical character recognition model training method according to an embodiment of the present invention is shown. In the embodiment of the present invention, the optical character recognition model training method includes the following steps:
S1, acquiring an original picture set from actual production and an original data set corresponding to the original picture set, and storing the original picture set and the original data set in a preset message queue channel.
In the embodiment of the invention, the original picture set is unstructured data collected from the actual production environment using a preset optical character recognition interface, and the original data set is the character information extracted from the original picture set by the optical character recognition interface, where the character information is structured data.
Structured data refers to data that can be stored in a database and expressed with a two-dimensional logical (table) structure; unstructured data refers to data that cannot be expressed in such a two-dimensional structure, such as text, pictures, XML, HTML, audio, and video.
Specifically, in this embodiment, the recognized structured data and unstructured data are sent to the message queue channel using the preset optical character recognition interface, and the data are then processed according to the user's requirements.
In detail, before storing the original picture set and the original data set in the preset message queue channel, the method further includes:
establishing a link between the original data set and original picture set on one side and the message middleware on the other, and forming the message queue channel through the link;
and storing the original picture set and the original data set through the message queue channel.
In the embodiment of the invention, the message queue channel is a channel, formed by linking the original data with the message middleware, that can receive, store, and send messages.
Preferably, the link may be a TCP connection.
Preferably, the message middleware may be Kafka.
In another embodiment of the present invention, the recognized structured data may be sent to the message queue channel using the preset optical character recognition interface, while the unstructured data is stored on a NAS disk, and the data are then processed according to the user's requirements.
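The decoupling that the message queue channel provides can be sketched as follows. This is a minimal in-process stand-in using Python's standard `queue` module; in the embodiment itself the channel would be a Kafka topic reached over a TCP connection, and all names below are illustrative, not part of any real Kafka API.

```python
import queue

# The "message queue channel": the OCR interface produces
# (picture, extracted-text) pairs into it, and a downstream consumer
# processes them only when the user decides to (asynchronous storage).
channel = queue.Queue()

def produce(picture_id: str, extracted_text: str) -> None:
    """OCR-interface side: store one picture/data pair in the channel."""
    channel.put({"picture": picture_id, "text": extracted_text})

def consume_all() -> list:
    """Consumer side: drain the channel when processing is wanted."""
    items = []
    while not channel.empty():
        items.append(channel.get())
    return items

# The producer can run long before the consumer ever looks at the data.
produce("plate_001.jpg", "XA12345")
produce("plate_002.jpg", "XB123")
pending = consume_all()  # both pairs are still available, in order
```

The point of the design is that ingestion and processing are fully decoupled: the OCR interface never blocks on downstream work.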
S2, when a preset search engine is idle, acquiring the original data set corresponding to the original picture set from the message queue channel using the search engine, performing error data screening on the original data set, determining that the screened error data form a negative sample data set, and that the non-error data other than the error data form a positive sample data set.
In the embodiment of the invention, the original picture set and the original data set can be stored asynchronously through the preset message queue channel: when the two sets are transmitted to the channel, the channel need not process them immediately, and the time of processing can be determined according to the user's requirements. The original data set processed in the message queue channel is then transmitted to the preset search engine. After use authorization is obtained for the original data set corresponding to the original picture set in actual production, the search engine screens error data in that original data set using a preset screening statement, where the error data is erroneous character information in the original data set corresponding to the original picture set. The screened error data form the negative sample data set, and the remaining non-error data form the positive sample data set.
Preferably, the preset search engine may be Elasticsearch.
In detail, performing error data screening on the original data set, determining that the screened error data form a negative sample data set and that the non-error data other than the error data form a positive sample data set, includes:
acquiring the sequence length of the original data in the original data set using the preset search engine, and setting a sequence length index using a preset screening statement in the preset search engine;
comparing the sequence length with the sequence length index; the original data whose sequence length is inconsistent with the sequence length index form the negative sample data set, and the original data whose sequence length is consistent with the sequence length index form the positive sample data set.
In the embodiment of the invention, the sequence length may be the character length corresponding to the original data, and the sequence length index set using the preset screening statement is an index set for the fixed character length expected in the original data.
For example, if a certain original picture is a car license plate picture, the sequence length contained in its corresponding original data is seven characters (i.e., the license plate contains seven characters), and the screening statement sets the sequence length index to length 7. If the sequence length of a piece of original data is identified as 7, that data is determined to be positive sample data; if the sequence length is not 7, the data is determined to be negative sample data.
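The length screen above can be sketched in a few lines. The Elasticsearch query is replaced here by a plain Python comparison, and the record field names are illustrative assumptions:

```python
# Screen OCR records by sequence length: records whose recognized text
# matches the expected length (7 characters for a license plate) become
# positive samples, the rest become negative samples.
def screen_by_length(records, length_index=7):
    positive, negative = [], []
    for rec in records:
        if len(rec["text"]) == length_index:
            positive.append(rec)   # length matches the index
        else:
            negative.append(rec)   # wrong length -> likely OCR error
    return positive, negative

records = [
    {"picture": "p1.jpg", "text": "XA12345"},  # 7 characters: positive
    {"picture": "p2.jpg", "text": "XB123"},    # 5 characters: negative
]
pos, neg = screen_by_length(records)
```

In the embodiment itself this comparison would be expressed as an Elasticsearch screening statement so that no manual review is needed.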
Further, after the determining that the filtered error data forms a negative sample data set and the non-error data other than the error data forms a positive sample data set, the method further includes:
acquiring data fields of the positive sample data set and the negative sample data set, and identifying sensitive fields in the data fields;
and desensitizing the sensitive field by using a preset desensitizing function.
In the embodiment of the present invention, fields involving personal privacy information such as personal names, identification card information, and mobile phone numbers are all sensitive fields. After a sensitive field is identified, data replacement or mask shielding can be performed on it to implement desensitization; for example, replacing the middle four digits of a mobile phone number yields 13800001248, and masking the middle four digits yields 138****1248.
For example, personal identification information, mobile phone numbers, bank card information, etc. collected by institutions and enterprises are subjected to desensitization processing.
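The two desensitization options described above can be sketched as follows for an 11-digit mobile phone number; the function names and the filler value are illustrative, not a prescribed desensitization function:

```python
# Desensitize the middle four digits of an 11-digit mobile phone number.
def replace_middle_four(phone: str, filler: str = "0000") -> str:
    """Data replacement: 138xxxx1248 -> 13800001248."""
    return phone[:3] + filler + phone[7:]

def mask_middle_four(phone: str) -> str:
    """Mask shielding: 138xxxx1248 -> 138****1248."""
    return phone[:3] + "****" + phone[7:]

replaced = replace_middle_four("13812341248")  # "13800001248"
masked = mask_middle_four("13812341248")       # "138****1248"
```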
In the embodiment of the invention, the privacy information in the positive sample data and the negative sample data can be shielded or hidden through the desensitization processing of the desensitization function on the sensitive field, so that the security of the privacy data of users in the real production environment is protected.
For example, because the embodiment of the invention uses data from the actual production process for model training, confidential enterprise data may be involved. A data use permission can therefore be preset, with use restrictions configured in the permission, so that developers cannot view or download the data.
In an embodiment of the invention, the preset data use permission can also ensure that a developer can only use the data when performing model iteration and training, but cannot view or download it, thereby avoiding the risk of data leakage.
S3, acquiring a real character labeling set corresponding to the positive sample data set and an error character labeling set corresponding to the negative sample data set, wherein the error character labeling set is dynamically updated in real time.
In the embodiment of the invention, the positive sample data set and the negative sample data set can be transmitted to a preset labeling platform. The labeling platform labels the real characters of the positive sample data set, together with the corresponding position information of the positive sample data set in the original pictures (i.e., the position information of the real characters in the original pictures), and the real character labels and this position information are combined into the real character labeling set. Similarly, the labeling platform labels the characters of the negative sample data set, together with the corresponding position information of the negative sample data set in the original pictures (i.e., the position information of the error characters in the original pictures), and these character labels and position information are combined into the error character labeling set.
For example, for positive sample data whose license plate number is the seven-character string XA·XXXXX, the labeling platform interface can be called to label the real characters XA·XXXXX together with the position in the license plate number to which each labeled real character corresponds, and these real character labels and positions are combined into the real character labeling set. For negative sample data whose license plate number is XB·XXX, the labeling platform interface can likewise be called to label the characters XB·XXX together with the position to which each labeled character corresponds, and these character labels and positions form the error character labeling set.
In the embodiment of the invention, after the labeling platform discovers error character labels, the negative sample data set corresponding to those error character labels can be updated in real time, and the updated negative sample data set is input into the subsequent model as part of the training data set, so as to improve the accuracy of subsequent model training.
S4, the positive sample data set, the negative sample data set and the original picture set are used as training data sets to be input into a preset optical character recognition model, and the optical character recognition model is used for recognizing a predicted character set of the training data set.
In the embodiment of the present invention, the preset optical character recognition model may be a deep learning model with a CRNN structure, where the CRNN structure is CNN + LSTM + CTC, and the optical character recognition model includes a convolutional layer (CNN), a recurrent layer (LSTM), a transcription layer (CTC), and a loss function.
In detail, the recognizing the predicted character set of the training data set using the preset optical character recognition model includes:
extracting a feature sequence of the training data set using the convolutional layer in the preset optical character recognition model to obtain a character vector set;
predicting a character tag set of the character vector set using the recurrent layer in the optical character recognition model;
and integrating the character tag set using the transcription layer in the optical character recognition model to obtain a predicted character set.
In an embodiment of the present invention, the convolutional layer includes a convolution sub-layer and a pooling layer. Feature extraction is performed on the training data set by the convolution sub-layer to obtain a feature map, and the feature sequence vectors in the feature map are then extracted by the pooling layer to obtain the character vector set.
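The two sub-steps above can be sketched on a toy example. A real CRNN uses many learned kernels; the single fixed 2x2 kernel and tiny "image" below are purely illustrative:

```python
# Convolution sub-layer: 2x2 valid convolution over a small 2-D image.
def conv2d_valid(image, kernel):
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(len(image) - kh + 1):
        row = []
        for j in range(len(image[0]) - kw + 1):
            row.append(sum(image[i + a][j + b] * kernel[a][b]
                           for a in range(kh) for b in range(kw)))
        out.append(row)
    return out

# Pooling layer: max-pool each column of the feature map, yielding one
# value per horizontal position -- a left-to-right feature sequence
# (the "character vector set").
def column_max_pool(feature_map):
    return [max(col) for col in zip(*feature_map)]

image = [[0, 1, 1, 0],
         [0, 1, 1, 0],
         [0, 0, 1, 0]]
kernel = [[1, 0],
          [0, 1]]  # fixed illustrative kernel, not a learned one
feature_map = conv2d_valid(image, kernel)   # 2x3 feature map
sequence = column_max_pool(feature_map)     # one vector per column
```

The essential property is that the output sequence runs left to right across the image, so each element can later be aligned with a character position.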
In another embodiment of the present invention, the recurrent layer mainly consists of the LSTM, a variant of the RNN. Because the RNN suffers from the vanishing gradient problem and cannot capture longer-range context, the LSTM is used in place of the RNN so that context information can be better extracted. The recurrent layer includes an input gate, a forget gate, and an output gate.
In detail, the predicting the character tag set of the character vector set using the recurrent layer in the optical character recognition model includes:
calculating a state value of the character vector set using the input gate in the recurrent layer;
calculating an activation value of the character vector set using the forget gate in the recurrent layer;
calculating a state update value of the character vector set according to the state value and the activation value;
and calculating, using the output gate in the recurrent layer, the character tag set of the state update value to obtain the character tag set of the character vector set.
In the embodiment of the invention, the input gate controls how much of the character vector set enters the cell; the forget gate controls how much of the character vector set from the previous moment flows to the current moment; the state update value is the part of the character vector set that the forget gate does not choose to forget; and the output gate outputs the character tag set of the character vector set.
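A single LSTM cell step, reduced to scalars, can make the gate interplay above concrete. The weights are arbitrary illustrative values, not learned parameters, and for brevity the three gates share one weight set here (real LSTMs give each gate its own weights):

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, w=0.5, u=0.3, b=0.1):
    z = w * x + u * h_prev + b
    i = sigmoid(z)            # input gate: how much new input enters
    f = sigmoid(z)            # forget gate: how much old state survives
    g = math.tanh(z)          # candidate state value
    c = f * c_prev + i * g    # state update value (new cell state)
    o = sigmoid(z)            # output gate
    h = o * math.tanh(c)      # output / hidden state
    return h, c

h, c = lstm_step(x=1.0, h_prev=0.0, c_prev=0.0)
```

Stacking this step over the character vector sequence gives the per-step outputs from which the character tag set is predicted.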
In the embodiment of the invention, the transcription layer mainly consists of CTC (Connectionist Temporal Classification), whose main purpose is to convert the per-frame character tag sequence predicted by the LSTM into the final predicted character set.
Further, the integrating the character tag set by using the transcription layer in the optical character recognition model to obtain a predicted character set includes:
acquiring, using the transcription layer, all path probabilities of the character tag set, and searching among the multiple path probabilities for the maximum path probability corresponding to each character tag;
and merging the labels along each maximum-probability path to obtain the predicted characters of the character tag set.
In the embodiment of the invention, the predicted character can be obtained through the following formula:

y = B( argmax_π P(π | x) )

In the embodiment of the invention, P(π | x) is the probability of a character label path π given the input x, B is the mapping over the path set of all character labels (merging repeated labels and removing blanks), argmax_π P(π | x) is the path with the maximum path probability, and y is the predicted character sequence corresponding to the character labels.
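The transcription step above can be sketched as greedy (best path) CTC decoding: pick the maximum-probability label at each time step, then apply the mapping B, which collapses repeated labels and removes blanks. The probability rows below are made-up illustrative values:

```python
BLANK = "-"

def ctc_greedy_decode(prob_rows, alphabet):
    # Best path: argmax label per time step.
    path = [alphabet[max(range(len(row)), key=row.__getitem__)]
            for row in prob_rows]
    # B(path): collapse repeats, then drop blanks.
    decoded, prev = [], None
    for label in path:
        if label != prev and label != BLANK:
            decoded.append(label)
        prev = label
    return "".join(decoded)

alphabet = [BLANK, "A", "B"]
# Each row: per-time-step probabilities over (blank, 'A', 'B').
probs = [
    [0.1, 0.8, 0.1],  # -> A
    [0.1, 0.7, 0.2],  # -> A (repeat, collapsed by B)
    [0.8, 0.1, 0.1],  # -> blank (removed by B)
    [0.1, 0.2, 0.7],  # -> B
]
result = ctc_greedy_decode(probs, alphabet)  # "AB"
```

Greedy decoding is the simplest realization of the maximum-path-probability search; beam search variants consider more paths at higher cost.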
S5, calculating loss values between the predicted character set and the real character labeling set and between the predicted character set and the error character labeling set, and if the loss value does not meet a preset condition, adjusting the parameters of the optical character recognition model until the loss value meets the preset condition, so as to obtain the trained optical character recognition model.
In an embodiment of the present invention, a first loss value between the predicted character set and the real character labeling set and a second loss value between the predicted character set and the error character labeling set are calculated, and the two loss values are fused to obtain the loss value. If the loss value does not satisfy the preset condition, the parameters of the optical character recognition model are adjusted until the loss value satisfies the preset condition, and the trained optical character recognition model is obtained.
For example, the preset condition may be a preset threshold of 0.1: when the loss value is greater than or equal to 0.1, the model parameters are adjusted until the loss value is less than 0.1, so as to obtain the trained optical character recognition model.
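The stopping criterion above can be sketched with a toy training loop. The "model" here is a single parameter fitted by gradient descent on a squared error; everything about it is illustrative, not the actual CRNN/CTC training procedure:

```python
# Keep adjusting the model parameter until the loss value drops below
# the preset threshold of 0.1 (the "preset condition").
def train_until_threshold(target=2.0, lr=0.1, threshold=0.1, max_steps=1000):
    w = 0.0  # model parameter
    loss = (w - target) ** 2
    for step in range(max_steps):
        loss = (w - target) ** 2    # stand-in for the fused loss value
        if loss < threshold:        # preset condition met: stop training
            return w, loss, step
        w -= lr * 2 * (w - target)  # parameter adjustment (gradient step)
    return w, loss, max_steps

w, loss, steps = train_until_threshold()
```

The loop mirrors S5: compute the loss, test it against the preset condition, and adjust parameters only while the condition is unmet.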
In the embodiment of the invention, the original picture set in actual production and the original data set corresponding to the original picture set are obtained first, which avoids the data distribution difference between the development environment and the production environment and improves the accuracy of subsequent model training. Secondly, when the preset search engine is idle, the original data set corresponding to the original picture set is acquired from the message queue channel using the search engine, and error data in the original data set is identified by the search engine; the error data can thus be screened directly by the search engine instead of manually, saving manpower and time, shortening the iteration period of subsequent model training, and improving training efficiency. Furthermore, labeling the real characters corresponding to the positive sample data and the error characters corresponding to the negative sample data facilitates training of the subsequent model. Finally, the predicted characters of the training data are recognized using the optical character recognition model, the loss values of the predicted characters against the real characters and against the error characters are calculated, and if the loss value does not meet the preset condition, the parameters of the optical character recognition model are adjusted until the loss value meets the preset condition, further improving the accuracy of the model and yielding the trained optical character recognition model. Therefore, the optical character recognition model training method provided by the embodiment of the invention can improve the efficiency and accuracy of optical character recognition model training.
FIG. 2 is a functional block diagram of the optical character recognition model training device of the present invention.
The optical character recognition model training apparatus 100 of the present invention may be installed in an electronic device. Depending on the functions implemented, the optical character recognition model training apparatus may include a data set acquisition module 101, a data set screening module 102, a data set labeling module 103, a training data set recognition module 104, and a model training module 105. A module, which may also be referred to herein as a unit, refers to a series of computer program segments that are stored in a memory of the electronic device, can be executed by a processor of the electronic device, and perform a fixed function.
In the present embodiment, the functions concerning the respective modules/units are as follows:
The data set obtaining module 101 is configured to obtain an original picture set in actual production and an original data set corresponding to the original picture set, and store the original picture set and the original data set into a preset message queue channel.
In the embodiment of the invention, the original picture set is unstructured data collected from the actual production environment using a preset optical character recognition interface, and the original data set is the character information extracted from the original picture set by the optical character recognition interface, where the character information is structured data.
Structured data refers to data that can be stored in a database and expressed with a two-dimensional logical (table) structure; unstructured data refers to data that cannot be expressed in such a two-dimensional structure, such as text, pictures, XML, HTML, audio, and video.
Specifically, in this embodiment, the recognized structured data and unstructured data are sent to the message queue channel using the preset optical character recognition interface, and the data are then processed according to the user's requirements.
The data set acquisition module may be configured to:
establishing a link between the original data set and original picture set on one side and the message middleware on the other, and forming the message queue channel through the link;
and storing the original picture set and the original data set through the message queue channel.
In the embodiment of the invention, the message queue channel is a channel, formed by linking the original data with the message middleware, that can receive, store, and send messages.
Preferably, the link may be a TCP connection.
Preferably, the message middleware may be Kafka.
In another embodiment of the present invention, the recognized structured data may be sent to the message queue channel using the preset optical character recognition interface, while the unstructured data is stored on a NAS disk, and the data are then processed according to the user's requirements.
The data set screening module 102 is configured to, when a preset search engine is idle, obtain an original data set corresponding to the original picture set from the message queue channel by using the search engine, perform error data screening on the original data set, determine that the screened error data form a negative sample data set, and form non-error data other than the error data into a positive sample data set.
In the embodiment of the invention, the original picture set and the original data set can be stored asynchronously through the preset message queue channel: when the two sets are transmitted to the channel, the channel need not process them immediately, and the time of processing can be determined according to the user's requirements. The original data set processed in the message queue channel is then transmitted to the preset search engine. After use authorization is obtained for the original data set corresponding to the original picture set in actual production, the search engine screens error data in that original data set using a preset screening statement, where the error data is erroneous character information in the original data set corresponding to the original picture set. The screened error data form the negative sample data set, and the remaining non-error data form the positive sample data set.
Preferably, the preset search engine may be Elasticsearch.
In detail, the data set screening module 102 performs error data screening on the original data set by performing the following operations, determining that the screened error data form a negative sample data set and that the non-error data other than the error data form a positive sample data set:
acquiring the sequence length of the original data in the original data set using the preset search engine, and setting a sequence length index using a preset screening statement in the preset search engine;
comparing the sequence length with the sequence length index; the original data whose sequence length is inconsistent with the sequence length index form the negative sample data set, and the original data whose sequence length is consistent with the sequence length index form the positive sample data set.
In the embodiment of the invention, the sequence length may be the character length corresponding to the original data, and the sequence length index set using the preset screening statement is an index set for the fixed character length expected in the original data.
For example, if a certain original picture is a car license plate picture, the sequence length contained in its corresponding original data is seven characters (i.e., the license plate contains seven characters), and the screening statement sets the sequence length index to length 7. If the sequence length of a piece of original data is identified as 7, that data is determined to be positive sample data; if the sequence length is not 7, the data is determined to be negative sample data.
The data set screening module 102 may further be configured to:
acquiring data fields of the positive sample data set and the negative sample data set, and identifying sensitive fields in the data fields;
and desensitizing the sensitive field by using a preset desensitizing function.
In the embodiment of the present invention, fields involving personal privacy information such as personal names, identification card information, and mobile phone numbers are all sensitive fields. After a sensitive field is identified, data replacement or mask shielding can be performed on it to implement desensitization; for example, replacing the middle four digits of a mobile phone number yields 13800001248, and masking the middle four digits yields 138****1248.
For example, personal identification information, mobile phone numbers, bank card information, etc. collected by institutions and enterprises are subjected to desensitization processing.
In the embodiment of the invention, the privacy information in the positive sample data and the negative sample data can be shielded or hidden through the desensitization processing of the desensitization function on the sensitive field, so that the security of the privacy data of users in the real production environment is protected.
For example, because the embodiment of the invention uses data from the actual production process for model training, confidential enterprise data may be involved. A data use permission can therefore be preset, with use restrictions configured in the permission, so that developers cannot view or download the data.
In an embodiment of the invention, the preset data use permission can also ensure that a developer can only use the data when performing model iteration and training, but cannot view or download it, thereby avoiding the risk of data leakage.
The data set labeling module 103 is configured to obtain a real character labeling set corresponding to the positive sample data set and an error character labeling set corresponding to the negative sample data set, where the error character labeling set is dynamically updated in real time.
In the embodiment of the invention, the positive sample data set and the negative sample data set can be transmitted to a preset labeling platform. The labeling platform labels the real characters of the positive sample data set, together with the corresponding position information of the positive sample data set in the original pictures (i.e., the position information of the real characters in the original pictures), and the real character labels and this position information are combined into the real character labeling set. Similarly, the labeling platform labels the characters of the negative sample data set, together with the corresponding position information of the negative sample data set in the original pictures (i.e., the position information of the error characters in the original pictures), and these character labels and position information are combined into the error character labeling set.
For example, for positive sample data whose license plate number is the seven-character string XA·XXXXX, the labeling platform interface can be called to label the real characters XA·XXXXX together with the position in the license plate number to which each labeled real character corresponds, and these real character labels and positions are combined into the real character labeling set. For negative sample data whose license plate number is XB·XXX, the labeling platform interface can likewise be called to label the characters XB·XXX together with the position to which each labeled character corresponds, and these character labels and positions form the error character labeling set.
In the embodiment of the invention, after the labeling platform discovers error character labels, the negative sample data set corresponding to those error character labels can be updated in real time, and the updated negative sample data set is input into the subsequent model as part of the training data set, so as to improve the accuracy of subsequent model training.
The training data set recognition module 104 is configured to input the positive sample data set, the negative sample data set, and the original picture set as training data sets to a preset optical character recognition model, and recognize a predicted character set of the training data set using the optical character recognition model.
In the embodiment of the present invention, the preset optical character recognition model may be a deep learning model with a CRNN structure, where the CRNN structure is CNN + LSTM + CTC, and the optical character recognition model includes a convolutional layer (CNN), a recurrent layer (LSTM), a transcription layer (CTC), and a loss function.
In detail, the training data set recognition module 104 recognizes the predicted characters of the training data set using the preset optical character recognition model by performing operations including:
extracting a feature sequence of the training data set using the convolutional layer in the preset optical character recognition model to obtain a character vector set;
predicting a character tag set of the character vector set using the recurrent layer in the optical character recognition model;
and integrating the character tag set using the transcription layer in the optical character recognition model to obtain a predicted character set.
In an embodiment of the present invention, the convolutional layer includes a convolution sub-layer and a pooling layer. Feature extraction is performed on the training data set by the convolution sub-layer to obtain a feature map, and the feature sequence vectors in the feature map are then extracted by the pooling layer to obtain the character vector set.
In another embodiment of the present invention, the loop layer is mainly composed of the LSTM variant of the RNN. Since an RNN suffers from the vanishing-gradient problem and cannot capture longer-range context information, the LSTM replaces the RNN so that context information can be better extracted. The loop layer includes an input gate, a forget gate, and an output gate.
In detail, the predicting the character tag set of the character vector set using the loop layer in the optical character recognition model includes:
calculating a state value of the character vector set by using an input gate in the loop layer;
calculating an activation value of the character vector set by using a forgetting gate in the loop layer;
calculating a state update value of the character vector set according to the state value and the activation value;
and calculating a character tag set of the state update value by using an output gate in the loop layer to obtain the character tag set of the character vector set.
In the embodiment of the invention, the input gate controls how much of the character vector set enters the cell at the current moment; the forget gate controls how much of the character vector set from the previous moment flows to the current moment; the state update value is the part of the character vector set that passes through the forget gate without being forgotten; and the output gate outputs the character tag set of the character vector set.
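The gate computations described above can be sketched as a single LSTM cell step (a hedged pure-Python illustration over scalar inputs; the weight names are placeholders, not the embodiment's parameters):

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_cell_step(x, h_prev, c_prev, w):
    """One LSTM time step over scalar inputs, for illustration.

    x      : current element of the character vector sequence
    h_prev : previous hidden state
    c_prev : previous cell state
    w      : dict of scalar weights, one pair per gate
    Returns the new hidden state h and cell state c.
    """
    i = sigmoid(w["wi"] * x + w["ui"] * h_prev)    # input gate: state value
    f = sigmoid(w["wf"] * x + w["uf"] * h_prev)    # forget gate: activation value
    g = math.tanh(w["wg"] * x + w["ug"] * h_prev)  # candidate state
    c = f * c_prev + i * g                         # state update value
    o = sigmoid(w["wo"] * x + w["uo"] * h_prev)    # output gate
    h = o * math.tanh(c)                           # emitted hidden state
    return h, c
```

The forget gate scales the previous cell state `c_prev`, matching the description that it controls how much of the previous moment's character vector set flows to the current moment.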
In the embodiment of the invention, the transcription layer mainly consists of CTC (Connectionist Temporal Classification), and its main purpose is to convert the character tag set predicted by the LSTM into the predicted character set.
Further, the integrating the character tag set by using the transcription layer in the optical character recognition model to obtain a predicted character set includes:
acquiring all path probabilities of the character tag set by using the transcription layer, and searching for the maximum path probability corresponding to each character tag among the plurality of path probabilities;
and merging each maximum path probability to obtain the predicted characters of the character tag set.
In the embodiment of the invention, the predicted character can be obtained through the following formula:

y = B(argmax_π P(π | x))

where P(π | x) is the probability of a character label path π given the input sequence x, argmax_π P(π | x) selects the maximum path probability corresponding to each character label, B(π) is the mapping that merges repeated labels in a path and removes blanks, and y is the predicted character corresponding to the character labels.
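The path-merging step of the transcription layer can be illustrated with best-path (greedy) CTC decoding, a common approximation of maximum-path-probability decoding (a hedged sketch; the label alphabet and blank index are assumptions, not the embodiment's configuration):

```python
def ctc_greedy_decode(label_probs, blank=0):
    """Best-path CTC decoding.

    label_probs : list of per-time-step probability lists over labels,
                  as produced by the recurrent layer (one row per step).
    blank       : index of the CTC blank label.

    Picks the most probable label at each time step (the maximum path
    probability), then applies the mapping B: merge consecutive repeats
    and drop blanks, yielding the predicted character indices.
    """
    # argmax per time step -> the best path pi
    best_path = [max(range(len(row)), key=lambda k: row[k])
                 for row in label_probs]
    # B(pi): collapse consecutive repeats, then remove blanks
    decoded = []
    prev = None
    for label in best_path:
        if label != prev and label != blank:
            decoded.append(label)
        prev = label
    return decoded
```

For example, for probabilities `[[0.1, 0.9], [0.1, 0.9], [0.8, 0.2]]` with `blank=0`, the best path is `[1, 1, 0]`, which collapses to the single predicted label `[1]`.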
The model training module 105 is configured to calculate a loss value from the predicted character set, the real character labeling set, and the error character labeling set, and if the loss value does not satisfy a preset condition, to adjust parameters of the optical character recognition model until the loss value satisfies the preset condition, thereby obtaining a trained optical character recognition model.
In an embodiment of the present invention, a first loss value is calculated between the predicted character set and the real character labeling set, a second loss value is calculated between the predicted character set and the error character labeling set, and the first loss value and the second loss value are fused to obtain the loss value. If the loss value does not satisfy a preset condition, the parameters of the optical character recognition model are adjusted until the loss value satisfies the preset condition, thereby obtaining the trained optical character recognition model.
For example, the preset condition may be that the loss value is less than a preset threshold of 0.1; when the loss value is greater than or equal to 0.1, the model parameters are adjusted until the loss value is less than 0.1, so as to obtain the trained optical character recognition model.
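The fused-loss stopping criterion described above can be sketched as follows (pure Python; the loss callbacks, fusion weight, and update rule are hypothetical placeholders for the embodiment's actual losses and optimizer):

```python
def train_until_converged(loss_fn, update_params, threshold=0.1,
                          alpha=0.5, max_steps=1000):
    """Adjust model parameters until the fused loss drops below threshold.

    loss_fn       : returns (first_loss, second_loss) for the current
                    parameters, i.e. the losses of the predicted character
                    set against the real and the error character label sets.
    update_params : performs one parameter-adjustment step.
    alpha         : fusion weight between the two loss values.
    """
    fused = float("inf")
    for step in range(max_steps):
        first_loss, second_loss = loss_fn()
        fused = alpha * first_loss + (1.0 - alpha) * second_loss
        if fused < threshold:          # preset condition satisfied
            return fused, step
        update_params()                # otherwise adjust parameters
    return fused, max_steps
```

A simple weighted sum is used here for the fusion; the embodiment does not specify the fusion operator, so this weight is purely illustrative.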
In the embodiment of the invention, the original picture set in actual production and the original data set corresponding to the original picture set are firstly obtained, so that data distribution differences between the development environment and the production environment are avoided and the accuracy of subsequent model training is improved. Secondly, when a preset search engine is idle, the original data set corresponding to the original picture set is acquired from the message queue channel by using the search engine, and error data identification is performed on the original data set by using the search engine, so that error data can be screened directly by the search engine rather than manually, thereby saving manpower and time, shortening the iteration period of subsequent model training, and improving the training efficiency of the subsequent model. Furthermore, labeling the real characters corresponding to the positive sample data and the error characters corresponding to the negative sample data facilitates the training of the subsequent model. Finally, predicted characters of the training data are recognized by using the optical character recognition model, loss values are calculated from the predicted characters, the real characters, and the error characters, and if the loss values do not meet the preset conditions, the parameters of the optical character recognition model are adjusted until the loss values meet the preset conditions, further improving the accuracy of the model, so as to obtain the trained optical character recognition model. Therefore, the optical character recognition model training device provided by the embodiment of the invention can improve the efficiency and accuracy of optical character recognition model training.
Fig. 3 is a schematic structural diagram of an electronic device for implementing the training method of the optical character recognition model according to the present invention.
The electronic device may comprise a processor 10, a memory 11, a communication bus 12 and a communication interface 13, and may further comprise a computer program, such as an optical character recognition model training program, stored in the memory 11 and executable on the processor 10.
The memory 11 includes at least one type of readable storage medium, including flash memory, a mobile hard disk, a multimedia card, a card-type memory (e.g., SD or DX memory), a magnetic memory, a magnetic disk, an optical disk, and the like. The memory 11 may in some embodiments be an internal storage unit of the electronic device, such as a mobile hard disk of the electronic device. The memory 11 may also be an external storage device of the electronic device in other embodiments, such as a plug-in mobile hard disk, a smart media card (Smart Media Card, SMC), a Secure Digital (SD) card, or a flash card (Flash Card) provided on the electronic device. Further, the memory 11 may include both an internal storage unit and an external storage device of the electronic device. The memory 11 may be used not only for storing application software installed in the electronic device and various types of data, such as the code of the optical character recognition model training program, but also for temporarily storing data that has been output or is to be output.
The processor 10 may in some embodiments be composed of integrated circuits, for example a single packaged integrated circuit, or of multiple integrated circuits packaged with the same or different functions, including one or more central processing units (Central Processing Unit, CPU), microprocessors, digital processing chips, graphics processors, combinations of various control chips, and the like. The processor 10 is the control unit (Control Unit) of the electronic device; it connects the various components of the entire electronic device using various interfaces and lines, runs or executes the programs or modules stored in the memory 11 (e.g., the optical character recognition model training program), and invokes data stored in the memory 11 to perform the various functions of the electronic device and process data.
The communication bus 12 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be classified as an address bus, a data bus, a control bus, and so on. The communication bus 12 is arranged to enable connection and communication between the memory 11 and the at least one processor 10, among other components. For ease of illustration, only one bold line is shown in the figure, but this does not mean that there is only one bus or only one type of bus.
Fig. 3 shows only an electronic device with components, and it will be understood by those skilled in the art that the structure shown in fig. 3 is not limiting of the electronic device and may include fewer or more components than shown, or may combine certain components, or a different arrangement of components.
For example, although not shown, the electronic device may further include a power source (such as a battery) for supplying power to the respective components, and preferably, the power source may be logically connected to the at least one processor 10 through a power management device, so that functions of charge management, discharge management, power consumption management, and the like are implemented through the power management device. The power supply may also include one or more of any of a direct current or alternating current power supply, recharging device, power failure detection circuit, power converter or inverter, power status indicator, etc. The electronic device may further include various sensors, bluetooth modules, wi-Fi modules, etc., which are not described herein.
Optionally, the communication interface 13 may comprise a wired interface and/or a wireless interface (e.g., WI-FI interface, bluetooth interface, etc.), typically used to establish a communication connection between the electronic device and other electronic devices.
Optionally, the communication interface 13 may further comprise a user interface, which may be a display or an input unit such as a keyboard (Keyboard), or a standard wired interface or wireless interface. Alternatively, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch display, or the like. The display may also be referred to as a display screen or display unit, as appropriate, for displaying the information processed in the electronic device and for displaying a visual user interface.
It should be understood that the embodiments described are for illustrative purposes only, and the scope of the patent application is not limited to this configuration.
The optical character recognition model training program stored by the memory 11 in the electronic device is a combination of a plurality of computer programs that, when run in the processor 10, implement:
Acquiring an original picture set in actual production and an original data set corresponding to the original picture set, and storing the original picture set and the original data set into a preset message queue channel;
When a preset search engine is idle, acquiring an original data set corresponding to the original picture set from the message queue channel by using the search engine, performing error data screening on the original data set, and determining that the screened error data form a negative sample data set and non-error data except the error data form a positive sample data set;
Acquiring a real character labeling set corresponding to the positive sample data set and an error character labeling set corresponding to the negative sample data set, wherein the error character labeling set is dynamically updated in real time;
Inputting the positive sample data set, the negative sample data set and the original picture set as training data sets to a preset optical character recognition model, and recognizing a predicted character set of the training data set by using the optical character recognition model;
And obtaining the loss values of the predicted character set, the real character labeling set and the error character labeling set through calculation, and if the loss values do not meet the preset conditions, adjusting the parameters of the optical character recognition model until the loss values meet the preset conditions, so as to obtain the trained optical character recognition model.
In particular, the specific implementation method of the processor 10 on the computer program may refer to the description of the relevant steps in the corresponding embodiment of fig. 1, which is not repeated herein.
Further, the modules/units integrated in the electronic device may be stored in a computer readable medium if implemented in the form of software functional units and sold or used as stand-alone products. The computer readable medium may be non-volatile or volatile. The computer readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, and a read-only memory (ROM).
Embodiments of the present invention may also provide a computer readable storage medium storing a computer program which, when executed by a processor of an electronic device, may implement:
Acquiring an original picture set in actual production and an original data set corresponding to the original picture set, and storing the original picture set and the original data set into a preset message queue channel;
When a preset search engine is idle, acquiring an original data set corresponding to the original picture set from the message queue channel by using the search engine, performing error data screening on the original data set, and determining that the screened error data form a negative sample data set and non-error data except the error data form a positive sample data set;
Acquiring a real character labeling set corresponding to the positive sample data set and an error character labeling set corresponding to the negative sample data set, wherein the error character labeling set is dynamically updated in real time;
Inputting the positive sample data set, the negative sample data set and the original picture set as training data sets to a preset optical character recognition model, and recognizing a predicted character set of the training data set by using the optical character recognition model;
And obtaining the loss values of the predicted character set, the real character labeling set and the error character labeling set through calculation, and if the loss values do not meet the preset conditions, adjusting the parameters of the optical character recognition model until the loss values meet the preset conditions, so as to obtain the trained optical character recognition model.
Further, the computer-readable storage medium may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function, and the like; the storage data area may store data created from the use of blockchain nodes, and the like.
In the several embodiments provided by the present invention, it should be understood that the disclosed media, devices, apparatuses, and methods may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is merely a logical function division, and there may be other manners of division when actually implemented.
The modules described as separate components may or may not be physically separate, and components shown as modules may or may not be physical units, may be located in one place, or may be distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional module in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units can be realized in a form of hardware or a form of hardware and a form of software functional modules.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential characteristics thereof.
The present embodiments are, therefore, to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference signs in the claims shall not be construed as limiting the claim concerned.
The blockchain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain (Blockchain) is essentially a decentralized database: a chain of data blocks generated in association using cryptographic methods, where each block contains information from a batch of network transactions, used to verify the validity (anti-counterfeiting) of its information and to generate the next block. The blockchain may include a blockchain underlying platform, a platform product services layer, an application services layer, and the like.
Furthermore, it is evident that the word "comprising" does not exclude other elements or steps, and that the singular does not exclude the plural. A plurality of units or means recited in the system claims can also be implemented by one unit or means through software or hardware. Terms such as first and second are used to denote names and do not denote any particular order.
Finally, it should be noted that the above-mentioned embodiments are merely for illustrating the technical solution of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications and equivalents may be made to the technical solution of the present invention without departing from the spirit and scope of the technical solution of the present invention.