
Deep learning-based automatic LED character positioning and identifying method

Info

Publication number
CN115035526A
CN115035526A (application CN202210709461.0A; granted as CN115035526B)
Authority
CN
China
Prior art keywords
led
character
network
detection
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210709461.0A
Other languages
Chinese (zh)
Other versions
CN115035526B (en)
Inventor
邓轩
项导
陆盼盼
彭冲冲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yijiahe Technology Co Ltd
Original Assignee
Yijiahe Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yijiahe Technology Co Ltd
Priority to CN202210709461.0A
Publication of CN115035526A
Application granted
Publication of CN115035526B
Status: Active
Anticipated expiration


Abstract

The invention provides a deep learning-based method for automatically locating and recognizing LED characters. A YOLOv4 algorithm locates all regions of the LED characters to be recognized, positioning the digital character area of the LED dial within the panorama; a PSENet network then performs positioning detection of single-line or multi-line characters; finally, a CRNN network recognizes the multi-line LED characters. The method also resolves inaccurate recognition caused by tilted LED meters, blurred characters, and similar problems.

Description

Deep learning-based automatic LED character positioning and identifying method
Technical Field
The invention relates to the field of intelligent instruments, in particular to an automatic LED character positioning and identifying method based on deep learning.
Background
LED digital meters are common in modern intelligent instruments. Compared with traditional mechanical meters they are simpler to install, require less engineering work, and are more accurate and compact, so they are widely used in monitoring systems, substation automation, distribution automation, district power monitoring, intelligent power distribution cabinets, and switchgear cabinets. However, meter reading still usually depends on manual work, which is inefficient and consumes considerable manpower and energy; in high-voltage substations and distribution substations the LED readings must be taken in person, and the attendant dangers cannot be denied. An algorithm that automatically recognizes LED instrument readings is therefore needed.
Existing reading-recognition methods for digital LED instruments fall into two groups. The first recognizes only the digits 0-9, with no recognition logic for decimal points, signs, or the three phases A, B and C; it segments and recognizes characters on LED instruments with a uniform background, and under a complex background incompletely displayed characters often cannot be recognized after segmentation. The second designs separate recognition logic for digits, decimal points, signs, and A/B/C: using a preset digital LED instrument image and traditional template matching, it locates in the panorama the character area of the instrument to be recognized, the single-character regions inside it, and the region where a decimal point may appear; the single-character and decimal-point regions are then extracted from the relative positions of the positioning frame and the target frame, each single-character region is fed into a trained convolutional neural network such as AlexNet for recognition, the brightness of the decimal-point region is detected, the detection results are post-processed, and the final reading is assembled from the character, decimal-point, three-phase, and sign results. Its drawback is that recognition is per character, so accuracy depends heavily on the quality of character segmentation, and LED characters that are tilted, affected by illumination, or blurred in shooting are easily misrecognized.
Disclosure of Invention
To solve the problems in the prior art, the invention provides a deep learning-based method for automatically locating and recognizing LED characters: a YOLOv4 algorithm locates all regions of the LED characters to be recognized, positioning the digital character area of the LED dial within the panorama; a PSENet network then performs positioning detection of single-line or multi-line characters; finally, a CRNN network recognizes the multi-line LED characters. The method also resolves inaccurate recognition caused by tilted LED meters, blurred characters, and similar problems.
The invention makes full use of the power of deep learning. First, a YOLOv4 network locates the character area of the LED instrument; this deep learning-based YOLO target detector finds the instrument's position accurately, so the image shot by the camera can be fed directly into the network without first annotating the instrument's position. Once the instrument's ROI (region of interest) is detected, it is passed to the character detection network, the progressive expansion network PSENet, which through downsampling, feature fusion, and upsampling produces an output the same size as the original image and yields the final text connected domains, i.e. the position of each line of characters in the LED instrument. Because this network can locate single-line or multi-line character areas, all character areas in the instrument are detected and located. Finally, the detected text areas are fed into a character recognition network, a CRNN, for automatic recognition, completing the automatic positioning and recognition of multi-line LED characters in an efficient way.
The technical scheme adopted by the invention is a deep learning-based automatic LED character positioning and recognition method, implemented according to the following steps:
step 1, an LED instrument area positioning module performs LED instrument target detection in the substation scene with the YOLOv4 target detection algorithm; to keep other areas of the dial from interfering with character recognition, it locates only the digital character area of the LED meter;
step 2, an LED instrument character detection module uses the progressive scale expansion network PSENet as the character detection module for digital LED instruments in the substation scene; by detecting the LED character target area at pixel level through image segmentation, it improves the model's detection of multi-line LED characters;
and step 3, an LED instrument character recognition module trains a CRNN network on the features of the single-line or multi-line character target areas obtained in step 2, and finally recognizes the specific characters with the CTC algorithm to obtain the recognition result of the LED instrument.
Step 1 is implemented according to the following steps:
Step 1.1, data enhancement is performed on the LED sample data using GridMask. GridMask is an information-deletion method: regions are discarded from the image on a regular grid, which acts as an extra regularizer against network overfitting and strikes a balance between deleting and retaining information. Random erasing, Cutout, and Hide-and-Seek can delete or keep entire discriminative regions, introducing noise that hinders training of the model.
A GridMask is specified by 4 parameters, x, y, r and d, which determine a particular set of mask regions; the mask regions are also rotated during actual training. The proportion of image information retained is
k = M / (W × H)
where W and H are the width and height of the original image and M is the number of retained pixels. k is not directly one of the four parameters, but it indirectly defines r, whose definition is obtained from k through

k = 1 - (1 - r)²

x and y are random within a given range:

(δx, δy) = random(0, d - 1)
In the LED instrument detection task, r among the 4 GridMask hyperparameters is set to 0.4 and d to (96, 224). During use, GridMask augmentation is applied to a training image with probability P = 0.6; the probability starts at 0 and is gradually increased with the number of training iterations until it finally reaches P.
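By way of illustration, a minimal NumPy sketch of the GridMask scheme described above follows; r = 0.4, d in (96, 224) and (δx, δy) = random(0, d - 1) come from the text, while the function name, the omission of mask rotation, and the exact tiling are illustrative assumptions.

import numpy as np

def gridmask(image, r=0.4, d_range=(96, 224), rng=None):
    # Sketch: drop one square of side (1 - r) * d in every d x d grid cell,
    # so the global keep ratio is k = 1 - (1 - r)^2 (0.64 for r = 0.4).
    rng = rng or np.random.default_rng()
    h, w = image.shape[:2]
    d = int(rng.integers(d_range[0], d_range[1] + 1))  # grid unit size
    drop = int(d * (1 - r))                            # side of each dropped square
    dx, dy = rng.integers(0, d, size=2)                # offsets (delta_x, delta_y)
    mask = np.ones((h, w), dtype=image.dtype)
    for y in range(dy - d, h, d):                      # tile the grid over the image
        for x in range(dx - d, w, d):
            mask[max(y, 0):max(y + drop, 0), max(x, 0):max(x + drop, 0)] = 0
    return image * (mask[..., None] if image.ndim == 3 else mask)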
Step 1.2, a YOLOv4 target detection network is constructed to position the position of a character area of an LED instrument dial in a picture, learned high-level semantic information is transmitted into a low-level network through an FPN network, then the high-level semantic information and low-level high-resolution information are fused to improve the detection effect, an information transmission path from the low level to the high level is added, feature information is enhanced through down-sampling operation, and finally the feature information of different convolution layers is fused to achieve the detection effect. The trunk extraction network CSPDarknet53 of YOLOv4 uses a Mish activation function, the Mish function is a smooth curve, the smooth activation function can enable information to be better input into a neural network, so that better accuracy and generalization are obtained, and smaller negative gradient input can be allowed. The functional expression is as follows:
Mish(x) = x × tanh(ln(1 + e^x))
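In PyTorch this is a one-liner, since softplus(x) = ln(1 + e^x); a sketch (recent PyTorch versions also provide torch.nn.Mish directly):

import torch
import torch.nn.functional as F

def mish(x: torch.Tensor) -> torch.Tensor:
    # Mish(x) = x * tanh(ln(1 + e^x)) = x * tanh(softplus(x))
    return x * torch.tanh(F.softplus(x))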
Step 1.3, target annotation boxes are defined for the pre-labeled character areas of the LED instrument dial; these areas are the Ground Truth. The annotated target images and their label files are fed into the YOLOv4 network for training, and the trained YOLOv4 target detection network locates LED instrument character areas containing different characters.
Step 1.4, DIoU-NMS is applied, which considers both the overlap area and the distance between the center points of two boxes, removing duplicate target boxes and finally yielding the digital character area of the LED instrument.
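A NumPy sketch of DIoU-NMS as described: greedy NMS where the suppression test subtracts the normalized squared center distance from the IoU, so overlapping boxes whose centers lie far apart both survive. The (x1, y1, x2, y2) box layout and the 0.5 threshold are illustrative assumptions.

import numpy as np

def diou_nms(boxes, scores, thresh=0.5):
    # boxes: (N, 4) array of (x1, y1, x2, y2); scores: (N,) confidences.
    order = scores.argsort()[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        if rest.size == 0:
            break
        # overlap (IoU) between box i and the remaining boxes
        x1 = np.maximum(boxes[i, 0], boxes[rest, 0]); y1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[i, 2], boxes[rest, 2]); y2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter)
        # squared center distance d^2 over squared enclosing-box diagonal c^2
        ci = (boxes[i, :2] + boxes[i, 2:]) / 2
        cr = (boxes[rest, :2] + boxes[rest, 2:]) / 2
        d2 = ((ci - cr) ** 2).sum(axis=1)
        ex1 = np.minimum(boxes[i, :2], boxes[rest, :2])
        ex2 = np.maximum(boxes[i, 2:], boxes[rest, 2:])
        c2 = ((ex2 - ex1) ** 2).sum(axis=1)
        order = rest[iou - d2 / np.maximum(c2, 1e-9) <= thresh]  # suppress on DIoU, not plain IoU
    return keep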
Step 2 is implemented according to the following steps:
Step 2.1, feature extraction. The input image is passed through a ResNet50 residual network (ResNet50 has 50 Conv2d layers); the feature maps output by the Conv2, Conv3, Conv4 and Conv5 stages are extracted to construct a feature pyramid, and the four feature levels P2, P3, P4, P5 are extracted through top-down and lateral connections, yielding four 256-channel feature layers.
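A sketch of this pyramid under common FPN conventions (1×1 lateral convolutions plus nearest-neighbor top-down upsampling, input sides divisible by 32); the use of torchvision's IntermediateLayerGetter and the omission of the usual 3×3 smoothing convolutions are assumptions, not details from the patent:

import torch
import torchvision
from torchvision.models._utils import IntermediateLayerGetter

backbone = torchvision.models.resnet50(weights=None)
body = IntermediateLayerGetter(backbone, return_layers={
    "layer1": "c2", "layer2": "c3", "layer3": "c4", "layer4": "c5"})
lateral = torch.nn.ModuleDict({k: torch.nn.Conv2d(c, 256, kernel_size=1)
                               for k, c in [("c2", 256), ("c3", 512),
                                            ("c4", 1024), ("c5", 2048)]})

def pyramid(x):
    # Top-down pathway: each level is the lateral projection of Ci plus the
    # 2x-upsampled level above, giving four 256-channel maps P2..P5.
    f = body(x)
    up = lambda t: torch.nn.functional.interpolate(t, scale_factor=2, mode="nearest")
    p5 = lateral["c5"](f["c5"])
    p4 = lateral["c4"](f["c4"]) + up(p5)
    p3 = lateral["c3"](f["c3"]) + up(p4)
    p2 = lateral["c2"](f["c2"]) + up(p3)
    return p2, p3, p4, p5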
Step 2.2, feature fusion. The four feature maps obtained in step 2.1 are fused: P3, P4 and P5 are upsampled by 2×, 4× and 8× respectively and concatenated with the feature layer P2, finally producing a 1024-dimensional fused feature vector F. Fusing the high-level and low-level semantic features lets the network effectively sense the distribution of the LED characters and detect character boundaries more accurately. Concretely:
F = C(P2, P3, P4, P5) = P2 || Up×2(P3) || Up×4(P4) || Up×8(P5)
wherein, "|" represents the connection operation, and the upsampling is performed in a manner of 2 times, 4 times and 8 times respectively.
Step 2.3, the fused feature F obtained in step 2.2 passes through a 3×3 convolution, then a BN layer and a ReLU layer, giving a 256-channel feature map; this is fed into a 1×1 convolution to produce the segmentation results s1, s2, ..., sn, which are arranged from small to large by kernel scale.
Step 2.4, through the PSENet algorithm, scale expansion proceeds from the smallest kernel outward; a first-come-first-served rule resolves boundary conflicts during expansion, finally yielding LED character detection results with clear boundaries.
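A minimal sketch of the expansion itself, assuming the results s1, ..., sn from step 2.3 have been binarized into NumPy masks ordered smallest kernel first; the breadth-first queue makes growth first-come-first-served, which is exactly how boundary conflicts between adjacent text lines are resolved:

from collections import deque
import numpy as np
from scipy import ndimage

def progressive_scale_expansion(kernels):
    # kernels: list of binary masks s1..sn, smallest kernel first.
    labels, _ = ndimage.label(kernels[0])          # seeds: components of the minimal kernel
    h, w = labels.shape
    for kernel in kernels[1:]:                     # grow into each larger kernel in turn
        queue = deque(zip(*np.nonzero(labels)))
        while queue:
            y, x = queue.popleft()
            for dy, dx in ((0, 1), (0, -1), (1, 0), (-1, 0)):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and kernel[ny, nx] and labels[ny, nx] == 0:
                    labels[ny, nx] = labels[y, x]  # first label to arrive wins the pixel
                    queue.append((ny, nx))
    return labels                                  # one connected label per text line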
Step 3 is specifically implemented according to the following steps:
Step 3.1, LED character feature extraction. The CNN part of the CRNN uses a VGG structure. For faster model convergence, and considering the actual aspect ratio of LED characters, images are uniformly normalized to the size [240, 50]. Because the network's deep convolutional and recurrent layers are difficult to train, a batch normalization (BN) layer is added after the fifth and sixth convolutional layers, which greatly accelerates training. The CNN finally extracts 240/4 = 60 feature sequences of 512 channels.
Step 3.2, LED character prediction. The feature map extracted by the CNN in step 3.1 is fed into an RNN for character prediction. The CNN used here has four max-pooling layers, and the windows of the last two are changed from 2×2 to 1×2: most LED character areas are short and wide, and 1×2 pooling avoids losing information in the width direction as far as possible, making it better suited to recognizing letters and digits. Because LED characters photographed in real stations are often blurred, a deep bidirectional RNN serves as the RNN of the CRNN to improve accuracy on blurry characters: for the feature sequence x = x1, ..., xt output by the CNN, each input xt has an output yt. Since different LED meters display characters of different lengths, long short-term memory (LSTM) units are chosen as the RNN cells to recognize variable-length sequences; LSTM also effectively prevents vanishing gradients in the RNN during training. A 7-layer CNN first extracts the feature map of the text image; the map is split by columns, and each column is fed as a 512-dimensional time step into two layers of bidirectional LSTM with 256 units for classification.
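A PyTorch sketch of this recurrent part, assuming the CNN has already collapsed the feature-map height to 1 so that the columns form the 512-dimensional sequence described above; the class count and names are illustrative:

import torch

class CRNNSequenceHead(torch.nn.Module):
    def __init__(self, num_classes, feat_dim=512, hidden=256):
        super().__init__()
        # two stacked bidirectional LSTM layers with 256 hidden units each
        self.rnn = torch.nn.LSTM(feat_dim, hidden, num_layers=2, bidirectional=True)
        self.fc = torch.nn.Linear(2 * hidden, num_classes)  # per-timestep logits

    def forward(self, conv_feat):
        # conv_feat: (batch, 512, 1, width) from the CNN; split by columns
        seq = conv_feat.squeeze(2).permute(2, 0, 1)   # -> (width, batch, 512)
        out, _ = self.rnn(seq)                        # -> (width, batch, 512)
        return self.fc(out)                           # -> (width, batch, num_classes)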
Step 3.3, character transcription. After the LED character sequence passes through the RNN, the prediction must be converted into character labels by the transcription layer, CTC. CTC introduces a blank character ε; pauses between characters are all represented by ε, and CTC mainly involves two operations: removing repeated letters and removing ε. The invention transcribes characters with a dictionary-based CTC algorithm: in the transcription layer, the error is propagated backwards by the forward-backward algorithm, the probabilities of all labels are finally obtained from the dictionary-based prediction, and the label with the highest probability is chosen as the recognition result.
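The two CTC cleanup operations reduce, for the best path, to a short greedy decoder; the sketch below omits the dictionary scoring and assumes a hypothetical LED label set (digits, decimal point, sign, and the phase letters A, B, C) with ε at index 0. For instance, the path "ε11εε.ε88" collapses to "1.8".

import torch

def ctc_greedy_decode(logits: torch.Tensor, charset="0123456789.-ABC", blank=0) -> str:
    # logits: (T, num_classes) per-timestep scores for one image;
    # index 0 is the blank epsilon, indices 1..N map onto charset.
    best = logits.argmax(dim=-1).tolist()   # best path: one label per timestep
    out, prev = [], blank
    for idx in best:
        if idx != blank and idx != prev:    # drop epsilon and collapse repeats
            out.append(charset[idx - 1])
        prev = idx
    return "".join(out)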
The invention has the beneficial effects that:
1. The method suits the digital instruments of power distribution rooms and substations. It addresses the low efficiency of current manual meter reading and data entry, and it maintains a high recognition rate under external factors such as illumination, shooting angle, and instrument form.
2. The deep learning-based YOLO target detection network detects the instrument's position accurately; the image shot by the camera can be fed directly into the network for detection without first annotating the instrument's position.
3. The progressive scale expansion network PSENet serves as the character detection module for digital LED instruments in the substation scene. The network can locate single-line or multi-line character areas, so all character areas in the instrument are detected and located, and the detected text areas are finally fed into the character recognition network for recognition.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a schematic diagram of the working procedure of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The overall workflow of the invention is shown in figure 1. First, a YOLOv4 network locates the character area of the LED meter; this deep learning-based YOLO target detector finds the meter's position accurately, so the image shot by the camera can be fed directly into the network without first annotating the meter's position. Once the meter's ROI (region of interest) is detected, it is passed to the character detection network, the progressive expansion network PSENet, which through downsampling, feature fusion, and upsampling produces an output the same size as the original image and yields the final text connected domains, i.e. the position of each line of the LED meter's characters. Because this network can locate single-line or multi-line character areas, all character areas in the meter are detected and located. Finally, the detected text areas are fed into a character recognition network, a CRNN, for automatic recognition, completing the automatic positioning and recognition of multi-line LED characters in an efficient way.
The technical scheme adopted by the invention is a deep learning-based automatic LED character positioning and recognition method, implemented according to the following steps:
step 1, an LED meter area positioning module, shown as the LED meter character area positioning module in fig. 1, feeds the 1920 × 1080 image taken by the camera into the target detection network. Because the image contains other interference factors, such as indicator lights and the characters of other signboards, the LED meter to be recognized must be detected accurately; the YOLOv4 target detection algorithm is therefore used for LED meter target detection in the substation scene. To keep other areas of the dial from interfering with character recognition, only the digital character area of the LED meter is located, and the ROI containing the characters is output as the input of step 2;
step 2, an LED instrument character detection module, shown as the LED instrument character detection module in FIG. 1, feeds the digital character region output by step 1 into the network. The LED instrument in the figure has 3 rows of characters, and to recognize each row accurately, every row of characters must be detected by the character detection module; the progressive scale expansion network PSENet therefore serves as the character detection module for digital LED instruments in the substation scene, detecting the LED character target region at pixel level through image segmentation and improving the model's detection of multi-line LED characters;
and step 3, an LED instrument character recognition module, shown as the LED instrument character recognition module in fig. 1, feeds the multi-line character target areas detected in step 2 into the recognition network in turn, trains a CRNN network on the features of the single-line or multi-line character target areas, and finally recognizes the specific characters with the CTC algorithm to obtain the recognition result of the LED instrument.
Step 1 is implemented according to the following steps:
Step 1.1, data enhancement is performed on the LED sample data using GridMask. GridMask is an information-deletion method: regions are discarded from the image on a regular grid, which acts as an extra regularizer against network overfitting and strikes a balance between deleting and retaining information. Random erasing, Cutout, and Hide-and-Seek can delete or keep entire discriminative regions, introducing noise that hinders training of the model.
One GridMask corresponds to 4 parameters, namely x, y, r and d, which determine a particular set of mask regions; the mask regions are also rotated during actual training. The proportion of image information retained is
k = M / (W × H)
where W and H are the width and height of the original image and M is the number of retained pixels. k is not directly one of the four parameters, but it indirectly defines r, whose definition is obtained from k through

k = 1 - (1 - r)²

x and y are random within a given range:

(δx, δy) = random(0, d - 1)
In the LED instrument detection task, r among the 4 GridMask hyperparameters is set to 0.4 and d to (96, 224). During use, GridMask augmentation is applied to a training image with probability P = 0.6; the probability starts at 0 and is gradually increased with the number of training iterations until it finally reaches P.
Step 1.2, a YOLOv4 target detection network is constructed to locate the LED instrument dial's character area in the image. The FPN first passes learned high-level semantic information into the lower layers and fuses it with low-level high-resolution information to improve detection; an information path from low levels to high levels is then added, feature information is strengthened by downsampling, and the feature information of different convolution layers is finally fused to achieve the detection effect. The backbone extraction network CSPDarknet53 of YOLOv4 uses the Mish activation function. Mish is a smooth curve; a smooth activation lets information flow better into the neural network, giving better accuracy and generalization, and it admits small negative-gradient inputs. Its expression is:
Mish(x) = x × tanh(ln(1 + e^x))
Step 1.3, target annotation boxes are defined for the pre-labeled character areas of the LED instrument dial; these areas are the Ground Truth. The annotated target images and their label files are fed into the YOLOv4 network for training, and the trained YOLOv4 target detection network locates LED instrument character areas containing different characters.
Step 1.4, DIoU-NMS is applied, which considers both the overlap area and the distance between the center points of two boxes, removing duplicate target boxes and finally yielding the digital character area of the LED instrument.
Step 2 is implemented according to the following steps:
and 2. step 2.1. Feature extraction, namely performing feature extraction on an input picture through a Resnet50 residual network, wherein ResNet50 has 50 Conv2d layers, extracting feature maps output by Conv2, Conv3, Conv4 and Conv5 layers respectively to construct a feature pyramid, and extracting 4-layer features P by using a top-down and transverse connection mode2 ,P3 ,P4 ,P5 Extraction is carried out, and 4 feature layers with 256 channels are obtained.
Step 2.2, feature fusion. The four feature maps obtained in step 2.1 are fused: P3, P4 and P5 are upsampled by 2×, 4× and 8× respectively and concatenated with the feature layer P2, finally producing a 1024-dimensional fused feature vector F. Fusing the high-level and low-level semantic features lets the network effectively sense the distribution of the LED characters and detect character boundaries more accurately. Concretely:
F = C(P2, P3, P4, P5) = P2 || Up×2(P3) || Up×4(P4) || Up×8(P5)
wherein, "|" represents the connection operation, and the upsampling is performed in a manner of 2 times, 4 times and 8 times respectively.
Step 2.3, the fused feature F obtained in step 2.2 passes through a 3×3 convolution, then a BN layer and a ReLU layer, giving a 256-channel feature map; this is fed into a 1×1 convolution to produce the segmentation results s1, s2, ..., sn, which are arranged from small to large by kernel scale.
Step 2.4, through the PSENet algorithm, scale expansion proceeds from the smallest kernel outward; a first-come-first-served rule resolves boundary conflicts during expansion, finally yielding LED character detection results with clear boundaries.
Step 3 is implemented specifically according to the following steps:
Step 3.1, LED character feature extraction. The CNN part of the CRNN uses a VGG structure. For faster model convergence, and considering the actual aspect ratio of LED characters, images are uniformly normalized to the size [240, 50]. Because the network's deep convolutional and recurrent layers are difficult to train, a batch normalization (BN) layer is added after the fifth and sixth convolutional layers, which greatly accelerates training. The CNN finally extracts 240/4 = 60 feature sequences of 512 channels.
Step 3.2, LED character prediction. The feature map extracted by the CNN in step 3.1 is fed into an RNN for character prediction. The CNN used here has four max-pooling layers, and the windows of the last two are changed from 2×2 to 1×2: most LED character areas are short and wide, and 1×2 pooling avoids losing information in the width direction as far as possible, making it better suited to recognizing letters and digits. Because LED characters photographed in real stations are often blurred, a deep bidirectional RNN serves as the RNN of the CRNN to improve accuracy on blurry characters: for the feature sequence x = x1, ..., xt output by the CNN, each input xt has an output yt. Since different LED meters display characters of different lengths, long short-term memory (LSTM) units are chosen as the RNN cells to recognize variable-length sequences; LSTM also effectively prevents vanishing gradients in the RNN during training. A 7-layer CNN first extracts the feature map of the text image; the map is split by columns, and each column is fed as a 512-dimensional time step into two layers of bidirectional LSTM with 256 units for classification.
Step 3.3, character transcription. After the LED character sequence passes through the RNN, the prediction must be converted into character labels by the transcription layer, CTC. CTC introduces a blank character ε; pauses between characters are all represented by ε, and CTC mainly involves two operations: removing repeated letters and removing ε. The invention transcribes characters with a dictionary-based CTC algorithm: in the transcription layer, the error is propagated backwards by the forward-backward algorithm, the probabilities of all labels are finally obtained from the dictionary-based prediction, and the label with the highest probability is chosen as the recognition result.
The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the apparatus embodiment, the above is only a preferred embodiment of the present invention, and since it is basically similar to the method embodiment, it is described simply, and the relevant points can be referred to the partial description of the method embodiment. The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily made by those skilled in the art within the technical scope of the present invention will be covered by the present invention without departing from the principle of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (6)

1. A deep learning-based automatic LED character positioning and recognition method, characterized in that: a YOLOv4 algorithm first locates all regions of the LED characters to be recognized, positioning the digital character area of the LED dial within the panorama; the recognized LED character area is then fed into a character detection network, where a PSENet network performs positioning detection of single-line or multi-line characters and, through downsampling, feature fusion and upsampling, produces an output the same size as the original image, yielding the final text connected domain and locating the position of each line of characters in the LED dial; finally a CRNN network recognizes the multi-line LED characters.
2. The deep learning based LED character automatic positioning and recognition method according to claim 1, characterized by comprising the following steps:
step 1), an LED instrument area positioning module performs LED instrument target detection in the substation scene with the YOLOv4 target detection algorithm and locates only the digital character area of the LED instrument;
step 2), an LED instrument character detection module uses the progressive scale expansion network PSENet as the character detection module for digital LED instruments in the substation scene, detecting the LED character target area at pixel level through image segmentation and improving the model's detection of multi-line LED characters;
and step 3), an LED instrument character recognition module trains a CRNN network on the features of the single-line or multi-line character target areas obtained in step 2), and finally recognizes the specific characters with the CTC algorithm to obtain the recognition result of the LED instrument.
3. The deep learning-based LED character automatic positioning and recognition method according to claim 2, characterized in that the step 1) is implemented as follows:
step 1.1) performs data enhancement on the LED sample data with the GridMask method; a GridMask corresponds to 4 parameters, x, y, r and d, which determine a particular set of mask regions, and the mask regions are rotated during actual training. The proportion of image information retained is
k = M / (W × H)
where W and H are the width and height of the original image and M is the number of retained pixels. k is not directly one of the four parameters, but it indirectly defines r, whose definition is obtained from k through

k = 1 - (1 - r)²

x and y are random within a given range:

(δx, δy) = random(0, d - 1)
in the LED instrument detection task, r among the 4 GridMask hyperparameters is set to 0.4 and d to (96, 224); during use, GridMask augmentation is applied to a training image with probability P = 0.6, the probability starting at 0 and gradually increasing with the number of training iterations until it finally reaches P;
step 1.2, a YOLOv4 target detection network is constructed to locate the LED instrument dial's character area in the image: the FPN first passes learned high-level semantic information into the lower layers and fuses it with low-level high-resolution information to improve detection, an information path from low levels to high levels is then added, feature information is strengthened by downsampling, and the feature information of different convolution layers is finally fused to achieve the detection effect; the backbone extraction network CSPDarknet53 of YOLOv4 uses the Mish activation function, a smooth curve whose expression is:
Mish(x) = x × tanh(ln(1 + e^x));
step 1.3, target annotation boxes are defined for the pre-labeled character areas of the LED instrument dial, these areas being the Ground Truth; the annotated target images and their label files are fed into the YOLOv4 network for training, and the trained YOLOv4 target detection network locates LED instrument character areas containing different characters;
and step 1.4, DIoU-NMS is applied, which considers both the overlap area and the distance between the center points of two boxes, removing duplicate target boxes and finally yielding the digital character area of the LED instrument.
4. The deep learning based LED character automatic positioning and recognition method according to claim 3, characterized in that: the GridMask data enhancement method in the step 1.1) belongs to a method for deleting information, and is specifically realized by randomly discarding an area on an image.
5. The deep learning based LED character automatic positioning and recognition method according to claim 2, characterized in that the step 2) is implemented as follows:
step 2.1) feature extraction: the input image is passed through a ResNet50 residual network for feature extraction (ResNet50 has 50 Conv2d layers); the feature maps output by the Conv2, Conv3, Conv4 and Conv5 stages are extracted to construct a feature pyramid, and the four feature levels P2, P3, P4, P5 are extracted through top-down and lateral connections, yielding four 256-channel feature layers;
step 2.2) feature fusion: the four feature maps obtained in step 2.1) are fused, with P3, P4 and P5 upsampled by 2×, 4× and 8× respectively and concatenated with the feature layer P2, finally producing a 1024-dimensional fused feature vector F; the high-level and low-level semantic features are fused together, concretely as follows:
F = C(P2, P3, P4, P5) = P2 || Up×2(P3) || Up×4(P4) || Up×8(P5)
wherein, "|" represents the connection operation, and the up-sampling is carried out by respectively adopting 2 times, 4 times and 8 times;
step 2.3) the fused feature F obtained in step 2.2) passes through a 3×3 convolution, then a BN layer and a ReLU layer, giving a 256-channel feature map, which is fed into a 1×1 convolution to produce the segmentation results s1, s2, ..., sn, arranged from small to large by kernel scale;
and step 2.4) through the PSENet algorithm, scale expansion proceeds from the smallest kernel outward, and a first-come-first-served rule finally yields LED character detection results with clear boundaries.
6. The deep learning based LED character automatic positioning and recognition method according to claim 2, characterized in that the step 3) is implemented as follows:
step 3.1) LED character feature extraction: the CNN part of the CRNN network uses a VGG structure; images are first uniformly normalized to the size [240, 50], a batch normalization (BN) layer is added after the fifth and sixth convolutional layers, and the CNN finally extracts 240/4 = 60 feature sequences of 512 channels;
step 3.2) LED character prediction: the feature map extracted by the CNN in step 3.1) is fed into an RNN for character prediction; the CNN used has four max-pooling layers, with the windows of the last two changed from 2×2 to 1×2; a deep bidirectional RNN serves as the RNN of the CRNN, and for the feature sequence x = x1, ..., xt output by the CNN, each input xt has an output yt; long short-term memory (LSTM) units are chosen as the RNN cells; a 7-layer CNN first extracts the feature map of the text image, the map is split by columns, and each column is fed as a 512-dimensional time step into two layers of bidirectional LSTM with 256 units for classification;
and step 3.3) character transcription: after the LED character sequence passes through the RNN, the prediction must be converted into character labels by the transcription layer, CTC; CTC introduces a blank character ε, pauses between characters are all represented by ε, and CTC mainly involves removing repeated letters and removing ε; characters are transcribed with a dictionary-based CTC algorithm, the error is propagated backwards in the transcription layer by the forward-backward algorithm, the probabilities of all labels are finally obtained from the dictionary-based prediction, and the label with the highest probability is chosen as the recognition result.
CN202210709461.0A (priority and filing date 2022-06-21): LED character automatic positioning and recognition method based on deep learning; Active; granted as CN115035526B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202210709461.0A | 2022-06-21 | 2022-06-21 | LED character automatic positioning and recognition method based on deep learning

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202210709461.0A | 2022-06-21 | 2022-06-21 | LED character automatic positioning and recognition method based on deep learning

Publications (2)

Publication Number | Publication Date
CN115035526A | 2022-09-09
CN115035526B | 2024-10-22

Family

ID=83127660

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN202210709461.0A (Active; granted as CN115035526B) | LED character automatic positioning and recognition method based on deep learning | 2022-06-21 | 2022-06-21

Country Status (1)

Country | Link
CN | CN115035526B (en)



Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN109034160A (en)* | 2018-07-06 | 2018-12-18 | 江苏迪伦智能科技有限公司 | A kind of mixed decimal point digital instrument automatic identifying method based on convolutional neural networks
CN110059694A (en)* | 2019-04-19 | 2019-07-26 | 山东大学 | The intelligent identification method of text data under power industry complex scene
CN110211097A (en)* | 2019-05-14 | 2019-09-06 | 河海大学 | Crack image detection method based on fast R-CNN parameter migration
CN111339902A (en)* | 2020-02-21 | 2020-06-26 | 北方工业大学 | Liquid crystal display number identification method and device of digital display instrument
CN112183233A (en)* | 2020-09-09 | 2021-01-05 | 上海鹰觉科技有限公司 | Method and system for ship license recognition based on deep learning
WO2022111355A1 (en)* | 2020-11-30 | 2022-06-02 | 展讯通信(上海)有限公司 | License plate recognition method and apparatus, storage medium and terminal
CN112906699A (en)* | 2020-12-23 | 2021-06-04 | 深圳市信义科技有限公司 | Method for detecting and identifying enlarged number of license plate
CN112528963A (en)* | 2021-01-09 | 2021-03-19 | 江苏拓邮信息智能技术研究院有限公司 | Intelligent arithmetic question reading system based on MixNet-YOLOv3 and convolutional recurrent neural network CRNN
CN113255659A (en)* | 2021-01-26 | 2021-08-13 | 南京邮电大学 | License plate correction detection and identification method based on MSAFF-YOLOv3
CN112836748A (en)* | 2021-02-02 | 2021-05-25 | 太原科技大学 | A character recognition method for casting identification based on CRNN-CTC
CN113673509A (en)* | 2021-07-28 | 2021-11-19 | 华南理工大学 | A method of instrument detection and classification based on image text

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Shi Jianwei; Zhang Yun: "License plate recognition system based on improved YOLOv3 and BGRU", Computer Engineering and Design, no. 08, 16 August 2020 (2020-08-16)*
Zhu Liqian: "Character recognition of digital display instruments based on deep learning", Computer Technology and Development, no. 06, 10 June 2020 (2020-06-10)*
Xiang Dao: "An automatic positioning and recognition method for digits displayed by digital display instruments", Machine Design and Manufacturing Engineering, 31 December 2021 (2021-12-31)*

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN117218505A (en)* | 2023-09-25 | 2023-12-12 | 佳源科技股份有限公司 | Substation state indicator lamp identification method based on deep learning

Also Published As

Publication number | Publication date
CN115035526B (en) | 2024-10-22

Similar Documents

Publication | Publication Date | Title
CN111860348A (en) OCR recognition method of weakly supervised power drawings based on deep learning
CN110059694A (en) The intelligent identification method of text data under power industry complex scene
CN110969129B (en)End-to-end tax bill text detection and recognition method
CN111476210B (en)Image-based text recognition method, system, device and storage medium
CN102024152B (en) A Method for Traffic Sign Recognition Based on Sparse Representation and Dictionary Learning
CN101859382A (en) A license plate detection and recognition method based on the maximum stable extremum region
CN112270317B (en) A traditional digital water meter reading recognition method based on deep learning and frame difference method
CN111489324A (en)Cervical cancer lesion diagnosis method fusing multi-modal prior pathology depth features
CN112712052A (en)Method for detecting and identifying weak target in airport panoramic video
CN114862768A (en)Improved YOLOv5-LITE lightweight-based power distribution assembly defect identification method
CN115424017B (en) A method, device and storage medium for segmenting the interior and exterior contours of a building
CN111881914B (en)License plate character segmentation method and system based on self-learning threshold
CN114359949B (en) Recognition method for the text of power grid wiring diagram
CN116246059A (en)Vehicle target recognition method based on improved YOLO multi-scale detection
CN116012709B (en)High-resolution remote sensing image building extraction method and system
CN114266881A (en)Pointer type instrument automatic reading method based on improved semantic segmentation network
CN111461121A (en) A method for identifying electrical representation numbers based on YOLOV3 network
CN116704512A (en) A meter recognition method and system integrating semantic and visual information
CN110458132A (en) An End-to-End Text Recognition Method of Indefinite Length
CN116188756A (en)Instrument angle correction and indication recognition method based on deep learning
CN115423740A (en)Method for identifying size of internal defect area of EPR vehicle-mounted cable terminal
CN113077438A (en)Cell nucleus region extraction method and imaging method for multi-cell nucleus color image
CN113192018A (en) Video recognition method of water wall surface defects based on fast segmentation convolutional neural network
CN114694133B (en)Text recognition method based on combination of image processing and deep learning
CN115035526A (en)Deep learning-based automatic LED character positioning and identifying method

Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant
