
Object detection method and device, computer device and computer readable storage medium

Info

Publication number
CN108121986B
CN108121986B
Authority
CN
China
Prior art keywords: target, region, neural network, training, convolutional neural
Prior art date
Legal status
Active
Application number
CN201711484723.3A
Other languages
Chinese (zh)
Other versions
CN108121986A (en)
Inventor
牟永强
刘荣杰
裴超
Current Assignee
Shenzhen Intellifusion Technologies Co Ltd
Original Assignee
Shenzhen Intellifusion Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Shenzhen Intellifusion Technologies Co Ltd
Priority to CN201711484723.3A
Publication of CN108121986A
Application granted
Publication of CN108121986B
Status: Active
Anticipated expiration


Abstract

A method of target detection, the method comprising: acquiring a training sample set, wherein the training sample set comprises a plurality of target images marked with target positions and target angle types; training an acceleration region convolutional neural network model by using the training sample set to obtain a trained acceleration region convolutional neural network model; acquiring an image to be detected; and carrying out target detection on the image to be detected by utilizing the trained acceleration region convolutional neural network model to obtain a target region of the image to be detected and a target angle type of the target region. The invention also provides a target detection device, a computer device and a readable storage medium. The invention can realize rapid target detection with a high detection rate.

Description

Object detection method and device, computer device and computer readable storage medium
Technical Field
The invention relates to the technical field of image processing, in particular to a target detection method and device, a computer device and a computer readable storage medium.
Background
Existing target detection techniques include target detection based on simple pixel features or on manually designed complex features. Representative simple pixel features, such as HAAR features and pixel differences, are computationally efficient and offer good real-time performance, but they are poorly robust to complex and diverse background changes, so their detection accuracy is poor. Manually designed complex features, such as the HOG features used in DPM, have better feature expression and stronger robustness, but because they cannot use GPU acceleration, their computation on the CPU is heavy and the real-time requirement is difficult to meet.
Existing target detection techniques also include target detection based on convolutional neural networks. Although convolutional-neural-network-based target detection improves detection precision, it greatly increases the amount of computation. GPU computing solves the computational problem of extracting convolutional features, but candidate region extraction still takes a considerable amount of time. In addition, the whole scheme is a two-stage framework of candidate region extraction followed by classification, so end-to-end detection cannot be realized and application is relatively complicated.
In addition, due to different shooting angles, the appearance of a target object can change greatly in the image. Existing target detection techniques do not take the shooting angle into account, so the target detection rate is low.
Disclosure of Invention
In view of the above, it is desirable to provide a target detection method and apparatus, a computer apparatus and a computer-readable storage medium, which can achieve fast target detection with a high detection rate.
A first aspect of the present application provides a target detection method, the method comprising:
Acquiring a training sample set, wherein the training sample set comprises a plurality of target images marked with target positions and target angle types;
Training an acceleration region convolutional neural network model by using the training sample set to obtain a trained acceleration region convolutional neural network model, wherein the acceleration region convolutional neural network model comprises a region suggestion network and a fast region convolutional neural network, the region suggestion network and the fast region convolutional neural network share a convolutional layer, the convolutional layer extracts a feature map of each target image in the training sample set, the region suggestion network obtains a candidate region in each target image and a target angle type of the candidate region according to the feature map, and the fast region convolutional neural network screens and adjusts the candidate region according to the feature map to obtain the target region of each target image and the target angle type of the target region;
Acquiring an image to be detected;
And performing target detection on the image to be detected by utilizing the trained acceleration region convolutional neural network model to obtain a target region of the image to be detected and a target angle type of the target region.
In another possible implementation manner, training the acceleration region convolutional neural network model by using the training sample set includes:
(1) Initializing the region suggestion network by using an Imagenet model, and training the region suggestion network by using the training sample set;
(2) Generating candidate regions of each target image by using the region suggestion network trained in step (1), and training the fast region convolutional neural network by using the candidate regions;
(3) Initializing the region suggestion network by using the fast region convolutional neural network trained in step (2), and training the region suggestion network by using the training sample set;
(4) Initializing the fast region convolutional neural network by using the region suggestion network trained in step (3), keeping the convolutional layers fixed, and training the fast region convolutional neural network by using the training sample set.
In another possible implementation manner, training the acceleration region convolutional neural network model by using the training sample set includes:
Training the region suggestion network and the fast region convolutional neural network by using a back propagation algorithm, and adjusting network parameters of the region suggestion network and the fast region convolutional neural network in the training process to minimize a loss function, wherein the loss function comprises target classification loss, angle classification loss and regression loss.
In another possible implementation manner, the acceleration region convolutional neural network model adopts a ZF framework, and the region suggestion network and the fast region convolutional neural network share 5 convolutional layers.
In another possible implementation manner, a hard negative mining method is added in the training of the fast region convolutional neural network.
A second aspect of the present application provides an object detection apparatus, the apparatus comprising:
A first acquisition unit configured to acquire a training sample set including a plurality of target images in which target positions and target angle types are marked;
A training unit, configured to train an acceleration region convolutional neural network model by using the training sample set to obtain a trained acceleration region convolutional neural network model, where the acceleration region convolutional neural network model includes a region suggestion network and a fast region convolutional neural network, the region suggestion network and the fast region convolutional neural network share a convolutional layer, the convolutional layer extracts a feature map of each target image in the training sample set, the region suggestion network obtains a candidate region in each target image and a target angle type of the candidate region according to the feature map, and the fast region convolutional neural network screens and adjusts the candidate region according to the feature map to obtain a target region of each target image and a target angle type of the target region;
The second acquisition unit is used for acquiring an image to be detected;
And the detection unit is used for performing target detection on the image to be detected by utilizing the trained acceleration region convolutional neural network model to obtain a target region of the image to be detected and a target angle type of the target region.
In another possible implementation manner, the training unit is specifically configured to:
(1) Initialize the region suggestion network by using an Imagenet model, and train the region suggestion network by using the training sample set;
(2) Generate candidate regions of each target image by using the region suggestion network trained in step (1), and train the fast region convolutional neural network by using the candidate regions;
(3) Initialize the region suggestion network by using the fast region convolutional neural network trained in step (2), and train the region suggestion network by using the training sample set;
(4) Initialize the fast region convolutional neural network by using the region suggestion network trained in step (3), keep the convolutional layers fixed, and train the fast region convolutional neural network by using the training sample set.
In another possible implementation manner, the training unit is specifically configured to:
Train the region suggestion network and the fast region convolutional neural network by using a back propagation algorithm, and adjust network parameters of the region suggestion network and the fast region convolutional neural network in the training process to minimize a loss function, wherein the loss function comprises target classification loss, angle classification loss and regression loss.
A third aspect of the application provides a computer apparatus comprising a processor for implementing the object detection method when executing a computer program stored in a memory.
A fourth aspect of the present application provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the object detection method.
The method comprises the steps of obtaining a training sample set, wherein the training sample set comprises a plurality of target images marked with target positions and target angle types; training an acceleration region convolutional neural network model by using the training sample set to obtain a trained acceleration region convolutional neural network model, wherein the acceleration region convolutional neural network model comprises a region suggestion network and a fast region convolutional neural network, the region suggestion network and the fast region convolutional neural network share a convolutional layer, the convolutional layer extracts a feature map of each target image in the training sample set, the region suggestion network obtains a candidate region in each target image and a target angle type of the candidate region according to the feature map, and the fast region convolutional neural network screens and adjusts the candidate region according to the feature map to obtain the target region of each target image and the target angle type of the target region; acquiring an image to be detected; and performing target detection on the image to be detected by utilizing the trained acceleration region convolutional neural network model to obtain a target region of the image to be detected and a target angle type of the target region.
The existing target detection based on the convolutional neural network uses a selective search algorithm to generate candidate regions, which is time-consuming, and its region extraction and target detection are separated. The method introduces the region suggestion network into the acceleration region convolutional neural network model and extracts candidate regions with a deep convolutional neural network. After network training, by sharing convolutional network parameters, the feature map obtained by passing the image through the convolutional layers can be applied to region extraction and target detection simultaneously; that is, the calculation result of the convolutional network is shared, which greatly increases the region extraction speed, accelerates the whole detection process, and realizes an end-to-end detection scheme. In addition, the problem of detection rate reduction caused by different shooting angles is considered: target images marked with target angle types are used to train the acceleration region convolutional neural network model, which improves the target detection rate. Therefore, the invention can realize fast target detection with a high detection rate.
Drawings
Fig. 1 is a flowchart of a target detection method according to an embodiment of the present invention.
Fig. 2 is a schematic diagram of a regional recommendation network.
Fig. 3 is a structural diagram of an object detection apparatus according to a second embodiment of the present invention.
Fig. 4 is a schematic diagram of a computer device according to a third embodiment of the present invention.
Detailed Description
In order that the above objects, features and advantages of the present invention can be more clearly understood, a detailed description of the present invention will be given below with reference to the accompanying drawings and specific embodiments. It should be noted that the embodiments and features of the embodiments of the present application may be combined with each other without conflict.
In the following description, numerous specific details are set forth to provide a thorough understanding of the present invention. The described embodiments are merely a subset of the embodiments of the present invention, rather than all of them. All other embodiments obtained by a person skilled in the art based on the embodiments given herein without creative effort shall fall within the protection scope of the present invention.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention.
Preferably, the object detection method of the present invention is applied in one or more computer devices. A computer device is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions; its hardware includes, but is not limited to, a microprocessor, an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), an embedded device, and the like.
The computer device can be a desktop computer, a notebook, a palm computer, a cloud server and other computing equipment. The computer device can be in man-machine interaction with a user through a keyboard, a mouse, a remote controller, a touch panel or voice control equipment and the like.
Example one
Fig. 1 is a flowchart of a target detection method according to an embodiment of the present invention. The target detection method is applied to a computer device. The target detection method can detect the position of a preset target (such as a vehicle, a ship and the like) in the image and can detect the angle type (such as the front, the side and the back) of the preset target in the image.
As shown in fig. 1, the target detection method specifically includes the following steps:
101: a training sample set is obtained.
The training sample set includes a plurality of target images labeled with target positions and target angle types. The target image is an image including a preset target (e.g., a ship, a vehicle, etc.) and may include one or more preset targets. The target position represents the position of a preset target in the target image. The target angle type represents the shooting angle (e.g., front, back, side) of a preset target.
In one embodiment, the training sample set includes about 10000 target images. The target location may be labeled [ x, y, w, h ], where x, y represents the top left coordinate of the target region, w represents the width of the target region, and h represents the height of the target region. The target angle types include a front angle type, a side angle type, and a back angle type. For example, when the target detection method is used for detecting a ship, if the target image is a front image of the ship, the marked target angle type is a front angle type; if the target image is a side image of the ship, the marked target angle type is a side angle type; and if the target image is the back image of the ship, the marked target angle type is the back angle type.
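For illustration only, the sketch below shows one possible in-memory representation of such an annotated training sample; the dictionary field names and the integer angle codes are assumptions, and only the [x, y, w, h] box format and the three angle types come from the description above.

```python
# A minimal sketch of one annotated training sample (hypothetical field
# names; the patent only fixes the [x, y, w, h] format and the three
# angle types: front, side, back).
FRONT, SIDE, BACK = 0, 1, 2  # hypothetical integer codes for the angle types

sample = {
    "image_path": "ships/0001.jpg",
    "targets": [
        # box = [x, y, w, h]: top-left corner, width and height of the region
        {"box": [120, 80, 340, 150], "angle_type": SIDE},
        {"box": [500, 60, 180, 120], "angle_type": FRONT},
    ],
}
```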
102: and training a fast Region-based Convolution Neural Network (fast R-CNN) by using the training sample set to obtain a trained acceleration Region Convolution Neural Network model.
The acceleration Region convolutional Neural Network model comprises a Region suggestion Network (RPN) and a Fast Region convolutional Neural Network (FastR-CNN). It is necessary to train alternately the area proposal network and the fast convolution network.
The region suggestion network and the fast region convolutional neural network share convolutional layers, and the shared convolutional layers are used for extracting a feature map of an image. The region suggestion network generates candidate regions of the image and the target angle types of the candidate regions according to the feature map, and inputs the generated candidate regions and their target angle types into the fast region convolutional neural network. The fast region convolutional neural network screens and adjusts the candidate regions according to the feature map to obtain the target region of the image and the target angle type of the target region.
Specifically, during training, the convolutional layers extract a feature map of each target image in the training sample set, the region suggestion network obtains the candidate regions in each target image and the target angle types of the candidate regions according to the feature map, and the fast region convolutional neural network screens and adjusts the candidate regions according to the feature map to obtain the target region of each target image and the target angle type of the target region.
In a preferred embodiment, the acceleration region convolutional neural network model adopts the ZF framework, and the region suggestion network and the fast region convolutional neural network share 5 convolutional layers.
In one embodiment, the target images in the training sample set may be images of any size, scaled to a uniform size (e.g., 1000 x 600) before entering the convolutional layers. In one embodiment, the length and width of the feature map extracted by the convolutional layers are reduced by a factor of 16 relative to the input image, and the depth of the feature map is 256.
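A quick arithmetic check of these settings, assuming the feature map size is the input size divided exactly by the stride of 16:

```python
# Feature map geometry under the settings above (a sketch; assumes exact
# integer division of the 1000 x 600 input by the stride of 16).
in_w, in_h = 1000, 600      # input image after scaling to the uniform size
stride, depth = 16, 256     # length/width reduction factor and feature depth
feat_w, feat_h = in_w // stride, in_h // stride
print(feat_w, feat_h, depth)  # 62 37 256 -> the feature map is 62 x 37 x 256
```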
In a particular embodiment, training the acceleration region convolutional neural network model using the training sample set may include the following four steps (a runnable sketch follows the list):
(1) Initializing the region suggestion network using an Imagenet model, and training the region suggestion network using the training sample set.
(2) Generating candidate regions of each target image in the training sample set by using the region suggestion network trained in step (1), and training the fast region convolutional neural network by using the candidate regions. At this point, the region suggestion network and the fast region convolutional neural network do not yet share convolutional layers.
(3) Initializing the region suggestion network by using the fast region convolutional neural network trained in step (2), and training the region suggestion network by using the training sample set.
(4) Initializing the fast region convolutional neural network by using the region suggestion network trained in step (3), keeping the convolutional layers fixed, and training the fast region convolutional neural network by using the training sample set. At this time, the region suggestion network and the fast region convolutional neural network share the same convolutional layers and form a unified network.
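The following runnable skeleton sketches this four-step alternating schedule; the Net and train stand-ins are hypothetical placeholders for the real networks and training routines, not the patent's implementation.

```python
# A skeleton of the 4-step alternating training described above. Net and
# train() are hypothetical stand-ins; only the schedule itself follows
# the text.

class Net:
    def __init__(self, name, source="Imagenet"):
        self.name = name
        self.init_source = source  # weights initialized from this model

def train(net, data, freeze_shared=False):
    print(f"train {net.name} (init from {net.init_source}), "
          f"shared conv layers frozen: {freeze_shared}")

train_set = ["target images labeled with positions and angle types"]

rpn = Net("RPN")                             # (1) init RPN from Imagenet
train(rpn, train_set)                        #     and train it
proposals = f"candidate regions from {rpn.name}"
fast_rcnn = Net("Fast R-CNN")                # (2) train Fast R-CNN on the
train(fast_rcnn, proposals)                  #     proposals (no sharing yet)
rpn = Net("RPN", source="Fast R-CNN")        # (3) re-init RPN from Fast R-CNN
train(rpn, train_set)                        #     and retrain it
fast_rcnn = Net("Fast R-CNN", source="RPN")  # (4) re-init Fast R-CNN from the
train(fast_rcnn, train_set,                  #     RPN and train with the shared
      freeze_shared=True)                    #     conv layers kept fixed
```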
Fig. 2 is a schematic diagram of the region suggestion network.
A feature map of the image is obtained after the image passes through the shared convolutional layers. A sliding window of a preset size (e.g., 3 x 3) is slid over the feature map with a preset stride (e.g., a stride of 1), and each position of the sliding window corresponds to a center point. When the sliding window slides to a position, anchor boxes of preset scales (e.g., 3 scales: 128, 256, 512) and preset aspect ratios (e.g., 3 aspect ratios: 1:1, 1:2, 2:1) are applied at the center point of that position to obtain a preset number (e.g., 9) of candidate regions. Each sliding window is mapped onto a low-dimensional feature vector (e.g., 256-d or 512-d) by a convolutional layer connected to the shared convolutional layers. The feature vector is output to three sibling fully connected layers: a target classification layer, an angle classification layer, and a boundary regression layer. The target classification layer outputs a target classification score of the candidate region, indicating whether the candidate region is a target (i.e., foreground) or background. Whether a candidate region belongs to the foreground or the background is determined by its overlap with the labeled target region (i.e., the region determined by the labeled target position): if the overlap is greater than a certain threshold, the candidate region is labeled as foreground; if the overlap is less than the threshold, the candidate region is labeled as background. The angle classification layer outputs an angle classification score of the candidate region, indicating the target angle type of the candidate region. The boundary regression layer outputs the refined position of the candidate region, which is used to refine the boundary of the candidate region.
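As an illustration of the anchor mechanism just described, the sketch below generates the 9 candidate boxes at one sliding-window position; the function name and the area-preserving width/height formula are assumptions consistent with the stated scales and aspect ratios.

```python
# A sketch of anchor generation at one sliding-window position, assuming
# the 3 scales (128, 256, 512) and 3 aspect ratios (1:1, 1:2, 2:1) above.
import numpy as np

def anchors_at(cx, cy, scales=(128, 256, 512), ratios=(1.0, 0.5, 2.0)):
    """Return the 9 anchor boxes [x1, y1, x2, y2] centered on (cx, cy)."""
    boxes = []
    for s in scales:
        for r in ratios:
            w = s * np.sqrt(r)   # keep the anchor area s*s while setting
            h = s / np.sqrt(r)   # the width:height ratio to r
            boxes.append([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2])
    return np.array(boxes)

# On a stride-16 feature map, the window at feature cell (row i, col j)
# maps back to image coordinates roughly (16 * j + 8, 16 * i + 8).
print(anchors_at(8 + 16 * 31, 8 + 16 * 18).shape)  # (9, 4)
```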
The region suggestion network selects a large number of candidate regions; several candidate regions with the highest scores can be screened out according to the target classification scores of the candidate regions and input into the fast region convolutional neural network, which increases the training and detection speed.
To train the region suggestion network, each candidate region is assigned a label, which is either positive or negative. A positive label is assigned to two types of candidate regions: (1) the candidate region with the highest IoU (Intersection over Union) overlap with the bounding box of a real target (Ground Truth, GT); (2) candidate regions whose IoU overlap with any GT bounding box is greater than 0.7. For one GT bounding box, positive labels may be assigned to multiple candidate regions. Negative labels are assigned to candidate regions whose IoU ratio with all GT bounding boxes is below 0.3. Candidate regions that are neither positive nor negative have no effect on the training objective.
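A compact sketch of this labeling rule follows; the function names are hypothetical, while the 0.7 and 0.3 thresholds and the highest-IoU rule come from the description above.

```python
# A sketch of the labeling rule above: IoU > 0.7 with any GT box ->
# positive (1), IoU < 0.3 with all GT boxes -> negative (0), otherwise
# ignored (-1). The highest-IoU candidate for each GT box is also positive.
import numpy as np

def iou(a, b):
    """IoU of two boxes given as [x1, y1, x2, y2]."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

def assign_labels(candidates, gt_boxes, hi=0.7, lo=0.3):
    m = np.array([[iou(c, g) for g in gt_boxes] for c in candidates])
    labels = np.full(len(candidates), -1)  # -1: no effect on training
    labels[m.max(axis=1) < lo] = 0         # negative candidate regions
    labels[m.max(axis=1) > hi] = 1         # positive candidate regions
    labels[m.argmax(axis=0)] = 1           # best candidate per GT box
    return labels
```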
The region suggestion network is trained by using a back propagation algorithm, and its network parameters are adjusted in the training process to minimize a loss function. The loss function indicates the difference between the prediction confidence of the candidate regions predicted by the region suggestion network and the true confidence. In the present embodiment, the loss function includes three parts, namely target classification loss, angle classification loss and regression loss.
The loss function of an image can be defined as:

$$L(\{p_i\},\{a_i\},\{t_i\}) = \frac{1}{N_{cls}}\sum_i L_{cls}(p_i, p_i^*) + \frac{1}{N_{cls}}\sum_i L_{ang}(a_i, a_i^*) + \lambda \frac{1}{N_{reg}}\sum_i p_i^* L_{reg}(t_i, t_i^*)$$

where $i$ is the index of a candidate region in a training batch (mini-batch).

$L_{cls}(p_i, p_i^*)$ is the target classification loss of the candidate region. $N_{cls}$ is the size of the training batch, for example 256. $p_i$ is the predicted probability that the $i$-th candidate region is a target. $p_i^*$ is the GT label: if the candidate region is positive (i.e., the assigned label is a positive label, and the region is called a positive candidate region), $p_i^*$ is 1; if the candidate region is negative (i.e., the assigned label is a negative label, and the region is called a negative candidate region), $p_i^*$ is 0. $L_{cls}$ can be calculated as $L_{cls}(p_i, p_i^*) = -\log\left[p_i^* p_i + (1 - p_i^*)(1 - p_i)\right]$.

$L_{ang}(a_i, a_i^*)$ is the angle classification loss of the candidate region, where $a_i$ is the predicted angle type and $a_i^*$ is the labeled angle type; its calculation can refer to that of $L_{cls}$.

$L_{reg}(t_i, t_i^*)$ is the regression loss of the candidate region. $\lambda$ is the balance weight and can be taken as 10. $N_{reg}$ is the number of candidate regions. $L_{reg}$ can be calculated as $L_{reg}(t_i, t_i^*) = R(t_i - t_i^*)$, where $t_i$ is a coordinate vector, i.e. $t_i = (t_x, t_y, t_w, t_h)$, the 4 parameterized coordinates representing the candidate region (e.g., the coordinates of the upper-left corner of the candidate region together with its width and height), and $t_i^*$ is the coordinate vector of the GT bounding box corresponding to a positive candidate region, i.e. $t_i^* = (t_x^*, t_y^*, t_w^*, t_h^*)$ (e.g., the coordinates of the upper-left corner of the real target region together with its width and height). $R$ is the robust loss function ($\mathrm{smooth}_{L_1}$), defined as:

$$\mathrm{smooth}_{L_1}(x) = \begin{cases} 0.5x^2, & \text{if } |x| < 1 \\ |x| - 0.5, & \text{otherwise} \end{cases}$$
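To make the composition of this loss concrete, the sketch below gives one possible PyTorch reading of it. The function name and tensor layout are assumptions, and the angle term is assumed to be evaluated only for positive regions (background regions carry no angle label); this is an illustration, not the patent's reference implementation.

```python
# An illustrative reading of the three-part loss above (a sketch under the
# assumptions stated in the text, not the patent's implementation).
import torch
import torch.nn.functional as F

def detection_loss(p, p_star, a_logits, a_star, t, t_star, lam=10.0):
    """p: (N,) predicted target probabilities; p_star: (N,) 0/1 GT labels;
    a_logits: (N, 3) angle-class scores; a_star: (N,) angle labels in {0,1,2};
    t, t_star: (N, 4) predicted / GT offsets (tx, ty, tw, th)."""
    n_cls = p.numel()                  # N_cls: size of the training batch
    l_cls = F.binary_cross_entropy(p, p_star.float(), reduction="sum") / n_cls
    pos = p_star == 1
    # angle classification, assumed to be counted for positive regions only
    l_ang = F.cross_entropy(a_logits[pos], a_star[pos], reduction="sum") / n_cls
    n_reg = p.numel()                  # N_reg: number of candidate regions
    # the p_i* factor keeps only positive regions in the regression term;
    # smooth_l1_loss is the robust R loss defined above
    l_reg = F.smooth_l1_loss(t[pos], t_star[pos], reduction="sum") / n_reg
    return l_cls + l_ang + lam * l_reg  # lambda = 10 balance weight
```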
This embodiment considers the problem of detection rate reduction caused by different shooting angles: a loss function including the angle classification loss is used in the training of the acceleration region convolutional neural network model, the angle classification loss of the candidate region is calculated according to the predicted target angle type, and the target detection rate is improved.
The above describes the training method of the region suggestion network. The training method of the fast region convolutional neural network may refer to the training method of the region suggestion network and is not repeated here.
In this embodiment, a Hard Negative Mining (HNM) method is added to the training of the fast region convolutional neural network. For negative samples that are wrongly classified as positive samples by the fast region convolutional neural network (namely, hard samples), their information is recorded; in the next training iteration, these negative samples are input into the training sample set again and their loss weights are increased, strengthening their influence on the classifier. In this way the classifier keeps learning from the negative samples that are hard to classify, the features it learns become more discriminative, and the covered sample distribution becomes more diverse.
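A toy sketch of this idea is given below; the keep_frac and boost parameters are assumptions, since the patent does not specify how many hard negatives are re-injected or by how much their loss weight is increased.

```python
# A toy sketch of hard negative mining: negatives that the current model
# scores most confidently as targets get a larger loss weight next round.
import numpy as np

def mine_hard_negatives(scores, labels, keep_frac=0.25, boost=2.0):
    """scores: predicted P(target); labels: 0 = negative, 1 = positive.
    Returns per-sample loss weights with the hardest negatives up-weighted."""
    weights = np.ones_like(scores)
    neg = np.where(labels == 0)[0]
    n_hard = max(1, int(len(neg) * keep_frac))
    # hardest negatives = negatives with the highest target scores
    hard = neg[np.argsort(scores[neg])[::-1][:n_hard]]
    weights[hard] *= boost  # strengthen their influence on the classifier
    return weights

w = mine_hard_negatives(np.array([0.9, 0.2, 0.8, 0.1]), np.array([1, 0, 0, 0]))
print(w)  # the negative scored 0.8 is up-weighted: [1. 1. 2. 1.]
```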
103: and acquiring an image to be detected.
The image to be detected is an image including a preset target (e.g., a ship). The preset target is a detection object in the image to be detected. For example, when ship detection is performed on an image to be detected, the preset target is a ship in the image to be detected.
The image to be detected may be an image received from an external device, for example an image of a ship taken by a camera near the quay, from which the image of the ship is received.
Alternatively, the image to be detected may be an image taken by the computer device, for example an image of a ship taken by the computer device.
Alternatively, the image to be detected may also be an image read from a memory of the computer device, for example an image of a ship read from a memory of the computer device.
104: and detecting the image to be detected by using the trained acceleration region convolutional neural network model to obtain a target region of the image to be detected and a target angle type of the target region.
Specifically, the convolutional layers shared by the region suggestion network and the fast region convolutional neural network extract a feature map of the image to be detected. The region suggestion network obtains candidate regions in the image to be detected and the target angle types of the candidate regions according to the feature map. The fast region convolutional neural network screens and adjusts the candidate regions according to the feature map to obtain the target region of the image to be detected and the target angle type of the target region.
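As a final illustration, the detection flow of this step might look as follows; shared_conv, rpn, fast_rcnn and the returned region fields are hypothetical handles onto the trained model, not the patent's API.

```python
# A sketch of the detection flow in step 104 (hypothetical module handles).
ANGLE_NAMES = {0: "front", 1: "side", 2: "back"}  # hypothetical codes

def detect(model, image):
    feat = model.shared_conv(image)        # shared conv layers: feature map
    candidates = model.rpn(feat)           # RPN: candidate regions + angles
    regions = model.fast_rcnn(feat, candidates)  # screen and refine regions
    return [(r["box"], ANGLE_NAMES[r["angle_type"]]) for r in regions]
```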
The target detection method of the first embodiment obtains a training sample set, where the training sample set includes a plurality of target images labeled with target positions and target angle types; trains an acceleration region convolutional neural network model by using the training sample set to obtain a trained acceleration region convolutional neural network model, wherein the acceleration region convolutional neural network model comprises a region suggestion network and a fast region convolutional neural network, the region suggestion network and the fast region convolutional neural network share a convolutional layer, the convolutional layer extracts a feature map of each target image in the training sample set, the region suggestion network obtains a candidate region in each target image and a target angle type of the candidate region according to the feature map, and the fast region convolutional neural network screens and adjusts the candidate region according to the feature map to obtain the target region of each target image and the target angle type of the target region; acquires an image to be detected; and performs target detection on the image to be detected by utilizing the trained acceleration region convolutional neural network model to obtain a target region of the image to be detected and a target angle type of the target region.
The existing target detection based on the convolutional neural network uses a selective search algorithm to generate candidate regions, which is time-consuming, and its region extraction and target detection are separated. The target detection method of the first embodiment introduces the region suggestion network into the acceleration region convolutional neural network model and extracts candidate regions with a deep convolutional neural network. After network training, by sharing convolutional network parameters, the feature map obtained by passing the image through the convolutional layers can be applied to region extraction and target detection simultaneously; that is, the calculation result of the convolutional network is shared, which greatly increases the region extraction speed, accelerates the whole detection process, and realizes an end-to-end detection scheme. In addition, the target detection method of the first embodiment considers the problem of detection rate reduction caused by different shooting angles and trains the acceleration region convolutional neural network model with target images marked with target angle types, which improves the target detection rate. Therefore, the target detection method of the first embodiment can achieve fast target detection with a high detection rate.
Example two
Fig. 3 is a structural diagram of an object detection apparatus according to a second embodiment of the present invention. As shown in fig. 3, the object detection device 10 may include: a first acquisition unit 301, a training unit 302, a second acquisition unit 303, and a detection unit 304.
a first obtaining unit 301, configured to obtain a training sample set.
The training sample set includes a plurality of target images labeled with target positions and target angle types. The target image is an image including a preset target (e.g., a ship, a vehicle, etc.) and may include one or more preset targets. The target position represents the position of a preset target in the target image. The target angle type represents the shooting angle (e.g., front, back, side) of a preset target.
In one embodiment, the training sample set includes about 10000 target images. The target location may be labeled [ x, y, w, h ], where x, y represents the top left coordinate of the target region, w represents the width of the target region, and h represents the height of the target region. The target angle types include a front angle type, a side angle type, and a back angle type. For example, when the target detection method is used for detecting a ship, if the target image is a front image of the ship, the marked target angle type is a front angle type; if the target image is a side image of the ship, the marked target angle type is a side angle type; and if the target image is the back image of the ship, the marked target angle type is the back angle type.
A training unit 302, configured to train an acceleration region convolutional neural network model (Faster Region-based Convolutional Neural Network, Faster R-CNN) by using the training sample set to obtain a trained acceleration region convolutional neural network model.
The acceleration region convolutional neural network model comprises a region suggestion network (Region Proposal Network, RPN) and a fast region convolutional neural network (Fast R-CNN). The region suggestion network and the fast region convolutional neural network need to be trained alternately.
The region suggestion network and the fast region convolutional neural network share convolutional layers, and the shared convolutional layers are used for extracting a feature map of an image. The region suggestion network generates candidate regions of the image and the target angle types of the candidate regions according to the feature map, and inputs the generated candidate regions and their target angle types into the fast region convolutional neural network. The fast region convolutional neural network screens and adjusts the candidate regions according to the feature map to obtain the target region of the image and the target angle type of the target region.
Specifically, during training, the convolutional layers extract a feature map of each target image in the training sample set, the region suggestion network obtains the candidate regions in each target image and the target angle types of the candidate regions according to the feature map, and the fast region convolutional neural network screens and adjusts the candidate regions according to the feature map to obtain the target region of each target image and the target angle type of the target region.
In a preferred embodiment, the acceleration region convolutional neural network model adopts the ZF framework, and the region suggestion network and the fast region convolutional neural network share 5 convolutional layers.
In one embodiment, the target images in the training sample set may be images of any size, scaled to a uniform size (e.g., 1000 x 600) before entering the convolutional layers. In one embodiment, the length and width of the feature map extracted by the convolutional layers are reduced by a factor of 16 relative to the input image, and the depth of the feature map is 256.
In a particular embodiment, training the acceleration region convolutional neural network model using the training sample set may include:
(1) Initializing the region suggestion network using an Imagenet model, and training the region suggestion network using the training sample set.
(2) Generating candidate regions of each target image in the training sample set by using the region suggestion network trained in step (1), and training the fast region convolutional neural network by using the candidate regions. At this point, the region suggestion network and the fast region convolutional neural network do not yet share convolutional layers.
(3) Initializing the region suggestion network by using the fast region convolutional neural network trained in step (2), and training the region suggestion network by using the training sample set.
(4) Initializing the fast region convolutional neural network by using the region suggestion network trained in step (3), keeping the convolutional layers fixed, and training the fast region convolutional neural network by using the training sample set. At this time, the region suggestion network and the fast region convolutional neural network share the same convolutional layers and form a unified network.
Fig. 2 is a schematic diagram of the region suggestion network.
A feature map of the image is obtained after the image passes through the shared convolutional layers. A sliding window of a preset size (e.g., 3 x 3) is slid over the feature map with a preset stride (e.g., a stride of 1), and each position of the sliding window corresponds to a center point. When the sliding window slides to a position, anchor boxes of preset scales (e.g., 3 scales: 128, 256, 512) and preset aspect ratios (e.g., 3 aspect ratios: 1:1, 1:2, 2:1) are applied at the center point of that position to obtain a preset number (e.g., 9) of candidate regions. Each sliding window is mapped onto a low-dimensional feature vector (e.g., 256-d or 512-d) by a convolutional layer connected to the shared convolutional layers. The feature vector is output to three sibling fully connected layers: a target classification layer, an angle classification layer, and a boundary regression layer. The target classification layer outputs a target classification score of the candidate region, indicating whether the candidate region is a target (i.e., foreground) or background. Whether a candidate region belongs to the foreground or the background is determined by its overlap with the labeled target region (i.e., the region determined by the labeled target position): if the overlap is greater than a certain threshold, the candidate region is labeled as foreground; if the overlap is less than the threshold, the candidate region is labeled as background. The angle classification layer outputs an angle classification score of the candidate region, indicating the target angle type of the candidate region. The boundary regression layer outputs the refined position of the candidate region, which is used to refine the boundary of the candidate region.
The region suggestion network selects a large number of candidate regions; several candidate regions with the highest scores can be screened out according to the target classification scores of the candidate regions and input into the fast region convolutional neural network, which increases the training and detection speed.
To train the region suggestion network, each candidate region is assigned a label, which is either positive or negative. A positive label is assigned to two types of candidate regions: (1) the candidate region with the highest IoU (Intersection over Union) overlap with the bounding box of a real target (Ground Truth, GT); (2) candidate regions whose IoU overlap with any GT bounding box is greater than 0.7. For one GT bounding box, positive labels may be assigned to multiple candidate regions. Negative labels are assigned to candidate regions whose IoU ratio with all GT bounding boxes is below 0.3. Candidate regions that are neither positive nor negative have no effect on the training objective.
The region suggestion network is trained by using a back propagation algorithm, and its network parameters are adjusted in the training process to minimize a loss function. The loss function indicates the difference between the prediction confidence of the candidate regions predicted by the region suggestion network and the true confidence. In the present embodiment, the loss function includes three parts, namely target classification loss, angle classification loss and regression loss.
The loss function of an image can be defined as:

$$L(\{p_i\},\{a_i\},\{t_i\}) = \frac{1}{N_{cls}}\sum_i L_{cls}(p_i, p_i^*) + \frac{1}{N_{cls}}\sum_i L_{ang}(a_i, a_i^*) + \lambda \frac{1}{N_{reg}}\sum_i p_i^* L_{reg}(t_i, t_i^*)$$

where $i$ is the index of a candidate region in a training batch (mini-batch).

$L_{cls}(p_i, p_i^*)$ is the target classification loss of the candidate region. $N_{cls}$ is the size of the training batch, for example 256. $p_i$ is the predicted probability that the $i$-th candidate region is a target. $p_i^*$ is the GT label: if the candidate region is positive (i.e., the assigned label is a positive label, and the region is called a positive candidate region), $p_i^*$ is 1; if the candidate region is negative (i.e., the assigned label is a negative label, and the region is called a negative candidate region), $p_i^*$ is 0. $L_{cls}$ can be calculated as $L_{cls}(p_i, p_i^*) = -\log\left[p_i^* p_i + (1 - p_i^*)(1 - p_i)\right]$.

$L_{ang}(a_i, a_i^*)$ is the angle classification loss of the candidate region, where $a_i$ is the predicted angle type and $a_i^*$ is the labeled angle type; its calculation can refer to that of $L_{cls}$.

$L_{reg}(t_i, t_i^*)$ is the regression loss of the candidate region. $\lambda$ is the balance weight and can be taken as 10. $N_{reg}$ is the number of candidate regions. $L_{reg}$ can be calculated as $L_{reg}(t_i, t_i^*) = R(t_i - t_i^*)$, where $t_i$ is a coordinate vector, i.e. $t_i = (t_x, t_y, t_w, t_h)$, the 4 parameterized coordinates representing the candidate region (e.g., the coordinates of the upper-left corner of the candidate region together with its width and height), and $t_i^*$ is the coordinate vector of the GT bounding box corresponding to a positive candidate region, i.e. $t_i^* = (t_x^*, t_y^*, t_w^*, t_h^*)$ (e.g., the coordinates of the upper-left corner of the real target region together with its width and height). $R$ is the robust loss function ($\mathrm{smooth}_{L_1}$), defined as:

$$\mathrm{smooth}_{L_1}(x) = \begin{cases} 0.5x^2, & \text{if } |x| < 1 \\ |x| - 0.5, & \text{otherwise} \end{cases}$$
This embodiment considers the problem of detection rate reduction caused by different shooting angles: a loss function including the angle classification loss is used in the training of the acceleration region convolutional neural network model, the angle classification loss of the candidate region is calculated according to the predicted target angle type, and the target detection rate is improved.
The above describes the training method of the region suggestion network. The training method of the fast region convolutional neural network may refer to the training method of the region suggestion network and is not repeated here.
In this embodiment, a Hard Negative Mining (HNM) method is added to the training of the fast region convolutional neural network. For negative samples that are wrongly classified as positive samples by the fast region convolutional neural network (namely, hard samples), their information is recorded; in the next training iteration, these negative samples are input into the training sample set again and their loss weights are increased, strengthening their influence on the classifier. In this way the classifier keeps learning from the negative samples that are hard to classify, the features it learns become more discriminative, and the covered sample distribution becomes more diverse.
A second obtaining unit 303, configured to obtain an image to be detected.
The image to be detected is an image including a preset target (e.g., a ship). The preset target is the detection object in the image to be detected. For example, when ship detection is performed on an image to be detected, the preset target is a ship in the image to be detected.
The image to be detected may be an image received from an external device, for example an image of a ship taken by a camera near the quay, from which the image of the ship is received.
Alternatively, the image to be detected may be an image taken by the computer device, for example an image of a ship taken by the computer device.
Alternatively, the image to be detected may also be an image read from a memory of the computer device, for example an image of a ship read from a memory of the computer device.
The detection unit 304 is configured to detect the image to be detected by using the trained acceleration region convolutional neural network model, so as to obtain a target region of the image to be detected and a target angle type of the target region.
Specifically, the convolutional layers shared by the region suggestion network and the fast region convolutional neural network extract a feature map of the image to be detected. The region suggestion network obtains candidate regions in the image to be detected and the target angle types of the candidate regions according to the feature map. The fast region convolutional neural network screens and adjusts the candidate regions according to the feature map to obtain the target region of the image to be detected and the target angle type of the target region.
The object detection device of the second embodiment acquires a training sample set, wherein the training sample set comprises a plurality of target images marked with target positions and target angle types; trains an acceleration region convolutional neural network model by using the training sample set to obtain a trained acceleration region convolutional neural network model, wherein the acceleration region convolutional neural network model comprises a region suggestion network and a fast region convolutional neural network, the region suggestion network and the fast region convolutional neural network share a convolutional layer, the convolutional layer extracts a feature map of each target image in the training sample set, the region suggestion network obtains a candidate region in each target image and a target angle type of the candidate region according to the feature map, and the fast region convolutional neural network screens and adjusts the candidate region according to the feature map to obtain the target region of each target image and the target angle type of the target region; acquires an image to be detected; and performs target detection on the image to be detected by utilizing the trained acceleration region convolutional neural network model to obtain a target region of the image to be detected and a target angle type of the target region.
The existing target detection based on the convolutional neural network uses a selective search algorithm to generate candidate regions, which is time-consuming, and its region extraction and target detection are separated. The second embodiment introduces the region suggestion network into the acceleration region convolutional neural network model and extracts candidate regions with a deep convolutional neural network. After network training, by sharing convolutional network parameters, the feature map obtained by passing the image through the convolutional layers can be applied to region extraction and target detection simultaneously; that is, the calculation result of the convolutional network is shared, which greatly increases the region extraction speed, accelerates the whole detection process, and realizes an end-to-end detection scheme. In addition, the second embodiment considers the problem of detection rate reduction caused by different shooting angles and trains the acceleration region convolutional neural network model with target images marked with target angle types, which improves the target detection rate. Therefore, the second embodiment can realize fast target detection with a high detection rate.
EXAMPLE III
Fig. 4 is a schematic diagram of a computer device according to a third embodiment of the present invention. The computer device 1 comprises a memory 20, a processor 30 and a computer program 40, such as an object detection program, stored in the memory 20 and executable on the processor 30. The processor 30, when executing the computer program 40, implements the steps of the above object detection method embodiments, such as steps 101 to 104 shown in Fig. 1. Alternatively, the processor 30, when executing the computer program 40, implements the functions of the modules/units in the above device embodiments, such as the units 301 to 304 in Fig. 3.
Illustratively, the computer program 40 may be partitioned into one or more modules/units that are stored in the memory 20 and executed by the processor 30 to implement the present invention. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, which are used to describe the execution process of the computer program 40 in the computer apparatus 1. For example, the computer program 40 may be divided into a first obtaining unit 301, a training unit 302, a second obtaining unit 303, and a detecting unit 304 in fig. 3, and the specific functions of each unit are shown in embodiment two.
The computer device 1 may be a desktop computer, a notebook, a palm computer, a cloud server, or other computing device. It will be understood by those skilled in the art that Fig. 4 is only an example of the computer device 1 and does not constitute a limitation on the computer device 1; the computer device may include more or fewer components than shown, combine some components, or have different components. For example, the computer device 1 may further include input and output devices, network access devices, buses, and the like.
The processor 30 may be a Central Processing Unit (CPU), another general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. A general purpose processor may be a microprocessor, or the processor 30 may be any conventional processor; the processor 30 is the control center of the computer device 1 and connects the various parts of the whole computer device 1 with various interfaces and lines.
The memory 20 may be used to store the computer program 40 and/or the modules/units, and the processor 30 implements various functions of the computer device 1 by running or executing the computer program and/or modules/units stored in the memory 20 and calling data stored in the memory 20. The memory 20 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system and an application program required by at least one function (such as a sound playing function, an image playing function, etc.), and the data storage area may store data (such as audio data, a phonebook, etc.) created according to the use of the computer device 1. In addition, the memory 20 may include high-speed random access memory and may also include non-volatile memory, such as a hard disk, an internal memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a Flash Card, at least one magnetic disk storage device, a flash memory device, or other non-volatile solid-state storage device.
The modules/units integrated in the computer device 1 may be stored in a computer-readable storage medium if they are implemented in the form of software functional units and sold or used as separate products. Based on this understanding, all or part of the flow of the methods of the above embodiments may also be implemented by a computer program, which may be stored in a computer-readable storage medium; when the computer program is executed by a processor, the steps of the method embodiments can be implemented. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, etc. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like. It should be noted that the contents contained in the computer-readable medium may be increased or decreased as appropriate according to the requirements of legislation and patent practice in a jurisdiction; for example, in some jurisdictions, according to legislation and patent practice, computer-readable media do not include electrical carrier signals and telecommunications signals.
In the embodiments provided in the present invention, it should be understood that the disclosed computer apparatus and method can be implemented in other ways. For example, the above-described embodiments of the computer apparatus are merely illustrative, and for example, the division of the units is only one logical function division, and there may be other divisions when the actual implementation is performed.
In addition, functional units in the embodiments of the present invention may be integrated into the same processing unit, or each unit may exist alone physically, or two or more units are integrated into the same unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional module.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned. Furthermore, it is obvious that the word "comprising" does not exclude other elements or steps, and the singular does not exclude the plural. The units or computer means recited in the computer means claims may also be implemented by the same unit or computer means, either in software or in hardware. The terms first, second, etc. are used to denote names, but not any particular order.
finally, it should be noted that the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting, and although the present invention is described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention.

Claims (10)

training an acceleration region convolutional neural network model by using the training sample set to obtain a trained acceleration region convolutional neural network model, wherein the acceleration region convolutional neural network model comprises a region suggestion network and a fast region convolutional neural network, the region suggestion network and the fast region convolutional neural network share a convolutional layer, the convolutional layer extracts a feature map of each target image in the training sample set, the region suggestion network obtains a candidate region in each target image and a target angle type of the candidate region according to the feature map, and the fast region convolutional neural network screens and adjusts the candidate region according to the feature map to obtain the target region of each target image and the target angle type of the target region;
A training unit, configured to train an acceleration region convolutional neural network model by using the training sample set to obtain a trained acceleration region convolutional neural network model, where the acceleration region convolutional neural network model includes a region suggestion network and a fast region convolutional neural network, the region suggestion network and the fast region convolutional neural network share a convolutional layer, the convolutional layer extracts a feature map of each target image in the training sample set, the region suggestion network obtains a candidate region in each target image and a target angle type of the candidate region according to the feature map, and the fast region convolutional neural network screens and adjusts the candidate region according to the feature map to obtain a target region of each target image and a target angle type of the target region;
CN201711484723.3A | Priority date 2017-12-29 | Filing date 2017-12-29 | Object detection method and device, computer device and computer readable storage medium | Active | CN108121986B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN201711484723.3A | 2017-12-29 | 2017-12-29 | Object detection method and device, computer device and computer readable storage medium

Publications (2)

Publication Number | Publication Date
CN108121986A (en) | 2018-06-05
CN108121986B (en) | 2019-12-17

Family

ID=62230688

Family Applications (1)

Application Number | Status | Publication | Priority Date | Filing Date | Title
CN201711484723.3A | Active | CN108121986B (en) | 2017-12-29 | 2017-12-29 | Object detection method and device, computer device and computer readable storage medium

Country Status (1)

Country | Link
CN (1) | CN108121986B (en)

Families Citing this family (34)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN109101966B (en)* | 2018-06-08 | 2022-03-08 | 中国科学院宁波材料技术与工程研究所 | Workpiece recognition, positioning and pose estimation system and method based on deep learning
CN110619255B (en)* | 2018-06-19 | 2022-08-26 | 杭州海康威视数字技术股份有限公司 | Target detection method and device
CN108921099A (en)* | 2018-07-03 | 2018-11-30 | 常州大学 | Moving ship object detection method in a kind of navigation channel based on deep learning
CN109063740A (en)* | 2018-07-05 | 2018-12-21 | 高镜尧 | The detection model of ultrasonic image common-denominator target constructs and detection method, device
CN110795976B (en) | 2018-08-03 | 2023-05-05 | 华为云计算技术有限公司 | Method, device and equipment for training object detection model
WO2020031380A1 (en)* | 2018-08-10 | 2020-02-13 | オリンパス株式会社 | Image processing method and image processing device
CN110837760B (en)* | 2018-08-17 | 2022-10-14 | 北京四维图新科技股份有限公司 | Target detection method, training method and apparatus for target detection
CN109376619B (en)* | 2018-09-30 | 2021-10-15 | 中国人民解放军陆军军医大学 | A kind of cell detection method
CN109359683B (en)* | 2018-10-15 | 2021-07-27 | 百度在线网络技术(北京)有限公司 | Target detection method, device, terminal and computer-readable storage medium
CN111144398A (en)* | 2018-11-02 | 2020-05-12 | 银河水滴科技(北京)有限公司 | Target detection method, target detection device, computer equipment and storage medium
CN109583445B (en)* | 2018-11-26 | 2024-08-02 | 平安科技(深圳)有限公司 | Text image correction processing method, device, equipment and storage medium
CN109583396A (en)* | 2018-12-05 | 2019-04-05 | 广东亿迅科技有限公司 | A kind of region prevention method, system and terminal based on CNN two stages human testing
CN111310775B (en)* | 2018-12-11 | 2023-08-25 | TCL科技集团股份有限公司 | Data training method, device, terminal equipment and computer readable storage medium
CN109784385A (en)* | 2018-12-29 | 2019-05-21 | 广州海昇计算机科技有限公司 | A kind of commodity automatic identifying method, system, device and storage medium
CN109934088A (en)* | 2019-01-10 | 2019-06-25 | 海南大学 | A deep learning-based method for the identification of ships on the sea surface
CN109886997B (en)* | 2019-01-23 | 2023-07-11 | 平安科技(深圳)有限公司 | Identification frame determining method and device based on target detection and terminal equipment
CN109886998B (en)* | 2019-01-23 | 2024-09-06 | 平安科技(深圳)有限公司 | Multi-target tracking method, device, computer device and computer storage medium
CN110288082B (en)* | 2019-06-05 | 2022-04-05 | 北京字节跳动网络技术有限公司 | Convolutional neural network model training method and device and computer readable storage medium
CN110428357A (en)* | 2019-08-09 | 2019-11-08 | 厦门美图之家科技有限公司 | The detection method of watermark, device, electronic equipment and storage medium in image
CN110443244B (en)* | 2019-08-12 | 2023-12-05 | 深圳市捷顺科技实业股份有限公司 | Graphics processing method and related device
CN110599456B (en)* | 2019-08-13 | 2023-05-30 | 杭州智团信息技术有限公司 | Method for extracting specific region of medical image
CN110717905B (en)* | 2019-09-30 | 2022-07-05 | 上海联影智能医疗科技有限公司 | Brain image detection method, computer device, and storage medium
CN110807431A (en)* | 2019-11-06 | 2020-02-18 | 上海眼控科技股份有限公司 | Object positioning method and device, electronic equipment and storage medium
CN110991531A (en)* | 2019-12-02 | 2020-04-10 | 中电科特种飞机系统工程有限公司 | Training sample library construction method, device and medium based on air-to-ground small and slow target
CN113496223B (en)* | 2020-03-19 | 2024-10-18 | 顺丰科技有限公司 | Method and device for establishing text region detection model
CN111462094A (en)* | 2020-04-03 | 2020-07-28 | 联觉(深圳)科技有限公司 | PCBA component detection method and device and computer readable storage medium
CN113935389B (en)* | 2020-06-29 | 2025-09-09 | 华为云计算技术有限公司 | Method, device, computing equipment and storage medium for data annotation
CN112256906A (en)* | 2020-10-23 | 2021-01-22 | 安徽启新明智科技有限公司 | Method, device and storage medium for marking annotation on display screen
CN112001375B (en)* | 2020-10-29 | 2021-01-05 | 成都睿沿科技有限公司 | Flame detection method and device, electronic equipment and storage medium
CN112464785B (en)* | 2020-11-25 | 2024-08-09 | 浙江大华技术股份有限公司 | Target detection method, device, computer equipment and storage medium
CN113239975B (en)* | 2021-04-21 | 2022-12-20 | 国网甘肃省电力公司白银供电公司 | A neural network-based object detection method and device
CN112949614B (en)* | 2021-04-29 | 2021-09-10 | 成都市威虎科技有限公司 | Face detection method and device for automatically allocating candidate areas and electronic equipment
CN113333321A (en)* | 2021-05-11 | 2021-09-03 | 北京若贝特智能机器人科技有限公司 | Automatic identification and classification conveying method, system and device and storage medium
CN115439394A (en)* | 2021-06-03 | 2022-12-06 | 中国移动通信集团四川有限公司 | A light transmission box detection method, device and electronic equipment

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN101271515B (en)* | 2007-03-21 | 2014-03-19 | 株式会社理光 | Image detection device capable of recognizing multi-angle objective
US9202144B2 (en)* | 2013-10-30 | 2015-12-01 | NEC Laboratories America, Inc. | Regionlets with shift invariant neural patterns for object detection
CN104299012B (en)* | 2014-10-28 | 2017-06-30 | 银河水滴科技(北京)有限公司 | A kind of gait recognition method based on deep learning
CN106250812B (en)* | 2016-07-15 | 2019-08-20 | 汤一平 | A kind of model recognizing method based on quick R-CNN deep neural network
CN106919978B (en)* | 2017-01-18 | 2020-05-15 | 西南交通大学 | Method for identifying and detecting parts of high-speed rail contact net supporting device
CN106845430A (en)* | 2017-02-06 | 2017-06-13 | 东华大学 | Pedestrian detection and tracking based on acceleration region convolutional neural networks
CN106874894B (en)* | 2017-03-28 | 2020-04-14 | 电子科技大学 | A Human Object Detection Method Based on Regional Fully Convolutional Neural Networks

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
WO2015078185A1 (en)* | 2013-11-29 | 2015-06-04 | 华为技术有限公司 | Convolutional neural network and target object detection method based on same
CN106647758A (en)* | 2016-12-27 | 2017-05-10 | 深圳市盛世智能装备有限公司 | Target object detection method and device and automatic guiding vehicle following method

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
CMS-RCNN: contextual multi-scale region-based CNN for unconstrained face detection; Chenchen Zhu et al.; Deep Learning for Biometrics; 2016-06-17; full text *
Learning oriented region-based convolutional neural networks for building detection in satellite remote sensing images; Chaoyue Chen et al.; Remote Sensing and Spatial Information Sciences; 2017-06-09; pp. 461-464 *
Multi-target tracking algorithm based on region convolutional neural networks; Hu Peng et al.; Journal of Southwest University of Science and Technology; 2016-03-31; Vol. 31, No. 1; pp. 67-72 *
Ship video detection method based on very-high-speed region convolutional neural networks; Yang Ming et al.; Journal of Beijing University of Posts and Telecommunications; 2017-06-30; Vol. 40; pp. 130-134 *

Also Published As

Publication number | Publication date
CN108121986A (en) | 2018-06-05

Similar Documents

Publication | Title
CN108121986B (en) | Object detection method and device, computer device and computer readable storage medium
CN109886998B (en) | Multi-target tracking method, device, computer device and computer storage medium
CN109918969B (en) | Face detection method and device, computer device and computer readable storage medium
CN112506342B (en) | Man-machine interaction method and system based on dynamic gesture recognition
CN109903310B (en) | Target tracking method, device, computer device and computer storage medium
US10885660B2 (en) | Object detection method, device, system and storage medium
CN110378297B (en) | Remote sensing image target detection method and device based on deep learning and storage medium
US8792722B2 (en) | Hand gesture detection
US8750573B2 (en) | Hand gesture detection
CN112101344B (en) | Video text tracking method and device
JP6188400B2 (en) | Image processing apparatus, program, and image processing method
CN104915972A (en) | Image processing apparatus, image processing method and program
CN107944381B (en) | Face tracking method, face tracking device, terminal and storage medium
CN108021908B (en) | Face age group identification method and device, computer device and readable storage medium
CN107895021B (en) | Image recognition method and device, computer device and computer readable storage medium
CN110570442A (en) | Contour detection method under complex background, terminal device and storage medium
CN109871792B (en) | Pedestrian detection method and device
Meus et al. | Embedded vision system for pedestrian detection based on HOG+SVM and use of motion information implemented in Zynq heterogeneous device
CN116740721B (en) | Finger sentence searching method, device, electronic equipment and computer storage medium
CN113239746A (en) | Electric vehicle detection method and device, terminal equipment and computer readable storage medium
KR101981284B1 (en) | Apparatus Processing Image and Method thereof
CN113807407B (en) | Target detection model training method, model performance detection method and device
CN111931557A (en) | Specification identification method and device for bottled drink, terminal equipment and readable storage medium
CN113971671B (en) | Instance segmentation method, device, electronic device and storage medium
CN113361511B (en) | Correction model establishing method, device, equipment and computer readable storage medium

Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant
