
Object detection method and device, computer device and computer readable storage medium

Info

Publication number
CN108121986B
CN108121986B
Authority
CN
China
Prior art keywords: target, region, neural network, training, convolutional neural
Prior art date
Legal status
Active
Application number
CN201711484723.3A
Other languages
Chinese (zh)
Other versions
CN108121986A (en)
Inventor
牟永强
刘荣杰
裴超
Current Assignee
Shenzhen Intellifusion Technologies Co Ltd
Original Assignee
Shenzhen Intellifusion Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Shenzhen Intellifusion Technologies Co Ltd
Priority to CN201711484723.3A
Publication of CN108121986A
Application granted
Publication of CN108121986B
Status: Active
Anticipated expiration


Abstract

A method of target detection, the method comprising: acquiring a training sample set, wherein the training sample set comprises a plurality of target images marked with target positions and target angle types; training an acceleration region convolutional neural network model by using the training sample set to obtain a trained acceleration region convolutional neural network model; acquiring an image to be detected; and carrying out target detection on the image to be detected by utilizing the trained acceleration region convolutional neural network model to obtain a target region of the image to be detected and a target angle type of the target region. The invention also provides a target detection device, a computer device and a readable storage medium. The invention can realize rapid target detection with a high detection rate.

Description

Object detection method and device, computer device and computer readable storage medium
Technical Field
The invention relates to the technical field of image processing, in particular to a target detection method and device, a computer device and a computer readable storage medium.
Background
Existing target detection techniques include target detection based on simple pixel features or on manually designed complex features. Representative simple pixel features, such as HAAR features and pixel differences, are computationally efficient and offer good real-time performance, but they are poorly robust to complex and diverse background changes, so their detection accuracy is poor. Manually designed complex features, such as the HOG features used in DPM, have better feature expression and stronger robustness, but because they cannot use GPU acceleration, their computation on the CPU is heavy and the real-time requirement is difficult to meet.
Existing target detection techniques also include target detection based on convolutional neural networks. Although convolutional-neural-network-based target detection improves detection precision, it greatly increases the amount of computation. GPU computing solves the computational problem of extracting convolutional features, but candidate region extraction still takes a considerable amount of time. In addition, the whole scheme is a two-stage framework of candidate region extraction followed by classification, so end-to-end detection cannot be realized and application is relatively complicated.
In addition, due to different shooting angles, the appearance of a target object can change greatly in the image. Existing target detection techniques do not take the shooting angle into account, so the target detection rate is low.
Disclosure of Invention
In view of the above, it is desirable to provide a target detection method and apparatus, a computer apparatus and a computer-readable storage medium, which can achieve fast target detection with a high detection rate.
A first aspect of the present application provides a target detection method, the method comprising:
Acquiring a training sample set, wherein the training sample set comprises a plurality of target images marked with target positions and target angle types;
Training an acceleration region convolutional neural network model by using the training sample set to obtain a trained acceleration region convolutional neural network model, wherein the acceleration region convolutional neural network model comprises a region suggestion network and a fast region convolutional neural network, the region suggestion network and the fast region convolutional neural network share a convolutional layer, the convolutional layer extracts a feature map of each target image in the training sample set, the region suggestion network obtains a candidate region in each target image and a target angle type of the candidate region according to the feature map, and the fast region convolutional neural network screens and adjusts the candidate region according to the feature map to obtain the target region of each target image and the target angle type of the target region;
Acquiring an image to be detected;
And performing target detection on the image to be detected by utilizing the trained acceleration region convolutional neural network model to obtain a target region of the image to be detected and a target angle type of the target region.
In another possible implementation manner, training the acceleration region convolutional neural network model by using the training sample set includes:
(1) Initializing the region suggestion network by using an Imagenet model, and training the region suggestion network by using the training sample set;
(2) Generating candidate regions of each target image by using the region suggestion network trained in step (1), and training the fast region convolutional neural network by using the candidate regions;
(3) Initializing the region suggestion network by using the fast region convolutional neural network trained in step (2), and training the region suggestion network by using the training sample set;
(4) Initializing the fast region convolutional neural network by using the region suggestion network trained in step (3), keeping the convolutional layers fixed, and training the fast region convolutional neural network by using the training sample set.
In another possible implementation manner, training the acceleration region convolutional neural network model by using the training sample set includes:
Training the region suggestion network and the fast region convolutional neural network by using a back propagation algorithm, and adjusting network parameters of the region suggestion network and the fast region convolutional neural network in the training process to minimize a loss function, wherein the loss function comprises target classification loss, angle classification loss and regression loss.
In another possible implementation manner, the acceleration region convolutional neural network model adopts a ZF framework, and the region suggestion network and the fast region convolutional neural network share 5 convolutional layers.
In another possible implementation manner, a hard negative mining method is added in the training of the fast region convolutional neural network.
A second aspect of the present application provides an object detection apparatus, the apparatus comprising:
A first acquisition unit configured to acquire a training sample set including a plurality of target images in which target positions and target angle types are marked;
A training unit, configured to train an acceleration region convolutional neural network model by using the training sample set to obtain a trained acceleration region convolutional neural network model, where the acceleration region convolutional neural network model includes a region suggestion network and a fast region convolutional neural network, the region suggestion network and the fast region convolutional neural network share a convolutional layer, the convolutional layer extracts a feature map of each target image in the training sample set, the region suggestion network obtains a candidate region in each target image and a target angle type of the candidate region according to the feature map, and the fast region convolutional neural network screens and adjusts the candidate region according to the feature map to obtain a target region of each target image and a target angle type of the target region;
The second acquisition unit is used for acquiring an image to be detected;
And the detection unit is used for performing target detection on the image to be detected by utilizing the trained acceleration region convolutional neural network model to obtain a target region of the image to be detected and a target angle type of the target region.
In another possible implementation manner, the training unit is specifically configured to:
(1) Initialize the region suggestion network by using an Imagenet model, and train the region suggestion network by using the training sample set;
(2) Generate candidate regions of each target image by using the region suggestion network trained in step (1), and train the fast region convolutional neural network by using the candidate regions;
(3) Initialize the region suggestion network by using the fast region convolutional neural network trained in step (2), and train the region suggestion network by using the training sample set;
(4) Initialize the fast region convolutional neural network by using the region suggestion network trained in step (3), keep the convolutional layers fixed, and train the fast region convolutional neural network by using the training sample set.
In another possible implementation manner, the training unit is specifically configured to:
Train the region suggestion network and the fast region convolutional neural network by using a back propagation algorithm, and adjust network parameters of the region suggestion network and the fast region convolutional neural network in the training process to minimize a loss function, wherein the loss function comprises target classification loss, angle classification loss and regression loss.
A third aspect of the application provides a computer apparatus comprising a processor for implementing the object detection method when executing a computer program stored in a memory.
A fourth aspect of the present application provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the object detection method.
The method comprises the steps of obtaining a training sample set, wherein the training sample set comprises a plurality of target images marked with target positions and target angle types; training an acceleration region convolutional neural network model by using the training sample set to obtain a trained acceleration region convolutional neural network model, wherein the acceleration region convolutional neural network model comprises a region suggestion network and a fast region convolutional neural network, the region suggestion network and the fast region convolutional neural network share a convolutional layer, the convolutional layer extracts a feature map of each target image in the training sample set, the region suggestion network obtains a candidate region in each target image and a target angle type of the candidate region according to the feature map, and the fast region convolutional neural network screens and adjusts the candidate region according to the feature map to obtain the target region of each target image and the target angle type of the target region; acquiring an image to be detected; and performing target detection on the image to be detected by utilizing the trained acceleration region convolutional neural network model to obtain a target region of the image to be detected and a target angle type of the target region.
The existing target detection based on the convolutional neural network uses a selective search algorithm to generate candidate regions, which is time-consuming, and its region extraction and target detection are separated. The method introduces the region suggestion network into the acceleration region convolutional neural network model and extracts candidate regions with a deep convolutional neural network. After network training, by sharing convolutional network parameters, the feature map obtained by passing the image through the convolutional layers can be applied to region extraction and target detection simultaneously; that is, the calculation result of the convolutional network is shared, which greatly increases the region extraction speed, accelerates the whole detection process, and realizes an end-to-end detection scheme. In addition, the problem of detection rate reduction caused by different shooting angles is considered: target images marked with target angle types are used to train the acceleration region convolutional neural network model, which improves the target detection rate. Therefore, the invention can realize fast target detection with a high detection rate.
Drawings
Fig. 1 is a flowchart of a target detection method according to an embodiment of the present invention.
Fig. 2 is a schematic diagram of a regional recommendation network.
Fig. 3 is a structural diagram of an object detection apparatus according to a second embodiment of the present invention.
Fig. 4 is a schematic diagram of a computer device according to a third embodiment of the present invention.
Detailed Description
In order that the above objects, features and advantages of the present invention can be more clearly understood, a detailed description of the present invention will be given below with reference to the accompanying drawings and specific embodiments. It should be noted that the embodiments and features of the embodiments of the present application may be combined with each other without conflict.
In the following description, numerous specific details are set forth to provide a thorough understanding of the present invention. The described embodiments are merely a subset of the embodiments of the present invention, rather than all of them. All other embodiments obtained by a person skilled in the art based on the embodiments given herein without creative effort shall fall within the protection scope of the present invention.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention.
Preferably, the object detection method of the present invention is applied in one or more computer devices. A computer device is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions; its hardware includes, but is not limited to, a microprocessor, an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), an embedded device, and the like.
The computer device can be a desktop computer, a notebook, a palm computer, a cloud server and other computing equipment. The computer device can be in man-machine interaction with a user through a keyboard, a mouse, a remote controller, a touch panel or voice control equipment and the like.
Example one
Fig. 1 is a flowchart of a target detection method according to an embodiment of the present invention. The target detection method is applied to a computer device. The target detection method can detect the position of a preset target (such as a vehicle, a ship and the like) in the image and can detect the angle type (such as the front, the side and the back) of the preset target in the image.
As shown in fig. 1, the target detection method specifically includes the following steps:
101: a training sample set is obtained.
The training sample set includes a plurality of target images labeled with target positions and target angle types. The target image is an image including a preset target (e.g., a ship, a vehicle, etc.) and may include one or more preset targets. The target position represents the position of a preset target in the target image. The target angle type represents the shooting angle (e.g., front, back, side) of a preset target.
In one embodiment, the training sample set includes about 10000 target images. The target location may be labeled [ x, y, w, h ], where x, y represents the top left coordinate of the target region, w represents the width of the target region, and h represents the height of the target region. The target angle types include a front angle type, a side angle type, and a back angle type. For example, when the target detection method is used for detecting a ship, if the target image is a front image of the ship, the marked target angle type is a front angle type; if the target image is a side image of the ship, the marked target angle type is a side angle type; and if the target image is the back image of the ship, the marked target angle type is the back angle type.
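For illustration only, the sketch below shows one possible in-memory representation of such an annotated training sample; the dictionary field names and the integer angle codes are assumptions, and only the [x, y, w, h] box format and the three angle types come from the description above.

```python
# A minimal sketch of one annotated training sample (hypothetical field
# names; the patent only fixes the [x, y, w, h] format and the three
# angle types: front, side, back).
FRONT, SIDE, BACK = 0, 1, 2  # hypothetical integer codes for the angle types

sample = {
    "image_path": "ships/0001.jpg",
    "targets": [
        # box = [x, y, w, h]: top-left corner, width and height of the region
        {"box": [120, 80, 340, 150], "angle_type": SIDE},
        {"box": [500, 60, 180, 120], "angle_type": FRONT},
    ],
}
```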
102: and training a fast Region-based Convolution Neural Network (fast R-CNN) by using the training sample set to obtain a trained acceleration Region Convolution Neural Network model.
The acceleration Region convolutional Neural Network model comprises a Region suggestion Network (RPN) and a Fast Region convolutional Neural Network (FastR-CNN). It is necessary to train alternately the area proposal network and the fast convolution network.
The region suggestion network and the fast region convolutional neural network share convolutional layers, and the shared convolutional layers are used for extracting a feature map of an image. The region suggestion network generates candidate regions of the image and the target angle types of the candidate regions according to the feature map, and inputs the generated candidate regions and their target angle types into the fast region convolutional neural network. The fast region convolutional neural network screens and adjusts the candidate regions according to the feature map to obtain the target region of the image and the target angle type of the target region.
Specifically, during training, the convolutional layers extract a feature map of each target image in the training sample set, the region suggestion network obtains the candidate regions in each target image and the target angle types of the candidate regions according to the feature map, and the fast region convolutional neural network screens and adjusts the candidate regions according to the feature map to obtain the target region of each target image and the target angle type of the target region.
In a preferred embodiment, the acceleration region convolutional neural network model adopts the ZF framework, and the region suggestion network and the fast region convolutional neural network share 5 convolutional layers.
In one embodiment, the target images in the training sample set may be images of any size, scaled to a uniform size (e.g., 1000 x 600) before entering the convolutional layers. In one embodiment, the length and width of the feature map extracted by the convolutional layers are reduced by a factor of 16 relative to the input image, and the depth of the feature map is 256.
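A quick arithmetic check of these settings, assuming the feature map size is the input size divided exactly by the stride of 16:

```python
# Feature map geometry under the settings above (a sketch; assumes exact
# integer division of the 1000 x 600 input by the stride of 16).
in_w, in_h = 1000, 600      # input image after scaling to the uniform size
stride, depth = 16, 256     # length/width reduction factor and feature depth
feat_w, feat_h = in_w // stride, in_h // stride
print(feat_w, feat_h, depth)  # 62 37 256 -> the feature map is 62 x 37 x 256
```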
In a particular embodiment, training the acceleration region convolutional neural network model using the training sample set may include the following four steps (a runnable sketch follows the list):
(1) Initializing the region suggestion network using an Imagenet model, and training the region suggestion network using the training sample set.
(2) Generating candidate regions of each target image in the training sample set by using the region suggestion network trained in step (1), and training the fast region convolutional neural network by using the candidate regions. At this point, the region suggestion network and the fast region convolutional neural network do not yet share convolutional layers.
(3) Initializing the region suggestion network by using the fast region convolutional neural network trained in step (2), and training the region suggestion network by using the training sample set.
(4) Initializing the fast region convolutional neural network by using the region suggestion network trained in step (3), keeping the convolutional layers fixed, and training the fast region convolutional neural network by using the training sample set. At this time, the region suggestion network and the fast region convolutional neural network share the same convolutional layers and form a unified network.
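The following runnable skeleton sketches this four-step alternating schedule; the Net and train stand-ins are hypothetical placeholders for the real networks and training routines, not the patent's implementation.

```python
# A skeleton of the 4-step alternating training described above. Net and
# train() are hypothetical stand-ins; only the schedule itself follows
# the text.

class Net:
    def __init__(self, name, source="Imagenet"):
        self.name = name
        self.init_source = source  # weights initialized from this model

def train(net, data, freeze_shared=False):
    print(f"train {net.name} (init from {net.init_source}), "
          f"shared conv layers frozen: {freeze_shared}")

train_set = ["target images labeled with positions and angle types"]

rpn = Net("RPN")                             # (1) init RPN from Imagenet
train(rpn, train_set)                        #     and train it
proposals = f"candidate regions from {rpn.name}"
fast_rcnn = Net("Fast R-CNN")                # (2) train Fast R-CNN on the
train(fast_rcnn, proposals)                  #     proposals (no sharing yet)
rpn = Net("RPN", source="Fast R-CNN")        # (3) re-init RPN from Fast R-CNN
train(rpn, train_set)                        #     and retrain it
fast_rcnn = Net("Fast R-CNN", source="RPN")  # (4) re-init Fast R-CNN from the
train(fast_rcnn, train_set,                  #     RPN and train with the shared
      freeze_shared=True)                    #     conv layers kept fixed
```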
Fig. 2 is a schematic diagram of the region suggestion network.
A feature map of the image is obtained after the image passes through the shared convolutional layers. A sliding window of a preset size (e.g., 3 x 3) is slid over the feature map with a preset stride (e.g., a stride of 1), and each position of the sliding window corresponds to a center point. When the sliding window slides to a position, anchor boxes of preset scales (e.g., 3 scales: 128, 256, 512) and preset aspect ratios (e.g., 3 aspect ratios: 1:1, 1:2, 2:1) are applied at the center point of that position to obtain a preset number (e.g., 9) of candidate regions. Each sliding window is mapped onto a low-dimensional feature vector (e.g., 256-d or 512-d) by a convolutional layer connected to the shared convolutional layers. The feature vector is output to three sibling fully connected layers: a target classification layer, an angle classification layer, and a boundary regression layer. The target classification layer outputs a target classification score of the candidate region, indicating whether the candidate region is a target (i.e., foreground) or background. Whether a candidate region belongs to the foreground or the background is determined by its overlap with the labeled target region (i.e., the region determined by the labeled target position): if the overlap is greater than a certain threshold, the candidate region is labeled as foreground; if the overlap is less than the threshold, the candidate region is labeled as background. The angle classification layer outputs an angle classification score of the candidate region, indicating the target angle type of the candidate region. The boundary regression layer outputs the refined position of the candidate region, which is used to refine the boundary of the candidate region.
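As an illustration of the anchor mechanism just described, the sketch below generates the 9 candidate boxes at one sliding-window position; the function name and the area-preserving width/height formula are assumptions consistent with the stated scales and aspect ratios.

```python
# A sketch of anchor generation at one sliding-window position, assuming
# the 3 scales (128, 256, 512) and 3 aspect ratios (1:1, 1:2, 2:1) above.
import numpy as np

def anchors_at(cx, cy, scales=(128, 256, 512), ratios=(1.0, 0.5, 2.0)):
    """Return the 9 anchor boxes [x1, y1, x2, y2] centered on (cx, cy)."""
    boxes = []
    for s in scales:
        for r in ratios:
            w = s * np.sqrt(r)   # keep the anchor area s*s while setting
            h = s / np.sqrt(r)   # the width:height ratio to r
            boxes.append([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2])
    return np.array(boxes)

# On a stride-16 feature map, the window at feature cell (row i, col j)
# maps back to image coordinates roughly (16 * j + 8, 16 * i + 8).
print(anchors_at(8 + 16 * 31, 8 + 16 * 18).shape)  # (9, 4)
```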
The region suggestion network selects a large number of candidate regions; several candidate regions with the highest scores can be screened out according to the target classification scores of the candidate regions and input into the fast region convolutional neural network, which increases the training and detection speed.
To train the region suggestion network, each candidate region is assigned a label, which is either positive or negative. A positive label is assigned to two types of candidate regions: (1) the candidate region with the highest IoU (Intersection over Union) overlap with the bounding box of a real target (Ground Truth, GT); (2) candidate regions whose IoU overlap with any GT bounding box is greater than 0.7. For one GT bounding box, positive labels may be assigned to multiple candidate regions. Negative labels are assigned to candidate regions whose IoU ratio with all GT bounding boxes is below 0.3. Candidate regions that are neither positive nor negative have no effect on the training objective.
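A compact sketch of this labeling rule follows; the function names are hypothetical, while the 0.7 and 0.3 thresholds and the highest-IoU rule come from the description above.

```python
# A sketch of the labeling rule above: IoU > 0.7 with any GT box ->
# positive (1), IoU < 0.3 with all GT boxes -> negative (0), otherwise
# ignored (-1). The highest-IoU candidate for each GT box is also positive.
import numpy as np

def iou(a, b):
    """IoU of two boxes given as [x1, y1, x2, y2]."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

def assign_labels(candidates, gt_boxes, hi=0.7, lo=0.3):
    m = np.array([[iou(c, g) for g in gt_boxes] for c in candidates])
    labels = np.full(len(candidates), -1)  # -1: no effect on training
    labels[m.max(axis=1) < lo] = 0         # negative candidate regions
    labels[m.max(axis=1) > hi] = 1         # positive candidate regions
    labels[m.argmax(axis=0)] = 1           # best candidate per GT box
    return labels
```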
The region suggestion network is trained by using a back propagation algorithm, and its network parameters are adjusted in the training process to minimize a loss function. The loss function indicates the difference between the prediction confidence of the candidate regions predicted by the region suggestion network and the true confidence. In the present embodiment, the loss function includes three parts, namely target classification loss, angle classification loss and regression loss.
The loss function of an image can be defined as:

$$L(\{p_i\},\{a_i\},\{t_i\}) = \frac{1}{N_{cls}}\sum_i L_{cls}(p_i, p_i^*) + \frac{1}{N_{cls}}\sum_i L_{ang}(a_i, a_i^*) + \lambda \frac{1}{N_{reg}}\sum_i p_i^* L_{reg}(t_i, t_i^*)$$

where $i$ is the index of a candidate region in a training batch (mini-batch).

$L_{cls}(p_i, p_i^*)$ is the target classification loss of the candidate region. $N_{cls}$ is the size of the training batch, for example 256. $p_i$ is the predicted probability that the $i$-th candidate region is a target. $p_i^*$ is the GT label: if the candidate region is positive (i.e., the assigned label is a positive label, and the region is called a positive candidate region), $p_i^*$ is 1; if the candidate region is negative (i.e., the assigned label is a negative label, and the region is called a negative candidate region), $p_i^*$ is 0. $L_{cls}$ can be calculated as $L_{cls}(p_i, p_i^*) = -\log\left[p_i^* p_i + (1 - p_i^*)(1 - p_i)\right]$.

$L_{ang}(a_i, a_i^*)$ is the angle classification loss of the candidate region, where $a_i$ is the predicted angle type and $a_i^*$ is the labeled angle type; its calculation can refer to that of $L_{cls}$.

$L_{reg}(t_i, t_i^*)$ is the regression loss of the candidate region. $\lambda$ is the balance weight and can be taken as 10. $N_{reg}$ is the number of candidate regions. $L_{reg}$ can be calculated as $L_{reg}(t_i, t_i^*) = R(t_i - t_i^*)$, where $t_i$ is a coordinate vector, i.e. $t_i = (t_x, t_y, t_w, t_h)$, the 4 parameterized coordinates representing the candidate region (e.g., the coordinates of the upper-left corner of the candidate region together with its width and height), and $t_i^*$ is the coordinate vector of the GT bounding box corresponding to a positive candidate region, i.e. $t_i^* = (t_x^*, t_y^*, t_w^*, t_h^*)$ (e.g., the coordinates of the upper-left corner of the real target region together with its width and height). $R$ is the robust loss function ($\mathrm{smooth}_{L_1}$), defined as:

$$\mathrm{smooth}_{L_1}(x) = \begin{cases} 0.5x^2, & \text{if } |x| < 1 \\ |x| - 0.5, & \text{otherwise} \end{cases}$$
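To make the composition of this loss concrete, the sketch below gives one possible PyTorch reading of it. The function name and tensor layout are assumptions, and the angle term is assumed to be evaluated only for positive regions (background regions carry no angle label); this is an illustration, not the patent's reference implementation.

```python
# An illustrative reading of the three-part loss above (a sketch under the
# assumptions stated in the text, not the patent's implementation).
import torch
import torch.nn.functional as F

def detection_loss(p, p_star, a_logits, a_star, t, t_star, lam=10.0):
    """p: (N,) predicted target probabilities; p_star: (N,) 0/1 GT labels;
    a_logits: (N, 3) angle-class scores; a_star: (N,) angle labels in {0,1,2};
    t, t_star: (N, 4) predicted / GT offsets (tx, ty, tw, th)."""
    n_cls = p.numel()                  # N_cls: size of the training batch
    l_cls = F.binary_cross_entropy(p, p_star.float(), reduction="sum") / n_cls
    pos = p_star == 1
    # angle classification, assumed to be counted for positive regions only
    l_ang = F.cross_entropy(a_logits[pos], a_star[pos], reduction="sum") / n_cls
    n_reg = p.numel()                  # N_reg: number of candidate regions
    # the p_i* factor keeps only positive regions in the regression term;
    # smooth_l1_loss is the robust R loss defined above
    l_reg = F.smooth_l1_loss(t[pos], t_star[pos], reduction="sum") / n_reg
    return l_cls + l_ang + lam * l_reg  # lambda = 10 balance weight
```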
This embodiment considers the problem of detection rate reduction caused by different shooting angles: a loss function including the angle classification loss is used in the training of the acceleration region convolutional neural network model, the angle classification loss of the candidate region is calculated according to the predicted target angle type, and the target detection rate is improved.
The above describes the training method of the region suggestion network. The training method of the fast region convolutional neural network may refer to the training method of the region suggestion network and is not repeated here.
In this embodiment, a Hard Negative Mining (HNM) method is added to the training of the fast region convolutional neural network. For negative samples that are wrongly classified as positive samples by the fast region convolutional neural network (namely, hard samples), their information is recorded; in the next training iteration, these negative samples are input into the training sample set again and their loss weights are increased, strengthening their influence on the classifier. In this way the classifier keeps learning from the negative samples that are hard to classify, the features it learns become more discriminative, and the covered sample distribution becomes more diverse.
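A toy sketch of this idea is given below; the keep_frac and boost parameters are assumptions, since the patent does not specify how many hard negatives are re-injected or by how much their loss weight is increased.

```python
# A toy sketch of hard negative mining: negatives that the current model
# scores most confidently as targets get a larger loss weight next round.
import numpy as np

def mine_hard_negatives(scores, labels, keep_frac=0.25, boost=2.0):
    """scores: predicted P(target); labels: 0 = negative, 1 = positive.
    Returns per-sample loss weights with the hardest negatives up-weighted."""
    weights = np.ones_like(scores)
    neg = np.where(labels == 0)[0]
    n_hard = max(1, int(len(neg) * keep_frac))
    # hardest negatives = negatives with the highest target scores
    hard = neg[np.argsort(scores[neg])[::-1][:n_hard]]
    weights[hard] *= boost  # strengthen their influence on the classifier
    return weights

w = mine_hard_negatives(np.array([0.9, 0.2, 0.8, 0.1]), np.array([1, 0, 0, 0]))
print(w)  # the negative scored 0.8 is up-weighted: [1. 1. 2. 1.]
```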
103: and acquiring an image to be detected.
The image to be detected is an image including a preset target (e.g., a ship). The preset target is a detection object in the image to be detected. For example, when ship detection is performed on an image to be detected, the preset target is a ship in the image to be detected.
The image to be detected may be an image received from an external device, for example an image of a ship taken by a camera near the quay, from which the image of the ship is received.
Alternatively, the image to be detected may be an image taken by the computer device, for example an image of a ship taken by the computer device.
Alternatively, the image to be detected may also be an image read from a memory of the computer device, for example an image of a ship read from a memory of the computer device.
104: and detecting the image to be detected by using the trained acceleration region convolutional neural network model to obtain a target region of the image to be detected and a target angle type of the target region.
Specifically, the convolutional layers shared by the region suggestion network and the fast region convolutional neural network extract a feature map of the image to be detected. The region suggestion network obtains candidate regions in the image to be detected and the target angle types of the candidate regions according to the feature map. The fast region convolutional neural network screens and adjusts the candidate regions according to the feature map to obtain the target region of the image to be detected and the target angle type of the target region.
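As a final illustration, the detection flow of this step might look as follows; shared_conv, rpn, fast_rcnn and the returned region fields are hypothetical handles onto the trained model, not the patent's API.

```python
# A sketch of the detection flow in step 104 (hypothetical module handles).
ANGLE_NAMES = {0: "front", 1: "side", 2: "back"}  # hypothetical codes

def detect(model, image):
    feat = model.shared_conv(image)        # shared conv layers: feature map
    candidates = model.rpn(feat)           # RPN: candidate regions + angles
    regions = model.fast_rcnn(feat, candidates)  # screen and refine regions
    return [(r["box"], ANGLE_NAMES[r["angle_type"]]) for r in regions]
```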
The target detection method of the first embodiment obtains a training sample set, where the training sample set includes a plurality of target images labeled with target positions and target angle types; trains an acceleration region convolutional neural network model by using the training sample set to obtain a trained acceleration region convolutional neural network model, wherein the acceleration region convolutional neural network model comprises a region suggestion network and a fast region convolutional neural network, the region suggestion network and the fast region convolutional neural network share a convolutional layer, the convolutional layer extracts a feature map of each target image in the training sample set, the region suggestion network obtains a candidate region in each target image and a target angle type of the candidate region according to the feature map, and the fast region convolutional neural network screens and adjusts the candidate region according to the feature map to obtain the target region of each target image and the target angle type of the target region; acquires an image to be detected; and performs target detection on the image to be detected by utilizing the trained acceleration region convolutional neural network model to obtain a target region of the image to be detected and a target angle type of the target region.
The existing target detection based on the convolutional neural network uses a selective search algorithm to generate candidate regions, which is time-consuming, and its region extraction and target detection are separated. The target detection method of the first embodiment introduces the region suggestion network into the acceleration region convolutional neural network model and extracts candidate regions with a deep convolutional neural network. After network training, by sharing convolutional network parameters, the feature map obtained by passing the image through the convolutional layers can be applied to region extraction and target detection simultaneously; that is, the calculation result of the convolutional network is shared, which greatly increases the region extraction speed, accelerates the whole detection process, and realizes an end-to-end detection scheme. In addition, the target detection method of the first embodiment considers the problem of detection rate reduction caused by different shooting angles and trains the acceleration region convolutional neural network model with target images marked with target angle types, which improves the target detection rate. Therefore, the target detection method of the first embodiment can achieve fast target detection with a high detection rate.
Example two
Fig. 3 is a structural diagram of an object detection apparatus according to a second embodiment of the present invention. As shown in fig. 3, the object detection device 10 may include: a first acquisition unit 301, a training unit 302, a second acquisition unit 303, and a detection unit 304.
a first obtaining unit 301, configured to obtain a training sample set.
The training sample set includes a plurality of target images labeled with target positions and target angle types. The target image is an image including a preset target (e.g., a ship, a vehicle, etc.) and may include one or more preset targets. The target position represents the position of a preset target in the target image. The target angle type represents the shooting angle (e.g., front, back, side) of a preset target.
In one embodiment, the training sample set includes about 10000 target images. The target location may be labeled [ x, y, w, h ], where x, y represents the top left coordinate of the target region, w represents the width of the target region, and h represents the height of the target region. The target angle types include a front angle type, a side angle type, and a back angle type. For example, when the target detection method is used for detecting a ship, if the target image is a front image of the ship, the marked target angle type is a front angle type; if the target image is a side image of the ship, the marked target angle type is a side angle type; and if the target image is the back image of the ship, the marked target angle type is the back angle type.
A training unit 302, configured to train an acceleration region convolutional neural network model (Faster Region-based Convolutional Neural Network, Faster R-CNN) by using the training sample set to obtain a trained acceleration region convolutional neural network model.
The acceleration region convolutional neural network model comprises a region suggestion network (Region Proposal Network, RPN) and a fast region convolutional neural network (Fast R-CNN). The region suggestion network and the fast region convolutional neural network need to be trained alternately.
The region suggestion network and the fast region convolutional neural network share convolutional layers, and the shared convolutional layers are used for extracting a feature map of an image. The region suggestion network generates candidate regions of the image and the target angle types of the candidate regions according to the feature map, and inputs the generated candidate regions and their target angle types into the fast region convolutional neural network. The fast region convolutional neural network screens and adjusts the candidate regions according to the feature map to obtain the target region of the image and the target angle type of the target region.
Specifically, during training, the convolutional layers extract a feature map of each target image in the training sample set, the region suggestion network obtains the candidate regions in each target image and the target angle types of the candidate regions according to the feature map, and the fast region convolutional neural network screens and adjusts the candidate regions according to the feature map to obtain the target region of each target image and the target angle type of the target region.
In a preferred embodiment, the acceleration region convolutional neural network model adopts the ZF framework, and the region suggestion network and the fast region convolutional neural network share 5 convolutional layers.
In one embodiment, the target images in the training sample set may be images of any size, scaled to a uniform size (e.g., 1000 x 600) before entering the convolutional layers. In one embodiment, the length and width of the feature map extracted by the convolutional layers are reduced by a factor of 16 relative to the input image, and the depth of the feature map is 256.
In a particular embodiment, training the acceleration region convolutional neural network model using the training sample set may include:
(1) Initializing the region suggestion network using an Imagenet model, and training the region suggestion network using the training sample set.
(2) Generating candidate regions of each target image in the training sample set by using the region suggestion network trained in step (1), and training the fast region convolutional neural network by using the candidate regions. At this point, the region suggestion network and the fast region convolutional neural network do not yet share convolutional layers.
(3) Initializing the region suggestion network by using the fast region convolutional neural network trained in step (2), and training the region suggestion network by using the training sample set.
(4) Initializing the fast region convolutional neural network by using the region suggestion network trained in step (3), keeping the convolutional layers fixed, and training the fast region convolutional neural network by using the training sample set. At this time, the region suggestion network and the fast region convolutional neural network share the same convolutional layers and form a unified network.
Fig. 2 is a schematic diagram of the region suggestion network.
A feature map of the image is obtained after the image passes through the shared convolutional layers. A sliding window of a preset size (e.g., 3 x 3) is slid over the feature map with a preset stride (e.g., a stride of 1), and each position of the sliding window corresponds to a center point. When the sliding window slides to a position, anchor boxes of preset scales (e.g., 3 scales: 128, 256, 512) and preset aspect ratios (e.g., 3 aspect ratios: 1:1, 1:2, 2:1) are applied at the center point of that position to obtain a preset number (e.g., 9) of candidate regions. Each sliding window is mapped onto a low-dimensional feature vector (e.g., 256-d or 512-d) by a convolutional layer connected to the shared convolutional layers. The feature vector is output to three sibling fully connected layers: a target classification layer, an angle classification layer, and a boundary regression layer. The target classification layer outputs a target classification score of the candidate region, indicating whether the candidate region is a target (i.e., foreground) or background. Whether a candidate region belongs to the foreground or the background is determined by its overlap with the labeled target region (i.e., the region determined by the labeled target position): if the overlap is greater than a certain threshold, the candidate region is labeled as foreground; if the overlap is less than the threshold, the candidate region is labeled as background. The angle classification layer outputs an angle classification score of the candidate region, indicating the target angle type of the candidate region. The boundary regression layer outputs the refined position of the candidate region, which is used to refine the boundary of the candidate region.
The region suggestion network selects a large number of candidate regions; several candidate regions with the highest scores can be screened out according to the target classification scores of the candidate regions and input into the fast region convolutional neural network, which increases the training and detection speed.
To train the region suggestion network, each candidate region is assigned a label, which is either positive or negative. A positive label is assigned to two types of candidate regions: (1) the candidate region with the highest IoU (Intersection over Union) overlap with the bounding box of a real target (Ground Truth, GT); (2) candidate regions whose IoU overlap with any GT bounding box is greater than 0.7. For one GT bounding box, positive labels may be assigned to multiple candidate regions. Negative labels are assigned to candidate regions whose IoU ratio with all GT bounding boxes is below 0.3. Candidate regions that are neither positive nor negative have no effect on the training objective.
The region suggestion network is trained by using a back propagation algorithm, and its network parameters are adjusted in the training process to minimize a loss function. The loss function indicates the difference between the prediction confidence of the candidate regions predicted by the region suggestion network and the true confidence. In the present embodiment, the loss function includes three parts, namely target classification loss, angle classification loss and regression loss.
The loss function of an image can be defined as:

$$L(\{p_i\},\{a_i\},\{t_i\}) = \frac{1}{N_{cls}}\sum_i L_{cls}(p_i, p_i^*) + \frac{1}{N_{cls}}\sum_i L_{ang}(a_i, a_i^*) + \lambda \frac{1}{N_{reg}}\sum_i p_i^* L_{reg}(t_i, t_i^*)$$

where $i$ is the index of a candidate region in a training batch (mini-batch).

$L_{cls}(p_i, p_i^*)$ is the target classification loss of the candidate region. $N_{cls}$ is the size of the training batch, for example 256. $p_i$ is the predicted probability that the $i$-th candidate region is a target. $p_i^*$ is the GT label: if the candidate region is positive (i.e., the assigned label is a positive label, and the region is called a positive candidate region), $p_i^*$ is 1; if the candidate region is negative (i.e., the assigned label is a negative label, and the region is called a negative candidate region), $p_i^*$ is 0. $L_{cls}$ can be calculated as $L_{cls}(p_i, p_i^*) = -\log\left[p_i^* p_i + (1 - p_i^*)(1 - p_i)\right]$.

$L_{ang}(a_i, a_i^*)$ is the angle classification loss of the candidate region, where $a_i$ is the predicted angle type and $a_i^*$ is the labeled angle type; its calculation can refer to that of $L_{cls}$.

$L_{reg}(t_i, t_i^*)$ is the regression loss of the candidate region. $\lambda$ is the balance weight and can be taken as 10. $N_{reg}$ is the number of candidate regions. $L_{reg}$ can be calculated as $L_{reg}(t_i, t_i^*) = R(t_i - t_i^*)$, where $t_i$ is a coordinate vector, i.e. $t_i = (t_x, t_y, t_w, t_h)$, the 4 parameterized coordinates representing the candidate region (e.g., the coordinates of the upper-left corner of the candidate region together with its width and height), and $t_i^*$ is the coordinate vector of the GT bounding box corresponding to a positive candidate region, i.e. $t_i^* = (t_x^*, t_y^*, t_w^*, t_h^*)$ (e.g., the coordinates of the upper-left corner of the real target region together with its width and height). $R$ is the robust loss function ($\mathrm{smooth}_{L_1}$), defined as:

$$\mathrm{smooth}_{L_1}(x) = \begin{cases} 0.5x^2, & \text{if } |x| < 1 \\ |x| - 0.5, & \text{otherwise} \end{cases}$$
This embodiment considers the problem of detection rate reduction caused by different shooting angles: a loss function including the angle classification loss is used in the training of the acceleration region convolutional neural network model, the angle classification loss of the candidate region is calculated according to the predicted target angle type, and the target detection rate is improved.
The above describes the training method of the region suggestion network. The training method of the fast region convolutional neural network may refer to the training method of the region suggestion network and is not repeated here.
In this embodiment, a Hard Negative Mining (HNM) method is added to the training of the fast region convolutional neural network. For negative samples that are wrongly classified as positive samples by the fast region convolutional neural network (namely, hard samples), their information is recorded; in the next training iteration, these negative samples are input into the training sample set again and their loss weights are increased, strengthening their influence on the classifier. In this way the classifier keeps learning from the negative samples that are hard to classify, the features it learns become more discriminative, and the covered sample distribution becomes more diverse.
A second obtaining unit 303, configured to obtain an image to be detected.
The image to be detected is an image including a preset target (e.g., a ship). The preset target is the detection object in the image to be detected. For example, when ship detection is performed on an image to be detected, the preset target is a ship in the image to be detected.
The image to be detected may be an image received from an external device, for example an image of a ship taken by a camera near the quay, from which the image of the ship is received.
Alternatively, the image to be detected may be an image taken by the computer device, for example an image of a ship taken by the computer device.
Alternatively, the image to be detected may also be an image read from a memory of the computer device, for example an image of a ship read from a memory of the computer device.
The detection unit 304 is configured to detect the image to be detected by using the trained acceleration region convolutional neural network model, so as to obtain a target region of the image to be detected and a target angle type of the target region.
Specifically, the convolutional layers shared by the region suggestion network and the fast region convolutional neural network extract a feature map of the image to be detected. The region suggestion network obtains candidate regions in the image to be detected and the target angle types of the candidate regions according to the feature map. The fast region convolutional neural network screens and adjusts the candidate regions according to the feature map to obtain the target region of the image to be detected and the target angle type of the target region.
The object detection device of the second embodiment acquires a training sample set, wherein the training sample set comprises a plurality of target images marked with target positions and target angle types; trains an acceleration region convolutional neural network model by using the training sample set to obtain a trained acceleration region convolutional neural network model, wherein the acceleration region convolutional neural network model comprises a region suggestion network and a fast region convolutional neural network, the region suggestion network and the fast region convolutional neural network share a convolutional layer, the convolutional layer extracts a feature map of each target image in the training sample set, the region suggestion network obtains a candidate region in each target image and a target angle type of the candidate region according to the feature map, and the fast region convolutional neural network screens and adjusts the candidate region according to the feature map to obtain the target region of each target image and the target angle type of the target region; acquires an image to be detected; and performs target detection on the image to be detected by utilizing the trained acceleration region convolutional neural network model to obtain a target region of the image to be detected and a target angle type of the target region.
The existing target detection based on the convolutional neural network uses a selective search algorithm to generate candidate regions, which is time-consuming, and its region extraction and target detection are separated. The second embodiment introduces the region suggestion network into the acceleration region convolutional neural network model and extracts candidate regions with a deep convolutional neural network. After network training, by sharing convolutional network parameters, the feature map obtained by passing the image through the convolutional layers can be applied to region extraction and target detection simultaneously; that is, the calculation result of the convolutional network is shared, which greatly increases the region extraction speed, accelerates the whole detection process, and realizes an end-to-end detection scheme. In addition, the second embodiment considers the problem of detection rate reduction caused by different shooting angles and trains the acceleration region convolutional neural network model with target images marked with target angle types, which improves the target detection rate. Therefore, the second embodiment can realize fast target detection with a high detection rate.
EXAMPLE III
Fig. 4 is a schematic diagram of a computer device according to a third embodiment of the present invention. The computer device 1 comprises a memory 20, a processor 30 and a computer program 40, such as an object detection program, stored in the memory 20 and executable on the processor 30. The processor 30, when executing the computer program 40, implements the steps of the above object detection method embodiments, such as steps 101 to 104 shown in Fig. 1. Alternatively, the processor 30, when executing the computer program 40, implements the functions of the modules/units in the above device embodiments, such as the units 301 to 304 in Fig. 3.
Illustratively, the computer program 40 may be partitioned into one or more modules/units that are stored in the memory 20 and executed by the processor 30 to implement the present invention. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, which are used to describe the execution process of the computer program 40 in the computer apparatus 1. For example, the computer program 40 may be divided into a first obtaining unit 301, a training unit 302, a second obtaining unit 303, and a detecting unit 304 in fig. 3, and the specific functions of each unit are shown in embodiment two.
The computer device 1 may be a desktop computer, a notebook, a palm computer, a cloud server, or other computing device. It will be understood by those skilled in the art that Fig. 4 is only an example of the computer device 1 and does not constitute a limitation on the computer device 1; the computer device may include more or fewer components than shown, combine some components, or have different components. For example, the computer device 1 may further include input and output devices, network access devices, buses, and the like.
The processor 30 may be a Central Processing Unit (CPU), another general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. A general purpose processor may be a microprocessor, or the processor 30 may be any conventional processor; the processor 30 is the control center of the computer device 1 and connects the various parts of the whole computer device 1 with various interfaces and lines.
The memory 20 may be used to store the computer program 40 and/or the modules/units, and the processor 30 implements various functions of the computer device 1 by running or executing the computer program and/or modules/units stored in the memory 20 and calling data stored in the memory 20. The memory 20 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system and an application program required by at least one function (such as a sound playing function, an image playing function, etc.), and the data storage area may store data (such as audio data, a phonebook, etc.) created according to the use of the computer device 1. In addition, the memory 20 may include high-speed random access memory and may also include non-volatile memory, such as a hard disk, an internal memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a Flash Card, at least one magnetic disk storage device, a flash memory device, or other non-volatile solid-state storage device.
The modules/units integrated in the computer device 1 may be stored in a computer-readable storage medium if they are implemented in the form of software functional units and sold or used as separate products. Based on this understanding, all or part of the flow of the methods of the above embodiments may also be implemented by a computer program, which may be stored in a computer-readable storage medium; when the computer program is executed by a processor, the steps of the method embodiments can be implemented. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, etc. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like. It should be noted that the contents contained in the computer-readable medium may be increased or decreased as appropriate according to the requirements of legislation and patent practice in a jurisdiction; for example, in some jurisdictions, according to legislation and patent practice, computer-readable media do not include electrical carrier signals and telecommunications signals.
In the embodiments provided in the present invention, it should be understood that the disclosed computer apparatus and method can be implemented in other ways. For example, the above-described embodiments of the computer apparatus are merely illustrative, and for example, the division of the units is only one logical function division, and there may be other divisions when the actual implementation is performed.
In addition, functional units in the embodiments of the present invention may be integrated into the same processing unit, or each unit may exist alone physically, or two or more units are integrated into the same unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional module.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned. Furthermore, it is obvious that the word "comprising" does not exclude other elements or steps, and the singular does not exclude the plural. The units or computer means recited in the computer means claims may also be implemented by the same unit or computer means, either in software or in hardware. The terms first, second, etc. are used to denote names, but not any particular order.
finally, it should be noted that the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting, and although the present invention is described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention.

Claims (10)

training an acceleration region convolutional neural network model by using the training sample set to obtain a trained acceleration region convolutional neural network model, wherein the acceleration region convolutional neural network model comprises a region suggestion network and a fast region convolutional neural network, the region suggestion network and the fast region convolutional neural network share a convolutional layer, the convolutional layer extracts a feature map of each target image in the training sample set, the region suggestion network obtains a candidate region in each target image and a target angle type of the candidate region according to the feature map, and the fast region convolutional neural network screens and adjusts the candidate region according to the feature map to obtain the target region of each target image and the target angle type of the target region;
A training unit, configured to train an acceleration region convolutional neural network model by using the training sample set to obtain a trained acceleration region convolutional neural network model, where the acceleration region convolutional neural network model includes a region suggestion network and a fast region convolutional neural network, the region suggestion network and the fast region convolutional neural network share a convolutional layer, the convolutional layer extracts a feature map of each target image in the training sample set, the region suggestion network obtains a candidate region in each target image and a target angle type of the candidate region according to the feature map, and the fast region convolutional neural network screens and adjusts the candidate region according to the feature map to obtain a target region of each target image and a target angle type of the target region;
CN201711484723.3A | Priority date 2017-12-29 | Filing date 2017-12-29 | Object detection method and device, computer device and computer readable storage medium | Active | CN108121986B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN201711484723.3A | 2017-12-29 | 2017-12-29 | Object detection method and device, computer device and computer readable storage medium

Publications (2)

Publication Number | Publication Date
CN108121986A (en) | 2018-06-05
CN108121986B (en) | 2019-12-17

Family

ID=62230688

Family Applications (1)

Application Number | Status | Publication | Priority Date | Filing Date | Title
CN201711484723.3A | Active | CN108121986B (en) | 2017-12-29 | 2017-12-29 | Object detection method and device, computer device and computer readable storage medium

Country Status (1)

Country | Link
CN (1) | CN108121986B (en)

Families Citing this family (34)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN109101966B (en)* | 2018-06-08 | 2022-03-08 | 中国科学院宁波材料技术与工程研究所 | Workpiece recognition, positioning and pose estimation system and method based on deep learning
CN110619255B (en)* | 2018-06-19 | 2022-08-26 | 杭州海康威视数字技术股份有限公司 | Target detection method and device
CN108921099A (en)* | 2018-07-03 | 2018-11-30 | 常州大学 | Moving ship object detection method in a kind of navigation channel based on deep learning
CN109063740A (en)* | 2018-07-05 | 2018-12-21 | 高镜尧 | The detection model of ultrasonic image common-denominator target constructs and detection method, device
CN110795976B (en) | 2018-08-03 | 2023-05-05 | 华为云计算技术有限公司 | Method, device and equipment for training object detection model
WO2020031380A1 (en)* | 2018-08-10 | 2020-02-13 | オリンパス株式会社 | Image processing method and image processing device
CN110837760B (en)* | 2018-08-17 | 2022-10-14 | 北京四维图新科技股份有限公司 | Target detection method, training method and apparatus for target detection
CN109376619B (en)* | 2018-09-30 | 2021-10-15 | 中国人民解放军陆军军医大学 | A kind of cell detection method
CN109359683B (en)* | 2018-10-15 | 2021-07-27 | 百度在线网络技术(北京)有限公司 | Target detection method, device, terminal and computer-readable storage medium
CN111144398A (en)* | 2018-11-02 | 2020-05-12 | 银河水滴科技(北京)有限公司 | Target detection method, target detection device, computer equipment and storage medium
CN109583445B (en)* | 2018-11-26 | 2024-08-02 | 平安科技(深圳)有限公司 | Text image correction processing method, device, equipment and storage medium
CN109583396A (en)* | 2018-12-05 | 2019-04-05 | 广东亿迅科技有限公司 | A kind of region prevention method, system and terminal based on CNN two stages human testing
CN111310775B (en)* | 2018-12-11 | 2023-08-25 | TCL科技集团股份有限公司 | Data training method, device, terminal equipment and computer readable storage medium
CN109784385A (en)* | 2018-12-29 | 2019-05-21 | 广州海昇计算机科技有限公司 | A kind of commodity automatic identifying method, system, device and storage medium
CN109934088A (en)* | 2019-01-10 | 2019-06-25 | 海南大学 | A deep learning-based method for the identification of ships on the sea surface
CN109886997B (en)* | 2019-01-23 | 2023-07-11 | 平安科技(深圳)有限公司 | Identification frame determining method and device based on target detection and terminal equipment
CN109886998B (en)* | 2019-01-23 | 2024-09-06 | 平安科技(深圳)有限公司 | Multi-target tracking method, device, computer device and computer storage medium
CN110288082B (en)* | 2019-06-05 | 2022-04-05 | 北京字节跳动网络技术有限公司 | Convolutional neural network model training method and device and computer readable storage medium
CN110428357A (en)* | 2019-08-09 | 2019-11-08 | 厦门美图之家科技有限公司 | The detection method of watermark, device, electronic equipment and storage medium in image
CN110443244B (en)* | 2019-08-12 | 2023-12-05 | 深圳市捷顺科技实业股份有限公司 | Graphics processing method and related device
CN110599456B (en)* | 2019-08-13 | 2023-05-30 | 杭州智团信息技术有限公司 | Method for extracting specific region of medical image
CN110717905B (en)* | 2019-09-30 | 2022-07-05 | 上海联影智能医疗科技有限公司 | Brain image detection method, computer device, and storage medium
CN110807431A (en)* | 2019-11-06 | 2020-02-18 | 上海眼控科技股份有限公司 | Object positioning method and device, electronic equipment and storage medium
CN110991531A (en)* | 2019-12-02 | 2020-04-10 | 中电科特种飞机系统工程有限公司 | Training sample library construction method, device and medium based on air-to-ground small and slow target
CN113496223B (en)* | 2020-03-19 | 2024-10-18 | 顺丰科技有限公司 | Method and device for establishing text region detection model
CN111462094A (en)* | 2020-04-03 | 2020-07-28 | 联觉(深圳)科技有限公司 | PCBA component detection method and device and computer readable storage medium
CN113935389B (en)* | 2020-06-29 | 2025-09-09 | 华为云计算技术有限公司 | Method, device, computing equipment and storage medium for data annotation
CN112256906A (en)* | 2020-10-23 | 2021-01-22 | 安徽启新明智科技有限公司 | Method, device and storage medium for marking annotation on display screen
CN112001375B (en)* | 2020-10-29 | 2021-01-05 | 成都睿沿科技有限公司 | Flame detection method and device, electronic equipment and storage medium
CN112464785B (en)* | 2020-11-25 | 2024-08-09 | 浙江大华技术股份有限公司 | Target detection method, device, computer equipment and storage medium
CN113239975B (en)* | 2021-04-21 | 2022-12-20 | 国网甘肃省电力公司白银供电公司 | A neural network-based object detection method and device
CN112949614B (en)* | 2021-04-29 | 2021-09-10 | 成都市威虎科技有限公司 | Face detection method and device for automatically allocating candidate areas and electronic equipment
CN113333321A (en)* | 2021-05-11 | 2021-09-03 | 北京若贝特智能机器人科技有限公司 | Automatic identification and classification conveying method, system and device and storage medium
CN115439394A (en)* | 2021-06-03 | 2022-12-06 | 中国移动通信集团四川有限公司 | A light transmission box detection method, device and electronic equipment

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN101271515B (en)* | 2007-03-21 | 2014-03-19 | 株式会社理光 | Image detection device capable of recognizing multi-angle objective
US9202144B2 (en)* | 2013-10-30 | 2015-12-01 | NEC Laboratories America, Inc. | Regionlets with shift invariant neural patterns for object detection
CN104299012B (en)* | 2014-10-28 | 2017-06-30 | 银河水滴科技(北京)有限公司 | A kind of gait recognition method based on deep learning
CN106250812B (en)* | 2016-07-15 | 2019-08-20 | 汤一平 | A kind of model recognizing method based on quick R-CNN deep neural network
CN106919978B (en)* | 2017-01-18 | 2020-05-15 | 西南交通大学 | Method for identifying and detecting parts of high-speed rail contact net supporting device
CN106845430A (en)* | 2017-02-06 | 2017-06-13 | 东华大学 | Pedestrian detection and tracking based on acceleration region convolutional neural networks
CN106874894B (en)* | 2017-03-28 | 2020-04-14 | 电子科技大学 | A Human Object Detection Method Based on Regional Fully Convolutional Neural Networks

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
WO2015078185A1 (en)* | 2013-11-29 | 2015-06-04 | 华为技术有限公司 | Convolutional neural network and target object detection method based on same
CN106647758A (en)* | 2016-12-27 | 2017-05-10 | 深圳市盛世智能装备有限公司 | Target object detection method and device and automatic guiding vehicle following method

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
CMS-RCNN: contextual multi-scale region-based CNN for unconstrained face detection; Chenchen Zhu et al.; Deep Learning for Biometrics; 2016-06-17; full text *
Learning oriented region-based convolutional neural networks for building detection in satellite remote sensing images; Chaoyue Chen et al.; Remote Sensing and Spatial Information Sciences; 2017-06-09; pp. 461-464 *
Multi-target tracking algorithm based on region convolutional neural networks; Hu Peng et al.; Journal of Southwest University of Science and Technology; 2016-03-31; Vol. 31, No. 1; pp. 67-72 *
Ship video detection method based on very-high-speed region convolutional neural networks; Yang Ming et al.; Journal of Beijing University of Posts and Telecommunications; 2017-06-30; Vol. 40; pp. 130-134 *

Also Published As

Publication number | Publication date
CN108121986A (en) | 2018-06-05

Similar Documents

Publication | Title
CN108121986B (en) | Object detection method and device, computer device and computer readable storage medium
CN109886998B (en) | Multi-target tracking method, device, computer device and computer storage medium
CN109918969B (en) | Face detection method and device, computer device and computer readable storage medium
CN112506342B (en) | Man-machine interaction method and system based on dynamic gesture recognition
CN109903310B (en) | Target tracking method, device, computer device and computer storage medium
US10885660B2 (en) | Object detection method, device, system and storage medium
CN110378297B (en) | Remote sensing image target detection method and device based on deep learning and storage medium
US8792722B2 (en) | Hand gesture detection
US8750573B2 (en) | Hand gesture detection
CN112101344B (en) | Video text tracking method and device
JP6188400B2 (en) | Image processing apparatus, program, and image processing method
CN104915972A (en) | Image processing apparatus, image processing method and program
CN107944381B (en) | Face tracking method, face tracking device, terminal and storage medium
CN108021908B (en) | Face age group identification method and device, computer device and readable storage medium
CN107895021B (en) | Image recognition method and device, computer device and computer readable storage medium
CN110570442A (en) | Contour detection method under complex background, terminal device and storage medium
CN109871792B (en) | Pedestrian detection method and device
Meus et al. | Embedded vision system for pedestrian detection based on HOG+SVM and use of motion information implemented in Zynq heterogeneous device
CN116740721B (en) | Finger sentence searching method, device, electronic equipment and computer storage medium
CN113239746A (en) | Electric vehicle detection method and device, terminal equipment and computer readable storage medium
KR101981284B1 (en) | Apparatus Processing Image and Method thereof
CN113807407B (en) | Target detection model training method, model performance detection method and device
CN111931557A (en) | Specification identification method and device for bottled drink, terminal equipment and readable storage medium
CN113971671B (en) | Instance segmentation method, device, electronic device and storage medium
CN113361511B (en) | Correction model establishing method, device, equipment and computer readable storage medium

Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant
