CN110619339A

Info

Publication number: CN110619339A
Application number: CN201810626961.1A
Authority: CN
Inventors: 彭劲璋
Original assignee: Beijing Shenjian Intelligent Technology Co Ltd
Current assignee: Xilinx Technology Beijing Ltd
Priority date: 2018-06-19
Filing date: 2018-06-19
Publication date: 2019-12-27
Anticipated expiration: 2038-06-19
Also published as: CN110619339B

Abstract

A target detection method and apparatus are provided. The target detection method (100) comprises: inputting image data (S110); passing the input image data through a convolutional neural network to obtain output feature data of a target detection frame for a detection target (S120); performing approximation processing on the output feature data (S130); and outputting the approximated output feature data (S140). To address the jitter of the target detection frame in the prior art, the technique constrains the network output features within the image detection method itself, improving the stability of the displayed target detection frame without combining a tracking method. The technique is simple to implement and incurs no additional computational cost.

Description

Target detection method and device

Technical Field

The present invention relates to image recognition, and more particularly, to a target detection method and apparatus.

Background

In recent years, deep learning has been highly successful in classification, target detection, and segmentation in computer vision. Target detection is a fundamental research direction in computer vision, and detection performance directly affects the tasks built on it. Most existing detection methods train a neural network on a manually annotated image data set and then use it to predict results. Algorithms designed this way have a latent problem: in video tests, prediction errors between two adjacent frames cause the displayed target detection frame to jitter. The existing remedy is to add a tracking algorithm and combine detection with tracking. Although this eliminates or reduces the jitter, the added algorithm incurs additional computational overhead.

Disclosure of Invention

Embodiments of the present invention provide a target detection method and apparatus that address the jitter of the target detection frame described in the Background. They constrain the network output features within the image detection method itself, improving the stability of the displayed target detection frame without combining a tracking method. The method is simple to implement and incurs no additional computational cost.

To achieve the object of the present invention, according to a first aspect of the present invention, there is provided a target detection method. The target detection method may include: inputting image data; passing the input image data through a convolutional neural network to obtain output feature data of a target detection frame for a detection target; performing approximation processing on the output feature data; and outputting the approximated output feature data.

In the above method, however, the approximation processing may itself introduce an error. To further reduce this error, the step of performing approximation processing on the output feature data may preferably further include: normalizing the output feature data based on adjacent output feature data; and performing approximation processing on the normalized output feature data.

Preferably, the approximation processing comprises: approximating each value by retaining only a few digits after the decimal point.

Further, the target detection method according to the first aspect of the present invention may further include: displaying the target detection frame on the image according to the output feature data that has been output.

To achieve the object of the present invention, according to a second aspect of the present invention, there is provided a target detection apparatus. The target detection apparatus may include: an input module for inputting image data; a network module for passing the input image data through a convolutional neural network to obtain output feature data of a target detection frame for a detection target; an approximation processing module for performing approximation processing on the output feature data; and an output module for outputting the approximated output feature data.

Likewise, to further reduce the error of the approximation processing, the approximation processing module may preferably further include: a normalization submodule for normalizing the output feature data based on adjacent output feature data; and a normalized approximation processing submodule for performing approximation processing on the normalized output feature data.

Preferably, the approximation processing module or the normalized approximation processing submodule may be further configured to approximate each value by retaining only a few digits after the decimal point.

In addition, the target detection apparatus according to the second aspect of the present invention may further include a target detection frame display module for displaying the target detection frame on the image according to the output feature data that has been output.

To achieve the object of the present invention, according to a third aspect of the present invention, there is provided a computer readable medium recording instructions executable by a processor, the instructions, when executed by the processor, causing the processor to perform a target detection method comprising the operations of: inputting image data; passing the input image data through a convolutional neural network to obtain output feature data of a target detection frame for a detection target; performing approximation processing on the output feature data; and outputting the approximated output feature data.

With this target detection technique, the jitter of the target detection frame in the prior art is addressed without introducing a tracking algorithm or extra computing power: the prediction errors that cause the jitter are eliminated by approximation processing alone, so the technique is simple to implement and effective. In addition, the normalization operation reduces the error of the approximation processing, so the technique achieves a further improved anti-jitter effect.

Drawings

The invention is described below with reference to the embodiments and the drawings.

Fig. 1 shows a flow chart of a target detection method according to an embodiment of the invention.

Fig. 2 shows detailed steps of the approximation processing in the target detection method of Fig. 1.

Fig. 3 shows a schematic block diagram of a target detection apparatus according to an embodiment of the invention.

Fig. 4 illustrates a process of approximating a network output characteristic according to a specific embodiment of the present invention.

Fig. 5 illustrates a more detailed process for approximating a network output characteristic, according to a specific embodiment of the present invention.

Detailed Description

The drawings are only for purposes of illustration and are not to be construed as limiting the invention. The technical solution of the present invention is further described below with reference to the accompanying drawings and examples.

As shown in Fig. 1, the target detection method 100 according to the present invention starts at step S110, where image data is input. Those skilled in the art will understand that the input image data is typically one frame of a video. The target detection method 100 is used to find a target object in such a frame. The target object may be determined in advance or determined in the image according to some rule.

In step S120, the input image data is passed through a convolutional neural network to obtain feature data about the position of the detection target. The feature data is used to predict the position of the target in the image, and the position is marked with a rectangular frame.

As mentioned above, a practical problem is that the target hardly changes between two adjacent frames of a video, yet the predicted target detection frame jitters because of prediction error. For example, suppose a video is displayed at 20 frames per second and the target detection frame is drawn on every frame. The detection frame then appears 20 times per second, and because of the prediction error it shifts by a small but perceptible amount from frame to frame, which the viewer perceives as jitter and finds uncomfortable.

To solve this problem, approximation processing is then performed on the output feature data in step S130.

Specifically, the approximation processing of step S130 may further include the following two sub-steps: in step S130a, the output feature data is normalized based on adjacent output feature data; then, in step S130b, approximation processing is performed on the normalized output feature data. The normalization operation reduces the error of the approximation processing, so that the target detection method achieves a further improved anti-jitter effect.

More specifically, the approximation processing (S130 or S130b) referred to herein means approximating a value by retaining only a few digits after the decimal point. For example, each value of the output feature data is kept only to the 3rd, 5th, or n-th digit after the decimal point.
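The approximation described above can be sketched in a few lines of Python. This is a minimal illustration rather than the patented implementation: the function name `approximate`, the parameter `digits`, and the sample values are assumptions introduced here, and the source does not say whether values are rounded or truncated (rounding is assumed).

```python
from typing import List


def approximate(features: List[float], digits: int = 3) -> List[float]:
    """Keep only `digits` digits after the decimal point of every value.

    Assumption: rounding (not truncation) is used for the approximation.
    """
    return [round(value, digits) for value in features]


# Made-up output feature data standing in for real network output.
raw = [0.123456, 0.654321, 0.500049, 0.250151]
print(approximate(raw, digits=3))
```

A smaller `digits` value makes the output more stable but coarser; the source leaves the choice of n open.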

Finally, in step S140, the output characteristic data after the approximation processing is output.

After step S140, the target detection method 100 may further include displaying the target detection frame on the image according to the output feature data that has been output (not shown in Fig. 1). For the core purpose and core scheme of the present invention, however, the method may end at step S140.
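Steps S110 to S140 and the optional display step can be sketched end to end as follows. Everything named here is hypothetical: `stub_network` merely stands in for a trained convolutional neural network, and the layout of the feature data as normalized `(x1, y1, x2, y2)` box corners is an assumption made for illustration, since the source does not specify it.

```python
from typing import Callable, List, Sequence, Tuple

Image = Sequence[Sequence[float]]
Features = List[float]


def stub_network(image: Image) -> Features:
    """Hypothetical stand-in for the CNN of step S120 (not a real detector)."""
    count = sum(len(row) for row in image)
    mean = sum(sum(row) for row in image) / count
    return [mean * 0.5, mean * 0.25, mean * 1.5, mean * 1.25]


def detect(image: Image,
           network: Callable[[Image], Features] = stub_network,
           digits: int = 3) -> Features:
    features = network(image)                      # S120: network output
    return [round(v, digits) for v in features]    # S130 + S140: approximate, output


def to_pixel_box(features: Features, width: int, height: int) -> Tuple[int, int, int, int]:
    """Optional display step. ASSUMPTION: features are normalized x1, y1, x2, y2."""
    x1, y1, x2, y2 = features
    return (round(x1 * width), round(y1 * height),
            round(x2 * width), round(y2 * height))


frame = [[0.25, 0.75], [0.75, 0.25]]   # S110: tiny made-up "image"
box = detect(frame)
print(box, to_pixel_box(box, 640, 480))
```

Because the approximation is a single pass over the network output, it adds no tracking state and negligible computation, which matches the motivation given in the text.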

As shown in Fig. 3, the target detection apparatus 300 according to the present invention may include: an input module 301, a network module 302, an approximation processing module 303, and an output module 304.

The input module 301 is used for inputting image data. Those skilled in the art will appreciate that the operation performed by the input module 301 corresponds to step S110 in fig. 1.

The network module 302 is configured to pass the input image data through a convolutional neural network to obtain output feature data of a target detection frame of a detection target. Those skilled in the art will appreciate that the operation performed by the network module 302 corresponds to step S120 in fig. 1.

The approximation processing module 303 is configured to perform approximation processing on the output feature data. Those skilled in the art will appreciate that the operation performed by the approximation processing module 303 corresponds to step S130 in Fig. 1.

The approximation processing module 303 may further include a normalization submodule 303a and a normalized approximation processing submodule 303b. The normalization submodule 303a is configured to normalize the output feature data based on adjacent output feature data. The normalized approximation processing submodule 303b is configured to perform approximation processing on the normalized output feature data. Those skilled in the art will appreciate that the operations performed by the normalization submodule 303a and the normalized approximation processing submodule 303b correspond to steps S130a and S130b in Fig. 2, respectively.

The approximation processing module 303, or more specifically the normalized approximation processing submodule 303b, may be configured to approximate each value by retaining only a few digits after the decimal point.

The output module 304 is used for outputting the output characteristic data after the approximation processing.

Although not shown in Fig. 3, the target detection apparatus according to the present invention may further include a target detection frame display module for displaying the target detection frame on the image according to the output feature data that has been output.
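The module structure of Fig. 3 can be mirrored in code as a composition of small objects. The sketch below is only one possible arrangement: the class names, the `normalizer` hook standing in for submodule 303a, and the fake network are all hypothetical and chosen to echo the reference numerals in the text.

```python
from typing import Callable, List, Optional

Features = List[float]


class ApproximationModule:
    """Module 303: optional normalization (303a) followed by approximation (303b)."""

    def __init__(self, digits: int = 3,
                 normalizer: Optional[Callable[[Features], Features]] = None):
        self.digits = digits
        self.normalizer = normalizer

    def __call__(self, features: Features) -> Features:
        if self.normalizer is not None:
            features = self.normalizer(features)            # submodule 303a
        return [round(v, self.digits) for v in features]    # submodule 303b


class TargetDetectionApparatus:
    """Apparatus 300: input (301), network (302), approximation (303), output (304)."""

    def __init__(self, network: Callable[[object], Features],
                 approximation: ApproximationModule):
        self.network = network
        self.approximation = approximation

    def run(self, image: object) -> Features:
        data = image                          # input module 301
        features = self.network(data)         # network module 302
        return self.approximation(features)   # modules 303 and 304


# Hypothetical network that ignores the image and returns fixed values.
apparatus = TargetDetectionApparatus(lambda image: [0.123456, 0.98765],
                                     ApproximationModule(digits=3))
print(apparatus.run(image=None))
```

Keeping the normalizer as an injectable callable lets the same apparatus run with or without the second-stage normalization.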

Fig. 4 illustrates a process of approximating the network output features according to a specific embodiment of the present invention. The feature approximation process in Fig. 4 may be referred to as the first stage of the present invention. Specifically, an input image is passed through a convolutional neural network to obtain output features, and the approximation processing is performed by rounding the feature data to several decimal places. That is, each value of the feature data is approximated by retaining only a few digits after the decimal point, as described above. Rounding to several decimal places does not affect the output result, and it forces two extremely similar images to give the same output, thereby eliminating the root cause of the detection frame jitter. As shown in Fig. 4, approximation processing is applied to each of the four data values displayed at the front (the subsequent data are then processed in sequence), yielding the final network output.
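The first-stage effect on two nearly identical frames can be illustrated with made-up numbers. The values below are invented for the example and deliberately fall on the same side of each rounding boundary; they are not measurements from the source.

```python
from typing import List


def approximate(features: List[float], digits: int = 3) -> List[float]:
    """First-stage approximation: round every value to `digits` decimal places."""
    return [round(value, digits) for value in features]


# Hypothetical network outputs for two adjacent, nearly identical frames.
frame_a = [0.41231, 0.28762, 0.63118, 0.52409]
frame_b = [0.41236, 0.28758, 0.63121, 0.52411]

raw_differs = frame_a != frame_b                         # raw outputs jitter
stable = approximate(frame_a) == approximate(frame_b)    # rounded outputs agree
print(raw_differs, stable)  # prints: True True
```

The raw outputs differ in the fifth decimal place, while the rounded outputs are identical, so a box drawn from the rounded values would not move between these two frames.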

Fig. 5 illustrates a more detailed process of approximating the network output features according to a specific embodiment of the present invention. The feature approximation process in Fig. 5 may be referred to as the second stage of the present invention. The second stage may be seen as a further improvement on the first stage. This is because, in practice, the first-stage approximation introduces an error for each data value, and these errors may be inconsistent with one another: they may point in the same direction or in different directions, so the error may be amplified. The second stage therefore normalizes each value using the values of its neighboring data before the approximation processing, reducing this error of the first stage. For example, as shown in Fig. 5, for every square of four adjacent data values, a normalized value of the upper-left data value is obtained based on the values of the adjacent data; the calculation proceeds in the same way until normalized values of all the data are obtained. Performing the approximation processing on the normalized values further reduces the approximation error, so that the target detection method achieves a further improved anti-jitter effect.
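The second stage can be sketched as follows, with one important caveat: the source does not give the normalization formula. The sketch assumes, purely for illustration, that the normalized value of each upper-left entry is the mean of its 2x2 window (clipped at the border of the feature map); the function names and sample grid are likewise hypothetical.

```python
from typing import List

Grid = List[List[float]]


def normalize_with_neighbors(grid: Grid) -> Grid:
    """ASSUMPTION: replace each value by the mean of the 2x2 window whose
    upper-left corner it is; the window is clipped at the grid border."""
    rows, cols = len(grid), len(grid[0])
    result = []
    for i in range(rows):
        row = []
        for j in range(cols):
            window = [grid[y][x]
                      for y in range(i, min(i + 2, rows))
                      for x in range(j, min(j + 2, cols))]
            row.append(sum(window) / len(window))
        result.append(row)
    return result


def approximate(grid: Grid, digits: int = 3) -> Grid:
    """Step S130b: round every normalized value to `digits` decimal places."""
    return [[round(value, digits) for value in row] for row in grid]


# Made-up 2x2 feature map; S130a (normalize) then S130b (approximate).
features = [[0.25, 0.75], [0.5, 1.0]]
print(approximate(normalize_with_neighbors(features)))
```

Any other neighbor-based normalization could be substituted for `normalize_with_neighbors` without changing the two-step structure of S130a followed by S130b.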

Those skilled in the art will appreciate that the methods of the present invention may be implemented as computer programs. As described above in connection with Figs. 1, 2, 4, and 5, the methods according to the above embodiments may be carried out by executing one or more programs that include instructions causing a computer or processor to perform the algorithms described in connection with the figures. These programs may be stored and provided to a computer or processor using various types of non-transitory computer readable media. Non-transitory computer readable media include various types of tangible storage media. Examples of non-transitory computer readable media include magnetic recording media such as floppy disks, magnetic tapes, and hard disk drives; magneto-optical recording media such as magneto-optical disks; CD-ROMs (compact disc read only memories), CD-R, and CD-R/W; and semiconductor memories such as ROMs, PROMs (programmable ROMs), EPROMs (erasable PROMs), flash ROMs, and RAMs (random access memories). Further, these programs can be provided to the computer by using various types of transitory computer readable media. Examples of transitory computer readable media include electric signals, optical signals, and electromagnetic waves. A transitory computer readable medium can provide the program to the computer through a wired communication path, such as an electric wire or an optical fiber, or through a wireless communication path.

Therefore, the present invention also proposes a computer program, or a computer readable medium recording instructions executable by a processor, the instructions, when executed by the processor, causing the processor to perform a target detection method comprising the operations of: inputting image data; passing the input image data through a convolutional neural network to obtain output feature data of a target detection frame for a detection target; performing approximation processing on the output feature data; and outputting the approximated output feature data.

More specifically, in the above computer program or computer readable medium, the operation of performing approximation processing on the output feature data may further include: normalizing the output feature data based on adjacent output feature data; and performing approximation processing on the normalized output feature data.

Various embodiments and implementations of the present invention have been described above. However, the spirit and scope of the present invention is not limited thereto. Those skilled in the art will be able to devise many more applications in accordance with the teachings of the present invention which are within the scope of the present invention.

That is, the above examples are given only to illustrate the present invention clearly and do not limit its embodiments. Other variations and modifications will be apparent to persons skilled in the art in light of the above description. It is neither necessary nor possible to enumerate all embodiments here. Any modification, replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the claims of the present invention.

Claims

1. A target detection method, comprising:

inputting image data;

passing the input image data through a convolutional neural network to obtain output feature data of a target detection frame for a detection target;

performing approximation processing on the output feature data; and

outputting the approximated output feature data.

2. The target detection method of claim 1, wherein the step of performing approximation processing on the output feature data further comprises:

normalizing the output feature data based on adjacent output feature data; and

performing approximation processing on the normalized output feature data.

3. The target detection method of claim 1 or 2, wherein the approximation processing comprises: approximating each value by retaining only a few digits after the decimal point.

4. The target detection method of claim 1, further comprising: displaying the target detection frame on the image according to the output feature data that has been output.

5. A target detection apparatus, comprising:

an input module for inputting image data;

a network module for passing the input image data through a convolutional neural network to obtain output feature data of a target detection frame for a detection target;

an approximation processing module for performing approximation processing on the output feature data; and

an output module for outputting the approximated output feature data.

6. The target detection apparatus of claim 5, wherein the approximation processing module further comprises:

a normalization submodule for normalizing the output feature data based on adjacent output feature data; and

a normalized approximation processing submodule for performing approximation processing on the normalized output feature data.

7. The target detection apparatus of claim 5 or 6, wherein the approximation processing module or the normalized approximation processing submodule is further configured to approximate each value by retaining only a few digits after the decimal point.

8. The target detection apparatus of claim 5, further comprising a target detection frame display module for displaying the target detection frame on the image according to the output feature data that has been output.

9. A computer-readable medium recording instructions executable by a processor, the instructions, when executed by the processor, causing the processor to perform a target detection method comprising the operations of:

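inputting image data;

passing the input image data through a convolutional neural network to obtain output feature data of a target detection frame for a detection target;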

performing approximation processing on the output feature data; and

outputting the approximated output feature data.

10. The computer-readable medium of claim 9, wherein the operation of approximating the output characteristic data further comprises: