Disclosure of Invention
The embodiments of the invention provide an object identification method, an object identification system and electronic equipment, which at least solve the technical problems in the related art that objects cannot be stacked when placed and that identification accuracy is low during object identification.
According to an aspect of an embodiment of the present invention, there is provided an object recognition method including: acquiring an image of an object through an image capturing module; acquiring detection results of the object at a plurality of moments based on a single frame image of the object, wherein the detection results comprise position detection information and category detection information; acquiring movement information of pixel points in an image based on a multi-frame image of an object; obtaining a visual detection result of the object according to the detection results of the objects at the multiple moments and the movement information of the pixel points; acquiring gravity information of a preselected area by using a gravity sensor, and acquiring a gravity detection result of an object according to the gravity information; and fusing the visual detection result and the gravity detection result to determine the identification result of the object.
Further, acquiring detection results of the object at a plurality of moments based on the single frame image of the object includes: carrying out image preprocessing on single-frame images of the object frame by frame; acquiring an object detection frame and the category detection information in the single-frame image after the image preprocessing; and acquiring the position detection information of the object according to the object detection frame.
Further, the object recognition method further includes: and performing non-maximum suppression on the object detection frame.
Further, acquiring movement information of a pixel point in an image based on a multi-frame image of an object includes: performing background modeling based on multi-frame images of the object and extracting a foreground region; and performing interval sampling on the foreground region to acquire the movement information of the pixel point.
Further, the foreground region is a region containing a moving object.
Further, the foreground region is sampled at intervals by using a semi-dense optical flow method to acquire movement information of the pixel points.
Further, according to the detection results of the objects at the multiple moments and the movement information of the pixel points, obtaining the visual detection result of the object includes: generating a motion track of the object according to the detection results of the objects at the multiple moments and the movement information of the pixel points; classifying the motion trail of the object to obtain the trail state of the object; and obtaining a visual detection result of the object according to the category detection information of the object and the track state of the object.
Further, generating the motion trail of the object according to the detection results of the objects at the multiple moments and the movement information of the pixel points includes: acquiring position prediction information of the object at the time (t+1) according to the position detection information in the detection result of the object at the time t and the movement information of the pixel point; judging whether the detection result of the object at the time (t+1) and the detection result of the object at the time t belong to the same track according to the detection result of the object at the time t, the detection result of the object at the time (t+1) and the position prediction information of the object at the time (t+1), and acquiring a judgment result; and generating the motion trail of the object according to the judgment result.
Further, obtaining the position prediction information of the object at the time (t+1) according to the position detection information in the detection result of the object at the time t and the movement information of the pixel point includes: acquiring a predicted position of the object at the time (t+1) according to the position detection information in the detection result of the object at the time t and the speed of the pixel point at the time t; acquiring the speed of the pixel point at the time (t+1) according to the predicted position of the object at the time (t+1); carrying out a weighted average of the speed of the pixel point at the time t and the speed of the pixel point at the time (t+1) to obtain an average speed; and acquiring the position prediction information of the object at the time (t+1) according to the position detection information in the detection result of the object at the time t and the average speed.
Further, classifying the motion trail of the object, and obtaining the trail state of the object includes: extracting track information in a motion track of the object, wherein the track information comprises at least one of the following: the initial position of the motion trail, the final position of the motion trail, the maximum position of the motion trail and the maximum displacement between adjacent nodes of the motion trail; and classifying the motion track of the object through a decision tree algorithm according to the track information to obtain the track state of the object.
Further, acquiring gravity information of the preselected area by using a gravity sensor, and acquiring a gravity detection result of the object according to the gravity information comprises: acquiring gravity information of a preselected area at different moments by using a gravity sensor; and according to the difference value of the gravity information at different moments, exhausting all possible gravity detection results of the object.
Further, the visual detection result and the gravity detection result are matched, different weights are given to the gravity detection result according to the matching degree of the visual detection result and the gravity detection result, and the gravity detection result with the highest weight is selected as a final gravity detection result, so that the identification result of the object is determined.
Further, judging whether the detection result of the object at the time (t+1) and the detection result of the object at the time t belong to the same track according to the detection result of the object at the time t, the detection result of the object at the time (t+1) and the position prediction information of the object at the time (t+1), and acquiring the judgment result includes: judging whether the category of the object at the time t is the same as the category of the object at the time (t+1) according to the category detection information in the detection result of the object at the time t and the category detection information in the detection result of the object at the time (t+1); judging whether the distance between the position prediction information of the object at the time (t+1) and the position detection information of the object at the time (t+1) is smaller than a preset threshold according to the position detection information in the detection result of the object at the time (t+1) and the position prediction information of the object at the time (t+1); and when the category of the object at the time t is the same as the category of the object at the time (t+1), and the distance between the position prediction information of the object at the time (t+1) and the position detection information of the object at the time (t+1) is smaller than the preset threshold, the judgment result indicates that the detection result of the object at the time (t+1) and the detection result of the object at the time t belong to the same track.
Further, the trajectory state of the object comprises at least one of: false detection, true placement, true taking, suspected taking and suspected placement.
According to another aspect of an embodiment of the present invention, there is also provided an object recognition system including: the image capturing module is used for acquiring an image of an object; the object detection module is used for acquiring detection results of the object at a plurality of moments based on a single frame image of the object, wherein the detection results comprise position detection information and category detection information; the pixel detection module is used for acquiring the movement information of the pixel points in the image based on the multi-frame image of the object; the visual result acquisition module is used for acquiring visual detection results of the object according to the detection results of the object at the multiple moments and the movement information of the pixel points; the gravity result acquisition module is used for acquiring gravity information of a preselected area and acquiring a gravity detection result of the object according to the gravity information; and the fusion module is used for fusing the visual detection result and the gravity detection result to determine the identification result of the object.
Further, the object detection module includes: the image preprocessing module is used for carrying out image preprocessing on single-frame images of the object frame by frame; the first information acquisition module is used for acquiring object detection frames and category detection information in the single-frame image after the image preprocessing; and the second information acquisition module is used for acquiring the position detection information of the object according to the object detection frame.
Further, the pixel detection module includes: the foreground extraction module is used for carrying out background modeling based on multi-frame images of the object and extracting a foreground region; and the sampling module is used for sampling the foreground region at intervals to acquire the movement information of the pixel points.
Further, the visual result acquisition module includes: the track generation module is used for generating a motion track of the object according to the detection results of the objects at the multiple moments and the movement information of the pixel points; the track classification module is used for classifying the motion track of the object to obtain the track state of the object; and the visual analysis module is used for obtaining the visual detection result of the object according to the category detection information of the object and the track state of the object.
Further, the gravity result obtaining module includes: the gravity sensor is used for acquiring gravity information at different moments; and the gravity analysis module is used for exhausting all possible gravity detection results of the object according to the difference value of the gravity information at different moments.
Further, the fusing the visual detection result and the gravity detection result to determine the identification result of the object includes: and matching the visual detection result with the gravity detection result, giving different weights to the gravity detection result according to the matching degree of the visual detection result and the gravity detection result, and selecting the gravity detection result with the highest weight as a final gravity detection result so as to determine the identification result of the object.
According to another aspect of the embodiment of the present invention, there is also provided an electronic device, including: a processor; and a memory for storing executable instructions of the processor; wherein the processor is configured to perform the object recognition method of any one of the above via execution of the executable instructions.
According to another aspect of the embodiment of the present invention, there is further provided a storage medium, where the storage medium includes a stored program, and when the program runs, a device where the storage medium is located is controlled to execute any one of the object recognition methods described above.
In the embodiment of the invention, an image of an object is acquired through an image capturing module; detection results of the object at a plurality of moments are acquired based on a single frame image of the object, wherein the detection results comprise position detection information and category detection information; movement information of pixel points in the image is acquired based on a multi-frame image of the object; a visual detection result of the object is obtained according to the detection results of the object at the multiple moments and the movement information of the pixel points; gravity information of a preselected area is acquired by using a gravity sensor, and a gravity detection result of the object is acquired according to the gravity information; and the visual detection result and the gravity detection result are fused to determine the identification result of the object. In this embodiment, even if the object is occluded, the gravity detection result obtained based on the gravity information can still be corrected based on the visual detection result obtained from the image, so that the problem of poor accuracy caused by identifying the object with the visual detection result alone is solved, the categories and quantities of objects can be accurately identified, the technical problems in the related art that objects cannot be stacked when placed and that identification accuracy is low are solved, and the space utilization rate of the object storage device is improved.
Detailed Description
In order that those skilled in the art will better understand the present invention, the technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the accompanying drawings. It is apparent that the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without inventive effort shall fall within the scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present invention and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The invention is illustrated below by means of a detailed example.
The embodiment of the invention can be applied to fields such as new retail, for example to intelligent containers, intelligent cabinets, shopping malls, supermarkets and the like. An intelligent container is used below to schematically describe the invention, but the invention is not limited thereto.
FIG. 1 is a flowchart of an alternative object recognition method according to an embodiment of the invention. As shown in FIG. 1, the method comprises the following steps:
Step S102, acquiring an image of an object through an image capturing module;
Step S104, acquiring detection results of the object at a plurality of moments based on a single frame image of the object, wherein the detection results comprise position detection information and category detection information;
Step S106, acquiring movement information of pixel points in the image based on a multi-frame image of the object;
Step S108, obtaining a visual detection result of the object according to the detection results of the object at the multiple moments and the movement information of the pixel points;
Step S110, acquiring gravity information of a preselected area by using a gravity sensor, and acquiring a gravity detection result of the object according to the gravity information;
Step S112, fusing the visual detection result and the gravity detection result to determine the identification result of the object.
According to the above steps, on one hand, the detection results of the object at a plurality of moments obtained based on single frame images are combined with the movement information of the pixel points obtained based on multiple frames of images, so that the obtained visual detection result reflects the movement state of the object and the false recognition rate of the object is reduced; on the other hand, the gravity detection result based on the gravity information can be corrected based on the visual detection result obtained from the image, so that the problem of poor accuracy caused by identifying the object with the visual detection result alone is solved, the categories and quantities of objects can be accurately identified, and the technical problems in the related art that objects cannot be stacked when placed and that identification accuracy is low are solved.
The above steps are described in detail below.
Step S102, an image of the object is acquired through an image capturing module.
Optionally, in the embodiment of the present application, the image capturing module may be a general camera or video camera, for example an RGB camera, an infrared camera, a Mono camera, or the like. Of course, those skilled in the art can adjust the types and the number of the image capturing modules according to actual requirements without being limited to the examples given herein. The image capturing modules can be installed in areas such as a container or a shopping mall; the number of the image capturing modules is at least one, and when the number of the image capturing modules is 2 or more, image capturing modules of the same type or a combination of different types of image capturing modules can be used. Each image capturing module can capture at least two images, and during recognition the image capturing time points of the image capturing modules can be unified, that is, a plurality of images at the same time point are respectively analyzed so as to recognize the object from a plurality of angles.
Alternatively, the number of objects is at least one, and the objects may be placed in an object storage device, e.g. in a smart container. Object storage devices include, but are not limited to: an intelligent container.
In an alternative embodiment, acquiring an image of an object by an image capture module includes: starting an image capturing module to acquire a video of an object; an image of the object is taken from the video. The video in the object storage device can be acquired in real time through the image capturing module after the object storage device is opened, and the image of the object is intercepted from the video after the object storage device is closed or the fetching action of a user is detected to stop.
Step S104, acquiring detection results of the object at a plurality of moments based on a single frame image of the object, wherein the detection results comprise position detection information and category detection information.
Optionally, step S104 may include: carrying out image preprocessing on single-frame images of the object frame by frame, wherein the image preprocessing includes at least one of: image enhancement, image scaling, and mean subtraction; acquiring object detection frames and category detection information in the single-frame image after the image preprocessing, wherein the object detection frames comprise at least one object; and acquiring the position detection information of the object according to the object detection frames.
Alternatively, before the object detection frames and the category detection information in the single frame image after the image preprocessing are acquired, a plurality of object candidate frames (prior boxes) may be extracted, and the object candidate frames may then be subjected to deep learning analysis to acquire the object detection frames and the category detection information of the object.
Optionally, step S104 may further include: performing non-maximum suppression (Non-Maximum Suppression, NMS) on the object detection frames to prevent false detection and improve the recognition accuracy of the object. That is, when an object in an image is identified, the image is preprocessed, including operations such as image enhancement, scaling and mean subtraction, object detection frames are extracted, and non-maximum suppression is performed on the extracted object detection frames so as to prevent false detection and improve the recognition accuracy of the object.
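As one illustration of this step, the following Python sketch performs non-maximum suppression over a set of detection boxes. The (x1, y1, x2, y2) box format and the 0.5 IoU threshold are assumptions for the example, not values fixed by this embodiment.

```python
import numpy as np

def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Keep the highest-scoring box and suppress boxes that overlap it too much.

    boxes: (N, 4) array of [x1, y1, x2, y2]; scores: (N,) confidences.
    The 0.5 IoU threshold is illustrative and not specified by the embodiment.
    """
    order = scores.argsort()[::-1]              # indices sorted by descending score
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # Intersection of the current top box with the remaining boxes
        x1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        y1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        x2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        y2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.maximum(0, x2 - x1) * np.maximum(0, y2 - y1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[order[1:], 2] - boxes[order[1:], 0]) * \
                 (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + area_r - inter)
        order = order[1:][iou <= iou_threshold]  # keep only boxes with low overlap
    return keep
```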
Of course, in another alternative embodiment, in order to reduce the calculation amount and improve the recognition efficiency of the object, the single frame image of the object may be processed by a frame skipping manner, that is, the single frame image of the object is processed at a certain interval to obtain the detection results of the object at multiple moments.
Step S106, obtaining the movement information of the pixel points in the image based on the multi-frame image of the object.
Optionally, step S106 includes: performing background modeling based on the multi-frame images of the object and extracting a foreground region, wherein the foreground region is a region containing a moving object; the foreground region is sampled at intervals to obtain movement information of the pixel points, wherein the movement information can include, but is not limited to, displacement, speed and the like. By extracting the foreground region, the area of the sampled region can be reduced, so that the calculated amount is reduced, and the recognition efficiency of the object is improved.
In an alternative embodiment, performing background modeling and extracting the foreground region based on the multi-frame images of the object comprises: storing a predetermined number of pixel values for each coordinate in each frame of image; comparing the pixel value of each coordinate in the current frame with the pixel values stored for the corresponding coordinate in the historical frames, and judging that the coordinate is background if the number of stored pixel values of the corresponding coordinate in the historical frames that are the same as the pixel value of that coordinate in the current frame is larger than a first threshold; and fitting a background area according to the information of the coordinates determined to be background, so as to realize background modeling and extract the foreground region. Alternatively, background modeling and foreground region extraction can be implemented in combination with a background difference method. Typically, the first frame is taken as background by default.
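The following sketch illustrates one way such a sample-based background model could be realized, assuming a single-channel grayscale image. The number of stored samples per coordinate, the match tolerance, and the value of the first threshold are hypothetical choices made only for illustration.

```python
import numpy as np

class SampleBackgroundModel:
    """Per-pixel background model keeping a fixed number of historical samples.

    Hypothetical parameters: 20 samples per coordinate, a match tolerance of
    20 grey levels, and a first threshold of 2 matches; none of these numbers
    is given by the embodiment.
    """
    def __init__(self, first_frame, n_samples=20, tol=20, first_threshold=2):
        self.tol = tol
        self.first_threshold = first_threshold
        self.n_samples = n_samples
        # The first frame is taken as background by default, so all samples
        # are initialised from it.
        self.samples = np.repeat(first_frame[None, ...], n_samples, axis=0).astype(np.int16)
        self.idx = 0

    def apply(self, frame):
        """Return a boolean foreground mask (the region containing motion)."""
        frame = frame.astype(np.int16)
        # Count how many stored samples are "the same as" the current pixel value.
        matches = (np.abs(self.samples - frame[None, ...]) <= self.tol).sum(axis=0)
        background = matches > self.first_threshold
        foreground = ~background
        # Update the model only where the pixel was judged to be background.
        self.samples[self.idx][background] = frame[background]
        self.idx = (self.idx + 1) % self.n_samples
        return foreground
```

In use, the model would be applied frame by frame to the camera images, and the returned mask would serve as the foreground region for the interval sampling described above.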
In an alternative embodiment, the foreground region may be sampled at intervals using a semi-dense optical flow method to acquire the movement information of the pixel points. The semi-dense optical flow method is a pixel-level image registration method that matches an image point by point; unlike the sparse optical flow method, which only considers a number of feature points on the image, and unlike the dense optical flow method, which considers all points on the image, it only samples the foreground region at intervals and then obtains the movement information of the pixel points by calculating the optical flow field of the semi-dense pixel points. The semi-dense optical flow method combines the high calculation speed of the sparse optical flow method with the strong adaptability of the dense optical flow method; it can overcome the problem that the sparse optical flow method cannot extract effective pixels under low-light conditions, and can also overcome the problem of the low calculation speed of the dense optical flow method.
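A minimal sketch of interval sampling and flow computation on the foreground region is given below. The pyramidal Lucas-Kanade tracker from OpenCV and the 8-pixel sampling stride are stand-ins, since this embodiment does not prescribe a concrete optical-flow implementation.

```python
import cv2
import numpy as np

def semi_dense_flow(prev_gray, next_gray, foreground_mask, stride=8):
    """Sample the foreground region at fixed intervals and track those pixels.

    prev_gray / next_gray: consecutive grayscale frames; foreground_mask: boolean
    mask of the moving region. Returns the sampled points and their displacements
    (the movement information of the pixel points). The stride of 8 pixels and
    the use of pyramidal Lucas-Kanade are assumptions for illustration.
    """
    ys, xs = np.nonzero(foreground_mask)
    keep = (ys % stride == 0) & (xs % stride == 0)          # interval sampling
    pts = np.stack([xs[keep], ys[keep]], axis=1).astype(np.float32).reshape(-1, 1, 2)
    if len(pts) == 0:
        return np.empty((0, 2)), np.empty((0, 2))
    next_pts, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, next_gray, pts, None)
    ok = status.ravel() == 1
    displacement = (next_pts - pts).reshape(-1, 2)[ok]       # per-point movement
    return pts.reshape(-1, 2)[ok], displacement
```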
Step S108, according to the detection results of the objects at a plurality of moments and the movement information of the pixel points, the visual detection results of the objects are obtained.
Optionally, step S108 includes: combining the detection results of the object with the movement information of the pixel points to obtain a complete motion track; classifying the motion track to obtain the track state of the object; and combining the category detection information of the object with the track state of the object to obtain a more accurate visual detection result. By acquiring the complete motion track, errors in the position detection information acquired from the image can be overcome and corrected to some extent.
Step S110, acquiring gravity information of a preselected area by using a gravity sensor, and acquiring a gravity detection result of the object according to the gravity information.
Optionally, step S110 includes: acquiring gravity information of the preselected area at different moments by using a gravity sensor; and exhausting all possible gravity detection results of the object according to the difference value of the gravity information at different moments, such as the difference between the initial gravity stable value when the object storage device is opened and the end gravity stable value when the object storage device is closed. The gravity detection result may include whether a commodity is taken or put in, and the category and number of the commodity. An exhaustive approach may be to traverse, by permutation and combination, all possible results based on the difference values of the gravity information at different moments. For example, if the difference between the end gravity stable value and the initial gravity stable value is -500 grams, the possibilities exhausted from all object information in the object storage device include: (1) taking 1 bottle of 500 ml cola; (2) taking 1 bag of 500 g Youguan crisp biscuits; (3) taking 1 can of 200 g Anmousse yogurt and 1 bag of 300 g double sausage. Accordingly, if the difference between the end gravity stable value and the initial gravity stable value is +500 grams, the possibilities exhausted from all object information in the object storage device include: (1) putting in 1 bottle of 500 ml cola; (2) putting in 1 bag of 500 g Youguan crisp biscuits; (3) putting in 1 can of 200 g Anmousse yogurt and 1 bag of 300 g double sausage.
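The exhaustive enumeration can be sketched as below. The item catalogue, the 5-gram tolerance and the cap of three items per event are hypothetical and only mirror the example above.

```python
from itertools import combinations_with_replacement

# Hypothetical catalogue of items stocked in the container (name -> unit weight, grams)
CATALOGUE = {"cola_500ml": 500, "crisp_biscuits_500g": 500,
             "yogurt_200g": 200, "double_sausage_300g": 300}

def enumerate_gravity_candidates(delta_grams, tolerance=5, max_items=3):
    """Exhaust all item combinations whose total weight explains the gravity change.

    delta_grams: end stable value minus initial stable value; negative means items
    were taken out, positive means items were put in. The tolerance of 5 g and the
    cap of 3 items per event are assumptions, not values from the embodiment.
    """
    action = "take" if delta_grams < 0 else "put"
    target = abs(delta_grams)
    names = list(CATALOGUE)
    candidates = []
    for count in range(1, max_items + 1):
        for combo in combinations_with_replacement(names, count):
            total = sum(CATALOGUE[name] for name in combo)
            if abs(total - target) <= tolerance:
                candidates.append((action, combo))
    return candidates

# e.g. a -500 g difference yields ("take", ("cola_500ml",)),
# ("take", ("crisp_biscuits_500g",)) and ("take", ("yogurt_200g", "double_sausage_300g")),
# matching the three possibilities listed in the example above.
```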
Alternatively, the gravity information of the preselected area may be acquired in real time or at predetermined intervals by the gravity sensor after the object storage device is opened, and the gravity sensor stops acquiring the gravity information of the object after the object storage device is closed or after the taking action of the user is detected to have stopped.
Alternatively, the gravity sensor may be disposed in the object storage device, for example, when the object storage device is an intelligent container, the gravity sensor may be disposed on each layer of shelves of the intelligent container, where each layer of shelves is a preselected area.
Of course, the person skilled in the art may adjust the type and number of the gravity sensors according to actual demands without being limited to the examples given herein, and when the number of the gravity sensors is 2 or more, the same type or a combination of different types of gravity sensors may be used.
Step S112, fusing the visual detection result and the gravity detection result to determine the identification result of the object.
Optionally, step S112 includes: matching the visual detection result with the gravity detection results, assigning different weights to the gravity detection results according to their matching degree with the visual detection result, generally assigning a higher weight to a gravity detection result with a higher matching degree to the visual detection result and a lower weight to a gravity detection result with a lower matching degree, and then selecting the gravity detection result with the highest weight as the final gravity detection result so as to determine the identification result of the object. The recognition result of the object may include whether the object is taken or put in, the object category, the number of objects and the specific name of each object category, and the like. Optionally, the object categories in the embodiment of the present invention include, but are not limited to: vegetables, fruits, snacks, fresh meats, seafood, and the like.
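A minimal sketch of the weighting-based fusion is given below. The overlap-count measure of matching degree is an assumption, since this embodiment does not fix a concrete formula for the matching degree.

```python
def fuse(visual_result, gravity_candidates):
    """Weight each exhaustive gravity candidate by how well it matches the visual result.

    visual_result: dict mapping (action, item) -> count obtained from the trajectory
    analysis, e.g. {("take", "cola_500ml"): 1}. gravity_candidates: output of the
    exhaustive enumeration sketched above. The overlap-count weighting is only one
    plausible measure of matching degree.
    """
    best, best_weight = None, -1.0
    for action, combo in gravity_candidates:
        candidate = {}
        for item in combo:
            candidate[(action, item)] = candidate.get((action, item), 0) + 1
        # Matching degree: number of (action, item) counts shared with the visual result.
        weight = sum(min(count, visual_result.get(key, 0))
                     for key, count in candidate.items())
        if weight > best_weight:               # keep the highest-weighted candidate
            best, best_weight = candidate, weight
    return best                                # selected as the final gravity result
```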
Because the object may be occluded, the occluded object cannot be accurately analyzed from the visual detection result alone; therefore, fusing the gravity detection result with the visual detection result of the object can further reduce the false recognition rate and the probability of missed recognition, so that an accurate object recognition result is obtained.
Through the above steps, even if the object is occluded, the gravity detection result obtained based on the gravity information can be corrected based on the visual detection result obtained from the image, so that the problem of poor accuracy caused by identifying the object with the visual detection result alone is solved, the categories and quantities of objects can be accurately identified, the technical problems in the related art that objects cannot be stacked when placed and that identification accuracy is low are solved, and the space utilization rate of the object storage device is improved.
FIG. 2 is a flowchart of an alternative method for obtaining visual inspection results of an object according to an embodiment of the invention, as shown in FIG. 2, the method comprising the steps of:
step S202, generating a motion track of an object according to detection results of the object at a plurality of moments and the movement information of the pixel points;
step S204, classifying the motion trail of the object to obtain the trail state of the object;
step S206, according to the category detection information of the object and the track state of the object, a visual detection result of the object is obtained.
In this method, the detection results of the object are first combined with the movement information of the pixel points to obtain a complete motion track; the motion track is then classified to obtain the track state of the object, which overcomes and corrects, to a certain extent, errors in the position detection information acquired from the images; finally, the category detection information of the object is combined to obtain a more accurate visual detection result.
The above steps are described in detail below.
Step S202, according to the detection results of the object at a plurality of moments and the movement information of the pixel points, generating the movement track of the object.
In an alternative embodiment, step S202 includes: acquiring position prediction information of the object at the time (t+1) according to the position detection information in the detection result of the object at the time t and the movement information of the pixel point; judging whether the detection result of the object at the time (t+1) and the detection result of the object at the time t belong to the same track according to the detection result of the object at the time t, the detection result of the object at the time (t+1) and the position prediction information of the object at the time (t+1), and acquiring a judgment result; and generating the motion trail of the object according to the judgment result. If the judgment result shows that they belong to the same track, the position of the object at the time (t+1) is connected with the position of the object at the time t to generate the motion track of the object; if the judgment result shows that they do not belong to the same track, a new motion track is created according to the position of the object at the time (t+1).
Specifically, judging whether the detection result of the object at the time (t+1) and the detection result of the object at the time t belong to the same track according to the detection result of the object at the time t, the detection result of the object at the time (t+1) and the position prediction information of the object at the time (t+1), and acquiring the judgment result includes: judging whether the categories of the objects are the same according to the category detection information in the detection result of the object at the time t and the category detection information in the detection result of the object at the time (t+1); judging whether the distance between the position prediction information of the object at the time (t+1) and the position detection information of the object at the time (t+1) is smaller than a preset threshold according to the position detection information in the detection result of the object at the time (t+1) and the position prediction information of the object at the time (t+1); when the category of the object at the time t is the same as the category of the object at the time (t+1), and the distance between the position prediction information of the object at the time (t+1) and the position detection information of the object at the time (t+1) is smaller than the preset threshold, the judgment result shows that the detection result of the object at the time (t+1) and the detection result of the object at the time t belong to the same track; otherwise, they do not belong to the same track.
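The same-track judgment can be sketched as follows; the 50-pixel distance threshold is only an illustration of the preset threshold.

```python
import math

def same_track(det_t, det_t1, predicted_pos_t1, distance_threshold=50.0):
    """Decide whether the detection at time t+1 continues the track from time t.

    det_t / det_t1: dicts with "category" and "position" (x, y box centre).
    The 50-pixel threshold is illustrative; the embodiment only requires that
    the categories match and the distance be below a preset threshold.
    """
    if det_t["category"] != det_t1["category"]:
        return False
    dx = predicted_pos_t1[0] - det_t1["position"][0]
    dy = predicted_pos_t1[1] - det_t1["position"][1]
    return math.hypot(dx, dy) < distance_threshold
```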
In another alternative embodiment, to obtain more accurate position prediction information of the object at the time (t+1), according to the position detection information in the detection result of the object at the time t and the movement information of the pixel point, obtaining the position prediction information of the object at the time (t+1) includes: acquiring a predicted position of the object at the time (t+1) according to position detection information in a detection result of the object at the time t and the speed of the pixel point at the time t; acquiring the speed of a pixel point at the time (t+1) according to the position prediction information of the object at the time (t+1); the speed of the pixel point at the moment t and the speed of the pixel point at the moment (t+1) are weighted and averaged to obtain an average speed; and acquiring the position prediction information of the object at the time (t+1) according to the position detection information and the average speed in the detection result of the object at the time t.
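A sketch of this two-step prediction is given below. Representing the pixel movement information as velocity-field callables and using an equal 0.5 weighting are assumptions made for illustration.

```python
def predict_position(pos_t, velocity_field_t, velocity_field_t1, w=0.5):
    """Two-step position prediction with a weighted-average pixel velocity.

    pos_t: (x, y) detected position at time t. velocity_field_t / velocity_field_t1:
    callables returning the optical-flow velocity (vx, vy) at a given position for
    frames t and t+1. The equal weighting w=0.5 is an assumption.
    """
    vx_t, vy_t = velocity_field_t(pos_t)
    # Coarse prediction from the velocity at time t alone.
    coarse = (pos_t[0] + vx_t, pos_t[1] + vy_t)
    # Velocity of the pixel at time t+1, read at the coarse predicted position.
    vx_t1, vy_t1 = velocity_field_t1(coarse)
    # Weighted average of the two velocities, then re-predict from the position at t.
    avg_vx = w * vx_t + (1 - w) * vx_t1
    avg_vy = w * vy_t + (1 - w) * vy_t1
    return (pos_t[0] + avg_vx, pos_t[1] + avg_vy)
```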
Step S204, classifying the motion trail of the object to obtain the trail state of the object.
Optionally, step S204 includes: extracting track information in the motion track of the object, wherein the track information comprises at least one of the following: the initial position of the motion track, the final position of the motion track, the maximum position of the motion track and the maximum displacement between adjacent nodes of the motion track; and classifying the motion track of the object through a decision tree algorithm according to the track information, so as to obtain the track state of the object. The track state of the object may include false detection, true placement, true taking, suspected taking, suspected placement, and the like.
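The following sketch shows decision-tree-style rules over a subset of the listed track features. The shelf-line coordinate and thresholds are hypothetical, and the actual decision tree of this embodiment would be learned or tuned rather than hand-written.

```python
import math

def classify_trajectory(track, shelf_line=400.0, min_total_move=30.0, max_jump=120.0):
    """Illustrative decision-tree-style classification of a motion track.

    track: list of (x, y) node positions ordered in time; y is assumed to grow
    towards the shelf interior. shelf_line, min_total_move and max_jump are
    hypothetical thresholds; the embodiment only states that features such as the
    start position, end position and maximum inter-node displacement feed a
    decision tree.
    """
    start_y, end_y = track[0][1], track[-1][1]
    # Maximum displacement between adjacent track nodes.
    max_step = max((math.hypot(b[0] - a[0], b[1] - a[1])
                    for a, b in zip(track, track[1:])), default=0.0)

    if max_step > max_jump or abs(end_y - start_y) < min_total_move:
        return "false_detection"              # implausible jump or negligible motion
    crossed = (start_y < shelf_line) != (end_y < shelf_line)  # crossed the shelf line
    if end_y >= shelf_line:                   # track ends inside the shelf region
        return "true_placement" if crossed else "suspected_placement"
    return "true_taking" if crossed else "suspected_taking"
```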
Step S206, according to the category detection information of the object and the track state of the object, a visual detection result of the object is obtained.
Alternatively, the visual detection results of objects can be divided into two categories: (1) confirmed detection results; and (2) possible supplementary detection results. The confirmed detection results include: certain objects are put in or taken out; the possible supplementary detection results include: suspected taking of certain objects, suspected putting of certain objects, and the like.
Therefore, by combining the track state of the object, the error in the position detection information acquired according to the image can be overcome and corrected to a certain extent, and then by combining the category detection information of the object, a more accurate visual detection result can be obtained.
According to another aspect of the embodiment of the invention, there is also provided an object recognition system. FIG. 3 is a block diagram of an alternative object recognition system in accordance with an embodiment of the present invention.
As shown in fig. 3, the system may include: an image capturing module 30, an object detecting module 31, a pixel detecting module 32, a visual result obtaining module 33, a gravity result obtaining module 34 and a fusion module 35.
An image capturing module 30 for acquiring an image of an object. Alternatively, the image capturing module 30 may be a general camera or a video camera, for example, an RGB camera, an infrared camera, a Mono camera, or the like. Of course, the person skilled in the art may adjust the types and the number of the image capturing modules 30 according to the actual needs without being limited to the examples given herein, the image capturing modules 30 may be installed in a container or a mall or the like, the number of the image capturing modules 30 is at least one, when the number of the image capturing modules 30 is 2 or more, the same type of image capturing module 30 or a combination of different types of image capturing modules 30 may be used; each image capturing module 30 may capture at least two images, and at the time of recognition, the image capturing time points between the image capturing modules 30 may be unified, that is, a plurality of images at the same time point may be respectively analyzed to recognize an object from a plurality of angles.
Alternatively, the number of objects is at least one, and the objects may be placed in an object storage device, e.g. in a smart container. Object storage devices include, but are not limited to: an intelligent container.
In an alternative embodiment, capturing an image of an object by image capture module 30 includes: turning on the image capturing module 30 to acquire a video of the object; an image of the object is taken from the video. I.e. the video in the object storage device is acquired in real time by the image capturing module 30 after the object storage device is opened, and the image of the object is captured from the video after the object storage device is closed or the user's taking action is detected to stop.
The object detection module 31 is configured to obtain detection results of the object at a plurality of moments based on a single frame image of the object, where the detection results include position detection information and category detection information. The object detection module 31 may communicate with the image capturing module 30 by wired or wireless means to acquire the object image captured by the image capturing module 30.
In an alternative embodiment, the object detection module 31 may include an image preprocessing module 310, a first information acquisition module 312, and a second information acquisition module 314. The image preprocessing module 310 is used for performing image preprocessing on single-frame images of objects frame by frame. Wherein the image preprocessing includes at least one of: image enhancement, image scaling, image subtraction. The first information obtaining module 312 is configured to obtain object detection frames and category detection information in a single frame image after image preprocessing, where at least one object is included in the object detection frames. The second information acquisition module 314 is configured to acquire position detection information of the object according to the object detection frame.
Optionally, the first information obtaining module 312 may further include a candidate frame extraction module 3120 and a detection frame analysis module 3122. The candidate frame extraction module 3120 is configured to extract a plurality of object candidate frames (prior boxes) from the single frame image after image preprocessing. The detection frame analysis module 3122 is configured to perform deep learning analysis on the object candidate frames to obtain the object detection frames and the category detection information of the object.
Optionally, the object detection module 31 may further include a detection frame processing module 313, configured to perform non-maximum suppression (Non Maximum Suppression, NMS) on the object detection frame acquired by the first information acquisition module 312, so as to prevent false detection and improve the recognition accuracy of the object.
Of course, it will be appreciated by those skilled in the art that, in another alternative embodiment, in order to reduce the amount of calculation and improve the recognition efficiency of the object, the image preprocessing module 310 may process the single frame image of the object in a frame skipping manner, that is, process the single frame image of the object at intervals to obtain the detection results of the object at multiple moments.
The pixel detection module 32 is configured to obtain movement information of a pixel point in an image based on a multi-frame image of an object.
In an alternative embodiment, pixel detection module 32 includes a foreground extraction module 320 and a sampling module 322. The foreground extraction module 320 models the background based on the multi-frame image of the object and extracts a foreground region, wherein the foreground region is a region containing a moving object. The sampling module 322 is configured to sample the foreground region at intervals to obtain movement information of the pixel point, where the movement information may include, but is not limited to, displacement, speed, and the like. By extracting the foreground region, the area of the sampled region can be reduced, so that the calculated amount is reduced, and the recognition efficiency of the object is improved.
In an alternative embodiment, performing background modeling and extracting the foreground region based on the multi-frame images of the object comprises: storing a predetermined number of pixel values for each coordinate in each frame of image; comparing the pixel value of each coordinate in the current frame with the pixel values stored for the corresponding coordinate in the historical frames, and judging that the coordinate is background if the number of stored pixel values of the corresponding coordinate in the historical frames that are the same as the pixel value of that coordinate in the current frame is larger than a first threshold; and fitting a background area according to the information of the coordinates determined to be background, so as to realize background modeling and extract the foreground region. Alternatively, background modeling and foreground region extraction can be implemented in combination with a background difference method. Typically, the first frame is taken as background by default.
In an alternative embodiment, the sampling module 322 may use a semi-dense optical flow method to sample the foreground region at intervals to obtain the movement information of the pixel points. The semi-dense optical flow method is a pixel-level image registration method for carrying out point-by-point matching on an image, is different from a sparse optical flow method which only aims at a plurality of characteristic points on the image and is also different from a dense optical flow method which aims at all points on the image, is only used for carrying out interval sampling on a foreground area, and then obtains the movement information of the pixel points by calculating an optical flow field of the semi-dense pixel points. The semi-dense optical flow method combines the characteristics of high calculation speed of the sparse optical flow method and strong adaptability of the dense optical flow method, can overcome the problem that effective pixels cannot be extracted by the sparse optical flow method under a low light condition, and can also overcome the problem of low calculation speed of the dense optical flow method.
The visual result obtaining module 33 is configured to obtain a visual detection result of the object according to the detection results of the object at a plurality of moments and the movement information of the pixel points.
In an alternative embodiment, visual result acquisition module 33 includes a trajectory generation module 330, a trajectory classification module 332, and a visual analysis module 334. The track generation module 330 is configured to generate a motion track of the object according to detection results of the object at multiple moments and movement information of the pixel points. The track classification module 332 is configured to classify a motion track of an object, and obtain a track state of the object. The vision analysis module 334 is configured to obtain a vision detection result of the object according to the category detection information of the object and the track state of the object.
In an alternative embodiment, generating the motion trail of the object according to the detection results of the object at a plurality of moments and the movement information of the pixel points includes: acquiring position prediction information of the object at the time (t+1) according to the position detection information in the detection result of the object at the time t and the movement information of the pixel point; judging whether the detection result of the object at the time (t+1) and the detection result of the object at the time t belong to the same track according to the detection result of the object at the time t, the detection result of the object at the time (t+1) and the position prediction information of the object at the time (t+1), and acquiring a judgment result; and generating the motion track of the object according to the judgment result. For example, if the judgment result indicates that they belong to the same track, the position of the object at the time (t+1) is connected with the position of the object at the time t to generate the motion track of the object; if the judgment result shows that they do not belong to the same track, a new motion track is created according to the position of the object at the time (t+1).
Specifically, judging whether the detection result of the object at the time (t+1) and the detection result of the object at the time t belong to the same track according to the detection result of the object at the time t, the detection result of the object at the time (t+1) and the position prediction information of the object at the time (t+1), and acquiring the judgment result includes: judging whether the categories of the objects are the same according to the category detection information in the detection result of the object at the time t and the category detection information in the detection result of the object at the time (t+1); judging whether the distance between the position prediction information of the object at the time (t+1) and the position detection information of the object at the time (t+1) is smaller than a preset threshold according to the position detection information in the detection result of the object at the time (t+1) and the position prediction information of the object at the time (t+1); when the category of the object at the time t is the same as the category of the object at the time (t+1), and the distance between the position prediction information of the object at the time (t+1) and the position detection information of the object at the time (t+1) is smaller than the preset threshold, the judgment result shows that the detection result of the object at the time (t+1) and the detection result of the object at the time t belong to the same track; otherwise, they do not belong to the same track.
In another alternative embodiment, to obtain more accurate position prediction information of the object at the time (t+1), according to the position detection information in the detection result of the object at the time t and the movement information of the pixel point, obtaining the position prediction information of the object at the time (t+1) includes: acquiring a predicted position of the object at the time (t+1) according to position detection information in a detection result of the object at the time t and the speed of the pixel point at the time t; acquiring the speed of a pixel point at the time (t+1) according to the position prediction information of the object at the time (t+1); the speed of the pixel point at the moment t and the speed of the pixel point at the moment (t+1) are weighted and averaged to obtain an average speed; and acquiring the position prediction information of the object at the time (t+1) according to the position detection information and the average speed in the detection result of the object at the time t.
In an alternative embodiment, classifying the motion track of the object and obtaining the track state of the object includes: extracting track information in the motion track of the object, wherein the track information comprises at least one of the following: the initial position of the motion track, the final position of the motion track, the maximum position of the motion track and the maximum displacement between adjacent nodes of the motion track; and classifying the motion track of the object through a decision tree algorithm according to the track information, so as to obtain the track state of the object. The track state of the object may include false detection, true placement, true taking, suspected taking, suspected placement, and the like.
Alternatively, the visual detection results of objects can be divided into two categories: (1) confirmed detection results; and (2) possible supplementary detection results. The confirmed detection results include: certain objects are put in or taken out; the possible supplementary detection results include: suspected taking of certain objects, suspected putting of certain objects, and the like.
The gravity result obtaining module 34 is configured to obtain gravity information, and obtain a gravity detection result of the object according to the gravity information.
In an alternative embodiment, the gravity result acquisition module 34 includes a gravity sensor 340 and a gravity analysis module 342. The gravity sensor 340 is used for acquiring gravity information at different moments. The gravity analysis module 342 exhausts all possible gravity detection results of the object according to the difference value of the gravity information at different moments, such as the difference between the initial gravity stable value when the object storage device is opened and the end gravity stable value when the object storage device is closed. The gravity detection result may include whether a commodity is taken or put in, and the category and number of the commodity. An exhaustive approach may be to traverse, by permutation and combination, all possible results based on the difference values of the gravity information at different moments. For example, if the difference between the end gravity stable value and the initial gravity stable value is -500 grams, the possibilities exhausted from all object information in the object storage device include: (1) taking 1 bottle of 500 ml cola; (2) taking 1 bag of 500 g Youguan crisp biscuits; (3) taking 1 can of 200 g Anmousse yogurt and 1 bag of 300 g double sausage. Accordingly, if the difference between the end gravity stable value and the initial gravity stable value is +500 grams, the possibilities exhausted from all object information in the object storage device include: (1) putting in 1 bottle of 500 ml cola; (2) putting in 1 bag of 500 g Youguan crisp biscuits; (3) putting in 1 can of 200 g Anmousse yogurt and 1 bag of 300 g double sausage.
Alternatively, the gravity information of the preselected area may be acquired in real time or at predetermined intervals by the gravity sensor 340 after the object storage device is turned on, and the gravity sensor 340 stops acquiring the gravity information of the object after the object storage device is turned off or after the user's taking action is detected to stop.
Alternatively, the gravity sensor may be disposed in the object storage device, for example, when the object storage device is an intelligent container, the gravity sensor may be disposed on each layer of shelves of the intelligent container, where each layer of shelves is a preselected area.
Of course, the person skilled in the art may adjust the type and number of the gravity sensors 340 according to actual needs without being limited to the examples given herein, and when the number of the gravity sensors 340 is 2 or more, the same type or a combination of different types of gravity sensors 340 may be used. The gravity sensor 340 may be disposed in an object storage device.
And the fusion module 35 is used for fusing the visual detection result and the gravity detection result to determine the identification result of the object.
Optionally, the fusion module 35 matches the visual detection result with the gravity detection results and assigns different weights to the gravity detection results according to their matching degree with the visual detection result; typically, a gravity detection result with a higher matching degree to the visual detection result is assigned a higher weight and a gravity detection result with a lower matching degree is assigned a lower weight, and the gravity detection result with the highest weight is then selected as the final gravity detection result. The recognition result of the object may include whether the object is taken or put in, the object category, the number of objects and the specific name of each object category, and the like. Optionally, the object categories in the embodiment of the present invention include, but are not limited to: vegetables, fruits, snacks, fresh meats, seafood, and the like.
Because the object may be occluded, the occluded object cannot be accurately analyzed from the visual detection result alone; therefore, fusing the gravity detection result with the visual detection result of the object can further reduce the false recognition rate and the probability of missed recognition, so that an accurate object recognition result is obtained.
Through the above modules, even if the object is occluded, the gravity detection result obtained based on the gravity information can be corrected based on the visual detection result obtained from the image, so that the problem of poor accuracy caused by identifying the object with the visual detection result alone is solved, the categories and quantities of objects can be accurately identified, the technical problems in the related art that objects cannot be stacked when placed and that identification accuracy is low are solved, and the space utilization rate of the object storage device is improved.
According to another aspect of the embodiment of the present invention, there is also provided an electronic device, including: a processor; and a memory for storing executable instructions of the processor; wherein the processor is configured to perform the object recognition method of any of the above via execution of the executable instructions.
According to another aspect of the embodiment of the present invention, there is also provided a storage medium, where the storage medium includes a stored program, and when the program runs, the device on which the storage medium is controlled to execute the object recognition method of any one of the above items.
The foregoing embodiment numbers of the present invention are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.
In the foregoing embodiments of the present invention, the description of each embodiment has its own emphasis; for portions not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed technology may be implemented in other manners. The above-described embodiments of the apparatus are merely exemplary, and the division of the units, for example, may be a logic function division, and may be implemented in another manner, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be through some interfaces, units or modules, or may be in electrical or other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium, which includes instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.
The foregoing is merely a preferred embodiment of the present invention. It should be noted that several modifications and adaptations can be made by those skilled in the art without departing from the principles of the present invention, and such modifications and adaptations shall also be regarded as falling within the scope of the present invention.