Detailed Description
The following detailed description illustrates embodiments of the disclosure and the manner in which the embodiments may be implemented. While the best mode for carrying out the disclosure has been disclosed, those skilled in the art will recognize that other embodiments for carrying out or practicing the disclosure are also possible.
Referring to Fig. 1, a retail store environment 100 is shown in which various embodiments of the present disclosure may be practiced. The retail store environment 100 includes first through third shelves 102a-102c for storing and displaying one or more items. The retail store environment 100 also includes first through third cashier terminals 104a-104c, staffed by first through third cashiers 106a-106c, respectively, to scan and bill items in the shopping carts of the respective customers. The retail store environment 100 also includes an SCO surface area 108, the SCO surface area 108 including one or more self-checkout (SCO) terminals to enable respective customers to each scan and bill the items present in their shopping carts. The SCO surface area 108 is explained in further detail with reference to Fig. 2.
Fig. 2 illustrates a central control unit 200 for operating the SCO surface area 108 of a retail store in accordance with an embodiment of the present disclosure. The SCO surface area 108 includes first through fifth SCO terminals 202a through 202e (hereinafter collectively referred to as SCO terminals 202), corresponding first through fifth customers 204a through 204e with their corresponding first through fifth shopping carts 206a through 206e, a central camera 208, and one or more store caretakers 210.
In embodiments of the present disclosure, the various components of the SCO surface area 108 may be communicatively coupled with the central control unit 200 through a communication network. The communication network may be any suitable wired network, wireless network, a combination of these networks, or any other conventional network, without limiting the scope of the present disclosure. Some examples may include a Local Area Network (LAN), a wireless LAN connection, an internet connection, a point-to-point connection, or other network connection, and combinations thereof. By way of example, the network may comprise a mobile communication network, such as a 2G, 3G, 4G or 5G mobile communication network. The communication network may be coupled to one or more other networks to provide coupling between a greater number of devices. This may be the case, for example, where the networks are coupled together via the internet.
Each SCO terminal 202a to 202e is equipped with a scanner for enabling the respective customer to scan one or more items themselves, and a user display for enabling the user to make the necessary selections and payments for one or more items. For example, the scanner may be a bar code scanner for scanning a bar code of an item to identify the item. Preferably, the scanner is a fixed wall- or desktop-mounted scanner designed for checkout counters in supermarkets and other retail stores for scanning items placed in the scanning zone. In the context of the present disclosure, a scanning zone is an area in front of a scanner in which a user places items for scanning to purchase the items. Each SCO terminal 202a-202e may include a processor (not shown) for recording a scan of one or more items and providing instructions on a corresponding user display for payment of the one or more scanned items. In embodiments of the present disclosure, the processor of each SCO terminal 202a-202e may be communicatively coupled with the central control unit 200 to enable the central control unit 200 to control the operation of the SCO terminal 202 and also process information captured by the central camera 208.
In an embodiment of the present disclosure, each SCO terminal 202a-202e is equipped with one or more overhead cameras 207a-207e, respectively, to continuously capture the scanning area of the corresponding SCO terminal 202a-202e in order to detect scanning violations due to a mismatch between the item picked up by the user for scanning and the item actually scanned at each SCO terminal 202a-202e. A scanning violation may occur when items identified as to be scanned during a predetermined time interval are not present in the scanned item list generated by the scanner during the corresponding interval. For example, a user may place an item in the scanning area of the scanner, but the user may hold the item in a manner such that the bar code scanner cannot see the bar code of the item. In this case, the user may place the item in their shopping bag after performing the scanning action, but in reality the scanner may not have scanned the item and the user may not be billed for the item. In embodiments of the present disclosure, the overhead cameras 207a-207e may be communicatively coupled to the central control unit 200 such that the central control unit 200 is configured to control the operation of the overhead cameras 207a-207e and also process the information captured by the overhead cameras 207a-207e.
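By way of a non-limiting illustration of the mismatch check described above, the following Python sketch compares the items a camera-based detector reports during a time interval with the scanner's item list for the same interval; the function name, data shapes, and example identifiers are assumptions for illustration only, not the disclosed implementation.

```python
from collections import Counter

def detect_scan_violations(camera_items, scanned_items):
    """Return items seen by the overhead camera during an interval
    that do not appear (often enough) in the scanner's item list.

    camera_items  -- list of product identifiers inferred from video
    scanned_items -- list of product identifiers reported by the scanner
    Both lists cover the same predetermined time interval.
    """
    seen = Counter(camera_items)
    billed = Counter(scanned_items)
    # Any surplus of camera-detected items over scanned items is a
    # candidate scanning violation for that interval.
    violations = seen - billed
    return list(violations.elements())

# Hypothetical usage: "oat_milk" was picked up twice but scanned once.
print(detect_scan_violations(
    ["oat_milk", "oat_milk", "bread"],
    ["oat_milk", "bread"]))          # -> ['oat_milk']
```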
The central camera 208 is configured to generate an overview image of the entire SCO surface area 108. Examples of the central camera 208 include an overhead 360° camera, a 180° camera, and the like. In embodiments of the present disclosure, the central camera 208 may be communicatively coupled to the central control unit 200 to enable the central control unit 200 to control the operation of the central camera 208 and also process information captured by the central camera 208. The central camera 208 may facilitate improving the customer experience in the SCO surface area 108. For example, the central camera 208 may detect customers with children, or overflowing shopping carts, at the entry point of the SCO surface area 108, and the store caretakers 210 may be alerted to provide support during checkout. If no caretaker is available, such customers may be prioritized for support when a caretaker becomes available.
Although not shown, the central control unit 200 may be communicatively coupled to a computing device of the store caretaker 210 to issue an alarm/notification or instruction to the computing device.
In an embodiment of the present disclosure, the central control unit 200 includes a central processing unit 214, a memory 216, and an operation panel 218. The central processing unit 214 includes a processor, computer, microcontroller, or other circuitry that controls the operation of the various components, such as the operation panel 218 and the memory 216. The central processing unit 214 may execute software, firmware, and/or other instructions stored, for example, on volatile or non-volatile memory (such as the memory 216) or otherwise provided to the central processing unit 214. The central processing unit 214 may be connected to the operation panel 218 and the memory 216 by a wired or wireless connection, such as one or more system buses, cables, or other interfaces.
The operation panel 218 may be a user interface and may take the form of a physical keyboard or touch screen. The operation panel 218 may receive input from one or more users regarding selected functions, preferences, and/or authentication, and may provide and/or receive input visually and/or audibly.
In addition to storing instructions and/or data used by the central processing unit 214, the memory 216 may also include user information associated with one or more operators of the SCO surface area 108. For example, the user information may include authentication information (e.g., a username/password pair), user preferences, and other user-specific information. The central processing unit 214 may access this data to help provide control functions (e.g., send and/or receive one or more control signals) related to the operation of the operation panel 218 and the memory 216.
In an embodiment of the present disclosure, the central processing unit 214 is configured to detect one or more scanning violation conditions based on information received from the overhead cameras 207a-207e and the scanners of the SCO terminals 202a-202e, and to lock the respective one or more SCO terminals 202a-202e based on the detected scanning violation conditions, i.e., so that the customer cannot continue scanning products. Once a terminal is locked, the central processing unit 214 may alert the store caretaker 210 accordingly. In the context of the present disclosure, the store caretaker 210 can manually verify whether the reported scanning violation condition is valid.
In an embodiment of the present disclosure, the central processing unit 214 is configured to automatically lock an SCO terminal, e.g., the first SCO terminal 202a, based on the lock status of the other SCO terminals. In one example, the central processing unit 214 is configured to lock the first SCO terminal 202a when a scanning violation condition is detected at the first SCO terminal 202a and the number of other SCO terminals that have already been locked (e.g., the second SCO terminal 202b and the third SCO terminal 202c) is less than a first threshold. When the number of other SCO terminals that have already been locked has reached the first threshold, the central processing unit 214 refrains from locking the first SCO terminal 202a unless the number of scanning violations detected at the first SCO terminal 202a has reached a second threshold.
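A minimal sketch of this two-threshold locking policy is given below; the function signature, argument names, and default threshold values are illustrative assumptions rather than the disclosed implementation.

```python
def should_lock(violation_detected: bool,
                locked_terminals: int,
                violations_at_terminal: int,
                first_threshold: int = 2,
                second_threshold: int = 3) -> bool:
    """Decide whether to lock an SCO terminal after a scanning violation.

    Lock when few other terminals are already locked; once the first
    threshold is reached, only lock if this terminal's own violations
    reach the second threshold. Default thresholds are hypothetical.
    """
    if not violation_detected:
        return False
    if locked_terminals < first_threshold:
        return True
    return violations_at_terminal >= second_threshold
```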
In another embodiment of the present disclosure, the central processing unit 214 is configured to automatically lock an SCO terminal, such as the first SCO terminal 202a, based on the location of the store caretaker or SCO surface area supervisor 210 and his or her status (i.e., free or busy). The location here means a physical location: the physical location of the store caretaker and the location of the associated SCO terminal are used to determine the distance between them. A smaller distance implies a shorter response time for the store caretaker 210. To take advantage of this, the central processing unit 214 locks the first SCO terminal 202a only when the store caretaker is within a predetermined distance of the SCO terminal. If the distance is greater than the predetermined distance, the central processing unit 214 will not lock the first SCO terminal 202a.
In yet another embodiment of the present disclosure, the central processing unit 214 is configured to automatically lock SCO terminals, such as the first SCO terminal 202a, based on the length of the sequence of non-scanning events at each SCO terminal since its last lock. It is possible that, although a non-scanning event occurs at the first SCO terminal 202a, the central processing unit 214 does not lock the terminal, in order to reduce customer friction. For example, during Black Friday, the central processing unit 214 may be configured to ignore the first three non-scanning events at the first SCO terminal 202a. However, if a fourth non-scanning event occurs at the first SCO terminal 202a, the first SCO terminal 202a may be locked.
In yet another embodiment of the present disclosure, the central processing unit 214 is configured to automatically lock an SCO terminal (such as the first SCO terminal 202a) based on the status of the respective cart, such as a full-load cart (a cart containing a large amount of product), a bulk-load cart (a cart containing a small number of product categories but a large quantity of each product), or a cart containing a large object (e.g., a television). Large objects are those whose size is greater than a predetermined threshold size. In addition, the scanning of a bulk-load cart is much faster because it involves scanning only a few distinct items and then manually entering the quantity of each item. In one example, the central processing unit 214 may be configured to lock the first SCO terminal 202a when a full-load or bulk-load cart is detected and a store caretaker is nearby, so that the corresponding customer will receive assistance from the store caretaker 210. The central processing unit 214 is further configured to notify the store caretaker 210 to proactively assist when a full-load or bulk-load cart is detected at the entrance to the SCO surface area 108.
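As a hedged sketch of the cart-status policy described above, the following fragment maps a detected cart status to an action; the enumeration values and action names are hypothetical and not part of the disclosure.

```python
from enum import Enum, auto

class CartStatus(Enum):
    NORMAL = auto()
    FULL_LOAD = auto()     # a large amount of different products
    BULK_LOAD = auto()     # few categories, large quantity of each
    LARGE_OBJECT = auto()  # contains an item above a size threshold

def cart_policy(status: CartStatus, caretaker_nearby: bool,
                at_entrance: bool) -> str:
    """Illustrative cart-status policy; action names are hypothetical."""
    if status in (CartStatus.FULL_LOAD, CartStatus.BULK_LOAD):
        if at_entrance:
            return "notify_caretaker_to_assist_proactively"
        if caretaker_nearby:
            return "lock_terminal_and_notify_caretaker"
    return "continue"
```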
In yet another embodiment of the present disclosure, the central processing unit 214 is configured to automatically lock the exit door of the SCO surface area 108 and issue a notification to the store caretaker 210 when a large product is moving through the exit of the SCO surface area 108 and is not present in the scanned product list. The exit door is the door of the retail store through which products may be removed after the self-checkout process is completed.
In yet another embodiment of the present disclosure, the central processing unit 214 is configured to inform the store caretaker 210 when the product has been transferred from one customer to another.
In yet another embodiment of the present disclosure, the central processing unit 214 is configured to send an alert to the computing device of the store caretaker 210 when the queue length at the entrance of the SCO surface area 108 is greater than a predetermined third threshold, so that more service personnel, if available, can be assigned to the area. The alert may be in the form of an audible signal, visual display, tactile alert, instant message, or the like. The entrance of the SCO surface area 108 may be an entry point from which a customer enters the SCO surface area 108 to initiate a self-checkout process. The central processing unit 214 may be configured to change the first and second thresholds when the queue length at the entrance of the SCO surface area 108 is above the third threshold. In the context of the present disclosure, a 360-degree camera may be used to automatically determine the queue length.
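A minimal sketch of this queue-length reaction follows; the threshold values, the direction in which the first and second thresholds are adjusted, and the action names are assumptions for illustration only.

```python
def on_queue_length(queue_length: int, thresholds: dict) -> list:
    """Illustrative reaction to the entrance queue length measured by the
    360-degree camera; threshold names and actions are assumptions."""
    actions = []
    if queue_length > thresholds["third"]:
        actions.append("alert_caretaker_devices")   # audible/visual/haptic/IM
        # Assumed adjustment: relax the locking thresholds to keep
        # customers flowing while the queue is long.
        thresholds["first"] += 1
        thresholds["second"] += 1
    return actions

thresholds = {"first": 2, "second": 3, "third": 6}
print(on_queue_length(8, thresholds))   # -> ['alert_caretaker_devices']
```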
In yet another embodiment of the present disclosure, the central processing unit 214 is configured to automatically lock the SCO terminals based on an emergency event (e.g., a person holding a gun). In embodiments of the present disclosure, a video camera and the central camera 208 may be used to detect emergency situations. For example, the central camera 208 may be used to detect a person holding, or indeed waving, a gun.
In embodiments of the present disclosure, the above-described parameters may be preconfigured by a store manager of the corresponding retail store or by a person managing the entire security system. Based on the preconfigured parameters, the real-time information captured by the central camera 208 and the overhead cameras 207a-207e, the status of the SCO terminals 202, and the status of the store caretakers 210, the central processing unit 214 automatically controls the locking of the SCO terminals 202 and sends messages to the store caretaker 210 and the store manager. In an embodiment of the present disclosure, the central processing unit 214 is configured to dynamically create and adjust store-customer interactions for each SCO terminal 202a-202e of the SCO surface area 108, and to optimize customer flow to the SCO terminals 202. An SCO terminal may be unlocked with the intervention of a store attendant/serviceman/SCO surface area supervisor 210.
In various embodiments of the present disclosure, the central processing unit 214 is configured to reduce the overall waiting queue and increase customer satisfaction in the SCO surface area 108 by balancing the cost of caretaker intervention at the SCO terminals 202a-202e against the cost of potential product leakage (product that may leave the store without being billed). In embodiments of the present disclosure, the central processing unit 214 may be configured to calculate a cost value for each minute of customer waiting time and a cost value for each leaked product. The cost of one leaked product may be weighted relative to the cost of another. The central processing unit 214 may also be configured to predict the aggregate waiting time by taking into account the number of locked terminals, the number of store caretakers present in the area, and the queue length at the entrance of the SCO area, and to build a model indicating how many minutes of waiting time may be added if a new alarm is triggered.
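A rough, non-limiting sketch of this cost trade-off is given below; the linear wait-time model and all coefficients are assumptions for illustration only, and in practice such a model would be fitted from historical SCO area data.

```python
def expected_added_wait_minutes(locked_terminals: int,
                                caretakers_in_area: int,
                                entrance_queue_len: int) -> float:
    # Hypothetical linear model of the extra waiting time caused by one
    # more locked terminal; coefficients are placeholders.
    return (1.5 * (locked_terminals + 1)
            + 0.5 * entrance_queue_len
            - 1.0 * caretakers_in_area)

def should_raise_alarm(leak_cost: float, wait_cost_per_minute: float,
                       locked_terminals: int, caretakers_in_area: int,
                       entrance_queue_len: int) -> bool:
    """Raise the alarm (and lock) only if the potential leakage cost
    outweighs the predicted cost of the extra customer waiting time."""
    added_wait = expected_added_wait_minutes(
        locked_terminals, caretakers_in_area, entrance_queue_len)
    return leak_cost > max(added_wait, 0.0) * wait_cost_per_minute
```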
Fig. 2 is merely an example. Those of ordinary skill in the art will recognize many variations, alternatives, and modifications of the embodiments herein.
Fig. 3 is a schematic diagram of the steps of a method 300 of operating the SCO surface area 108 by the central processing unit 214 of the central control unit 200 according to the present disclosure. The method is described as a collection of steps in a logic flow diagram, which represents a sequence of steps that may be implemented in hardware, software, or a combination thereof.
In step 302, a non-scanning event at an SCO terminal of the SCO surface area is identified. A non-scanning event refers to an event in which a user presents an item to be scanned in the scanning area of the corresponding scanner, but the item may or may not be successfully scanned by the scanner. For example, a user may place an item in the scanning area of the scanner, but the manner in which the user holds the item may result in the bar code scanner not being able to see the bar code of the item. Actions corresponding to a non-scanning event may not be captured by the scanner, but may be captured by an overhead camera disposed above the SCO terminal.
In step 304, a check is performed to determine if the number of other locked SCO terminals is less than a first threshold, and in step 306, if the number is less than the first threshold, the SCO terminal is automatically locked. The value of the first threshold may be set based on the number of SCO terminals and store caretakers in the corresponding SCO surface area.
If the number of other locked SCO terminals has reached the first threshold, then at step 308, a check is performed to determine if the number of consecutive non-scanning events at the SCO terminal has reached the second threshold.
At step 310, the SCO terminal is automatically locked if the number of detected consecutive non-scanning events has reached the second threshold. For example, the value of the first threshold may be 2 and the value of the second threshold may be 3. Thus, when at least two other terminals have already been locked, a further terminal will be locked only if the current non-scanning event is the third consecutive non-scanning event in the current transaction.
Fig. 3 is merely an example. Those skilled in the art will recognize many variations, alternatives, and modifications of the embodiments of the present disclosure.
Fig. 4 is an illustration of a block diagram of software 400 for operating the SCO surface area of a retail store according to an embodiment of the present disclosure. The software 400 includes a video unit 402, the video unit 402 being communicatively coupled with a plurality of video sensors including a plurality of cameras C1 to Cn installed at different locations around a retail store (not shown). At least some of the cameras C1 to Cn are each installed at a location within a predetermined distance of one of the SCO terminals SCO1 to SCOn in the retail store (not shown). Specifically, at least some of the cameras C1 to Cn are each mounted at a position directly above one of the SCO terminals SCO1 to SCOn to obtain a bird's-eye view thereof.
In one embodiment, the cameras C1 to Cn are configured to capture video clips of the environment within the field of view of the cameras C1 to Cn. A video clip from a camera C1 to Cn comprises a plurality of P consecutively captured video frames Fr(τ), Fr(τ+Δt), ..., Fr(τ+(P-1)Δt), where P is the number of video frames in the captured video clip. A given video frame Fr(τ+iΔt) is captured at a time instant (also referred to as a sampling time) τ+iΔt, where τ is the time at which capture of the video clip begins and Δt is the time interval (also referred to as the sampling interval) between the capture of one video frame and the capture of the next. Using this notation, a video clip captured by a camera C1 to Cn can be described as the set {Fr(τ+iΔt) : i = 0, 1, ..., P-1}.
In one embodiment, the software 400 further includes an SCO unit 404, the SCO unit 404 being communicatively coupled with a plurality of SCO terminals SCO1 to SCOn in the retail store. In particular, the SCO unit 404 is configured to receive transaction data including sale till data from each of the SCO terminals SCO1 to SCOn, wherein the sale till data includes the Universal Product Codes (UPCs) of products detected by the scanner devices (not shown) of the SCO terminals SCO1 to SCOn during product scans performed at the SCO terminals SCO1 to SCOn. The sale till data also includes the quantities of those same products.
In one embodiment, the SCO unit 404 is further configured to receive status signals from each SCO terminal SCO1 to SCOn. The status signal may comprise an indicator for indicating whether the SCO terminal SCO1 to SCOn is locked or active. The status signal may also include a timestamp indicating when the SCO terminal SCO1 to SCOn was locked. In one embodiment, the status signal may be obtained from an NCR Remote Access Program (RAP) Application Program Interface (API). However, those skilled in the art will recognize that the above source of status signals is provided for illustrative purposes only. In particular, those skilled in the art will recognize that the software of the preferred embodiments is not limited to the status signal source described above. Rather, the software of the preferred embodiments may operate with any source of status signals, including the APIs of any manufacturer of SCO terminals.
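For illustration only, a status signal of the kind described above might be represented as follows; the field names are assumptions and do not reflect the NCR RAP API or any particular manufacturer's interface.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class ScoStatusSignal:
    """Hypothetical shape of a status signal received by the SCO unit;
    the real fields depend on the terminal manufacturer's API."""
    terminal_id: str
    locked: bool                      # True if the terminal is locked
    locked_at: Optional[datetime]     # timestamp of the lock, if locked

signal = ScoStatusSignal("SCO3", True, datetime(2024, 1, 5, 10, 42))
```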
In one embodiment, the SCO unit 404 is further configured to issue control signals to each SCO terminal SCO1 to SCOn to lock the SCO terminal. In one embodiment, the issuance of control signals to a given SCO terminal SCO1 to SCOn, the receipt of control signals by the associated SCO terminal, and the execution of locking operations in response to the received control signals are undertaken through an NCR Remote Access Program (RAP) Application Program Interface (API). However, those skilled in the art will recognize that the mechanisms of the above-described issuance of control signals to SCO terminals, the receipt of control signals by the associated SCO terminals, and the execution of locks by the SCO terminals are for illustrative purposes only. In particular, those skilled in the art will recognize that the software of the preferred embodiments is not limited to the mechanisms described above. Rather, the software of the preferred embodiments may operate with any mechanism for issuing control signals to SCO terminals, receiving control signals by an associated SCO terminal, and performing locking of SCO terminals, including APIs of any manufacturer of SCO terminals.
In one embodiment, the software 400 further includes a control unit 406 communicatively coupled with the video unit 402 and the SCO unit 404. The control unit 406 is configured to receive video clips captured by the cameras C1 to Cn from the video unit 402. The control unit 406 is further configured to receive status signals from each of the SCO terminals SCO1 to SCOn from the SCO unit 404. The status signal may include an indicator for indicating whether the corresponding SCO is locked or operational. In the case where the SCO is locked, the status signal from the SCO terminal may include a timestamp indicating the time the SCO terminal was locked. Similarly, the control unit 406 is configured to issue control signals to the SCO unit 404, which are configured to cause locking of the designated SCO terminal.
In an embodiment, the control unit 406 is also communicatively coupled with a person classification module (HCM) 408, a person tracking module (HTM) 410, a motion detection module 412, and an object identification module 414. Each of these modules and their operation will be described in more detail below. The control unit 406 itself includes a processing unit 416 communicatively coupled to a logic unit 418, which logic unit 418 is in turn communicatively coupled to the SCO unit 404. Each of these units and their operation will also be described in more detail below.
In an embodiment, the people classification module 408 is configured to receive, from the control unit 406, video frames from the video clips captured by the cameras C1 to Cn installed at different locations around a retail store (not shown). In another embodiment, the people classification module 408 is configured to process the video frames Fr(τ+iΔt) to detect the presence of persons in the video frames and to classify each of the detected persons as one of a child, an adult customer, and a member of staff.
In an embodiment, the people classification module 408 may be implemented by an object detection machine learning (ML) algorithm, such as EfficientDet (as described in M. Tan, R. Pang, and Q. V. Le, "EfficientDet: Scalable and Efficient Object Detection", 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, Washington, 2020, pages 10778-10787). Alternatively, the people classification module (HCM) 408 may be implemented by a panoptic segmentation algorithm, such as the bidirectional aggregation network (BANet) (as described in Y. Chen, G. Lin, S. Li, O. Bourahla, Y. Wu, F. Wang, J. Feng, M. Xu, and X. Li, "BANet: Bidirectional Aggregation Network with Occlusion Handling for Panoptic Segmentation", Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pages 3793-3802).
Those skilled in the art will recognize that the above examples of algorithms for object detection and panoptic segmentation are provided for illustrative purposes only. In particular, those skilled in the art will recognize that the preferred embodiments are not limited to the above-described algorithms. Rather, the preferred embodiments may operate with any algorithm suitable for object detection in video frames or for combined instance segmentation and semantic segmentation of video frames, respectively, such as YOLOv4 (as described in A. Bochkovskiy, C.-Y. Wang, and H.-Y. M. Liao, arXiv:2004.10934, 2020) or AUNet (as described in Y. Li, X. Chen, Z. Zhu, L. Xie, G. Huang, D. Du, and X. Wang, "Attention-Guided Unified Network for Panoptic Segmentation", Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pages 7026-7035).
The object detection or panoptic segmentation algorithm aims at:
● detecting one or more persons present in the video frame;
● establishing positioning information for each detected person (e.g., by a box of interest (hereinafter interchangeably referred to as a bounding box) established around the detected person in the video frame by the object detection algorithm); and
● determining whether each detected person is an adult customer, a child, or a member of staff.
Fig. 5 illustrates video frames Fr(τ+iΔt) captured by cameras C1 to Cn installed in a retail store (not shown), and an example of the processing of those video frames by the people classification module 408 of the software 400 of Fig. 4, in accordance with an embodiment of the present disclosure.
In one embodiment, referring now to Fig. 5, the object detection algorithm detects primary objects of interest consisting of persons in a received video frame 500 and ignores secondary objects of interest that appear in the video frame 500, such as cash registers, shopping carts, and stacked items. Each detected person is represented by a bounding box substantially surrounding that person. The bounding box facilitates subsequent tracking of the individual. The object detection algorithm then distinguishes between the staff member 502, the child 504, and the adult customer 506. The distinction between the staff member 502 and the adult customer 506 may be premised on the staff member 502 wearing a uniform having a unique color or a unique pattern (including a prominent logo on the uniform).
In an embodiment, the object detection or panoptic segmentation algorithm is trained with video frames selected from video clips captured by a plurality of cameras installed at different locations within a retail store. These video frames will hereinafter be referred to as the training dataset. The individual video frames of the training dataset are selected and compiled to provide robust and class-balanced information about staff, children, and adult customers from views obtained with different positionings and orientations of the cameras. In addition, the video frames of the training dataset are selected from video clips acquired from various locations within the retail store. Similarly, the video frames of the training dataset include individuals wearing different types and colors of clothing. Members of the training dataset may also undergo further data enhancement techniques (e.g., rotation, flipping, brightness change) to generate more video frames, thereby increasing the size of the training dataset, preventing overfitting of, and regularizing, the deep neural network model, balancing categories within the training dataset, and synthetically generating new video frames that are more representative of the current task. Thus, the video frames of the training dataset are balanced in terms of gender, age, and skin tone.
The video frames of the training data set for the object detection algorithm are manually marked with bounding boxes arranged substantially around each individual visible in the video frame and having appropriate corresponding category labels, i.e. adult clients/staff/children. Members of the training dataset are organized in pairs, wherein each data pair includes a video frame and a corresponding XML file. The XML file includes bounding box coordinates and corresponding tags for each bounding box relative to the coordinate system of the video frame.
In contrast, individual pixels of each video frame of the training dataset for the panoptic segmentation algorithm are suitably manually labeled with category labels, such as adult customer/staff/child. Each pixel is also marked with an instance number indicating to which instance of a given class the pixel corresponds. For example, the instance number may indicate whether the pixel corresponds to a second adult customer visible in the video frame or a third child visible in the video frame. Members of the training dataset are organized in pairs, wherein each data pair includes a video frame and a corresponding XML file. The XML file includes a class label and an instance number for each pixel in the corresponding video frame.
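As a hedged illustration of the object detection annotation pairs described above, a training pair might be parsed as follows; the disclosure only states that each XML file holds bounding-box coordinates and a class label per box, so the tag names and layout below are assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation file paired with one training video frame.
ANNOTATION = """
<frame>
  <object><label>adult_customer</label>
    <bndbox><xmin>120</xmin><ymin>40</ymin><xmax>260</xmax><ymax>420</ymax></bndbox>
  </object>
</frame>
"""

def parse_annotation(xml_text: str):
    """Return (label, xmin, ymin, xmax, ymax) tuples for each bounding box."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter("object"):
        b = obj.find("bndbox")
        boxes.append((obj.findtext("label"),
                      int(b.findtext("xmin")), int(b.findtext("ymin")),
                      int(b.findtext("xmax")), int(b.findtext("ymax"))))
    return boxes

print(parse_annotation(ANNOTATION))
```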
In an embodiment, returning now to Fig. 4, after training of the object detection algorithm is completed, the output of the object detection algorithm in response to a video frame Fr(τ+iΔt) subsequently received by the people classification module 408 comprises a set of bounding boxes Bnd_bx_i(t) (each defined by two opposite corners) and a corresponding set of class labels. In contrast, after training of the panoptic segmentation algorithm is completed, the output of the panoptic segmentation algorithm in response to a video frame Fr(τ+iΔt) subsequently presented to the people classification module 408 includes a class label and an instance number for each pixel in the video frame. The people classification module 408 is configured to communicate the output to the control unit 406.
In one embodiment, person tracking module 410 is configured to receive video frames from control unit 406 from video clips captured by cameras C1 to Cn mounted at different locations.
Typical person re-identification algorithms assume that the physical appearance of a person does not change significantly from one video frame to another. Thus, the physical appearance becomes key information that can be used to re-identify a person. Thus, the person tracking module 410 represents a person by various rich semantic features regarding visual appearance, body movement, and interaction with the surrounding environment. These semantic features essentially form a biometric identification of the person that is used to re-identify the person in a different video frame.
In an embodiment, the person tracking module 410 builds an internal repository for storing semantic features of persons in a store. For brevity, this internal repository will be referred to hereinafter as the gallery feature set. The gallery feature set is populated with feature representations of each person extracted by the trained person re-recognition neural network model. Since the specific identity of these persons is largely unknown, the semantic features of each person are related by person identification data. The person identification data essentially comprises a Person Identifier (PID). In other words, the person tracking module 410 associates a biometric identification of a person with the person's PIDi. The PIDi and corresponding biometric information in the gallery feature set will be deleted at the end of each day or more frequently as desired by the operator.
In an embodiment, another video frame in which the person is visible (i.e., a query image of a query person) may be selected, and the trained person re-identification network extracts a feature representation of the person to establish the relevant semantic features of the person. The feature representation of the person in the query image may correspond to query identification data. The extracted feature representation is compared with the feature representations in the gallery feature set. If a match is found, the person is identified as the person having the PIDi corresponding to the matching feature representation in the gallery feature set. If the feature representation of the query person from the query image does not match any feature representation in the gallery feature set, a new unique PIDi is assigned to the person, and the person's corresponding feature representation is added to the gallery feature set and associated with that PIDi.
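A minimal sketch of this gallery-matching step is shown below, assuming cosine similarity as the comparison function; the similarity threshold and PID naming scheme are illustrative assumptions.

```python
import numpy as np

def reidentify(query_feature: np.ndarray,
               gallery: dict,
               threshold: float = 0.8) -> str:
    """Match a query feature vector against the gallery feature set using
    cosine similarity; assign a new PID if no stored feature matches."""
    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    best_pid, best_sim = None, -1.0
    for pid, feat in gallery.items():
        sim = cosine(query_feature, feat)
        if sim > best_sim:
            best_pid, best_sim = pid, sim
    if best_sim >= threshold:
        return best_pid                      # person re-identified
    new_pid = f"PID{len(gallery) + 1}"       # unseen person: enrol them
    gallery[new_pid] = query_feature
    return new_pid
```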
In one embodiment, the person re-identification network uses a standard ResNet architecture. However, those skilled in the art will recognize that this architecture is provided for illustrative purposes only. In particular, those skilled in the art will recognize that the preferred embodiments are not limited to use with this architecture. Rather, the preferred embodiments may operate with any neural network architecture capable of forming an internal representation of a person's semantic features. For example, the person re-identification network may also employ a batch normalization-Inception (BN-Inception) architecture to normalize the inputs of the layers by re-centering and re-scaling, making training of the machine learning algorithm faster and more stable. In use, the person re-identification network is trained using a dataset comprising:
● video frames in which people appear; and
● annotated bounding boxes that substantially surround each person visible in each video frame.
In an embodiment, the label for each bounding box will include the PIDi of the person enclosed by that bounding box. This enables the same person to be identified across multiple video frames collected from a group of cameras. Thus, a set of bounding boxes labeled with the same PIDi will encapsulate appearance information of the same person extracted from its different views. Thus, the training data includes a set of video frames, each video frame being described by a frame number, the PIDi of the person visible in the video frame, and corresponding bounding box details. Since several persons may be present in a single video frame, the training data for any such video frame will include multiple entries, one for each person visible in the video frame.
The output of the person tracking module 410 is a set of data detailing, for the video frames Fr(τ+iΔt) captured by the cameras installed at different locations within the retail store, the time at which a person was detected and the location of that person in the retail store. The location of the detected person is established from the coordinates of the bounding box established in each video frame in which the person is visible and the identity of the camera capturing that video frame. The output of the person tracking module 410 may also include the extracted feature representation of the person.
In an embodiment, the person tracking module 410 is configured to communicate the output of the person tracking module 410 to the control unit 406.
In one embodiment, the motion detection unit 412 is configured to receive video clips from a camera (not shown) mounted directly above an SCO terminal (not shown) in a retail store (not shown) to provide a bird's-eye view of the SCO terminal (not shown). The motion detection unit 412 is configured to process consecutively captured video frames Fr(τ) and Fr(τ+Δt) in the received video clip to detect movement within a predetermined distance of the SCO terminal (not shown), where the predetermined distance is determined by intrinsic parameters of the camera (not shown), which establish the field of view of the camera (not shown), together with the overhead position of the camera above the SCO terminal (not shown).
In one embodiment, the video frames in a received video clip are encoded using the H.264 video compression standard. The H.264 video format uses motion vectors as a key element for compressing video clips. The motion detection unit 412 detects movement within the predetermined distance of the SCO terminal (not shown) using the motion vectors obtained from decoding of the H.264-encoded video frames. In another embodiment, successively sampled video frames Fr(τ+qΔt) and Fr(τ+(q+1)Δt) are compared to detect differences between them; differences exceeding a predetermined threshold are taken to indicate movement occurring in the interval between the successive samples. The threshold is configured to avoid transient changes, such as a flickering light, being mistaken for motion. Once the motion detection unit 412 detects movement within the predetermined distance of the SCO terminal (not shown), the motion detection unit 412 sends a "motion trigger" signal to the control unit 406.
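For the second embodiment (direct frame comparison rather than H.264 motion vectors), a minimal frame-differencing sketch using OpenCV is shown below; the pixel-difference and changed-pixel-ratio thresholds are assumptions that would in practice be tuned to ignore flicker.

```python
import cv2
import numpy as np

def motion_trigger(prev_frame: np.ndarray, curr_frame: np.ndarray,
                   pixel_delta: int = 25, changed_ratio: float = 0.01) -> bool:
    """Report motion when enough pixels change between successive samples
    Fr(τ+qΔt) and Fr(τ+(q+1)Δt)."""
    prev_gray = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)
    curr_gray = cv2.cvtColor(curr_frame, cv2.COLOR_BGR2GRAY)
    diff = cv2.absdiff(prev_gray, curr_gray)
    changed = np.count_nonzero(diff > pixel_delta)
    return changed / diff.size > changed_ratio
```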
In one embodiment, the object recognition module 414 is configured to receive video clips from a camera installed directly above an SCO terminal (not shown) in a retail store (not shown) to provide a bird's-eye view of the SCO terminal (not shown). The object recognition module 414 may also be configured to receive video clips from a camera (not shown) that is mounted within a predetermined distance of the SCO terminal (not shown) and whose field of view covers an area in which a customer will approach the SCO terminal (not shown) with products to be purchased. The object recognition module 414 is configured to recognize and identify specified objects visible in the received video frames. In one embodiment, the objects may include products from the inventory of the retail store. In another embodiment, the objects may include a shopping cart.
Referring now to Fig. 6, an object identification module of the software of Fig. 4 for operating the SCO surface area of a retail store is shown, according to an embodiment of the present disclosure.
In an embodiment, the object recognition module 414 includes an object detection module 422, the object detection module 422 being communicatively coupled with a cropping module 424, the cropping module 424 in turn being communicatively coupled with an embedding module 426 and a cart evaluation module 428. The embedding module 426 is also communicatively coupled to an expert system 430, and the expert system 430 is further communicatively coupled to an embedding database 432 and a product database 434. Each of these modules and their operation will be described in detail below.
The input to the object detection module 422 is a video frame Fr(τ+iΔt) from a video clip captured by a camera within a predetermined distance of an SCO terminal (not shown) disposed in a retail store (not shown). The predetermined distance is empirically determined based on the layout of the retail store (not shown) and the SCO terminals (not shown) therein, so as to allow detection of products scanned at the SCO terminals (not shown) and of shopping carts as they approach the SCO terminals (not shown). The output of the object detection module 422 includes the location Loc(Obj_i) of each object Obj_i visible in the video frame, as represented by a bounding box substantially surrounding the object, and a corresponding class label for each object. Thus, the output of the object detection module 422 includes the locations and labels of all objects visible in the video frame.
Thus, for a given video frame Fr(τ+iΔt), the object detection module 422 is configured to determine the coordinates of a bounding box that substantially surrounds each object detected in the video frame. The coordinates of the bounding box are established relative to the coordinate system of the received video frame. Specifically, for a given video frame Fr(τ+iΔt), the object detection module 422 is configured to output bounding box details B(τ+iΔt) = [b_1(τ+iΔt), b_2(τ+iΔt), ..., b_j(τ+iΔt)]^T, j ≤ N_obj(τ+iΔt), where N_obj(τ+iΔt) is the number of objects detected in the video frame Fr(τ+iΔt) and b_j(τ+iΔt) is the bounding box surrounding the j-th detected object. The details of each bounding box b_j(τ+iΔt) comprise four variables, namely [x, y], h, and w, where [x, y] are the coordinates of the upper left corner of the bounding box relative to the upper left corner of the video frame Fr(τ+iΔt), and h and w are the height and width of the bounding box. For brevity, the details of each bounding box b_j(τ+iΔt) will be referred to hereinafter as bounding box coordinates.
In an embodiment, the object detection module 422 includes a deep neural network whose architecture is substantially based on EfficientDet (as described in M. Tan, R. Pang, and Q. V. Le, "EfficientDet: Scalable and Efficient Object Detection", 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, Washington, 2020, pages 10778-10787). The architecture of the deep neural network may also be based on YOLOv4 (as described in A. Bochkovskiy, C.-Y. Wang, and H.-Y. M. Liao, arXiv:2004.10934, 2020). However, those skilled in the art will recognize that these neural network architectures are provided for illustrative purposes only. In particular, those skilled in the art will appreciate that the preferred embodiments are not limited to these deep neural network architectures. Rather, the preferred embodiments may operate with any deep neural network architecture and/or training algorithm suitable for detecting and locating objects in video frames. For example, the preferred embodiments may operate with a region-based convolutional neural network (R-CNN), Fast R-CNN, or a single-shot detector (SSD).
The goal of training the deep neural network of the object detection module 422 is to have the deep neural network build an internal representation of the object, which allows the deep neural network to identify the presence of the object in a subsequently received video clip. To this end, the data set for training the deep neural network of the object detection module 422 includes a plurality of video frames captured by cameras within a predetermined distance of an SCO terminal (not shown) disposed in the retail store. The predetermined distance is empirically determined based on the layout of the retail store and the SCO terminals (not shown) therein to allow detection of products scanned at the SCO terminals (not shown) and detection of shopping carts as they approach the SCO terminals (not shown). The video frames are selected and compiled to provide robust and class-balanced information about the subject object from object views obtained from different orientations and directions of the subject object relative to a camera (not shown). For clarity, this data set will hereinafter be referred to as training data set.
Video frames having a similar appearance are removed from the video frames before the video frames are used in the training dataset. Members of the training dataset may also undergo further data enhancement techniques (such as rotation, flipping, brightness change) to generate more video frames, thereby increasing the size of the training dataset, preventing overfitting and regularization of the deep neural network model, balancing categories within the training dataset, and synthetically generating new video frames that are more representative of tasks at hand. In a further preprocessing step, individual video frames of the training data set are provided with one or more bounding boxes, wherein each such bounding box is arranged to substantially enclose an object visible in the video frame. The respective video frame is also suitably provided with category labels, such as "product" or "shopping cart" or "other", which correspond to the or each bounding box in the respective video frame. The category label "product" indicates that the detected object is a product included in the inventory of the retail store, and not a personal item belonging to the customer that may also be visible in the video frame. The category label "shopping cart" indicates that the detected object is a shopping cart that may have different fullness.
In an embodiment, the object detection module 422 is further configured to associate the bounding box coordinates of each object detected in the video frame with the corresponding label classification of the detected object to form a detected object vector. Specifically, the output of the object detection module 422 is one or more detected object vectors DO(τ+iΔt), each comprising the bounding box coordinates and the class label of a detected object. The object detection module 422 is further configured to communicate this output to a cropping module 424.
In an embodiment, the cropping module 424 is communicatively coupled with the object detection module 422 to receive the detected object vectors DO(τ+iΔt) from the object detection module 422. The cropping module 424 is also configured to receive the video frames Fr(τ+iΔt) that are received by the object detection module 422. The cropping module 424 is configured to crop a product cropping area from each received video frame, the perimeter of the product cropping area being established by the bounding box coordinates of the corresponding detected object vector DO(τ+iΔt) whose class label is "product". The cropping module 424 is also configured to resize each product cropping area to the same predetermined size. This predetermined size, which will be referred to hereinafter as the "processed product image size", is empirically established as the size at which the embedding module 426 delivers the best product identification. The cropping module 424 is also configured to send the resulting product cropping areas to the embedding module 426.
In an embodiment, the cropping module 424 is further configured to crop a cart cropping area from each video frame of the video clip received from the camera, the perimeter of the cart cropping area being established by the bounding box coordinates of the corresponding detected object vector DO(τ+iΔt) whose class label is "shopping cart". The cropping module 424 is also configured to resize each cart cropping area to the same predetermined size. This predetermined size, which will be referred to hereinafter as the "processed shopping cart image size", is empirically established as the size at which the cart evaluation module 428 can best assess the fullness state of the shopping cart. The cropping module 424 is also configured to transmit the resulting cart cropping areas to the cart evaluation module 428.
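A minimal sketch of the crop-and-resize operation performed by the cropping module is given below; the fixed output size is an assumed placeholder for the empirically established "processed product image size" or "processed shopping cart image size".

```python
import cv2
import numpy as np

PROCESSED_IMAGE_SIZE = (224, 224)   # assumed value, set empirically

def crop_and_resize(frame: np.ndarray, x: int, y: int, h: int, w: int,
                    out_size=PROCESSED_IMAGE_SIZE) -> np.ndarray:
    """Crop the region given by bounding box coordinates [x, y], h, w
    (upper-left corner, height, width) and resize it to a fixed size
    before passing it to the embedding or cart evaluation module."""
    crop = frame[y:y + h, x:x + w]
    return cv2.resize(crop, out_size)
```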
In an embodiment, the embedding module 426 has two distinct phases of operation, an initial configuration phase and a runtime phase, as will be described below. The embedding module 426 employs deep metric learning, as described in K. Musgrave, S. Belongie, and S.-N. Lim, "A Metric Learning Reality Check", 2020, https://arxiv.org/abs/2003.08505, to learn a unique representation, in the form of an embedding vector, for each product in the inventory of the retail store from video frames in which the product is visible. This enables identification of products visible in subsequently captured video frames. For brevity, a video frame or portion thereof in which a product is visible will hereinafter be referred to as an "image". Thus, the deep metric learning module is configured to generate embedding data comprising embedding vectors in response to images in which products are visible, wherein the embedding vectors are close together (in the embedding space) if the images contain the same product, and are distant from each other, as measured by a similarity or distance function (such as dot product similarity or Euclidean distance), if the images contain different products. A query image may then be matched based on a similarity or distance threshold in the embedding space.
Initial configuration phase of embedding module 426
In an embodiment, during the initial configuration phase, the embedding module 426 is trained to learn one or more embedding vectors E_i, each forming a unique representation of a product p_i included in the inventory of the retail store. The initial configuration phase includes several different stages, namely a training data preparation stage and a network training stage. These stages are implemented iteratively, in a loop, to train the embedding module 426. Each of these stages will be described in more detail below.
Training data preparation stage
The dataset used to train the embedding module 426 includes a plurality of video frames in which each product from the inventory of the retail store is visible. The video frames are captured by a camera mounted overhead of an SCO terminal (not shown) in a retail store (not shown). The video frames, which will hereinafter be referred to as the training dataset, are compiled with the aim of providing robust and class-balanced information about the subject product derived from different views of the product, which are obtained with different positionings and orientations of the product relative to the camera. The members of the training dataset are selected to create sufficient diversity to overcome challenges to subsequent product identification arising from lighting condition changes, viewing angle changes and, most importantly, intra-class variations.
Video frames of similar appearance are removed from the video frames prior to use in the training dataset. Members of the training data set may also be subjected to further data enhancement techniques (e.g., rotation, flipping, brightness change) to increase their diversity, thereby increasing the robustness of the trained deep neural network embedded in module 426. Polygonal regions are cropped from the video frames that include the individual products visible in the video frames. The size of the cropped region is adjusted to the processed product image size to produce a cropped product image. Each cropped product image is also provided with a category label for identifying the corresponding product.
Model training stage:
For brevity, the deep neural network (not shown) of the embedding module 426 will hereinafter be referred to as an "Embedding Neural Network (ENN)". The ENN includes a deep neural network (e.g., ResNet, Inception, EfficientNet) whose last layer or layers (which typically output classification vectors) are replaced with a linear normalization layer that outputs unit-norm embedding vectors of the desired dimension. The dimension is a parameter established when the ENN is created.
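A non-limiting PyTorch-style sketch of such a head replacement is shown below; the choice of ResNet-50 backbone and the embedding dimension of 128 are assumptions for illustration only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import models

class EmbeddingNeuralNetwork(nn.Module):
    """Sketch of an ENN: a backbone whose classification head is
    replaced by a linear layer producing a unit-norm embedding vector."""
    def __init__(self, embedding_dim: int = 128):
        super().__init__()
        backbone = models.resnet50(weights=None)
        in_features = backbone.fc.in_features
        backbone.fc = nn.Identity()          # drop the classifier head
        self.backbone = backbone
        self.head = nn.Linear(in_features, embedding_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        features = self.backbone(x)
        return F.normalize(self.head(features), dim=1)   # unit-norm output
```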
During the model training phase, positive and negative pairs of cropped product images are constructed from the training dataset. A positive pair comprises two cropped product images having labels of the same category; a negative pair comprises two cropped product images having labels of different categories. For brevity, the resulting cropped product images will hereinafter be referred to as "paired cropped images". The paired cropped images are sampled according to a pair mining strategy, such as MultiSimilarity or ArcFace, as outlined in R. Manmatha, C.-Y. Wu, A. J. Smola, and P. Krähenbühl, "Sampling Matters in Deep Embedding Learning", 2017 IEEE International Conference on Computer Vision (ICCV 2017), Venice, 2017, pages 2859-2867, doi 10.1109/ICCV.2017.309. A pair-wise metric learning loss is then calculated from the sampled paired images (as described in K. Musgrave, S. Belongie, and S.-N. Lim, "A Metric Learning Reality Check", 2020, https://arxiv.org/abs/2003.08505). The weights of the ENN are then optimized using back-propagation to minimize the pair-wise metric learning loss value.
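For illustration, a minimal contrastive pair loss over such positive and negative pairs is sketched below; this is a generic stand-in, not the specific MultiSimilarity or ArcFace formulations named above, and the margin value is an assumption.

```python
import torch
import torch.nn.functional as F

def contrastive_pair_loss(emb_a: torch.Tensor, emb_b: torch.Tensor,
                          is_positive: torch.Tensor, margin: float = 0.5):
    """Minimal pair-wise metric learning loss: pull embeddings of the
    same product together, push different products apart by a margin.
    emb_a, emb_b: (N, D) unit-norm embeddings; is_positive: (N,) floats
    equal to 1.0 for positive pairs and 0.0 for negative pairs."""
    dist = 1.0 - F.cosine_similarity(emb_a, emb_b)      # cosine distance
    pos_term = is_positive * dist.pow(2)
    neg_term = (1.0 - is_positive) * F.relu(margin - dist).pow(2)
    return (pos_term + neg_term).mean()
```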
All paired cropped images are processed by the ENN to generate their corresponding embedding vectors. As a result, the embedding vectors are organized in pairs, similar to the paired cropped images. The resulting embedding vectors are stored in an embedding database 432. Thus, given images of each product in the inventory of the retail store, the trained ENN populates the embedding database 432 with embedding data including the calculated embedding vector E_i for each such product. Thus, the embedding database 432 includes a plurality of tuples (E_i, Id_i), each comprising an embedding vector E_i and the corresponding identifier Id_i of a product p_i in the inventory of the retail store.
Runtime phase of the embedding module 426
For clarity, the runtime is defined as the normal opening hours of the retail store. During runtime operation, the ENN (not shown) generates an embedding vector for each product visible in video frames captured by the cameras within a predetermined distance of an SCO terminal (not shown) disposed in a retail store (not shown). Accordingly, the embedding module 426 is coupled with the cropping module 424 to receive cropping areas from the cropping module 424. Query embedding data comprising an embedding vector generated by the ENN (not shown) in response to a received cropping area will hereinafter be referred to as a query embedding QE. The embedding module 426 is communicatively coupled with the expert system module 430 to send the query embedding QE to the expert system module 430.
In an embodiment, the expert system module 430 is coupled with the embedding module 426 to receive the query embeddings QE generated by the ENN during the runtime operation phase of the embedding module 426.
Upon receiving a query embedding QE, the expert system module 430 queries the embedding database 432 to retrieve an embedding vector E_i from the embedding database 432. The expert system module 430 compares the query embedding QE to the retrieved embedding vector E_i using a similarity or distance function (e.g., dot product similarity or Euclidean distance).
If the similarity between the query embedding QE and the retrieved embedding vector E_i exceeds a preconfigured threshold (Th), it is inferred that the query embedding QE matches the retrieved embedding vector E_i. The value of the threshold (Th) parameter is established using a grid search method.
In an embodiment, the process of querying the embedding database 432 and comparing the retrieved embedding vectors E_i with the received query embedding QE is repeated until a match is found or until all embedding vectors E_i have been retrieved from the embedding database 432. In the event that a match is found between the query embedding QE and an embedding vector E_i from the embedding database 432, the matched embedding vector E_i will hereinafter be referred to as the matching embedding ME. The expert system module 430 is further adapted to use the matching embedding ME to retrieve from the product database 434 the product identifier corresponding to the matching embedding ME, wherein the product identifier identifies the product represented by the matching embedding ME. For brevity, this product identifier will be referred to hereinafter as the matching category label.
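A minimal sketch of this lookup is shown below, assuming unit-norm embeddings compared by dot product similarity; the data layout of the two databases and the threshold value are assumptions for illustration only.

```python
import numpy as np

def match_product(query_embedding: np.ndarray,
                  embedding_db: list,
                  product_db: dict,
                  threshold: float = 0.85):
    """Sketch of the expert system lookup: compare the query embedding QE
    against stored embeddings E_i until one exceeds the similarity
    threshold Th, then resolve its identifier Id_i to a product record.

    embedding_db -- list of (embedding_vector, product_identifier) tuples
    product_db   -- mapping from product identifier to product record
    """
    for e_i, id_i in embedding_db:
        similarity = float(np.dot(query_embedding, e_i))   # unit-norm vectors
        if similarity > threshold:
            return product_db.get(id_i)    # matching category label
    return None                            # no product matched
```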
In an embodiment, the cart evaluation module 428 is configured to receive the cart crop area from the cropping module 424. The cart evaluation module 428 is configured to implement a panoptic segmentation algorithm, such as a bidirectional aggregation network (BANet) (as described in Y. Chen, G. Lin, S. Li, O. Bourahla, Y. Wu, F. Wang, J. Feng, M. Xu, X. Li, BANet: Bidirectional aggregation network with occlusion handling for panoptic segmentation, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pages 3793-3802), to establish a class label and an instance number for each pixel in the cart crop area.
In an embodiment, those skilled in the art will recognize that the above example of a panoptic segmentation algorithm is provided for illustrative purposes only. In particular, those skilled in the art will recognize that the preferred embodiments are not limited to the above-described algorithm. Rather, the preferred embodiments may operate with any algorithm suitable for combined instance segmentation and semantic segmentation of a cart crop area, such as AUNet (as described in Y. Li, X. Chen, Z. Zhu, L. Xie, G. Huang, D. Du, X. Wang, Attention-guided unified network for panoptic segmentation, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pages 7026-7035) or the EfficientPS network (as described in R. Mohan and A. Valada, EfficientPS: Efficient Panoptic Segmentation, International Journal of Computer Vision, 2021, 129(5), pages 1551-1579).
More specifically, the purposes of the panoptic segmentation algorithm are to:
● identify full, partially full or empty shopping carts and products from the inventory of the retail store in a cart crop area;
● identify all instances of the full, partially full or empty shopping carts in the cart crop area;
● identify all products visible in the cart crop area; and
● identify all instances of the products identified in the cart crop area.
The dataset used to train the panoptic segmentation algorithm includes a plurality of video frames in which each product from the inventory of the retail store is visible. The dataset also includes a plurality of video frames in which shopping carts of different fullness are visible. In particular, the dataset includes video frames in which empty shopping carts, partially full shopping carts, and completely full or overflowing shopping carts are visible. The video frames are captured by a camera mounted overhead of an SCO terminal (not shown) in a retail store (not shown) and by a camera mounted within a predetermined distance of the SCO terminal (not shown). The predetermined distance is empirically determined based on the layout of the retail store (not shown) and the intrinsic parameters of the camera, such that the field of view of the camera encompasses an approach area to the SCO terminal (not shown) of the retail store.
These video frames, which will hereinafter be referred to as the training dataset, are compiled so as to provide robust and class-balanced information about each subject product, derived from different views of the product obtained with different positioning and orientation of the product relative to the camera. The video frames of the training dataset are further compiled to provide robust and class-balanced information about shopping carts of different fullness, obtained with different positioning and orientation of the shopping carts relative to the camera. The members of the training dataset are selected to create sufficient diversity to overcome the challenges presented to subsequent product identification and shopping cart identification by changes in lighting conditions, changes in viewing angle and, most importantly, intra-class variations.
Video frames of similar appearance are removed prior to the use of the video frames in the training dataset. Members of the training dataset may also be subjected to further data augmentation techniques (such as rotation, flipping and brightness changes) to increase their diversity, thereby increasing the robustness of the trained neural network of the panoptic segmentation algorithm. Polygonal regions that include the individual shopping carts visible in the video frames are cropped from the video frames. The size of each cropped region is manually adjusted to the processed cart image size to produce a cropped shopping cart image.
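A sketch of the kind of augmentation pipeline described above (rotation, flipping and brightness changes) is shown below using torchvision transforms; the specific parameter values, the output image size and the placeholder input are illustrative assumptions rather than the disclosed configuration.

```python
from PIL import Image
from torchvision import transforms

# Illustrative augmentation pipeline: rotation, flipping and brightness
# changes increase the diversity of the cropped shopping cart images.
augment = transforms.Compose([
    transforms.RandomRotation(degrees=15),
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.ColorJitter(brightness=0.3),
    transforms.Resize((512, 512)),   # stand-in for the processed cart image size
])

cart_crop = Image.new("RGB", (640, 480))   # placeholder for a cropped cart image
augmented = augment(cart_crop)
print(augmented.size)
```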
Each pixel of each cropped shopping cart image of the training dataset is manually labeled with a category label that identifies the corresponding product or identifies the shopping cart as empty, partially full, or full. Each pixel is also marked with an instance number indicating to which instance of a given class the pixel corresponds. For example, the instance number may indicate whether the pixel corresponds to a second tub of ice cream visible in the video frame or a third packet of toilet paper rolls visible in the video frame. Members of the training dataset are organized in pairs, wherein each data pair includes a video frame and a corresponding XML file. The XML file includes a class label and an instance number for each pixel in the corresponding video frame.
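By way of illustration only, the following sketch parses such a per-pixel annotation into class-label and instance-number arrays; the XML schema shown (one element per annotated pixel) is an assumption, since the disclosure does not specify the file layout.

```python
import numpy as np
import xml.etree.ElementTree as ET

# Assumed (illustrative) annotation schema: one <pixel> element per pixel.
xml_text = """<annotation>
  <pixel x="0" y="0" label="partially full cart" instance="1"/>
  <pixel x="1" y="0" label="ice cream" instance="2"/>
</annotation>"""

def parse_annotation(xml_text, height, width):
    """Return per-pixel class-label and instance-number maps for one frame."""
    class_map = np.full((height, width), "", dtype=object)
    instance_map = np.zeros((height, width), dtype=np.int32)
    for pixel in ET.fromstring(xml_text).iter("pixel"):
        x, y = int(pixel.get("x")), int(pixel.get("y"))
        class_map[y, x] = pixel.get("label")             # e.g. product or cart state
        instance_map[y, x] = int(pixel.get("instance"))  # e.g. 2nd tub of ice cream
    return class_map, instance_map

class_map, instance_map = parse_annotation(xml_text, height=2, width=2)
print(class_map, instance_map, sep="\n")
```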
During training of the neural network model of the panoptic segmentation algorithm, individual members of the training dataset and the corresponding entries from the XML files are presented to the neural network model so that it constructs representations of large-scale features, small-scale features and contextual features sufficient to reproduce the presented members of the training dataset and the corresponding XML entries.
During runtime, the trained panoptic segmentation algorithm is presented with the cart crop area received from the cropping module 424. The panoptic segmentation algorithm marks each pixel of the cart crop area corresponding to a region in which a shopping cart is visible as a full cart, a partially full cart, or an empty cart. In the case where there are multiple shopping carts in the cart crop area, the panoptic segmentation algorithm also marks each such pixel with the instance number of the shopping cart. The panoptic segmentation algorithm further marks the pixels of the cart crop area corresponding to regions in which products are visible with the class label of the product and the instance number of the product. The output of the cart evaluation module 428 includes the pixels of the cart crop area and their labels. For brevity, this output will be referred to hereinafter as "cart-related data".
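A minimal sketch of packaging this runtime output as cart-related data is shown below; the container type, the hypothetical panoptic model interface and the stand-in model that labels everything as an empty cart are illustrative assumptions, not the disclosed implementation.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class CartRelatedData:
    """Per-pixel output of the cart evaluation module (illustrative container)."""
    class_labels: np.ndarray      # e.g. "full cart", "empty cart", or a product label
    instance_numbers: np.ndarray  # instance number of the cart or product per pixel

def evaluate_cart(panoptic_model, cart_crop_area):
    """Run a (hypothetical) trained panoptic segmentation model on the cart
    crop area and package its per-pixel labels as cart-related data."""
    class_labels, instance_numbers = panoptic_model(cart_crop_area)
    return CartRelatedData(class_labels, instance_numbers)

# Stand-in model: labels every pixel as an empty cart, instance 1.
def dummy_panoptic_model(image):
    h, w = image.shape[:2]
    return (np.full((h, w), "empty cart", dtype=object),
            np.ones((h, w), dtype=np.int32))

cart_data = evaluate_cart(dummy_panoptic_model, np.zeros((480, 640, 3)))
print(cart_data.class_labels.shape, cart_data.instance_numbers.shape)
```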
Fig. 7 illustrates a processing unit 416 of a control unit (not shown) in the software of fig. 4 for operating SCO surface areas of a retail store according to an embodiment of the present disclosure.
In an embodiment, the processing unit 416 includes a non-scanning event detector 436, a SCO supervisor locator module 438, a queue analyzer module 440, a product movement analyzer 442, a SCO group analyzer module 444, a non-scanning sequence analyzer module 446, a cart fullness analyzer module 448, and a customer group analyzer module 450. Each of these modules and their operation will be described in more detail below.
In one embodiment, the non-scanning event detector 436 is communicatively coupled with the video unit 402, the SCO unit 404, the motion detection module 412, and the object identification module 414. In particular, the non-scanning event detector 436 is communicatively coupled with the motion detection module 412 to receive a motion trigger signal indicating that motion has been detected within a predetermined distance of the SCO terminal (not shown), wherein the predetermined distance is determined by the intrinsic parameters of a camera (not shown) mounted overhead of the SCO terminal (not shown) and the mounting height of the camera, which together establish the field of view of the camera (not shown). The received motion trigger signal indicates that a customer has approached the SCO terminal (not shown) and is scanning products at the SCO terminal (not shown).
Upon receiving the motion trigger signal, the non-scanning event detector 436 is configured to receive successive video frames Fr(τ) and Fr(τ+Δt) from the video unit 402, taken from a video clip captured by a camera mounted overhead of the SCO terminal (not shown). The non-scanning event detector 436 is configured to send the successive video frames Fr(τ) and Fr(τ+Δt) to the object identification module 414 to detect the presence of a product from the retail store's inventory in the video frames. Upon detecting and identifying a product from the inventory of the retail store in a received video frame, the object identification module 414 is configured to return a corresponding matching category label to the non-scanning event detector 436. The matching category label is an identifier of the identified product. More specifically, the matching category label may be a UPC of the identified product.
Upon receipt of the motion trigger signal, the non-scanning event detector 436 is further configured to receive sales slip data from the SCO unit 404. The received sales slip data originates from the SCO terminal (not shown) at which the movement represented by the motion trigger signal was detected. Upon receipt of the matching category label, the non-scanning event detector 436 is configured to compare the matching category label with the sales slip data received in a time interval of a predetermined duration before and after receipt of the matching category label. The predetermined duration is empirically determined to be of sufficient length to enable a match to be found between the matching category label and a member of the sales slip data relating to the products scanned at the SCO terminal (not shown) during the time interval, without delaying operation of the SCO terminal (not shown).
In the event that no match is found between the received sales slip data and the received matching category label, the non-scanning event detector 436 is configured to issue a non-scanning alert signal including an identifier of the SCO terminal (not shown) at which the movement represented by the motion trigger signal was detected. For brevity, this identifier will be referred to hereinafter as the originating SCO identifier, and the SCO terminal (not shown) corresponding to the originating SCO identifier will be referred to hereinafter as the "alert originating SCO".
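A sketch of this comparison is given below: the matching category label is checked against sales slip data within a symmetric time window, and an alert carrying the originating SCO identifier is produced when no match is found. The record layout, window length and alert format are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class ScanRecord:
    category_label: str   # e.g. the UPC scanned at the SCO terminal
    timestamp: float      # seconds

def detect_non_scan(matching_category_label, label_timestamp,
                    sales_slip_data, window_seconds, sco_identifier):
    """Return a non-scanning alert if no scan of the identified product is
    found within +/- `window_seconds` of the time the label was produced."""
    for record in sales_slip_data:
        in_window = abs(record.timestamp - label_timestamp) <= window_seconds
        if in_window and record.category_label == matching_category_label:
            return None                                   # the product was scanned
    return {"alert": "non-scan", "originating_sco": sco_identifier}

sales = [ScanRecord("UPC-001", 10.0), ScanRecord("UPC-002", 12.0)]
print(detect_non_scan("UPC-003", 11.0, sales, window_seconds=5.0,
                      sco_identifier="SCO-2"))
```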
In an embodiment, the SCO supervisor locator module 438 is communicatively coupled with the non-scanning event detector 436, the people classification module 408, and the people tracking module 410. Specifically, the SCO supervisor locator module 438 is configured to receive non-scanning alert signals from the non-scanning event detector 436. The SCO supervisor locator module 438 is also configured to activate the people classification module 408 and the people tracking module 410 upon receipt of the non-scanning alert signal to determine the location of each SCO supervisor in the retail store. Using this location information, the SCO supervisor locator module 438 is also configured to calculate the distance between each SCO supervisor and the alert originating SCO.
In one embodiment, the SCO supervisor locator module 438 is further configured to detect the presence of an adult customer or child within a predetermined distance of each SCO supervisor. The predetermined distance is empirically determined as the expected maximum distance between an SCO supervisor and a customer and/or child when the SCO supervisor is assisting that customer and/or child.
In the event that the SCO supervisor locator module 438 determines that an SCO supervisor (not shown) is located within the predetermined distance of an adult customer and/or child, the SCO supervisor locator module 438 is configured to activate the people tracking module 410 to track the movement of the SCO supervisor (not shown) and the adult customer and/or child for a predetermined time interval. The predetermined time interval is empirically determined as the expected minimum duration of engagement between an SCO supervisor and a customer and/or child when the SCO supervisor (not shown) is assisting the customer and/or child. The purpose of tracking the movement of the SCO supervisor (not shown) and the adult customer and/or child for the predetermined time interval is to screen out situations in which the SCO supervisor (not shown) is merely incidentally close to the adult customer and/or child, as opposed to actively engaged with the adult customer and/or child.
In the event that the SCO supervisor locator module 438 determines that the SCO supervisor (not shown) has been within the predetermined distance of the adult customer and/or child for a period of time exceeding the predetermined time interval, the SCO supervisor locator module 438 assigns a "busy" status tag to the SCO supervisor (not shown). Once the status tag is assigned to the SCO supervisor (not shown), the SCO supervisor locator module 438 may deactivate tracking of the SCO supervisor of interest (not shown) and the nearby adult customers and/or children.
In an embodiment, the SCO supervisor locator module 438 is further configured to activate the personnel tracking module 410 to track movement of the remaining SCO supervisors for a predetermined time interval. The predetermined time interval is empirically determined to have a sufficient duration to determine whether the SCO supervisor has moved from one portion of the retail store to another without unduly delaying operation of the SCO supervisor locator module 438. In the event that the SCO supervisor locator module 438 determines that the SCO supervisor (not shown) is moving toward a storeroom or cash room or the like, the SCO supervisor locator module 438 assigns a "busy" status tag to the SCO supervisor (not shown).
In one embodiment, the SCO supervisor locator module 438 is further configured to identify the SCO supervisor (not shown) that has not been assigned a "busy" status tag and that is closest to the alert originating SCO. In the event that it is determined that the identified SCO supervisor (not shown) is less than a predetermined distance from the alert originating SCO, the SCO supervisor locator module 438 is configured to issue an output signal O1, which output signal O1 includes an "SCO lock" signal. Otherwise, the output signal O1 includes a "VOID" signal.
In one embodiment, the predetermined distance from the alert originating SCO is determined empirically based on the layout of the retail store (not shown) as the maximum feasible distance from which the SCO supervisor can reach a locked SCO terminal (not shown) to determine the cause of the non-scanning alert signal and unlock the SCO terminal (not shown) as appropriate.
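The following sketch illustrates how the SCO supervisor locator logic for output signal O1 might be expressed, under the assumption that supervisor positions, "busy" tags and the alert originating SCO position are available as simple floor coordinates; all names and the distance value are illustrative.

```python
import math
from dataclasses import dataclass

@dataclass
class Supervisor:
    name: str
    position: tuple   # (x, y) floor coordinates, illustrative
    busy: bool        # True if the "busy" status tag has been assigned

def supervisor_locator_output(supervisors, alert_originating_sco_position,
                              max_distance):
    """Issue O1 = "SCO lock" if the nearest non-busy SCO supervisor is within
    the empirically chosen maximum feasible distance, else O1 = "VOID"."""
    free = [s for s in supervisors if not s.busy]
    if not free:
        return "VOID"
    nearest = min(free, key=lambda s: math.dist(s.position,
                                                alert_originating_sco_position))
    distance = math.dist(nearest.position, alert_originating_sco_position)
    return "SCO lock" if distance < max_distance else "VOID"

staff = [Supervisor("A", (2.0, 3.0), busy=True),
         Supervisor("B", (8.0, 1.0), busy=False)]
print(supervisor_locator_output(staff, alert_originating_sco_position=(7.0, 2.0),
                                max_distance=5.0))
```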
In one embodiment, the queue analyzer module 440 is communicatively coupled to the non-scanning event detector 436 and the people classification module 408. Specifically, the queue analyzer module 440 is configured to receive the non-scanning alert signal from the non-scanning event detector 436. The queue analyzer module 440 is further configured to activate the people classification module 408 upon receipt of the non-scanning alert signal to count the number of adult customers and children located within a predetermined distance of the approach to the alert originating SCO. The queue analyzer module 440 is further configured to compare the locations of the adult customers and children located within the predetermined distance of the approach to the alert originating SCO to determine whether at least some of the adult customers and children are arranged in a queuing pattern at the approach to the alert originating SCO.
In the event that the queue analyzer module 440 determines that at least some adult customers and children are arranged in a queuing pattern at the approach to the alert originating SCO, the queue analyzer module 440 is configured to calculate the number of adult customers and children in the queue. In the event that the number of adult customers and children in the queue is less than a predetermined threshold, the queue analyzer module 440 is configured to issue an output signal O2, the output signal O2 comprising an "SCO lock" signal. Otherwise, the output signal O2 includes a "VOID" signal.
The predetermined distance of the approach to the alert originating SCO and the threshold number of queuing persons at the alert originating SCO are determined empirically based on the operator's understanding that there is a balance between the possible loss of revenue due to customers being deterred by lengthy queues and the risk of loss of revenue due to unpaid products at the alert originating SCO.
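A simplified sketch of the queue analyzer decision for output signal O2 is given below; it treats any person within a given radius of the approach as queuing, whereas a real implementation would also test for the queuing pattern described above. The radius, threshold and data layout are illustrative assumptions.

```python
import math

def queue_analyzer_output(person_positions, sco_approach_position,
                          approach_radius, queue_threshold):
    """Count adult customers and children near the approach to the alert
    originating SCO and issue O2 = "SCO lock" when the queue is short."""
    queue_length = sum(
        1 for p in person_positions
        if math.dist(p, sco_approach_position) <= approach_radius
    )
    return "SCO lock" if queue_length < queue_threshold else "VOID"

people = [(1.0, 1.0), (1.5, 1.2), (9.0, 9.0)]
print(queue_analyzer_output(people, sco_approach_position=(1.0, 1.5),
                            approach_radius=2.0, queue_threshold=4))
```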
In one embodiment, the product movement analyzer 442 is communicatively coupled with the non-scanning event detector 436, the people classification module 408, the people tracking module 410, and the object identification module 414. Specifically, the product movement analyzer 442 is configured to receive the non-scanning alert signal from the non-scanning event detector 436. The product movement analyzer 442 is further configured to activate the people classification module 408 and the people tracking module 410 upon receipt of the non-scanning alert signal to receive from them the extracted features of the adult customer or child closest to the alert originating SCO immediately after the received non-scanning alert signal was issued. The product movement analyzer 442 is further configured to activate the object identification module 414 upon receipt of the non-scanning alert signal to identify and issue an identifier of the product (not shown) closest to the alert originating SCO immediately after the received non-scanning alert signal was issued. The product movement analyzer 442 is configured to temporarily store the extracted features received from the people tracking module 410 and the product identifier received from the object identification module 414. The product movement analyzer 442 is further configured to reactivate the object identification module 414 after a predetermined time interval to determine the location of a product whose identifier matches the stored product identifier. For brevity, this product will be referred to hereinafter as the "non-scanning query product".
In one embodiment, the product movement analyzer 442 is further configured to reactivate the people classification module 408 and the people tracking module 410 to receive from them the extracted features of the adult customer or child closest to the non-scanning query product. For brevity, this adult customer or child will hereinafter be referred to as the "non-scanning inquirer". The product movement analyzer 442 is also configured to compare the extracted features of the non-scanning inquirer with the stored extracted features. If no match is found between the extracted features of the non-scanning inquirer and the stored extracted features, this indicates that the product involved in the non-scanning event has changed hands and is now held by another person. Such product movement between people shortly after a non-scanning event implies deliberate intent on the part of the people involved in the non-scanning event. Thus, the product movement analyzer 442 is configured to issue an output signal O3, which output signal O3 includes an "SCO lock" signal. Otherwise, the output signal O3 includes a "VOID" signal.
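The feature comparison underlying output signal O3 might be sketched as follows, using cosine similarity between the stored and current feature vectors; the similarity measure, threshold value and vector layout are illustrative assumptions rather than the disclosed matching criterion.

```python
import numpy as np

def product_movement_output(stored_features, current_features,
                            similarity_threshold=0.9):
    """Compare the stored features of the person who was nearest the alert
    originating SCO with the features of the person now nearest the
    non-scanning query product. A mismatch suggests the product changed
    hands shortly after the non-scanning event, so O3 = "SCO lock"."""
    a = stored_features / np.linalg.norm(stored_features)
    b = current_features / np.linalg.norm(current_features)
    same_person = float(np.dot(a, b)) >= similarity_threshold
    return "VOID" if same_person else "SCO lock"

print(product_movement_output(np.array([0.2, 0.9, 0.1]),
                              np.array([0.9, 0.1, 0.2])))
```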
Customer social engineering at the SCO terminal using nudge theory includes two main elements, namely freezing or locking the SCO terminal when a non-scanning event is detected, which causes a corresponding delay and inconvenience to the customer, and the customer's interaction with the SCO supervisor investigating the non-scanning event, which may also inconvenience the customer. Both act as deterrents to a potential thief by changing the perceived balance between the risk of the theft being detected and the return from the theft. However, the time spent by the SCO supervisor on the SCO terminals and the people involved in a non-scanning event adds cost to the provider, since that time could be used more effectively elsewhere in the retail store, and customer frustration caused by delays and long queues at the SCO terminals also results in lost sales.
Since SCO supervisors can only handle locked SCO terminal events sequentially, the challenge of managing this balance is amplified in retail stores that include multiple SCO terminals operating in parallel. Using an analogy with fault management, a locked SCO terminal event may be considered a fault in the continuous operation of the SCO terminal, albeit a deliberately created fault. On this analogy, as the number of sources of such faults increases (e.g., more SCO terminals in use during busy periods than during quiet periods), the separation between parallel fault generation and sequential fault resolution becomes particularly acute.
The balance between these two competing objectives of the provider, as well as the impact of the separation between parallel fault generation and sequential fault resolution, can be addressed by a three-threshold system. The first threshold is based on the number of locked SCO terminals that an SCO supervisor can handle in a given period of time. The second threshold is based on the observation that, in many cases, a person's frustration with a queue increases with the length of time the person has spent in the queue. Accordingly, the second and third thresholds account for the length of time that each SCO terminal has been locked. These thresholds may also reflect the number of people queuing at an SCO terminal and the length of time for which a queue has been formed at a given SCO terminal. The values of the three thresholds may be adjusted by the provider based on their appetite for the risk of revenue loss due to theft at the SCO terminals and their knowledge of customers' tolerance of delays, recognizing that different customer patterns and profiles may exist at different times of day.
Accordingly, the SCO group analyzer module 444 is communicatively coupled to the non-scanning event detector 436 and the SCO unit 404. Specifically, the SCO group analyzer module 444 is configured to receive the non-scanning alert signal from the non-scanning event detector 436. Upon receiving the non-scanning alert signal, the SCO group analyzer module 444 is further configured to receive status signals from each SCO terminal (not shown) coupled to the SCO unit 404. The SCO group analyzer module 444 is further configured to calculate the number of locked SCO terminals based on the received status signals. The SCO group analyzer module 444 is further configured to calculate the duration for which each locked SCO terminal (not shown) has been locked. For brevity, the SCO terminal (not shown) that has been locked for the longest duration will be referred to as the "Senior Locked SCO Terminal". Similarly, the duration for which the Senior Locked SCO Terminal has been locked will hereinafter be referred to as the "Senior Locked Period".
In one embodiment, the SCO group analyzer module 444 is further configured to compare the number of locked SCO terminals (not shown) to a first threshold and, in the event that the number of locked SCO terminals (not shown) is less than the first threshold, the SCO group analyzer module 444 is configured to issue an output signal O4, the output signal O4 comprising an "SCO lock" signal. Alternatively or additionally, the SCO group analyzer module 444 is further configured to compare the Senior Locked Period to a second threshold and, in the event that the Senior Locked Period is less than the second threshold, the SCO group analyzer module 444 is configured to issue the output signal O4 comprising an "SCO lock" signal. Alternatively or additionally, the SCO group analyzer module 444 is further configured to calculate the number of SCO terminals (not shown) that have been locked for a duration exceeding the second threshold and, in the event that the number of such SCO terminals (not shown) is less than a third threshold, the SCO group analyzer module 444 is configured to issue the output signal O4 comprising an "SCO lock" signal. Otherwise, the output signal O4 includes a "VOID" signal.
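The three-threshold test for output signal O4 can be sketched as follows, assuming the lock durations of the currently locked SCO terminals are available as a mapping; the threshold values and data layout are illustrative assumptions.

```python
def sco_group_output(locked_durations, first_threshold,
                     second_threshold, third_threshold):
    """Apply the three-threshold test described above and issue O4.

    `locked_durations` maps each locked SCO terminal identifier to the time
    (in seconds) it has been locked."""
    number_locked = len(locked_durations)
    senior_locked_period = max(locked_durations.values(), default=0.0)
    long_locked = sum(1 for d in locked_durations.values() if d > second_threshold)

    # Any satisfied condition yields "SCO lock"; otherwise "VOID".
    if (number_locked < first_threshold
            or senior_locked_period < second_threshold
            or long_locked < third_threshold):
        return "SCO lock"
    return "VOID"

locked = {"SCO-1": 40.0, "SCO-3": 150.0}
print(sco_group_output(locked, first_threshold=3,
                       second_threshold=120.0, third_threshold=2))
```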
In one embodiment, the non-scanning sequence analyzer module 446 is communicatively coupled to the non-scanning event detector 436 and the SCO unit 404. Specifically, the non-scanning sequence analyzer module 446 is configured to receive the non-scanning alert signal from the non-scanning event detector 436. Upon receipt of the non-scanning alert signal, the non-scanning sequence analyzer module 446 is configured to store the originating SCO identifier and the timestamp of the non-scanning alert signal. The non-scanning sequence analyzer module 446 is further configured to compare the originating SCO identifier of each subsequently received non-scanning alert signal with the stored originating SCO identifiers to identify a match. In the event that a match is found, the non-scanning sequence analyzer module 446 is configured to compare the timestamp of the subsequently received non-scanning alert signal with the stored timestamp corresponding to the matching stored originating SCO identifier. For brevity, the time elapsed between the timestamp of the subsequently received non-scanning alert signal and the stored timestamp corresponding to the matching stored originating SCO identifier will hereinafter be referred to as the "time elapsed since the last non-scanning alert". In the event that the time elapsed since the last non-scanning alert is less than a predetermined threshold, the non-scanning sequence analyzer module 446 is configured to issue an output signal O5, which output signal O5 includes an "SCO lock" signal. Otherwise, the output signal O5 includes a "VOID" signal.
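A compact sketch of the sequence analysis for output signal O5 is shown below, assuming per-terminal timestamps of the last non-scanning alert are kept in a simple mapping; the names and threshold value are illustrative.

```python
def sequence_analyzer_output(alert_history, originating_sco, timestamp,
                             repeat_threshold_seconds):
    """Track successive non-scanning alerts per SCO terminal and issue O5.

    `alert_history` maps an originating SCO identifier to the timestamp of
    its last non-scanning alert."""
    last_timestamp = alert_history.get(originating_sco)
    alert_history[originating_sco] = timestamp          # store for next time
    if last_timestamp is None:
        return "VOID"
    elapsed_since_last_alert = timestamp - last_timestamp
    return ("SCO lock" if elapsed_since_last_alert < repeat_threshold_seconds
            else "VOID")

history = {}
print(sequence_analyzer_output(history, "SCO-2", 100.0, 300.0))  # first alert -> VOID
print(sequence_analyzer_output(history, "SCO-2", 180.0, 300.0))  # quick repeat -> SCO lock
```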
In one embodiment, the cart fullness analyzer module 448 is communicatively coupled to the non-scanning event detector 436 and the object identification module 414. Specifically, the cart fullness analyzer module 448 is configured to receive the non-scanning alert signal from the non-scanning event detector 436. The cart fullness analyzer module 448 is also configured to activate the object identification module 414 upon receipt of the non-scanning alert signal to identify the presence of a shopping cart next to the alert originating SCO. Specifically, the cart fullness analyzer module 448 is configured to receive cart-related data from the cart evaluation module (not shown) of the object identification module 414.
In one embodiment, the cart-related data includes the pixels of the area occupied by a shopping cart visible in the video frames received from a camera mounted overhead of the alert originating SCO and by the products contained in the shopping cart. The cart-related data also includes a category label and an instance label for each such pixel. The cart fullness analyzer module 448 is configured to calculate the percentage of pixels in the cart-related data that are labeled "full cart", "partially full cart" or "empty cart". The cart fullness analyzer module 448 is configured to set a cart state variable to a "full" value if the majority of these pixels are labeled "full cart". Similarly, the cart fullness analyzer module 448 is configured to set the cart state variable to a "partially full" value if the majority of these pixels are labeled "partially full cart". Further similarly, the cart fullness analyzer module 448 is configured to set the cart state variable to an "empty" value if the majority of these pixels are labeled "empty cart".
With the cart state variable set to a "full" value, the cart fullness analyzer module 448 is configured to count the number of instances of each visible product included in the shopping cart. In the event that the number of instances of visible products included in the shopping cart exceeds a predetermined threshold, the cart fullness analyzer module 448 is configured to append the tag "bulk load" to the cart state variable.
The predetermined threshold is determined empirically based on the operator's insights, experience and historical knowledge concerning situations in which a thief may attempt to conceal a theft by placing the product in a shopping cart filled with other products, in particular other products that are substantially identical.
In one embodiment, the cart fullness analyzer module 448 is further configured to count the number of pixels in the cart-related data that are labeled with the same product category label and instance number. The counted number of pixels provides an initial rough estimate of the visible area of the corresponding product. For brevity, the product corresponding to the single instance that accounts for the largest number of pixels not labeled "full cart", "partially full cart" or "empty cart" will hereinafter be referred to as the "maximum visible product".
In one embodiment, the cart fullness analyzer module 448 is communicatively coupled to a product details database (not shown). The product details database (not shown) includes details of the volume of each product in inventory at the retail store. The cart fullness analyzer module 448 is configured to query the product details database (not shown) to retrieve the record corresponding to the maximum visible product. Thus, the retrieved record details the volume of the largest visible product included in the shopping cart next to the alert originating SCO. For brevity, this volume will be referred to hereinafter as the "maximum volume instance". In the event that the maximum volume instance exceeds a predetermined threshold, the cart fullness analyzer module 448 is configured to append the label "bulk item" to the cart state variable. The predetermined threshold is empirically determined based on the operator's insight, experience and historical knowledge of the normal size range of items sold in the retail store. The cart fullness analyzer module 448 is configured to issue an output signal O6, which output signal O6 includes the cart state variable.
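The derivation of the cart state variable (output signal O6) from cart-related data can be sketched as follows; the flat list of (label, instance) pixels, the volume lookup and both threshold values are illustrative assumptions standing in for the cart-related data structure and the product details database.

```python
from collections import Counter

CART_LABELS = {"full cart", "partially full cart", "empty cart"}

def cart_fullness_output(pixel_labels, product_volumes,
                         bulk_load_threshold, bulk_item_volume_threshold):
    """Derive the cart state variable (output O6) from cart-related data.

    `pixel_labels` is a flat list of (class_label, instance_number) pairs,
    one per pixel of the cart crop area; `product_volumes` maps a product
    class label to its volume."""
    cart_votes = Counter(label for label, _ in pixel_labels if label in CART_LABELS)
    majority = cart_votes.most_common(1)[0][0] if cart_votes else "empty cart"
    cart_state = {"full cart": "full", "partially full cart": "partially full",
                  "empty cart": "empty"}[majority]

    # Count product instances and locate the maximum visible product.
    product_instances = Counter(
        (label, inst) for label, inst in pixel_labels if label not in CART_LABELS)

    if cart_state == "full" and len(product_instances) > bulk_load_threshold:
        cart_state += " bulk load"
    if product_instances:
        (largest_label, _), _ = product_instances.most_common(1)[0]
        if product_volumes.get(largest_label, 0.0) > bulk_item_volume_threshold:
            cart_state += " bulk item"
    return cart_state                                    # output signal O6

pixels = [("full cart", 1)] * 60 + [("ice cream", 1)] * 30 + [("ice cream", 2)] * 10
print(cart_fullness_output(pixels, {"ice cream": 2.5},
                           bulk_load_threshold=5, bulk_item_volume_threshold=10.0))
```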
Fig. 8 shows an output table of a processing unit of the control unit in the software of fig. 4 for operating SCO surface areas of a retail store according to an embodiment of the present disclosure.
Accordingly, the processing unit 416 of the control unit 406 is configured to send the processed output signals including O1, O2, O3, O4, O5 and O6 shown in the table of fig. 8 to the logic unit 418 of fig. 4.
Returning to fig. 4, the logic unit 418 of the control unit 406 is configured to receive the processed output signal from the processing unit 416 of the control unit 406. The logic unit 418 includes a number of Boolean logic units (not shown) operable on the contents of one or more of the O1, O2, O3, O4, O5 and O6 components of the processed output signal to issue an SCO lock instruction together with the originating SCO identifier of the SCO terminal (not shown) at which the non-scanning event was detected. For brevity, the SCO lock instruction and the originating SCO identifier are collectively referred to as the "SCO lock control signal".
The Boolean logic units (not shown) may be configured according to the operator's requirements. However, in one example, a Boolean logic unit (not shown) is configured to cause the logic unit 418 to issue an SCO lock control signal to the SCO unit 404 if any of the O1, O2, O3, O4 and O5 components of the processed output signal has a value of "SCO lock". Similarly, in another example, a Boolean logic unit (not shown) is configured to cause the logic unit 418 to issue an SCO lock control signal to the SCO unit 404 if the O6 component of the processed output signal has a value of "full" or "full bulk load".
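By way of a non-limiting illustration, the two example Boolean configurations described above might be combined as in the following sketch; the dictionary layout of the processed output signal and the printed action are illustrative assumptions.

```python
def logic_unit_decision(outputs):
    """Combine the processed output signals O1..O6 into an SCO lock decision,
    following the two example Boolean configurations described above.

    `outputs` maps "O1".."O6" to their values."""
    lock_from_modules = any(outputs.get(k) == "SCO lock"
                            for k in ("O1", "O2", "O3", "O4", "O5"))
    lock_from_cart = outputs.get("O6", "").startswith("full")  # "full", "full bulk load", ...
    return lock_from_modules or lock_from_cart

outputs = {"O1": "VOID", "O2": "SCO lock", "O3": "VOID",
           "O4": "VOID", "O5": "VOID", "O6": "partially full"}
if logic_unit_decision(outputs):
    print("issue SCO lock control signal to the SCO unit")
```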
Those skilled in the art will recognize that the above examples of configurations of the Boolean logic units (not shown) are provided for illustrative purposes only. In particular, those skilled in the art will recognize that the software of the preferred embodiment is not limited to the above examples of the configuration of a Boolean logic unit (not shown). Rather, the software of the preferred embodiment may operate with any configuration of a Boolean logic unit (not shown) adapted to process one or more components of the processed output signal received from the processing unit 416 of the control unit 406.
In one embodiment, the SCO unit 404 is configured to receive the SCO lock control signal from the logic unit 418 of the control unit 406 and to cause the SCO terminal, among the SCO terminals SCO1 to SCOn in the retail store, that is represented by the originating SCO identifier of the SCO lock control signal to lock.
Referring to fig. 9, a method 900 of operating one or more SCO terminals of an SCO environment is illustrated. The method comprises a first step 902, in which a plurality of video frames is captured using one or more video sensors, wherein the video sensors are installed at predetermined locations relative to one or more SCO terminals SCO1 to SCOn. The video sensors include one or more cameras C1 to Cn mounted above the one or more SCO terminals SCO1 to SCOn or at a predetermined distance from the SCO terminals SCO1 to SCOn.
In one embodiment, the method 900 includes a next step 904, in which status data for each of the SCO terminals SCO1 to SCOn is obtained. The status data includes an indicator of whether each of the SCO terminals SCO1 to SCOn is locked or active. The status data may also include a timestamp of when an SCO terminal SCO1 to SCOn was locked.
In one embodiment, the method 900 includes a next step 906, in which a control unit is coupled to the one or more video sensors and the SCO unit. The control unit 406 includes a processing unit and a memory. The processing unit includes a processor, computer, microcontroller, or other circuitry that controls the operation of various components, such as the memory. The processing unit may execute software, firmware, and/or other instructions stored, for example, in volatile or non-volatile memory. The processing unit may be connected to the memory by a wired or wireless connection, such as one or more system buses, cables or other interfaces.
In an embodiment, the method 900 includes a next step 908 in which one or more frames of interest are determined from the plurality of video frames using a machine learning model. In the same embodiment, the determination of one or more frames includes detecting a primary object of interest using the person classification module 408. In the same embodiment, the detected primary object of interest is classified based on an age group, i.e. child or adult. In the same embodiment, the method comprises detecting one or more secondary objects of interest after detecting the primary object of interest.
In one embodiment, the method 900 includes a next step 910, in which the detection locations and detection times of the primary object of interest and the secondary objects of interest within a predetermined distance of any of the one or more SCO terminals SCO1 to SCOn are determined using the people tracking module 410.
In one embodiment, the method 900 includes a next step 912, in which a motion trigger is generated using the motion detection unit 412 based on detection of a change in position of the primary object of interest and the secondary object of interest within a predetermined distance of any of the SCO terminals SCO1 to SCOn.
In one embodiment, the method 900 includes a next step 914, in which transaction data is received from the SCO unit 404 based on the generated motion trigger, wherein the transaction data includes transactions received by scanning one or more secondary objects of interest at any of the SCO terminals SCO1 to SCOn.
In one embodiment, the method 900 includes a next step 916, in which the transaction data from the SCO unit 404 is compared with the detected one or more secondary objects of interest. Further, the method 900 includes a next step 918, in which a non-scanning event alert is generated based on a mismatch between the transaction data and the detected one or more secondary objects of interest.
Modifications may be made to the embodiments of the disclosure described in the foregoing without departing from the scope of the disclosure as defined by the following claims. Expressions such as "comprising," "including," "incorporating," "consisting of," "having," "being" used to describe and claim the present disclosure are intended to be interpreted in a non-exclusive manner, i.e., to allow for the existence of items, components, or elements that are not explicitly described. Reference to the singular is also to be construed to relate to the plural.