US20110169917A1


Info

Publication number: US20110169917A1
Application number: US12/942,108
Authority: US
Inventors: Anne Marie Stephen; David Patrick McNeill; Jane Farias; Mikhail Zaturenskiy; Rahul Miglani; William C. Kastilahn; Zhiqian Wang
Original assignee: ShopperTrak RCT LLC
Current assignee: ShopperTrak RCT LLC
Priority date: 2010-01-11
Filing date: 2010-11-09
Publication date: 2011-07-14
Also published as: US20130314505A1; US10909695B2; CA2723613A1; GB2476869B; US20150294482A1; GB201100105D0; US20180300887A1; GB2476869A

Abstract

A system is disclosed that includes: at least one image capturing device at the entrance to obtain images; a reader device; and a processor for extracting objects of interest from the images and generating tracks for each object of interest, and for matching objects of interest with objects associated with RFID tags, and for counting the number of objects of interest associated with, and not associated with, particular RFID tags.

Description

CROSS-REFERENCE

This application claims the benefit of U.S. Provisional Patent Application No. 61/294,013 filed on Jan. 11, 2010, which is incorporated herein in its entirety.

BACKGROUND

1. Field of the Invention

The present invention generally relates to the field of object detection, tracking, and counting. Specifically, the present invention is a computer-implemented detection and tracking system and process for detecting and tracking human objects of interest that appear in camera images taken, for example, at an entrance or entrances to a facility, as well as counting the number of human objects of interest entering or exiting the facility for a given time period.

2. Related Prior Art

Traditionally, various methods for detecting and counting the passing of an object have been proposed. U.S. Pat. No. 7,161,482 describes an integrated electronic article surveillance (EAS) and people counting system. The EAS component establishes an interrogatory zone by an antenna positioned adjacent to the interrogation zone at an exit point of a protected area. The people counting component includes one people detection device to detect the passage of people through an associated passageway and provide a people detection signal, and another people detection device placed at a predefined distance from the first device and configured to detect another people detection signal. The two signals are then processed into an output representative of a direction of travel in response to the signals.

Basically, there are two classes of systems employing video images for locating and tracking human objects of interest. One class uses monocular video streams or image sequences to extract, recognize, and track objects of interest. The other class makes use of two or more video sensors to derive range or height maps from multiple intensity images and uses the range or height maps as a major data source.

In monocular systems, objects of interest are detected and tracked by applying background differencing, adaptive template matching, or contour tracking. The major problem with approaches using background differencing is the presence of background clutter, which negatively affects the robustness and reliability of the system performance. Another problem is that the background updating rate is hard to adjust in real applications. The problems with approaches using adaptive template matching are:

1) object detections tend to drift from true locations of the objects, or get fixed to strong features in the background; and

2) the detections are prone to occlusion.

Approaches using contour tracking suffer from difficulty in overcoming degradation by intensity gradients in the background near contours of the objects. In addition, all the previously mentioned methods are susceptible to changes in lighting conditions, shadows, and sunlight.

In stereo or multi-sensor systems, intensity images taken by sensors are converted to range or height maps, and the conversion is not affected by adverse factors such as lighting condition changes, strong shadow, or sunlight.

Therefore, performances of stereo systems are still very robust and reliable in the presence of adverse factors such as hostile lighting conditions. In addition, it is easier to use range or height information for segmenting, detecting, and tracking objects than to use intensity information.

Most state-of-the-art stereo systems use range background differencing to detect objects of interest. Range background differencing suffers from the same problems, such as background clutter, as the monocular background differencing approaches, and presents difficulty in differentiating between multiple closely positioned objects.

U.S. Pat. No. 6,771,818 describes a system and process of identifying and locating people and objects of interest in a scene by selectively clustering blobs to generate “candidate blob clusters” within the scene and comparing the blob clusters to a model representing the people or objects of interest. The comparison of candidate blob clusters to the model identifies the blob cluster or clusters that most closely match the model. Sequential live depth images may be captured and analyzed in real-time to provide for continuous identification and location of people or objects as a function of time.

U.S. Pat. Nos. 6,952,496 and 7,092,566 are directed to a system and process employing color images, color histograms, techniques for compensating variations, and a sum of match qualities approach to best identify each of a group of people and objects in the image of a scene. An image is segmented to extract regions which likely correspond to people and objects of interest and a histogram is computed for each of the extracted regions. The histogram is compared with pre-computed model histograms and is designated as corresponding to a person or object if the degree of similarity exceeds a prescribed threshold. The designated histogram can also be stored as an additional model histogram.

U.S. Pat. No. 7,176,441 describes a counting system for counting the number of persons passing a monitor line set in the width direction of a path. A laser is installed for irradiating the monitor line with a slit ray and an image capturing device is deployed for photographing an area including the monitor line. The number of passing persons is counted on the basis of one dimensional data generated from an image obtained from the photographing when the slit ray is interrupted on the monitor line when a person passes the monitor line.

Despite all the prior art in this field, no invention has developed a technology that enables unobtrusive detection and tracking of moving human objects at low cost and with little maintenance while providing precise traffic counting results with the ability to distinguish between incoming and outgoing traffic, between moving and static objects, and between objects of different heights. Thus, it is a primary objective of this invention to provide an unobtrusive traffic detection, tracking, and counting system that is low in cost, easy to maintain, capable of high-speed processing, and capable of providing time-stamped results that can be further analyzed.

In addition, people counting systems typically create anonymous traffic counts. In retail traffic monitoring, however, this may be insufficient. For example, some situations may require store employees to accompany customers through access points that are being monitored by an object tracking and counting system, such as fitting rooms. In these circumstances, existing systems are unable to separately track and count employees and customers. The present invention would solve this deficiency.

SUMMARY OF THE INVENTION

The present invention is directed to a system and process for detecting, tracking, and counting human objects of interest entering or exiting an entrance or entrances of a facility.

According to the present invention, the system includes: at least one image capturing device at the entrance to obtain images; a reader device; and a processor for extracting objects of interest from the images and generating tracks for each object of interest, and for matching objects of interest with objects associated with RFID tags, and for counting the number of objects of interest associated with, and not associated with, particular RFID tags.

An objective of the present invention is to provide a technique capable of achieving a reasonable computation load and providing real-time detection, tracking, and counting results.

Another objective is to provide easy and unobtrusive tracking and monitoring of the facility.

Another objective of the present invention is to provide a technique to determine the ratio of the number of human objects entering the facility over the number of human objects of interest passing within a certain distance from the facility.

In accordance with these and other objectives that will become apparent hereafter, the present invention will be described with particular references to the accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a schematic perspective view of a facility in which the system of the present invention is installed;

FIG. 2 is a diagram illustrating the image capturing device connected to an exemplary counting system of the present invention;

FIG. 3 is a diagram illustrating the sequence of converting one or more stereo image pairs captured by the system of the present invention into the height maps, which are analyzed to track and count human objects;

FIG. 4 is a flow diagram describing the flow of processes for a system performing human object detection, tracking, and counting according to the present invention;

FIG. 5 is a flow diagram describing the flow of processes for object tracking;

FIG. 6 is a flow diagram describing the flow of processes for track analysis;

FIG. 7 is a first part of a flow diagram describing the flow of processes for suboptimal localization of unpaired tracks;

FIG. 8 is a second part of the flow diagram of FIG. 7 describing the flow of processes for suboptimal localization of unpaired tracks;

FIG. 9 is a flow diagram describing the flow of processes for second pass matching of tracks and object detects;

FIG. 10 is a flow diagram describing the flow of processes for track updating or creation;

FIG. 11 is a flow diagram describing the flow of processes for track merging;

FIG. 12 is a flow diagram describing the flow of processes for track updates;

FIG. 13 is a diagram illustrating the image capturing device connected to an exemplary counting system, which includes an RFID reader;

FIG. 14 is a flow diagram depicting the flow of processes for retrieving object data and tag data and generating track arrays and sequence arrays;

FIG. 15 is a flow diagram depicting the flow of processes for determining whether any overlap exists between any of the track records and any of the sequence records;

FIG. 16 is a flow diagram depicting the flow of processes for generating a match record 316 for each group of sequence records whose track records overlap;

FIG. 17 is a flow diagram depicting the flow of processes for calculating the match quality scores;

FIG. 18A is a flow diagram depicting the flow of processes for determining which track record is the best match for a particular sequence; and

FIG. 18B is a flow diagram depicting the flow of processes for determining the sequence record that holds the sequence record/track record combination with the highest match quality score.

DETAILED DESCRIPTION OF THE INVENTION

This detailed description is presented in terms of programs, data structures or procedures executed on a computer or a network of computers. The software programs implemented by the system may be written in languages such as JAVA, C, C++, C#, Assembly language, Python, PHP, or HTML. However, one of skill in the art will appreciate that other languages may be used instead, or in combination with the foregoing.

1. System Components

Referring to FIGS. 1, 2 and 3, the present invention is a system 10 comprising at least one image capturing device 20 electronically or wirelessly connected to a counting system 30. In the illustrated embodiment, the at least one image capturing device 20 is mounted above an entrance or entrances 21 to a facility 23 for capturing images from the entrance or entrances 21. Facilities such as malls or stores with wide entrances often require more than one image capturing device to completely cover the entrances. The area captured by the image capturing device 20 is the field of view 44. Each image, along with the time when the image is captured, is a frame 48 (FIG. 3).

Typically, the image capturing device includes at least one stereo camera with two or more video sensors 46 (FIG. 2), which allows the camera to simulate human binocular vision. A pair of stereo images comprises frames 48 taken by each video sensor 46 of the camera. A height map 56 is then constructed from the pair of stereo images through computations involving finding corresponding pixels in rectified frames 52, 53 of the stereo image pair.

Door zone 84 is an area in the height map 56 marking the start position of an incoming track and end position of an outgoing track. Interior zone 86 is an area marking the end position of the incoming track and the start position of the outgoing track. Dead zone 90 is an area in the field of view 44 that is not processed by the counting system 30.

Video sensors 46 (FIG. 2) receive photons through lenses, and photons cause electrons in the image capturing device 20 to react and form light images. The image capturing device 20 then converts the light images to digital signals through which the device 20 obtains digital raw frames 48 (FIG. 3) comprising pixels. A pixel is a single point in a raw frame 48. The raw frame 48 generally comprises several hundred thousand or millions of pixels arranged in rows and columns.

Examples of video sensors 46 used in the present invention include CMOS (Complementary Metal-Oxide Semiconductor) sensors and/or CCD (Charge-Coupled Device) sensors. However, the types of video sensors 46 should not be considered limiting, and any video sensor 46 compatible with the present system may be adopted.

The counting system 30 comprises three main components: (1) boot loader 32; (2) system management and communication component 34; and (3) counting component 36.

The boot loader 32 is executed when the system is powered up and loads the main application program into memory 38 for execution.

The system management and communication component 34 includes task schedulers, database interface, recording functions, and TCP/IP or PPP communication protocols. The database interface includes modules for pushing and storing data generated from the counting component 36 to a database at a remote site. The recording functions provide operations such as writing user defined events to a database, sending emails, and video recording.

The counting component 36 is a key component of the system 10 and is described in further detail as follows.

2. The Counting Component

In an illustrated embodiment of the present invention, the at least one image capturing device 20 and the counting system 30 are integrated in a single image capturing and processing device. The single image capturing and processing device can be installed anywhere above the entrance or entrances to the facility 23. Data output from the single image capturing and processing device can be transmitted through the system management and communication component 34 to the database for storage and further analysis.

FIG. 4 is a diagram showing the flow of processes of the counting component 36. The processes are: (1) obtaining raw frames (block 100); (2) rectification (block 102); (3) disparity map generation (block 104); (4) height map generation (block 106); (5) object detection (block 108); and (6) object tracking (block 110).

Referring to FIGS. 1-4, in block 100, the image capturing device 20 obtains raw image frames 48 (FIG. 3) at a given rate (such as for every 1 As second) of the field of view 44 from the video sensors 46. Each pixel in the raw frame 48 records color and light intensity of a position in the field of view 44. When the image capturing device 20 takes a snapshot, each video sensor 46 of the device 20 produces a different raw frame 48 simultaneously. One or more pairs of raw frames 48 taken simultaneously are then used to generate the height maps 56 for the field of view 44, as will be described.

When multiple image capturing devices 20 are used, tracks 88 generated by each image capturing device 20 are merged before proceeding to block 102.

Block 102 uses calibration data of the stereo cameras (not shown) stored in the image capturing device 20 to rectify raw stereo frames 48. The rectification operation corrects lens distortion effects on the raw frames 48. The calibration data include each sensor's optical center, lens distortion information, focal lengths, and the relative pose of one sensor with respect to the other. After the rectification, straight lines in the real world that have been distorted to curved lines in the raw stereo frames 48 are corrected and restored to straight lines. The resulting frames from rectification are called rectified frames 52, 53 (FIG. 3).

Block 104 creates a disparity map 50 (FIG. 3) from each pair of rectified frames 52, 53. A disparity map 50 is an image map where each pixel comprises a disparity value. The term disparity was originally used to describe a 2-D vector between positions of corresponding features seen by the left and right eyes. Rectified frames 52, 53 in a pair are compared to each other for matching features. The disparity is computed as the difference between positions of the same feature in frame 52 and frame 53.

Block 106 converts the disparity map 50 to the height map 56. Each pixel of the height map 56 comprises a height value and x-y coordinates, where the height value is represented by the greatest ground height of all the points in the same location in the field of view 44. The height map 56 is sometimes referred to as a frame in the rest of the description.
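The height-value rule of block 106 can be illustrated with a minimal sketch. It assumes that the disparity map has already been converted to world-coordinate points (x, y, h) and that x and y are integer cell indices; the function name build_height_map and its parameters are hypothetical and do not appear in this description.

```python
def build_height_map(points, cols, rows):
    """Return a height map (list of rows) built from (x, y, h) points.

    Each cell keeps the greatest ground height of all points that fall
    on it. Cells that receive no point stay at 0 (assumed "no data").
    """
    height_map = [[0.0] * cols for _ in range(rows)]
    for x, y, h in points:
        # Ignore points outside the assumed grid extent.
        if 0 <= x < cols and 0 <= y < rows:
            height_map[y][x] = max(height_map[y][x], h)
    return height_map


if __name__ == "__main__":
    sample = [(0, 0, 5.0), (0, 0, 6.5), (1, 0, 3.0), (5, 5, 9.0)]
    print(build_height_map(sample, cols=2, rows=1))
```

Two points land on cell (0, 0) in the sample, and only the taller one is kept.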

2.1 Object Detection

Object detection (block 108) is a process of locating candidate objects 58 in the height map 56. One objective of the present invention is to detect human objects standing or walking in relatively flat areas. Because human objects of interest are much higher than the ground, local maxima of the height map 56 often represent heads of human objects or occasionally raised hands or other objects carried on the shoulders of human objects walking in counting zone 84, 86 (FIG. 1). Therefore, local maxima of the height map 56 are identified as positions of potential human object 58 detects. Each potential human object 58 detect is represented in the height map 56 by a local maximum with a height greater than a predefined threshold and all distances from other local maxima above a predefined range.

Occasionally, some human objects of interest do not appear as local maxima for reasons such as that the height map 56 is affected by false detection due to snow blindness effect in the process of generating the disparity map 50, or that human objects of interest are standing close to taller objects such as walls or doors. To overcome this problem, the current invention searches in the neighborhood of the most recent local maxima for a suboptimal location as candidate positions for human objects of interest, as will be described later.

A run is a contiguous set of pixels on the same row of the height map 56 with the same non-zero height values. Each run is represented by a four-tuple (row, start-column, end-column, height). In practice, height map 56 is often represented by a set of runs in order to boost processing performance and object detection is also performed on the runs instead of the pixels.
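The run representation can be sketched as follows. This is an illustrative encoding only: the function name encode_runs is hypothetical, and treating the end-column as inclusive is an assumption, since the description does not say whether the end is inclusive.

```python
def encode_runs(height_map):
    """Encode a height map as (row, start_col, end_col, height) runs.

    Zero-height pixels are skipped. end_col is inclusive (an assumption).
    """
    runs = []
    for r, row in enumerate(height_map):
        c = 0
        while c < len(row):
            h = row[c]
            if h == 0:
                c += 1
                continue
            start = c
            # Extend the run while the next pixel has the same height.
            while c + 1 < len(row) and row[c + 1] == h:
                c += 1
            runs.append((r, start, c, h))
            c += 1
    return runs


if __name__ == "__main__":
    print(encode_runs([[0, 5, 5, 0, 3], [7, 7, 7, 0, 0]]))
```

A map with large flat regions collapses to a handful of tuples, which is the performance benefit the paragraph above refers to.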

Object detection comprises four stages: 1) background reconstruction; 2) first pass component detection; 3) second pass object detection; and 4) merging of closely located detects.

2.1.1 Component Definition and Properties

Pixel q is an eight-neighbor of pixel p if q and p share an edge or a vertex in the height map 56, and both p and q have non-zero height values. A pixel can have as many as eight eight-neighbors.

A set of pixels E is an eight-connected component if for every pair of pixels Pi and Pj in E, there exists a sequence of pixels Pi, . . . , Pj such that all pixels in the sequence belong to the set E, and every pair of two adjacent pixels are eight-neighbors of each other. Unless otherwise noted, an eight-connected component is simply referred to as a connected component hereafter.
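The definition above can be made concrete with a short sketch that labels eight-connected components by breadth-first flood fill over pixels. This is a generic textbook technique chosen for illustration, not the run-based labeling variant the system applies in the first pass; the function name is hypothetical.

```python
from collections import deque


def eight_connected_components(height_map):
    """Return a list of components, each a list of (row, col) pixels.

    Two non-zero pixels belong to the same component when a chain of
    eight-neighbors (shared edge or vertex) links them.
    """
    rows = len(height_map)
    cols = len(height_map[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if height_map[r][c] == 0 or seen[r][c]:
                continue
            seen[r][c] = True
            queue = deque([(r, c)])
            pixels = []
            while queue:
                pr, pc = queue.popleft()
                pixels.append((pr, pc))
                # Visit the eight surrounding positions.
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        nr, nc = pr + dr, pc + dc
                        if (0 <= nr < rows and 0 <= nc < cols
                                and not seen[nr][nc]
                                and height_map[nr][nc] != 0):
                            seen[nr][nc] = True
                            queue.append((nr, nc))
            components.append(pixels)
    return components
```

For example, pixels at (0, 0) and (1, 1) share only a vertex, yet they fall in the same component under this definition.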

The connected component is a data structure representing a set of eight-connected pixels in the height map 56. A connected component may represent one or more human objects of interest. Properties of a connected component include height, position, size, etc. Table 1 provides a list of properties associated with a connected component. Each property has an abbreviated name enclosed in a pair of parentheses and a description. Properties will be referenced by their abbreviated names hereafter.

TABLE 1

Number	Variable Name (abbreviated name)	Description

1	component ID (det_ID)	Identification of a component. In the first pass, componentID represents the component. In the second pass, componentID represents the parent component from which the current component is derived.
2	peak position (det_maxX, det_maxY)	Mass center of the pixels in the component having the greatest height value.
3	peak area (det_maxArea)	Number of pixels in the component having the greatest height value.
4	center (det_X, det_Y)	Mass center of all pixels of the component.
5	minimum size (det_minSize)	Size of the shortest side of two minimum rectangles that enclose the component at 0 and 45 degrees.
6	maximum size (det_maxSize)	Size of the longest side of two minimum rectangles that enclose the component at 0 and 45 degrees.
7	area (det_area)	Number of pixels of the component.
8	minimum height (det_minHeight)	Minimum height of all pixels of the component.
9	maximum height (det_maxHeight)	Maximum height of all pixels of the component.
10	height sum (det_htSum)	Sum of heights of pixels in a small square window centered at the center position of the component, the window having a configurable size.
11	grouping flag (det_grouped)	A flag indicating whether the subcomponent still needs grouping.
12	background (det_inBackground)	A flag indicating whether the mass center of the component is in the background.
13	the closest detection (det_closestDet)	Identifies a second pass component closest to the component but remaining separate after operation of “merging close detections”.

Several predicate operators are applied to a subset of properties of the connected component to check if the subset of properties satisfies a certain condition. Component predicate operators include:

IsNoisy, which checks whether a connected component is too small to be considered a valid object detect 58. A connected component is considered as “noise” if at least two of the following three conditions hold: 1) its det_minSize is less than two thirds of a specified minimum human body size, which is configurable in the range of [9,36] inches; 2) its det_area is less than four ninths of the area of a circle with its diameter equal to a specified minimum body size; and 3) the product of its det_minSize and det_area is less than the product of the specified minimum human body size and a specified minimum body area.

IsPointAtBoundaries, which checks whether a square window centered at the current point with its side equal to a specified local maximum search window size is intersecting boundaries of the height map 56, or whether the connected component has more than a specific number of pixels in the dead zone 90. If this operation returns true, the point being checked is considered as within the boundaries of the height map 56.

NotSmallSubComponent, which checks if a subcomponent in the second pass component detection is not small. It returns true if its det_minSize is greater than a specified minimum human head size or its det_area is greater than a specified minimum human head area.

BigSubComponentSeed, which checks if a subcomponent seed in the second pass component detection is big enough to stop the grouping operation. It returns true if its det_minSize is greater than the specified maximum human head size or its det_area is greater than the specified maximum human head area.

SmallSubComponent, which checks if a subcomponent in the second pass component detection is small. It returns true if its det_minSize is less than the specified minimum human head size or its det_area is less than the specified minimum human head area.
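As an illustration of how such a predicate might be coded, the following sketch implements the two-of-three rule described for IsNoisy. The function name, the parameter names, and the assumption that all sizes and areas share consistent units (for example inches and square inches) are hypothetical.

```python
import math


def is_noisy(det_min_size, det_area, min_body_size, min_body_area):
    """Return True when at least two of the three IsNoisy conditions hold."""
    # Area of a circle whose diameter equals the minimum body size.
    circle_area = math.pi * (min_body_size / 2.0) ** 2
    conditions = (
        det_min_size < (2.0 / 3.0) * min_body_size,
        det_area < (4.0 / 9.0) * circle_area,
        det_min_size * det_area < min_body_size * min_body_area,
    )
    # Booleans sum as 0/1, so this counts the conditions that hold.
    return sum(conditions) >= 2


if __name__ == "__main__":
    # Hypothetical settings: 12-inch minimum body size, area of 100.
    print(is_noisy(5, 20, 12, 100))    # small component
    print(is_noisy(10, 200, 12, 100))  # large component
```

Requiring two conditions rather than one keeps a long, thin component or a compact, dense one from being discarded on a single measurement.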

2.1.2 Background Reconstruction

The background represents static scenery in the field of view 44 of the image capturing device 20 and is constructed from the height map 56. The background building process monitors every pixel of every height map 56 and updates a background height map. A pixel may be considered as part of the static scenery if the pixel has the same non-zero height value for a specified percentage of time (e.g., 70%).
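A minimal batch sketch of this rule follows. It assumes a fixed window of equally sized height maps is available at once, whereas the process described above updates the background continuously; the function name build_background and the parameter min_fraction are hypothetical.

```python
from collections import Counter


def build_background(height_maps, min_fraction=0.7):
    """Return a background height map from a window of height maps.

    A pixel joins the background when one non-zero height value occupies
    it in at least min_fraction of the frames; otherwise it stays 0.
    """
    n = len(height_maps)
    rows = len(height_maps[0])
    cols = len(height_maps[0][0])
    background = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Count how often each non-zero height appears at this pixel.
            counts = Counter(hm[r][c] for hm in height_maps if hm[r][c] != 0)
            if not counts:
                continue
            value, count = counts.most_common(1)[0]
            if count / n >= min_fraction:
                background[r][c] = value
    return background
```

With ten frames, a pixel that reads 40 in eight of them is kept as background, while one that reads 60 in only five is not.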

2.1.3 First-Pass Component Detection

First pass components are computed by applying a variant of an eight-connected image labeling algorithm on the runs of the height map 56. Properties of first pass components are calculated according to the definitions in Table 1. Predicate operators are also applied to the first pass components. Those first pass components whose “IsNoisy” predicate operator returns “true” are ignored without being passed on to the second pass component detection phase of the object detection.

2.1.4 Second Pass Object Detection

In this phase, height map local maxima, to be considered as candidate human detects, are derived from the first pass components in the following steps.

First, for each first pass component, find all eight-connected subcomponents whose pixels have the same height. The det_grouped property of all subcomponents is cleared to prepare for subcomponent grouping and the det_ID property of each subcomponent is set to the ID of the corresponding first pass component.

Second, try to find the highest ungrouped local maximal subcomponent satisfying the following two conditions: (1) the subcomponent has the highest height among all of the ungrouped subcomponents of the given first pass component, or the largest area among all of the ungrouped subcomponents of the given first pass component if several ungrouped subcomponents with the same highest height exist; and (2) the subcomponent is higher than all of its neighboring subcomponents. If such a subcomponent exists, use it as the current seed and proceed to the next step for further subcomponent grouping. Otherwise, return to step 1 to process the next first pass component in line.

Third, if BigSubComponentSeed test returns true on the current seed, the subcomponent is then considered as a potential human object detect. Set the det_grouped flag of the subcomponent to mark it as grouped and proceed to step 2 to look for a new seed. If the test returns false, proceed to the next step.

Fourth, try to find a subcomponent next to the current seed that has the highest height and meets all of the following three conditions: (1) it is eight-connected to the current seed; (2) its height is smaller than that of the current seed; and (3) it is not connected to a third subcomponent that is higher and it passes the NotSmallSubComponent test. If more than one subcomponent meets all of above conditions, choose the one with the largest area. When no subcomponent meets the criteria, set the det_grouped property of the current seed to “grouped” and go to step 2. Otherwise, proceed to the next step.

Fifth, calculate the distance between centers of the current seed and the subcomponent found in the previous step. If the distance is less than the specified detection search range or the current seed passes the SmallSubComponent test, group the current seed and the subcomponent together and update the properties of the current seed accordingly. Otherwise, set the det_grouped property of the current seed as “grouped”. Return to step 2 to continue the grouping process until no further grouping can be done.
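The seed selection in the second step can be sketched as below. The dictionary layout (keys "id", "height", "area", "grouped", "neighbors") and the function name find_seed are hypothetical, and the sketch covers only step 2, not the grouping in steps 3 to 5.

```python
def find_seed(subcomponents):
    """Pick the next seed among the subcomponents of one first pass component.

    The candidate is the ungrouped subcomponent with the greatest height,
    ties broken by the largest area. It becomes the seed only if it is
    higher than every neighboring subcomponent; otherwise None is returned.
    """
    ungrouped = [s for s in subcomponents if not s["grouped"]]
    if not ungrouped:
        return None
    candidate = max(ungrouped, key=lambda s: (s["height"], s["area"]))
    by_id = {s["id"]: s for s in subcomponents}
    is_local_max = all(
        candidate["height"] > by_id[n]["height"]
        for n in candidate["neighbors"]
    )
    return candidate if is_local_max else None


if __name__ == "__main__":
    subs = [
        {"id": 0, "height": 70, "area": 4, "grouped": False, "neighbors": [1]},
        {"id": 1, "height": 65, "area": 10, "grouped": False, "neighbors": [0, 2]},
        {"id": 2, "height": 68, "area": 3, "grouped": False, "neighbors": [1]},
    ]
    print(find_seed(subs)["id"])
```

When None is returned, the caller would move on to the next first pass component, as the second step describes.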

2.1.5 Merging Closely Located Detections

Because the image capturing device 20 is mounted on the ceiling of the facility entrance (FIG. 1), a human object of interest is identified by a local maximum in the height map. Sometimes more than one local maximum detection is generated from the same human object of interest. For example, when a human object raises both hands at the same time, two closely located local maxima may be detected. Therefore, it is necessary to merge closely located local maxima.

The steps of this phase are as follows.

First, search for the closest pair of local maxima detections. If the distance between the two closest detections is greater than the specified detection merging distance, stop and exit the process. Otherwise, proceed to the next step.

Second, check and process the two detections according to the following conditions in the given order. Once one condition is met, ignore the remaining conditions and proceed to the next step:

a) if one but not both of the detections is in the background, ignore the one in the background since it is most likely a static object (the local maximum in the foreground has higher priority over the one in the background);

b) if one but not both of the detections is touching edges of the height map 56 or dead zones, delete the one that is touching edges of the height map 56 or dead zones (a complete local maximum has higher priority over an incomplete one);

c) if the difference between the det_maxHeight values of the detections is smaller than a specified person height variation threshold, delete the detection with significantly less 3-D volume (e.g., the product of det_maxHeight and det_maxArea for one connected component is less than two thirds of the product for the other connected component) (a strong local maximum has higher priority over a weak one);

d) if the difference between maximum heights of detections is more than one foot, delete the detection with smaller det_maxHeight if the detection with greater height among the two is less than the specified maximum person height, or delete the detection with greater det_maxHeight if the maximum height of that detection is greater than the specified maximum person height (a local maximum with a reasonable height has higher priority over a local maximum with an unlikely height);

e) delete the detection whose det_area is twice as small as the other (a small local maximum close to a large local maximum is more likely pepper noise);

f) if the distance between the two detections is smaller than the specified detection search range, merge the two detections into one (both local maxima are equally good and close to each other);

g) keep both detections if the distance between the two detections is larger than or equal to the specified detection search range (both local maxima are equally good and not too close to each other). Update the det_closestDet attribute for each detection with the other detection's

Then, return to step 1 to look for the next closest pair of detections.
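The ordered conditions a) through g) can be sketched as a single decision function. The Detection class, its field names, and resolve_pair are hypothetical. Several readings are assumptions: heights are in inches (so one foot is 12), condition c) is treated as unmet when neither volume is below two thirds of the other, "twice as small" is read as at most half the area, and a taller detection exactly at the maximum person height is deleted.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    x: float
    y: float
    max_height: float            # det_maxHeight, inches (assumed unit)
    max_area: int                # det_maxArea
    area: int                    # det_area
    in_background: bool = False  # det_inBackground
    at_boundary: bool = False    # touches map edges or dead zones


def resolve_pair(a, b, height_variation, max_person_height, search_range):
    """Return ('delete', loser), ('merge', None) or ('keep', None)."""
    if a.in_background != b.in_background:            # condition a
        return ("delete", a if a.in_background else b)
    if a.at_boundary != b.at_boundary:                # condition b
        return ("delete", a if a.at_boundary else b)
    diff = abs(a.max_height - b.max_height)
    if diff < height_variation:                       # condition c
        vol_a = a.max_height * a.max_area
        vol_b = b.max_height * b.max_area
        if vol_a < (2.0 / 3.0) * vol_b:
            return ("delete", a)
        if vol_b < (2.0 / 3.0) * vol_a:
            return ("delete", b)
    if diff > 12.0:                                   # condition d
        taller, shorter = (a, b) if a.max_height > b.max_height else (b, a)
        if taller.max_height < max_person_height:
            return ("delete", shorter)
        return ("delete", taller)
    if 2 * a.area <= b.area:                          # condition e
        return ("delete", a)
    if 2 * b.area <= a.area:
        return ("delete", b)
    dist = ((a.x - b.x) ** 2 + (a.y - b.y) ** 2) ** 0.5
    if dist < search_range:                           # condition f
        return ("merge", None)
    return ("keep", None)                             # condition g
```

A caller would apply this to the closest pair repeatedly, stopping once the closest pair is farther apart than the detection merging distance.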

The remaining local maxima detections after the above merging process are defined as candidate object detects 58, which are then matched with a set of existing tracks 74 for track extension, or new track initiation if no match is found.

2.2 Object Tracking

Object tracking (block 110 in FIG. 4) uses objects detected in the object detection process (block 108) to extend existing tracks 74 or create new tracks 80. Some short, broken tracks are also analyzed for possible track repair operations.

To count human objects using object tracks, zones 82 are delineated in the height map 56. Door zones 84 represent door areas around the entrance to the facility 23. Interior zones 86 represent interior areas of the facility. A track 76 traversing from the door zone 84 to the interior zone 86 has a potential “in” count. A track 76 traversing to the door zone 84 from the interior zone 86 has a potential “out” count. If a track 76 traverses across zones 82 multiple times, there can be only one potential “in” or “out” count depending on the direction of the latest zone crossing.
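The latest-crossing rule can be illustrated with a small sketch. It assumes each track point has already been labeled with the zone it falls in ("door", "interior", or None when outside both); the labels and the function name potential_count are hypothetical, and the result is only a potential count.

```python
def potential_count(zone_sequence):
    """Return 'in', 'out', or None for a track's per-point zone labels.

    Only the direction of the latest crossing between the door zone and
    the interior zone determines the result.
    """
    last_zone = None
    result = None
    for zone in zone_sequence:
        if zone not in ("door", "interior"):
            continue  # point lies outside both counting zones
        if last_zone is not None and zone != last_zone:
            result = "in" if zone == "interior" else "out"
        last_zone = zone
    return result


if __name__ == "__main__":
    print(potential_count(["door", "door", None, "interior"]))
    print(potential_count(["door", "interior", "door"]))
```

A track that enters and then turns back yields a single potential "out" rather than one "in" and one "out".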

As illustrated in FIG. 5, the process of object tracking 110 comprises the following phases: 1) analysis and processing of old tracks (block 120); 2) first pass matching between tracks and object detects (block 122); 3) suboptimal localization of unpaired tracks (block 124); 4) second pass matching between tracks and object detects (block 126); and 5) track updating or creation (block 128).

An object track 76 can be used to determine whether a human object is entering or leaving the facility, or to derive properties such as moving speed and direction for human objects being tracked.

Object tracks 76 can also be used to eliminate false human object detections, such as static signs around the entrance area. If an object detect 58 has not moved and its associated track 76 has been static for a relatively long time, the object detect 58 will be considered as part of the background and its track 76 will be processed differently than normal tracks (e.g., the counts created by the track will be ignored).

Object tracking 110 also makes use of color or gray level intensity information in the frames 52, 53 to search for the best match between tracks 76 and object detects 58. Note that the color or the intensity information is not carried to disparity maps 50 or height maps 56.

The same technique used in the object tracking can also be used to determine how long a person stands in a checkout line.

2.2.1 Properties of Object Track

Each track 76 is a data structure generated from the same object being tracked in both temporal and spatial domains and contains a list of 4-tuples (x, y, t, h) in addition to a set of related properties, where h, x and y represent the height and the position of the object in the field of view 44 at time t. (x, y, h) is defined in a world coordinate system with the plane formed by x and y parallel to the ground and the h axis vertical to the ground. Each track can only have one position at any time. In addition to the list of 4-tuples, track 76 also has a set of properties as defined in Table 2, and the properties will be referred to later by their abbreviated names in the parentheses:

TABLE 2

Number	Variable Name	Description
1	ID number (trk_ID)	A unique number identifying the track.
2	track state (trk_state)	A track could be in one of three states: active, inactive and deleted. Being active means the track is extended in a previous frame, being inactive means the track is not paired with a detect in a previous frame, and being deleted means the track is marked for deletion.
3	start point (trk_start)	The initial position of the track (Xs, Ys, Ts, Hs).
4	end point (trk_end)	The end position of the track (Xe, Ye, Te, He).
5	positive Step Numbers (trk_posNum)	Number of steps moving in the same direction as the previous step.
6	positive Distance (trk_posDist)	Total distance by positive steps.
7	negative Step Numbers (trk_negNum)	Number of steps moving in the opposite direction to the previous step.
8	negative Distance (trk_negDist)	Total distance by negative steps.
9	background count (trk_backgroundCount)	The accumulative duration of the track in background.
10	track range (trk_range)	The length of the diagonal of the minimal rectangle covering all of the track's points.
11	start zone (trk_startZone)	A zone number representing either door zone or interior zone when the track is created.
12	last zone (trk_lastZone)	A zone number representing the last zone the track was in.
13	enters (trk_enters)	Number of times the track goes from a door zone to an interior zone.
14	exits (trk_exits)	Number of times the track goes from an interior zone to a door zone.
15	total steps (trk_totalSteps)	The total non-stationary steps of the track.
16	high point steps (trk_highPtSteps)	The number of non-stationary steps that the track has above a maximum person height (e.g. 85 inches).
17	low point steps (trk_lowPtSteps)	The number of non-stationary steps below a specified minimum person height.
18	maximum track height (trk_maxTrackHt)	The maximum height of the track.
19	non-local maximum detection point (trk_nonMaxDetNum)	The accumulative duration of the time that the track has from non-local maximum point in the height map and that is closest to any active track.
20	moving vector (trk_movingVec)	The direction and offset from the closest point in time to the current point with the offset greater than the minimum body size.
21	following track (trk_followingTrack)	The ID of the track that is following closely. If there is a track following closely, the distance between these two tracks doesn't change a lot, and the maximum height of the front track is less than a specified height for shopping carts, then the track in the front may be considered as made by a shopping cart.
22	minimum following distance (trk_minFollowingDist)	The minimum distance from this track to the following track at a point of time.
23	maximum following distance (trk_maxFollowingDist)	The maximum distance from this track to the following track at a point of time.
24	following duration (trk_voteFollowing)	The time in frames that the track is followed by the track specified in trk_followingTrack.
25	most recent track (trk_lastCollidingTrack)	The ID of a track whose detection was once very close to this track's non-local minimum candidate extending position.
26	number of merged tracks (trk_mergedTracks)	The number of small tracks that this track is made of through connection of broken tracks.
27	number of small track searches (trk_smallSearches)	The number of small track search ranges used in merging tracks.
28	Mirror track (trk_mirrorTrack)	The ID of the track that is very close to this track and that might be the cause of this track. This track itself has to be from a non-local maximum detection created by a blind search, or its height has to be less than or equal to the specified minimum person height in order to be qualified as a candidate for false tracks.
29	Mirror track duration (trk_voteMirrorTrack)	The time in frames that the track is a candidate for false tracks and is closely accompanied by the track specified in trk_mirrorTrack within a distance of the specified maximum person width.
30	Maximum mirror track distance (trk_maxMirrorDist)	The maximum distance between the track and the track specified in trk_mirrorTrack.
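As an illustration of the track data structure, the sketch below models the list of 4-tuples and a few of the Table 2 properties that can be derived from it. It is a simplified sketch, not the structure used by the system: the `Track` class and its `add_point` method are hypothetical, the state is stored as a string for readability, and the "minimal rectangle" of trk_range is assumed to be axis-aligned.

```python
from dataclasses import dataclass, field
import math


@dataclass
class Track:
    trk_ID: int
    trk_state: str = "active"  # "active", "inactive", or "deleted"
    points: list = field(default_factory=list)  # (x, y, t, h) tuples

    def add_point(self, x, y, t, h):
        # A track holds at most one position per time stamp.
        if self.points and self.points[-1][2] >= t:
            raise ValueError("time stamps must be strictly increasing")
        self.points.append((x, y, t, h))

    @property
    def trk_start(self):
        return self.points[0]

    @property
    def trk_end(self):
        return self.points[-1]

    @property
    def trk_range(self):
        # Diagonal of the axis-aligned rectangle covering all points (assumption).
        xs = [p[0] for p in self.points]
        ys = [p[1] for p in self.points]
        return math.hypot(max(xs) - min(xs), max(ys) - min(ys))

    @property
    def trk_maxTrackHt(self):
        return max(p[3] for p in self.points)
```

Deriving trk_start, trk_end, trk_range and trk_maxTrackHt from the point list keeps them consistent with the 4-tuples; a real-time system might instead cache them and update them incrementally.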

Several predicate operators are defined in order to obtain the current status of the tracks 76. The predicate operators are applied to a subset of properties of a track 76 to check if the subset of properties satisfies a certain condition. The predicate operators include:

IsNoisyNow, which checks whether the track is bouncing back and forth locally at the current time. Specifically, a track 76 is considered noisy if the track points within a fixed number of frames in the past (specified as the noisy track duration) satisfy one of the following conditions:

a) the range of track 76 (trk_range) is less than the specified noisy track range, and either the negative distance (trk_negDist) is larger than two thirds of the positive distance (trk_posDist) or the negative steps (trk_negNum) are more than two thirds of the positive steps (trk_posNum);

b) the range of track 76 (trk_range) is less than half of the specified noisy track range, and either the negative distance (trk_negDist) is larger than one third of the positive distance (trk_posDist) or the negative steps (trk_negNum) are more than one third of the positive steps (trk_posNum).
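Conditions (a) and (b) of IsNoisyNow can be sketched as a small predicate. The function name `is_noisy_now` and its flat argument list are assumptions made for illustration; the inputs are taken to be the property values computed over the recent window of frames.

```python
def is_noisy_now(trk_range, trk_posDist, trk_negDist, trk_posNum, trk_negNum,
                 noisy_track_range):
    """Sketch of conditions (a) and (b); inputs cover the recent frame window."""
    def bounces(fraction):
        # True when backward motion exceeds the given fraction of forward motion.
        return (trk_negDist > fraction * trk_posDist
                or trk_negNum > fraction * trk_posNum)

    if trk_range < noisy_track_range and bounces(2.0 / 3.0):
        return True  # condition (a)
    if trk_range < noisy_track_range / 2.0 and bounces(1.0 / 3.0):
        return True  # condition (b)
    return False
```

The tighter the track range, the smaller the share of backward motion needed to flag the track as noisy.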

WholeTrackIsNoisy: a track 76 may be noisy at one time and not noisy at another time. This check is used when the track 76 was created a short time ago, and the whole track 76 is considered noisy if one of the following conditions holds:

b) the range of track 76 (trk_range) is less than half the specified noisy track range, and either the negative distance (trk_negDist) is larger than one third of the positive distance (trk_posDist) or the negative steps (trk_negNum) are more than one third of the positive steps (trk_posNum).

IsSameTrack, which checks whether two tracks 76, 77 are likely caused by the same human object. All of the following three conditions have to be met for this test to return true: (a) the two tracks 76, 77 overlap in time for a minimum number of frames specified as the maximum track timeout; (b) the ranges of both tracks 76, 77 are above a threshold specified as the valid counting track span; and (c) the distance between the two tracks 76, 77 at any moment must be less than the specified minimum person width.

IsCountIgnored: when the track 76 crosses the counting zones, it may not be created by a human object of interest. The counts of a track are ignored if one of the following conditions is met:

Invalid Tracks: the absolute difference between trk_exits and trk_enters is not equal to one.

Small Tracks: trk_range is less than the specified minimum counting track length.

Unreliable Merged Tracks: trk_range is less than the specified minimum background counting track length as well as one of the following: trk_mergedTracks is equal to trk_smallSearches, or trk_backgroundCount is more than 80% of the life time of the track 76, or the track 76 crosses the zone boundaries more than once.

High Object Test: trk_highPtSteps is larger than half of trk_totalSteps.

Small Child Test: trk_lowPtSteps is greater than ¾ of trk_totalSteps, and trk_maxTrackHt is less than or equal to the specified minimum person height.

Shopping Cart Test: trk_voteFollowing is greater than 3, trk_minFollowingDist is more than or equal to 80% of trk_maxFollowingDist, and trk_maxTrackHt is less than or equal to the specified shopping cart height.

False Track test: trk_voteMirrorTrack is more than 60% of the life time of the track 76, and trk_maxMirrorTrackDist is less than two thirds of the specified maximum person width or trk_totalVoteMirrorTrack is more than 80% of the life time of the track 76.
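Several of the IsCountIgnored tests can be sketched as a single predicate. This is a partial sketch under stated assumptions: the `TrackStats` and `Limits` containers and the function name are hypothetical, the field names follow Table 2, and the Unreliable Merged Tracks and False Track tests are omitted for brevity.

```python
from dataclasses import dataclass


@dataclass
class TrackStats:
    trk_enters: int
    trk_exits: int
    trk_range: float
    trk_totalSteps: int
    trk_highPtSteps: int
    trk_lowPtSteps: int
    trk_maxTrackHt: float
    trk_voteFollowing: int
    trk_minFollowingDist: float
    trk_maxFollowingDist: float


@dataclass
class Limits:  # hypothetical container for the "specified" thresholds
    min_counting_track_length: float
    min_person_height: float
    shopping_cart_height: float


def is_count_ignored(t, lim):
    """Return True when any of the modeled tests says the count is ignored."""
    invalid = abs(t.trk_exits - t.trk_enters) != 1
    small = t.trk_range < lim.min_counting_track_length
    high_object = t.trk_highPtSteps > t.trk_totalSteps / 2
    small_child = (t.trk_lowPtSteps > 0.75 * t.trk_totalSteps
                   and t.trk_maxTrackHt <= lim.min_person_height)
    shopping_cart = (t.trk_voteFollowing > 3
                     and t.trk_minFollowingDist >= 0.8 * t.trk_maxFollowingDist
                     and t.trk_maxTrackHt <= lim.shopping_cart_height)
    return invalid or small or high_object or small_child or shopping_cart
```

The threshold values passed in `Limits` are configuration choices; none are given here beyond the fractions stated in the tests themselves.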

2.2.3 Track Updating Operation

Referring to FIG. 12, each track 76 is updated with new information on its position, time, and height when there is a best matching human object detect 58 in the current height map 56 for the track 76.

First, set trk_state of the track 76 to 1 (block 360).

Second, for the current frame, obtain the height by using a median filter on the most recent three heights of the track 76 and calculate the new position 56 by averaging the most recent three positions of the track 76 (block 362).

Third, for the current frame, check the noise status using the track predicate operator IsNoisyNow. If true, mark a specified number of frames in the past as noisy. In addition, update the noise-related properties of the track 76 (block 364).

Fourth, update the span of the track 76 (block 366).

Fifth, if one of the following conditions is met, collect the count carried by track 76 (block 374):

a) the track 76 is not noisy at the beginning, but it has been noisy for longer than the specified stationary track timeout (block 368); or

b) the track 76 is not in the background at the beginning, but it has been in the background for longer than the specified stationary track timeout (block 370).

Finally, update the current zone information (block 372).
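The smoothing in the second step of the update (median of the three most recent heights, mean of the three most recent positions) can be sketched as follows. The function name `smooth_latest` and the representation of raw samples as (x, y, h) tuples are assumptions made for illustration.

```python
from statistics import median


def smooth_latest(raw_points):
    """Return the filtered (x, y, h) from the newest raw (x, y, h) samples."""
    recent = raw_points[-3:]  # at most the three most recent samples
    x = sum(p[0] for p in recent) / len(recent)
    y = sum(p[1] for p in recent) / len(recent)
    h = median(p[2] for p in recent)
    return (x, y, h)
```

The median rejects a single outlying height (for example a raised arm in one frame), while the positional average damps frame-to-frame jitter.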

2.2.4 Track Prediction Calculation

It helps to use a predicted position of the track 76 when looking for the best matching detect 58. The predicted position is calculated by linear extrapolation on positions of the track 76 in the past three seconds.
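One way to realize such a linear extrapolation is a least-squares line fit of x and y against time over the recent window. The text does not say how the line is obtained, so the least-squares fit, the function name `predict_position`, and the (x, y, t) sample layout with t in seconds are all assumptions of this sketch.

```python
def predict_position(points, t_next, window=3.0):
    """points: list of (x, y, t); returns the extrapolated (x, y) at t_next."""
    t_last = points[-1][2]
    # Keep only samples from the last `window` seconds.
    recent = [p for p in points if t_last - p[2] <= window]
    n = len(recent)
    if n < 2:
        return (points[-1][0], points[-1][1])
    t_mean = sum(p[2] for p in recent) / n
    x_mean = sum(p[0] for p in recent) / n
    y_mean = sum(p[1] for p in recent) / n
    denom = sum((p[2] - t_mean) ** 2 for p in recent)
    if denom == 0:
        return (x_mean, y_mean)
    # Least-squares slopes act as the velocity components.
    vx = sum((p[2] - t_mean) * (p[0] - x_mean) for p in recent) / denom
    vy = sum((p[2] - t_mean) * (p[1] - y_mean) for p in recent) / denom
    return (x_mean + vx * (t_next - t_mean), y_mean + vy * (t_next - t_mean))
```

A simpler variant would extrapolate from only the first and last samples in the window; the fit is less sensitive to a single noisy position.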

2.2.5 Analysis and Processing of Old Track

This is the first phase of object tracking. Active tracks 88 are tracks 76 that are either created or extended with human object detects 58 in the previous frame. When there is no best matching human object detect 58 for the track 76, the track 76 is considered as inactive.

This phase mainly deals with tracks 76 that are inactive for a certain period of time or are marked for deletion in the previous frame 56. Track analysis is performed on tracks 76 that have been inactive for a long time to decide whether to group them with existing tracks 74 or to mark them for deletion in the next frame 56. Tracks 76 are deleted if the tracks 76 have been marked for deletion in the previous frame 56, or the tracks 76 are inactive and were created a very short period of time before. If the counts of the soon-to-be deleted tracks 76 shall not be ignored according to the IsCountIgnored predicate operator, collect the counts of the tracks 76.

2.2.6 First Pass Matching Between Tracks and Detects

After all tracks 76 are analyzed for grouping or deletion, this phase searches for optimal matches between the human object detects 58 (i.e. the set of local maxima found in the object detection phase) and tracks 76 that have not been deleted.

First, check every possible pair of track 76 and detect 58 and put the pair into a candidate list if all of the following conditions are met:

1) The track 76 is active, or it must be long enough (e.g. with more than three points), or it just became inactive a short period of time ago (e.g. it has been inactive for fewer than three frames);

2) The smaller of the distances from the center of the detect 58 to the last two points of the track 76 is less than two thirds of the specified detection search range when the track 76 hasn't moved very far (e.g. the span of the track 76 is less than the specified minimum human head size and the track 76 has more than 3 points);

3) If the detect 58 is in the background, the maximum height of the detect 58 must be greater than or equal to the specified minimum person height;

4) If the detect 58 is neither in the background nor close to dead zones or height map boundaries, and the track 76 is neither in the background nor is noisy in the previous frame, and a first distance from the detect 58 to the predicted position of the track 76 is less than a second distance from the detect 58 to the end position of the track 76, use the first distance as the matching distance. Otherwise, use the second distance as the matching distance. The matching distance has to be less than the specified detection search range;

5) The difference between the maximum height of the detect 58 and the height of the last point of the track 76 must be less than the specified maximum height difference; and

6) If either the last point of the track 76 or the detect 58 is in the background, or the detect 58 is close to dead zones or height map boundaries, the distance from the track 76 to the detect 58 must be less than the specified background detection search range, which is generally smaller than the threshold used in condition (4).

Sort the candidate list in terms of the distance from the detect 58 to the track 76 or the height difference between the detect 58 and the track 76 (if the distance is the same) in ascending order.

The sorted list contains pairs of detects 58 and tracks 76 that are not paired. Run through the whole sorted list from the beginning and check each pair. If either the detect 58 or the track 76 of the pair is marked “paired” already, ignore the pair. Otherwise, mark the detect 58 and the track 76 of the pair as “paired”.

2.2.7 Search of Suboptimal Location For Unpaired Tracks

Due to the sparse nature of the disparity map 50 and the height map 56, some human objects may not generate local maxima in the height map 56 and therefore may be missed in the object detection process 108. In addition, the desired local maxima might get suppressed by a neighboring higher local maximum from a taller object. Thus, some human object tracks 76 may not always have a corresponding local maximum in the height map 56. This phase tries to resolve this issue by searching for a suboptimal location for a track 76 that has no corresponding local maximum in the height map 56 at the current time. Tracks 76 that have already been paired with a detect 58 in the previous phase might go through this phase too to adjust their locations if the distance from the end of those tracks to their paired detects is much larger than their steps in the past. In the following description, the track 76 currently undergoing this phase is called Track A. The search is performed in the following steps.

First, referring to FIG. 7, if Track A is deemed not suitable for the suboptimal location search operation (i.e., it is inactive, or it's in the background, or it's close to the boundary of the height map 56 or dead zones, or its height in the last frame was less than the minimum person height (block 184)), stop the search process and exit. Otherwise, proceed to the next step.

Second, if Track A has moved a few steps (block 200) (e.g., three steps) and is paired with a detection (called Detection A) (block 186) that is not in the background and whose current step is much larger than its maximum moving step within a period of time in the past specified by a track time out parameter (blocks 202, 204), proceed to the next step. Otherwise, stop the search process and exit.

Third, search around the end point of Track A in a range defined by its maximum moving steps for a location with the largest height sum in a predefined window and call this location Best Spot A (block 188). If there are some detects 58 deleted in the process of merging of closely located detects in the object detection phase and Track A is long in either the spatial domain or the temporal domain (e.g. the span of Track A is greater than the specified noisy track span threshold, or Track A has more than three frames) (block 190), find the closest one to the end point of Track A too. If its distance to the end point of Track A is less than the specified detection search range (block 206), search around the deleted component for the position with the largest height sum and call it Best Spot AI (block 208). If neither Best Spot A nor Best Spot AI exists, stop the search process and exit. If both Best Spot A and Best Spot AI exist, choose the one with the larger height sum. The best spot selected is called the suboptimal location for Track A. If the maximum height at the suboptimal location is greater than the predefined maximum person height (block 192), stop the search and exit. If there is no current detection around the suboptimal location (block 194), create a new detect 58 (block 214) at the suboptimal location and stop the search. Otherwise, find the closest detect 58 to the suboptimal location and call it Detection B (block 196). If Detection B is the same detection as Detection A in step 2 (block 198), update Detection A's position with the suboptimal location (block 216) and exit the search. Otherwise, proceed to the next step.

Fourth, referring to FIG. 8, if Detection B is not already paired with a track 76 (block 220), proceed to the next step. Otherwise, call the track paired with Detection B Track B and perform one of the following operations in the given order before exiting the search:

1) When the suboptimal location for Track A and Detection B are from the same parent component (e.g. in the support of the same first pass component) and the distance between Track A and Detection B is less than half of the specified maximum person width, create a new detect 58 at the suboptimal location (block 238) if all of the following three conditions are met: (i) the difference between the maximum heights at the suboptimal location and Detection B is less than a specified person height error range; (ii) the difference between the height sums at the two locations is less than half of the greater one; (iii) the distance between them is greater than the specified detection search range and the trk_range values of both Track A and Track B are greater than the specified noisy track offset. Otherwise, ignore the suboptimal location and exit;

2) If the distance between the suboptimal location and Detection B is greater than the specified detection search range, create a new detect 58 at the suboptimal location and exit;

3) If Track A is not sizable in both temporal and spatial domains (block 226), ignore the suboptimal location;

4) If Track B is not sizable in both temporal and spatial domains (block 228), detach Track B from Detection B and update Detection B's position with the suboptimal location (block 246). Mark Detection B as Track A's closest detection;

5) Look for the best spot for Track B around its end position (block 230). If the distance between the best spot for Track B and the suboptimal location is less than the specified detection search range (block 232) and the best spot for Track B has a larger height sum, replace the suboptimal location with the best spot for Track B (block 233). If the distance between them is larger than the specified detection search range, create a detect 58 at the best spot for Track B (block 250). Update Detection A's location with the suboptimal location if Detection A exists.

Fifth, if the suboptimal location and Detection B are not in the support of the same first pass component, proceed to the next step. Otherwise create a new detection at the suboptimal location if their distance is larger than half of the specified maximum person width, or ignore the suboptimal location and mark Detection B as Track A's closest detection otherwise.

Finally, create a new detect 58 at the suboptimal location and mark Detection B as Track A's closest detection (block 252) if their distance is larger than the specified detection search range. Otherwise, update Track A's end position with the suboptimal location (block 254) if the height sum at the suboptimal location is greater than the height sum at Detection B, or mark Detection B as Track A's closest detection otherwise.
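The best spot search used in the third step (the location with the largest height sum in a predefined window) can be sketched as an exhaustive scan. The function name `best_spot`, the square search area and window, and the height map as a list of rows are assumptions of this sketch; it returns no location when every windowed sum is zero, corresponding to the case in which no best spot exists.

```python
def best_spot(height_map, center, search_radius, window_half):
    """Return ((row, col), height_sum) with the largest windowed height sum."""
    rows, cols = len(height_map), len(height_map[0])

    def window_sum(r, c):
        # Sum heights in a (2 * window_half + 1)-wide square clipped to the map.
        total = 0.0
        for i in range(max(0, r - window_half), min(rows, r + window_half + 1)):
            for j in range(max(0, c - window_half), min(cols, c + window_half + 1)):
                total += height_map[i][j]
        return total

    best, best_sum = None, 0.0
    r0, c0 = center
    for r in range(max(0, r0 - search_radius), min(rows, r0 + search_radius + 1)):
        for c in range(max(0, c0 - search_radius), min(cols, c0 + search_radius + 1)):
            s = window_sum(r, c)
            if s > best_sum:  # ties keep the first location found
                best, best_sum = (r, c), s
    return best, best_sum
```

Here `center` would be the end point of Track A and `search_radius` its maximum moving step, both expressed in height map cells.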

2.2.8 Second Pass Matching Between Tracks and Detects

After the previous phase, a few new detections may be added and some paired detects 72 and tracks 76 become unpaired again. This phase looks for the optimal match between current unpaired detects 72 and tracks 76 as in the following steps.

For every pair of track 76 and detect 58 that remain unpaired, put the pair into a candidate list if all of the following five conditions are met:

1) the track 76 is active (block 262 in FIG. 9);

2) the distance from the detect 58 to the end point of the track 76 (block 274) is smaller than two thirds of the specified detection search range (block 278) when the track doesn't move too far (e.g. the span of the track 76 is less than the minimal head size and the track 76 has more than three points (block 276));

3) if the detect 58 is in the background (block 280), the maximum height of the detect 58 must be larger than or equal to the specified minimum person height (block 282);

4) the difference between the maximum height of the detect 58 and the height of the last point of the track 76 is less than the specified maximum height difference (block 284);

5) the distance from the detect 58 to the track 76 must be smaller than the specified background detection search range, if either the last point of the track 76 or the detect 58 is in the background (block 286), or the detect 58 is close to dead zones or height map boundaries (block 288); or if not, the distance from the detect 58 to the track 76 must be smaller than the specified detection search range (block 292).

Sort the candidate list in terms of the distance from the detect 58 to the track 76 or the height difference between the two (if the distance is the same) in ascending order (block 264).

The sorted list contains pairs of detects 58 and tracks 76 which are not paired at all at the beginning. Then run through the whole sorted list from the beginning and check each pair. If either the detect 58 or the track 76 of the pair is marked “paired” already, ignore the pair. Otherwise, mark the detect 58 and the track 76 of the pair as “paired” (block 270).
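The sort-then-mark procedure shared by both matching passes is a greedy assignment, sketched below. The function name `greedy_pair` and the candidate tuple layout are assumptions for illustration; the candidate list is assumed to have been filtered by the conditions above already.

```python
def greedy_pair(candidates):
    """candidates: iterable of (distance, height_diff, track_id, detect_id)."""
    paired_tracks, paired_detects, pairs = set(), set(), []
    # Ascending by distance; the height difference breaks ties.
    ordered = sorted(candidates, key=lambda c: (c[0], c[1]))
    for distance, height_diff, track_id, detect_id in ordered:
        if track_id in paired_tracks or detect_id in paired_detects:
            continue  # one side is already marked "paired"
        paired_tracks.add(track_id)
        paired_detects.add(detect_id)
        pairs.append((track_id, detect_id))
    return pairs
```

Because the list is walked once in ascending order, each track and each detect ends up in at most one pair, and closer pairs always win over more distant ones.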

2.2.9 Track Update or Creation

After the second pass of matching, the following steps are performed to update old tracks or to create new tracks:

First, referring to FIG. 10, for each paired set of track 76 and detect 58 the track 76 is updated with the information of the detect 58 (blocks 300, 302).

Second, create a new track 80 for every detect 58 that is not matched to the track 76 if the maximum height of the detect 58 is greater than the specified minimum person height, and the distance between the detect 58 and the closest track 76 of the detect 58 is greater than the specified detection search range (blocks 306, 308). When the distance is less than the specified detection merge range and the detect 58 and the closest track 76 are in the support of the same first pass component (i.e., the detect 58 and the track 76 come from the same first pass component), set the trk_lastCollidingTrack of the closest track 76 to the ID of the newly created track 80 if there is one (blocks 310, 320).

Third, mark each unpaired track 77 as inactive (block 324). If that track 77 has a marked closest detect and the detect 58 has a paired track 76, set the trk_lastCollidingTrack property of the current track 77 to the track ID of the paired track 76 (block 330).

Fourth, for each active track 88, search for the closest track 89 moving in directions that are at most thirty degrees from the direction of the active track 88. If the closest track 89 exists, the track 88 is considered as closely followed by another track, and “Shopping Cart Test” related properties of the track 88 are updated to prepare for the “Shopping Cart Test” when the track 88 is going to be deleted later (block 334).

Finally, for each active track 88, search for the closest track 89. If the distance between the two is less than the specified maximum person width and either the track 88 has a marked closest detect or its height is less than the specified minimum person height, the track 88 is considered as a less reliable false track. Update “False Track” related properties to prepare for the “False Track” test when the track 88 is going to be deleted later (block 338).

As a result, all of the existing tracks 74 are either extended or marked as inactive, and new tracks 80 are created.
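The search in the fourth step for the closest track moving within thirty degrees of an active track can be sketched with a dot-product angle test. The function names and the tuple layout (track ID, position, moving vector) are assumptions for illustration; tracks with a zero moving vector are skipped because their direction is undefined.

```python
import math


def angle_between(v1, v2):
    """Angle in degrees between two 2-D moving vectors, or None if undefined."""
    norm = math.hypot(*v1) * math.hypot(*v2)
    if norm == 0:
        return None  # direction undefined for a stationary track
    cos_angle = (v1[0] * v2[0] + v1[1] * v2[1]) / norm
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))


def closest_follower(active, others, max_angle=30.0):
    """active and others are (track_id, (x, y), (dx, dy)) tuples."""
    _, pos, vec = active
    best_id, best_dist = None, math.inf
    for track_id, other_pos, other_vec in others:
        angle = angle_between(vec, other_vec)
        if angle is None or angle > max_angle:
            continue  # not moving in a similar direction
        dist = math.hypot(pos[0] - other_pos[0], pos[1] - other_pos[1])
        if dist < best_dist:
            best_id, best_dist = track_id, dist
    return best_id, best_dist
```

The returned ID and distance are the kind of values that would feed trk_followingTrack and the minimum and maximum following distances used by the Shopping Cart Test.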

2.2.10 Track Analysis

Track analysis is applied whenever the track 76 is going to be deleted. The track 76 will be deleted when it is not paired with any detect for a specified time period. This could happen when a human object moves out of the field of view 44, or when the track 76 is disrupted due to poor disparity map reconstruction conditions such as very low contrast between the human object and the background.

The goal of track analysis is to find those tracks that are likely continuations of some soon-to-be deleted tracks, and merge them. Track analysis starts from the oldest track and may be applied recursively on newly merged tracks until no tracks can be further merged. In the following description, the track that is going to be deleted is called a seed track, while other tracks are referred to as current tracks. The steps of track analysis are as follows:

First, if the seed track was noisy when it was active (block 130 in FIG. 6), or its trk_range is less than a specified merging track span (block 132), or its trk_lastCollidingTrack does not contain a valid track ID and it was created in less than a specified merging track time period before (block 134), stop and exit the track analysis process.

Second, examine each active track that was created before the specified merging track time period and merge an active track with the seed track if the IsSameTrack predicate operation on the active track (block 140) returns true.

Third, if the current track satisfies all of the following three initial testing conditions, proceed to the next step. Otherwise, if there exists a best fit track (definition and search criteria for the best fit track will be described in forthcoming steps), merge the best fit track with the seed track (blocks 172, 176). If there is no best fit track, keep the seed track if the seed track has been merged with at least one track in this operation (block 178), or delete the seed track (block 182) otherwise. Then, exit the track analysis.

The initial testing conditions used in this step are: (1) the current track is not marked for deletion and is active long enough (e.g. more than three frames) (block 142); (2) the current track is continuous with the seed track (e.g. it is created within a specified maximum track timeout of the end point of the seed track) (block 144); (3) if both tracks are short in space (e.g., the trk_range properties of both tracks are less than the noisy track length threshold), then both tracks should move in the same direction according to the relative offset of the trk_start and trk_end properties of each track (block 146).

Fourth, merge the seed track and the current track (block 152). Return to the previous step if the current track has collided with the seed track (i.e., the trk_lastCollidingTrack of the current track is the trk_ID of the seed track). Otherwise, proceed to the next step.

Fifth, proceed to the next step if the following two conditions are met at the same time, otherwise return to step 3: (1) if either track is at the boundaries according to the “is at the boundary” checking (block 148), both tracks should move in the same direction; and (2) at least one track is not noisy at the time of merging (block 150). The noisy condition is determined by the “is noisy” predicate operator.

Sixth, one of the following two thresholds is used in distance checking. A first threshold (block 162) is specified for normal and clean tracks, and a second threshold is specified for noisy tracks or tracks in the background. The second threshold (block 164) is used if either the seed track or the current track is unreliable (e.g. at the boundaries, or either track is noisy, or the trk_range values of both tracks are less than the specified noisy track length threshold and at least one track is in the background) (block 160), otherwise the first threshold is used. If the shortest distance between the two tracks during their overlapping time is less than the threshold (block 166), mark the current track as the best fit track for the seed track (block 172) if the seed track does not have a best fit track yet or the current track is closer to the seed track than the existing best fit track (block 170). Go to step 3.

2.2.11 Merging of Tracks

This operation merges two tracks into one track and assigns the merged track with properties derived from the two tracks. Most properties of the merged track are the sum of the corresponding properties of the two tracks but with the following exceptions:

Referring to FIG. 11, trk_enters and trk_exits properties of the merged track are the sum of the corresponding properties of the tracks plus the counts caused by zone crossing from the end point of one track to the start point of another track, which compensates for the missing zone crossing in the time gap between the two tracks (block 350).

If a point in time has multiple positions after the merge, the final position is the average (block352).

The trk_start property of the merged track has the same trk_start value as the newer track among the two tracks being merged, and the trk_end property of the merged track has the same trk_end value as the older track among the two (block354).

The buffered raw heights and raw positions of the merged track are the buffered raw heights and raw positions of the older track among the two tracks being merged (block356).
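Two of the merge rules above can be sketched briefly: averaging positions that share a time stamp, and summing the zone-crossing counts with the crossings implied by the gap between the tracks. The function names are hypothetical, averaging the height together with the position is an assumption of this sketch, and the gap crossings are taken as precomputed inputs.

```python
def merge_points(points_a, points_b):
    """Merge two lists of (x, y, t, h); average samples sharing a time stamp."""
    by_time = {}
    for x, y, t, h in list(points_a) + list(points_b):
        by_time.setdefault(t, []).append((x, y, h))
    merged = []
    for t in sorted(by_time):
        samples = by_time[t]
        n = len(samples)
        merged.append((sum(s[0] for s in samples) / n,
                       sum(s[1] for s in samples) / n,
                       t,
                       sum(s[2] for s in samples) / n))
    return merged


def merged_crossings(enters_a, exits_a, enters_b, exits_b, gap_enters, gap_exits):
    """Sum the zone-crossing counts and add crossings implied by the time gap."""
    return (enters_a + enters_b + gap_enters, exits_a + exits_b + gap_exits)
```

After merging, every time stamp again maps to exactly one position, which preserves the rule that a track has only one position at any time.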

As shown inFIG. 13, an alternative embodiment of the present invention may be employed and may comprise a system210 having animage capturing device220, areader device225 and a counting system230. In the illustrated embodiment, the at least oneimage capturing device220 may be mounted above an entrance or entrances221 to a facility223 for capturing images from the entrance or entrances221. The area captured by theimage capturing device220 is field ofview244. Each image captured by theimage capturing device220, along with the time when the image is captured, is a frame248. As described above with respect to image capturingdevice20 for the previous embodiment of the present invention, theimage capturing device220 may be video based. The manner in which object data is captured is not meant to be limiting so long as theimage capturing device220 has the ability to track objects in time across a field ofview244. The object data261 may include many different types of information, but for purposes of this embodiment of the present invention, it includes information indicative of a starting frame, an ending frame, and direction.

For exemplary purposes, the image capturing device 220 may include at least one stereo camera with two or more video sensors 246 (similar to the image capturing device shown in FIG. 2), which allows the camera to simulate human binocular vision. A pair of stereo images comprises frames 248 taken by each video sensor 246 of the camera. The image capturing device 220 converts light images to digital signals through which the device 220 obtains digital raw frames 248 comprising pixels. The types of image capturing devices 220 and video sensors 246 should not be considered limiting, and any image capturing device 220 and video sensor 246 compatible with the present system may be adopted.

For capturing tag data 226 associated with RFID tags, such as name tags that may be worn by an employee or product tags that could be attached to pallets of products, the reader device 225 may employ active RFID tags 227 that transmit their tag information at a fixed time interval. For the present invention, the tags will typically transmit between 1 and 10 times per second, but it should be obvious that other time intervals may be used as well. In addition, the techniques for transmitting and receiving RFID signals are well known by those with skill in the art, and various methods may be employed in the present invention without departing from the teachings herein. An active RFID tag is one that is self-powered, i.e., not powered by the RF energy being transmitted by the reader. To ensure that all RFID tags 227 are captured, the reader device 225 may run continuously and independently of the other devices and systems that form the system 210. It should be evident that the reader device 225 may be replaced by a device that uses other types of RFID tags or similar technology to identify objects, such as passive RFID, ultrasonic, or infrared technology. It is significant, however, that the reader device 225 has the ability to detect RFID tags, or other similar devices, in time across a field of view 228 for the reader device 225. The area captured by the reader device 225 is the field of view 228 and it is preferred that the field of view 228 for the reader device 225 be entirely within the field of view 244 for the image capturing device 220.

The counting system 230 processes digital raw frames 248, detects and follows objects 258, and generates tracks associated with objects 258 in a similar manner as the counting system 30 described above. The counting system 230 may be electronically or wirelessly connected to at least one image capturing device 220 and at least one reader device 225 via a local area or wide area network. Although the counting system 230 in the present invention is located remotely as part of a central server, it should be evident to those with skill in the art that all or part of the counting system 230 may be (i) formed as part of the image capturing device 220 or the reader device 225, (ii) stored on a “cloud computing” network, or (iii) stored remotely from the image capturing device 220 and reader device 225 by employing other distributed processing techniques. In addition, the RFID reader 225, the image capturing device 220, and the counting system 230 may all be integrated in a single device. This unitary device may be installed anywhere above the entrance or entrances to a facility 223. It should be understood, however, that the hardware and methodology that is used for detecting and tracking objects is not limited with respect to this embodiment of the present invention. Rather, it is only important that objects are detected and tracked and the data associated with objects 258 and tracks is used in combination with tag data 226 from the reader device 225 to separately count and track anonymous objects 320 and defined objects 322, which are associated with an RFID tag 227.

To transmit tag data 226 from the reader device 225 to a counting system 230, the reader device 225 may be connected directly to the counting system 230 or the reader device 225 may be connected remotely via a wireless or wired communications network, as are generally known in the industry. It is also possible that the reader device 225 may send tag data 226 to the image capturing device 220, which in turn transmits the tag data 226 to the counting system 230. The tag data 226 may be comprised of various information, but for purposes of the present invention, the tag data 226 includes identifier information, signal strength information and battery strength information.

To allow the counting system 230 to process traffic data 260, tag data 226 and object data 261 may be pulled from the reader device 225 and the image capturing device 220 and transmitted to the counting system 230. It is also possible for the reader device 225 and the image capturing device 220 to push the tag data 226 and object data 261, respectively, to the counting system 230. It should be obvious that the traffic data 260, which consists of both tag data 226 and object data 261, may also be transmitted to the counting system via other means without departing from the teachings of this invention. The traffic data 260 may be sent as a combination of both tag data 226 and object data 261 and the traffic data 260 may be organized based on time.

The counting system 230 separates the traffic data 260 into tag data 226 and object data 261. To further process the traffic data 260, the counting system 230 includes a listener module 310 that converts the tag data 226 into sequence records 312 and the object data 261 into track records 314. Moreover, the counting system 230 creates a sequence array 352 comprised of all of the sequence records 312 and a track array 354 comprised of all of the track records 314. Each sequence record 312 may consist of (1) a tag ID 312a, which may be an unsigned integer associated with a physical RFID tag 227 located within the field of view 228 of a reader device 225; (2) a startTime 312b, which may consist of information indicative of a time when the RFID tag 227 was first detected within the field of view 228; (3) an endTime, which may consist of information indicative of a time when the RFID tag 227 was last detected within the field of view 228 of the reader device 225; and (4) an array of references to all tracks that overlap a particular sequence record 312. Each track record 314 may include (a) a counter, which may be a unique ID representative of an image capturing device 220 associated with the respective track; (b) a direction, which may consist of information that is representative of the direction of movement for the respective track; (c) startTime, which may consist of information indicative of a time when the object of interest was first detected within the field of view 244 of the image capturing device 220; (d) endTime, which may consist of information indicative of a time when the object of interest left the field of view 244 of the image capturing device 220; and (e) tagID, which (if non-zero) may include an unsigned integer identifying a tag 227 associated with this track record 314.
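For illustration, the two record types and a listener that splits time-ordered traffic data into a sequence array and a track array might be sketched as below. The field names follow the description above; the input dictionary layout, the listen function name, and the max_gap rule for closing a sequence (consecutive reads of the same tag more than max_gap seconds apart start a new sequence) are assumptions made for this sketch and are not taken from the specification.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TrackRecord:
    counter: int        # ID of the image capturing device that produced the track
    direction: str      # direction of movement, e.g. "in" or "out"
    start_time: float   # when the object was first detected in the field of view
    end_time: float     # when the object left the field of view
    tag_id: int = 0     # zero until a tag is associated with this track


@dataclass
class SequenceRecord:
    tag_id: int
    start_time: float   # first detection of the tag
    end_time: float     # last detection of the tag
    tracks: List[int] = field(default_factory=list)  # references into the track array


def listen(traffic_data, max_gap=2.0):
    """Split traffic data into (sequence_array, track_array), both in start-time order."""
    sequence_array, track_array = [], []
    open_sequences = {}  # tag_id -> most recent SequenceRecord for that tag
    for item in sorted(traffic_data, key=lambda d: d["time"]):
        if item["kind"] == "tag":
            seq = open_sequences.get(item["tag_id"])
            if seq is not None and item["time"] - seq.end_time <= max_gap:
                seq.end_time = item["time"]  # extend the current sequence
            else:
                seq = SequenceRecord(item["tag_id"], item["time"], item["time"])
                open_sequences[item["tag_id"]] = seq
                sequence_array.append(seq)
        else:
            track_array.append(
                TrackRecord(item["counter"], item["direction"], item["time"], item["end_time"])
            )
    return sequence_array, track_array
```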

To separate and track anonymous objects 320, such as shoppers or customers, and defined objects 322, such as employees and products, the counting system 230 must determine which track records 314 and sequence records 312 match one another. The counting system 230 may then subtract the matching track records 314 from consideration, which means that the remaining (unmatched) track records 314 relate to anonymous objects 320 and the track records 314 that match sequence records 312 relate to defined objects 322.

To match track records 314 and sequence records 312, the counting system 230 first determines which track records 314 overlap with particular sequence records 312. Then the counting system 230 creates an array comprised of track records 314 and sequence records 312 that overlap, which is known as a match record 316. In the final step, the counting system 230 iterates over the records 312, 314 in the match record 316 and determines which sequence records 312 and track records 314 best match one another. Based on the best match determination, the respective matching track record 314 and sequence record 312 may be removed from the match record 316 and the counting system 230 will then iteratively move to the next sequence record 312 to find the best match for that sequence record 312 until all of the sequence records 312 and track records 314 in the match record 316 have matches, or it is determined that no match exists.

The steps for determining which sequence records 312 and track records 314 overlap are shown in FIG. 15. To determine which records 312, 314 overlap, the counting system 230 iterates over each sequence record 312 in the sequence array 352 to find which track records 314 overlap with a particular sequence record 312; the term “overlap” generally refers to track records 314 that have startTimes that are within a window defined by the startTime and endTime of a particular sequence record 312. Therefore, for each sequence record 312, the counting system 230 also iterates over each track record 314 in the track array 354 and adds a reference to the respective sequence record 312 indicative of each track record 314 that overlaps that sequence record 312. Initially, the sequence records 312 have null values for overlapping track records 314 and the track records 314 have tagID fields set to zero, but these values are updated as overlapping records 312, 314 are found. The iteration over the track array 354 stops when a track record 314 is reached that has a startTime that exceeds the endTime of the sequence record 312 at issue.
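The overlap test described above can be sketched, under stated assumptions, as a nested iteration with an early exit. The dictionary keys and the find_overlaps name are hypothetical; the sketch assumes the track array is sorted by start time, and it only fills in the per-sequence track references, leaving the tagID field to be set when a match is actually made.

```python
def find_overlaps(sequence_array, track_array):
    """Attach to each sequence the indices of tracks whose start falls in its window.

    Assumes track_array is sorted by "start", so the inner loop can stop at the
    first track that starts after the sequence has ended.
    """
    for seq in sequence_array:
        seq["tracks"] = []
        for idx, trk in enumerate(track_array):
            if trk["start"] > seq["end"]:
                break  # no later track can start inside this window
            if trk["start"] >= seq["start"]:
                seq["tracks"].append(idx)
    return sequence_array
```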

To create an array of “overlapped” records 312, 314 known as match records 316, the counting system 230 iterates over the sequence array 352 and for each sequence record 312a, the counting system 230 compares the track records 314a that overlap with that sequence record 312a to the track records 314b that overlap with the next sequence record 312b in the sequence array 352. As shown in FIG. 16, a match record 316 is then created for each group of sequence records 312 whose track records 314 overlap. Each match record 316 is an array of references to all sequence records 312 whose associated track records 314 overlap with each other and the sequence records 312 are arranged in earliest-to-latest startTime order.
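One possible reading of this grouping step is sketched below: sequence records are visited in start-time order, and a sequence joins the current match record when it shares at least one overlapping track with the sequence immediately before it. The function name and dictionary keys are hypothetical, and treating a sequence with no shared tracks as the start of a new match record is an assumption of this sketch.

```python
def build_match_records(sequence_array):
    """Group sequence indices into match records, earliest start time first."""
    ordered = sorted(range(len(sequence_array)), key=lambda i: sequence_array[i]["start"])
    match_records = []
    previous_tracks = None
    for i in ordered:
        tracks = set(sequence_array[i]["tracks"])
        if previous_tracks is not None and tracks & previous_tracks:
            match_records[-1].append(i)   # shares a track with the previous sequence
        else:
            match_records.append([i])     # start a new match record
        previous_tracks = tracks
    return match_records
```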

The final step in matching sequence records 312 and track records 314 includes the step of determining which sequence records 312 and track records 314 are the best match. To optimally match records 312, 314, the counting system 230 must consider direction history on a per tag 227 basis, i.e., by mapping between the tagID and the next expected match direction. The initial history at the start of a day (or work shift) is configurable to either “in” or “out”, which corresponds to employees initially putting on their badges or name tags outside or inside the monitored area.

To optimally match records 312, 314, a two level map data structure, referred to as a scoreboard 360, may be built. The scoreboard 360 has a top level or sequence map 362 and a bottom level or track map 364. Each level 362, 364 has keys 370, 372 and values 374, 376. The keys 370 for the top level 362 are references to the sequence array 352 and the values 374 are the maps for the bottom level 364. The keys 372 for the bottom level 364 are references to the track array 354 and the values 376 are match quality scores 380. As exemplified in FIG. 17, the match quality scores are determined by using the following algorithm.

1) Determine if the expected direction for the sequence record is the same as the expected direction for the track record. If they are the same, the MULTIPLIER is set to 10. Otherwise, the MULTIPLIER is set to 1.

2) Calculate the percent of overlap between the sequence record 312 and the track record 314 as an integer between 0 and 100 by using the formula:

OVERLAP=(earliest endTime−latest startTime)/(latest endTime−earliest startTime)

If OVERLAP is <0, then set the OVERLAP to 0.

3) Calculate the match quality score by using the following formula:

SCORE=OVERLAP×MULTIPLIER
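The three steps above can be sketched as a single function. Because the text calls for an integer between 0 and 100 while the formula as written yields a ratio, this sketch assumes the ratio is scaled by 100 and truncated; that scaling, the function name, the parameter names, and the handling of a zero-length combined span are assumptions rather than part of the stated algorithm.

```python
def match_quality_score(seq_start, seq_end, trk_start, trk_end, expected_dir, trk_dir):
    """Return OVERLAP x MULTIPLIER for one sequence record / track record pair."""
    # Step 1: a direction that agrees with the tag's expected direction is favored.
    multiplier = 10 if expected_dir == trk_dir else 1

    # Step 2: percent overlap, clamped so that disjoint intervals score zero.
    span = max(seq_end, trk_end) - min(seq_start, trk_start)
    if span <= 0:
        overlap = 0  # assumption: a zero-length combined span carries no overlap
    else:
        shared = min(seq_end, trk_end) - max(seq_start, trk_start)
        overlap = max(int(100 * shared / span), 0)

    # Step 3: combine the two factors.
    return overlap * multiplier
```

With these assumptions, a sequence spanning 0 to 10 and a track spanning 5 to 15 share 5 of 15 time units, giving an overlap of 33 and a score of 330 when the directions agree.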

The counting system 230 populates the scoreboard 360 by iterating over the sequence records 312 that populate the sequence array 352 referenced by the top level 362 and for each of the sequence records 312, the counting system 230 also iterates over the track records 314 that populate the track array 354 referenced by the bottom level 364 and generates match quality scores 380 for each of the track records 314. As exemplified in FIG. 18A, once match quality scores 380 are generated and inserted as values 376 in the bottom level 364, each match quality score 380 for each track record 314 is compared to a bestScore value and if the match quality score 380 is greater than the bestScore value, the bestScore value is updated to reflect the higher match quality score 380. The bestTrack reference is also updated to reflect the track record 314 associated with the higher bestScore value.
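As a minimal sketch, the scoreboard can be modeled as a dictionary of dictionaries keyed by indices into the sequence array and the track array. The function names, the dictionary layout, and the score_fn callback (standing in for the match quality algorithm) are hypothetical and chosen only for this illustration.

```python
def populate_scoreboard(match_record, sequence_array, track_array, score_fn):
    """Build {sequence index: {track index: match quality score}} for one match record."""
    scoreboard = {}
    for seq_idx in match_record:
        seq = sequence_array[seq_idx]
        scoreboard[seq_idx] = {
            trk_idx: score_fn(seq, track_array[trk_idx]) for trk_idx in seq["tracks"]
        }
    return scoreboard


def best_track(track_map):
    """Scan one bottom-level map and return (bestTrack, bestScore).

    Returns (None, 0) when no track has a score greater than zero.
    """
    best_trk, best_score = None, 0
    for trk_idx, score in track_map.items():
        if score > best_score:
            best_trk, best_score = trk_idx, score
    return best_trk, best_score
```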

As shown in FIG. 18B, once the bestTrack for the first sequence in the match record is determined, the counting system 230 iterates over the keys 370 for the top level 362 to determine the bestSequence, which reflects the sequence record 312 that holds the best match for the bestTrack, i.e., the sequence record/track record combination with the highest match quality score 380. The bestScore and bestSequence values are updated to reflect this determination. When the bestTrack and bestSequence values have been generated, the sequence record 312 associated with the bestSequence is deleted from the scoreboard 360 and the bestTrack value is set to 0 in all remaining keys 372 for the bottom level 364. The counting system 230 continues to evaluate the remaining sequence records 312 and track records 314 that make up the top and bottom levels 362, 364 of the scoreboard 360 until all sequence records 312 and track records 314 that populate the match record 316 have been matched and removed from the scoreboard 360, or until all remaining sequence records 312 have match quality scores 380 that are less than or equal to 0, i.e., no matches remain to be found.

As shown in Table 1, the information related to the matching sequence records 312 and track records 314 may be used to prepare reports that allow employers to track, among other things, (i) how many times an employee enters or exits an access point; (ii) how many times an employee enters or exits an access point with a customer or anonymous object 320; (iii) the length of time that an employee or defined object 322 spends outside; and (iv) how many times a customer enters or exits an access point. This information may also be used to determine conversion rates and other “What If” metrics that relate to the amount of interaction employees have with customers. For example, as shown in Table 2, the system 210 defined herein may allow employers to calculate, among other things: (a) fitting room capture rates; (b) entrance conversion rates; (c) employee to fitting room traffic ratios; and (d) the average dollar spent. These metrics may also be extrapolated to forecast percentage sales changes that may result from increases to the fitting room capture rate, as shown in Table 3.
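Returning to the matching loop of FIGS. 18A and 18B, one way to sketch it is as a greedy pass over a scoreboard held as a dictionary of dictionaries. The greedy_match name and the returned tuple layout are hypothetical, and dropping a sequence whose remaining scores are all zero or less is this sketch's interpretation of the stopping condition.

```python
def greedy_match(scoreboard):
    """Greedily pair sequences with tracks.

    scoreboard maps sequence index -> {track index: match quality score}.
    Returns a list of (sequence index, track index, score) tuples.
    """
    board = {seq: dict(tracks) for seq, tracks in scoreboard.items()}  # work on a copy
    matches = []
    while board:
        first_seq = next(iter(board))
        # Find the bestTrack for the first remaining sequence.
        best_trk, best_score = None, 0
        for trk, score in board[first_seq].items():
            if score > best_score:
                best_trk, best_score = trk, score
        if best_trk is None:
            del board[first_seq]  # nothing left to match for this sequence
            continue
        # Find the bestSequence: whichever sequence scores highest for bestTrack.
        best_seq = first_seq
        for seq, tracks in board.items():
            if tracks.get(best_trk, 0) > best_score:
                best_seq, best_score = seq, tracks[best_trk]
        matches.append((best_seq, best_trk, best_score))
        # Remove the matched sequence and zero the matched track everywhere else.
        del board[best_seq]
        for tracks in board.values():
            if best_trk in tracks:
                tracks[best_trk] = 0
    return matches
```

Track records that never appear in the returned matches would, under this reading, be the ones attributed to anonymous objects.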

In some cases, there may be more than one counter 222, which consists of the combination of both the image capturing device 220 and the reader device 225, to cover multiple access points. In this case, separate sequence arrays 352 and track arrays 354 will be generated for each of the counters 222. In addition, a match array 318 may be generated and may comprise each of the match records 316 associated with each of the counters 222. In order to make optimal matches, tag history must be shared between all counters 222. This may be handled by merging, in a time-sorted order, all of the match records in the match array 318 and by using a single history map structure, which is generally understood by those with skill in the art. When matches are made within the match array 318, the match is reflected in the track array 354 associated with a specific counter 222 using the sequence array 352 associated with the same counter 222. This may be achieved in part by using a counter ID field as part of the track records 314 that make up the track array 354 referenced by the bottom level 364 of the scoreboard 360. For example, references to the track arrays 354 may be added to a total track array 356 and indexed by counter ID. The sequence arrays 352 would be handled the same way.
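A hedged sketch of the multi-counter case follows. It merges per-counter match records in time order with Python's heapq.merge and keeps one direction history shared by all counters. The record layout and function names are hypothetical, and flipping the expected direction after each match is an assumption of this sketch rather than a stated rule.

```python
import heapq


def merge_match_records(per_counter):
    """Merge {counter ID: [match records sorted by "start"]} into one time-sorted list.

    Each merged record is a copy tagged with its counter ID, so a later match can
    be written back to the arrays belonging to the counter it came from.
    """
    streams = [
        [dict(rec, counter=counter_id) for rec in records]
        for counter_id, records in per_counter.items()
    ]
    return list(heapq.merge(*streams, key=lambda rec: rec["start"]))


def next_expected_direction(history, tag_id, initial="in"):
    """Look up the next expected direction for a tag in the shared history map."""
    return history.get(tag_id, initial)


def record_match(history, tag_id, direction):
    """Update the shared history after a match (assumed to alternate in/out)."""
    history[tag_id] = "out" if direction == "in" else "in"
```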

The invention is not limited by the embodiments disclosed herein and it will be appreciated that numerous modifications and embodiments may be devised by those skilled in the art. Therefore, it is intended that the following claims cover all such embodiments and modifications that fall within the true spirit and scope of the present invention.
