BACKGROUND
Amalgamated image processing systems may combine the data from several camera sensors (among other sensors) into a coherent representation of an environment. Each of the camera sensors may produce data at different rates, from different perspectives, using different algorithms or models, etc., and a central processor interprets the disparate data received to build the coherent representation of the environment.
SUMMARY
The present disclosure provides, in one embodiment, a method for perceptual data association, comprising: receiving, from a first sensor disposed at a first position in an environment, a first time series of local scene graphs comprising a first characteristic of an object in the environment that is updated at a first rate and a second characteristic of the object that is updated at a second rate different from the first rate; receiving, from a second sensor disposed at a second position in the environment, a second time series of local scene graphs each comprising the first characteristic of the object that is updated at the first rate and the second characteristic of the object that is updated at a third rate different from the first rate and the second rate; merging the first time series of local scene graphs with the second time series of local scene graphs according to the first rate to determine a global first characteristic for the object at the first rate; merging the first time series of local scene graphs with the second time series of local scene graphs according to the second rate and the third rate to determine a global second characteristic for the object at the first rate; and outputting a time series of global scene graphs including the global first characteristic and the global second characteristic at the first rate.
The present disclosure provides, in one embodiment, a system, comprising: a processor; and a memory storage device, including instructions that when executed by the processor enable the processor to: receive, from a first sensor disposed at a first position in an environment, a first time series of local scene graphs comprising a first characteristic of an object in the environment that is updated at a first rate and a second characteristic of the object that is updated at a second rate different from the first rate; receive, from a second sensor disposed at a second position in the environment, a second time series of local scene graphs each comprising the first characteristic of the object that is updated at the first rate and the second characteristic of the object that is updated at a third rate different from the first rate and the second rate; merge the first time series of local scene graphs with the second time series of local scene graphs according to the first rate to determine a global first characteristic for the object at the first rate; merge the first time series of local scene graphs with the second time series of local scene graphs according to the second rate and the third rate to determine a global second characteristic for the object at the first rate; and output a time series of global scene graphs including the global first characteristic and the global second characteristic at the first rate.
The present disclosure provides, in one embodiment, a non-transitory computer-readable medium containing computer program code that, when executed by operation of one or more computer processors, performs an operation for perceptual data association comprising: receiving, from a first sensor disposed at a first position in an environment, a first time series of local scene graphs comprising a first characteristic of an object in the environment that is updated at a first rate and a second characteristic of the object that is updated at a second rate different from the first rate; receiving, from a second sensor disposed at a second position in the environment, a second time series of local scene graphs each comprising the first characteristic of the object that is updated at the first rate and the second characteristic of the object that is updated at a third rate different from the first rate and the second rate; merging the first time series of local scene graphs with the second time series of local scene graphs according to the first rate to determine a global first characteristic for the object at the first rate; merging the first time series of local scene graphs with the second time series of local scene graphs according to the second rate and the third rate to determine a global second characteristic for the object at the first rate; and outputting a time series of global scene graphs including the global first characteristic and the global second characteristic at the first rate.
BRIEF DESCRIPTION OF THE DRAWINGS
So that the manner in which the above-recited aspects are attained and can be understood in detail, a more particular description of embodiments described herein, briefly summarized above, may be had by reference to the appended drawings.
It is to be noted, however, that the appended drawings illustrate typical embodiments and are therefore not to be considered limiting; other equally effective embodiments are contemplated.
FIGS. 1A-1C illustrate a physical environment at various times including various objects, according to aspects of the present disclosure.
FIGS. 2A-2C illustrate scene graphs, according to aspects of the present disclosure.
FIG. 3 illustrates a computing device, according to aspects of the present disclosure.
FIG. 4 is a flowchart of a method for a sensor to analyze an environment, according to aspects of the present disclosure.
FIG. 5 is a flowchart of a method for perceptual data association using an arbitrary number of inputs collected from different positions and at different times, according to aspects of the present disclosure.
FIG. 6 is a flowchart of a method in which several sensors provide perceptual data to a collector that reconciles the data, according to aspects of the present disclosure.
DETAILED DESCRIPTION
The present disclosure provides for perceptual data association to address and standardize the processing of data that are received from various sources at various rates. In a distributed system monitoring a scene in a real-world environment, several sensors in the environment may analyze the scene using various algorithms/models that return results at different times. Several algorithms/models running on one sensor may use the same visual data inputs to produce different outputs, and those algorithms/models may run at the same time on a second sensor using visual data inputs taken at the same time from a different perspective. For example, a first algorithm/model and a second algorithm/model on a first sensor may use an image of the environment taken at time t0 and return results at times t1 and t2, respectively. A second sensor may also run the first algorithm/model and the second algorithm/model, but use an image of the environment captured from a different perspective and/or at a different time. When a central collector receives the data from the various sensors, the central collector translates and reconciles the timing and spacing of the determinations made by the various sensors to form a coherent image of the environment. By collecting the data as a time series from each of the sensors, a collector is able to adjust and use the most recently received data to hide any discrepancies in reporting rates and to provide an output at a single steady rate to any downstream processes.
The present disclosure provides improvements to visual processing by reducing the computational resources needed to translate and reconcile data received from disparate sensors, providing the ability to hide any delays in algorithms/models that take longer to process from downstream applications that use the output of those algorithms/models as inputs, and extending the modularity of the plurality of sensors and algorithms/models used by a central collector, among other benefits.
FIGS. 1A-1C illustrate a physical environment 100 at various times including various objects (including inanimate and animate objects, such as persons). FIG. 1A illustrates the physical environment 100 at an initial time t0, FIG. 1B illustrates the physical environment 100 at a time t1, and FIG. 1C illustrates the physical environment 100 at a time t2. Various times are referenced herein using a designation of tx, in which a lower value for the subscript x indicates that a time occurred earlier relative to higher values for the subscript x in a sequence of times. The amount of time passing between two adjacently indicated times tx may vary or be constant in different embodiments, and the amount of time between adjacently indicated times tx in the sequence may be constant or variable over the course of the sequence.
In FIGS. 1A-1C, several objects 110 compose a scene in the environment 100 that is monitored by several sensors 120 that provide analysis of the scene to a collector 130. For purposes of explanation, the present disclosure discusses a first sensor 120a and a second sensor 120b, but three or more sensors 120 may be used in other embodiments without departing from the spirit of the present disclosure. Similarly, although six objects 110a-f of various types are illustrated in FIGS. 1A-1C, the sensors 120 in other embodiments may observe and analyze an environment 100 including a different number of objects 110 and of different types than those discussed in the examples provided herein.
In FIG. 1A, a first object is illustrated as a first person 110a standing next to a second object, which is a second person 110b who is wearing a third object, which is a crown 110c. In various embodiments, the sensors 120 include image processing software that identifies specific types of objects 110 in the environment, such as, for example, persons, identity badges, vehicles, goods, packages, etc., or objects 110 whose locations change over time. Other objects 110 in the scene include a first wall (fourth object 110d), a second wall (fifth object 110e), additional walls (not illustrated), the floor (sixth object 110f), and the ceiling (not illustrated), which may be observed by one or more of the sensors 120, but may be ignored or not reported out by the image processing software. For example, the sensors 120 may identify various objects 110 as components of the environment 100 or otherwise static, report the position of these objects 110 in an initial setup phase to provide the collector 130 with information about the environment 100, and then omit the position and other observations of these objects 110 in future reports. In some embodiments, the sensors 120 may be identified as objects 110 or components of other objects 110 by other sensors 120. For example, a first sensor 120a may identify the second sensor 120b as a seventh object 110g or a component of a seventh object 110g (e.g., a camera stand that includes or supports a sensor 120).
The sensors 120 identify various attributes related to the objects 110, such as, for example, a position or coordinates of an object 110 in the environment 100, an identity of the object 110, whether two or more objects 110 are associated with one another, a feature layout of the object 110 (e.g., limb/joint positions of a person), a focus of the object 110 (e.g., gaze direction of a person, direction of travel of a vehicle), a state of the object 110 (e.g., emotional affect of a person, power status of a device), etc. These data, including the images of the objects 110 and the determined attributes and characteristics of those objects 110, are collectively referred to herein as perceptual data.
Each sensor 120 independently analyzes the environment 100 based on the perspective and other data available to that individual sensor 120, and forwards the analysis of the scene to the collector 130 for detailed analysis from several perspectives. For example, the first sensor 120a and the second sensor 120b each independently determine whether the crown 110c is associated with the second person 110b and, using the determinations from the individual sensors 120, the collector 130 determines whether to treat the crown 110c as associated with the second person 110b, the first person 110a, or no one. In some aspects, one sensor 120 (e.g., as a master sensor 120) acts as the collector 130 and analyzes the environment 100.
Because the sensors 120 do not have perfect information from the environment 100, and some of the analyses of the environment 100 may take longer to calculate and propagate than other analyses, the collector 130 receives the various analyses from the various sensors 120 asynchronously, and determines how to represent or render a virtual environment based on the real-world environment 100 using the most reliable and up-to-date data from the various sensors 120. For example, as the second person 110b transfers the crown 110c to the first person 110a in FIG. 1B at time t1, and as the first person 110a accepts the crown 110c in FIG. 1C at time t2, the individual sensors 120 may provide conflicting information about the persons and the crown 110c at different times. To resolve this conflict, the collector 130 chooses one of the information sets to use or amalgamates the data sets to determine a single state of the environment 100 and the objects 110 therein.
To aid the collector 130 in amalgamating the data, the sensors 120 organize the determined attributes of the environment 100 and objects 110 into scene graphs 200, such as those illustrated in FIGS. 2A-2C. A scene graph 200 from the perspective of an individual sensor 120 may be referred to as a local scene graph 200, whereas a scene graph 200 from the perspective of several sensors 120 (as is created by a collector 130) may be referred to as a global scene graph 200. Although not illustrated, the scene graphs 200 may include or be associated with a timestamp or be organized into a time series of several scene graphs 200. Additionally, although a given number of attributes are discussed as being included in the example scene graphs 200 discussed herein, a scene graph 200 may include more or fewer attribute values in various embodiments.
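The disclosure does not prescribe a concrete data layout for a scene graph 200. Purely as an illustrative sketch, a timestamped local scene graph could be represented as a per-object map of attribute values; the field names below (sensor_id, timestamp, objects, confidence, source_time) are assumptions for the example, not terms from the disclosure.

```python
# Hypothetical shape of a local scene graph record (illustrative only).
from dataclasses import dataclass, field
from typing import Any, Dict, Optional

@dataclass
class AttributeValue:
    value: Any                           # e.g., an identity label or coordinates
    confidence: Optional[float] = None   # optional confidence score in the value
    source_time: Optional[float] = None  # time of the image the value was derived from

@dataclass
class LocalSceneGraph:
    sensor_id: str
    timestamp: float                     # the time tx the graph reports on
    # maps a per-sensor object identifier to that object's attribute values,
    # e.g. {"person-1": {"identity": ..., "position": ..., "pose": ...}}
    objects: Dict[str, Dict[str, AttributeValue]] = field(default_factory=dict)
```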
FIG. 2A illustrates a first scene graph 200a at time t2 from the perspective of the first sensor 120a shown in FIGS. 1A-1C, according to various embodiments of the present disclosure. The first scene graph 200a includes data identifying the objects 110 recognized by the first sensor 120a and the attributes of those objects 110. The attributes of an object 110 are determined using various algorithms/models that return a result after various amounts of processing. For example, the first scene graph 200a includes an identity attribute 210 for the first object 110a, the second object 110b, and the third object 110c, which are determined by a first algorithm/model. The first scene graph 200a also includes a position attribute 220 for the objects 110, identifying where the first sensor 120a has determined the objects 110 to be located in the environment 100. In the present example, the output of the first algorithm/model (i.e., the identity attribute 210) and the second algorithm/model (i.e., the position attribute 220) are based on an image of the environment 100 captured by the first sensor 120a at time t2.
In various aspects, some algorithms/models take longer to produce results or are run less frequently than other algorithms/models, and the scene graph 200 includes an attribute value related to a state of the environment that occurred at an earlier time. For example, the first scene graph 200a includes a pose attribute 230, which identifies how an object 110 identified as a person is posed (e.g., limb/joint positions) via a third algorithm/model. The third algorithm/model may use the inputs for the first object 110a and the second object 110b, but not the third object 110c, due to the identification of the first object 110a and the second object 110b as persons. Additional algorithms/models may produce additional attributes, such as an association attribute 240, a mood attribute 250, a focus attribute 260, etc.
In some embodiments, an algorithm/model that produces attribute outputs at a rate slower than another algorithm/model may provide a last-produced output or a null output in the scene graph 200 to match the rate of the faster algorithm/model. For example, consider a first algorithm/model that produces an output in every time division (e.g., at times t0, t1, t2, t3, etc.) and a second algorithm/model that produces an output every other time division (e.g., at times t1, t3, t5, t7, etc.). A scene graph 200 that reports the outputs of the first and second algorithms/models at every time division (i.e., at a shared rate with the first algorithm/model) may present a value for the second algorithm/model at the non-outputting times (e.g., at times t0, t2, t4, t6, etc.) as one or more of a "null" value in a placeholder for the attribute, an omission of the attribute from the scene graph 200, or a last-known value for the attribute (e.g., the value for time tx−1 at time tx).
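As a minimal sketch of the padding options just described (null placeholder, omission, or last-known value), assuming integer time divisions and a simple dictionary of model outputs:

```python
# Illustrative only: report a slower model's attribute at a faster shared rate.
from typing import Any, Dict, List, Optional

def pad_attribute(outputs: Dict[int, Any], times: List[int],
                  policy: str = "last_known") -> Dict[int, Optional[Any]]:
    """outputs maps the time divisions at which the slower model produced a value;
    times lists every time division at the shared reporting rate."""
    padded: Dict[int, Optional[Any]] = {}
    last: Optional[Any] = None
    for t in times:
        if t in outputs:
            last = outputs[t]
            padded[t] = outputs[t]
        elif policy == "null":
            padded[t] = None          # "null" value in a placeholder for the attribute
        elif policy == "last_known":
            padded[t] = last          # value for time t(x-1) reported at time tx
        # policy == "omit": leave the attribute out of the scene graph entirely
    return padded

# Example: a model that outputs only at odd time divisions, reported every division.
# Yields {0: None, 1: 'sitting', 2: 'sitting', 3: 'standing', 4: 'standing'}.
print(pad_attribute({1: "sitting", 3: "standing"}, [0, 1, 2, 3, 4]))
```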
FIG. 2B illustrates a second scene graph 200b at time t2 from the perspective of the second sensor 120b shown in FIGS. 1A-1C, according to various embodiments of the present disclosure. The second scene graph 200b may include the same set or a different set of attributes than are included in the first scene graph 200a, and those attributes that are included are determined based on the perspective of the second sensor 120b. For example, the second scene graph 200b does not include a focus attribute 260 like the first scene graph 200a does. Additionally, because of the different perspectives of the first sensor 120a and the second sensor 120b, a different set of objects 110 may be identified in the second scene graph 200b than in the first scene graph 200a, and the same object 110 may be given different identifiers by the different sensors 120.
In various embodiments, the second sensor 120b determines different values for the same attributes that the other sensors 120 determine, and may determine those values at different rates. For example, the second sensor 120b may include more accurate or faster algorithms/models for determining a pose attribute 230, and return results faster than a first sensor 120a, e.g., returning a result at time t2 based on an image captured at time t2 rather than based on an image captured at time t1.
In some embodiments, the attributes included in the scene graphs 200 include the determined values for those attributes, but also include confidence scores in those values. In one example, a position attribute 220 may include values for coordinates along with a confidence score for a margin of error or measurement tolerance of the sensor 120 (e.g., X±Y meters distant from the sensor 120, A°±B° away from a centerline of the sensor 120). In another example, an association attribute 240, indicating whether a given object 110 is associated with another object 110, may include a confidence score for how likely the two objects 110 are to be associated with one another (e.g., A is associated with B, X% confident). In a further example, a mood attribute 250 may include several mood identifiers (e.g., happy, sad, confused) and several confidences in whether a person is affecting those moods (e.g., 90% certain happy, 10% certain sad, 50% certain confused). Two local scene graphs 200 may indicate the same value for a given attribute, but different confidences in those values.
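For concreteness only, confidence-scored attribute values of the three kinds named above might be carried in shapes like the following; none of these field names come from the disclosure.

```python
# Sketch only: possible shapes for confidence-scored attribute values.
position = {"range_m": 4.2, "bearing_deg": 31.0, "confidence": 0.90}   # coordinates plus tolerance/confidence
association = {"subject": "person-2", "object": "crown-1",
               "associated": True, "confidence": 0.75}                 # X% confident A is associated with B
mood = {"happy": 0.90, "sad": 0.10, "confused": 0.50}                  # several moods, several confidences
```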
FIG. 2C illustrates a third scene graph 200c at time t2 from the perspective of the collector 130 shown in FIGS. 1A-1C, according to various embodiments of the present disclosure. The third scene graph 200c is a global scene graph 200 that merges the data from the local scene graphs 200 at a given time and provides a coherent dataset to downstream applications that hides any delays in algorithms/models that process at different rates. The global scene graph 200 incorporates the data received from the local scene graphs 200; therefore, the third scene graph 200c includes an identity attribute 210, a position attribute 220, a pose attribute 230, an association attribute 240, a mood attribute 250, and a focus attribute 260.
In various embodiments, the collector 130 may reformat the data from the local scene graphs 200 when creating the global scene graph 200. For example, the third scene graph 200c includes a position attribute 220 that indicates the positions of the objects 110 in Cartesian coordinates, whereas the first scene graph 200a and the second scene graph 200b indicate polar coordinates.
In one example, each of the sensors 120 provides local scene graphs 200 to the collector 130 that include up-to-date position attributes 220. The collector 130 merges the position attributes 220 from the local scene graphs 200 (and, in some embodiments, the locations of the sensors 120 in the environment 100) to triangulate or otherwise determine the position of an object 110 in the environment 100 for the most recent time. Stated differently, when the local scene graphs 200 for time tx include local values for various objects 110 at tx, the collector 130 can refine those local values into a global value for each object 110 at time tx, which the collector 130 then provides to downstream applications.
In situations in which not all of the sensors 120 provide a local value for an attribute of an object 110 at time tx in a local scene graph 200 for time tx, the collector 130 produces a global value for that object 110 at tx using the available values at tx and/or the most recent earlier-reported values for that attribute. For example, if at time tx neither the first scene graph 200a nor the second scene graph 200b includes a value for an attribute determined based on time tx, but the sensors 120 provided values at times tx−1 and tx−3, the collector 130 may produce a global value in the third scene graph 200c using the values reported in the local scene graphs at time tx−1. In another example, when the first scene graph 200a, but not the second scene graph 200b, for time tx includes a value for an attribute determined based on time tx, the collector 130 may produce a global value in the third scene graph 200c for time tx using the values reported in the first scene graph 200a and not the second scene graph 200b. In a further example, when the first scene graph 200a, but not the second scene graph 200b, for time tx includes a value for an attribute determined based on time tx, the collector 130 may produce a global value for that attribute in the third scene graph 200c for time tx using the values reported in the first scene graph 200a for tx and the most recent value for that attribute reported in the second scene graph 200b. In cases in which the collector 130 uses data from prior times, the collector 130 may reduce a confidence score for the earlier-determined values and/or report a result in the global scene graph 200 with a lower confidence than if the values were determined from the most recent local scene graphs 200.
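A hedged sketch of this fallback behavior for a numeric attribute, assuming per-sensor histories of (time, value, confidence) readings and an exponential confidence discount for stale values (the "decay" factor and the data shapes are assumptions, not part of the disclosure):

```python
# Illustrative only: merge the freshest available per-sensor values for time tx,
# discounting the confidence of any value carried forward from an earlier time.
from typing import List, Optional, Tuple

Reading = Tuple[float, float, float]  # (source_time, value, confidence)

def global_value(histories: List[List[Reading]], tx: float,
                 decay: float = 0.8) -> Optional[Tuple[float, float]]:
    """histories holds one time-ordered list of readings per sensor."""
    usable: List[Tuple[float, float]] = []
    for readings in histories:
        past = [r for r in readings if r[0] <= tx]   # readings at or before tx
        if not past:
            continue
        t, value, conf = past[-1]                    # most recent earlier-reported value
        if t < tx:
            conf *= decay ** (tx - t)                # stale value: reduce its confidence
        usable.append((value, conf))
    if not usable:
        return None
    total = sum(c for _, c in usable)
    merged = sum(v * c for v, c in usable) / total   # confidence-weighted average
    return merged, max(c for _, c in usable)         # report value and a confidence
```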
FIG. 3 illustrates a computing device 300. A computing device 300 includes a processor 310, a memory 320, and various hardware to produce and render a scene of the environment 100. In various embodiments, the computing device 300 may be implemented within a sensor 120, a collector 130, or a general computing device (e.g., a smart phone, a tablet computer, a laptop computer) that provides an image processing application 321.
The processor 310 and the memory 320 provide computing functionality to the computing device 300. The memory 320 may be one or more memory devices, such as, for example, Random Access Memory (RAM), Read Only Memory (ROM), flash memory, or any other type of volatile or non-volatile storage medium that includes instructions that the processor 310 may execute. The processor 310 may be any computer processor capable of performing the functions described herein, and may include a system clock or other timekeeping device used to determine when an image is captured or a scene graph 200 is generated or transmitted.
The memory 320 generally includes program code for performing various functions related to image processing. The program code is generally described as various functional "applications" or "modules" within the memory 320, although alternate implementations may have different functions and/or combinations of functions. The memory 320 also generally includes data structures that may store information for use by the various program code modules also stored thereon. The memory 320 includes program code for an image processing application 321 and data structures for scene graphs 200, although other applications and data structures may also be included in the memory 320.
The image processing application 321 is generally configured to provide functionality to observe objects 110 in an environment 100 and determine how those objects 110 interact with one another and the environment 100. In some embodiments, an image processing application 321 running on a sensor 120 identifies objects 110 in the environment 100 from the perspective of the sensor 120 and outputs various analyses (which may be performed asynchronously from one another or synchronously with one another) of the environment 100. In some embodiments, several sensors 120 (each asynchronously running various analyses of the environment 100) asynchronously provide an image processing application 321 on a collector 130 with corresponding local scene graphs 200, and the collector 130 produces a global scene graph 200 from the several inputs to render a view of the environment 100 (e.g., generate visualizations of one or more virtual objects for display or inclusion in a scene graph 200 based on a set point of view) or track the objects 110 in the environment 100.
Scene graphs 200 include coordinate or positional data for various objects 110 identified in the environment 100. Although described herein in relation to scene graphs 200, the present disclosure may operate with other data structures or records. The scene graphs 200 may include positional data for objects 110 indicated via relative coordinates from a sensor 120 or a common point of reference (e.g., X meters from the sensor 120, X meters from another object 110) and/or by absolute coordinates (e.g., at latitude X and longitude Y). In addition to coordinate/positional data, various objects 110 may be classified and identified in the scene graphs 200 according to various attributes that are determined based on analyses of the environment 100. The attributes include an identity/classification (e.g., Person A versus Person B, child versus adult, person versus animal, dog versus cat, animate versus inanimate, wall versus floor), a feature layout (e.g., joint/limb alignment/positions in a person, laptop lid percent open/closed, ladder extended/contracted), a focus (e.g., gaze direction in a person, facing in an inanimate object, direction of travel in a vehicle), and a state (e.g., emotional states, powered states, in motion/at rest) of the object 110. In a more general sense, scene graphs 200 may include any type of data that a particular sensor 120 is configured to sense, for example, orientation (e.g., upright or tilted), motion (e.g., in motion or still), temperature (e.g., hot or cold), sound emissions (e.g., snarling or purring), texture (e.g., soft or stiff), physical/operational state (e.g., clean or dirty), color (e.g., red or green), smell (e.g., sweet or acrid), and the like. The choice of which parameters to sense and incorporate into scene graphs 200 is made to meet the needs of a particular application within the constraints of the processing resources in the sensors 120 and the collector 130.
One or more cameras 330 are included in the computing devices 300 used as sensors 120 to provide a video feed of sequential images from which to identify objects 110 in the environment 100. The camera 330 may be included in the computing device 300 or placed in wired or wireless communication with one or more additional cameras 330 so that the image processing application 321 may use cameras 330 providing several perspectives. A camera 330 may include one or more image cameras (to produce a two- or three-dimensional view of the environment) in the visible spectrum as well as the non-visible spectrum in particular applications. Distances to objects in the scene may be derived from stereo images, may be estimated using artificial intelligence techniques, or may be determined using one or more range finders (not shown) that identify distances to various objects 110 from the sensor 120.
The power source 340 provides electric power to the various components of the computing device 300. Various examples of power sources 340 include batteries (rechargeable and non-rechargeable), Alternating Current to Direct Current (AC/DC) converters, Direct Current to Alternating Current (DC/AC) converters, transformers, capacitors, inductors, and wiring to connect to an external power source 340.
The network interface 350 provides wireline and/or wireless communications for the computing device 300. In various embodiments, the network interface 350 is a radio transmitter/receiver, which receives signals from external sources and transmits signals to external devices. The network interface 350 may be in communication with various antennas and may configure messages to be transmitted or received according to various standards, such as Bluetooth, Wi-Fi, or a proprietary standard. Several computing devices 300 may be placed into communication with one another via respective network interfaces 350 to provide and collect several views of an environment 100.
Additional Input/Output (I/O) devices 390 may be included in various embodiments of a computing device 300. The additional I/O devices 390 may include various lights, displays, and speakers (e.g., LEDs, IR transmitters/receivers, speakers, buttons, microphones, light sensors, etc.) for providing output from the computing device 300. For example, a speaker is an I/O device 390 that provides audio output (e.g., of an audio component of a video feed). In another example, a microphone is an I/O device 390 that receives audio information to provide audio input to the computing device 300. The additional I/O devices 390 may include physical joysticks, physical steering wheels/yokes, physical buttons, physical switches, microphones, and a touch interface that designates various regions for use as virtual joysticks, buttons, switches, etc. A user may manipulate the various additional I/O devices 390 to signal the computing device 300 to turn on or shut down, alter a mode of operation, switch to a different application, change system settings (e.g., volume, brightness), etc.
FIG. 4 is a flowchart of a method 400 for a sensor 120 to analyze an environment 100. FIG. 5 is a flowchart of a method 500 for perceptual data association using an arbitrary number of inputs collected from different positions and at different times. Each of method 400 and method 500 may be run in parallel on different computing devices 300, and at different rates. FIG. 6 is a flowchart of a method 600 in which several sensors 120, each performing method 400 independently, provide perceptual data to a collector 130 performing method 500. Each of the example times given in FIGS. 4-6 and the related discussions is specific to an individual Figure and related discussion, although the rates discussed in FIGS. 4-6 may be understood across the Figures and related discussions.
Method 400 begins with block 410, where the sensor 120 captures one or more images of the environment 100 at time tx. At blocks 420a-420n, the sensor 120 respectively performs analyses A through n. In some embodiments, the sensor 120 performs the analyses in parallel with one another, while in other embodiments an analysis may use the output of a different analysis as an input, and the analyses are performed in series. The analyses in blocks 420 may be algorithmic, implemented by artificial intelligence/machine learning techniques, or both. Accordingly, method 400 proceeds collectively from blocks 420a-n to block 430 in a given cycle for tx, but individual blocks 420a-n may proceed to block 430 at a later time than other individual blocks 420a-n. Stated differently, the individual models a-n used in blocks 420a-n (implemented in software (e.g., as a machine learning classifier agent or an algorithm), hardware, firmware, or hybrids thereof, and generally referred to herein as a model or models a-n) may process at different rates, thus affecting what outputs are available as the most recent data for inclusion in the local scene graph 200. For example, block 420a processes at a first rate Ra, while block 420n processes at an nth rate Rn.
At block 430, the sensor 120 outputs the most recently determined values for the objects 110 in the environment 100 in the local scene graph 200 for tx. In some embodiments, the reporting rate RR at which the sensor 120 outputs the analyses of the environment 100 is specified by the collector 130. In some embodiments, the sensors 120 output the analyses at a rate RR tied to the output rate of a given model used to analyze the environment 100 (e.g., Ra, Rn), which may be the fastest-to-process model or another model specified by a user. For models that complete at a rate slower than the reporting rate, the sensor 120 may supply the last-known values for the attributes analyzed by those slower models, predict values using a predictive technique such as linear quadratic estimation (LQE), or leave the fields associated with those attributes in a local scene graph 200 empty or null (or omit such fields until new data are available).
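A minimal sketch of this capture/analyze/report cycle, assuming a simple registry of models with their own periods and last-known-value fill for slower models; the run_sensor, capture_image, and publish names are illustrative assumptions, not interfaces from the disclosure.

```python
# Illustrative sensor loop: capture (block 410), run due models (blocks 420a-n),
# and publish a local scene graph (block 430) at reporting rate RR.
import time
from typing import Any, Callable, Dict, Tuple

def run_sensor(models: Dict[str, Tuple[Callable[[Any], Any], float]],
               capture_image: Callable[[], Any],
               publish: Callable[[Dict[str, Any]], None],
               report_period_s: float) -> None:
    """models maps an attribute name to (model_fn, model_period_s); a model with a
    period longer than report_period_s simply keeps its last-known value."""
    last_known: Dict[str, Any] = {}
    last_run: Dict[str, float] = {}
    while True:
        now = time.monotonic()
        image = capture_image()                              # block 410: image at time tx
        for name, (model_fn, period_s) in models.items():    # blocks 420a-n (serially here)
            if now - last_run.get(name, float("-inf")) >= period_s:
                last_known[name] = model_fn(image)           # model is due: refresh its value
                last_run[name] = now
        publish(dict(last_known))                            # block 430: local scene graph
        time.sleep(report_period_s)                          # pace the cycle at rate RR
```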
Method 400 returns to block 410 from block 430 to capture additional images or other forms of sensed information from the environment 100 for the next time tx+1 (cycling through blocks 410 through 430 at rate RR) and process those images to determine new attributes for the objects 110 therein. When the rate RR at which the sensor 120 reports the local scene graph 200 to the collector 130 is slower than or equal to the rate at which a given analysis is performed in blocks 420, the results of that analysis are included in every local scene graph 200. When the rate RR at which the sensor 120 reports the local scene graph 200 to the collector 130 is faster than the rate at which a given analysis is performed in blocks 420, the local scene graph 200 includes the results of that analysis intermittently (e.g., at a fraction of the reporting rate RR) or includes the last-known value for that analysis at the reporting rate RR.
Method 500 begins with block 510, where the collector 130 receives local scene graphs 200 from the sensors 120 in the environment 100. The sensors 120 provide the corresponding local scene graphs 200 to the collector 130 asynchronously from one another, such that a first sensor 120a may provide a first local scene graph 200a every x milliseconds, a second sensor 120b may provide a second local scene graph 200b every y milliseconds, and an nth sensor 120n may provide an nth local scene graph 200n every z milliseconds. Each of the sensors 120 from which the collector 130 receives local scene graphs 200 may perform different individual analyses on the images captured by that sensor 120, and reports the most recent analyses at a predefined rate.
At block 520, the collector 130 merges the local scene graphs 200 received from the plurality of sensors 120 in the environment 100 for a given time tx to produce values for use in a global scene graph 200 for that time tx. When attributes are determined by the sensors 120 at different rates (or the same rate, but at offset times), the collector 130 analyzes the received local scene graphs 200 to select the most recently reported values for the attributes from each local scene graph 200 to merge into the values for the global scene graph 200.
In various embodiments, to account for delays in the sensors 120 reporting the value of an attribute, the collector 130 uses the most-recently-reported value of the attribute from a particular sensor 120, but reduces the confidence score related to the accuracy of that value. For example, if a given sensor 120 reports a value of X for an attribute with a confidence score of Y% at time t0, and the given sensor 120 does not report a value for that attribute at time t1, the collector 130 may behave as though the given sensor 120 reported a value of X for the attribute at time t1, but with a confidence score of Z% (where Y>Z).
In various embodiments, the collector 130 merges the values by averaging the values for the data included in the local scene graphs 200 (which may be weighted by various confidence scores), choosing the value associated with the highest confidence score among the local scene graphs 200, polling the local scene graphs 200 for the most frequent value (or most heavily weighted value) among the local scene graphs 200, etc. The collector 130 may use different merging techniques with different data types, such as, for example, selecting an identity type via a highest confidence score, using confidence-weighted averages for coordinates, and polling for TRUE/FALSE attributes. Other merging techniques include data integration and data fusion.
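The three per-type strategies named above could look like the following sketch, assuming each local scene graph contributes a (value, confidence) pair for the attribute; the function names and data shapes are illustrative, not from the disclosure.

```python
# Illustrative merging strategies for different data types.
from typing import List, Tuple

def merge_identity(values: List[Tuple[str, float]]) -> str:
    # choose the identity reported with the highest confidence score
    return max(values, key=lambda vc: vc[1])[0]

def merge_coordinates(values: List[Tuple[Tuple[float, float], float]]) -> Tuple[float, float]:
    # confidence-weighted average of (x, y) coordinates
    total = sum(c for _, c in values)
    x = sum(p[0] * c for p, c in values) / total
    y = sum(p[1] * c for p, c in values) / total
    return x, y

def merge_boolean(values: List[Tuple[bool, float]]) -> bool:
    # poll the local scene graphs, weighting each vote by its confidence
    yes = sum(c for v, c in values if v)
    no = sum(c for v, c in values if not v)
    return yes >= no
```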
In some embodiments, the collector 130 may merge the identities of two or more objects 110 based on the characteristic values reported from multiple sensors 120. For example, several sensors 120 may independently identify a singular object 110 in the respective local scene graphs as different objects 110. The collector 130, when merging the several local scene graphs into a global scene graph, uses matching characteristic data from the several local scene graphs to correlate the identities of the different objects 110 as pertaining to one singular object 110, and merges the characteristic data from the several local scene graphs when describing that singular object 110 in the global scene graph. For example, the positional characteristics may be used to merge a first object 110a identified by a first sensor 120a at location X in the environment 100 with a second object 110b identified by a second sensor 120b at location X in the environment into a singular object 110 in the global scene graph. In various embodiments, the collector 130 may use attribute data beyond positional characteristics to determine whether objects 110 identified by separate sensors 120 describe one object 110 or multiple objects 110. For example, several objects 110 may be identified at location X (e.g., a cluster of persons) by a first sensor 120a and a second sensor 120b, and attributes related to heights, facial expressions, clothing/hair color, direction of gaze, etc. may be used to differentiate the several objects 110 and group the characteristics from each sensor 120 with the appropriate object 110 from the cluster of objects 110. The collector 130 may iteratively process the datasets received from the sensors 120 to determine which objects 110 identified by the sensors 120 describe a singular object 110, and which identify separate objects 110.
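One simple way to realize the positional correlation step, offered only as a sketch, is greedy nearest-neighbor matching under a distance threshold; the disclosure does not commit to this or any particular matching scheme, and the threshold and tuple shapes are assumptions.

```python
# Illustrative only: pair per-sensor object identifiers whose reported
# positions fall within a distance threshold, treating each pair as one object.
import math
from typing import List, Tuple

Detection = Tuple[str, float, float]  # (per-sensor object id, x, y)

def correlate(sensor_a: List[Detection], sensor_b: List[Detection],
              max_dist: float = 0.5) -> List[Tuple[str, str]]:
    """Return (id from sensor A, id from sensor B) pairs judged to describe
    the same physical object based on positional proximity."""
    pairs: List[Tuple[str, str]] = []
    unmatched_b = list(sensor_b)
    for id_a, xa, ya in sensor_a:
        best = None
        best_d = max_dist
        for det_b in unmatched_b:
            d = math.hypot(xa - det_b[1], ya - det_b[2])
            if d <= best_d:
                best, best_d = det_b, d
        if best is not None:
            pairs.append((id_a, best[0]))
            unmatched_b.remove(best)      # each detection matches at most once
    return pairs
```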
In various embodiments, the collector 130 filters the data received from the sensors 120 when merging the local scene graphs 200. For example, the collector 130 may ignore any values reported from the sensors 120 that fall below a predefined confidence threshold as indicated by the sensors 120. In another example, the collector 130 may cluster the data points received from the sensors 120 and exclude outliers from consideration.
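As a sketch of these two filters under simplifying assumptions (scalar values, a median-distance stand-in for clustering), neither the thresholds nor the outlier test below is specified by the disclosure:

```python
# Illustrative only: drop low-confidence values, then exclude outliers that
# sit far from the bulk of the remaining values for an attribute.
from statistics import median
from typing import List, Tuple

def filter_values(values: List[Tuple[float, float]],
                  min_confidence: float = 0.5,
                  max_deviation: float = 1.0) -> List[Tuple[float, float]]:
    """values is a list of (value, confidence) pairs for one attribute."""
    confident = [(v, c) for v, c in values if c >= min_confidence]
    if not confident:
        return []
    center = median(v for v, _ in confident)
    # keep only values near the median; a simple stand-in for clustering
    return [(v, c) for v, c in confident if abs(v - center) <= max_deviation]
```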
In some embodiments, the collector 130 filters the outputs created from the merged data from the local scene graphs 200. For example, a global scene graph 200 may omit a value for a merged attribute that is associated with a confidence score below a confidence threshold.
At block 530, the collector 130 outputs a global scene graph 200 for time tx that includes global values for the attributes that the sensors 120 individually reported. Method 500 returns to block 510 from block 530 to process local scene graphs 200 into a global scene graph 200 for a subsequent time tx+1. The collector 130 cycles through blocks 510, 520, and 530 at the global rate RG to analyze a sequence of data organized in a time series and output a similarly organized time series of global scene graphs 200. The output global scene graph 200 is provided at a rate requested by downstream applications, which may include the designated rate at which the sensors 120 produce local scene graphs 200, a desired frame rate at which a renderer refreshes a view of a virtual scene based on the environment 100, or another rate specified by a user. In this way, the collector 130 merges the data from the local scene graphs 200 according to the different rates at which the sensors 120 produce the data, while hiding any delays or variances in the rate of production from downstream applications or devices.
Method 600 begins with blocks 610a-n, where respective sensors 120a-n independently generate local scene graphs 200a-n of the environment 100 that include a plurality of characteristics determined for the objects 110 identified by the respective sensors 120a-n. The number of blocks 610a-n performed as part of method 600 corresponds to the number of sensors 120a-n located at various positions in the environment 100. As discussed in relation to method 400, an individual sensor 120 may determine and update a first characteristic at a first rate, a second characteristic at a second rate, an nth characteristic at an nth rate, etc., where the individual rates of determining/updating a particular characteristic may be the same as or different from the rates of determining/updating other characteristics. Additionally, the characteristics that a particular sensor 120 determines and reports may be different from what other sensors 120 determine and report based on the objects 110 identified, the capabilities of the particular sensor 120, the location of the sensor 120 in the environment 100, and user preferences.
In some embodiments, the reporting rate RR is set according to the needs of a downstream application, while in other embodiments, the reporting rate RR is set according to the rates Ra-n at which one or more of the models providing analysis of the environment 100 are performed.
At blocks 620a-n, the collector 130 receives local scene graphs 200 from the various sensors 120a-n. The number of blocks 620a-n performed in a given iteration of method 600 is equal to or less than the number of sensors 120a-n, depending on the rate at which the individual sensors 120a-n provide the respective local scene graphs 200. Each of the sensors 120 may transmit a local scene graph 200 at various rates, and the scene graphs 200 may be received at various times by the collector 130 (e.g., at the same rate, but with different time offsets). The collector 130 may batch the various scene graphs 200a-n for processing according to a first rate 1 at which the collector 130 produces a global scene graph 200 so that any local scene graphs 200 received within a time window are treated as being received for the same time tx.
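A minimal sketch of this time-window batching, assuming arrival-stamped graphs and a window equal to the collector's output period (the tuple shapes and window indexing are assumptions for the example):

```python
# Illustrative only: group local scene graphs that arrive within the same
# window of the collector's output period, treating them as the same time tx.
from collections import defaultdict
from typing import Any, Dict, List, Tuple

def batch_by_window(arrivals: List[Tuple[float, str, Any]],
                    window_s: float) -> Dict[int, Dict[str, Any]]:
    """arrivals holds (arrival_time, sensor_id, local_scene_graph) tuples;
    returns, per window index, the latest graph received from each sensor."""
    batches: Dict[int, Dict[str, Any]] = defaultdict(dict)
    for arrival_time, sensor_id, graph in sorted(arrivals, key=lambda a: a[0]):
        window_index = int(arrival_time // window_s)
        batches[window_index][sensor_id] = graph   # later arrivals replace earlier ones
    return dict(batches)
```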
At blocks 630a-n, the collector 130 merges the attribute values received in the time series of local scene graphs 200. The number of blocks 630a-n performed in a given iteration of method 600 is equal to the number of characteristics that the collector 130 includes in the global scene graph 200. The collector 130 may use the value of a characteristic presented in one local scene graph 200 as the value for the characteristic in the global scene graph (e.g., confidence selection, polling, single-source) or amalgamate several values from several local scene graphs 200 (e.g., averaging, clustering) to use as the global attribute value. Regardless of the original rates at which the models produce the local characteristic values, and the rates at which the sensors 120a-n provide the local scene graphs 200, the collector 130 merges the values for output at the first rate 1 used by the downstream application.
Using block 630a as an example, data for a first characteristic are received from a first local scene graph 200a at a first rate 1 and from an nth local scene graph 200n at the first rate 1. Having data available at the rate at which the collector 130 generates the global scene graph 200, the collector 130 merges the data for the first characteristic into the global characteristic value. In the present example, the collector 130 merges the value from the first local scene graph 200a at time tx and the value from the nth local scene graph 200n at time tx for the global value reported in the global scene graph 200 for time tx.
Using block 630n as an example, data for an nth characteristic are received from a first local scene graph 200a at a second rate 2 slower than the first rate 1 and from an nth local scene graph 200n at a third rate 3. In various embodiments, the third rate 3 is slower than, the same as, or faster than the second rate 2 (and may be slower than, the same as, or faster than the first rate 1). When handling data received from local scene graphs 200 at delayed times or at slower rates than the rate at which the global scene graph 200 is output, the collector 130 hides the delay or different rate of reporting by using the most recently available data from the various local scene graphs 200. For example, the collector 130 merges the value from the first local scene graph 200a at time tx−2 and the value from the nth local scene graph 200n at time tx−3 for the global value reported in the global scene graph 200 for time tx. The various historic data from the time series of local scene graphs may be weighted (or the respective confidences adjusted) so that more recent data have a greater effect on the reported global value.
At block 640, the collector 130 outputs the global scene graph 200 according to the first rate 1 using the most recently determined values for the attributes used by the downstream applications. Because the collector 130 outputs the global scene graph 200 at the first rate 1, and merges that data at the first rate 1 regardless of the rate of update, any delays from the first rate 1 in calculating or reporting a particular attribute are hidden from downstream applications.
At block 650, the downstream application uses the global scene graph 200 to render and/or refresh a virtual environment that mirrors the real-world environment 100 according to the attributes and characteristics for the objects 110 tracked by the sensors 120 in the environment 100. Method 600 may continue at the different rates on the sensors 120, the collector 130, and any other computing devices 300 until ended by a user or a command.
In the current disclosure, reference is made to various embodiments. However, it should be understood that the present disclosure is not limited to specific described embodiments. Instead, any combination of the following features and elements, whether related to different embodiments or not, is contemplated to implement and practice the teachings provided herein. Additionally, when elements of the embodiments are described in the form of “at least one of A and B,” it will be understood that embodiments including element A exclusively, including element B exclusively, and including element A and B are each contemplated. Furthermore, although some embodiments may achieve advantages over other possible solutions or over the prior art, whether or not a particular advantage is achieved by a given embodiment is not limiting of the present disclosure. Thus, the aspects, features, embodiments and advantages disclosed herein are merely illustrative and are not considered elements or limitations of the appended claims except where explicitly recited in a claim(s). Likewise, reference to “the invention” shall not be construed as a generalization of any inventive subject matter disclosed herein and shall not be considered to be an element or limitation of the appended claims except where explicitly recited in a claim(s).
As will be appreciated by one skilled in the art, embodiments described herein may be embodied as a system, method or computer program product. Accordingly, embodiments may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a “circuit,” “module” or “system.” Furthermore, embodiments described herein may take the form of a computer program product embodied in one or more computer readable medium(s) having computer readable program code embodied thereon.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for embodiments of the present disclosure may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the “C” programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).
Aspects of the present disclosure are described herein with reference to flowchart illustrations or block diagrams of methods, apparatuses (systems), and computer program products according to embodiments of the present disclosure. It will be understood that each block of the flowchart illustrations or block diagrams, and combinations of blocks in the flowchart illustrations or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the block(s) of the flowchart illustrations or block diagrams.
These computer program instructions may also be stored in a computer readable medium that can direct a computer, other programmable data processing apparatus, or other device to function in a particular manner, such that the instructions stored in the computer readable medium produce an article of manufacture including instructions which implement the function/act specified in the block(s) of the flowchart illustrations or block diagrams.
The computer program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other device to cause a series of operational steps to be performed on the computer, other programmable apparatus or other device to produce a computer implemented process such that the instructions which execute on the computer, other programmable data processing apparatus, or other device provide processes for implementing the functions/acts specified in the block(s) of the flowchart illustrations or block diagrams.
The flowchart illustrations and block diagrams in the Figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart illustrations or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the Figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order or out of order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustrations, and combinations of blocks in the block diagrams or flowchart illustrations, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
While the foregoing is directed to embodiments of the present disclosure, other and further embodiments of the disclosure may be devised without departing from the basic scope thereof, and the scope thereof is determined by the claims that follow.