CROSS-REFERENCE TO RELATED PATENT APPLICATIONS
This application is a continuation of U.S. patent application Ser. No. 14/557,238, filed Dec. 1, 2014, which claims priority from U.S. Provisional Application No. 61/952,055, filed Mar. 12, 2014, incorporated herein by reference in its entirety. This application relates to application Ser. No. 14/537,768, titled SYSTEMS AND METHODS FOR SCALABLE ASYNCHRONOUS COMPUTING FRAMEWORK, filed on Nov. 10, 2014, which is incorporated herein by reference in its entirety.
BACKGROUND OF THE INVENTION
1. Field of Invention
Present embodiments relate generally to reconstructing and graphically displaying a 3-dimensional space, and more specifically, to reconstructing and graphically displaying the 3-dimensional space based on vertices defined within the 3-dimensional space.
2. Background
Generally, 3-dimensional reconstruction has been a computation-intensive task, especially in a real-time context. Methods for 3-dimensional reconstruction include triangulation, stereo vision, and time-of-flight systems to extract the 3-dimensional features of objects within a target space in real time. Computational methods may have considerable potential for growth as compared to sensor-based methods, but they require complex searching or matching that consumes a tremendous amount of time and computational resources. For example, both the time-of-flight method and video-based stereo vision capture only a single angle. Time of flight is limited in resolution by the sensor used. Though video-based stereo vision can operate outdoors, stereo vision relies on computation-intensive searching to generate outputs. Therefore, disadvantages associated with the methods described above may include errors and inefficiencies caused by either too much or too little information.
Three-dimensional space reconstruction based on outputs from cameras is becoming increasingly valuable for entertainment, security, medicine, movie production, and the like. Because most live 3-dimensional scanners have limited range, resolution, and outdoor capabilities, computer vision researchers have sought algorithmic solutions. Algorithmic photo-based methods to derive 3-dimensional geometry in real time have drawbacks due to the similarity between textures in the input to the algorithms as well as the overabundance of data to be processed. Most algorithmic methods perform approximations or tricks that avoid processing all of the data in order to increase the speed of execution. As such, data that would be necessary for an ideal solution may be disregarded; on the other hand, ideal photo-realistic solutions typically require processing a tremendous amount of data to execute with sufficient quality.
Therefore, cost-effective systems and methods to build a 3-dimensional model of a target space and to reconstruct the 3-dimensional model at devices of end-users are desired.
SUMMARY OF THE DISCLOSURE
Embodiments described herein relate to systems and methods for capturing, building, and reconstructing a 3-dimensional model of a target space. In particular, the systems and methods can be implemented to capture, build, and reconstruct the 3-dimensional model (e.g., including mobile and immobile objects within the target space) in real time (e.g., as in the cases of live broadcasts and/or streams of events that last for more than a moment in time).
In particular, the target (3-dimensional) space may be partitioned to include a plurality of vertices, where each vertex may be associated with a discrete volume within the target space. Data (e.g., image data, video data, and/or the like) relating to the target space may be captured by data sources (e.g., cameras). Each image or frame of a video may capture a plurality of areas. For example, each area (a fraction of a pixel or at least one pixel) may be associated with one of the plurality of vertices defined within the target space.
For each vertex, display characteristics may be assessed based on the images/video frames captured by the data sources. The display characteristics include, but are not limited to, colors, textures, frequency transforms, wavelet transforms, averages, standard deviations, a combination thereof, and/or the like. Each display characteristic may be represented in at least one display attribute. With respect to a single image/frame as captured by a single source device, display attributes associated with each of the vertices may be captured and processed. In particular, a weight for each display attribute may be determined based on the display attribute associated with the area (e.g., the fraction of a pixel or at least one pixel) of the captured image or frame. Given that there are a plurality of data sources capturing the target space at the same time in different camera poses, a plurality of display attributes associated with a given vertex may be captured.
Objects within the target space may block or contain a given vertex. Once at least one potential display attribute and the weights associated therewith have been obtained, the selected display attribute may be set as the one of the potential display attributes that is associated with the highest level of confidence. The level of confidence may be based on a predetermined threshold, comparison with weights of other display attribute(s), standard deviations, averages, convolution of surrounding pixels, a combination thereof, and/or the like.
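The attribute-selection step described above can be sketched as follows. This is only an illustrative sketch; the function name and the list-of-pairs data layout are assumptions for illustration, not the claimed implementation.

```python
# Illustrative sketch: for one vertex, choose the candidate display
# attribute with the greatest weight (confidence). Candidates below the
# threshold are rejected. The (attribute, weight) pair layout is assumed.

def select_display_attribute(candidates, threshold=0.0):
    """candidates: list of (attribute, weight) pairs observed for a vertex.

    Returns the attribute with the highest weight above the threshold,
    or None if no candidate qualifies."""
    best_attr, best_weight = None, threshold
    for attr, weight in candidates:
        if weight > best_weight:
            best_attr, best_weight = attr, weight
    return best_attr

# Example: three cameras observed RGB colors for the same vertex;
# the most confidently observed color is selected.
observed = [((210, 40, 40), 0.9), ((205, 45, 38), 0.85), ((30, 30, 200), 0.2)]
print(select_display_attribute(observed))  # (210, 40, 40)
```

A comparison against other weights (rather than a fixed threshold) could be substituted in the same place, matching the alternatives listed above.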
In some embodiments, a method may be described herein, the method including, but not limited to: partitioning a model of a target space into a plurality of vertices; determining at least one display attribute associated with each of the plurality of vertices based on output data provided by a plurality of data sources; and selecting one of the at least one display attribute for each of the plurality of vertices.
In some embodiments, the method further includes providing a plurality of data sources. Each of the plurality of data sources outputs data corresponding to a current frame as the output data comprising a plurality of areas.
In some embodiments, each of the plurality of areas corresponds to one of the plurality of vertices.
In various embodiments, determining the at least one display attribute associated with each of the plurality of vertices comprises determining a display attribute associated with each of the plurality of areas of the current frame for each of the plurality of data sources.
In some embodiments, each of the plurality of data sources provides the output data corresponding to at least some of the plurality of vertices.
In some embodiments, each of the plurality of data sources comprises at least one digital video camera arranged at a camera position and orientation that is different from the camera position and orientation of another one of the plurality of data sources.
According to some embodiments, the output data corresponding to at least one of the plurality of vertices is outputted by two or more of the plurality of data sources.
In various embodiments, the plurality of data sources are arranged in two or more levels. The at least one display attribute for a first vertex of the plurality of vertices is determined based on the output data outputted by the at least one of the plurality of data sources in a first level of the two or more levels.
In some embodiments, the at least one display attribute for the plurality of vertices other than the first vertex is determined based on the output from the plurality of data sources associated with levels other than the first level when a weight associated with one of the at least one display attribute for the first vertex exceeds a predetermined threshold.
According to some embodiments, the at least one display attribute for the first vertex is determined based on the output data outputted by at least one of the plurality of data sources associated with a second level of the two or more levels when the weights associated with each of the at least one display attribute are equal to or less than a predetermined threshold.
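The level-based determination described in the preceding paragraphs can be sketched as follows; the escalation logic (consult the next level only when no first-level weight exceeds the threshold) follows the text, while the function name and data layout are illustrative assumptions.

```python
# Hedged sketch of level-based attribute determination for one vertex:
# first-level data sources are weighted first; if no candidate's weight
# exceeds the threshold, the next level of data sources is consulted.

def attribute_from_levels(levels, threshold):
    """levels: list of lists of (attribute, weight) pairs, one list per
    level of data sources, ordered from first level onward.

    Returns the best attribute once its weight exceeds the threshold,
    escalating through levels; falls back to the best seen overall."""
    best = (None, 0.0)
    for level in levels:
        for attr, weight in level:
            if weight > best[1]:
                best = (attr, weight)
        if best[1] > threshold:  # confident enough: stop escalating
            return best[0]
    return best[0]               # no level was confident; best overall

# First level is inconclusive (0.4 <= 0.6), so the second level decides.
level_one = [("red", 0.4)]
level_two = [("green", 0.9)]
print(attribute_from_levels([level_one, level_two], threshold=0.6))  # green
```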
In some embodiments, the method further includes projecting at least one of the plurality of vertices onto one of the plurality of areas.
In various embodiments, partitioning a model of a target space into a plurality of vertices includes at least: receiving an exterior boundary of the model of the target space; determining density of the plurality of vertices; and sampling a volume defined by the exterior boundary of the model based on the density of the plurality of vertices.
In some embodiments, the density of the plurality of vertices is determined based on at least one of resolution desired, processing power available, and network conditions.
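The partitioning steps above (receive an exterior boundary, determine a density, sample the bounded volume) can be sketched as follows. For illustration only, the boundary is assumed to be a box and the density is expressed as a uniform spacing; a real target space need not be box-shaped.

```python
# Hedged sketch of the partitioning step: sample a box-shaped exterior
# boundary into a uniform grid of vertices at a chosen spacing (density).
# Names and the box-shaped boundary are assumptions for illustration.

def partition_volume(x_max, y_max, z_max, spacing):
    """Return vertex coordinates spaced uniformly through the volume
    bounded by (0, 0, 0) and (x_max, y_max, z_max)."""
    vertices = []
    x = 0.0
    while x <= x_max:
        y = 0.0
        while y <= y_max:
            z = 0.0
            while z <= z_max:
                vertices.append((x, y, z))
                z += spacing
            y += spacing
        x += spacing
    return vertices

# A 2 x 2 x 2 volume sampled at spacing 1.0 yields a 3 x 3 x 3 grid.
grid = partition_volume(2, 2, 2, 1.0)
print(len(grid))  # 27
```

Halving the spacing (denser sampling) multiplies the vertex count roughly eightfold, which matches the trade-off between resolution and processing power noted above.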
Various embodiments relate to an apparatus, the apparatus configured to: partition a model of a target space into a plurality of vertices; determine at least one display attribute associated with each of the plurality of vertices based on output data observed by a plurality of data sources; and select one of the at least one display attribute for each of the plurality of vertices.
In some embodiments, the apparatus is further configured to provide a plurality of data sources, each of the plurality of data sources outputting data corresponding to a current frame as the output data comprising a plurality of areas.
In various embodiments, each of the plurality of areas corresponds to one of the plurality of vertices.
In some embodiments, determining the at least one display attribute associated with each of the plurality of vertices comprises determining a display attribute associated with each of the plurality of areas of the current frame for each of the plurality of data sources.
According to some embodiments, each of the plurality of data sources provides the output data corresponding to at least some of the plurality of vertices.
In some embodiments, each of the plurality of data sources comprises at least one digital video camera arranged at a camera position and orientation that is different from the camera position and orientation of another one of the plurality of data sources.
Various embodiments relate to a non-transitory computer-readable storage medium storing program instructions that, when executed, cause a processor to: partition a model of a target space into a plurality of vertices; determine at least one display attribute associated with each of the plurality of vertices based on output data observed by a plurality of data sources; and select one of the at least one display attribute for each of the plurality of vertices.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1A is a schematic block diagram illustrating an example of a 3-dimensional model reconstruction system according to various embodiments.
FIG. 1B is a block diagram illustrating an example of the backend device according to various embodiments.
FIG. 2 is a process flowchart illustrating an example of a generalized 3-dimensional model reconstruction process according to various embodiments.
FIG. 3A is a schematic block diagram illustrating a perspective view of a target space according to various embodiments.
FIG. 3B is a schematic block diagram illustrating a frontal view of a target space.
FIG. 3C is a schematic block diagram illustrating a side view of a target space.
FIG. 3D is a model of a target space according to various embodiments.
FIG. 4 is a schematic diagram illustrating a perspective view of a target space having at least some of its vertices being captured by a digital camera.
FIG. 5 is a schematic diagram illustrating a top view of a target space being captured by a plurality of data sources according to various embodiments.
FIG. 6 is a schematic diagram illustrating a first set of vertices as seen in frames/images captured by data sources.
FIG. 7 is a schematic diagram illustrating a second set of vertices as occupied by objects in a given frame according to various embodiments.
FIG. 8A is a schematic diagram showing a top view of a target space as being captured by different levels of data sources.
FIG. 8B is a table showing vertices to be processed at each level of data sources according to various embodiments.
FIG. 9A is a schematic diagram illustrating an example of vertex projection according to various embodiments.
FIG. 9B is a mapping table illustrating a relationship between a given vertex and frames of data sources capturing the vertex.
FIG. 10 is a diagram illustrating a data source-based display attribute determination method according to various embodiments.
FIG. 11 is a diagram illustrating a vertex-based display attribute determination method according to various embodiments.
FIG. 12 is an example of a weighting table implemented according to various embodiments.
FIG. 13 is a process flowchart illustrating an example of a 3-dimensional model reconstruction process according to various embodiments.
FIG. 14 is a process flowchart illustrating an example of a vertices partitioning process according to various embodiments.
FIG. 15 is a process flowchart illustrating an example of a level-focused weighting process according to various embodiments.
FIG. 16 is a process flowchart illustrating an example of a weighting process according to various embodiments.
DETAILED DESCRIPTION
In the following description of various embodiments, reference is made to the accompanying drawings which form a part hereof and in which are shown by way of illustration specific embodiments in which the invention may be practiced. It is to be understood that other embodiments may be utilized, and structural changes may be made, without departing from the scope of the various embodiments disclosed in the present disclosure.
Embodiments described herein relate to systems and methods for capturing, processing, and reconstructing a 3-dimensional model of a target space. Specific implementations of the systems and methods described may include displaying live scenes occurring within the target space in real-time. The 3-dimensional model can be generated in a pipeline that outputs a streaming, moving model with changing surface projections for display characteristics such as color. In other words, a 3-dimensional model of the target space may be automatically generated based on videos and/or images captured using weights for particular display attributes of a display characteristic for vertices defined within the target space. The 3-dimensional model may then be reconstructed for displaying, for which unique points of views around or within the live scenes may be rendered for consumption by end-users.
The target space may be partitioned into a plurality of discrete volumes, each of which may be associated with a vertex. Thus, a plurality of vertices may make up the target space. Each vertex may be associated with display characteristics such as colors, textures, frequency transforms, wavelet transforms, averages, standard deviations, a combination thereof, and/or the like. When the display characteristics are determined for each of the vertices at a single point in time (e.g., with respect to a frame of video), the 3-dimensional model of the target space for that point in time may be reconstructed by using projections to grant the end-users a 2-dimensional and/or 3-dimensional display of the target space at that moment in time. A streaming video (of the reconstructed 3-dimensional model of the target space) may be produced when consecutive frames are determined based on the display characteristics/attributes of the vertices at each frame (depending on frame rate).
To determine the display characteristics, a plurality of data sources (e.g., cameras) may be positioned around or within the target space to capture videos (or images) of the target space. Given different camera poses configured for the data sources, each of the data sources may capture at least some vertices in a given frame as a 2-dimensional area within the frame/image. In other words, a fraction of a pixel or at least one pixel may correspond to at least one vertex. Processing techniques (e.g., weighting display attributes observed by the data sources) to select one out of a plurality of display attributes associated with the vertex (as shown in the fraction of the pixel or the at least one pixel) may be utilized in the manner described.
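The correspondence between a 3-dimensional vertex and a 2-dimensional area of a frame can be sketched with a simple pinhole-projection model. The intrinsic parameters (focal length f, principal point cx, cy) and camera-space coordinates are assumed already known here; a full treatment of camera pose (world-to-camera rotation and translation) is omitted for brevity.

```python
# Minimal pinhole-projection sketch: map a 3-D vertex, expressed in
# camera coordinates, onto the 2-D pixel area of that camera's frame.
# Parameter names (f, cx, cy) are conventional intrinsics, assumed known.

def project_vertex(vertex, f, cx, cy):
    """vertex: (x, y, z) in camera coordinates, z > 0 means in front
    of the camera. Returns the (u, v) pixel location the vertex
    projects onto, or None if the vertex is behind the camera."""
    x, y, z = vertex
    if z <= 0:
        return None  # not visible to this data source
    return (f * x / z + cx, f * y / z + cy)

# A vertex 10 units ahead and 2 units to the right of the camera
# projects to the right of the principal point (320, 240).
print(project_vertex((2.0, 0.0, 10.0), f=500.0, cx=320.0, cy=240.0))
```

Each data source would apply such a projection with its own pose and intrinsics, so the same vertex maps to a different area in each camera's frame.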
Instead of complex searching, embodiments described herein are concerned with using brute-force math and simple comparisons to gain insight on 3-dimensional vertices. This allows for partitioning the processing tasks into discrete data blocks/threads for distributed processing. In particular embodiments, the projection of the vertices onto corresponding areas of the image/frame, the processing of the captured images/videos to produce weights corresponding to the display attributes observed, and the selection of one of the plurality of display attributes observed for each of the vertices may be performed by a distributed computing framework as described in "SYSTEMS AND METHODS FOR SCALABLE ASYNCHRONOUS COMPUTING FRAMEWORK" (application Ser. No. 14/537,768).
For example, the distributed computing framework described therein may be an asynchronous NoSql server (as generalized in a backend device) using minimal metadata. The NoSql server may be connected with a plurality of user devices via a network (e.g., the internet), each user device having user cores (e.g., GPU cores). The NoSql server may divide the processing tasks based on video frame, vertex (or a slice of vertices), source device, time, and/or the like. Metadata (e.g., frame location, vertex location, and/or the like) may be transmitted to the user devices for processing. The user devices may return an output (e.g., weighting) with respect to a display attribute associated with a vertex to the backend device (or a remote server/database). Accordingly, the 3-dimensional model processing task may be split into a large number of easily computable threads/blocks for processing over a large neural network, where each of the neurons (e.g., the user devices) may compute the task in a parallel and asynchronous manner. From the outputs of each of the user devices, a pipeline of streaming 3-dimensional display data may be generated.
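The task division described above can be sketched as the generation of small, independent work units. The metadata field names below are illustrative assumptions; the actual framework is described in the cross-referenced application.

```python
# Hedged sketch: split the weighting workload for one frame into small,
# independent work units (one per slice of vertices), each describable
# by minimal metadata and dispatchable to a user device asynchronously.
# The metadata fields shown are assumptions for illustration.

def make_work_units(num_vertices, frame_id, slice_size):
    """Return metadata dicts, each describing one independent task
    covering a contiguous slice of vertex indices."""
    units = []
    for start in range(0, num_vertices, slice_size):
        units.append({
            "frame": frame_id,
            "vertex_start": start,
            "vertex_end": min(start + slice_size, num_vertices),
        })
    return units

# Ten vertices in slices of four yield three units: [0,4), [4,8), [8,10).
units = make_work_units(num_vertices=10, frame_id=7, slice_size=4)
print(len(units))  # 3
```

Because no unit depends on another's result, the units can be computed in parallel and in any order, matching the asynchronous processing described above.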
In some embodiments, the user device may be coupled to, or may supply, the data source used to determine the colors of vertices. In one example, the user device (e.g., a cell phone) may project a video from a camera onto a single flat plane, or cloud of vertices, of the scene. This single projection is simple and easy for cell phones today. The output may be streamed to a content delivery network (CDN), where data from additional (live) data sources may be aggregated. The user device may supply all input in that case, including the camera pose in space, which may be determined in the same way as for ordinary cameras. Such determination may be carried out with markers in the scene, a 3D shape in a corresponding 3D space, the camera's intrinsic parameters, and linear algebra to solve for the projection.
FIG. 1A is a schematic block diagram illustrating an example of a 3-dimensionalmodel reconstruction system100 according to various embodiments. Referring toFIG. 1A, the 3-dimensionalmodel reconstruction system100 may include at least abackend device110, a plurality of data sources120a-120h(e.g., afirst source device120a, asecond source device120b, . . . , aneighth source device120h), atarget space105, anetwork130, a plurality of user devices140a-140n(e.g., afirst user device140a, asecond user device140b, . . . , a n-th user device140n), and adatabase170.
In some embodiments, the network 130 may allow data transfer between the backend device 110 and the user devices 140a-140n. In further embodiments, the network 130 may also allow data transfer between the data sources 120a-120h and the backend device 110/the user devices 140a-140n. In still further embodiments, the network 130 may enable data transfer between the database 170 and the backend device 110. The user devices 140a-140n may be connected to each other through the network 130. The network 130 may be a wide area communication network, such as, but not limited to, the Internet, or one or more intranets, local area networks (LANs), Ethernet networks, metropolitan area networks (MANs), wide area networks (WANs), combinations thereof, and/or the like. In particular embodiments, the network 130 may represent one or more secure networks configured with suitable security features, such as, but not limited to, firewalls, encryption, or other software or hardware configurations that inhibit access to network communications by unauthorized personnel or entities. The data transmittable over the network 130 may be encrypted and decrypted within shader language on user cores 150 of the user devices 140a-140n using per-frame keys, further securing the data.
In some embodiments, the data sources 120a-120h and the backend device 110 may be connected via a first network, the backend device 110 and the database 170 may be connected via a second network, and the backend device 110 and the user devices 140a-140n may be connected via a third network. Each of the first, second, and third networks may be a network such as, but not limited to, the network 130. Each of the first, second, and third networks may be a different network from the other networks in some embodiments. In other embodiments, two of the first, second, and third networks may be the same network.
The target space 105 may be any 3-dimensional space to be captured by the data sources 120a-120h. Examples of the target space 105 include, but are not limited to, a stadium, amphitheater, court, building, park, plant, farm, room, a combination thereof, and/or the like. In particular, as shown in FIG. 1A, the target space 105 takes the form of a tennis court, as a non-limiting example. In other words, virtually any venue, location, room, and/or the like may be represented as a 3-dimensional volume such as the target space 105. The target space 105 may be sampled and partitioned into a plurality of discrete volumes, each of which may be associated with a vertex. The model of the target space 105 may include display characteristics/attributes associated with each of the vertices.
Each vertex may be identified by a unique identifier. In particular embodiments, the identifier may be based on the position of the vertex in the target space 105. For example, a coordinate system may be implemented for the target space 105 such that each vertex may be identified by a particular set of coordinates of the coordinate system.
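One way such a coordinate-based identifier could work, sketched here purely for illustration: grid coordinates (i, j, k) are flattened to a single unique integer and recovered from it. The grid-dimension parameters are assumptions, not part of the described system.

```python
# Illustrative sketch of a coordinate-based unique vertex identifier:
# flatten a grid position (i, j, k) into one integer, and invert it.
# ny and nz (grid extents along y and z) are assumed known.

def vertex_id(i, j, k, ny, nz):
    """Flatten grid coordinates into a single unique identifier."""
    return (i * ny + j) * nz + k

def vertex_coords(vid, ny, nz):
    """Invert vertex_id back to the original (i, j, k) coordinates."""
    i, rem = divmod(vid, ny * nz)
    j, k = divmod(rem, nz)
    return (i, j, k)

# The mapping is one-to-one: every vertex gets a distinct identifier.
vid = vertex_id(2, 3, 4, ny=10, nz=10)
print(vid, vertex_coords(vid, ny=10, nz=10))  # 234 (2, 3, 4)
```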
Each of the data sources 120a-120h may be connected to the backend device 110 (e.g., via the network 130 or otherwise), where the backend device 110 may generate metadata for the outputs (e.g., video feeds or images) of the data sources 120a-120h. The data sources 120a-120h may be connected to the user devices 140a-140n (e.g., via the network 130 or otherwise) for providing source data to be processed by the user devices 140a-140n. For example, the data sources 120a-120h may include any suitable devices for capturing videos and/or images and outputting raw video and/or image data to the backend device 110. In particular embodiments, each of the data sources 120a-120h may include at least one camera (e.g., digital cameras, high-definition digital cameras, IP cameras, or other cameras with network capabilities). In other embodiments, the raw data outputted by the data sources 120a-120h may be routed to a network device, which relays the data to the backend device 110.
The data sources 120a-120h may be positioned around or within the target space 105 to capture videos and/or images of the target space 105. For example, data sources (such as, but not limited to, the data sources 120a-120h) may be positioned around the top, bottom, and/or side surfaces of a target space (such as, but not limited to, the target space 105) and facing an interior of the target space. In further embodiments, data sources may be positioned within the target space for better coverage of vertices that may be difficult to capture if all the data sources were positioned outside of the interior volume of the target space. In the non-limiting example shown in FIG. 1A, the first data source 120a, the second data source 120b, and the third data source 120c may be positioned along a first side surface of the target space 105; the fourth data source 120d, the fifth data source 120e, and the sixth data source 120f may be positioned along a second side surface of the target space 105; and the seventh data source 120g and the eighth data source 120h may be positioned along a third side surface of the target space 105.
Each of the data sources may have a different or unique camera pose (i.e., the position and orientation of the camera relative to the target space). For example, the data sources may be positioned in a grid-like manner (spaced evenly) and pointing directly forward with a line of sight perpendicular to a surface of the target space. A plurality of rows and columns of data sources may be provided for a given surface of the target space. In other embodiments, the data sources may be positioned in a random or semi-random pattern. The camera poses of the data sources may be limited by the space around the target space, with the data sources placed in positions and orientations based on the available geometry of objects and obstacles around the target space.
Distance may be provided between the surfaces or edges of the target space 105 and the data sources 120a-120h. The longer the distance, the more vertices may be captured by a given data source. On the other hand, a longer distance may lower the resolution of the captured video/image data, thus causing errors during processing. The distance between a data source and the target space may be determined based on camera resolution, the volume of the target space, the number of other data sources available, a combination thereof, and/or the like.
While eight data sources 120a-120h are shown in FIG. 1A, one of ordinary skill in the art should appreciate that more or fewer data sources (such as, but not limited to, the data sources 120a-120h) may be provided. A larger number of data sources spread out around or in the target space may provide a larger sample size (e.g., more frames of video) for processing, thus providing a larger number of weights for a given display attribute associated with a given vertex in a given frame time. Accuracy and faithfulness are thus improved with a larger number of weighting values. On the other hand, a larger number of data sources may prolong processing due to the increase in unprocessed data.
FIG. 1B is a block diagram illustrating an example of the backend device 110 according to various embodiments. Referring to FIGS. 1A-1B, the backend device 110 may include a processor 111, memory 112 operatively coupled to the processor 111, a network device 113, a user interface 114, and/or the like. In some embodiments, the backend device 110 may include a desktop computer, mainframe computer, server unit, laptop computer, pad device, smart phone device, or the like, configured with hardware and software to perform operations described herein. The backend device 110 may be, in particular embodiments, a Redis publish/subscribe server and/or a NoSql server.
For example, the backend device 110 may include typical desktop PC or Apple™ computer devices having suitable processing capabilities, memory, user interface (e.g., display and input) capabilities, and communication capabilities, when configured with suitable application software (or other software) to perform operations described herein. Platforms suitable for implementation include Amazon/Debian Linux, HTML (e.g., HTML5) browsers without plug-ins (such as Java or Flash), or the like. Thus, particular embodiments may be implemented using processor devices that are often already present in many business and organization environments, by configuring such devices with suitable software processes described herein. Accordingly, such embodiments may be implemented with minimal additional hardware costs. However, other embodiments of the backend device 110 may relate to systems and processes that are implemented with dedicated device hardware specifically configured for performing operations described herein.
The processor 111 may include any suitable data processing device, such as a general-purpose processor (e.g., a microprocessor). In the alternative, the processor 111 may be any conventional processor, controller, microcontroller, or state machine. The processor 111 may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, a GPU, at least one microprocessor in conjunction with a DSP core, or any other such configuration.
The memory 112 may be operatively coupled to the processor 111 and may include any suitable device for storing software and data for controlling and use by the processor 111 to perform operations and functions described herein, including, but not limited to, random access memory (RAM), read only memory (ROM), floppy disks, hard disks, dongles or other recomp sensor board (RSB) connected memory devices, or the like.
The network device 113 may be configured for communication over the network 130. The network device 113 may include interface software, hardware, or combinations thereof, for communication over the network 130. The network device 113 may include hardware such as network modems, wireless receiver or transceiver electronics, and/or software that provide wired or wireless communication links with the network 130 (or with a network-connected device). In particular embodiments, the network device 113 may be coupled to the processor 111 for providing communication functions. The network device 113 may provide telephone and other communications in accordance with typical industry standards, such as, but not limited to, code division multiple access (CDMA), time division multiple access (TDMA), frequency division multiple access (FDMA), long term evolution (LTE), wireless fidelity (WiFi), frequency modulation (FM), Bluetooth (BT), near field communication (NFC), and the like.
In particular embodiments, the user interface 114 of the backend device 110 may include at least one display device. The display device may include any suitable device that provides a human-perceptible visible signal, audible signal, tactile signal, or any combination thereof, including, but not limited to, a touchscreen, LCD, LED, CRT, plasma, or other suitable display screen, audio speaker or other audio-generating device, combinations thereof, or the like.
In some embodiments, the user interface 114 of the backend device 110 may include at least one user input device that provides an interface for designated personnel using the backend device 110. The user input device may include any suitable device that receives input from a user, including, but not limited to, one or more manual operators (such as, but not limited to, a switch, button, touchscreen, knob, slider, keyboard, mouse, or the like), a microphone, camera, image sensor, any type of remote connection control, or the like.
Still referring to FIGS. 1A-1B, the backend device 110 may additionally include a vertices determination module 115. In some embodiments, the vertices determination module 115 may be a hardware/software module for determining the density and positions of the vertices associated with the target space 105. The vertices determination module 115 may be a dedicated hardware/software entity within the backend device 110 having its own processor/memory (such as, but not limited to, the processor 111 and the memory 112), or alternatively, it may use the processor 111 and memory 112 of the backend device 110.
In some embodiments, the vertices determination module 115 may automatically determine the density and positions of vertices for the target space 105 based on given information (e.g., dimensions or exterior boundaries) relating to the target space 105. Suitable algorithms may specify a total number of vertices (or a density) within the target space 105, such that the discrete volume associated with each vertex may be the total volume of the target space 105 divided by the total number of vertices. The algorithms may alternatively include spacing requirements, which may specify a distance between two vertices or the size of each discrete volume. In typical embodiments, the spacing of the vertices may be uniform throughout the target space 105. In other embodiments, the spacing of the vertices may be denser in some portions of the target space 105 (e.g., where actions of interest require more accuracy or higher resolution) and sparser in other portions of the target space 105 (e.g., in areas occupied by advertisements or background).
In various embodiments, outer dimensions (exterior boundaries) of the target space 105 may be provided by user input or suitable imaging techniques. The exterior boundaries may form an overall volume of the target space 105, which may be sampled based on a given size for the discrete volumes into which the overall volume is partitioned, or based on the determined density of vertices. Smaller samples (smaller discrete volumes) correspond to denser sampling, and would allow the 3-dimensional model of the target space 105 to resemble the actual target space 105 more closely. This would, in turn, allow the extraction of observed display attributes and the reconstruction of the 3-dimensional model with more detail and higher resolution. On the other hand, sparse sampling may yield a model with lower detail and resolution, where the reconstruction of the 3-dimensional model may appear pixelated and rough when viewed on the user devices 140a-140b.
In other embodiments, the vertices determination module 115 may accept input from a separate device (not shown) or user input (through the user interface 114) concerning the number and positions of the vertices within the target space 105. For example, the vertices determination module 115 may receive, from the separate device or the user interface 114, a model of the target space 105 including the positioning of the vertices.
The vertices determination module 115 may further project or otherwise map each vertex onto a frame or image as captured by each of the data sources 120a-120h in the manner described.
The backend device 110 may also include a weighting module 116. In some embodiments, the weighting module 116 may be a hardware/software module for determining observed display attributes and weights for each of the display attributes associated with the vertices in the manner described. The weighting module 116 may be a dedicated hardware/software entity within the backend device 110 having its own processor/memory (such as, but not limited to, the processor 111 and the memory 112). Alternatively, the weighting module 116 may use the processor 111 and memory 112 of the backend device 110 in performing its functions.
The backend device 110 may also include a job outcome determination module 117. In some embodiments, the job outcome determination module 117 may be a hardware/software module for determining a job outcome (e.g., selecting one of the at least one display attribute observed for a display characteristic). In particular, the job outcome determination module 117 may select a final display attribute for a given vertex in a given frame based on the weights associated with each potential display attribute as determined by the weighting module 116. The job outcome determination module 117 may be a dedicated hardware/software entity within the backend device 110 having its own processor/memory (such as, but not limited to, the processor 111 and the memory 112). Alternatively, the job outcome determination module 117 may use the processor 111 and memory 112 of the backend device 110 in performing its functions.
The selected display attribute, as determined by the job outcome determination module 117, may be the display attribute captured by the most data sources. In other embodiments, the selected display attribute, as determined by the job outcome determination module 117, may be one associated with a weight exceeding a predetermined threshold. In still other embodiments, the selected display attribute, as determined by the job outcome determination module 117, may be an average display attribute (e.g., an average in the color channel when the display characteristic is color) when the standard deviation from the average display attribute is below a predetermined threshold.
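The first two selection strategies (the attribute captured by the most data sources, and a weight threshold) can be sketched as a small vote, assuming equal per-source weights; `select_display_attribute` and its parameters are hypothetical names used for illustration only.

```python
from collections import Counter

def select_display_attribute(observations, threshold=0):
    """Select a final display attribute for one vertex from the attributes
    observed by the data sources capturing it.

    observations -- observed attributes (e.g., color codes), one per source
    threshold    -- minimum weight the dominant attribute must reach;
                    0 reduces this to plain majority selection
    Returns the dominant attribute, or None when no attribute qualifies.
    """
    if not observations:
        return None
    attribute, weight = Counter(observations).most_common(1)[0]
    return attribute if weight >= threshold else None

# Three of four sources observe color 4394492, meeting a threshold of 3.
selected = select_display_attribute([4394492, 4394492, 4394492, 16711680], 3)
```

The averaging variant described for the third strategy would instead compute the mean attribute value and accept it only when the standard deviation of the observations is below the threshold.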
In other embodiments, the backend device 110 may implement a distributed computing framework as described in “SYSTEMS AND METHODS FOR SCALABLE ASYNCHRONOUS COMPUTING FRAMEWORK” (application Ser. No. 14/537,768), instead of having the localized vertices determination module 115, the weighting module 116, and/or the job outcome determination module 117. For example, the backend device 110 (via its processor 111) may segment the computing tasks performed by the vertices determination module 115, the weighting module 116, and/or the job outcome determination module 117 as described into discrete data blocks/threads to be distributed to a plurality of user devices for processing (e.g., by the GPUs of the user devices). Thus, the functions described with respect to the vertices determination module 115, the weighting module 116, and the job outcome determination module 117 may be performed by devices other than the backend device 110. In particular, one or more of the vertices determination module 115, the weighting module 116, and the job outcome determination module 117 may be external to the backend device 110.
In addition to (or as an alternative to) the memory 112, the backend device 110 may be operatively coupled to the at least one database 170. The database 170 may be capable of storing a greater amount of information and providing a greater level of security against unauthorized access to stored information than the memory 112 in the backend device 110. The database 170 may include any suitable electronic storage device or system, including, but not limited to, random access memory (RAM), read only memory (ROM), floppy disks, hard disks, dongles or other USB-connected memory devices, or the like. In particular embodiments, the database 170 may be a NoSQL database maintained by a Redis server.
The database 170 and/or the memory 112 may be configured to store source data (e.g., unprocessed data) from the data sources. In some embodiments, the source data may be stored entirely in either the database 170 or the memory 112. In other embodiments, at least a portion of the source data may be stored in one of the database 170 and the memory 112, while a separate portion of the source data may be stored in the other of the database 170 and the memory 112.
Each of the user devices 140a-140n may include a general processing unit, a memory device, a network device, and a user interface. The processing unit may be any suitable data processing device configured to execute the general functions of the user devices 140a-140n. The memory device of each of the user devices 140a-140n may be operatively coupled to the processing unit and may include any suitable device for storing software and data used by the processing unit to perform the operations and functions described herein. The network device of each of the user devices 140a-140n may include interface software, hardware, or combinations thereof, for communication over the network 130.
The user devices 140a-140n may each include a user interface including at least a display device for displaying information (e.g., text and graphics) to the users. The display device may include any suitable device that provides a human-perceptible visible signal, audible signal, tactile signal, or any combination thereof, including, but not limited to, a touchscreen, LCD, LED, CRT, plasma, or other suitable display screen, audio speaker or other audio generating device, combinations thereof, or the like. The user interface may be configured to display to a user of the user devices 140a-140n the projected video or image of the 3-dimensional model based on the final display characteristics associated with each vertex.
Each of the user devices 140a-140n may be any wired or wireless computing system or device. In some embodiments, the user devices 140a-140n may be a desktop computer, mainframe computer, laptop computer, pad device, or the like, configured with hardware and software to perform operations described herein. For example, each of the user devices 140a-140n may include typical desktop PC or Apple™ computer devices, having suitable processing capabilities, memory, user interface (e.g., display and input) capabilities, and communication capabilities, when configured with suitable application software (or other software) to perform operations described herein. In other embodiments, the user devices 140a-140n may include a mobile smart phone (such as, but not limited to, an iPhone™, an Android™ phone, or the like) or other mobile phone with suitable processing capabilities. Typical modern mobile phone devices include telephone communication electronics as well as some processor electronics, one or more display devices, and a keypad and/or other user input device, such as, but not limited to, those described above. Particular embodiments employ mobile phones, commonly referred to as smart phones, that have relatively advanced processing, input, and display capabilities in addition to telephone communication capabilities. However, the user devices 140a-140n, in further embodiments of the present invention, may include any suitable type of mobile phone and/or other type of portable electronic communication device, such as, but not limited to, an electronic smart pad device (such as, but not limited to, an iPad™), a portable laptop computer, or the like.
FIG. 2 is a process flowchart illustrating an example of a generalized 3-dimensional model reconstruction process 200 according to various embodiments. Referring to FIGS. 1A-2, the generalized 3-dimensional model reconstruction process 200 may be implemented with the 3-dimensional model reconstruction system 100 as illustrated in FIGS. 1A-1B. First, at block B210, the vertices determination module 115 of the backend device 110 may partition a model of the target space 105 into a plurality of vertices, each of the plurality of vertices being associated with (e.g., positioned in a center of) a discrete volume which is a portion of the target space 105.
Next, at block B220, the weighting module 116 of the backend device 110 may determine at least one display attribute for a display characteristic associated with each of the plurality of vertices based on video frames (or images) captured by the plurality of data sources (e.g., the data sources 120a-120h). At a given frame time, each of the plurality of data sources may output a frame of video (or an image) capturing at least a portion (and the vertices defined in that portion) of the target space 105. The display characteristic associated with a vertex may refer to at least one of: colors, textures, frequency transforms, wavelet transforms, averages, standard deviations, a combination thereof, and/or the like. Each vertex may be captured with at least one (often a plurality of) display attributes, each of which is a specific instance of one display characteristic. For example, red, yellow, and blue may be examples of display attributes associated with the display characteristic color.
Next, at block B230, the job outcome determination module 117 of the backend device 110 may select one of the at least one display attribute (e.g., select one color of a plurality of potential colors) for each of the plurality of vertices. Subsequently, the backend device 110 or the user devices 140a-140n may display the model of the target space 105 based on the one selected display attribute of the at least one display attribute (for that display characteristic) for the current frame. Multiple display characteristics may be processed in parallel in the manner described above.
FIG. 3A is a schematic block diagram illustrating a perspective view of a target space 300 according to various embodiments. FIG. 3B is a schematic block diagram illustrating a frontal view of the target space 300. FIG. 3C is a schematic block diagram illustrating a side view of the target space 300. Referring to FIGS. 1-3C, the target space 300 may be a 3-dimensional space such as, but not limited to, a portion of the target space 105. The target space 300 may be partitioned into a plurality of discrete volumes 310a-310l (e.g., a first volume 310a, a second volume 310b, . . . , a twelfth volume 310l). Each of the discrete volumes 310a-310l may be associated with a vertex. For example, the first volume 310a may be associated with a first vertex 320a, the second volume 310b may be associated with a second vertex 320b, . . . , and the twelfth volume 310l may be associated with a twelfth vertex 320l. Each of the discrete volumes 310a-310l may be of any suitable shape such as, but not limited to, cuboids, cubes, and the like. For the sake of clarity, the target space 300 is scaled to include 12 vertices.
The discrete volumes and/or the vertices may be predetermined given that the dimensions of the target space 300 may be known. In some embodiments, for a target space of 60′ by 30′ by 10′, there may be 6,000,000 vertices such as, but not limited to, the vertices 320a-320l. The larger the number of vertices, the more faithful and detailed the 3-dimensional model of the target space may be. When associated with the display characteristics, the larger number of vertices may allow for higher resolution when reconstructing and displaying the 3-dimensional model of the target space.
Each of the vertices 320a-320l may correspond to a portion of a pixel (e.g., 1/16, ⅛, ¼, ½, or the like), pixel(s), macroblock(s), or the like when projected and displayed in a 2-dimensional context of the output of the data sources. For example, the user devices 140a-140n may be configured to display (with user interfaces) 3-dimensional projections of the target space 300 based on the vertices 320a-320l as 2-dimensional video streams. Each of the vertices 320a-320l may correspond to a portion of a pixel, at least one pixel, or at least one macroblock when captured by the data sources 120a-120h.
Each vertex may be associated with display characteristics such as, but not limited to, colors, textures, frequency transforms, wavelet transforms, averages, standard deviations. Various embodiments described herein may refer to color as an exemplary display characteristic. One of ordinary skill in the art would know that other display characteristics as stated may also be implemented in a similar manner.
Given that the camera pose (position and orientation) may be known in advance, the vertices (e.g., at least some of the vertices 320a-320l) captured by each data source (such as, but not limited to, the data sources 120a-120h) may be determined. For example, a data source (e.g., a digital camera) capturing the frontal view (e.g., FIG. 3B) of the target space 300 may capture the frontal vertices (e.g., the second vertex 320b, the third vertex 320c, the fifth vertex 320e, the sixth vertex 320f, the seventh vertex 320g, and the eighth vertex 320h). Similarly, a data source capturing the side view (e.g., FIG. 3C) of the target space 300 may capture the side vertices (e.g., the first vertex 320a, the second vertex 320b, the third vertex 320c, and the fourth vertex 320d). Additional data sources may capture the back, top, bottom, and/or other side views of the target space 300. Thus, a single vertex may be captured by multiple data sources. In particular, the data sources may capture the display characteristics associated with each vertex.
Each vertex may be defined by a unique identifier based on the position of the vertex within the target space. In the non-limiting example illustrated in FIGS. 3A-3C, the fourth vertex 320d may be identified by its coordinates (1, 1, 1), the sixth vertex 320f may be identified by its coordinates (3, 2, 1), and the fifth vertex 320e may be identified by its coordinates (2, 2, 2).
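An identifier of this form can be sketched as a coordinate tuple keyed into a per-vertex lookup structure; the names below are illustrative, not from the specification.

```python
def vertex_id(x, y, z):
    """Use the vertex's integer grid coordinates as its unique identifier."""
    return (x, y, z)

# A per-vertex store of selected display attributes, keyed by identifier,
# using the coordinates from the non-limiting example of FIGS. 3A-3C.
display_attributes = {
    vertex_id(1, 1, 1): None,  # fourth vertex 320d, attribute not yet selected
    vertex_id(3, 2, 1): None,  # sixth vertex 320f
    vertex_id(2, 2, 2): None,  # fifth vertex 320e
}
```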
FIG. 3D is a model 390 of the target space 105 according to various embodiments. Referring to FIGS. 1A-3D, the model 390 may include a plurality of vertices such as, but not limited to, the vertices 320a-320l. The vertices may be predetermined by any suitable algorithms or by user input as described. In various embodiments, the same set of vertices defined for a target space (e.g., the target space 105 or the target space 300) may be reused for a plurality of live events, given that the target space remains static, such that a model involving the same vertices can still faithfully represent the target space.
FIG. 4 is a schematic diagram illustrating a perspective view of the target space 105 having at least some of its vertices captured by a digital camera 410. Referring to FIGS. 1A-4, in various embodiments, the target space 105 may include an interior volume 430. The interior volume 430 may be partitioned into a plurality of vertices such as, but not limited to, the vertices 320a-320l. For the sake of clarity, the vertices within the interior volume 430 are not shown in FIG. 4. The digital camera 410 (as a specific implementation of the data sources 120a-120g) may be provided at any suitable camera pose to capture images/video of the target space 105. The digital camera 410 may capture a captured space 420 representing a field of view of the digital camera 410. The vertices within the captured space 420 may be captured by the digital camera 410. Thus, a given source device may capture at least some vertices within the interior volume 430. All vertices may be captured by a camera that is positioned sufficiently far away from the target space 105 (as indicated by the camera pose, which may be predetermined during setup).
In some embodiments, the digital camera 410 (as well as the data sources 120a-120g) may include or be associated with a marker 440. The marker 440 may include any physical or digital indication of the location and/or angle (e.g., the camera pose) associated with the digital camera 410. In some embodiments, the marker 440 may be placed on or near the ground or other suitable objects on (below, or around) which the digital camera 410 is positioned. The marker 440 may include a checkerboard, QR code, or any specifically identifiable marker containing position and/or angle data with respect to the target space 105, when the digital camera 410 is positioned on the marker. The marker 440 may be preliminarily placed around or within the target space 105 before the digital camera 410 is actually placed. The digital camera 410 may become associated with the marker 440 by scanning the information contained in the marker 440 with an associated scanning device. In other embodiments, the marker 440 may be placed in any suitable manner on the digital camera 410 and move with the digital camera 410. Accordingly, the backend device 110 may recognize the association between the output from the digital camera 410 and the camera pose as specified by the marker 440.
FIG. 5 is a schematic diagram illustrating a top view of the target space 105 being captured by a plurality of data sources 510a-510j according to various embodiments. Referring to FIGS. 1A-5, the plurality of data sources 510a-510j may each be a data source such as, but not limited to, the data sources 120a-120g and/or the digital camera 410. It should be understood by one of ordinary skill in the art that the data sources may have any suitable camera pose (e.g., the distance from the target space 105, the distance from each of the other data sources, the angle, and/or the like). The collective camera poses of the data sources may form any organized or randomized pattern. The camera pose may affect the projection of the vertices onto the frames/images outputted by a particular data source, as well as which vertices are captured by that data source. The location of each data source may also be captured by way of a marker within the scene. Because the geometry of the marker is known, any 2D projection of the marker allows the 3D pose of the camera to be derived.
Each of the data sources 510a-510j may be associated with a unique field of view due to the camera pose associated with each of the data sources 510a-510j. For example, the first data source 510a may be associated with a first field 520a, the second data source 510b may be associated with a second field 520b, . . . , and the tenth data source 510j may be associated with a tenth field 520j. At least two of the fields 520a-520j may overlap to capture a same vertex. For example, a vertex 530 may be captured by the first field 520a, the fourth field 520d, the eighth field 520h, and the tenth field 520j. A large number of data sources capturing a given vertex from diverse camera poses can allow more accurate sampling of the vertex, and can thus yield more accurate results with respect to selecting the display attribute from the plurality of display attributes that may be outputted by the data sources capturing the given vertex.
While FIG. 5 illustrates a non-limiting example involving two dimensions (e.g., the x-y plane), it should be understood that the target space 105 may be captured by data sources such as, but not limited to, the data sources 510a-510j in the manner described for the 3-dimensional volume of the target space 105.
FIG. 6 is a schematic diagram illustrating a first set of vertices 600 as seen in frames/images captured by data sources. Referring to FIGS. 1A-6, the first set of vertices 600 may be any vertices described herein. A first portion of the first set of vertices 600 (e.g., a left portion containing 16 vertices) may be projected to a first frame 610 (as captured by a first data source such as any data source described herein). A second portion of the first set of vertices 600 (e.g., a middle portion containing 16 vertices) may be projected to a second frame 620 (as captured by a second data source such as any data source described herein). A third portion of the first set of vertices 600 (e.g., a right portion containing 16 vertices) may be projected to a third frame 630 (as captured by a third data source such as any data source described herein). Each of the vertices shown may be a first vertex in a string of vertices arranged in the y-direction (e.g., into the page). The vertex 640 may be captured in both the first frame 610 and the second frame 620. The location of each camera may also be captured by way of a marker within the scene. Because the geometry of the marker is known, any 2D projection of the marker allows the 3D pose of the camera to be derived. Each vertex may be projected to each frame, but may be projected to a position outside the bounds of the frame (meaning the data sources may not observe/capture data related to all of the vertices, including any projected outside the bounds of the data sources).
FIG. 7 is a schematic diagram illustrating a second set of vertices 700 as occupied by objects in a given frame according to various embodiments. Referring to FIGS. 1A-7, the second set of vertices 700 may be any vertices described herein. Three data sources (e.g., Camera X 710, Camera Y 720, and Camera Z 730) may be provided to capture display characteristics/attributes of the second set of vertices 700. In particular, Camera X 710 may capture the display characteristics/attributes of some of the second set of vertices 700 within Field X 715. Camera Y 720 may capture the display characteristics/attributes of some of the second set of vertices 700 within Field Y 725. Camera Z 730 may capture the display characteristics/attributes of some of the second set of vertices 700 within Field Z 735.
Camera X 710 may be configured to capture the display characteristics/attributes of vertices in a vertex row 760, which includes unloaded vertices 770, a first loaded vertex 745, a second loaded vertex 755a, a third loaded vertex 755b, and a fourth loaded vertex 755c. The unloaded vertices 770 are not loaded with an object (i.e., no objects, other than air, occupy the unloaded vertices 770). The first loaded vertex 745 may be occupied by a first object 740. The second loaded vertex 755a, the third loaded vertex 755b, and the fourth loaded vertex 755c may be occupied by a second object 750.
The camera pose of Camera X 710 may allow only one set of display characteristics/attributes to be captured for the vertex row 760, given that each of the vertices of the vertex row 760 is stacked together with, and blocks, the vertices behind it relative to Camera X 710. For example, given that the unloaded vertices 770 are occupied by air, which has no display characteristics/attributes, the first vertex in the vertex row 760 captured by Camera X 710 may be the first loaded vertex 745, which may have the display characteristics/attributes of the first object 740. Thus, Camera X 710 may output each of the unloaded vertices 770, the first loaded vertex 745, the second loaded vertex 755a, the third loaded vertex 755b, and the fourth loaded vertex 755c as having the display characteristics/attributes of the first object 740. The weighting for the display characteristics/attributes associated with the first object 740 for each of the vertices in the vertex row 760 may accordingly be increased.
However, by virtue of having additional data sources (Camera Y 720, Camera Z 730, and other cameras not shown, for clarity) with different camera poses, the vertices inappropriately captured (or not captured at all) by one data source may be appropriately captured by other data sources in better camera poses for capturing the display characteristics/attributes of those vertices. For example, with reference to the second loaded vertex 755a, both Field Y 725 of Camera Y 720 and Field Z 735 of Camera Z 730 may capture the second loaded vertex 755a. As the preceding vertex 775 may be unloaded (e.g., unoccupied), both Camera Y 720 and Camera Z 730 may capture the second loaded vertex 755a as having the display characteristics/attributes of the second object 750 at the second loaded vertex 755a, which is appropriate. As such, the appropriate weight (e.g., 2, from Camera Y 720 and Camera Z 730) may exceed the inappropriate weight (e.g., 1, from Camera X 710) or a threshold. Other vertices may be captured and weighted in a similar manner.
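The occlusion example above can be expressed as a small weighted vote; the camera labels and attribute names below are illustrative stand-ins, with each camera contributing an equal weight of 1.

```python
from collections import Counter

# Observations of the second loaded vertex 755a: Camera X sees the first
# object blocking the vertex row (inappropriate), while Cameras Y and Z
# see the vertex directly (appropriate).
observations = {
    "Camera X 710": "first_object",
    "Camera Y 720": "second_object",
    "Camera Z 730": "second_object",
}

votes = Counter(observations.values())
attribute, weight = votes.most_common(1)[0]
# The appropriate weight (2) exceeds the inappropriate weight (1), so the
# second object's display attribute is selected for the vertex.
```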
When the weighting of the display characteristics/attributes does not clearly indicate a dominant attribute out of all the captured attributes for a given vertex, the vertex may be completely (or at least substantially) within an object (e.g., blocked by other peripheral vertices), or the vertex may be air. In such cases, the display characteristics/attributes may be selected as null or not available. Alternatively, the display characteristics/attributes for that vertex may be selected as a background characteristic/attribute (e.g., a background color), blanked, transparent, or the same as the display characteristics/attributes of the nearest vertices that yield a set of valid display characteristics/attributes above a confidence level, in the manner described.
Referring to FIGS. 1A-8A, FIG. 8A is a schematic diagram showing a top view of the target space 105 as captured by different levels of data sources. In various embodiments, each of the data sources may be any data source as described herein. The data sources may each have a unique camera pose. The data sources may be classified into a plurality of levels. In the non-limiting example as illustrated, the data sources may include first level devices 810a-810d, second level devices 820a-820d, and third level devices 830a-830h.
In some embodiments, each level may include data sources that can (collectively) capture all vertices in the target space 105, by virtue of the camera poses associated with each source device of the level. In other embodiments, each level may include data sources that can capture (collectively) some but not all vertices in the target space 105, by virtue of the camera poses associated with each source device of the level. In some embodiments, the number of data sources for a preceding level may be less than or equal to the number of data sources for a subsequent level. In the non-limiting example of FIG. 8A, there are 4 devices for each of the first level devices 810a-810d and the second level devices 820a-820d, and there are 8 devices for the third level devices 830a-830h. In other embodiments, the number of data sources for a preceding level may be greater than or equal to the number of data sources for a subsequent level.
In some embodiments, the data sources for a preceding level may be arranged in a more sparse pattern as compared to the data sources for at least one subsequent level. In other embodiments, the data sources for a preceding level may be arranged in a more dense pattern as compared to the data sources for at least one subsequent level. In some embodiments, each source device from a same level may not be placed adjacent to one another. In other or further embodiments, each source device from a same level may be placed adjacent to one another.
Generally, different levels of source devices may serve to efficiently conduct vertex sampling and drop/withdraw vertices from consideration at the end of each level. The dropped/withdrawn vertices are not considered by subsequent levels, given that a display attribute may already have been selected based on the confidence level. The vertices of the target space 105 may be projected onto areas on frames (at a same frame time) captured by each of the data sources (the first level devices 810a-810d) in the manner described. The corresponding areas may be sampled for display characteristics/attributes associated with the vertices. When a given vertex has a display attribute (e.g., the color coded “4394492”) whose weight exceeds a predetermined threshold (e.g., each of the first level devices 810a-810d may capture the given vertex in the same color 4394492, thus meeting a threshold weight of 4), that display attribute (4394492) may be selected as the display attribute for the particular display characteristic of that vertex. The vertex may then be dropped from processing by the subsequent levels (e.g., the second level devices 820a-820d and the third level devices 830a-830h), given that the confidence level associated with the first level (as implemented with the threshold method) has already been met. In further embodiments, where the outputs from the data sources within a given level are extremely dissimilar (e.g., the largest weighting for a given display attribute is a very small fraction of the total number of devices present in the level) concerning a given vertex, the vertex may be designated as unloaded (e.g., air) or inside of a volume, and is also dropped from consideration at subsequent levels.
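This level-by-level early-exit scheme can be sketched as follows, assuming each level is represented by its per-vertex observations (a stand-in for sampling that level's frames); all function and variable names are hypothetical.

```python
from collections import Counter

def process_by_levels(vertices, levels, threshold):
    """Process vertices level by level, withdrawing a vertex from all
    subsequent levels once one display attribute reaches the threshold.

    vertices  -- vertex identifiers
    levels    -- one dict per level mapping vertex -> list of attributes
                 observed by that level's data sources
    threshold -- weight an attribute must reach to be selected
    Returns {vertex: selected attribute, or None if never confident}.
    """
    selected = {v: None for v in vertices}
    remaining = set(vertices)
    for level in levels:
        for v in list(remaining):
            observations = level.get(v, [])
            if not observations:
                continue
            attribute, weight = Counter(observations).most_common(1)[0]
            if weight >= threshold:
                selected[v] = attribute   # confidence met at this level
                remaining.discard(v)      # dropped from subsequent levels
    return selected

# Vertex b meets the threshold at level 1 and is never reconsidered;
# vertex c meets it at level 2; vertex a never reaches confidence.
levels = [
    {"a": ["red", "blue"], "b": ["4394492"] * 4, "c": ["green", "red"]},
    {"a": ["red", "red"], "c": ["green"] * 4},
]
result = process_by_levels(["a", "b", "c"], levels, threshold=4)
```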
FIG. 8B is a table 900 showing vertices required to be processed at each level of data sources according to various embodiments. Referring to FIGS. 1A-8B, various levels of data sources (such as, but not limited to, the first level devices 810a-810d, the second level devices 820a-820d, and the third level devices 830a-830h) may be provided for a system (e.g., the 3-dimensional model reconstruction system 100) for processing the vertices of the target space 105 by levels.
For example, the target space 105 may be partitioned into a plurality of vertices (e.g., vertex a, vertex b, . . . , vertex n). At level 1 910, vertex b may be found to have a display attribute above the confidence level (e.g., having a weighting exceeding a predetermined threshold) and is thus withdrawn from further processing in the subsequent levels. At level 2 920, vertex c may be found to have a display attribute above the confidence level and is thus withdrawn from further processing in the subsequent levels. At any level between level 2 920 and level N 930, vertex a may be found to have a display attribute above the confidence level and is thus withdrawn from further processing in any subsequent levels. All remaining vertices in level N 930 may be processed.
FIG. 9A is a schematic diagram illustrating an example of vertex projection according to various embodiments. Referring to FIGS. 1A-9A, a vertex of interest 910 may be associated with a discrete volume of interest 920 within the target space 105. The vertex of interest 910 may be any vertex as described herein. A first frame 930 may be an image or a frame of a video captured by a first source device (e.g., any suitable source device as described herein). A second frame 940 may be an image or a frame of a video captured by a second source device (e.g., any suitable source device as described herein). A third frame 950 may be an image or a frame of a video captured by a third source device (e.g., any suitable source device as described herein). Each of the first frame 930, the second frame 940, and the third frame 950 may be a frame captured by a separate data source at a same frame time.
The vertex 910 may be projected onto each of the first frame 930, the second frame 940, and the third frame 950 to determine an area (a fraction of a pixel, at least one pixel, at least one macroblock, and/or the like) in each frame that corresponds to the vertex 910. In other words, the vertex 910 in the 3-dimensional volume of the model for the target space 105 may be projected onto a 2-dimensional image/frame using any suitable transformation matrix. For example, the projection of the vertex of interest 910 from the 3-dimensional model of the target space 105 onto a frame capturing the vertex of interest 910 may be a function of the camera pose (e.g., distance from the vertex of interest 910, angle/orientation with respect to the vertex of interest 910 or the axes defined for the target space 105), the size of the target space 105, the screen size of the frame, a combination thereof, and/or the like.
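One common way to realize such a transformation is a pinhole-camera projection, sketched below; the intrinsics and extrinsics used here are illustrative assumptions, not values from the specification.

```python
import numpy as np

def project_vertex(vertex, R, t, K):
    """Project a 3-D vertex into a camera frame with a pinhole model.

    vertex -- (3,) vertex position in the target-space coordinate system
    R, t   -- camera extrinsics (rotation matrix, translation vector),
              i.e., the camera pose
    K      -- 3x3 camera intrinsics matrix (focal lengths, principal point)
    Returns (u, v) pixel coordinates, or None when the vertex lies
    behind the camera and cannot be captured.
    """
    p_cam = R @ np.asarray(vertex) + t      # world -> camera coordinates
    if p_cam[2] <= 0:
        return None                         # behind the image plane
    uvw = K @ p_cam                         # camera -> image plane
    return (uvw[0] / uvw[2], uvw[1] / uvw[2])

# Identity orientation, camera 10 units back along z, 500-pixel focal
# length, principal point at (320, 240) -- all assumed for illustration.
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
uv = project_vertex((1.0, 1.0, 0.0), np.eye(3), np.array([0.0, 0.0, 10.0]), K)
```

Different camera poses (`R`, `t`) yield different projected areas for the same vertex, which is why the projected areas in the three frames can differ in size, shape, and position.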
The projected 2-dimensional areas (e.g., a first area 935 in the first frame 930, a second area 945 in the second frame 940, and a third area 955 in the third frame 950) may differ in size, shape (e.g., the orientation of the plane, as shown for the third area 955), and position in the frame. These differences may be caused by the differences in camera pose.
FIG. 9B is a mapping table 960 illustrating the relationship between a given vertex and the frames of data sources capturing the vertex. Referring to FIGS. 1A-9B, the relationship between a given vertex (such as, but not limited to, any vertex described herein) in the model of the target space 105 as projected onto frames of different data sources (such as, but not limited to, any data sources described herein) may be stored in the memory 112, the database 170, or other processing devices such as user devices, for frames captured subsequently in time for each of the data sources. In response to a change in the position/density of the vertices or a change in the camera pose of a data source, the projections may be updated and stored again.
Still referring to FIGS. 1A-9B, in the non-limiting example illustrated in FIG. 9B, a vertex n 970 may be any vertex described herein, and is indexed in memory by an identifier such as its coordinates in the model of the target space 105. A plurality of data sources (e.g., Data Source A 980, Data Source B 982, . . . , Data Source N 984, as shown in the data source column 985) may be provided to capture display characteristics/attributes of vertices within the target space 105. In some embodiments, at least one data source (e.g., Data Source B 982) may not be configured (due to its camera pose) to capture the vertex n 970. Thus, the area coordinates listed in the projected position on frame column 990 are listed as void or “N/A.” In other words, the determining of the display attribute associated with the vertex n 970 is unrelated to the frames of Data Source B 982 (unless either the vertex n 970 or the camera pose of Data Source B 982 changes). For other data sources, a planar coordinate may be provided to indicate the area on the frame that is mapped to the vertex n 970. For example, the area of a frame may be indicated by the coordinates (α1, β2). In further embodiments, the mapping table 960 may include a frame location column 995 detailing where the frame for each data source is stored. The frames may be stored in the memory 112, the database 170, or anywhere on the Internet.
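For illustration only, such a mapping table may be represented as a nested dictionary keyed by a vertex identifier. The identifiers, storage locations, and helper function below are hypothetical and are not part of the disclosed embodiments.

```python
# Hypothetical mapping table keyed by a vertex's coordinates in the model.
# A None area marks a data source whose camera pose cannot capture the vertex ("N/A").
mapping_table = {
    (4, 7, 2): {
        "Data Source A": {"area": (3.0, 5.0), "frame": "memory://frames/a/0"},
        "Data Source B": {"area": None, "frame": None},  # void projection
        "Data Source N": {"area": (1.0, 2.0), "frame": "cdn://frames/n/0"},
    }
}

def sources_for(vertex_id, table):
    """Return only the data sources whose frames actually map to the vertex."""
    return {src: entry for src, entry in table.get(vertex_id, {}).items()
            if entry["area"] is not None}

relevant = sources_for((4, 7, 2), mapping_table)
# "Data Source B" is excluded: its projection is void, so its frames are
# unrelated to determining this vertex's display attribute.
```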
FIG. 10 is a diagram illustrating a data source-based display attribute determination method 1000 according to various embodiments. Referring to FIGS. 1A-10, each of the Camera A 1010, Camera B 1020, and Camera C 1030 may be any suitable data source as described. Each of the cameras 1010-1030 may be configured (based on the camera pose associated therewith) to capture at least some of the vertices (shown in the vertices column 1040) in the model of the target space 105 as described. In the simplified non-limiting example illustrated by FIG. 10, Camera A 1010 may capture a frame having areas corresponding to each of vertex 1, vertex 2, vertex 3, and vertex 4 as shown in the vertices column 1040. Such a method may be characterized as data source-based given that the weighting module 116 may determine the vertices associated with a given camera frame, perform graphic determination processes on the areas associated with the vertices captured by that camera frame, and then move on to the next camera frame outputted by a different camera at the current frame time.
Through any suitable image processing methods, the weighting module 116 may be configured to determine display attributes (e.g., specific colors as shown in the display attribute column 1150) for a given display characteristic (e.g., a color associated with the vertices) of an area on the frame (captured by Camera A 1010) associated with each of the vertices 1-4. For instance, vertex 1 may be determined to have the color A, vertex 2 may be determined to have the color C, vertex 3 may be determined to have the color D, and vertex 4 may be determined to have the color E.
FIG. 11 is a diagram illustrating a vertex-based display attribute determination method 1100 according to various embodiments. Referring to FIGS. 1A-11, the vertex-based display attribute determination method 1100 may be an alternative to the data source-based display attribute determination method 1000. Each of the cameras in the camera column 1160 may be any suitable data source as described. Each of the vertex 1 1110, vertex 2 1120, vertex 3 1130, vertex 4 1140, and vertex 5 1150 may be any suitable vertex described herein.
For each of the vertices 1110-1150, the weighting module 116 may be configured to determine the cameras that capture the vertex. For example, the weighting module 116 may determine that Camera A and Camera C (in the camera column 1160) are associated with the vertex 1 1110. Through any suitable image processing methods, the weighting module 116 may be configured to determine display attributes (e.g., specific colors as shown in the display attribute column 1170) for a given display characteristic (e.g., a color associated with the vertices) of an area on the frames (captured by Camera A and Camera C) associated with the vertex 1 1110. Both Camera A and Camera C may capture the vertex 1 1110 in the color A.
FIG. 12 is an example of a weighting table 1200 implemented according to various embodiments. Referring to FIGS. 1A-12, the weighting table 1200 may correspond to the non-limiting example illustrated in either the data source-based display attribute determination method 1000 or the vertex-based display attribute determination method 1100. The weighting table 1200 illustrates a simplified example with respect to the display characteristic of color.
For each of the vertices (e.g., vertex 1 1210, vertex 2 1220, vertex 3 1230, vertex 4 1240, and vertex 5 1250), a display attribute bin may be created to store the display attributes captured corresponding to each of the vertices in an array. As shown in the display attribute (color) bin column 1260, the vertex 1 1210 may be captured only in Color A. The vertex 3 1230 may be captured in Color D and Color F. For each display attribute captured for a given vertex, a weight bin is created, as shown in the weight bin column 1270. A weight bin stores weighting values for each of the display attributes captured. Each time a separate data source captures the vertex in a given display attribute, the weighting value for that display attribute is increased by a set amount (e.g., 1 in this case). For example, as shown in FIGS. 10-11, the vertex 4 may be captured in Color A twice (e.g., by Camera B 1020 and Camera C 1030) and in Color E once (e.g., by Camera A 1010).
Based on the weighting values in the weight bins as shown in the weight bin column 1270, one display attribute of the display attributes found in the display attribute bin shown in the display attribute column 1260 may be selected. In various embodiments, a threshold may be predetermined (in this specific example, display threshold weight = 1), such that when the weight associated with a given display attribute exceeds (or equals) the threshold, that display attribute is selected, e.g., by the job outcome determination module 117. For example, for the vertex 4 1240, the color A in the display attribute bin may be associated with a weight of 2, which exceeds the display threshold of 1. Thus, Color A is selected. On the other hand, where no display attribute in the display attribute bin exceeds the predetermined threshold, no display attribute is selected (e.g., shown as “N/A,” as in the cases of the vertex 3 1230 and the vertex 5 1250). This may be the case when the vertex is blocked by other objects or is within an object. In some embodiments, the vertices with unselected display attributes may be displayed at a default display attribute (e.g., a background color). In other embodiments, the vertices with unselected display attributes may be assigned an average display attribute (e.g., an average color channel) of a predetermined number of vertices (e.g., surrounding vertices).
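As a non-limiting sketch of the weight-bin logic described above (the function name is hypothetical and the threshold convention of "strictly exceeds" is one of the variants mentioned), the accumulation and selection may be expressed as:

```python
from collections import defaultdict

def select_attribute(observations, threshold):
    """Accumulate a weight bin per observed display attribute and select the
    first attribute whose weighting strictly exceeds the threshold.

    observations: display attributes (e.g., colors) captured for one vertex,
    one entry per data source. Returns the selected attribute, or None when
    no attribute reaches sufficient confidence (vertex occluded or interior).
    """
    weight_bin = defaultdict(int)
    for attribute in observations:
        weight_bin[attribute] += 1  # each capture adds a set amount (here, 1)
        if weight_bin[attribute] > threshold:
            return attribute
    return None

# Vertex 4: Color A captured twice, Color E once; threshold weight = 1.
chosen = select_attribute(["Color E", "Color A", "Color A"], threshold=1)
# chosen == "Color A"
# Vertex 3: Color D and Color F once each -- nothing exceeds the threshold.
unresolved = select_attribute(["Color D", "Color F"], threshold=1)
# unresolved is None ("N/A"); a default or neighborhood-average attribute applies.
```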
FIG. 13 is a process flowchart illustrating an example of a 3-dimensional model reconstruction process 1300 according to various embodiments. Referring to FIGS. 1A-13, the 3-dimensional model reconstruction process 1300 may be a particular implementation of the generalized 3-dimensional model reconstruction process 200. In particular, block B1310 may correspond to block B210, blocks B1320-B1350 may correspond to block B220, and block B1360 may correspond to block B230.
First at block B1310, the vertices determination module 115 of the backend device 110 may partition a model of a target space (e.g., any suitable target space described herein) into a plurality of vertices in a manner such as, but not limited to, that of block B210. In particular, each of the plurality of vertices may be associated with (e.g., positioned in the center of) a discrete volume which is a portion of the target space. In other embodiments, the partitioning may be executed by devices other than the backend device 110. In such cases, the model including the partitioned vertices (e.g., the positions of the vertices) may be imported to the backend device 110 or a remote server (to minimize the data processed or stored by the backend device 110). The vertices may be partitioned in any suitable manner described.
Next at block B1320, data sources (e.g., any suitable data sources described herein) may capture, at a current frame time, the target space, such that a captured frame of each data source (at the current frame time) captures display characteristics of at least some of the plurality of vertices. In some embodiments, the data sources may output the captured data to a remote server storage/database or a CDN to minimize bottlenecks at the backend device 110. In other embodiments, the output of the data sources may be relayed directly to the backend device 110.
Next at block B1330, for each captured frame (e.g., at the current frame time) of each of the data sources, the backend device 110 may project each captured vertex onto an area of each captured frame. In other embodiments, given the process-intensive nature of projections, the backend device 110 may outsource the projection processes to additional devices, such as web cores in the manner described with respect to “SYSTEMS AND METHODS FOR SCALABLE ASYNCHRONOUS COMPUTING FRAMEWORK” (application Ser. No. 14/537,768). The projection may project each vertex in the 3-dimensional model of the target space (e.g., having 3-dimensional coordinates) onto an area (a fraction of a pixel, at least one pixel, at least one macroblock, or the like) of each captured frame, which is 2-dimensional.
Next at block B1340, the weighting module 116 may determine the display attribute associated with each area of each captured frame of each data source, for example, at a given frame time. In an example where the display characteristic is color, the backend device 110 may determine the color associated with each area of each captured frame. In other embodiments, the determination of the display attributes of the areas may be outsourced to web cores in the manner described with respect to “SYSTEMS AND METHODS FOR SCALABLE ASYNCHRONOUS COMPUTING FRAMEWORK” (application Ser. No. 14/537,768).
Next at block B1350, the weighting module 116 may determine a weighting for at least one display attribute associated with each vertex based on the determined display attribute associated with each area in each captured frame (e.g., as described with respect to block B1340) in the manner described. In typical embodiments, an area of a captured frame may be analyzed to determine which display attribute is associated with it. The determined display attribute may then receive an increased weight for the vertex associated with that area.
Next at block B1360, the job outcome determination module 117 may select one of the at least one display attribute for each vertex based on the weighting for each of the at least one display attribute associated with that vertex. In typical embodiments, the job outcome determination module 117 may select the display attribute whose weighting exceeds a predetermined threshold.
Next at block B1370, the job outcome determination module 117 may reconstruct the model of the target space based on the positions of the plurality of vertices and the selected display attributes. Each selected display attribute is assigned to its corresponding vertex, the position of which may be known.
As an alternative embodiment to blocks B1340-B1360, the standard deviation of all display attributes observed for a given vertex may be determined. The average of the display attributes (e.g., an average of the color channel) may be chosen when the standard deviation is below a predetermined threshold. On the other hand, when the standard deviation of all observed display attributes is above the predetermined threshold, the vertex is observed to have different colors across a substantial number of data sources. In that case, it is likely that the vertex lies in the interior of a volume in the target space 300 rather than being a vertex associated with a color, and the vertex is thus assigned a null value or an average of the display attributes of surrounding vertices.
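This alternative embodiment may be sketched, for illustration only, on a single color channel. The function name and the treatment of fewer than two observations are hypothetical choices, not part of the disclosure.

```python
import statistics

def average_or_null(channel_values, max_stddev):
    """Alternative to threshold weighting: average the observed values of one
    color channel when their standard deviation is low; otherwise return None
    (the vertex likely lies in the interior of a volume or is occluded).

    channel_values: per-data-source observations of one color channel (0-255).
    """
    if len(channel_values) < 2:
        # a single observation has no spread; accept it as-is (assumption)
        return channel_values[0] if channel_values else None
    if statistics.pstdev(channel_values) <= max_stddev:
        return sum(channel_values) / len(channel_values)
    return None  # disagreement across sources: assign null or neighbor average

consistent = average_or_null([100, 102, 98], max_stddev=5.0)
# consistent == 100.0 (low spread, so the average is chosen)
interior = average_or_null([10, 200, 90], max_stddev=5.0)
# interior is None (high spread suggests an interior vertex)
```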
FIG. 14 is a process flowchart illustrating an example of a vertices partitioning process 1400 according to various embodiments. Referring to FIGS. 1A-14, the vertices partitioning process 1400 may be a particular implementation of blocks B210 and B1310. In some embodiments, the vertices partitioning process 1400 may be executed by the vertices determination module 115 in the backend device 110. In other embodiments, the vertices partitioning process 1400 may be executed by a device external to the backend device 110 (e.g., a processor connected to the backend device 110 via a suitable network) having the vertices determination module 115.
First at block B1410, the vertices determination module 115 may receive an exterior boundary of a model of a target space. The model may be generated automatically based on scanner/camera outputs, manually by a user with a user interface (e.g., the user interface 114), a combination thereof, and/or the like. In some embodiments, the exterior boundary may be generated by the device (e.g., the backend device 110 or another device) having the vertices determination module 115, or by another device that generates the exterior boundary of the model.
Next at block B1420, the vertices determination module 115 may determine a density of the vertices for the model based on at least one of 1) the resolution desired, 2) the processing power available, and 3) the network conditions. Denser vertices may be selected when higher resolution is desired, more processing power is available, and/or better network conditions (e.g., network bandwidth and congestion) are present, and vice versa. The resolution may be set manually via the user interface 114 of the backend device 110 or through any other device. In addition, the resolution may also be a function of the processing power and network conditions (e.g., higher resolution may be associated with more processing power being available and/or better network conditions being present). The processing power available may refer to the processing capabilities of the backend device 110 (e.g., the processor 111) or the distributed computing framework as set forth in “SYSTEMS AND METHODS FOR SCALABLE ASYNCHRONOUS COMPUTING FRAMEWORK” (application Ser. No. 14/537,768). The network conditions may refer to the bandwidth/usage of the network 130 or other networks involved in transmitting data used in the processes described herein.
Next at block B1430, the vertices determination module 115 may sample a volume defined by the exterior boundary of the model based on the density, and determine positions of the vertices in the volume defined by the exterior boundary of the model. The higher the density, the closer together the vertices are, and the smaller the discrete volume associated with each vertex is. In some embodiments, the volume may be divided into two or more regions, where a first region (e.g., a region of interest where many activities are occurring in the target space) may have a first (higher) density while a second region (e.g., a relatively uneventful region) may have a second (lower) density.
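The sampling step may be illustrated, purely as a non-limiting sketch, by placing vertices on a regular grid inside an axis-aligned boundary. The function name, the axis-aligned boundary representation, and the density unit (vertices per unit length per axis) are hypothetical simplifications.

```python
import itertools

def sample_vertices(bounds, density):
    """Sample vertex positions on a regular grid within an exterior boundary.

    bounds: ((xmin, xmax), (ymin, ymax), (zmin, zmax)), axis-aligned.
    density: vertices per unit length along each axis; a higher density yields
    closer vertices and a smaller discrete volume per vertex.
    """
    step = 1.0 / density
    axes = []
    for lo, hi in bounds:
        n = int((hi - lo) / step)
        # place each vertex at the center of its discrete volume
        axes.append([lo + (i + 0.5) * step for i in range(n)])
    return list(itertools.product(*axes))

vertices = sample_vertices(((0, 2), (0, 2), (0, 1)), density=2)
# 4 x 4 x 2 = 32 vertices, each centered in a 0.5-unit cube
```

A region of interest could simply be sampled with a second call at a higher density, consistent with the two-region embodiment described above.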
It should be noted by one having ordinary skill in the art that the density determination block B1420 and the sampling block B1430 may be reiterated when the resolution desired, the processing power available, or the network conditions are altered after the initial sampling has taken place. In some embodiments, when the fluctuation in these elements exceeds a threshold, the vertices determination module 115 may re-execute blocks B1420 and B1430 for adjustment.
Next at block B1440, the vertices determination module 115 may store information relating to the determined positions of the vertices of the model in the memory 112, the database 170, or another remote storage/database to be accessed (e.g., for projections and selecting display characteristics/attributes).
FIG. 15 is a process flowchart illustrating an example of a level-focused weighting process 1500 according to various embodiments. Referring to FIGS. 1A-15, the level-focused weighting process 1500 may be a particular implementation of blocks B220 and B1340/B1350. In some embodiments, the level-focused weighting process 1500 may be executed by the weighting module 116 in the backend device 110. In other embodiments, the level-focused weighting process 1500 may be executed by a device external to the backend device 110 (e.g., a processor connected to the backend device 110 via a suitable network) having the weighting module 116.
First at block B1510, two or more levels of data sources may be provided to capture the target space, where each level may include at least one data source. The levels of data sources may be provided in a manner such as, but not limited to, that described with respect to FIG. 8A. Next at block B1520, the weighting module 116 may set the current level as a first level.
Subsequently at block B1530, the weighting module 116 may determine the weighting for at least one display attribute associated with a given vertex based on the captured frames of the source devices of the current level. In other words, the source devices of the current level (which may be the first level initially, and other levels subsequently) may each output a captured frame, and each captured frame may include an area corresponding to the given vertex. A display attribute of a display characteristic may be extracted from the area and added to the weighting bin in the manner described. With respect to the given vertex, weighting bins corresponding to each captured display attribute may be developed, where the weighting bins may include the display attributes and weightings based on the captured frames of the current level of data sources.
Next at block B1540, the weighting module 116 may determine whether the weighting of one of the at least one display attribute captured by the data sources of the current level exceeds a predetermined threshold. When the weighting of one display attribute exceeds the predetermined threshold, sufficient confidence may be established that the vertex is associated with that display attribute. Thus, at block B1550 (B1540:YES), the display attribute having a weighting exceeding the predetermined threshold may be selected for the given vertex. The selected display attribute may be stored in the selected display attribute bin 1280. Subsequently at block B1560, the weighting module 116 may withdraw the given vertex from consideration at the current frame time. In other words, the given vertex is withdrawn from processing at subsequent levels (involving additional data sources) in the same frame time.
On the other hand, when none of the weightings for the at least one display attribute observed by the current level of data sources exceeds the predetermined threshold, the weighting module 116 may set a subsequent level as the current level at block B1570 (B1540:NO), and the process returns to block B1530. At block B1570, when there are no more subsequent levels of data sources, the selected display attribute may be set as void or as an average of the display attributes of surrounding vertices.
One of ordinary skill in the art would appreciate that the weighting threshold may be altered as the process advances to a subsequent level (at block B1570). For example, where there are 10 data sources in a first level, the threshold may be set to 5 (to ensure that only one display attribute could potentially exceed the threshold). When there are an additional 10 data sources in a second level, the threshold for the second level may be set to 10. This example assumes the weighting is increased by 1 each time a display attribute is captured.
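For illustration only, the level-by-level loop with a per-level threshold may be sketched as follows. The function name and the exact stopping convention are hypothetical; the escalating thresholds follow the 10-sources-per-level example above.

```python
def level_focused_select(levels, thresholds):
    """Process data-source levels in order; once one attribute's weighting
    exceeds the current level's threshold, select it and stop, withdrawing
    the vertex from all subsequent levels for this frame time.

    levels: per-level lists of the attributes observed for one vertex.
    thresholds: per-level weight thresholds (may grow level by level).
    """
    weights = {}
    for level_observations, threshold in zip(levels, thresholds):
        for attribute in level_observations:
            weights[attribute] = weights.get(attribute, 0) + 1  # add 1 per capture
        best = max(weights, key=weights.get)
        if weights[best] > threshold:
            return best  # sufficient confidence; skip remaining levels
    return None  # all levels exhausted: set void or average of neighbors

# 10 sources in level 1 with threshold 5: 6 "red" captures exceed it.
level1 = ["red"] * 6 + ["blue"] * 4
selected = level_focused_select([level1], [5])
# selected == "red"
```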
In addition to the level-focused weighting process 1500, which seeks to withdraw vertices from consideration with respect to subsequent levels, the weighting module 116 may employ heuristic methods to further reduce the number of vertices (and the corresponding areas on frames) to be processed for each frame time by discarding vertices known to require little to no processing.
For example, a motion-based mask can be used to discard vertices that are not in the volumes of interest in the target space. A volume of interest may refer to a particular region (e.g., the field where the players are playing, rather than the stands) of special processing interest. In some embodiments, a volume of interest may be a particular volume where static objects are located. For example, the weighting module 116 may access the display attribute bins 1260, the weight bins 1270, and the selected display attribute bin 1280 for a given vertex for previous frame times. The weighting for a display attribute associated with the given vertex of the current frame may be further modified by the selected display attributes (as stored in the selected display attribute bin 1280) for previous frame times. For example, the weight for each display attribute may be modified by a constant multiplied by the number of times that the display attribute has been selected for the vertex previously (e.g., for previous frame times). As such, the processes described herein may quickly pass over vertices that are commonly static and would have the same display attribute for this frame time as for previous frame times.
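This temporal heuristic may be sketched, as a non-limiting illustration with hypothetical names and a boost constant of 1, as a modification of the current frame's weights by the selection history:

```python
def temporally_boosted_weights(current_weights, history, boost=1):
    """Boost each attribute's weight by a constant times the number of previous
    frame times in which it was selected for this vertex, so that commonly
    static vertices resolve quickly at the current frame time.

    current_weights: {attribute: weight} accumulated at the current frame time.
    history: attributes selected at previous frame times (None when unselected).
    """
    boosted = dict(current_weights)
    for past_choice in history:
        if past_choice is not None:
            boosted[past_choice] = boosted.get(past_choice, 0) + boost
    return boosted

# "green" was selected in two previous frames, so it is favored now.
w = temporally_boosted_weights({"green": 1, "blue": 1},
                               history=["green", "green", None])
# w == {"green": 3, "blue": 1}
```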
FIG. 16 is a process flowchart illustrating an example of a weighting process 1600 according to various embodiments. Referring to FIGS. 1A-16, the weighting process 1600 may be a particular implementation of blocks B220, B1350, and B1530. In some embodiments, the weighting process 1600 may be executed by the weighting module 116 in the backend device 110. In other embodiments, the weighting process 1600 may be executed by a device external to the backend device 110 (e.g., a processor connected to the backend device 110 via a suitable network) having the weighting module 116.
First at block B1610, the weighting module 116 may determine a first display attribute associated with a given area of a captured frame, the given area corresponding to a vertex defined in the model of the target space. In a non-limiting example (where the display characteristic is color), the weighting module 116 may determine that a color coded “b65e6f” is associated with a first area of a captured frame of a first data source. The weighting module 116 may then retrieve the storage bins (e.g., the display attribute bin 1260, the weight bin 1270, and/or the like) for the vertex associated with the first area.
Next at block B1620, the weighting module 116 may determine whether the first display attribute (e.g., the color b65e6f) has a display attribute bin 1260 associated with it. At block B1630 (B1620:YES), when a display attribute bin 1260 has already been created, the weighting module 116 may increase the weighting of the first display attribute by a predetermined amount (e.g., 1). The weighting module 116 may store the updated weighting in the display attribute bin 1260.
On the other hand, at block B1640 (B1620:NO), the weighting module 116 may create a new display attribute bin 1260 corresponding to the first display attribute. Then, the weighting module 116 may initialize the weighting of the first display attribute at block B1650 by, for example, increasing the weighting of the first display attribute by a predetermined amount (e.g., 1) and storing the updated weighting in the newly created display attribute bin 1260.
Various embodiments described above with reference to FIGS. 1A-16 include the performance of various processes or tasks. In various embodiments, such processes or tasks may be performed through the execution of computer code read from computer-readable storage media. For example, in various embodiments, one or more computer-readable storage media store one or more computer programs that, when executed by a processor, cause the processor to perform processes or tasks as described with respect to the processor in the above embodiments. Also, in various embodiments, one or more computer-readable storage media store one or more computer programs that, when executed by a device, cause the device to perform processes or tasks as described with respect to the devices mentioned in the above embodiments. In various embodiments, one or more computer-readable storage media store one or more computer programs that, when executed by a database, cause the database to perform processes or tasks as described with respect to the database in the above embodiments.
Thus, embodiments include program products including computer-readable or machine-readable media for carrying or having computer or machine executable instructions or data structures stored thereon. Such computer-readable storage media can be any available media that can be accessed, for example, by a general purpose or special purpose computer or other machine with a processor. By way of example, such computer-readable storage media can include semiconductor memory, flash memory, hard disks, optical disks such as compact disks (CDs) or digital versatile disks (DVDs), magnetic storage, random access memory (RAM), read only memory (ROM), and/or the like. Combinations of those types of memory are also included within the scope of computer-readable storage media. Computer-executable program code may include, for example, instructions and data which cause a computer or processing machine to perform certain functions, calculations, actions, or the like.
The embodiments disclosed herein are to be considered in all respects as illustrative, and not restrictive. The present disclosure is in no way limited to the embodiments described above. Various modifications and changes may be made to the embodiments without departing from the spirit and scope of the disclosure. Various modifications and changes that come within the meaning and range of equivalency of the claims are intended to be within the scope of the disclosure.