A SYSTEM AND METHOD TO CREATE THREE-DIMENSIONAL MODELS IN REAL-TIME FROM STEREOSCOPIC VIDEO PHOTOGRAPHS
FIELD OF THE INVENTION The present invention relates to the creation of three-dimensional (3-D) models, and more specifically to the creation of such models in real time from stereoscopic video photographs.
BACKGROUND OF THE INVENTION Three-dimensional photography is not new; it has been available for over a hundred years through stereoscopic cameras.
Panoramic photography, the taking of a photograph covering a field of view ranging from wide-angle to an entire 360-degree panorama, has a long history in photography.
US patent 5,646,679, issued on July 8, 1997, discloses an image processing method and apparatus that uses a pair of digital cameras to capture separate overlapping images. The overlapping portions of the images form combined image information, from which a single image covering a wide field of view may be created.
While achieving improved panoramic photography, these methods do not provide the visual image data necessary to produce a 3-dimensional image or model. US patent 7,724,379, issued on May 25, 2010, discloses a 3-dimensional shape measuring method and apparatus using a pattern projector and an image capturing device having a fixed relative positional relationship to each other. The range is calculated in accordance with the deformation of the projected light pattern as it appears in the image captured by the camera. US patent 7,463,280, issued on Dec. 9, 2008, discloses a digital 3D/360-degree camera system that uses several digital cameras to capture the image data necessary to create an accurate digital model of a 3-dimensional scene. None of the above inventions and patents, taken either singly or in combination, is seen to describe the instant invention as claimed.
BRIEF DESCRIPTION OF THE DRAWINGS Fig. 1 is a block diagram of the system of the present invention.
Fig. 2 is a diagrammatic illustration of a stereoscopic field of view using two digital cameras that are part of the system of the present invention.
Fig. 3 is a block diagram of the laser or LED projector that is part of the system of the present invention. Fig. 4 is an example of light patterns (markers) projected by the laser or LED projector.
Figs. 5A-5C are a flow chart explaining the steps of the method of 3-D modeling in an embodiment of the present invention.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS The present invention is a 3-D modeling system and method for creating 3-D models in real time from stereoscopic video photographs, designated generally as 10 in the drawings.
The 3-D modeling system and method may operate in two modes:
• standalone system mode, for building three-dimensional (3-D) models of objects for different uses such as advertising, product catalogs, etc.
• smart glasses system mode, wherein the system is embedded in smart glasses and measures the area in which the user moves, so that the smart glasses can recognize the surrounding environment and place augmented information at the correct range from the user, on the relevant real object.
As shown in Fig. 1, the 3-D modeling system 10 includes at least two digital cameras 12, 14, a texture video camera 20 and a laser or LED projector 30. Cameras 12, 14 are oriented so that each camera's field of view overlaps with the field of view of the other camera to form a stereoscopic field of view, and the field of view of the texture camera 20 overlaps the stereoscopic field of view, as detailed later in the description related to Fig. 2.
Both cameras 12 and 14 are medium-resolution (e.g. 480x640 VGA resolution) cameras, have the same optical properties, e.g. pixel size, distance between pixels and focal length, and are panchromatic cameras. Camera 20 is a high-resolution (e.g. 1,920x1,080 HD resolution or higher) color camera. The purpose of camera 20 is to record the texture of the objects in the field of view. This texture is later used in the building of the 3-D model and gives it a real-life presence. The cameras are under the control of a controller 32, which is depicted schematically in the figure. The controller is a system that includes a video processor 40, a command and control unit 44, a memory 48, a mass storage device 50, a clock 52, an external communication interface 54, a battery 58 and a camera interface 60. The controller could be a general-purpose computer system such as a Personal Computer (PC) with sufficient computing resources, or a custom-designed computer system. The memory is large enough to store the images captured by the cameras.
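By way of example and not limitation, the camera arrangement described above may be represented in software along the following lines; the pixel pitch and focal length values in this sketch are illustrative assumptions, not specifications of the invention.

```python
from dataclasses import dataclass

@dataclass
class CameraConfig:
    width: int              # horizontal resolution in pixels
    height: int             # vertical resolution in pixels
    pixel_pitch_um: float   # distance between pixel centers (assumed value)
    focal_length_mm: float  # focal length (assumed value)
    color: bool             # False = panchromatic, True = color

# Cameras 12 and 14: identical medium-resolution panchromatic pair.
stereo_camera = CameraConfig(width=640, height=480,
                             pixel_pitch_um=6.0, focal_length_mm=8.0,
                             color=False)

# Camera 20: high-resolution color texture camera.
texture_camera = CameraConfig(width=1920, height=1080,
                              pixel_pitch_um=3.0, focal_length_mm=8.0,
                              color=True)
```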
The cameras have a command and data interface that is connected to the camera interface. Commercially available cameras are typically equipped with a Universal Serial Bus (USB), FireWire, or another interface for command and data transfer. Additionally, it is desirable that the cameras be equipped with a digital command line that allows a digital signal to cause the camera to capture an image. Use of a single digital control line allows all the cameras to be commanded simultaneously, by a single digital control signal, to capture an image.
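A minimal sketch of such simultaneous triggering is given below; the gpio object and its set_high/set_low methods are hypothetical placeholders for whatever trigger hardware is actually used.

```python
import time

class TriggerLine:
    """Hypothetical wrapper around the single digital control line
    shared by all cameras; the real hardware API will differ."""

    def __init__(self, gpio):
        self.gpio = gpio  # placeholder handle to the digital line

    def pulse(self, width_s=0.001):
        """Raise the line briefly; every camera wired to it captures
        one frame on the rising edge. Returns the trigger time so the
        clock 52 can tag the resulting image files."""
        t = time.monotonic()
        self.gpio.set_high()
        time.sleep(width_s)
        self.gpio.set_low()
        return t
```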
The clock is used to schedule image capture, to tag the image data files that are captured and to synchronize commands to the cameras. The clock should have a resolution and accuracy of 0.01 msec or better.
The external communication interface 54 may be any data communication interface, and may employ a wired, fiber optic, wireless, or other method of connection with an external device, e.g. a photo engine reader, a smartphone, or the user interface of the smart glasses.
A computer software program, stored on the mass storage device 50 and executed in the memory, directs the controller to perform its various functions, such as commanding the cameras to capture image data and storing the image data files. It is also responsible for calculating the range from the digital cameras to any point in the area and, together with the images output from the texture video camera, for building the 3-D model of the target object.
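By way of illustration only, the control program might be organized as in the following skeleton; every method name on the controller object here is a hypothetical placeholder, not part of the claimed system.

```python
def main_loop(controller):
    # Illustrative skeleton of the control program described above.
    images = controller.capture_stereo_pair()       # cameras 12, 14
    texture = controller.capture_texture()          # camera 20
    controller.store(images + [texture])            # mass storage 50
    cloud = controller.build_point_cloud(images)    # range per pixel
    model = controller.apply_texture(cloud, texture)
    controller.send(model)                          # interface 54
```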
In traditional stereoscopic photography, two cameras in a stereoscopic pair are spaced a certain distance apart and each has the same field-of-view angle. As shown in Fig. 2, the two cameras may be spaced any distance 300 apart. The lenses should have a field-of-view overlap 310 of at least 90%. The projector and the texture camera are mounted midway between cameras 12 and 14. The cameras are rigidly mounted and their precise geometry is known.
The problem is to identify the same point of the landscape in the two images. Traditionally, this is done using an image processing procedure called correlation, but it requires high computing power and its precision depends on the lighting conditions. In order to overcome these issues, the system of the present invention projects a light pattern onto the landscape using a laser or LED projector, which is depicted schematically in Fig. 3. The projector projects a plurality of patterns onto the target object while, at the same time, the cameras capture it. The projector includes an infrared light source 300, a mask 310 and optical elements 320. In one embodiment, a lens is used as the optical elements. The mask is a component produced in photolithography technology, consisting of a pattern that blocks or transmits portions of the light beam being projected. It is a passive flat glass coated with an opaque material to create the needed pattern. The cameras include a filter that transmits the wavelength of the projector and rejects all other wavelengths. The purpose of the filter is to reduce ambient light and enhance the illumination markers to the cameras.
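By way of example, a mask pattern of the kind described above could be generated in software before being transferred to the photolithographic mask, as in the following sketch; it lays out the 4x4-pixel markers described below on a 5x5-pixel grid, and its use of plain sequential bit codes is an assumption made for brevity.

```python
import numpy as np

def build_mask(rows=96, cols=128, cell=5):
    """Create a binary mask holding one unique 4x4 marker per 5x5
    cell (a 4x4 pattern plus a one-pixel separating border).
    Successive 16-bit integers serve as marker codes here; a
    production mask would use codes chosen for optical robustness."""
    mask = np.zeros((rows * cell, cols * cell), dtype=np.uint8)
    code = 1  # skip the all-dark pattern
    for r in range(rows):
        for c in range(cols):
            bits = [(code >> i) & 1 for i in range(16)]
            marker = np.array(bits, dtype=np.uint8).reshape(4, 4)
            mask[r * cell:r * cell + 4, c * cell:c * cell + 4] = marker
            code += 1
    return mask
```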
If, at any time, all the markers are unique and do not repeat themselves, every image captured by the cameras will include the same marker, as the captured marker is the optical reflection of the marker projected by the projector. Based on this method, a large number of correspondences between the pixel coordinates on the captured images of the two cameras can be retrieved.
In order to be able to identify the markers and create a geometric structure that is unique, each marker has to be built from a collection of 4 pixels by 4 pixels, and the angular size of each pixel of the marker should be the same as the angular size of each pixel of the cameras. Assuming that the cameras have VGA resolution (480x640 pixels, i.e. 307,200 pixels), and assuming that each marker is separated from the neighboring marker by an empty pixel line, so that each marker occupies about 5x5 pixels, 12,288 different markers may be applied. An example of the markers is shown in Fig. 4.
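The figures quoted above can be verified with a few lines of arithmetic (a sketch assuming the markers are laid out on a regular 5x5-pixel grid):

```python
width, height = 640, 480                    # VGA resolution
cell = 5                                    # 4x4 marker + 1-pixel separation
print(width * height)                       # 307200 pixels in each image
print((width // cell) * (height // cell))   # 12288 distinct markers
```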
A search window of 6x6 pixels moving across the field of view enables the identification of the markers in each of the captured images. Once a marker is identified in the two pictures, the range from the object to every pixel of the marker on the camera focal plane can be calculated from the two images by solving the stereoscopic equation. As shown in Figs. 5A-5C, the process to build the 3-D model includes the following stages:
1. Capturing images from the two cameras 12, 14 (500).
2. Building a cloud of points (530).
3. Capturing an image from the texture video camera, synchronized with cameras 12, 14 (550).
4. Applying texture to the cloud of points (560).
5. Generating the three-dimensional and three-hundred-and-sixty-degree model (570).
1. Capturing images from the two digital cameras 12, 14 (500)
This step includes:
1.1. Capturing an image of the scene, with the target lit by the projector light, by camera 12 - image 1 (510).
1.2. Capturing an image of the scene, with the target lit by the projector light, by camera 14 - image 2 (520).
2. Building a cloud of points (530)
This step includes:
2.1. Going over all of image 1 and performing:
2.1.1. Searching for markers in image 1 by activating a search window of 6x6 pixels (532).
2.1.2. Building a list of the pixels relevant to each marker in image 1 (534).
2.2. Going over all of image 2 and performing:
2.2.1. Searching for markers in image 2 by activating a search window of 6x6 pixels (538).
2.2.2. Building a list of the pixels relevant to each marker in image 2 (540).
2.3. Calculating, for every pixel in the images, the range to the object (544), obtaining about 307,200 ranges, one for every pixel; each range is associated with a unique coordinate in the image.
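A minimal sketch of steps 2.1 through 2.3 follows, assuming a rectified, parallel camera pair so that the standard stereoscopic equation Z = f*B/d applies (f: focal length in pixels, B: the baseline 300 between cameras 12 and 14, d: the disparity of a matched marker); the decode() routine, which recognizes a 4x4 marker inside a 6x6 window and returns its identifier, is a hypothetical placeholder.

```python
def find_markers(image, decode):
    """Steps 2.1/2.2: slide a 6x6 search window across a 2-D image
    array and map each decoded marker id to the window's top-left
    pixel coordinate."""
    h, w = image.shape
    found = {}
    for y in range(h - 5):
        for x in range(w - 5):
            marker_id = decode(image[y:y + 6, x:x + 6])
            if marker_id is not None and marker_id not in found:
                found[marker_id] = (y, x)
    return found

def build_point_cloud(image1, image2, decode, focal_px, baseline_m):
    """Step 2.3: for every marker seen in both images, solve
    Z = f * B / d, here approximating all 4x4 pixels of a marker
    by the marker's shared disparity."""
    m1 = find_markers(image1, decode)
    m2 = find_markers(image2, decode)
    cloud = []
    for marker_id, (y1, x1) in m1.items():
        if marker_id not in m2:
            continue
        y2, x2 = m2[marker_id]
        d = x1 - x2                       # disparity in pixels
        if d <= 0:
            continue                      # reject impossible matches
        z = focal_px * baseline_m / d     # range to the object
        for dy in range(4):
            for dx in range(4):
                cloud.append((y1 + dy, x1 + dx, z))
    return cloud
```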
3. Capturing an image from the texture video camera (550)
This step includes:
3.1. Capturing an image of the scene, with the target lit by the projector light, by texture video camera 20 - image 3 (552). As mentioned earlier, all the cameras are synchronized.
4. Generating the reconstructed object (560)
This step includes:
4.1. Generating the reconstructed target object by stretching the texture image from stage 3 over the cloud of points from stage 2, as defined above.
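A minimal sketch of this stretching operation follows, assuming a hypothetical project() function, derived from the known, fixed geometry of texture camera 20, that maps a 3-D point to texture-image coordinates.

```python
import numpy as np

def apply_texture(points, texture_image, project):
    """Step 4.1: attach to every point of the cloud the color of the
    texture pixel it projects onto. `texture_image` is an HxWx3
    array from camera 20; `project` is a hypothetical placeholder."""
    textured = []
    for p in points:                 # p = (x, y, z)
        u, v = project(p)            # texture image coordinates
        color = texture_image[int(v), int(u)]
        textured.append((p, tuple(color)))
    return textured
```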
5. Generating the three-hundred-and-sixty-degree model (570)
The purpose is to build an all-around model of the target object, either by measuring the same scene while moving the camera set around it, or by measuring the same scene while rotating the target object with the cameras and the projector fixed.
This step includes, in an embodiment where the cameras are rotated:
5.1. Moving the cameras around the target object by a pre-defined angle (572) (e.g. 120 degrees).
5.2. Performing stages 1-4 (574).
5.3. Stitching the reconstructed object obtained at stage 4 at the current camera position to the reconstructed object obtained at stage 4 at the previous camera position (576).
5.4. Performing steps 5.1-5.3 to cover a full rotation (three hundred and sixty degrees) around the target object.
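The rotation loop of stage 5 may be sketched as follows, assuming rotation about the vertical axis; capture_and_reconstruct (stages 1-4) and rotate_rig (the mechanical motion 572) are hypothetical placeholders.

```python
import math

def build_full_model(capture_and_reconstruct, rotate_rig, step_deg=120):
    """Stage 5: repeat stages 1-4 at each camera position and stitch
    the partial reconstructions into one all-around model."""
    model = []
    for angle in range(0, 360, step_deg):    # e.g. 0, 120, 240 degrees
        partial = capture_and_reconstruct()  # stages 1-4 at this angle
        theta = math.radians(angle)
        for (x, y, z), color in partial:
            # rotate the partial cloud back into the common frame
            xr = x * math.cos(theta) - z * math.sin(theta)
            zr = x * math.sin(theta) + z * math.cos(theta)
            model.append(((xr, y, zr), color))
        rotate_rig(step_deg)                 # move to the next position
    return model
```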
In an embodiment of the present invention, the result of the method provides actual distances between the cameras and the objects perceived by said cameras in the 3-D model.