Apparatus and Methods for Presenting 3D Visual Effects with Stereopsis More Realistically in Real Time, and for Subtract Reality, Using External Display(s)
Cross Reference to Related Applications
[0001] The present application claims priority from US Provisional Patent Application Serial No. 61/751,873, filed January 13, 2013, the full disclosure of which is hereby incorporated by reference herein.
Technical Field
[0002] The present invention relates to presenting 3D visual effects with stereopsis (or "binocular vision") in real time, and more particularly to presenting 3D visual effects that are associated with, interactive with, or triggered by the user (or the user's hand-held device) more realistically and with an accurate sense of depth. This invention also relates to "subtract reality," which is the reverse of "augmented reality" (AR) in that it makes a real-world object, or a part of it, appear to disappear in a virtual reality environment that uses an external (non-HMD) display.
Background Art
[0003] It is known in the prior art to provide users with virtual reality systems. The capabilities of these systems have increased, and they now provide greater image definition, lower prices, and real-time 3D stereo pairs. While mixed reality and augmented reality can integrate real-world objects such as control devices, props, and mockups with the virtual scene to provide a more realistic experience, several challenges remain when the user simply interacts with a normal large-screen display, or with "traditional" VR using external screens (such as a CAVE environment), in which the user does not wear any HMD (head-mounted display) or see-through AR display. This invention resolves these challenges and takes advantage of stereopsis to provide a realistic presentation of visual effects "generated" by the real-world object the user is controlling. It can significantly improve the realism and accuracy of the experience. A new way of displaying VR objects, called "subtract reality," is also discussed here and can be used together with other 3D display methods. It is essentially the reverse of "augmented reality" in that it makes a real-world object, or a part of it, appear to "disappear" in a virtual reality environment that uses an external (non-HMD) display. Previously this could only be done with mixed reality or augmented reality using an HMD. With the methods provided in this invention, it can be achieved using a normal external screen (and certain other devices) without the need for a head-mounted display, mixed-reality goggles, or see-through AR glasses.
Summary of the invention
[0004] Definitions. As used in this description and the accompanying claims, the following terms shall have the meanings indicated, unless the context otherwise requires:
[0005] External display/screen: The external display herein means a display that is not head-mounted or body-mounted. It is usually a relatively large screen at some distance from the user and usually does not physically move with the user's head movement.
[0006] 3D rendering engine: for example, that of a game, CAD package, or virtual reality system, having the ability to calculate stereo pairs of a 3D scene from a certain viewpoint using "virtual camera(s)," just as a real camera would shoot the scene from that point in real life.
[0007] Intersect/converge: Convergence - the standard definition is "when both your eyes focus on the same object, the images from each eye converge on that object." The angle of convergence is smaller when looking at faraway objects, say farther than 10 meters. We sometimes use convergence interchangeably with other terms such as parallax. Strictly, only toe-in stereo rigs, where each camera rotates, have a convergence angle. (The term may also be used interchangeably with cross point/intersection.) In Fig. 1a, we say that the two images for the left and right eye "converge" at the position where the virtual object/visual effect is located.
[0008] "Control device/prop" means handheld or body carry /mounted devices, tools, props that user used in the VR environment. It is usually resembling something that is used in the virtual environment (similar to that of the function of "props" used in the movies/TV), for example having a shape of a weapon, such as a gun, or handle of a light sword that is consistent with the game/VR scene being simulated. (The position sensor/tracking system could track the position of such device/prop, and make related "augmentation" by adding visual effects and maybe also input its position into the VR system for interaction.). Some example: weapon, magic wand, gloves, or it might also be non-tangible like some kind of field, light, "force" /'fireball" etc.
[0009] "Visual Effect" associated with the "Control device/prop", some visible effect usually close to the control device/prop, and
triggered/generated by (or interactive with) the device/prop, such as muzzle flash generated by a gun, a light blade generated by light sword handle, stars generated by magic wand, etc. It's spatial location can be determined once we know the position and the orientation of the control device/Prop.
[0010] "Stereo Window" is the plane/surface that is the picture through which the stereo image is seen (like the TV screen). The Stereo Window is often set at the closest object to the screen. When an object appears in front of the stereo window and is cut off by the edges of the screen, this is termed a Stereo Window Violation. [0011] "Subtract Reality" is an invention discussed in the later part of this document. It functions like the reverse of "augmented reality" in a way that make real world object or a part of it looks like "disappear" in the virtual reality environment, or hide by an "virtual object" that appears to be in front, in the VR environment that using external (Non-HMD) display. Previously this can only be done with Mixed Reality or Augmented reality with HMD (Head mounted display), because external displays that is relatively far from user and will always behind the real object (control device/prop) in user's hand, and thus the image of that hidden part can not be seen by the user and thus can not make an virtual object in this area appears "in front of" the handheld object. So traditionally the can be only done by augmented reality or mixed reality in which the display is closer to the eye than the handheld object. With the unique methods and apparatus provided in this invention, the "virtual object in front of hand-held device" can be achieved using a normal external screen with special device/prop that can emit light or providing display on at least one surfaces, as if a part of the device where eclipsed/disappeared to let the light "in behind" to pass. This will be discussed in detail in the paragraphs below.
[0012] A first embodiment of the invention is directed to a method, or a corresponding apparatus, for presenting 3D visual effects with stereopsis (or "binocular vision") in real time to a user who is using a control device/prop in a VR environment with external screens, to provide a more realistic, more accurate, and possibly more enjoyable user experience. This includes acquiring, in real time, the user's viewing position(s), such as the head position (from which the positions of the left and right eyes can be inferred) or the position of each eye, together with the position and orientation of the "control device/prop" the user is using or wearing, such as a prop, weapon, magic wand, or gloves, or possibly something non-tangible such as a field, light, "force," or "fireball", so that the spatial location of the visual effect can be determined. Then, when a given visual effect, such as but not limited to a muzzle flash, laser, light saber/sword, energy/particle beam (such as those from a blaster), fireball, projectile(s), flame, halo, magic-wand-like trail, or stars, is enabled/triggered, images of the visual effect(s) for the user's left eye and right eye are displayed (or generated and displayed) on an external display/screen near the user. The (stereo) images are displayed at appropriate positions on the display/screen such that, when the user looks at them, the image for each eye "converges" at the spatial position close to the device where the effect is supposed to be, as Figs. 1a, 1b, and 1c show.
This can be achieved by using, for example, one or a combination of the following approaches:
a) By selectively displaying the image of each eye for the visual effect at the point where the external display/screen surface intersects the "line of sight" determined by the corresponding eye location and the spatial 3D location (point) of the visual effect, the images for the two eyes appear to the viewer to "converge" at the spatial position of the visual effect, and thus a correct 3D visual effect is created for the viewer.
b) Another way to display the images for the left eye and right eye at the correct spatial location defined by the 3D rendering engine is to modify the 3D scene to add or enable one or more virtual object(s) for the visual effects, and to use the 3D rendering engine's virtual cameras to generate appropriate images of the whole scene for the left eye and for the right eye, taking into account the user's head position in relation to the device's location and the screen's location; essentially, the virtual cameras "shoot" from the observer's real-time location.
There are multiple ways to implement this, e.g., by moving the virtual camera so that it "looks at" the device the same way (in spatial relationship) the user looks at the device, and/or by moving the images displayed on the screen (because if the virtual camera moves with the user's eye position but the screen is fixed, the image on the screen may need to move, especially when the movement is significant; an easier/quicker way is simply to shift the image in 2D on the screen instead of moving the camera). The visual effects then appear at the appropriate locations on the screen where the two lines connecting the eyes and their respective images "cross" or "converge" (or have the shortest distance between them, when treated as 3D lines) at the point where the effect is supposed to be, such as very close to said control device/prop. Thus the virtual objects for the visual effects appear to the user at the correct/expected spatial location in relation to the device (with correct direction and depth); for example (but not limited to), when the user fires a "weapon" using the gun-like control device/prop, the "muzzle flash" appears to the user at a position very close to the muzzle (as shown in Figs. 1a, 1b, 1c). An illustrative sketch of one such head-tracked virtual-camera setup is given after this list.
[0013] In a related embodiment of the first embodiment, the displaying (or generating and displaying) of images of the visual effect(s) for the user's left eye and right eye on an external display/screen near the user is done selectively according to circumstances/conditions, such as whether or not the user is looking in that direction, and/or whether a visual effect is enabled/triggered (for example, enabled or triggered by the control device/prop, or by the 3D rendering system such as a game, where the visual effect "belongs to" or is a part of the control device/prop, or appears close to the control device/prop, or starts moving from or toward the device), such as but not limited to a muzzle flash, laser, light saber/sword, energy/particle beam (such as those from a blaster), fireball, projectile(s), flame, halo, magic-wand-like trail, or stars.
[0014] In a related embodiment, the visual effect generated may be overlaid on, or "added on" to, the images of the scene generated by the 3D rendering engine (such as, but not limited to, a game engine). This may also be done selectively, depending on conditions as discussed in the embodiment in the paragraph above.
[0015] In all of the above embodiments there may be situations in which hiding, resizing, or not displaying the visual effects is desired, for example when the visual effect would be displayed so close to the border of the screen that it would cause a "stereo window violation."
[0016] In an embodiment related to all of the above embodiments, the position acquiring/tracking device/method can use technology such as, but not limited to: Kinect™-style sensors measuring distances of different objects; a stereo camera, which can generate image data that a processor can analyze to estimate distances to objects in the image through trigonometric analysis of the stereo images; or, alternatively or in addition, distance measuring sensors (e.g., a laser or sonic range finder) that can measure distances to various surfaces within the image. As discussed in more detail below, in the various embodiments a variety of different types of distance measuring sensors and algorithms may be used on an imaged scene to measure distances to objects, and more than one sensor and type of sensor may be used in combination. Therefore, for ease of description and consistency, the various assemblages and types of distance measuring sensors that may be included in the VR system, such as but not limited to on a head-mounted device or user control device/prop, are referred to herein collectively or individually as "distance and direction sensors."
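As a small, hedged reminder of the trigonometric analysis mentioned above (the standard rectified-stereo relationship, not part of the claimed method; parameter names are illustrative), depth can be estimated from the pixel disparity of the same feature between the two camera images:

```python
def depth_from_disparity(focal_length_px, baseline_m, disparity_px):
    """Standard rectified-stereo depth estimate.

    focal_length_px -- camera focal length expressed in pixels
    baseline_m      -- distance between the two camera centers, in meters
    disparity_px    -- horizontal shift of the same feature between the images
    Returns the estimated distance to the feature, in meters.
    """
    if disparity_px <= 0:
        raise ValueError("feature must have positive disparity")
    return focal_length_px * baseline_m / disparity_px

# Example: 800 px focal length, 6 cm baseline, 16 px disparity -> 3.0 m away.
print(depth_from_disparity(800, 0.06, 16))
```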
[0017] In the embodiments discussed above, the head position tracking/acquiring device and the device for acquiring the control device/prop's position and orientation may also include sensors such as accelerometers, gyroscopes, magnetic sensors, optical sensors, mechanical or electronic level sensors, and inertial sensors, which alone or in combination can provide data to the device's processor regarding the up/down/level orientation of the device or the user's head (e.g., by sensing the direction of gravity), so that the user's head position/orientation (and from that the viewing perspective) and/or the control device/prop's position/orientation can be acquired or calculated. Further, the helmet/glasses worn by the user, or the control device/prop, may include rotational orientation sensors, such as an electronic compass and accelerometers, that can provide data to the device's processor regarding left/right orientation and movement. Collectively, sensors (including accelerometers, gyroscopes, magnetic sensors, optical sensors, mechanical or electronic level sensors, inertial sensors, and electronic compasses) configured to provide data regarding the up/down and rotational orientation of the head-mounted device (and thus the user's viewing perspective) are referred to herein as "orientation sensors."
[0018] Position tracking/acquiring for the head/eyes/control devices mentioned in all of the above embodiments can also work momentarily, meaning that position data are acquired in real time only when needed. Depending on the implementation of the tracking system, the position of the device can be tracked with the same system, such as when using a tracking system with sensors not placed on the user (for example a Kinect™), in which case the "external"/"third-party" tracking system tracks both the user's head/glasses/eye position and the position of the device (including its orientation, i.e., which direction it is pointing). Alternatively, separate sensors/tracking systems can be used to detect the device's own position, such as the device using its own cameras to find its own location, or its position relative to the user's head/glasses; for example, camera(s) could be placed on the user's helmet or glasses to capture stereo images (or IR images) of the device/weapon, or a structured-light measuring device like those used in the Kinect could be used to acquire a "depth map" of objects in the user's view, so that the location/orientation of the device relative to the user's head can be obtained/calculated.
[0019] While it might be desirable to maintain real-time position tracking "full time" (or "always on," with constant polling), as this facilitates the second independent embodiment mentioned above (moving/updating the position of the virtual stereo cameras of the 3D rendering engine together with the user's head movement), the above-mentioned position and orientation information (of the head and of the device/prop) can instead be captured only when a visual effect needs to be displayed. This "event driven" model acquires the information only when needed, which can be very useful because 1) it can be used by a display engine that is "independent" of the scene's 3D rendering engine, since in many situations there is no need, or it is 2) too computationally expensive, to re-render the whole scene for every movement of the user or device; or the update of the whole scene might be hardly noticeable to the user because the objects are far away (so it might not be worthwhile); or we may want to render only a part of the screen, or simply overlay images.
[0020] The visual effect/object, depending on the type of device/"weapon" the user is using, might be momentary and not follow the device's movement, such as a muzzle flash or the smoke trail after launching a missile/projectile/flare, or it can be continuous and follow the device, such as a torch, a light beam, a flame, etc.
[0021] It is desirable that a stereo/surround sound effect be provided together with the visual effect (if the visual effect has a corresponding sound effect). It is also desirable that the sound effect be in sync with the user's movement and appear to the user to come from the direction of the visual effect. This may require a sound system with multiple speakers and additional surround-sound processing capabilities, provided either by individual components or by the 3D scene engine (such as some game engines), using the position information of the user's head and of the control device.
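As a minimal, hedged sketch (hypothetical names; not a prescribed audio pipeline), the direction fed to such a surround panner can be derived from the same tracked positions used for the visual effect, expressed in the listener's head frame:

```python
import numpy as np

def effect_direction_in_head_frame(head_pos, head_rotation, effect_pos):
    """Unit direction from the listener's head to the visual effect.

    head_pos      -- tracked 3D head position (world coordinates)
    head_rotation -- 3x3 matrix whose columns are the head's right/up/forward
                     axes expressed in world coordinates
    effect_pos    -- 3D position of the visual effect (world coordinates)
    Returns a unit vector in head coordinates, usable by a surround panner.
    """
    to_effect = np.asarray(effect_pos, float) - np.asarray(head_pos, float)
    local = head_rotation.T @ to_effect          # world frame -> head frame
    return local / np.linalg.norm(local)
```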
[0022] Some visual effects that appear to be "incoming" toward the user/device can also be processed/displayed in the above embodiments, and these "visual effects," such as projectiles or a light saber/sword, might interact with the user's props; depending on the situation, such as the spatial relationship (collision detection), the result might be a near miss or an impact. Although it might be desirable for the 3D rendering engine/game engine to do the collision detection, it is also possible for this to be done outside of the engine in an independent component, which takes the user's positional data, possibly including full body and limb positions in addition to the head and device/prop positions, in order to make a highly accurate estimate. (There are also estimation techniques that allow part or all of this information to be omitted from the collision-detection calculation.)
[0023] It is also desirable that these 3D (stereo) visual effects optionally have a tactile/force feedback effect (such as, but not limited to, vibration or a feeling of impact) provided to the user by the control device/prop and/or the costumes/props the user is wearing.
[0024] To keep the real-world object and the virtual scene/object in the correct depth relationship as they appear to the user, the retinal image size is also important: the same object should appear smaller when it is farther away from the user. External screens have different sizes and resolutions, and the image is usually displayed at a fixed resolution that may be optimized for the display but does not necessarily present the virtual object to the user at the "correct" size it would have in the real world. This introduces a perceptual conflict that may cause nausea or other discomfort, or a feeling of unrealism; it may also make the scene look like "television" or "film" (with "zoomed" or "resized" objects/figures) and lack an immersive feeling. More severely, when the display sits side by side with a real-world object, a conflict arises if similar objects in the real world and the virtual world at similar distances from the user appear to have significantly different sizes, and this conflict drastically reduces the realism and immersion of the whole scene. It is therefore desirable that an object displayed on the external display/screen appear approximately the same size on the user's retina as it would in the real world. For example, a ball 3 feet in diameter and 10 feet away subtends about 17 degrees of arc for the viewer; if the ball image is displayed on a display 5 feet away, the image on the screen must be about 1.5 feet in diameter to produce the same retinal projection as the real-world ball. While some tolerance may be allowed for objects far from the viewer, for objects close to the user, especially those close to the real object the user is handling/controlling, it is desirable to achieve a proportion at or substantially close to 1:1 between the displayed virtual object and a real-world object of the same size at the same distance/depth. This requires adjusting the image size according to the distance to the external display, the angle of the viewer's "line of sight" to the surface of the display, the resolution and size of the screen, and the size, distance, and angle of the virtual object. There are multiple ways to calculate this; one example is to first calculate the "arc angle" that the virtual object should project on the viewer's retina, use this arc angle to calculate the size required on the external screen, and then use the screen resolution (pixels per inch) and the angle of the viewer's "line of sight" to the display surface to calculate the actual pixel size of the image.
[0025] Note that we normally assume the viewer's "line of sight" to the surface of the display is at 90 degrees when the user is at or near the "ideal" position relative to the screen, but it can be at other angles in certain situations, such as on some curved surfaces when the user is not at the ideal viewpoint (usually the center), or for off-center portions of a very large screen that the user views at 60 or 45 degrees or even shallower angles. In these situations, in order to present realistic and accurate images to the user, these angles must be taken into account in the embodiments discussed above.
[0026] There are many other ways to calculate the image size for 1:1 perspective display. One is to use a proportional method to determine the "1:1" size of the image from the distance of the screen versus the distance of the object. For example, to display an image of an object 10 feet away on an external display that is 5 feet away from the viewer, the image size should be 5/10 of the real size, which is half the size in this case. If the display surface were 3 feet from the user, the factor would be 0.3.
[0027] So, as discussed in the above three paragraphs, in another (independent) embodiment, the image size displayed on the screen is adjusted according to the configuration of the VR environment, such as the user's distance and the screen size, so that the virtual object has the same or a similar size as the corresponding real-world object from the user's perspective. This can be achieved in several ways, such as the following (an illustrative sketch of these calculations is given after this list):
1) Calibrate the system by applying an appropriate zoom factor, which can be calculated in many ways, such as:
a) For a given object, calculate the arc angle it subtends for the user from the object size and the user-to-object distance, then calculate the actual image size from the user-to-display distance and this arc angle, and finally use the screen's pixel-to-size ratio to calculate the pixel size of the image. In the example above, a ball with a 3-foot diameter 10 feet away subtends about 17 degrees (≈ atan(3/10)) of "angle of view" for the viewer, so the image on a display 5 feet away from the user needs to be 5 feet × tan(17°) ≈ 1.5 feet in diameter to produce the same retinal projection ("angle of view") as the real-world ball. If the screen has a pixel-to-size ratio of 600 pixels per foot (50 pixels per inch), then the image needs to be 1.5 × 600 = 900 pixels in diameter.
b) using "proportional" method which is the similar: 1st calculate ratios of user-display/user-to-object, and use this factor to divide the size of real object to get the actual image size on screen, and then use screen resolution (pixel to size) to calculate the pixel size. So In the example above, this can be calculated as: proportional_factor =
user_to_display_distance/user_to_object_distance and we got 0.5, then the ball image size needs to be 3 feet real-life diameter *
proportional_factor = 1.5 feet in diameter, to have the same size of retina projection ("angle of view") to user as the real world ball mentioned above. If the screen have a "pixel to size" ratio of 600 pixel per foot (50 pixel per inch), then the image needs to be 1.5x600=900 pixel in diameter.
2) Since merely zooming might not take full advantage of the screen resolution, another solution, in order to use a specific resolution or the maximum screen resolution, is to change the virtual camera's FOV (field of view) in the rendering engine according to the user's actual FOV when looking at the screen. That is, the FOV of the virtual camera of the (3D) image rendering system is adjusted to be the same as, or close to, the FOV the screen subtends from the user's point of view. So if the screen appears to span a 60-degree FOV for the user, the virtual camera should also be set to that value (or substantially close to it). The user's "actual viewing FOV" (or angle of view) of the screen can be determined from the screen size/geometry (including curved/surround screens) and the user's distance to the screen, so the FOV of the virtual camera can be adjusted according to this configuration information, either once or dynamically, according to requirements similar to those described for the "zoom method" above.
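The following Python sketch (illustrative only; function and parameter names are hypothetical, and a flat screen viewed head-on is assumed) reproduces the calculations in 1a), 1b), and 2) above:

```python
import math

def image_size_arc_angle(object_size, object_dist, display_dist):
    """Method 1a: match the angle of view subtended by the real object."""
    arc_angle = math.atan(object_size / object_dist)      # ~17 deg for 3 ft at 10 ft
    return display_dist * math.tan(arc_angle)             # ~1.5 ft on a 5 ft display

def image_size_proportional(object_size, object_dist, display_dist):
    """Method 1b: simple proportional scaling by the distance ratio."""
    return object_size * (display_dist / object_dist)     # 3 ft * 0.5 = 1.5 ft

def image_size_pixels(image_size_ft, pixels_per_foot=600):
    """Convert a physical image size to pixels using the screen's pixel density."""
    return image_size_ft * pixels_per_foot                # 1.5 ft * 600 = 900 px

def screen_fov_degrees(screen_width, viewing_dist):
    """Method 2: horizontal FOV a flat screen subtends from the user's position."""
    return math.degrees(2 * math.atan((screen_width / 2) / viewing_dist))

# Worked example from the text: 3 ft ball, 10 ft away, display 5 ft away, 600 px/ft.
size = image_size_arc_angle(3, 10, 5)                     # ~1.5 ft
print(round(size, 2), round(image_size_pixels(size)))     # 1.5  900
print(round(screen_fov_degrees(10, 5), 1))                # a 10 ft screen 5 ft away spans ~90 deg
```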
[0028] In a related embodiment, depending on which method (1 or 2) is used, the zoom factor or "actual viewing FOV" is calculated or loaded once before the VR simulation starts, or at the start of the simulation/game/other VR activity, according to the user's distance to the screen; the screen display is adjusted, and the factor is kept unchanged until another scene with different virtual camera settings or other parameters incompatible with the current scene requires recalculating the factor.
[0029] In another related embodiment, depending on which method (1 or 2) is used, the zoom factor or "actual viewing FOV" is calculated dynamically during the VR simulation according to the user's distance to the screen, and the display system uses this "zoom" factor to dynamically change the image size. This is useful when the user moves relative to the screen frequently or would otherwise notice the difference.
[0030] It is also desirable that, in the first and second independent embodiments, virtual objects in the 3D scene as well as the visual effects use the "1:1" display method(s) just discussed, for example by detecting (or pre-defining) the user's location relative to the screen and "calibrating" the related images. The "calibration" can be done at the beginning of the simulation/game/other VR content, or dynamically with the rendering engine(s) of the 3D scene and/or visual effect. All combinations of the above three 1:1 display embodiments with the first two independent embodiments discussed in this document and their related embodiments are therefore possible, creating multiple embodiments that take advantage of the acquired position information (of the user, etc.) and/or features of the 3D rendering engine to provide accurate and realistic representation of the scenes and visual effects.
[0031] In another (independent) embodiment, the control device/prop used together with the VR display system that uses one or more external display screens can itself emit light/colors or have a display on one or more of its surfaces. The lights/display are arranged so that the light-emitting part or display area extends all the way to at least one edge of the surface. If the light-emitting or display area does not extend to all edges, for example if it extends only to two adjacent edges of a hexagon, those edges are to be oriented toward the external screen when the user looks at the device. The idea is to eliminate any visible non-display area between the light-emitting/display area of the device and the external screen when the device is used in front of the screen, so that they appear "connected" in most situations, such as when both display the same solid color with the same apparent brightness, or when both display similar patterns, or when the external screen displays stereo 3D virtual objects whose left-eye and right-eye images appear to the user to converge at or in front of the device and the light-emitting/display area has the same or similar color/texture/brightness; in that case part of the device appears to be "hidden" by the virtual object that appears "in front of" the device, as illustrated in Fig. 2.
[0032] There are many ways to adjust the light-emitting area/display area so that it shows the correct "clipped/hidden area" with the correct color and brightness. One way is to use a head-mounted camera to check the light strength and the shape of the region from the user's point of view. Another is to calculate it from the user position, device position, visual effect/object position, and screen location, with consideration of brightness (a hedged sketch of such a calculation is given below).
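As a hedged, simplified sketch (hypothetical names; the virtual effect is approximated as a sphere and brightness matching is ignored), one could decide, per point of the device's display surface and per eye, whether that point lies "behind" the virtual effect as seen from that eye and should therefore show the effect's color instead of the pass-through background:

```python
import numpy as np

def surface_point_is_hidden(eye, surface_point, effect_center, effect_radius):
    """True if the virtual effect (approximated as a sphere) covers this point
    of the device's display surface, as seen from the given eye.

    The test checks whether the segment from the eye to the surface point
    passes within effect_radius of the effect's center, i.e., whether the
    effect sits between the eye and the device at that point.
    """
    eye, p, c = (np.asarray(v, float) for v in (eye, surface_point, effect_center))
    d = p - eye                                   # sight line toward the surface point
    t = np.clip(np.dot(c - eye, d) / np.dot(d, d), 0.0, 1.0)
    closest = eye + t * d                         # segment point closest to the center
    return np.linalg.norm(closest - c) <= effect_radius

# Surface points for which this returns True would be drawn in the effect's
# color and brightness on the device's own display; the rest match the screen behind.
```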
[0033] It is also desirable to prevent the device from reflecting content from the screen, by using a "matte" surface for the device's display area. A simple mirror surface could be used only when the external screen shows a solid color and the only effect needed is to make the device vanish.
[0034] Optionally, the device/weapon the user is holding can be lit/illuminated so that it looks the same as it does on screen. It can also provide stroboscopic effects, at a frequency equal to, or in an integral multiple/division relationship with, the frequency of the active shutter glasses the user is wearing, to provide further special effects.
[0035] In another (independent) embodiment, the device itself can display colors or images that selectively "match" what is displayed on the screen behind it from the user's viewpoint, so that it becomes "transparent" to the user; or it can display colors, images, or even stereo images on one or more surfaces that match or relate to the visual effects being displayed, so that the visual effect (or the whole, together with the image on the screen) appears to the user to be in front of the device and to "hide" the device (or part of it). This cannot be done with traditional VR using an external display. In a related embodiment the surface can be flat or curved/spherical, etc.
[0036] As shown in Fig. 2A, the handle of a "light sword" like those in "Star Wars" is used in a VR environment with external display(s). The "upper surface" of the handle, which is supposed to connect to the body of the light sword, has a light-emitting area/display area that extends to its edge(s). When the light sword is turned on, the whole surface or a part of it (the shape shown in the figure, which is a "projection" of the light sword from the user's point of view; it could also be stereo images that can be picked up with the user's stereo glasses, such as stereo image pairs synchronized with the active shutter glasses the user is wearing) displays the same color and brightness as the light sword. This is combined with the stereo image pairs displayed on the external screen, which dynamically adjust their location and size according to the user's head position (or eye positions) and the position of the control device/prop (in this case the handle of the light sword) relative to the screen, so that the corresponding image of the light-sword body for each eye appears to the user to converge (or cross) at a spatial location along the direction of the handle. Because this light-emitting/display area has the same or very similar color/brightness as the stereo images presented, and because there is no visible seam between the two (since the display area extends all the way to the edges), it appears to the user that the sword is indeed in front of the handle and hides a part of the handle.
[0037] Fig. 2B shows an alternative design of the handle that can be applied to more devices: the light-emitting area/display area is on a curved surface, shown in the figure as a spherical surface, which might be preferable on certain occasions, such as with rear projection. The spherical surface can be useful when the simulation environment requires the device to cover a larger angle of view. The image shown on such a device, however, needs more accurate calculation to appear in sync with the external screen.
[0038] Fig. 2C shows another shape and an extended use of the display area. The figure shows a sword-like control device/prop that has a display area on at least one side, extending all the way to the edges. Since we have full control of the display area, we can make parts of it, such as the edges, show the same color/texture as the background, or dynamically display whatever content on the external screen is hidden by that part, so that this part appears to the user to have disappeared and the sword looks "notched" or "gapped." This is useful for displaying damage done to the prop, changes in the shape of the prop, and so on.
[0039] It is also desirable that, in the first and second independent embodiments, the control devices/props use the "subtract reality" method(s), or have related features such as light-emitting/display areas extending to at least one edge, as discussed in the paragraphs above. Combinations of the above "subtract reality" embodiments with the first independent embodiments discussed in this document (and some of their related embodiments) are therefore possible, creating multiple embodiments that take advantage of the acquired position information (of the user and device) and/or features of the 3D rendering engine and device to provide visual effects that were previously possible only with MR or AR (with HMDs).
[0040] It is also desirable that, in the first and second independent embodiments, control devices/props using the "subtract reality" method(s), or having related features such as light-emitting/display areas extending to at least one edge, as discussed in the paragraphs above, further use the "1:1" display methods. This creates multiple very useful embodiments, such as embodiments similar to the one Fig. 3 shows, that take advantage of the acquired position information (of the user and device) and/or features of the 3D rendering engine and device to provide visual effects that were previously possible only with MR or AR (with HMDs), and to provide a much more accurate and realistic user experience when the user has a real-world control device close to, or in "contact" with, the virtual objects/visual effects.
[0041] A heads-up display (HUD) or see-through AR (augmented reality) glasses might also be used in these VR environments to provide further special effects, such as (simulated) night vision. Also, as an alternative to the "subtract reality" technology discussed above, normal AR glasses or mixed reality (MR) HMDs might be used in some cases/embodiments to display or "overlay"/add images of a virtual object that is in front of the device, thereby hiding a part of the device.
[0042] In all of the above embodiments, illumination of the user and/or the control device might be desired in order to achieve an "immersive" feeling; for example, if it is daytime in the simulation/game/VR scene, it is desirable to provide some illumination, which might be restricted to just the user area or device area, so that the user can see the weapon just as he or she would in the scene.
[0043] It is desirable in the above embodiments that the visual effects generation mechanism (whether independent or part of the 3D rendering engine) can obtain information about which type of controller/device the user is using, so that the visual effect is coherent with the device/weapon currently in use. When the user changes to another device/weapon, the visual effects may change accordingly.
[0044] The display-related technologies discussed here can be used independently (e.g., independently of the 3D rendering engine used for scene generation, games, etc.) as long as they can obtain the appropriate position, orientation, and other information from the sensor system, and can use image processing technology such as overlay/add-on, alpha channel, chroma keying, etc., to merge their output with the images from the 3D rendering engine. Additional approaches, such as using display card hardware functions or a DirectX/OpenGL layer to do the image processing/merging, are also possible, so that only the positions need to be calculated independently and there is no need to work inside the engine (for example a game engine, so there is no need to modify its source code). In this way existing 3D rendering engines such as games do not need to be modified. It is also possible for these technologies/embodiments to be integrated into the 3D rendering engine (such as a game engine) so that all related image processing and position calculation are handled in one place/module.
[0045] Optionally, the device or weapon might be non-tangible or even virtual, and the user can control it by hand, although using a glove or costume is preferable as they make it easy to add markers/beacons.
[0046] Since different people have different interocular/interpupillary distances (the distance between the eyes), it is desirable that the two images for the left and right eye be generated with a matching "interaxial separation" distance (which produces the "retinal disparity" seen by the user) so that an accurate depth/distance feeling for the object can be achieved, i.e., so that the user has a correct perception of distance from the stereo image pairs. Because of the differences in interocular/interpupillary distance between people, different viewers looking at the same image pairs generated by a stereo camera (including the virtual cameras used in 3D rendering engines) get different depth/distance impressions. If the camera has a different "interaxial separation" distance than the viewer's interocular/interpupillary distance, the "convergence angle" captured by the camera will differ from the one perceived by the viewer's eyes, so when the stereo image pair is presented, the object feels as if it were at a different position than the intended depth/distance; this is illustrated in Fig. 4. Notice that when the "interaxial separation" distance of the camera differs from the distance between the viewer's eyes, the point of convergence perceived by the user differs from the actual object position. This creates a problem for the user's perception when virtual objects presented this way are close to, or attached to, a real-world object at the same (or very similar) depth/distance from the viewer: the viewer notices that the images converge differently for the virtual object and the real-world object, so they do not appear to be at the same (or a very similar) distance, which creates a conflict.
[0047] While it would be desirable to track each eye's position, it might not always be possible, convenient, or economical to do so. Since a person's eyes look in the same direction and the distance between them is fixed, it may not be necessary to track each individual eye position at all times. Instead, we can estimate the eye locations from the head position, or from the position of one eye.
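A minimal, hedged sketch of such an estimate (assuming the tracker reports a head position and a head "right" direction; names and the default IPD value are illustrative):

```python
import numpy as np

def estimate_eye_positions(head_pos, head_right_dir, ipd=0.063):
    """Estimate left/right eye positions from a tracked head pose.

    head_pos       -- tracked 3D position between the eyes (e.g., glasses center)
    head_right_dir -- vector pointing toward the user's right, in world coordinates
    ipd            -- interpupillary distance in meters (~63 mm is a common average)
    Returns (left_eye, right_eye) as 3D points.
    """
    head_pos = np.asarray(head_pos, float)
    right = np.asarray(head_right_dir, float)
    right = right / np.linalg.norm(right)
    half = 0.5 * ipd * right
    return head_pos - half, head_pos + half
```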
[0048] In a further related embodiment, the data for the "interaxial separation" distance could come from a "fast on-site calibration," or from data uploaded by the user by various means, such as but not limited to: over a network/the Internet, from removable media, from a flash drive/USB port, from an IR port, over Bluetooth, or by other means from a smartphone. The data could be stored in an exchangeable format such as a file, XML, etc., and the system could be configured using such a format; in some situations wireless communication of such a format could also be allowed.
[0049] In a further related embodiment, other data associated with the user, such as stereo depth limitations, FOV preferences, force feedback preferences, custom-made games, etc., could also be loaded into the VR environment.
[0050] It is desirable that the interocular/interpupillary distance calibration parameter (and possibly other preferences specific to the individual user) acquired in the embodiments discussed above can be used to configure the 3D rendering engine, for example by setting the "interaxial separation" distance of the (virtual) stereo camera, or to configure another (independent) stereo/3D rendering mechanism.
[0051] In a related embodiment, the 3D rendering engine could change the "interaxial separation" distance of the stereo camera (the distance between the two cameras) for different people, i.e., to match the current user's interocular/interpupillary distance.
[0052] So it is desirable that VR systems using the first embodiment discussed in this document, possibly together with the other "1:1 proportion" display methods and the "subtract reality" technologies discussed earlier, integrate the "interocular/interpupillary distance calibration" technology discussed in the above paragraphs/embodiments, making combinations that provide even more accurate 3D for each individual user to experience.
Brief Description of the Drawings
[0053] The foregoing features of the invention will be more readily understood by reference to the following detailed description, taken with reference to the accompanying drawings, in which:
[0054] Fig. 1 shows how to display a visual effect triggered by the user's control device/prop (such as a weapon) at different stages as the visual effect moves away from the user.
[0055] Fig. 1a depicts how to obtain the image location on the screen/stereo window for each of the user's eyes: the line of sight from each eye to the visual effect's spatial position is extended to the screen/stereo window.
[0056] Fig. 2a shows a control device/prop with "subtract reality" features (a light-emitting area/display area).
[0057] Fig. 2b shows the internal structure of a "subtract reality" control device.
[0058] Fig. 2c shows another form of "subtract reality" control device.
[0059] Fig. 3 shows a VR environment in which the various 3D display and interaction technologies discussed here (such as those related to claim 1, claim 3, etc.) are used together with subtract reality to provide a realistic, interactive, immersive VR experience to the user.
[0060] Fig. 4 shows a camera with a different "interaxial separation" distance than the viewer's eye separation, so that the captured "convergence angle" differs from the one perceived by the viewer's eyes and the object appears to the viewer to be at a different position.
Detailed Description of Specific Embodiments
[0061] Various embodiments will be presented in terms of systems that may include a number of components and modules. It is to be understood and appreciated that the various systems may include additional components, modules, etc., and may not include all of the components, modules, etc. discussed in connection with the figures. A combination of these approaches may also be used.
[0062] One example would be as follows:
1. A large "external" display screen that does not move or rotate together with the user's head.
2. Detect the user's head position, and desirably the location and orientation of each eye.
3. Detect the device's location and orientation, and determine the spatial location of the visual effect (and whether it is on or off).
4. If the visual effect needs to be displayed (this step is conditional), determine the location on the screen of the image for each eye. This can be done either by:
A) changing or adjusting the stereo cameras of the display engine so that they reflect the spatial positions of the two eyes (so the whole scene changes when the user's eyes change location), and displaying the visual effect inside the engine as a virtual object at the appropriate spatial location near the device, where "appropriate location" means the location determined in step 3; in this way the camera automatically accounts for all the offsets and whole-scene changes; or
B) directly calculating the location of each image on the screen, so that the image of the visual effect/object for the left eye and that for the right eye "converge" (cross) at the spatial location determined in step 3. This can be done as shown in Fig. 1, which shows the virtual object (visual effect) being displayed.
There are also multiple ways to detect the user's head movement, for example using IR beacons or markers on the user's active glasses or helmet, with one or more (IR) cameras capturing images from which the spatial locations are derived; or the helmet/glasses can carry upward- and/or forward-looking camera(s) that track where two or more IR beacons are. It is also possible for the user to wear a head-mounted camera (perhaps integrated with the glasses or helmet) that can find the position of the screen and/or the position of the control device/prop. When the user is not looking at a portion of the screen or at the device, there is little point in displaying the related images for that area (or the visual effect near the device); thus the head/glasses-mounted camera can be used not only to determine positions, but also to serve as a "sight tracker" that determines which parts of the screen can be omitted.
[0063] In an embodiment, one or more screen(s) are placed in front of (and around) the user, and the user wears stereo glasses. A tracking system tracks the user's head position in real time. As an example, a device that can capture the user's image, IR image, and positions, such as a Kinect™ sensor, is placed above and in front of the user so that the user's movement can be detected without obstruction. Another example is to use one or more camera(s) (possibly wide-angle cameras) on the user's helmet/glasses to track IR beacons or image patterns placed in the environment (such as above and in front of the user), and to calculate the head position relative to those markers/beacons.
[0064] Depending on the implementation of the tracking system, the position of the device can be tracked with the same system, such as when using a tracking system with sensors not placed on the user (for example the Kinect™), in which case the "external"/"third-party" tracking system tracks both the user's head/glasses/eye position and the position of the device (including its orientation, i.e., which direction it is pointing). Alternatively, separate sensors/tracking systems can be used to detect the device's own position, such as the device using its own cameras to find its own location, or its position relative to the user's head/glasses; for example, camera(s) could be placed on the user's helmet or glasses to capture stereo images (or IR images) of the device/weapon, so that its location relative to the user's head can be obtained/calculated.
[0065] The above positions can be captured only when a visual effect needs to be displayed. However, it is desirable to maintain full-time capture, because this facilitates method A (moving the virtual stereo cameras together with the user's head movement). Note that this differs from approaches based purely on "motion parallax": here the camera is moved, and whether the displayed image changes, and which part of it changes, depends on the situation; the result may be a change or no change, because we might render only a part of the screen, or, if the change is small, not update the image at all.
[0066] So, when the visual effect needs to be displayed (e.g., triggered by the user pressing a button, or triggered by the game engine when there is an incoming projectile heading toward the user), use the position and orientation of the device to determine where the visual effect is (when the visual effect is close to the device), or use the position of the head in relation to the virtual environment to determine the position of the visual effect, in case the visual effect is not triggered by the device.
[0067] Once we have the spatial position of the visual effect and the user's head/glasses/eye positions, we can 1) move the virtual stereo cameras in a so-called "cooperative" 3D rendering engine, or 2) directly calculate the positions the images should have on screen. There are multiple ways to do the latter, for example by connecting the left eye position to the spatial position of the visual effect with an (imaginary) straight line and extending this line to the screen to get the location of the left image; doing the same for the right eye gives the locations of both images on the screen. Using solid geometry or similar methods, if we have the spatial location of the left eye (Xle, Yle, Zle), the spatial location of the visual effect (Xve, Yve, Zve), and the location of the screen, it is straightforward to calculate which point on the screen lies on the straight line through those two points, and thus obtain the left-eye image position (a hedged sketch of this calculation follows).
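A minimal, hedged sketch of that calculation for a flat screen (NumPy; names such as screen_point and screen_normal are hypothetical, and a planar screen is assumed):

```python
import numpy as np

def image_point_on_screen(eye, effect, screen_point, screen_normal):
    """Where the line from one eye through the visual effect hits the screen plane.

    eye           -- (Xle, Yle, Zle): spatial location of this eye
    effect        -- (Xve, Yve, Zve): spatial location of the visual effect
    screen_point  -- any point lying on the flat screen
    screen_normal -- unit normal of the screen plane
    Returns the 3D point on the screen plane where this eye's image
    of the effect should be drawn.
    """
    eye = np.asarray(eye, float)
    direction = np.asarray(effect, float) - eye          # line of sight
    screen_normal = np.asarray(screen_normal, float)
    denom = np.dot(screen_normal, direction)
    if abs(denom) < 1e-9:
        raise ValueError("line of sight is parallel to the screen")
    t = np.dot(screen_normal, np.asarray(screen_point, float) - eye) / denom
    return eye + t * direction

# Called once per eye (left and right), this yields the two on-screen positions
# whose sight lines converge at the effect's spatial location near the device.
```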
[0068] When we have the locations of both images, we can display the visual effect "on top of," i.e., overlaid on, the current image (using software or hardware functions); this is the "independent" approach, which does not require the cooperation of the 2D/3D rendering engine that produces the VR scene.
[0069] In scenario 2, as shown in Fig. 3, the user could wear a helmet and active shutter glasses for 3D viewing (possibly integrated with the helmet). The user could also wear props/costumes that provide position information about the user's body and head movement, or that make those movements recognizable by the VR system's external sensors so that the positions are captured. The user is using a light saber/sword and a spherical surround screen. The position tracking system might be a Kinect-like depth-mapping device, stereo cameras calculating depth from image pairs, or another method, and could be placed in front of and above the user so that it can "see" both the user's head and the control device. The real-world light sword is just a handle, with a light-emitting area/display (flat or curved) as described under "subtract reality" above. The system generates the light-sword blade when the user turns it on, and the blade follows the user's hand movement (the handle might have sensors in it). The user can use the sword to cut virtual objects, shield against or deflect incoming beams, etc., in the virtual world. In case the user can see the top part of the handle, which should be partially "blocked" or "clipped" by the virtual blade, the display on the handle shows an image that blends seamlessly with the images provided by the external screen.
[0070] The display/light surface of the device can be flat or curved, for example spherical. It is desirable that the display or light cover one entire side of the device, for example the whole "cross-section" side of the light sword, so that when the user looks at it, the light emitted by the device and the light from the external screen behind it merge together seamlessly, as if that part were transparent. In this way we can give the user the illusion that a virtual object "attached" to that surface is "in front." If a full surface cannot be achieved, at least the part facing the external screen should have a displayable/light-emitting area extending all the way to the edge(s) of one (or more) surfaces without any visible seams (this assumes that side always faces the external display).