APPARATUS AND METHOD OF AUGMENTED REALITY
The present invention relates to an apparatus and method of augmented reality.
Most conventional augmented reality (AR) applications fall into one of three broad categories: gaming, advertising and navigation. Of these, portable game applications typically rely on the use of so-called fiduciary markers being placed within the environment, such as in the game Invizimals®. AR adverts similarly use markers, normally in the form of so-called QR codes read by smartphones, such as those produced by Onvert (http://onvert.com/). Other augmented reality systems use a mixture of GPS and compass measurements together with map data to indicate places of interest in a captured view. Other navigation systems use recognition of unique landmarks or trademarks in lieu of fiduciary markers to recognise scenes and overlay relevant information.
However, there is still scope for additional AR strategies.
In a first aspect of the present invention, an entertainment device is provided in accordance with claim 1.
In another aspect of the present invention, a method of video image augmentation is provided in accordance with claim 12.
Further respective aspects and features of the invention are defined in the appended claims.
Embodiments of the present invention will now be described by way of example with reference to the accompanying drawings, in which: -Figures 1A and 1B are respective schematic diagrams of the front and rear of a portable electronic device in accordance with an embodiment of the present invention.
-Figure 2 is a schematic diagram of the functional structure of a portable electronic device in accordance with an embodiment of the present invention.
-Figure 3 is a schematic diagram of the functional structure of a portable electronic device in accordance with an embodiment of the present invention.
-Figure 4 is a schematic diagram of a captured video image in accordance with an embodiment of the present invention.
-Figure 5 is a schematic diagram of a captured video image in accordance with an embodiment of the present invention.
-Figure 6 is a schematic diagram of an augmented captured video image in accordance with an embodiment of the present invention.
-Figure 7 is a schematic diagram of an augmented captured video image in accordance with an embodiment of the present invention.
-Figure 8 is a flow diagram of a method of video image augmentation in accordance with an embodiment of the present invention.
An apparatus and method of augmented reality are disclosed. In the following description, a number of specific details are presented in order to provide a thorough understanding of the embodiments of the present invention. It will be apparent, however, to a person skilled in the art that these specific details need not be employed to practice the present invention. Conversely, specific details known to the person skilled in the art are omitted for the purposes of clarity where appropriate.
Referring now to Figures 1A and 1B, these illustrate an embodiment of a suitable portable entertainment device (PED) 10 for use as an apparatus for augmented reality and to implement a method of augmented reality. The example PED is the Sony ® PlayStation Vita ® (PSV). Figure 1A shows a notional front or top side of the PED, whilst Figure 1B shows a notional rear or bottom side of the PED. The front and rear sides are substantially parallel to each other.
On the front side, the PED comprises a display 200 and optionally one or more loudspeakers (not shown).
In addition, the PED may comprise a number of physical controls. For example in Figure 1A, a directional joypad 330 is located to the left of the display and comprises four directional buttons 331-334, and is also located adjacent a first joystick 335. In addition a shoulder button 336 is provided at the top-left of the PED. Finally, a button 337 (for example a 'PS' button) may be provided, enabling a user to access the PED's operating system at any time.
To the right of the display, a function joypad 340 comprises four function buttons 341-344. These function buttons are differentiated by their icons, such as a triangle, circle, cross and square. The function joypad is located adjacent a second joystick 345. In addition a shoulder button 346 is provided at the top-right of the PED. Finally, two buttons 347, 348 may be provided, for example providing a 'start' function and a 'select' function.
In typical use, the sets of controls on the left and right side of the PED are used co-operatively by a single user to control a game. Such a typical usage may be to control positional movement of the player within a game environment using either the directional joypad or the left joystick, whilst controlling the direction of view, or a reticule or similar, using the right joystick. Meanwhile, in-game functions such as jumping, firing a weapon, blocking an attack or interacting with an object may be assigned to respective buttons of the function joypad. Meanwhile the shoulder buttons may be used either for less frequent functions, or may be used to provide alternate modes of operation (such as primary or alternate firing modes).
The buttons of the directional joypad and the function joypad may be differently shaped, with the buttons of the directional joypad shaped according to their respective directions, whilst the buttons of the function joypad are generally identical in shape.
In an embodiment of the present invention, the PED comprises a rear touch sensitive surface 320 (indicated by the dotted lines), having similar dimensions and aspect ratio to the display 200. The rear touch sensitive surface is positioned so as to be substantially aligned with the display on the opposite side of the device.
Meanwhile, a transparent front touch sensitive surface 310 (indicated by the dotted lines) is also provided coincident with the display 200. The front and rear touch sensitive surfaces and the display thus have similar dimensions and placements on their respective sides of the device. The touch sensitive surfaces may also have a similar resolution of touch localisation.
The rear touch sensitive surface may be a conventional capacitance touchpad or panel such as that found in laptops. Such a touchpad typically comprises two layers of parallel conductive lines separated by an insulator and arranged at right angles to each other. A high frequency signal is swept through every respective pairing of lines between the two layers. The measurable current for each pair is then proportional to the capacitance at their point of intersection. When a user's finger is placed at or near that intersection, however, some of the electrical field between layers is shunted to ground, changing the effective capacitance and hence the measured current. Precise localisation of the user's finger can be achieved by measuring changes in capacitance at nearby points of intersection, which will be proportional to their respective distances from the finger. So-called multi-touch operation of the touchpad can be achieved by detecting distinct peaks in capacitance change at separate intersection points on the touchpad.
Meanwhile, movement of a user's finger or fingers can be estimated from successive points of intersection where contact is detected.
The front touch sensitive surface for use with the display operates in a similar manner to the rear touch sensitive surface, but in this instance the conductive lines are typically transparent (as a non-limiting example, being formed by a deposition of indium tin oxide), and the insulator between the two layers is provided by all or part of the display window (e.g. a glass layer); typically a further transparent protective layer is then provided on top of the upper conductive layer.
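By way of a purely illustrative sketch (not taken from the embodiment itself), the multi-touch detection described above can be expressed in Python; the array shapes, threshold value and function name below are assumptions for illustration only.

```python
import numpy as np

def detect_touches(delta_c, threshold=0.2):
    """Find distinct touch points in a 2D grid of capacitance changes.

    delta_c   : 2D array of the change in measured capacitance at each
                intersection of the two conductor layers.
    threshold : minimum change treated as a genuine touch (assumed value).

    Returns a list of (row, col) intersections that are local maxima,
    i.e. one entry per finger for multi-touch operation.
    """
    touches = []
    rows, cols = delta_c.shape
    for r in range(rows):
        for c in range(cols):
            v = delta_c[r, c]
            if v < threshold:
                continue
            # A peak is a point not exceeded by any of its neighbours.
            neighbourhood = delta_c[max(r - 1, 0):r + 2, max(c - 1, 0):c + 2]
            if v >= neighbourhood.max():
                touches.append((r, c))
    return touches

# Calling this over successive scans allows finger movement to be
# estimated by matching each new peak to the nearest previous peak.
```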
It will be appreciated however that any suitable touch sensitive technique may be used for either touch panel.
Referring now also to Figure 2, an embodiment of the PED comprises a central processor (CPU) 100, such as the ARM® Cortex-A9 core processor, coupled to random access memory (RAM) 110 (for example 512 megabytes (MB) of RAM), and optionally to a read only memory (ROM) (not shown). In addition the CPU communicates with a graphics processing unit (GPU) 220. The GPU has access to video RAM (VRAM) 250 (for example 128 MB of VRAM). The GPU outputs video information to the display 200. The display is typically an OLED display, but may be a conventional liquid crystal display (LCD) or any suitable display technology. As a non-limiting example the display may have a resolution of 960x544 pixels. The CPU also outputs audio to loudspeakers 205 and/or to a headphone jack (not shown).
In addition, the CPU communicates with an input / output bridge (I/O bridge) 120 that co-ordinates communication with peripheral components both integral to and linkable with the PED. In an embodiment of the PED the I/O bridge 120 communicates with a surface input controller 330, which parses inputs from the rear touch sensitive surface and the transparent front touch sensitive surface where provided. The I/O bridge also communicates with an optional motion input unit 400 comprising one or more micro electromechanical (MEMS) accelerometers and/or gyroscopes, to provide up to six axes of motion input (x, y and z axis lateral movement and roll, pitch and yaw rotational movement). The I/O bridge also receives input from the physical controls (buttons and joysticks) shown in Figure 1A, optionally via an input control logic (not shown). Finally, the I/O bridge communicates with a bus 130, upon which various peripheral devices may be linked, including one or more wireless communication units 140, such as for example 3G, WiFi (such as IEEE 802.11b/g/n), and / or Bluetooth® units.
It will be appreciated that the CPU 100 may be a single core or multi core processor, such as the ARM® Cortex-A9 core (having 4 cores). Similarly, the RAM may be dynamic RAM or may comprise both dynamic RAM and static (flash) RAM units. Likewise, whilst the GPU typically uses dedicated VRAM, alternatively or in addition it may share common RAM with the CPU. Finally, it will be appreciated that the function of the surface input unit may be performed by the CPU itself. It will also be appreciated that whilst not shown in the figures for the purposes of clarity, the PED comprises an array of switches aligned with the buttons described previously, and also two joystick input mechanisms, each of which is able to provide input to the I/O bridge, optionally via an input control logic (not shown). Similarly not shown, the PED also comprises power distribution lines to various components and one or more sources of power, such as an input socket (for example a conventional DC power socket, or alternatively or in addition a USB socket, not shown). Such an input socket may also be used to charge one or more batteries (also not shown). Such batteries may be user removable or may be sealed in the device. Other components not shown include, for example, an optional microphone.
Referring now also to Figure 3, an embodiment of the PED may comprise one or more additional components, either integrated within the device or connectable to it. The additional components include, but are not limited to, the following.
a) A card reader 160 suitable for reading from and optionally writing to memory cards, such as the Sony ® Memory Stick ®, or alternatively legacy memory cards such as those used by the Sony ® Playstation 2 ® entertainment device. Such a reader may be integral to the PED or connect to the bus 130 via a USB port 180.
b) A universal media disk (UMD) reader 170 or other optical disk reader (such as DVD or Blu-Ray) for accessing media and/or game content stored thereon. Such a reader may be removably connectable to the bus 130 via a USB port 180 or proprietary connection.
c) A magnetometer 410 for determining compass direction, mounted integral to the PED either on the bus 130 or as part of the motion input unit 400. A gravity detector (not shown) may also be included to determine the direction of gravity, either as part of the magnetometer or as a separate component.
d) A third generation (3G) or other mobile telephony and/or mobile data communication module 150. In an embodiment, the module and aerial are integral to the PED, and optionally the aerial is shared with or otherwise coupled electromagnetically with other wireless units in the device for the purpose of transmission and reception. Alternatively the module may be removably connectable to the PED, for example via a USB port 180 or a Personal Computer Memory Card International Association (PCMCIA) slot (not shown).
e) A hard disk drive (HDD) 190 integral to the PED, providing bulk storage for audio / video media, downloaded games, and the like.
f) A GPS receiver 420. Again the GPS receiver may share an aerial with one or more other wireless units (such as WiFi) within the PED. Map information, where used, may be stored locally at the receiver, or in flash RAM of the PED, or on an HDD of the PED.
g) One or more video cameras 240, typically each comprising a charge coupled device (CCD) optical sensor and suitable optics for imaging onto the CCD. The resolution of the CCD may for example be 640x480 pixels, but may be any suitable resolution, such as for example 1920x1080 pixels (full HD). The effective resolution may vary with frame capture rate. In an embodiment the or each video camera is integral to the PED (for example with one mounted on each of the front and rear surfaces, so providing a forward facing camera and a rearward facing camera), but alternatively may be removably connectable to the bus 130 via a USB or proprietary connection.
An embodiment of the PED comprises two such video cameras 240 on one surface, thereby forming a stereoscopic pair.
In embodiments of the present invention, the PED will comprise at least one video camera.
In operation, the CPU accesses an operating system that is resident for example on a built-in ROM, flash RAM or a hard disk. The operating system co-ordinates operation of the various functions of the PED and presents a user interface to a user of the device. The user interface will typically comprise graphical outputs via the display and touch based inputs, but may also include audio outputs and/or motion-based inputs, and/or inputs from the various physical controls of the device.
The touch based inputs to the PED can be peculiar to the arrangement of a display on the front of the PED and a correspondingly positioned touch sensitive surface (or 'panel') on the rear of the PED. This allows the user to treat the rear panel as a proxy for the display (in other words, address actions and inputs to the rear touch panel as if to the display, and/or point on the panel in order to point to the display). Thus for example, the user can point to icons or other displayed features from apparently underneath the display by touching the rear touch panel at the corresponding position.
It will be appreciated that unlike a laptop touch panel, the rear touch panel has a substantially 1:1 scale relationship with the screen, thereby not just enabling motion of a mouse pointer on screen that corresponds to motion of touch on the panel (for example), but furthermore also enabling direct placement of such a mouse pointer on the screen at the position corresponding to the touch on the panel, because as noted above the rear touch panel can be understood to represent the screen (i.e. act as a proxy).
Notably, because of the relative orientation of the display and the rear touch panel, left-to-right mapping across the rear touch panel is therefore reversed to correspond to left-right mapping as seen from the front, so as to allow pointing to the appropriate position on the display. Optionally this reversal is switchable depending on the orientation of the device as detected by the motion input unit, and/or according to what peripheral devices are connected: for example if the PED were connected to a television and then held display-down for use, the left-to-right mapping of the touch panel input may not be reversed.
Use of the rear touch panel as a proxy for the display advantageously allows interaction with the graphical output of the device without the user's hand or fingers obscuring the display or marking the display window.
In addition, the subjective experience of controlling the displayed interface from behind or underneath the screen allows for new modes of user interaction; for example selection, highlighting or magnification of a screen element may be achieved by a user pushing the element 'toward' them from behind the device. For a capacitance based touch panel, an increase in pressure on the rear panel (i.e. a push) can be detected by a flattening of the user's finger, which results in a larger covered area and hence more points of intersection in the panel having reduced capacitance. Conversely a reduction in pressure reduces the number of intersection points where touch is detected.
In conjunction with the similar but transparent front touch sensitive surface overlaid on the display, further modes of interaction become possible. For example, objects may be selected by being pinched between thumb and forefinger, with the thumb and forefinger touching the front and back touch panels respectively. The object may then be moved around, and, for example, activated by using a squeezing action between thumb and forefinger.
Further modes of interaction rely on the correspondence between position and / or motion of the user's fingers on the two touch panels. For example in a video playback application, stroking a finger across only the top touch panel may be interpreted as a fast-forward or rewind command (depending on direction), whilst a pinch hold followed by corresponding movement left or right of both fingers may be interpreted as selection of a specific point in playback (i.e. where the total playback time is scaled to the width of the touch panels). By contrast, however, a pinch hold followed by both fingers moving in opposite directions to each other may be interpreted as a twisting action, and adjusts a virtual volume dial.
A similar grammar of interaction can be used for example for document or e-book navigation, with scrolling, page selection and zoom replacing the above playback functions.
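As a purely illustrative sketch of the push detection described above (not the device's actual firmware), the contact area can be approximated by counting intersections whose capacitance change exceeds a touch threshold, and a push or release inferred from the change in that count between scans; the function names, threshold and hysteresis values are assumptions.

```python
import numpy as np

def contact_area(delta_c, touch_threshold=0.2):
    """Number of intersections currently registering a touch."""
    return int(np.count_nonzero(delta_c > touch_threshold))

def classify_pressure(prev_area, curr_area, hysteresis=3):
    """Infer a push (finger flattening) or a release from the change in
    contact area between two successive scans of the rear panel."""
    if curr_area > prev_area + hysteresis:
        return "push"      # increased pressure: more intersections covered
    if curr_area < prev_area - hysteresis:
        return "release"   # reduced pressure: fewer intersections covered
    return "hold"
```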
Turning now to Figure 4, in an embodiment of the present invention a user of the PED may operate it outdoors in a predetermined AR mode, and capture the depicted scene using the video camera. The scene 500 shows a road 510, having a luminance of for example 10% (on an arbitrary scale). To the left a first building 520 has a shaded face 522 with a luminance of 40% and a sunlit face with a luminance of 80%.
Meanwhile a second building 530 has a shaded face 532 with a luminance of 40% and a shadowed face with a luminance of 25%. A third building 540 at the end of the road only has a shaded face visible, with a luminance of 40%. Meanwhile, the sky 550 has a luminance of 70%.
In an embodiment of the present invention, the PED segments the sky within the image.
One or more of the following techniques may be utilised to do this.
Firstly, the sky may be segmented on the basis of luminance in the image.
Notably however, in the above example this would not work fully because the illuminated face 522 of the building 520 is brighter than the sky (for example because it is catching sunlight).
However, it will be appreciated that for a colour video camera with, for example, red, green and blue colour channels, three luminance values (one for each colour channel) may be obtained at each pixel in the image, and the colour of the sky may be different to that of the illuminated face of the building. Hence the sky may be segmented on the basis of luminance and/or colour.
However, the sky itself can be a number of different colours; varying shades of blue, varying shades of grey/white, and occasionally other colours such as orange, pink and yellow. Hence it may be difficult to segment the sky from the illuminated face of the building without additional information.
A location dependent weather report obtained wirelessly may provide distinguishing information (for example indicating sunny blue skies or heavy cloud) but generally these are too unreliable on a moment-to-moment basis to be a guide to the current colour of the sky.
However, the PED itself comprises gyroscopes and optionally other sensors, and hence can detect its own orientation and hence also which way is up in a captured picture. Assuming the user is outside, then an area of an image immediately above the user could be assumed to be a good sample of current sky conditions.
In practice, the user will probably not point the device directly upwards initially, and so an approximation of a likely initial position for the sky may be made.
Hence in an embodiment of the present invention, the device selects N% of the image closest to the area immediately above the user as a sample of the current sky conditions. The device is able to measure its own angle of inclination and stores the angle of field of view of the camera, and hence the effective angle to the horizontal of the line of sight to any pixel in the image can be calculated. If the camera of the PED is pointing straight up, then the pixels in the centre of the image are viewing at 90° to the horizontal. By contrast, as the angle of the device tips downward, then there comes a point where the angle of the device and the angle of the field of view in combination reaches 90° to the horizontal. At this point, only the pixels at the very top of the image still capture the area directly above the user.
Hence immediately prior to this point and thereafter, the top N% of the image may be taken as the best available approximation of the current sky conditions, where N is an empirically determined percentage.
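The selection of this sample region can be illustrated with a short sketch. It assumes a simple pinhole camera with a known vertical field of view, uses the device's measured inclination to find which image rows still view (near) directly overhead, and otherwise falls back to the top N% of the image; the function name, parameters and the 85° margin are illustrative only.

```python
import numpy as np

def sky_sample_rows(image_height, pitch_deg, vfov_deg, n_percent=10):
    """Return the (start, end) rows of the image to sample for sky colour.

    image_height : number of pixel rows in the captured frame.
    pitch_deg    : elevation of the camera's optical axis above the
                   horizontal, from the device's motion sensors.
    vfov_deg     : vertical field of view of the camera.
    n_percent    : size of the fallback sample as a percentage of the
                   image (the empirically determined N described above).
    """
    # Elevation angle of the line of sight for each pixel row
    # (row 0 is the top of the image).
    row_offsets = np.linspace(vfov_deg / 2.0, -vfov_deg / 2.0, image_height)
    elevations = pitch_deg + row_offsets

    # Rows still viewing (near) straight up, i.e. the area above the user.
    overhead = np.where(elevations >= 85.0)[0]   # 85 deg margin, assumed
    if overhead.size > 0:
        return int(overhead[0]), int(overhead[-1]) + 1

    # Otherwise take the top N% of the image as the best approximation.
    return 0, max(1, int(image_height * n_percent / 100.0))
```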
Referring to Figure 5 (which in other respects is identical to Figure 4), here the N% sample of the image 560 comprises a small section of the left-hand building and a section of sky.
Within this sample, the PED may use a majority-vote scheme to determine what parts of the sample correspond to the sky. This vote may in addition be weighted to favour the central region of this sample.
The majority vote may make use of the majority colour balance in the sample, and/or reference to known sky colours (for example weighting in favour of pixels within a tolerance of these colours).
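One possible realisation of this weighted majority vote is sketched below; the reference sky palette, weighting scheme, colour tolerance and quantisation step are assumptions rather than values taken from the embodiment. The sample pixels are coarsely quantised, each pixel is weighted by its proximity to the centre of the sample and its closeness to known sky colours, and the dominant colour bin is taken as the sky colour estimate.

```python
import numpy as np

# Illustrative reference palette of typical sky colours (RGB, 0-255).
KNOWN_SKY_COLOURS = np.array([[135, 206, 235],   # light blue
                              [ 70, 130, 180],   # deeper blue
                              [200, 200, 200],   # grey cloud
                              [255, 255, 255]])  # white cloud

def estimate_sky_colour(sample, tolerance=60.0):
    """sample: H x W x 3 RGB array of the N% region described above.

    Returns the weighted-majority colour judged to represent the sky.
    """
    h, w, _ = sample.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Weight in favour of the central region of the sample.
    centre_dist = np.hypot(ys - h / 2.0, xs - w / 2.0)
    weights = (1.0 / (1.0 + centre_dist)).reshape(-1)

    pixels = sample.reshape(-1, 3).astype(float)

    # Weight in favour of pixels within a tolerance of known sky colours.
    dists = np.linalg.norm(pixels[:, None, :] - KNOWN_SKY_COLOURS[None, :, :],
                           axis=2).min(axis=1)
    weights = weights * np.where(dists < tolerance, 2.0, 1.0)

    # Majority vote over coarsely quantised colours (32-level bins).
    bins = (pixels // 32).astype(int)
    keys = bins[:, 0] * 64 + bins[:, 1] * 8 + bins[:, 2]
    votes = np.bincount(keys, weights=weights)
    winner = int(votes.argmax())
    # Return the centre of the winning colour bin.
    return np.array([winner // 64, (winner // 8) % 8, winner % 8]) * 32 + 16
```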
Other techniques to evaluate this sample of the sky (or alternatively some or all of the image) include discounting sample pixels within a certain distance of an image discontinuity (i.e. a net change in the image, such as a sum-squared change between adjacent pixels in one or more colour channels). Such high-frequency image artefacts are likely to be more common in image areas comprising buildings and similar structures than in the sky itself. In particular, any region of the sample containing a threshold number or density of such high frequency artefacts may be actively discounted as a candidate for the sky.
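A minimal sketch of this discounting, using NumPy and SciPy, is given below; the sum-squared gradient follows the description above, while the gradient threshold, window size and density threshold are assumed values.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def high_frequency_mask(image, grad_threshold=900.0,
                        window=15, density_threshold=0.2):
    """Mark pixels lying in regions dense with image discontinuities.

    image : H x W x 3 float RGB image.
    Returns a boolean H x W array; True means the pixel should be
    discounted as a sky candidate (likely a building or similar structure).
    """
    # Sum-squared change between horizontally and vertically adjacent pixels.
    dx = np.zeros(image.shape[:2])
    dy = np.zeros(image.shape[:2])
    dx[:, 1:] = ((image[:, 1:] - image[:, :-1]) ** 2).sum(axis=2)
    dy[1:, :] = ((image[1:] - image[:-1]) ** 2).sum(axis=2)
    edges = (dx + dy) > grad_threshold

    # Density of such discontinuities in a window around each pixel;
    # regions exceeding the threshold density are discounted.
    density = uniform_filter(edges.astype(float), size=window)
    return density > density_threshold
```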
Alternatively or in addition, in an embodiment of the present invention the PED is able to determine what elements of an image are below the horizon. For example the lower corners and edges of the buildings 520 occur below the horizon (as calculated from the PED's orientation and the field of view of the camera) and hence cannot form part of the sky. Consequently, projecting up from these edges, pixels of a similar colour can be assumed to belong to the building and not the sky.
In a similar manner, the PED may generate a sample colour palette for pixels below the horizon and discount pixels within the image as a whole that are within a tolerance of these colours, or within the same tonal group.
Alternatively or in addition, in an embodiment of the present invention the PED is operable to use GPS signals and optionally compass / magnetometer signals to determine where in the image the view is likely to lie along a road, and hence is likely to show sky directly above the road. This region may then be sampled, either directly or by weighting a majority vote or similar evaluation scheme for the N% sample described previously.
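As a sketch of this below-horizon palette approach (the palette sampling strategy, palette size and colour tolerance are assumptions), pixels anywhere in the image whose colour falls within a tolerance of colours sampled from below the calculated horizon row may be discounted as sky candidates.

```python
import numpy as np

def below_horizon_discount(image, horizon_row, tolerance=40.0,
                           max_palette=64):
    """Discount pixels whose colour matches the region below the horizon.

    image       : H x W x 3 float RGB image.
    horizon_row : image row corresponding to the horizon, calculated from
                  the device orientation and camera field of view.
    Returns a boolean mask; True means the pixel matches a below-horizon
    colour and so is unlikely to be sky.
    """
    below = image[horizon_row:].reshape(-1, 3)
    if below.size == 0:
        return np.zeros(image.shape[:2], dtype=bool)

    # Build a small sample palette from the below-horizon pixels.
    step = max(1, len(below) // max_palette)
    palette = below[::step][:max_palette]

    # Minimum distance from each image pixel to the below-horizon palette,
    # accumulated one palette colour at a time to keep memory use modest.
    pixels = image.reshape(-1, 3)
    min_dist = np.full(len(pixels), np.inf)
    for colour in palette:
        min_dist = np.minimum(min_dist,
                              np.linalg.norm(pixels - colour, axis=1))
    return (min_dist < tolerance).reshape(image.shape[:2])
```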
It will be appreciated that the image used to detect the sky may be a copy of the video image intended for augmentation, and may for example be subjected to initial image processing before segmentation.
Example processing includes so-called stretching of the luminance, so that mid-tones and shadows are made darker whilst highlights are made lighter. The mid-point (i.e. the brightness level that does not change in this process) may be an empirically determined brightness level, such as for example 10% less than the brightness level of the sky detected during the last five uses of the sky segmentation system.
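The luminance stretch about such a mid-point could take the following form; this is a sketch only, and the gain value and the source of the mid-point are assumptions consistent with the description above.

```python
import numpy as np

def stretch_luminance(image, mid_point, gain=1.5):
    """Darken mid-tones and shadows and lighten highlights about mid_point.

    image     : H x W x 3 float RGB image with values in [0, 255].
    mid_point : brightness level left unchanged by the stretch, e.g. 10%
                below the sky brightness found in recent uses of the
                segmentation system.
    """
    stretched = mid_point + (image - mid_point) * gain
    return np.clip(stretched, 0.0, 255.0)
```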
Other processing examples include reducing the image to a reduced-colour palette. This palette may be calculated in response to the captured image, or may be pre-set and comprise a palette of, for example, 32, 64, 128 or 256 colours, including a range of colours commonly associated with buildings and a range of colours commonly associated with the sky. This reduced colour depth in turn reduces the subsequent computational load on the PED when analysing the image, emphasises high-frequency image artefacts, and makes tracking building edges simpler.
Hence for a single image, the PED may use a calculated estimate of the direction of view of pixels within the image to find the best approximation to a region of sky directly above the user, and from that sample region attempt to discount buildings by virtue of their luminosity, colour, comparatively high-frequency spatial structure and/or their continuity with the ground. The remaining candidate sky pixels may then be analysed to identify the mix of colours present and their respective brightness values, to form a set of candidate valid sky colours and brightnesses.
The PED may then search outward from within the N% sample region (or some predetermined initial region, or a region best matching the candidate sky palette), identifying a contiguous region of the image that falls within a predetermined threshold tolerance of the set of candidate valid sky colours and brightnesses.
In addition to the above techniques for a single image, it will be understood that the camera will be generating images at a typical frame rate of 25 or 30 frames per second. Consequently historical information about where the sky is (from a previously estimated image or images) can be used as a first approximation, in conjunction with the detected change in orientation and position of the PED between frames.
Hence the single image techniques described above may be used to initialise or bootstrap the system, which may then subsequently use historical information to approximate the position of the sky in the current image, and use valid colour and brightness data from the preceding image to re-estimate the extent of the sky in the present image. Because the PED can calculate any change in position within the image due to a change in position and orientation of the PED, and because for a frame rate in the order of 25 to 30 frames per second the sky can be assumed to have a static pattern, the sets of valid colour and brightness values for use in evaluating the current sky image can be made local, deriving from corresponding areas of sky in the preceding image. This allows for local but strong variation in colour to be accommodated, such as white or pink clouds in a blue sky.
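Such a palette reduction might be sketched as a nearest-colour quantisation; the function name is illustrative, and in practice the palette may be pre-set or derived from the captured image as described above.

```python
import numpy as np

def quantise_to_palette(image, palette):
    """Map each pixel to the nearest colour in a reduced palette.

    image   : H x W x 3 float RGB image.
    palette : P x 3 array of, for example, 32, 64, 128 or 256 colours.
    Returns an image of the same shape using only palette colours.
    """
    palette = np.asarray(palette, dtype=float)
    pixels = image.reshape(-1, 3)
    # Nearest palette entry for each pixel, computed one palette colour
    # at a time to keep memory use modest.
    best = np.zeros(len(pixels), dtype=int)
    best_dist = np.full(len(pixels), np.inf)
    for i, colour in enumerate(palette):
        d = np.linalg.norm(pixels - colour, axis=1)
        closer = d < best_dist
        best[closer] = i
        best_dist[closer] = d[closer]
    return palette[best].reshape(image.shape)
```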
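This outward search can be illustrated as a simple flood fill from the sample region, accepting neighbouring pixels whose colour and brightness fall within a tolerance of the candidate sky set; this is a sketch only, with the seed selection, tolerance and function name assumed.

```python
import numpy as np
from collections import deque

def grow_sky_region(image, seed_points, sky_colours, tolerance=45.0):
    """Flood-fill outward from seed pixels to find the contiguous sky area.

    image       : H x W x 3 float RGB image.
    seed_points : iterable of (row, col) pixels inside the sample region.
    sky_colours : K x 3 array of candidate valid sky colours/brightnesses.
    Returns a boolean H x W mask of pixels classified as sky.
    """
    h, w, _ = image.shape
    sky_colours = np.asarray(sky_colours, dtype=float)

    def is_sky(r, c):
        # Within tolerance of any candidate valid sky colour.
        return np.linalg.norm(sky_colours - image[r, c], axis=1).min() < tolerance

    mask = np.zeros((h, w), dtype=bool)
    queue = deque(p for p in seed_points if is_sky(*p))
    for r, c in queue:
        mask[r, c] = True
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < h and 0 <= nc < w and not mask[nr, nc] and is_sky(nr, nc):
                mask[nr, nc] = True
                queue.append((nr, nc))
    return mask
```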
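By way of illustration of how the previous frame's result might seed the current frame, the sketch below shifts the preceding sky mask by the camera rotation detected between frames; it assumes a pure rotation, small angles and a simple sign convention, all of which are assumptions rather than details of the embodiment.

```python
import numpy as np

def propagate_sky_mask(prev_mask, d_yaw_deg, d_pitch_deg, hfov_deg, vfov_deg):
    """Shift the previous frame's sky mask by the inter-frame camera
    rotation, as a first approximation for the current frame.

    prev_mask : boolean H x W sky mask from the preceding image.
    d_yaw_deg, d_pitch_deg : change in camera orientation between frames,
                             from the motion input unit.
    hfov_deg, vfov_deg     : camera fields of view.
    """
    h, w = prev_mask.shape
    # Approximate pixel shift for a small rotation (sign convention and
    # linear approximation are assumptions).
    dx = int(round(d_yaw_deg / hfov_deg * w))
    dy = int(round(d_pitch_deg / vfov_deg * h))
    shifted = np.zeros_like(prev_mask)
    src = prev_mask[max(0, -dy):h - max(0, dy), max(0, -dx):w - max(0, dx)]
    shifted[max(0, dy):max(0, dy) + src.shape[0],
            max(0, dx):max(0, dx) + src.shape[1]] = src
    return shifted
```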
Alternatively or in addition, for a sequence of images, parallax information may serve to distinguish foreground buildings from the background sky, although it should be noted that in windy conditions moving clouds may appear to have a false parallax.
Using one or more of the techniques described above, the PED is operable to identify those pixels of the image likely to correspond to sky, and is operable to segment these pixels within the original video image, for example by defining a mask for this image corresponding to the region of sky.
In an embodiment of the present invention, the captured image is augmented as follows.
In an embodiment of the present invention, the video is augmented to display objects flying high in the sky. The objects may for example be silhouettes of spaceships. To realistically augment the image, the PED can treat the sky as existing at a predetermined distance - for example 5000 metres - and position the objects upon a horizontal plane at that vertical distance from the position of the PED itself. The objects are only shown on the augmented image within the masked area obtained as described previously.
More generally, the PED can select a horizontal plane for the object or objects in the range 200-10,000 metres above the locus of the PED, and more preferably in the range 500-5,000 metres, and more preferably still in the range 1,000-3,000 metres.
It will be appreciated that whilst objects may be positioned on the horizontal plane at the selected height as this allows rows of objects to draw the eye to the vanishing point on the horizon, not all objects need to be positioned at the same height and objects need not remain at the same height over successive video frames.
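The positioning of an object on a horizontal plane at a chosen height, and its projection into the captured image, might be sketched as follows. This is a simple pinhole projection under assumed camera intrinsics and orientation conventions, not the device's actual rendering pipeline; the function name and coordinate layout are illustrative.

```python
import numpy as np

def project_object(obj_pos, cam_pitch_deg, cam_yaw_deg,
                   image_size, hfov_deg, vfov_deg):
    """Project a point on the sky plane into image coordinates.

    obj_pos    : (east, up, north) position of the object in metres,
                 relative to the PED, e.g. up = 2000 for a plane at 2 km.
    cam_*_deg  : current camera orientation from the motion sensors.
    image_size : (width, height) of the video frame in pixels.
    Returns (x, y) pixel coordinates, or None if behind the camera.
    """
    e, u, n = obj_pos
    yaw, pitch = np.radians(cam_yaw_deg), np.radians(cam_pitch_deg)

    # Rotate world coordinates into the camera frame (yaw about the
    # vertical axis, then pitch about the camera's lateral axis).
    x1 = e * np.cos(yaw) - n * np.sin(yaw)
    z1 = e * np.sin(yaw) + n * np.cos(yaw)
    y2 = u * np.cos(pitch) - z1 * np.sin(pitch)
    z2 = u * np.sin(pitch) + z1 * np.cos(pitch)
    if z2 <= 0:
        return None                      # behind the camera

    w, h = image_size
    fx = (w / 2.0) / np.tan(np.radians(hfov_deg) / 2.0)
    fy = (h / 2.0) / np.tan(np.radians(vfov_deg) / 2.0)
    x = w / 2.0 + fx * (x1 / z2)
    y = h / 2.0 - fy * (y2 / z2)
    return x, y

# The object is then drawn only where this position falls within the
# sky mask obtained previously.
```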
Certain graphical effects may be provided to enhance the sense of height and/or scale of the objects. For example, in a blue sky, the outline of white clouds within the blue background may be detected using conventional methods, and an alpha channel may be created in which the alpha or transparency value gradually changes from transparent to opaque within an M-pixel internal border of the cloud, so that the flying objects seem to disappear into or behind the cloud and re-emerge on the other side.
Where the sky is predominantly white, the alpha channel may be used to make the objects partially obscured across this white region, and optionally the transparency of the alpha channel may be made inversely proportional to the luminance of the sky - in this way as the objects appear to pass through the cloud, they will be more visible 'behind' wispy or light cloud and less visible 'behind' grey or heavy cloud. This gives the impression that the objects are genuinely behind the clouds and hence high up and very large.
Hence more generally, the alpha channel may be made proportional to brightness for a colour range primarily ranging from white to dark grey, with the minimum transparency being proportional to the proportion of such colour in the sky, so that where clouds are isolated, the objects are obscured, but where clouds are widespread, the objects are visible but partially mixed with the cloud layer. This automatically provides the sense that the objects are behind the clouds whilst ensuring that the objects remain visible in heavy cloud cover.
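The mapping from cloud luminance to object opacity can be sketched per pixel as below; the near-neutral colour test follows the white/grey/black colour-balance criterion described later, while the tolerance and the simple luminance measure are assumed.

```python
import numpy as np

def object_alpha(sky_pixel, neutral_tolerance=0.15):
    """Alpha (opacity) for an object pixel drawn over the given sky pixel.

    sky_pixel : (r, g, b) values in [0, 1] of the underlying sky.
    Where the sky colour is near-neutral (white/grey cloud), the object's
    opacity is made proportional to the cloud luminance, so the object is
    more visible 'behind' light cloud and less visible behind dark cloud.
    Elsewhere (e.g. clear blue sky) the object is drawn fully opaque.
    """
    r, g, b = sky_pixel
    luminance = (r + g + b) / 3.0
    mean = luminance if luminance > 0 else 1.0
    is_neutral = max(abs(r - mean), abs(g - mean), abs(b - mean)) < neutral_tolerance
    if not is_neutral:
        return 1.0                  # clear sky: object fully opaque
    # Lighter cloud gives a more visible object; darker cloud obscures it.
    return float(np.clip(luminance, 0.0, 1.0))
```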
Similarly, optionally any small object (such as a bird or plane) surrounded by sky may automatically not be masked (because of its different colour) and hence appear to be below the objects. Alternatively, if such objects are included in the sky mask (for example due to how a sky search phase handles noise), then these objects may also be treated as opaque using the alpha channel.
The objects themselves may move according to arbitrary routes or rules, or may follow routes or rules relevant to a particular game.
Alternatively or in addition, objects may move with reference to GPS and map information so as to follow above roads or similar prominent features -for example circling a peak as detected from a relief map, or hovering above a geo-tagged target as part of a game.
Hence for example in Figure 6 (which in other respects is identical to Figure 4), augmented image 500' shows a procession of spaceships (570) travelling parallel to the road down which the user is looking, based upon GPS and map information. Again, once initially established, frame-by-frame updates may be based upon the local motion sensing of the PED. In another example, the spaceships may be travelling transverse to the road. The intention is to make the spaceships appear in some manner purposeful by clearly relating their movement to identifiable features of the environment.
The present invention is not however limited to spaceships or other flying vehicles, creatures or people, or to objects in general. For example, the PED may notionally position a status panel in the sky above the user. Again the panel may appear huge and may be partially occluded by foreground objects such as buildings and the like, as with the spaceships. Again it may also have cloud-based variable transparency, although in this case optionally the minimum transparency is capped so that the information on the status panel remains legible.
In the context of a game, this status panel may indicate progress in a game as judged by a God-like figure, or other information such as active quests, health and the like. Alternatively for example in a geo-caching game that uses a city as a maze, the status panel may contain instructions, scores or a timer, as if from a laboratory technician presiding over a rat race.
In this way, status information is readily accessible in a known location relative to the user that exists within the augmented game space, but does not clutter the screen during normal play; thereby improving the opportunities and scope to augment the street-level regions of the video in whatever manner a game director sees fit.
Still more generally, system status messages may be viewed in this manner; for example if an SMS, email or instant messaging text is received during game play (or during use of the device for navigation or advertising purposes as described previously, or any other AR or video capture purpose), then pointing the device upwards may allow notification of such events and/or reading of such messages within the status panel. Other messages, such as diary reminders, the time, etc., may similarly be displayed in this non-intrusive manner.
The advantage of this approach is that the sky is, in effect, the background surface in the AR world, and so any localised AR activity relating to individual landmarks, objects, people or other markers within the captured image will reliably occur in front of it and not be occluded by this permanent or semi-permanent presence.
Whilst keeping the status panel static above the PED / user over time may be preferable, in principle there is no reason why the status panel (or a succession of panels for different aspects of the game / device) could not similarly be animated to fly in the sky in a manner similar to the spaceships described previously.
It will be appreciated that, depending on the application, the graphical illusion of the augmented reality image is dependent upon sky being visible, and in some embodiments assumes that the device is being used outside (for example where the sky is sampled from a region directly above the locus of the device).
Hence in an embodiment of the present invention, a GPS signal is used to detect the location of the user, and the above technique is disabled when the user is detected to be indoors based upon their GPS location.
Optionally in such circumstances certain aspects of the invention may persist -for example in order to maintain a consistent user interface, pointing the video camera of the PED to the ceiling may still reveal a status panel, but now rendered at a notional 1 metre above the locus of the device (hence whilst the panel may still occupy a similar amount of screen space, its behaviour with respect to parallax etc. will be consistent with it being in the room close to the ceiling). In this case, the status panel is simply displayed responsive to the local orientation of the device and is not masked based upon any detection of the sky.
Alternatively, when the PED is inside a building, the PED may attempt to identify windows. Windows are relatively distinctive as bright rectilinear objects within a scene. Hence the device may operate as described previously, with the added restriction imposed by the existence of the window. In particular, in this mode if sky sampling is used then it will treat the top N% of the window as the sky sample rather than the top N% of the image where these are not the same. Hence for example, in Figure 7 (in which the central portion of the image is again identical to Figure 4) the augmented image 500'' also comprises the frame of the window through which the sky can be seen.
Hence, in a summary embodiment of the present invention as described herein, an entertainment device or PED 10 such as the Sony Playstation Vita comprises a video camera 240 operable to capture a video image; an image analysis processor (such as CPU 100) operable to classify pixels of the video image corresponding to sky, and to generate a mask corresponding to those classified pixels; an augmented reality processor (again such as CPU 100) operable to position a virtual object on a horizontal plane at a predetermined height above the entertainment device; and a rendering processor (such as GPU 220, optionally in combination with CPU 100) operable to render the virtual object on the captured video image within the extent of the mask.
In an instance of the summary embodiment, the entertainment device comprises one or more motion sensors (for example in motion input unit 400), and the augmented reality processor successively positions the virtual object responsive to changes in viewpoint of the video camera, as detected by the one or more motion sensors. In this way the object or objects appear to be static, or to move, with respect to an absolute frame of reference (i.e. like the real world in the video image) and not with respect to motion of the PED.
In an instance of the summary embodiment, the rendering processor renders pixels of the object with a transparency responsive to cloud density at the respective pixel position as calculated by the image analysis processor. As noted above, cloud density may be based upon the luminance of sky pixels within a colour range centred on the white/grey/black colour balance. In other words, where the R:G:B ratio of a pixel is within a predetermined tolerance of unity, then the luminance of the pixel is used to determine the transparency of the object, such that the darker the cloud is, the more transparent the object is (or conversely the more opaque the cloud is).
In an instance of the summary embodiment, the object is one of a flying machine such as a UFO, a creature such as a dragon or gryphon, or a person such as a superhero, witch or genie.
In an instance of the summary embodiment, the object is a status panel. Such a panel may be rendered to have a physical presence (i.e. resembling a granite tablet, a sheet of paper or a pane of glass) but equally may simply be a notional region in which displayed information is placed.
In such an instance, the displayed information may be one or more of a game status, a system status, and a received message, as described previously.
In an instance of the summary embodiment, the PED comprises a global positioning system receiver (420) operable to identify a map position (for example with reference to a locally stored map or by reference to an internet based service), and the augmented reality processor sequentially positions the virtual object on the horizontal plane responsive to a feature of the map, thereby animating the virtual object in a manner responsive to that feature. As noted previously, examples include moving parallel to a road (e.g. the augmented reality processor animates the virtual object in vertical alignment with a road most closely approaching the current viewpoint in the captured video image), or around a peak in the landscape.
In an instance of the summary embodiment, the image analysis processor is operable to classify pixels of the video image corresponding to sky responsive to one or more of their luminance levels and their colour.
In an instance of the summary embodiment, the image analysis processor is operable to classify pixels of the video image corresponding to sky responsive to one or more of the spatial frequency within a predetermined area surrounding a pixel, colour continuity with a region of the image calculated to be below the horizon in the image, and colour values within a tolerance of colours selected from a region of the image calculated to be below the horizon, as described previously.
Finally, in an instance of the summary embodiment, the image analysis processor is operable to classify pixels of the video image corresponding to sky responsive to the sky classification for the preceding captured video image.
Referring now to Figure 8, a method of video image augmentation comprises:
In a first step s10, capturing a video image;
In a second step s20, classifying pixels of the video image corresponding to sky;
In a third step s30, generating a mask corresponding to those classified pixels;
In a fourth step s40, positioning a virtual object on a horizontal plane at a predetermined height above the locus of the video image; and
In a fifth step s50, rendering the virtual object on the captured video image within the extent of the mask.
It will be apparent to a person skilled in the art that variations in the above method corresponding to operation of the various embodiments of the apparatus as described and claimed herein are considered within the scope of the present invention, including but not limited to:
-sensing motion of a PED, and successively positioning the virtual object responsive to changes in viewpoint of a video camera used to capture the video image;
-rendering pixels of the object with a transparency responsive to cloud density at the respective pixel position;
-the object being one of a flying machine, a creature, and a person;
-the object being a status panel, optionally displaying one or more of a game status, a system status, and a received message;
-the steps of obtaining a map position from a global positioning system receiver and animating the virtual object on the horizontal plane responsive to a feature of the map, such as, for example, animating the virtual object in vertical alignment with a road most closely approaching the current viewpoint in the captured video image;
-classifying pixels of the video image corresponding to sky responsive to one or more of their luminance levels and their colour;
-classifying pixels of the video image corresponding to sky responsive to one or more of spatial frequency within a predetermined area surrounding a pixel, colour continuity with a region of the image calculated to be below the horizon, and colour values within a tolerance of colours selected from a region of the image calculated to be below the horizon; and
-classifying pixels of the video image corresponding to sky responsive to the sky classification for the preceding captured video image.
It will be appreciated that the methods disclosed herein may be carried out on conventional hardware suitably adapted as applicable by software instruction or by the inclusion or substitution of dedicated hardware.
Thus the required adaptation to existing parts of a conventional equivalent device may be implemented in the form of a non-transitory computer program product or similar object of manufacture comprising processor implementable instructions stored on a data carrier such as a floppy disk, optical disk, hard disk, PROM, RAM, flash memory or any combination of these or other storage media, or realised in hardware as an ASIC (application specific integrated circuit) or an FPGA (field programmable gate array) or other configurable circuit suitable to use in adapting the conventional equivalent device. Separately, if applicable the computer program may take the form of a transmission via data signals on a network such as an Ethernet, a wireless network, the Internet, or any combination of these or other networks.