WO2006025797A1

Movatterモバイル変換

Info

Publication number: WO2006025797A1
Application number: PCT/SG2005/000289
Authority: WO
Inventors: Siang Thia Goh; Yuen Khim Liow
Original assignee: Creative Technology Ltd
Priority date: 2004-09-01
Filing date: 2005-08-25
Publication date: 2006-03-09
Also published as: TW200609775A; CN1783073A

Abstract

A research system for researching digital media information, the system comprising: a conversion module to convert the digital media information into textual data; an extraction module to extract search terms from the textual data to search for a record in at least one database; and a presentation module to present at least one matching record resulting from the search to the user via the portable device.

Description

Title

A Search System

Technical Field

The invention concerns a search system for searching digital media information recorded by a portable device.

Background of the Invention

Consumer products that enable users to capture digital information are increasingly popular. These products include camera-enabled mobile phones, camera enabled portable computers, digital camera and digital video recorders.

These consumer products are highly effective at capturing digital information. However, user friendly applications to exploit and process this information are not commonly available.

Summary of the Invention

In a first preferred aspect, there is provided a search system for searching digital media information. The system comprises a conversion module to convert the recorded digital media information into textual data; an extraction module to extract search terms from the textual data to search for a record in at least one database; and a presentation module to present at least one matching record resulting from the search to the user via the portable device. Each record in the database may be mapped to at least one search term or a collection of search terms. A user interface may be provided to enable the user to browse through the matching records, and navigate via links to other related records in the database. A contribution module may be provided to enable a user to contribute additional recorded digital media information to the database.

Digital media information may include photos, video clips, or audio clips and may be recorded by a portable device. The portable device may be a mobile phone or a mobile computing device. The mobile phone or a mobile computing device may have an integrated digital camera to capture images or video clips. The mobile phone or a mobile computing device may have an integrated microphone to record audio clips.

An optical character recognition engine may be used to convert text-based information captured in graphical form in the photos or video clips into textual data.

An image recognition engine may be used to convert image information captured in photos or video clips into textual data. Image information may include people's faces, animals, famous landmarks, vehicles or other objects. Image information may also include sign language used by deaf people.

A voice recognition engine may be used to convert spoken words in an audio clip into textual data.

The database may be locally stored on the portable device. For example, if a tourist is travelling to Europe, a database storing European tourist information may be downloaded on to the portable device.

Alternatively, the database may be remotely stored on a server. The server may be accessed via the Internet through wireless communication. The portable device may comprise a communications module to communicate via the Internet.

The results may be presented to the user via a display of the portable device. The results may be presented to the user according to a user-defined format and presentation style. The results may be presented to the user as an audio delivery. The audio delivery may be a computer generated voice or a pre-recorded audio clip associated with the matching record.

More than one item of recorded digital media information may be used together in order to increase the accuracy of the search. For example, a photo of a bird and an audio recording of the bird's call are used together to identify the species of the bird. Consequently, a record of the bird is presented to the user via their portable device that contains the biological data of the bird such as migratory patterns, life span and habitat.

In a second aspect, there is provided a method for researching digital media information. The method comprises converting the recorded digital media information into textual data; extracting search terms from the textual data to search for a record in at least one database; and presenting at least one matching record resulting from the search to the user via the portable device.

Each record in the database may be mapped to at least one search term or a collection of search terms.

The method may further comprise an initial step of recording digital media information.

The method may further comprise the step of translating the textual data into another language.

If the digital media information is a photo or a video clip, the method may further comprise the step of focusing on a specific area of the photo or frame of the video clip to limit the scope of the search.

If the digital media information is an audio clip, the method may further comprise the step of focusing on a specific portion of the audio clip to limit the scope of the search.

The concentrating on a specific area may be by at least one of: zooming, framing, and select drag and drop. Motion vectors are used for a video clip.

The method may further comprise concentrating on a specific portion of the digital media and converting the specific portion into textual data. When the digital media is an audio track and a fingerprint of the audio is generated, searching may be by use of the fingerprint. Alternatively, the concentration on a specific portion may be by selecting a start and end of a portion of an audio track to form an audio segment, and searching by use of the audio segment.

Brief Description of the Drawings

An example of the invention will now be described with reference to the accompanying drawings, in which:

Figure 1 is a block diagram of the system;

Figure 2 is a perspective view of an embodiment of a device for use in the system; Figure 3 is a block diagram of part of the device of Figures 1 and 2; and Figure 4 is a process flow diagram of researching using the system.

Detailed Description of the Drawings

Figure 1 and the following discussion are intended to provide a brief, general description of a suitable computing environment in which the present invention may be implemented. Although not required, the invention will be described in the general context of computer-executable instructions, such as program modules, being executed by a personal computer. Generally, program modules include routines, programs, characters, components, data structures, ,that perform particular tasks or implement particular abstract data types. As those skilled in the art will appreciate, the invention may be practiced with other computer system configurations, including hand-held devices, multiprocessor systems, microprocessor-based or programmable consumer electronics, network PCs, minicomputers, mainframe computers, and the like. The invention may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote memory storage devices.

Referring to Figure 1 , there is a provided a research system 10 for researching digital media information. The digital information may be on a personal computer, laptop computer, notebook computer or portable device. It may have been captured or recorded by a device 20 such as, for example, a computer, portable computer, or portable device.. Portable devices 20 include mobile phones, PDAs, tablet computers, notebook computers, digital camera or video recorders with a computer processor. Preferably, the portable device 20 has multimedia capabilities such as a high-resolution display and audio functionality. Digital media information includes photos, video clips, or audio clips. Input to a device such as computer 20 may be by scanner, camera, web cam, touch screen, and so forth.

As shown in Figures 1 to 3, the device 20 may have a microphone 60 for capturing audio. An analog/digital converter 61 converts the captured audio from analog to digital. A microprocessor and/or digital signal processor passes the digital audio to non-volatile memory 69. Memory 69 may be a hard disk, removable disk, or flash memory. The device 20 may also have a lens 63 for capturing images and/or video by use of an imaging system 64 or may have a separate image-capturing device such as, for example, a web cam. The captured images and/or video are also processed by microprocessor 62 and stored in memory 69. The lens may be fixed or may have a motor driver for optical zooming.

A keypad/keyboard 65 and/or a joystick 71 may be used for user input. A display

66 displays results or likely results, and a loudspeaker 68 may output audio results. An amplifier 67 is used to amplify the output audio after conversion in convertor 61.

Microprocessor 62 may be used to control other functions of device 20 (not shown), and is also used to control and operate various modules 21 and engines 50.

The modules 21 of system 10 comprise a conversion module 22, an extraction module 23 and a presentation module 24. These modules 22, 23, 24 are stored on the portable device 20 as software. Alternatively, the modules are hard-wired as a dedicated chip. Preferably, the modules are written in Java to facilitate portability and implementation onto other Java-enabled devices 20. The conversion module 22 converts the recorded digital media information into textual data. Depending on the recorded digital information, the conversion module 22 operates with an engine 50 that may include an optical character recognition engine 25, image recognition engine 26, voice recognition engine 27, face recognition engine 28, and a music engine 29.

The optical character recognition engine 25 converts text-based information captured in graphical form in photos or video clips into textual data. The voice recognition engine 27 converts spoken words in an audio clip into textual data. The image recognition engine 26 converts image information captured in photos or video clips into textual data. Image information includes animals, famous landmarks, vehicles or other objects. Image information also includes sign language used by deaf people. Image recognition is performed by known techniques. Face recognition engine 28 is for recognizing face using facial recognition software such as Facelt supplied by Visionics Corporation. The music engine 29 converts the captured music into a MIDI file and may be categorized by, for example, song title, performer and performance. An application for determining a melody from a melody sung, hummed or the like may be used. Furthermore, other audio fingerprinting techniques could be used to generate data that is representative of the recorded audio. For example, US 6,453,252, the disclosure of which is incorporated herein by reference, discloses a technique in which a fingerprint of an audio signal is generated based on the energy content in frequency sub-banks. The resulting fingerprints can then be used to help identify the recorded audio. Alternatively, an "A/B" button may be pressed at the start and finish of a desired audio segment and the search conducted on the basis of that audio segment.

When recognizing animal or insect sounds, these may be processed using one or more of: the actual sound, phonetics or mnemonics.

Next, the textual data is passed to the extraction module 23 for extracting search terms used to search for a record in a database 30. The database 30 is a centralised network database accessible via the Internet and mobile phone network. Preferably, the portable device 20 comprises a communications module (70) to communicate with the database 30 via the Internet. Alternatively, a light- version of the system 10 has a local database 30 stored on the portable device 20 which eliminates the need for a communications module. The database 30 in such case may be in memory 69. After at least one matching record is found, the presentation module 24 presents the matching record to the user via a user interface (not shown). The user interface enables the user to browse through the matching records, and navigate via links to other related records in the database 30. Each record in the database 30 is mapped to at least one search term or a collection of search terms. Records in the database 30 which are related to each other or in similar categories are linked with one another.

In one embodiment, the system provides a portable language translator. For example, in a foreign country, a road sign is encountered. Using the system, the foreign road sign is translated into a language understood by the user. A navigation application is intuitively provided to the user such as an interactive map.

In another embodiment, the system 10 is a portable tour guide or object recognizer. For example, at a museum, a camera-enabled portable device is captures an image of an object. The object is searched against the database 30. If a match is found, a translation is obtained in a language understood by the user and more detailed information about the object is retrieved. Another example is to identify an object or animal such as a bird. The system 10 is able to identify the species by its physical characteristics and play a sample of the call of the bird and/or name the bird.

In a further embodiment, the system 10 is a portable multimedia 'Internet Browser'. The system is able to recognise a face and retrieve the associated information regarding this face. For example, if the picture is of Bill Gates, the system 10 retrieves the personal biographical data of Bill Gates and lists his personal achievements, hobbies, favourite movies, and so forth, and displays them on display 66 and/or by audio output using loudspeaker 68.

In yet another embodiment, the system 10 is a portable video-based sign language translator. The system 10 is able to translate sign language captured in video format into words or voice and vice versa. The capturing device may be a video camera, still camera or mobile-phone camera. Alternatively, a wearable visor with a miniature camera can be used to explore of objects in a scene and their associated database 30.

Referring to Figure 4, in a typical scenario, the system 10 operates by firstly capturing 40 an image, audio or other sensory input and displaying any image or images forming at least a part of the input. A specific section or segment of the captured digital information is targeted 41. The targeting focuses on a particular object in the entire image or a fragment of the audio recording. This may be by zooming and/or framing and/or select, drag and drop using, for example, motion vectors for moving video. This may be in accordance with MPEG4. The object is then detected and recognised. This may be by use of MPEG7. After object recognition, the digital information is converted 42 into textual information. For music files, it is digitized to enable searching. The conversion 42 is performed by the conversion module 21 together with any of the engines in engine 40. For example, if the object is a photo of the Sydney Harbour Bridge, image recognition engine 26 converts the image into the phrase "Sydney Harbour Bridge". This phrase is extracted as a keyword phrase or as search terms for searching 43 in database 30. Since keywords are mapped in the database 30 to data records related to the objects, searching 43 is able to yield useful data. After the search is complete, the results of the search are presented 44 to the user.

The user is able to use a drop down menu box and/or a keypad to enter information to exclude certain classifications of groups if they are not relevant. For example, the user selects a group named "bridges" so that roads or other structures are filtered out of the search. This isolates and refines the search by user interaction.

In one embodiment, the user makes the final determination as to what the object is, based on recursive logical elimination of irrelevant results on a group by group basis. To assist in restricting the context of the search, when the digital information is transmitted to the database 30 for searching, the time of the recording and also the location of the recording are also transmitted. The time of the recording is a time stamp on the media file stamped by the portable device 20. The location of the recording assumes that transmission of the digital information is almost instantaneous from the time the user captures the photo, video or audio recording. The location is identifiable by the cell location of the mobile phone 20 or GPS co- ordinates of the portable device 20.

Using the previous example, if the user is in the physical vicinity of the "Sydney Harbour Bridge", the Golden Gate Bridge or the Brooklyn Bridge are not presented as likely matches. To increase customization capability, the user is able to define their user interface and how the results are to be presented to them.

To enhance the accuracy of searching, multiple input including an auxiliary input, may be used. That is, photos and audio information are used together for searching. For example, a photo of a bird and an audio recording of the bird's call may be used to identify the species of the bird. Auxiliary input may include, for example, location, temperature, humidity, light level, and so forth. These may be obtained automatically from appropriate functionality within level 20.

User input may be by use of a contribution module such as, for example, a keypad/keyboard 66, or use of voice recognition technology. If there are many objects in an image, zooming and/or framing and/or drag and drop may be used to identify the object being searched. This may be by use of known zooming and framing technologies, a joystick, and touch screen technologies. In sound, the audio to be searched can be extracted from the recorded audio by extracting the required audio and/or suppression of background or surrounding signals.

If the database cannot locate a correct match, the received data may be stored for the creation of a new entry. They may be stored in a separate database until sufficient data is received to provide conclusive information, whereupon it can be moved to the database 30. An editor may make decisions in this regard.

Furthermore, distributed corresponding may be used for communication from several devices 20 to database 30. The data may be first sent to a distribution server for controlling the distribution of data, and the search functionality. Ultra wide band may be used for connectivity.

The data may be sent from device 20 to data 30 by SMS or MMS.

It will be appreciated by persons skilled in the art that numerous variations and/or modifications may be made to the invention as shown in the specific embodiments without departing from the scope or spirit of the invention as broadly described. The present embodiments are, therefore, to be considered in all respects illustrative and not restrictive.

Claims

WE CLAIM:

1. A research system for researching digital media information, the system comprising: a conversion module to convert the digital media information into textual data; an extraction module to extract search terms from the textual data to search for a record in at least one database; and a presentation module to present at least one matching record resulting from the search to the user via the portable device.

2. The system according to claim 1 , wherein each record in the database is mapped to at least one search term or a collection of search terms.

3. The system according to claim 1, further comprising a user interface to enable the user to browse through the matching records, and navigate via links to other related records in the database, and to input additional information.

4. The system according to claim 1 , further comprising a contribution module to enable a user to contribute additional recorded digital media information to the database.

5. The system according to claim 1 , wherein digital media information includes photos, video clips, or audio clips.

6. The system according to claim 5, wherein the digital media information is recorded by a portable device, the portable device being at least one of a mobile telephone or a mobile computing device.

7. The system according to claim 6, wherein the portable device has at least one selected from the group consisting of: an integrated digital camera to capture images, an integrated digital camera to capture video clips, and an integrated microphone to record audio clips.

8. The system according to claim 7, further comprising at least one engine selected from the group consisting of: an optical character recognition engine to convert text-based information captured in graphical form in the photos or video clips into textual data, an image recognition engine to convert image information into textual data, a voice recognition engine to convert spoken words in an audio clip into textual data, a face recognition engine, and a music engine.

9. The system according to claim 8, wherein image information includes animals, famous landmarks, vehicles and sign language used by deaf people.

10. The system of claim 8, wherein an auxiliary input is provided to reduce search time, the auxiliary input including an input selected from the group consisting of: location, temperature, humidity, light level, times and an environmental analysis.

11. The system of claim 10, wherein location is determined by one of cell location, and GPS co-ordinates.

12. The system as claimed in claim 8, wherein the music engine is for converting music into a MIDI file or a fingerprint of an audio signal.

13. The system according to claim 6, wherein the database is locally stored on the portable device.

14. The system according to claim 1 , wherein the database is remotely stored on a server.

15. The system according to claim 6, further comprising a communications module to enable communication between the portable device and the database.

16. The system according to claim 6, wherein the results are presented to the user via a display of the portable device.

17. The system according to claim 1 , wherein the results are presented to the user as an audio delivery.

18. The system according to claim 1 , wherein more than one item of recorded digital media information is used to increase the accuracy of the search.

19. A method for researching digital media information, the method comprising: converting the digital media information into textual data; extracting search terms from the textual data to search for a record in at least one database; and presenting at least one matching record resulting from the search to the user via the portable device.

20. The method as claimed in claim 19, wherein each record in the database is mapped to at least one search term or a collection of search terms.

21. The method according to claim 19, further comprising an initial step of recording digital media information.

22. The method according to claim 19, further comprising the step of translating the textual data into another language.

23. The method according to claim 19, wherein when the digital media information is a photograph or a video clip, the method further comprises concentrating on a specific area of the photograph or a frame of the video clip to limit the scope of the search, concentration being by at least one of: framing, and zooming.

24. The method according to claim 19, wherein if the digital media information is an audio clip, the method further comprising the step of concentrating on a specific portion of the audio clip to limit the scope of the search.

25. The method according to claim 19, wherein if the digital media information is a music file, the music file is converted to at least one of: a MIDI file or a fingerprint of an audio signal.

26. The method according to claim 19, wherein searching is restricted based data from an auxiliary input including at least one of: the location of the portable device, and/or the time the digital information was recorded, temperature, humidity, light level, an environmental analysis.

27. The method according to claim 19, wherein the digital media information is recorded by a portable device, the user interacting with the portable device to eliminate irrelevant records resulting from the search.

28. The method according to claim 27, wherein the user interaction is selection or de-selection of items of a dynamic list to indicate whether the digital information is related to those items.

29. The method according to claim 26, wherein location is determined by one of: cell location and GPS.

30. The method of according to claim 24, wherein concentrating on the specific portion is by one or more of: extracting the specific portion, and suppressing surrounding and background audio.

31. The method according to claim 26, wherein the auxiliary input includes a user input, the user input being by use of an interface, and includes inputs of phonetics and mnemonics.

32. The method according to claim 23, wherein the concentrating on a specific area is by at least one selected from the group consisting of: zooming, framing, and select drag and drop.

33. The method according to claim 33, wherein motion vectors are used for the video clip.

34. The method according to claim 19, further comprising concentrating on a specific portion of the digital media and converting the specific portion into textual data.

35. The method according to claim 34, wherein the digital media is an audio track and a fingerprint of the audio is generated, searching being by use of the fingerprint.

36. The method according to claim 34, wherein the concentration on a specific portion is by selecting a start and end of a portion of an audio track to form an audio segment, and searching by use of the audio segment.