FIELD OF THE INVENTION
This invention relates generally to systems and methods for sound recognition, and more particularly, to systems and methods for recognizing music, speech, and other sounds.
SUMMARY OF THE INVENTION
Systems, methods and media for sound recognition are provided herein. One object of the present technology is to recognize sounds. Sounds may include a song, a song clip, a song snippet, a singing or humming sound, voice, or any combination thereof. A further object of the present technology is to discriminate or classify sounds using an audio discriminator. Sounds may include music, speech and vocal sounds (such as humming and singing). A further object of the present technology is to receive and analyze a search query furnished by sound input via a unified search interface, where the sound input may be in one or more different formats (including but not limited to monophonic music, polyphonic music, speech, spoken words, a singing sound, a humming sound, any other type of sound that may be provided as input, or any combination thereof). Once the sound input is received, with the help of the audio discriminator, a server may transmit search results in response to the search query. Another object of the present technology is to search databases and furnish a user with information regarding one or more particular sounds. According to various embodiments, the present technology permits one to provide user input by way of a computing device. User input via the computing device may include any type of user input, including but not limited to audio input, such as a user playing a sound, singing or humming, or speaking. Since songs, song clips and song snippets include sounds, one skilled in the art will recognize that the technology allows for a user to play a song, hum a song or even sing a song as the user input.
In response to the user input, the technology described herein may search one or more databases to identify the sound and provide the user with information about the sound. For instance, if a user hums a portion of a song, the present technology will discriminate the sounds, and based on that discrimination, search one or more databases to determine the title and artist of the song and provide this information to the user. A further object of the present technology is to provide music discovery related to a song. Such music discovery may include additional songs sung by the same artist, the artist's biographical information, information regarding artists that are similar to the artist who sang the song, recommendations regarding music, and videos or video links regarding the song, the artist, or any similar artists.
These and other objects of the present technology are achieved in an exemplary method of recognizing sounds. User input relating to one or more sounds is received from a computing device. Instructions, which are stored in memory, are executed by a processor to discriminate the one or more sounds, extract music features from the one or more sounds, analyze the music features using one or more databases, and obtain information regarding the music features based on the analysis. Further, information regarding the music features of the one or more sounds may be transmitted to display on the computing device.
A further exemplary method for recognizing one or more sounds includes a number of steps. User input providing a search query may comprise one or more sounds. The user input may be received from a computing device. Instructions, which are stored in memory, are executed by a processor to discriminate the one or more sounds, by classifying and routing the one or more sounds to one of three sound recognition applications for processing based on sound type, the three sound recognition applications comprising a first sound recognition application for singing or humming sounds, a second sound recognition application for recorded music, and a third sound recognition application for speech.
Further instructions, which are stored in memory, are executed by a processor to extract music features from the one or more sounds, analyze and search the music features using one of three databases selected based on sound type, the three databases comprising a first database for singing or humming sounds, a second database for recorded music, and a third database for speech, and obtain information regarding the music features based on the analysis, searching and extraction. In response to the search query, information regarding the music features of the one or more sounds may be transmitted for display on the display of the computing device.
An audio discriminator is also provided herein. The audio discriminator may comprise a classifier of one or more sounds received by user input. The user input provides a search query comprising the one or more sounds. The user input may be received through a unified search interface provided by a computing device. The audio discriminator may include a classifier of the one or more sounds which classifies sounds based on one of three sound types, the three sound types being humming or singing sounds, recorded music and speech. The audio discriminator may further comprise a router of the one or more sounds to a database based on the classification of sound type.
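The patent describes the audio discriminator functionally rather than in code. As a rough illustration only, the following Python sketch shows one way such a classifier-and-router could be organized; the class and function names are invented for the example and are not part of the disclosure.

```python
from enum import Enum

class SoundType(Enum):
    SINGING_OR_HUMMING = "singing_or_humming"
    RECORDED_MUSIC = "recorded_music"
    SPEECH = "speech"

class AudioDiscriminator:
    """Classifies a sound query and routes it to the matching recognizer.

    classify_fn and the recognizer objects are placeholders for whatever
    classification model and search back-ends an implementation provides.
    """
    def __init__(self, classify_fn, recognizers):
        self.classify_fn = classify_fn          # callable: samples -> SoundType
        self.recognizers = recognizers          # mapping: SoundType -> search back-end

    def handle_query(self, samples):
        sound_type = self.classify_fn(samples)
        return self.recognizers[sound_type].search(samples)
```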
In some embodiments, the objects of the present technology may be implemented by executing a program by a processor, wherein the program may be embodied on a computer readable storage medium.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1A is a block diagram of an exemplary networking environment in accordance with various embodiments of the present invention.
FIG. 1B is a block diagram of a further exemplary networking environment in accordance with various embodiments of the present invention.
FIG. 2 is a block diagram of an exemplary computing device for recognizing sounds in accordance with embodiments of the present invention.
FIG. 3A is a block diagram of an exemplary architecture of a system for recognizing sounds in accordance with various embodiments of the present invention.
FIG. 3B is a block diagram of an exemplary environment for recognizing sounds in accordance with various embodiments of the present invention.
FIG. 4 is a flow diagram of an exemplary method for recognizing sounds in accordance with various embodiments of the present invention.
FIGS. 5-20 are exemplary screenshots of a display of a computing device in accordance with various embodiments of the present invention.
DETAILED DESCRIPTION OF THE INVENTION
Embodiments of the present technology provide systems, methods, and media for recognizing sounds. According to various embodiments, the technology may utilize an audio discriminator to distinguish and channel audio outputs separately. In some embodiments, the audio discriminator may discriminate singing or humming sounds, recorded music, polyphonic sounds, and speech separately. In other embodiments, the audio discriminator may discriminate monophonic sounds from polyphonic sounds. By doing this, the technology may quickly recognize, discern or otherwise identify a sound.
Due to the audio discriminator, the technology may allow for a computing device to receive sound input from a user through a unified search interface. The unified search interface may allow the user to provide sound input without having to choose or select what type of sound input they are providing. In other words, with the unified search interface, the user may provide any type of sound input with the computing device (whether the sound input is singing or humming sounds, recorded music, speech, or any combination thereof), without having to designate what type of sound input is being provided. This in turn provides the user with a superior user experience, with little to no hassle on the part of the user.
The audio discriminator will discriminate or classify the one or more sounds that make up the received sound input. In some embodiments, the audio discriminator classifies the one or more sounds to one of three separate sound recognition applications, where each of the three separate sound recognition applications is also coupled to a designated database. A separate sound recognition application may be provided for each of the three exemplary types of sound input (namely, singing/humming sounds, recorded music (polyphonic sounds) and speech), as the sketch below illustrates. However, one skilled in the art will appreciate that any number of sound recognition applications and databases may be utilized in implementing the methods and systems described herein.
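To make the three-way routing concrete, here is a minimal, hypothetical wiring of one recognition application per sound type to its designated database, loosely mirroring FIG. 1B; the SoundRecognitionApp class and the database names are assumptions made for illustration.

```python
class SoundRecognitionApp:
    """One recognition application bound to its designated database (as in FIG. 1B)."""
    def __init__(self, database_name):
        self.database_name = database_name

    def search(self, samples):
        # Stub: a real application would match the query against its database.
        return {"database": self.database_name, "matches": []}

# One application/database pair per sound type; the database names are made up.
recognizers = {
    "singing_or_humming": SoundRecognitionApp("singing_humming_db"),   # cf. first database 160
    "recorded_music": SoundRecognitionApp("recorded_music_db"),        # cf. second database 162
    "speech": SoundRecognitionApp("speech_db"),                        # cf. third database 164
}
```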
With this type of technology, the resulting analysis of the one or more sounds may be quickly delivered to the user. For instance, if the sound is a song snippet that is hummed into a microphone on a computing device, in some embodiments, the technology can quickly recognize parameters of the song, such as the name of the song, the artist of the song, and the lyrics of the song, and provide information related to the song, such as the song parameters and information regarding the artist of the song. Due to its unique and novel features which will be described in greater detail, the technology may recognize sounds and determine information related to the sounds within a short time (as little as four seconds). These and other unique features of the technology will be described later herein.
FIG. 1A is a block diagram of an exemplary networking environment 100 in accordance with embodiments of the present invention. The networking environment 100 includes clients A 110, B 112, and so forth through client Z 118 (additional or fewer clients may be implemented), a network 120, a server 150 with a sound recognition application 140 and an interface module 135, and a database 160. As with all of the figures provided herein, one skilled in the art will recognize that any number of elements 110-160 can be present in the networking environment 100 and that the exemplary methods described herein can be executed by one or more of elements 110-160. Any number of any of elements 110-160 can be present in the networking environment 100, and the networking environment 100 is configured to serve these elements. For example, the server 150 may transmit a report via the network 120 to clients 110-118, despite the fact that only three clients are shown in FIG. 1A. For all figures mentioned herein, like numbered elements refer to like elements throughout.
Clients 110-118 may be implemented as computers having a processor that runs software stored in memory, wherein the software may include network browser applications (not shown) configured to render content pages, such as web pages, from the server 150. Clients 110-118 can be any computing device, including, but not limited to, desktop computers, laptop computers, computing tablets (such as the iPad®), mobile devices, smartphones (such as the iPhone®), and portable digital assistants (PDAs). The clients 110-118 may communicate with a web service provided by the server 150 over the network 120. Additionally, the clients 110-118 may be configured to store an executable application that encompasses one or more functionalities provided by the sound recognition application 140.
The network 120 can be any type of network, including but not limited to the Internet, a LAN, a WAN, a telephone network, and any other communication network that allows access to data, as well as any combination of these. The network 120 may be coupled to any of the clients 110-118, the interface module 135, and/or the server 150. As with all the figures provided herewith, the networking environment 100 is exemplary and not limited to what is shown in FIG. 1A.
The server 150 can communicate with the network 120 and the database 160. It will be apparent to one skilled in the art that the embodiments of this invention are not limited to any particular type of server and/or database. For example, the server 150 may include one or more application servers, one or more web servers, or a combination of such servers. In some embodiments, the servers mentioned herein are configured to control and route information via the network 120 or any other networks (additional networks not shown in FIG. 1A). The servers herein may access, retrieve, store and otherwise process data stored on any of the databases mentioned herein.
Interface module 135 may be implemented as a machine separate from server 150 or as hardware, software, or a combination of hardware and software implemented on server 150. In some embodiments, interface module 135 may relay communications between the sound recognition application 140 and network 120.
The database 160 may be configured to store one or more sounds (including but not limited to speech, voice, songs, song clips or snippets, and any combination thereof), music features, information about the one or more sounds, information about the music features, or any combination thereof. The database and its contents may be accessible to the sound recognition application 140. The one or more sounds may include a song, a song clip, a song snippet, a humming sound, voice, or any combination thereof. In a non-exhaustive list, the information about the one or more sounds or the music features of the one or more sounds may include a song title, a name of an artist, an artist's biographical information, identification of similar artists, a link to download a song, a link to download a video related to the song, or any combination thereof.
The clients 110-118 may interface with the sound recognition application 140 on server 150 via the network 120 and the interface module 135. The sound recognition application 140 may receive requests, queries, and/or data from the clients 110-118. The clients 110-118 may provide data for storage in the database 160, and therefore may be in communication with the database 160. Likewise, the sound recognition application 140 may access the database 160 based on one or more requests or queries received from the clients 110-118. Further details as to the data communicated in the networking environment 100 are described more fully herein.
FIG. 1B is a block diagram of a further exemplary networking environment 100′ in accordance with embodiments of the present invention. For all figures mentioned herein, like numbered elements refer to like elements throughout. Thus, there are some like elements that are shown both in FIG. 1A and FIG. 1B. However, FIG. 1B differs from FIG. 1A in that an audio discriminator 130 is coupled to the interface module 135. Although in FIG. 1B the audio discriminator 130 is shown as an element coupled to the interface module 135, one skilled in the art will recognize that the audio discriminator 130 may be included in and/or coupled with any number of elements 110-164. Thus, in some embodiments, the audio discriminator 130 may be included with the server 150.
As described earlier, the audio discriminator 130 may discriminate or classify the one or more sounds that make up the received sound input. In some embodiments, the audio discriminator 130 classifies the one or more sounds to one of three separate sound recognition applications, where each of the three separate sound recognition applications is also coupled to a designated database, as shown in exemplary FIG. 1B.
Unlike FIG. 1A, FIG. 1B shows that the interface module 135 is coupled to three applications, namely, a first sound recognition application 140, a second sound recognition application 142 and a third sound recognition application 144. According to various embodiments, the first sound recognition application 140 may be designated for singing and/or humming sounds, and work with the server 150 to process, search or otherwise analyze singing and/or humming sounds. According to various embodiments, the second sound recognition application 142 may be designated for recorded music or polyphonic sounds, and work with the server 150 to process, search or otherwise analyze recorded music or polyphonic sounds. In some embodiments, the third sound recognition application 144 may be designated for speech, and work with the server 150 to process, search or otherwise analyze speech.
Furthermore, FIG. 1B differs from FIG. 1A in that a separate database may be designated for each of the three sound recognition applications. Thus, in accordance with various embodiments, the first sound recognition application 140 may be coupled to a first database 160, the second sound recognition application 142 may be coupled to a second database 162, and the third sound recognition application 144 may be coupled to a third database 164. The first sound recognition application 140 may work with the server 150 to search the first database 160 for one or more singing and/or humming sounds. Likewise, the second sound recognition application 142 may work with the server 150 to search the second database 162 for one or more recorded music songs, snippets, or other polyphonic sounds. Also, the third sound recognition application 144 may work with the server 150 to search the third database 164 for speech.
According to various embodiments, the first database 160 may be designated to store singing and/or humming sounds and associated data. According to various embodiments, the second database 162 may be designated to store recorded music or polyphonic sounds (such as songs, song snippets, song clips, and the like) and associated data (such as music lyrics, artists, albums, album names, biographical information of artists, and the like). The third database 164 may be designated to store speech and associated data (such as the name of the speaker, the source of the speech, and the like).
As with all of the figures provided herein, one skilled in the art will recognize that any number of elements 110-164 can be present in the networking environment 100′ and that the exemplary methods described herein can be executed by one or more of elements 110-164. Any number of any of elements 110-164 can be present in the networking environment 100′, and the networking environment 100′ is configured to serve these elements.
FIG. 2 is a block diagram of an exemplary computing device for recognizing sounds in accordance with embodiments of the present invention. In some embodiments, the exemplary computing device of FIG. 2 can be used to implement portions of the clients 110-118 and the server 150 as shown in FIGS. 1A and/or 1B.
The computing system 200 of FIG. 2 includes one or more processors 210 and memory 220. The main memory 220 stores, in part, instructions and data for execution by the processor 210. The main memory 220 can store the executable code when in operation. The system 200 of FIG. 2 further includes a mass storage device 230, portable storage medium drive(s) 240, output devices 250, user input devices 260, a graphics display 270, and peripheral devices 280.
The components illustrated in FIG. 2 are depicted as being connected via a single bus 290. However, the components can be connected through one or more data transport means. For example, the processor unit 210 and the main memory 220 can be connected via a local microprocessor bus, and the mass storage device 230, peripheral device(s) 280, the portable storage device 240, and the display system 270 can be connected via one or more input/output (I/O) buses.
The mass storage device 230, which can be implemented with a magnetic disk drive or an optical disk drive, is a non-volatile storage device for storing data and instructions for use by the processor unit 210. The mass storage device 230 can store the system software for implementing embodiments of the present invention for purposes of loading that software into the main memory 220.
The portable storage device 240 operates in conjunction with a portable non-volatile storage medium, such as a floppy disk, compact disk or digital video disc, to input and output data and code to and from the computer system 200 of FIG. 2. The system software for implementing embodiments of the present invention can be stored on such a portable medium and input to the computer system 200 via the portable storage device 240.
Input devices 260 provide a portion of a user interface. Input devices 260 may include an alphanumeric keypad, such as a keyboard, for inputting alphanumeric and other information, or a pointing device, such as a mouse, a trackball, a stylus, or cursor direction keys. Additionally, the system 200 as shown in FIG. 2 includes output devices 250. Examples of suitable output devices include speakers, printers, network interfaces, and monitors.
The display system 270 may include a CRT, a liquid crystal display (LCD) or other suitable display device. The display system 270 receives textual and graphical information, and processes the information for output to the display device.
Peripherals 280 may include any type of computer support device to add additional functionality to the computer system. For example, peripheral device(s) 280 may include a modem or a router.
The components contained in the computer system 200 of FIG. 2 are those typically found in computer systems that can be suitable for use with embodiments of the present invention and are intended to represent a broad category of such computer components that are well known in the art. Thus, the computer system 200 of FIG. 2 can be a personal computer, handheld computing device, telephone, mobile computing device, workstation, server, minicomputer, mainframe computer, or any other computing device. The computer can also include various bus configurations, networked platforms, multi-processor platforms, etc. Various operating systems can be implemented, including Unix, Linux, Windows, Macintosh OS, Palm OS, and other suitable operating systems.
FIG. 3A is a block diagram of an exemplary architecture of a system 300 for recognizing sounds in accordance with various embodiments of the present invention. According to various embodiments, the system 300 includes one or more computing devices 310, a sound recognition application 140 coupled to the one or more computing devices 310, a network 120, a third-party service or content provider 330, and a server 350. Although various system components may be configured to perform some or all of the various steps described herein, fewer or more system components may be provided and still fall within the scope of various embodiments.
As described above, the one or more computing devices 310 may be any computing device, including, but not limited to, desktop computers, laptop computers, computing tablets (such as the iPad®), mobile devices, smartphones (such as the iPhone®), and portable digital assistants (PDAs). The one or more computing devices 310 include a microphone 312, an analog/digital (A/D) converter 314, a filter 316, a CPU 318, an input/output interface 320, a display 322, user controls 324, and a database of local music 326. The computing device 310 may include a button 311 for recording, selecting, pressing or otherwise providing user input to the computing device 310.
The one or more computing devices 310 may be coupled to a sound recognition application 140. The microphone 312 is a vehicle for a user to input one or more sounds to the one or more computing devices 310 for recognition. The one or more sounds may be processed by the analog/digital converter 314 so that the sounds may be converted from analog to digital signals. The one or more sounds may also be processed by a filter 316, to filter sound artifacts and eliminate any other type of unwanted noise from the one or more sounds.
The one or more computing devices 310 include a CPU 318, which executes or carries out the instructions stored in memory (not shown). In some embodiments, the CPU 318 executes instructions stored in memory that allow it to launch the sound recognition application 140 on the one or more computing devices 310. The sound recognition application 140 may be coupled to the CPU 318. The one or more computing devices 310 also include an input/output interface 320 by which the one or more computing devices may communicate with the network 120.
The one or more computing devices 310 may include a display 322. The display 322 may be configured to display graphical user interfaces provided by the sound recognition application 140, to allow a user of the computing device 310 to interact with the server 350 via the sound recognition application 140. According to various embodiments, the display 322 may be configured to display information or data that is transmitted by the server 350 to the computing device 310 in response to a user's interaction with the sound recognition application 140. The display 322 may comprise a display system (such as the display system 270 of FIG. 2).
User controls 324 allow a user to control or interact with the one or more computing devices 310. The user controls 324 may comprise input devices (such as input devices 260 of FIG. 2). A local music database 326 to store music may be included in the one or more computing devices 310. Further, one or more buses 328 couple the elements 312-326 in the one or more computing devices 310. Such buses may include the exemplary buses described earlier herein in relation to FIG. 2.
According to various embodiments, the computing device 310 may communicate with the server 350 and/or with a third party service or content provider 330 through the network 120 (such as the Internet). The third party service or content provider 330 may be any type of service provider, including but not limited to a music store (such as iTunes®). In some embodiments, a user of the computing device 310 may be offered an opportunity to download and/or purchase a song by means of the sound recognition application 140 and the server 350.
The server 350 may include several elements, including but not limited to a music database 332, a CPU 334, a music processor 336, an input/output interface 338, a digital signal processing filter 342, an audio discriminator 340, a noise cancellation module 345, a music features extractor 346, an audio decoder 347, and a multiplexer 348. The music database 332 on the server 350 may store information, songs, sounds, albums, and other information. The music database 332 may comprise the database 160 of FIG. 1A. The CPU 334 of the server 350 executes instructions stored in memory (not shown) to implement any of the methods described herein, including methods for sound recognition. The music processor 336 executes instructions stored in memory to utilize methods of further processing music, as described later herein. The input/output interface 338 allows the server 350 to receive and transmit communications to the computing device 310 via the network 120.
The digital signal processing filter 342 further filters or enhances the sounds to eliminate sound artifacts. The audio discriminator 340 may distinguish a user's query type that was submitted as sound input. Thus, the audio discriminator 340 may distinguish whether the sound input received from the user via the computing device is a singing or humming sound, recorded music, or speech. Then the audio discriminator 340 routes the discriminated sound(s) to the appropriate search engine. In some embodiments, the appropriate search engine may be a sound recognition application (such as a sound recognition application 140 of FIG. 1A). According to various embodiments, the audio discriminator 340 routes singing or humming sounds to a first sound recognition application (such as the first sound recognition application 140 of FIG. 1B), routes recorded music or polyphonic sounds to a second sound recognition application (such as the second sound recognition application 142 of FIG. 1B), and routes speech to a third sound recognition application (such as the third sound recognition application 144 of FIG. 1B).
The audio discriminator 340 may discriminate, distinguish, or classify sounds. In some embodiments, the audio discriminator 340 may channel outputs of voice and music separately. The audio discriminator 340 may discriminate monophonic sounds from polyphonic sounds. The audio discriminator 340 may determine this with a high accuracy. The audio discriminator 340 may analyze or look at the length of a user's query (whether the user's query be text, recorded audio, spoken words, sung or hummed music, or a combination thereof), as well as other features, including but not limited to pitch variations in the sounds, and any discrimination between speech (spoken word), voice, and music. By doing this, the technology may quickly classify a sound. The audio discriminator 340 may classify or discriminate voice and music through channels, and route those channels through processing (such as music processing by the music processor 336) and/or algorithmic analysis.
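The specification names query length and pitch variation as cues but does not give an algorithm or thresholds. The toy heuristic below is therefore only a sketch of how such cues might be combined; every threshold and the choice of features are assumptions, not disclosed values.

```python
import numpy as np

def classify_query(pitch_track, duration_s, voiced_ratio):
    """Toy heuristic in the spirit of the discriminator described above.

    pitch_track: per-frame pitch estimates in Hz (0 for unvoiced frames)
    duration_s: length of the query in seconds
    voiced_ratio: fraction of frames with a detectable pitch
    All thresholds are illustrative only.
    """
    pitch_track = np.asarray(pitch_track, dtype=float)
    voiced = pitch_track[pitch_track > 0]
    if voiced.size == 0:
        return "speech"                          # no stable pitch at all
    pitch_var = np.std(np.log2(voiced))          # melodic range, in octaves
    if voiced_ratio > 0.6 and pitch_var > 0.3 and duration_s > 3.0:
        return "singing_or_humming"              # long, strongly pitched, wide range
    if voiced_ratio > 0.4 and pitch_var > 0.5:
        return "recorded_music"                  # polyphonic material tends to look noisier
    return "speech"
```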
The noise cancellation module 345 may separate music features needed for analysis from background noise. The music features extractor 346 may extract music features from the one or more sounds. An audio decoder 347 and a multiplexer 348 may also be included in the server. Furthermore, one or more buses 344 couple the elements 332-348 in the server 350. Such buses may include the exemplary buses described earlier herein in relation to FIG. 2.
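As one hedged example of the kind of melodic feature such an extractor might produce, the sketch below estimates a per-frame pitch contour with simple autocorrelation. The patent does not specify this (or any) extraction method, and production systems would use more robust techniques; the parameter values are illustrative.

```python
import numpy as np

def pitch_contour(samples, sr=16000, frame=2048, hop=512, fmin=80.0, fmax=1000.0):
    """Estimate a per-frame pitch contour (Hz) via autocorrelation; 0.0 marks silence."""
    samples = np.asarray(samples, dtype=float)
    contour = []
    for start in range(0, len(samples) - frame, hop):
        x = samples[start:start + frame]
        x = x - np.mean(x)
        if np.max(np.abs(x)) < 1e-4:
            contour.append(0.0)                      # effectively silent frame
            continue
        ac = np.correlate(x, x, mode="full")[frame - 1:]   # lags 0 .. frame-1
        lo, hi = int(sr / fmax), int(sr / fmin)             # lag range for fmin..fmax
        lag = lo + int(np.argmax(ac[lo:hi]))
        contour.append(sr / lag if ac[lag] > 0 else 0.0)
    return np.array(contour)
```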
According to various embodiments of the present technology, a user may play, sing, hum or otherwise furnish an audible sound as user input to the computing device 310. The user may also input text (such as a search query in a text box) about a song as part of the user input. The audible sounds may be picked up by the microphone 312 of the computing device 310. The sounds are then digitized by the A/D converter 314 and filtered by the filter 316 to compress the sound, such that the sounds may be transmitted quickly over the network 120. The sounds are then processed by means of the sound recognition application 140 and the server 350. The sound recognition application 140 may be coupled to the CPU 318. The user may also play music from his or her local music database 326 on the computing device 310.
The user may ask for a sound to be recognized by providing user input to the computing device 310. This request may be furnished to the server 350 through the network 120. In response to the request, the server 350 may discriminate sounds using the audio discriminator 340. Voice and music may be parsed out and classified accordingly by the audio discriminator 340. Music features of the sounds may be extracted using music feature extractors (such as the music features extractor 346 of FIG. 3A), which will be described in greater detail later. Such music features are then analyzed using one or more databases (such as search databases) with the help of database servers and search servers. Information regarding the music features is then obtained from the databases and routed through routers to the computing device 310 via the network 120. The information may be transmitted for display to the user on the display of the computing device 310.
Information regarding a song may include a song title, a name of an artist, an artist's biographical information, the name of the album where the song can be found, identification of similar artists, a link to download a song, a link to download a video related to the song (such as a YouTube® video), similar artists, recommendations, a biography of an artist, or any combination thereof. A user may also choose a track and access lyrics as the song is played. The user may also select a button to request more information. The computing device 310 may also display a list of what types of searches the user previously performed using the sound recognition application 140. Searches may include speech searches. The searches may be spoken into the microphone of the computing device. An audio discriminator 340 provided by the server 350 may determine what type of sound was provided to the computing device 310.
As earlier stated, although various system components may be configured to perform some or all of the various steps described herein, fewer or more system components may be provided and still fall within the scope of various embodiments. For instance, although the exemplary system 300 in FIG. 3A shows one sound recognition application 140, the scope of the invention includes such embodiments where there may be more than one sound recognition application. In some embodiments, instead of only one sound recognition application 140, the system 300 may include three separate sound recognition applications (such as a first sound recognition application 140, a second sound recognition application 142, and a third sound recognition application 144 as shown in FIG. 1B). Also, the server 350 may be the same as the server 150 in FIGS. 1A and 1B. As described earlier herein, according to various embodiments, a first sound recognition application may be designated for singing and/or humming sounds, and work with the server 350 to process, search or otherwise analyze singing and/or humming sounds. According to various embodiments, a second sound recognition application may be designated for recorded music or polyphonic sounds, and work with the server 350 to process, search or otherwise analyze recorded music or polyphonic sounds. In some embodiments, a third sound recognition application may be designated for speech, and work with the server 350 to process, search or otherwise analyze speech.
Also, in various embodiments, the system may include a separate database designated for each of the three sound recognition applications. The first sound recognition application may work with the server 350 to search the first database (not shown) for one or more singing and/or humming sounds. Likewise, the second sound recognition application may work with the server 350 to search the second database (not shown) for one or more recorded music songs, snippets, or other polyphonic sounds. In some embodiments, the second database is shown as the local music database 326 in FIG. 3A. Also, the third sound recognition application may work with the server 350 to search the third database (not shown) for speech.
According to various embodiments, the first database 160 (FIG. 1B) may be designated to store singing and/or humming sounds and associated data. According to various embodiments, the second database 162 (FIG. 1B) may be designated to store recorded music or polyphonic sounds (such as songs, song snippets, song clips, and the like) and associated data (such as music lyrics, artists, albums, album names, biographical information of artists, and the like). The third database 164 (FIG. 1B) may be designated to store speech and associated data (such as the name of the speaker, the source of the speech, and the like).
FIG. 3B is a block diagram of an exemplary environment 360 for recognizing sounds in accordance with various embodiments of the present invention. The exemplary environment of FIG. 3B may be included in the server or elsewhere in any of the exemplary systems of FIG. 1A, 1B, or 3A. Input 362 (such as user input and/or one or more sounds) is received by load balancer routers 364. The load balancer routers 364 distribute the workload provided by the input to one or more computing resources, such as music feature extractors 366 (similar to the music features extractor 346 described earlier in relation to FIG. 3A). This distribution of the workload allows for efficient processing of the sounds, input and/or signals provided to the system.
Still referring to FIG. 3B, once the load balancer routers 364 have routed the input to the music feature extractors 366, the music feature extractors 366 extract or otherwise obtain one or more music features from the one or more sounds. The music feature extractors 366 also may work in conjunction with database servers 368 and search servers 370 to determine and obtain information relating to the music features of the sounds. Such information may include a song title, a name of an artist, an artist's biographical information, identification of similar artists, a link to download a song, a link to download a video related to the song, or any combination thereof. The search servers 370 may communicate with the database servers 368 and external servers (not shown) to determine such information. The information is then provided to routers 372, which then route the information to be transmitted for display as one or more search results 374 on a display of a computing device.
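The specification states only that the load balancer routers distribute the workload across computing resources. The round-robin policy in the following sketch is an illustrative assumption, as are the class names and the stubbed extraction step.

```python
import itertools

class ExtractorWorker:
    """Stand-in for one music feature extractor instance."""
    def __init__(self, name):
        self.name = name

    def extract(self, query):
        return {"worker": self.name, "features": []}   # stubbed feature extraction

class LoadBalancer:
    """Distributes incoming queries across extractor workers, round-robin style."""
    def __init__(self, workers):
        self._cycle = itertools.cycle(workers)

    def dispatch(self, query):
        worker = next(self._cycle)
        return worker.extract(query)

balancer = LoadBalancer([ExtractorWorker("fx-1"), ExtractorWorker("fx-2")])
```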
FIG. 4 is a flow diagram of an exemplary method 400 to recognize sounds. Although the method 400 may be utilized to recognize songs, song clips or snippets, song lyrics, partial songs, partial song lyrics, humming of songs, voicing of lyrics, and the like, it will be appreciated by one skilled in the art that this technology may be utilized for any type of sound, not just songs or lyrics.
At step 410, user input is received from a computing device. In some embodiments, the user input is provided through a microphone of a computing device (such as the microphone 312 of the computing device 310 of FIG. 3A). The user input may comprise or otherwise be associated with one or more sounds. In some embodiments, the user input may be a search query that comprises one or more sounds. The user input may include but is not limited to any number of sounds, such as humming of a portion or all of a song, a partial song clip or snippet played, and the like. User input may include any number of manual user inputs, such as keystrokes, user selections, commands, mouse clicks, presses on a touch screen, swipes of a touch screen, or button presses via the one or more computing devices. For instance, user input may include pressing a button (such as the button 311 of the computing device 310 of FIG. 3A) or providing input via a unified search interface (such as the unified search interface button 520 of FIG. 5, which is described later herein) while recording, humming or playing a song or a portion of a song.
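A client submitting such a query might look roughly like the sketch below; the HTTP endpoint, field names, and use of the requests library are assumptions made for illustration and are not taken from the patent. Note that, consistent with the unified search interface, the client does not label the query with a sound type.

```python
import requests  # assumed available; the endpoint below is purely hypothetical

def submit_sound_query(wav_bytes, server_url="https://example.com/api/sound_search"):
    """Send recorded audio as a unified search query and return the parsed results."""
    response = requests.post(
        server_url,
        files={"audio": ("query.wav", wav_bytes, "audio/wav")},
    )
    response.raise_for_status()
    return response.json()  # e.g. song title, artist, album, lyrics, links
```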
At step 420, discrimination of the one or more sounds takes place. According to various embodiments, an audio discriminator (such as the audio discriminator 340 of FIG. 3A) may undertake the task of discriminating the one or more sounds. As mentioned earlier, an audio discriminator may discriminate, distinguish, or classify sounds. The audio discriminator may discriminate monophonic sounds from polyphonic sounds with a high accuracy. The audio discriminator may analyze or look at the length of a user's query, any pitch variations in the sounds, and any discrimination between voice and music. By doing this, the technology may quickly classify a sound. Furthermore, the audio discriminator may classify or discriminate voice and music, and route these sounds towards processing paths and/or algorithmic analysis.
At step 430, music features may be extracted from the one or more sounds. This step may be accomplished using music feature extractors. Exemplary music feature extractors are shown and described as the music features extractor 346 in FIG. 3A and the music feature extractors 366 in FIG. 3B. Music feature extractors may be coupled to both database servers and search servers. Exemplary database servers and search servers are shown and described as database servers 368 and search servers 370 in FIG. 3B.
At step 440, music features may be analyzed using one or more databases, and at step 450, information regarding the music features based on the analysis may be obtained. According to various embodiments, database servers and search servers (such as the exemplary database servers 368 and search servers 370 in FIG. 3B) may quickly identify and provide information related to music features of the one or more sounds. For instance, if a given sound is a song clip, the music features extracted may be enhanced or filtered music snippets which are quickly identified, recognized, classified or otherwise determined by one or more database servers and search servers.
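The matching performed at steps 440 and 450 is described only at a high level. The following sketch ranks stored tracks against an extracted pitch contour with a simple length- and key-normalized Euclidean distance; real melody search would more likely use dynamic time warping or fingerprint indexes, so treat this purely as an illustration under those assumptions.

```python
import numpy as np

def match_features(query_contour, database):
    """Rank stored tracks by similarity to an extracted pitch contour.

    database: iterable of (track_info, reference_contour) pairs.
    """
    def normalize(contour, length=100):
        c = np.asarray(contour, dtype=float)
        # Resample to a fixed length so queries of different durations compare.
        c = np.interp(np.linspace(0, len(c) - 1, length), np.arange(len(c)), c)
        voiced = c[c > 0]
        # Subtract the median pitch so the comparison is key-invariant.
        return c - (np.median(voiced) if voiced.size else 0.0)

    q = normalize(query_contour)
    scored = [(float(np.linalg.norm(q - normalize(ref))), info) for info, ref in database]
    return [info for _, info in sorted(scored, key=lambda pair: pair[0])]
```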
According to various embodiments, the audio discriminator may classify the user's query. In some embodiments, the audio discriminator may classify the one or more sounds of the user's query as being (1) humming or singing sounds, (2) recorded music or (3) speech. As shown in FIG. 1B, in some embodiments, the audio discriminator may route humming and singing sounds to a first sound recognition application (such as the first sound recognition application 140 of FIG. 1B). Likewise, in some embodiments, the audio discriminator may route recorded music to a second sound recognition application (such as the second sound recognition application 142 of FIG. 1B), and also route speech to a third sound recognition application (such as the third sound recognition application 144 of FIG. 1B).
According to various embodiments of the present technology, a separate database may be assigned to each of the sound recognition applications. Thus, as shown in exemplary FIG. 1B, the first sound recognition application for singing or humming sounds may search the first database (such as the first database 160 of FIG. 1B) which stores singing or humming sounds. Likewise, the second sound recognition application for recorded music may search the second database (such as the second database 162 of FIG. 1B) which stores recorded music. The third sound recognition application for speech may search the third database (such as the third database 164 of FIG. 1B) which stores speech.
It will be appreciated by one skilled in the art that any number of sound recognition applications and databases may be used with this technology to implement one or more methods described herein.
The database servers (such as the database servers 368 in FIG. 3B) may store information related to music features and/or sounds. The search servers (such as the search servers 370 in FIG. 3B) may aggressively search through database servers, database resources, or even the Internet to obtain, in real time, information related to music features and/or sounds that may or may not be present in the database servers. It will be understood that an audio discriminator (such as the audio discriminator 340 of FIG. 3A) may comprise music feature extractors. Also, it may be appreciated that the audio discriminator may be coupled to database servers, search servers, or to any combination thereof.
Finally, at step 460, in response to the user input of a search query, the information regarding the music features of the one or more sounds is transmitted for display on the computing device (such as to the display 322 of the computing device 310 of FIG. 3A). The information may then be viewed by the user of the computing device. In a non-exhaustive list, the information regarding the music features of the one or more sounds comprises a song title, a name of an artist, an artist's biographical information, identification of similar artists, a link to download a song, a link to download a video related to the song, or any combination thereof.
An optional step for the method 400 includes utilizing load balancing routers (such as the load balancing routers 364 in FIG. 3B) to distribute workload to one or more computing resources. The workload may comprise the user input and the sounds at issue. By utilizing load balancing routers, an optimal and efficient delivery of various user inputs and sounds may be provided to the music feature extractors. Thus, the technology may be able to quickly identify sounds or music features of sounds within four seconds, in part due to the use of load balancing routers.
Further optional steps for the method 400 include providing optional premium rows dynamically to the user. The premium rows may appear on any portion of the graphical user interface shown to the user through a display of the user's computing device. For instance, on a song page, premium rows may be added or subtracted to push relevant content relating to the song. If the song is sung by a certain artist, t-shirts, concert tickets, posters, goods, services and any other type of merchandise may be presented to the user in one or more premium rows. According to some embodiments, the relevant content relating to the song may be obtained from a server, from the network, or from any other networked resource. Another example of content for the premium rows may include links to targeted commercials. Exemplary premium rows will be described later herein in reference to FIG. 19.
Yet another optional step for the method 400 is providing a flag discriminator that is related to the song. If a user grabs a song, the technology may identify the song, and the user is then presented with a graphical user interface that displays a flag on the album or song at issue. If the user already has the song in their music library (such as the local music database 326 in the computing device 310 of FIG. 3A or the second database 162 of FIG. 1B), then the flag will visually indicate that to the user, and by pressing or clicking on the flag, the song will be played on the computing device directly. If, on the other hand, the user does not have the song stored in their music library already, then the flag will visually indicate that to the user. The user may also be given an opportunity to purchase the song. According to some embodiments, the act to purchase the song may be simply to press the flag, which will redirect the user to a third party service or content provider (such as a music online store 330 in FIG. 3A). The flag may visually indicate whether or not the user already has a copy of a particular song by any means, including but not limited to the color of the flag, whether the flag is raised or down, the position of the flag on the graphical user interface, and the like. An exemplary flag will be described later herein in reference to FIG. 8.
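A minimal sketch of the ownership check behind this flag might look as follows; the flag and action labels are invented for the example, and the local library is assumed to be queryable as a simple collection of song identifiers.

```python
def song_flag(song_id, local_music_db):
    """Return which flag and action to present for a song, per the check above.

    local_music_db: a set-like collection of song identifiers already stored on the device.
    """
    if song_id in local_music_db:
        return {"flag": "my_music", "action": "play_local_copy"}
    return {"flag": "get", "action": "open_purchase_page"}
```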
A further optional step for the method 400 is to obtain and display to the user a listing of artists or songs that are underplayed. In other words, the technology may identify songs that are grabbed or searched for by users but are not played on the radio as often as they should be. This listing of underplayed artists or songs may allow users to discover music to which they otherwise may not be exposed if they normally listen only to the radio. Exemplary listings of artists or songs will be described later herein in reference to FIGS. 8, 14, 15, and 16.
A further optional step for the method 400 is to provide and display a pop-open mini-player on the display of a computing device (such as the display 322 of the computing device 310 of FIG. 3A). The mini-player allows a user to pause, play, and otherwise manipulate a song. It may also allow the user to quickly access a song page, which may be a webpage setting forth details about the song (such as the song's lyrics, the song's title, the artist of the song, the album where the song may be found, and a link to the song for playing, downloading and/or purchasing). An exemplary mini-player will be described later herein in reference to FIG. 20.
The technology may further allow for a direct call from a computing device (such as the computing device 310 of FIG. 3A) to a third party service or content provider (such as the music store 330 of FIG. 3A). In other words, the third party service or content may be provided to the computing device directly. The technology includes an API to allow for a title of a song and/or artist to be transmitted to the third party service or content provider via a sound recognition application (such as the sound recognition application 140 of FIG. 3A or one or more of the first, second and third sound recognition applications (140, 142, and 144, respectively) as depicted in FIG. 1B).
One skilled in the art will recognize that the scope of the present technology allows for any order or sequence of the steps of the method 400 mentioned herein to be performed. Also, it will be appreciated by one skilled in the art that the steps of the method 400 may be removed altogether or replaced with other steps (such as the optional steps described herein) and still be within the scope of the invention.
FIG. 5 is an exemplary screenshot 500 of a display of a computing device in accordance with various embodiments of the present technology. FIG. 5 depicts what is initially shown to a user before a search is initiated. The user may be presented with this screenshot 500 when the user wishes to search using a unified search interface button 520. The screenshot 500 also shows that the graphical user interface is for a search 502. The screenshot 500 further shows a help button 502 depicting a question mark, which, if pressed by the user, will display help menus and options that provide information about the application.
The user may tap, actuate, press or otherwise activate the unified search interface button 520 and then provide one or more sounds as user input through a microphone of the computing device. The user is also provided with further buttons for pressing or actuation, including a “Title or Artist” button 530. When actuated or pressed, the “Title or Artist” button 530 will allow the user to search the server and database(s) for a song by title or artist.
The screenshot 500 also depicts a history button 550 to allow a user to see the history of searches and songs that the user has previously requested, a “What's Hot” button 560 to provide a listing of “hot” or popular songs to the user (which will provide song listings such as those shown in exemplary FIGS. 14-16), and a “Now Playing” button 570 to provide a Now Playing page comprising a song being played and information regarding the song that is currently playing.
FIG. 6 is an exemplary screenshot 600 that is displayed once the user has tapped, actuated, pressed or otherwise activated the unified search interface button 520. The unified search interface button 520 displays that the application is “listening” for user input, and the user is invited to tap the unified search interface button 520 a second time to indicate to the application when the user input is complete and that the application may stop “listening.”
If the computing device is a mobile phone, the user may search for recorded music by holding their phone towards music that is playing, or by singing or humming, through the same unified search interface using a single button. The user may hit a cancel button 605 to cancel a search at any time before the search is complete and search results are provided to the user. The exemplary screenshot 600 also shows an indicator 655 on the history button 550. In this case, the indicator 655 of FIG. 6 shows the number “1” to indicate that the application has a historical record of one previous search or search result.
FIG. 7 is an exemplary screenshot 700 of results that are displayed when the search is complete. The screenshot 700 provides information related to the song, such as the name of the song 710, the name(s) of the artist(s) who sang the song 720, and the name of the album 730 where the song can be found. The user is given buttons to bookmark the song on their computing device 740, share the song with another user 750 or buy the song 760. Lyrics 770 of the song may also be shown. Also, related music clips or video clips 780 of the song, or related to the song or artist, may be provided to the user for playing if the user wishes to click, press, or otherwise activate the clips.
FIG. 8 is an exemplary screenshot 800 of a listing 810 of songs in a list view. Through list views, songs may be played and/or purchased. Songs listed may be from the same artist or by different artists. A user may also see whether they already own or keep the song at issue in a local music database. If the user already owns or has the song stored in a local music database (such as the local music database 326 of FIG. 3A), then a play song interface button 820 appears next to the song, which, when actuated by a user, will play the song. If, on the other hand, the user does not own or have a copy of the song, then the user will be given a preview song interface button 830, which, when actuated by the user, will provide a short preview of the song. In the example shown in FIG. 8, the user owns or has a copy of the song “Love Story” sung by Taylor Swift because the play song interface button 820 appears adjacent to the song. However, in FIG. 8, the user does not own the song “White Horse” sung by Taylor Swift, so a preview song interface button 830 indicates that it will preview 30 seconds of the song upon actuation of the preview song interface button 830.
Furthermore, FIG. 8 provides a flag that indicates whether or not a song is already owned by the user. “Owned” songs are displayed with a “My music” icon 840 for the flag. In the example shown in FIG. 8, the user can see that they own the song “Love Story” sung by Taylor Swift because the “My music” icon 840 appears adjacent to the song. “Non-owned” songs, or songs that the user does not already have in their possession (such as in a local music database), are given a different flag. The flag may provide an instant option for a user to purchase or “get” a song that the user does not already have, in the form of a “get” icon 850. In FIG. 8, the user does not own the song “White Horse” sung by Taylor Swift, so the “get” icon 850 indicates that the user may “get” or purchase a copy of the song if they press or actuate the button with the “get” icon 850. Songs may be linked and shown with popularity bars 860 that show how popular a given song is.
FIG. 9 is an exemplary screenshot 900 of a text search. Users may search for titles, artists and albums using an intuitive text search interface. The user may type in a text box 910, using a keyboard 920 that includes a search key 930. The text search interface may include prefix suggestions (auto-complete), as well as spelling correction.
FIG. 10 is an exemplary screenshot 1000 of a lyrics display resulting from a song having been identified by singing or humming sounds as the user input. FIG. 11 is an exemplary screenshot 1100 of a lyrics display of a song resulting from the song being identified while it is played on a radio. Song lyrics may be shown as the song is being played.
FIG. 12 is an exemplary screenshot 1200 of a lyrics search result. In the example shown in FIG. 12, after a user has typed part of a song's lyrics (namely, the words “All my troubles seem”), the user is provided with the screenshot 1200 showing one or more songs that result from the search for those partial lyrics.
FIGS. 13-15 are exemplary screenshots of song charts which employ popularity algorithms of the technology. Such popularity algorithms take into account information from a combination of multiple sources, including billboards, radio plays and song identifications by users. FIG. 13 shows an exemplary screenshot 1300 having a song charts overview listing 1310 which includes “hottest” and “underplayed” song listings. FIG. 14 shows an exemplary screenshot 1400 having the “hottest” song listing 1410, which may include a breakdown of popularity by genre. FIG. 15 shows an exemplary screenshot 1500 having an “underplayed” song listing 1510 and a “just grabbed” song listing 1520. The “underplayed” song listing 1510 shows a listing of songs or tracks which are being identified by users but are not played as often on the radio. The “just grabbed” song listing 1520 shows the song(s) or tracks that were recently identified by other users on their computing devices using this technology.
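The patent does not disclose the popularity formula, only its input sources. The sketch below blends those sources with arbitrary illustrative weights and derives an "underplayed" ordering from the gap between user identifications and radio plays; the weighting and normalization are assumptions made for the example.

```python
def popularity_score(chart_rank, radio_plays, user_identifications,
                     max_plays, max_identifications, weights=(0.4, 0.3, 0.3)):
    """Blend chart rank, radio plays, and user identifications into one score."""
    w_chart, w_radio, w_ids = weights
    return (w_chart * (1.0 / chart_rank)                                            # 1 = most popular
            + w_radio * (radio_plays / max_plays if max_plays else 0.0)
            + w_ids * (user_identifications / max_identifications if max_identifications else 0.0))

def underplayed(songs):
    """Songs identified often by users but played comparatively rarely on radio."""
    return sorted((s for s in songs if s["user_identifications"] > s["radio_plays"]),
                  key=lambda s: s["user_identifications"] - s["radio_plays"],
                  reverse=True)
```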
FIGS. 16 and 17 show exemplary screenshots 1600 and 1700, respectively, of history and bookmarks. Such exemplary screenshots may be displayed after a user clicks on the history button 550 as shown in FIGS. 5 and 6. As shown in both FIGS. 16 and 17, users may view the history of songs listened to or identified by clicking on or actuating a searches button 1610. Users may bookmark their favorite songs for future reference by clicking on or actuating a bookmarks button 1620. The screenshot 1600 of FIG. 16 also shows that a pending search 1630 is being conducted. If there is no wireless or network connection while searching, the application may save the pending search so that users may obtain results when they have connectivity. FIG. 17 shows a screenshot 1700 similar to that of FIG. 16, except that no pending search is shown, as no search is being conducted in the screenshot 1700.
FIG. 18 is an exemplary screenshot 1800 that is displayed when a user wishes to share music content. The screenshot 1800 shows a share song menu 1810, which includes a plurality of buttons that allow a user to press or actuate to share a song by email, Twitter®, or Facebook®, or to cancel sharing the song altogether.
FIG. 19 is an exemplary screenshot 1900 that shows a premium row 1910. As described previously herein, one or more premium rows may comprise a button that is controlled from a server for pushing relevant commercial content. In some embodiments, the relevant commercial content may be related to the song or to the artist that is identified or being played. A non-exhaustive list of commercial content for premium rows includes, but is not limited to, ringtones, full track downloads, t-shirts, concert tickets, sheet music, posters, avatars, skins, animations, and links to third party services. In the example provided in FIG. 19, the premium row 1910 is a button that, if actuated or pressed by the user, will launch a MySpace® radio station as a link to this third party service.
FIG. 20 shows two exemplary screenshots 2000 and 2050. In the exemplary screenshot 2000, an artist's biography 2010 is presented along with a mini-player 2020 below the artist's biography 2010. The mini-player 2020 may play a song of the artist. The mini-player 2020 may include rewind, pause, fast-forward and play buttons for a user to utilize in order to manipulate the song. If a user presses a “more” button 2030, then the exemplary screenshot 2050 is presented to the user on the display of the computing device. In other words, the “more” button 2030 takes the user back to the current Now Playing page, which shows information about a song currently being played on the computing device. An exemplary Now Playing page is provided in the exemplary screenshot 2050.
Some of the above-described functions may be composed of instructions that are stored on storage media (e.g., computer-readable medium). The instructions may be retrieved and executed by the processor. Some examples of storage media are memory devices, tapes, disks, and the like. The instructions are operational when executed by the processor to direct the processor to operate in accord with the invention. Those skilled in the art are familiar with instructions, processor(s), and storage media.
It is noteworthy that any hardware platform suitable for performing the processing described herein is suitable for use with the invention. The terms “computer-readable storage medium” and “computer-readable storage media” as used herein refer to any medium or media that participate in providing instructions to a CPU for execution. Such media can take many forms, including, but not limited to, non-volatile media, volatile media and transmission media. Non-volatile media include, for example, optical or magnetic disks, such as a fixed disk. Volatile media include dynamic memory, such as system RAM. Transmission media include coaxial cables, copper wire and fiber optics, among others, including the wires that comprise one embodiment of a bus. Transmission media can also take the form of acoustic or light waves, such as those generated during radio frequency (RF) and infrared (IR) data communications. Common forms of computer-readable media include, for example, a floppy disk, a flexible disk, a hard disk, magnetic tape, any other magnetic medium, a CD-ROM disk, digital video disk (DVD), any other optical medium, any other physical medium with patterns of marks or holes, a RAM, a PROM, an EPROM, an EEPROM, a FLASHEPROM, any other memory chip or cartridge, a carrier wave, or any other medium from which a computer can read.
Various forms of computer-readable media may be involved in carrying one or more sequences of one or more instructions to a CPU for execution. A bus carries the data to system RAM, from which a CPU retrieves and executes the instructions. The instructions received by system RAM can optionally be stored on a fixed disk either before or after execution by a CPU.
The above description is illustrative and not restrictive. Many variations of the invention will become apparent to those of skill in the art upon review of this disclosure. The scope of the invention should, therefore, be determined not with reference to the above description, but instead should be determined with reference to the appended claims along with their full scope of equivalents.
While the present invention has been described in connection with a series of preferred embodiments, these descriptions are not intended to limit the scope of the invention to the particular forms set forth herein. It will be further understood that the methods of the invention are not necessarily limited to the discrete steps or the order of the steps described. To the contrary, the present descriptions are intended to cover such alternatives, modifications, and equivalents as may be included within the spirit and scope of the invention as defined by the appended claims and otherwise appreciated by one of ordinary skill in the art.