TECHNICAL FIELD
The subject matter disclosed herein generally relates to the processing of data. Specifically, the present disclosure addresses systems and methods to facilitate audio identification.
BACKGROUND
A performer may give a live performance (e.g., a concert or other live show) before an audience that includes one or more individuals (e.g., audience members, fans, or concertgoers). For example, a musical soloist (e.g., a singer-songwriter) may perform at a concert before such an audience. As another example, a musical group (e.g., a rock band) may perform at a concert before such an audience. As a further example, a theater troupe (e.g., including actors, dancers, and a choir) may perform a theatrical show before such an audience.
One or more audio pieces (e.g., musical pieces or spoken word pieces) may be performed during a live performance. For example, one or more songs may be performed, and a song may be performed with or without visual accompaniment (e.g., a video, a laser show, or a dance routine). In some situations, the performer of an audio piece is an artist that recorded the audio piece (e.g., as a studio recording or as a live recording). For example, a performer may perform a song that was written and recorded by the performer herself. In other situations, the performer of an audio piece is different from the artist that recorded the audio piece (e.g., as a studio recording or as a live recording). For example, a performer may perform a cover of a song that was written and recorded by someone else.
BRIEF DESCRIPTION OF THE DRAWINGS
Some embodiments are illustrated by way of example and not limitation in the figures of the accompanying drawings.
FIG. 1 is a network diagram illustrating a network environment suitable for audio identification, according to some example embodiments.
FIG. 2 is a block diagram illustrating components of an identification machine suitable for audio identification, according to some example embodiments.
FIGS. 3-9 are flowcharts illustrating operations in a method of audio identification, according to some example embodiments.
FIG. 10 is a block diagram illustrating components of a machine, according to some example embodiments, able to read instructions from a machine-readable medium and perform any one or more of the methodologies discussed herein.
DETAILED DESCRIPTION
Example methods and systems are directed to audio identification. Examples merely typify possible variations. Unless explicitly stated otherwise, components and functions are optional and may be combined or subdivided, and operations may vary in sequence or be combined or subdivided. In the following description, for purposes of explanation, numerous specific details are set forth to provide a thorough understanding of example embodiments. It will be evident to one skilled in the art, however, that the present subject matter may be practiced without these specific details.
During a live performance (e.g., a live concert) of one or more audio pieces (e.g., songs), one or more audience members (e.g., concertgoers) may use a network-based system to identify an audio piece during its performance (e.g., while the audio piece is being performed). The network-based system may provide its users (e.g., the audience members) with one or more audio identification services. A machine may form all or part of the network-based system and may be configured (e.g., by software) to provide such identification services to one or more users (e.g., concertgoers).
The machine may be configured to obtain an identifier (e.g., a song name) of an audio piece during a performance of an audio piece (e.g., at a first time, such as five seconds into a song). The identifier may be obtained in any one or more of various ways, including, for example, receiving the identifier as a user submission (e.g., from an audience member, from a venue manager, or from the performer herself), inferring the identifier based on some received metadata of the audio piece (e.g., a partial name of the song, an album on which the song appears, or a release year of the song), inferring the identifier based on a detected geolocation of a device whose user is at the performance, tallying votes for the identifier (e.g., from several audience members), and accessing the identifier directly from a device of the performer (e.g., a mixer, a drum machine, a media player, a smartphone, or a tablet computer).
A user's device (e.g., smartphone or smart watch configured by a mobile app) may record a segment of the audio piece during its performance, generate a fingerprint of the segment, and upload the fingerprint to the machine. The machine may receive the fingerprint during the performance (e.g., at a second time, such as 15 seconds into the song) and assign the identifier to the fingerprint. This identifier may be provided to the user's device to identify the audio piece. The machine may receive additional information (e.g., one or more additional fingerprints or classifications of additional segments of the audio piece or other audio) from additional users' devices, and the machine may determine from this additional information that the audio piece has not ended (e.g., by failing to detect silence, applause, booing, or any suitable combination thereof). The machine may provide the identifier to any one or more of these additional users' devices.
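For illustration only, the following Python sketch shows how such a client-side flow might look: record a short segment, derive a fingerprint, and upload it to the network-based system. The endpoint URL, the recording and fingerprinting helpers, and the payload fields are hypothetical assumptions rather than details taken from this disclosure.

import requests

def submit_live_fingerprint(record_segment, make_fingerprint, device_id,
                            url="https://example.com/identify"):
    # record_segment and make_fingerprint are assumed helpers; the fingerprint is assumed
    # to be JSON-serializable (e.g., a list of integers).
    samples, sr = record_segment(seconds=10)       # e.g., from the device microphone
    fingerprint = make_fingerprint(samples, sr)    # e.g., a CQT-based fingerprint (see below)
    response = requests.post(url, json={"device_id": device_id, "fingerprint": fingerprint})
    response.raise_for_status()
    return response.json()                         # e.g., {"identifier": "...", "metadata": {...}}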
According to some example embodiments, another user's device (e.g., configured by a mobile app) may record another segment of the audio piece during its performance, generate another fingerprint of the segment, and submit this fingerprint to the machine as a query for identification of the audio piece. While the performance continues, the machine may receive this fingerprint during the performance (e.g., at a third time, such as 30 seconds into the song) and respond during the performance by providing the identifier, which may be based on its determination that additional information (e.g., one or more additional fingerprints or classifications of additional segments of the audio piece or other audio) from additional users' devices fails to indicate an end of the audio piece.
According to various example embodiments, the machine may be configured to identify an audio piece, even when a live version (e.g., a live cover version) of the audio piece is being performed differently from a reference version (e.g., a studio version or radio version) of the audio piece as recorded by an artist (e.g., the same as or different from the performer of the live version). The machine may receive a live fingerprint of the segment of the live version (e.g., within a query for identification of the audio piece during its performance). The fingerprinting technique used here, in contrast to traditional fingerprinting techniques that identify the exact time and frequency positions of audio events, may instead identify one or more core characteristics of the audio piece (e.g., the notes and rhythms present) and be robust to differences between the live version and a reference version of the audio piece (e.g., differences in tempo, vocal timbre, vocal strength, vibrato, instrument tuning, ambient noise, reverberation, or distortions). For example, the fingerprinting technique may be based on a chromagram that represents the harmonic structure of the live version (e.g., mapped to one octave). Such a fingerprinting technique may also be used later to identify and retrieve user-uploaded recordings from the performance (e.g., for copyright clearance purposes, to automatically tag or index such recordings, or any suitable combination thereof). The machine may identify the performer of the live version (e.g., by detecting a venue at which the live version is being performed and accessing information that correlates the detected venue with the performer).
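As a non-authoritative illustration of a chromagram-style representation that maps harmonic content onto one octave, a sketch along the following lines could be used; it assumes the librosa library and a per-frame normalization, neither of which is prescribed by this disclosure.

import librosa
import numpy as np

def chromagram_fingerprint(samples, sr=22050):
    # chroma_cqt folds the harmonic content onto 12 pitch classes (one octave).
    chroma = librosa.feature.chroma_cqt(y=samples, sr=sr)   # shape (12, n_frames)
    # Normalize each frame so the representation reflects relative note strengths,
    # rather than absolute loudness (an assumption, for robustness to level differences).
    return chroma / (np.linalg.norm(chroma, axis=0, keepdims=True) + 1e-9)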
The machine may then access a set of reference fingerprints that correspond to the artist that recorded the audio piece (e.g., based on the identified performer of the live version). For example, based on the identified performer (e.g., as well as a detected venue, a current date and time, or any suitable combination thereof), the machine may retrieve a list of audio pieces (e.g., a playlist, a concert program, or a concert brochure) that corresponds to the performer. Using the retrieved list, the machine may identify reference versions (e.g., official or canonical versions) of the audio pieces (e.g., recorded by the artist, who may be the same or different from the performer of the live version) and access reference fingerprints of the identified reference versions of the audio pieces. The reference fingerprints may have been previously generated from segments of the reference versions of the audio pieces, and among these reference fingerprints may be a reference fingerprint of the reference version of the audio piece whose live version is currently being performed.
Accordingly, the machine may compare the live fingerprint of a segment of the live version of an audio piece to the set of reference fingerprints of segments from the reference versions of the audio piece. In some example embodiments, the machine compares the live fingerprint exclusively (e.g., only) to the set of reference fingerprints. Based on this comparison, the machine may identify a match between the live fingerprint and the reference fingerprint and thus identify the audio piece while the audio piece is being performed. Thus, based on this comparison, the machine may provide an identifier of the audio piece in a response to the query for identification of the audio piece. The identifier may be provided during the performance of the live version of the audio piece.
FIG. 1 is a network diagram illustrating a network environment suitable for audio identification, according to some example embodiments. The network environment 100 includes an identification machine 110, a database 115, and devices 120, 130, 140, and 150 respectively being operated by users 122, 132, 142, and 152 in an audience, as well as a device 160 and a mixer 161 being operated by a performer 162. The identification machine 110, the database 115, the devices 120, 130, 140, 150, and 160, and the mixer 161 may all be communicatively coupled (e.g., to each other) via a network 190. The identification machine 110, with or without the database 115, may form all or part of a network-based system 105 (e.g., a cloud-based server system configured to provide one or more audio identification services to the devices 120, 130, 140, and 150, to their respective users 122, 132, 142, and 152, or to any suitable combination thereof). The identification machine 110, the database 115, the devices 120, 130, 140, 150, and 160, and the mixer 161 may each be implemented in a computer system, in whole or in part, as described below with respect to FIG. 10.
Any one or more of the users 122, 132, 142, and 152 in the audience may be a human user (e.g., a human being), a machine user (e.g., a computer configured by a software program to interact with the device 120), or any suitable combination thereof (e.g., a human assisted by a machine or a machine supervised by a human). The user 122 is not part of the network environment 100, but is associated with the device 120 and may be a user of the device 120. For example, the device 120 may be a desktop computer, a vehicle computer, a tablet computer, a navigational device, a portable media device, a smartphone, or a wearable device (e.g., a smart watch or smart glasses) belonging to the user 122. Similarly, the user 132 is not part of the network environment 100, but is associated with the device 130 and may be a user of the device 130. For example, the device 130 may be a desktop computer, a vehicle computer, a tablet computer, a navigational device, a portable media device, a smartphone, or a wearable device (e.g., a smart watch or smart glasses) belonging to the user 132.
Likewise, the user 142 is not part of the network environment 100, but is associated with the device 140. As an example, the device 140 may be a desktop computer, a vehicle computer, a tablet computer, a navigational device, a portable media device, a smartphone, or a wearable device (e.g., a smart watch or smart glasses) belonging to the user 142. Moreover, the user 152 is not part of the network environment 100, but is associated with the device 150. As an example, the device 150 may be a desktop computer, a vehicle computer, a tablet computer, a navigational device, a portable media device, a smartphone, or a wearable device (e.g., a smart watch or smart glasses) belonging to the user 152. Furthermore, the performer 162 is not part of the network environment 100, but is associated with the device 160 and the mixer 161. As an example, the device 160 may be a desktop computer, a vehicle computer, a tablet computer, a navigational device, a portable media device, a smartphone, or a wearable device (e.g., a smart watch or smart glasses) belonging to the performer 162.
The mixer 161 may be or include an audio playback device, an audio mixing device, an audio processing device, or any suitable combination thereof. According to various example embodiments, the mixer 161 may drive (e.g., output signals that represent audio information to) one or more amplifiers, speakers, or other audio output equipment in producing sound for the audience during a performance of an audio piece by the performer 162. In some example embodiments, the mixer 161 is a source of one or more segments of a reference version of an audio piece (e.g., an audio piece to be identified later during performance of the audio piece). In certain example embodiments, the mixer 161 may perform operations described herein for any one or more of the devices 120, 130, 140, and 150.
Any of the machines, databases, or devices shown in FIG. 1 may be implemented in a general-purpose computer modified (e.g., configured or programmed) by software (e.g., one or more software modules) to be a special-purpose computer to perform one or more of the functions described herein for that machine, database, or device. For example, a computer system able to implement any one or more of the methodologies described herein is discussed below with respect to FIG. 10. As used herein, a “database” is a data storage resource and may store data structured as a text file, a table, a spreadsheet, a relational database (e.g., an object-relational database), a triple store, a hierarchical data store, or any suitable combination thereof. Moreover, any two or more of the machines, databases, or devices illustrated in FIG. 1 may be combined into a single machine, and the functions described herein for any single machine, database, or device may be subdivided among multiple machines, databases, or devices.
The network 190 may be any network that enables communication between or among machines, databases, and devices (e.g., the identification machine 110 and the device 130). Accordingly, the network 190 may be a wired network, a wireless network (e.g., a mobile or cellular network), or any suitable combination thereof. The network 190 may include one or more portions that constitute a private network, a public network (e.g., the Internet), or any suitable combination thereof. Accordingly, the network 190 may include one or more portions that incorporate a local area network (LAN), a wide area network (WAN), the Internet, a mobile telephone network (e.g., a cellular network), a wired telephone network (e.g., a plain old telephone system (POTS) network), a wireless data network (e.g., WiFi network or WiMax network), or any suitable combination thereof. Any one or more portions of the network 190 may communicate information via a transmission medium. As used herein, “transmission medium” refers to any intangible (e.g., transitory) medium that is capable of communicating (e.g., transmitting) instructions for execution by a machine (e.g., by one or more processors of such a machine), and includes digital or analog communication signals or other intangible media to facilitate communication of such software.
FIG. 2 is a block diagram illustrating components of the identification machine 110, according to some example embodiments. The identification machine 110 is shown as including an identifier module 210, a reception module 220, a determination module 230, a correlation module 240, a query module 250, a result module 260, a performer module 270, a reference module 280, and a comparison module 290, all configured to communicate with each other (e.g., via a bus, shared memory, or a switch). Any one or more of the modules described herein may be implemented using hardware (e.g., one or more processors of a machine) or a combination of hardware and software. For example, any module described herein may configure a processor (e.g., among one or more processors of a machine) to perform the operations described herein for that module. Moreover, any two or more of these modules may be combined into a single module, and the functions described herein for a single module may be subdivided among multiple modules. Furthermore, according to various example embodiments, modules described herein as being implemented within a single machine, database, or device may be distributed across multiple machines, databases, or devices.
FIGS. 3-9 are flowcharts illustrating operations in a method 300 of audio identification (e.g., of an audio piece during a live performance of the audio piece), according to some example embodiments. FIG. 3 illustrates some interactions between the identification machine 110 and the device 120 (e.g., a first device) during a performance of the audio piece by the performer 162. These illustrated interactions may form a portion of the method 300, according to various example embodiments, or may form a separate method in its entirety, according to alternative example embodiments.
Operation 310 may be performed at or near the beginning of the performance (e.g., at a first time, such as five or ten seconds into the performance) of the audio piece. In operation 310, the identifier module 210 of the identification machine 110 obtains an identifier of the audio piece. The identifier may be a title of the audio piece (e.g., a song name). As discussed below with respect to FIG. 6, the identifier may be obtained in any of several ways.
In operation 317, the device 120 (e.g., the first device) records a live segment of the audio piece being performed. For example, the live segment may be recorded by a microphone built into the device 120. According to various example embodiments, operation 317 may be performed at any point during the performance of the audio piece.
In operation 318, the device 120 generates a live fingerprint of the live segment recorded in operation 317. For example, the device 120 may apply one or more audio fingerprinting techniques (e.g., algorithms) to generate the live fingerprint. In some example embodiments, the audio fingerprinting technique (e.g., a first technique) used by the device 120 in operation 318 is designated or selected (e.g., by the identification machine 110) as a default technique and may be designated or selected based on the presence or absence of processing power, available memory, or both, in the device 120.
In operation 319, the device 120 communicates (e.g., sends) the generated live fingerprint to the identification machine 110 (e.g., via the network 190). In corresponding operation 320, the reception module 220 of the identification machine 110 accesses (e.g., receives) the generated live fingerprint communicated by the device 120 (e.g., at a second time, such as 15 or 20 seconds into the performance).
According to certain example embodiments, operations 317-319 are performed by the device 160 of the performer 162, or by the mixer 161. Thus, in operation 320, the reception module 220 of the identification machine 110 may access the generated live fingerprint as communicated by the device 160, or by the mixer 161 (e.g., at the second time). In some cases, the audio piece includes multiple audio channels (e.g., 64 separate audio channels being input into the mixer 161, including a monophonic audio channel for a lead guitar, a monophonic audio channel for a bass guitar, left and right stereo audio channels for a synthesizer keyboard, and eight monophonic microphone channels for a drum kit). According to various example embodiments, the entire mix of these multiple channels is used for generating the live fingerprint in operation 318. In some example embodiments, the generating of the live fingerprint in operation 318 may be based on less than all of these multiple audio channels (e.g., generated from a subset of the multiple audio channels). For example, the live fingerprint may be generated exclusively from a monophonic audio channel for lead guitar.
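Purely as an illustration of fingerprinting from a subset of the available audio channels, a mixdown step might look like the following sketch; the channel names, the equal-length assumption, and the fallback behavior are assumptions rather than details from this disclosure.

import numpy as np

def mixdown_for_fingerprinting(channels, selected=("lead_guitar",)):
    # channels: dict mapping channel name -> 1-D sample array (assumed equal length).
    picked = [channels[name] for name in selected if name in channels]
    if not picked:
        # Fall back to the entire mix if none of the requested channels is available.
        picked = list(channels.values())
    return np.mean(np.stack(picked), axis=0)   # mono mixdown used as fingerprint input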
Some example embodiments of the method 300 include operation 328. In operation 328, the determination module 230 of the identification machine 110 determines that the performance has not yet been completed (e.g., has not finished or is not yet done). As discussed below with respect to FIG. 7, this determination may be made by determining that one or more live fingerprints of segments of the audio piece being performed fail to indicate an end of the audio piece, an end of the performance of the audio piece, or both. Since the performance is not completed, the determination module 230 may determine that the respective times at which the identifier of the audio piece and the live fingerprint were accessed (e.g., the first time and the second time) occurred during the performance of the audio piece.
In operation 330, the correlation module 240 of the identification machine 110 assigns the identifier obtained in operation 310 to the live fingerprint received in operation 320. This may be based on the determination in operation 328 that the performance is not over. Accordingly, the correlation module 240 may assign the identifier to the live fingerprint based on an inference that the performance of the audio piece is still ongoing (e.g., continuing).
In operation 332, the query module 250 of the identification machine 110 accesses metadata of the audio piece (e.g., from the database 115). For example, the query module 250 may generate a query based on (e.g., inclusive of) the identifier assigned to the live fingerprint in operation 330. In some example embodiments, the query is generated based on the determination in operation 328 that the performance is not finished. The query module 250 may submit the generated query to the database 115, and in response, the database 115 may provide the query module 250 with the metadata of the audio piece or access thereto.
In operation 340, the result module 260 of the identification machine 110 provides the identifier and some or all of the metadata to the device 120 (e.g., via the network 190), during the performance of the audio piece. For example, the result module 260 may communicate all or part of the identifier obtained in operation 310 and all or part of the metadata accessed in operation 332 to the device 120 (e.g., for presentation thereon, in whole or in part, to the user 122). In corresponding operation 341, the device 120 accesses (e.g., receives) the information that was communicated (e.g., via the network 190) from the result module 260 in operation 340.
FIG. 4 illustrates some interactions between the identification machine 110 and the device 130 (e.g., a second device) during the same performance of the audio piece by the performer 162. These illustrated interactions may form a portion of the method 300, according to various example embodiments, or may form a separate method in its entirety, according to alternative example embodiments.
In operation 417, the device 130 (e.g., the second device) records a live segment of the audio piece being performed. For example, the live segment may be recorded by a microphone built into the device 130.
In operation 418, the device 130 generates a live fingerprint of the live segment recorded in operation 417. For example, the device 130 may apply one or more audio fingerprinting techniques to generate the live fingerprint. In some example embodiments, the audio fingerprinting technique (e.g., a first technique) to be used by the device 130 in operation 418 has been designated or selected (e.g., by the identification machine 110) as a default technique and may be so designated or selected based on the presence or absence of processing power, available memory, or both, in the device 130. However, in alternative example embodiments, the audio fingerprinting technique (e.g., a second technique) to be used by the device 130 in operation 418 is a different (e.g., non-default) technique and may be so designated or selected based on the presence or absence of processing power, available memory, or both, in the device 130.
In some cases, the audio piece includes multiple audio channels (e.g., 64 separate audio channels, including a monophonic audio channel for a lead guitar, a monophonic audio channel for a bass guitar, left and right stereo audio channels for a synthesizer keyboard, and eight monophonic microphone channels for a drum kit). In some example embodiments, the generating of the live fingerprint in operation 418 may be based on less than all of these multiple audio channels (e.g., generated from a subset of the multiple audio channels). For example, the live fingerprint may be generated exclusively from a monophonic audio channel for lead guitar. As another example, the live fingerprint may be generated exclusively from a monophonic vocal track (e.g., using vocal melody and lyrics for generating the live fingerprint). According to various example embodiments, the live fingerprint is generated from one or more audio channels that are dominant throughout the audio piece, which may facilitate reliable and consistent identification of the audio piece.
In operation 419, the device 130 communicates the generated live fingerprint to the identification machine 110 (e.g., via the network 190). The live fingerprint may be communicated in a query for identification of the audio piece, and such a query may be submitted from the device 130 to the network-based system 105 during the performance of the audio piece. In corresponding operation 420, the reception module 220 of the identification machine 110 accesses the generated live fingerprint communicated by the device 130 (e.g., at a third time, such as 30 or 35 seconds into the performance).
In operation 428, the determination module 230 of the identification machine 110 determines that the performance is not done (e.g., not yet ended, completed, finished, or over). As discussed in greater detail below with respect to FIG. 7, this determination may be made by determining that one or more live fingerprints of segments of the audio piece being performed fail to indicate an end of the audio piece, an end of the performance of the audio piece, or both.
In operation 440, the result module 260 of the identification machine 110 provides the identifier (e.g., assigned in operation 330) and some or all of the metadata to the device 130 (e.g., via the network 190). For example, the result module 260 may communicate all or part of the identifier obtained in operation 310 and all or part of the metadata accessed in operation 332 to the device 130 (e.g., for presentation thereon, in whole or in part, to the user 132). In corresponding operation 441, the device 130 accesses the information that was communicated from the result module 260 in operation 440. This may have the effect of providing the identifier of the audio piece in a response to the query for identification of the audio piece, during the performance of the audio piece. According to various example embodiments, the identifier may be accompanied by additional information (e.g., metadata of the audio piece). Such additional information may include lyrics, album art, original release year, original composer, other performers of the audio piece, or other metadata of the audio piece, as well as an offer to sell a recording (e.g., original or non-original) of the audio piece.
In some example embodiments, the identifier may be accompanied by an authorization, such as an authorization to access backstage passes or a merchandise offer (e.g., for free or discounted merchandise related to the audio piece, to the performer, or to both). In various example embodiments, the authorization enables software (e.g., an application, an applet, or a mobile app) executing on the device 130 to access special content that may be presented on the device 130 (e.g., on a screen of the device 130). Examples of such special content include screen lighting or imagery (e.g., a slideshow or background image), a game (e.g., a single-player or multiplayer quiz or treasure hunt), or any suitable combination thereof. For example, a game may challenge the user 132 to win a prize (e.g., an album on compact disc (CD) or as a music download, exclusive video footage, a t-shirt, or other merchandise item) by correctly identifying multiple audio pieces performed by the performer 162 or by being the first to correctly identify all songs released on a specific album.
FIG. 5 illustrates some interactions between the identification machine 110 and the device 140 (e.g., a third device) during a live performance of an audio piece by the performer 162. In some example embodiments, the live performance is the same performance discussed above with respect to FIGS. 3-4. In certain example embodiments, the performer 162 is performing a live version (e.g., a live cover version) of an audio piece differently from a reference version (e.g., a studio version or radio version) of the audio piece as recorded by an artist who may be the same or different from the performer 162 of the live version. These illustrated interactions may form a portion of the method 300, according to various example embodiments, or may form a separate method in its entirety, according to alternative example embodiments. For example, in some example embodiments, the identification machine 110 performs only operations 520, 530, 540, 550, and 560 (e.g., in response to performance of operations 517, 518, and 519 by the device 140), without performing any operations described above with respect to FIGS. 3 and 4.
In operation 517, the device 140 (e.g., the third device) records a live segment of the audio piece being performed. For example, the live segment may be recorded by a microphone built into the device 140. In particular, the device 140 may record a live segment of a live version (e.g., a live cover version) of the audio piece, as the live version of the audio piece is being performed. As another example, the live segment may be received (e.g., as a digital feed, a network stream, a broadcast signal, or any suitable combination thereof) by the device 140 via the network 190 (e.g., from the identification machine 110, the device 160, or the mixer 161).
In operation 518, the device 140 generates a live fingerprint of the live segment recorded in operation 517. For example, the device 140 may apply one or more audio fingerprinting techniques to generate the live fingerprint. In some example embodiments, the audio fingerprinting technique (e.g., a first technique) to be used by the device 140 in operation 518 is designated or selected (e.g., by the identification machine 110) as a default technique and may be so designated or selected based on the presence or absence of processing power, available memory, or both, in the device 140. However, in alternative example embodiments, the audio fingerprinting technique (e.g., a second technique) to be used by the device 140 in operation 518 is a different (e.g., non-default) technique and may be so designated or selected based on the presence or absence of processing power, available memory, or both, in the device 140. In some example embodiments, the audio fingerprinting technique (e.g., the second technique) is particularly suitable for live version identification and may implement one or more image processing techniques to derive fingerprints that are robust to both audio degradations and audio variations, while still being compact enough for efficient matching. Further details on such an audio fingerprinting technique are provided below.
In operation 519, the device 140 communicates the generated live fingerprint to the identification machine 110 (e.g., via the network 190). The live fingerprint may be communicated in a query for identification of the audio piece, and such a query may be submitted from the device 140 to the network-based system 105 during the performance of the audio piece (e.g., the live version of the audio piece). In corresponding operation 520, the reception module 220 of the identification machine 110 accesses the generated live fingerprint communicated by the device 140 (e.g., at any point in time during the performance of the audio piece, such as 5, 10, 15, 20, 30, 40, or 45 seconds into the performance).
In operation 530, the performer module 270 of the identification machine 110 identifies the performer of the live version of the audio piece. For example, the performer module 270 may detect the venue of the live performance (e.g., the place or location where the live performance is occurring) and identify the performer based on the detected venue (e.g., by accessing information, which may be stored in the database 115, that correlates the performer with the venue). For example, the detected venue may be a concert hall, an auditorium, a hotel, a conference room, a resort, a school, a theater, an amphitheater, a fairground, a sports arena, a stadium, a private residence, or any suitable combination thereof. As discussed below with respect to FIG. 8, the detection of the venue may be based on a geolocation (e.g., Global Positioning System (GPS) coordinates) of the device 140, an identifier (e.g., Internet protocol (IP) address) of a network (e.g., network 190) at the venue (e.g., a local wireless network at the venue), an image (e.g., photo) of a ticket stub for an event that includes the live performance (e.g., generated by the device 140 and accessed by the performer module 270), a user preference for the venue (e.g., stored in a user profile of the user 142), social network data that references the venue (e.g., publicly or privately published in a microblog entry by the user 142), a calendar event of the user 142, a purchase record of the user 142 (e.g., for tickets to an event that includes the live performance), or any suitable combination thereof. In further example embodiments, the venue may be detected by detecting that the device 140 is executing a special application that corresponds to the venue, is accessing a specific uniform resource locator (URL) that corresponds to the venue, or any suitable combination thereof.
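As one hedged illustration of venue detection from a device geolocation, the following sketch compares the reported coordinates against a table of known venues using the great-circle distance; the venue table, the distance cutoff, and the helper names are assumptions, not part of this disclosure.

import math

def haversine_km(lat1, lon1, lat2, lon2):
    # Great-circle distance between two latitude/longitude points, in kilometers.
    r = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = math.radians(lat2 - lat1), math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def detect_venue(device_lat, device_lon, venues, max_km=0.5):
    # venues: iterable of (venue_name, latitude, longitude) records (assumed available).
    best = min(venues, key=lambda v: haversine_km(device_lat, device_lon, v[1], v[2]))
    if haversine_km(device_lat, device_lon, best[1], best[2]) <= max_km:
        return best[0]
    return None   # no known venue close enough to the device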
In operation 540, the reference module 280 of the identification machine 110 accesses a set of one or more reference fingerprints based on the performer identified in operation 530. Furthermore, the accessing of the reference fingerprints may also be based on the detected venue at which the live version is being performed, a current date, a current time, or any suitable combination thereof. As noted above, the reference module 280 may retrieve a list of audio pieces (e.g., a playlist, a concert program, a concert brochure, or a concert poster) for the performer (e.g., based on the detected venue and the current date and current time). Based on this retrieved list, the reference module 280 may identify reference versions (e.g., official or canonical versions) of the audio pieces that correspond to the performer (e.g., and corresponding to the detected venue, the current date, the current time, or any suitable combination thereof). The database 115 may store these reference fingerprints, which may have been previously generated from segments of the reference versions of the audio pieces. Among these reference fingerprints may be a reference fingerprint (e.g., a particular reference fingerprint) of a reference version of the audio piece of which a live version is currently being performed. The set of reference fingerprints may be accessed from the database 115, which may correlate (e.g., assign, map, or link) the reference fingerprint (e.g., the particular reference fingerprint) of the reference version with the identifier of the audio piece (e.g., as assigned in operation 330). According to various example embodiments, operation 540 may be performed at any point prior to operation 550 (e.g., before the performance of the audio piece). In example embodiments in which operation 540 is performed prior to the beginning of the performance, the accessing of the reference fingerprints may be based on a scheduled date and time for the performance itself.
In operation 550, the comparison module 290 of the identification machine 110 identifies the audio piece being performed by comparing the live fingerprint (e.g., accessed in operation 520) to the set of reference fingerprints (e.g., accessed in operation 540). In other words, the comparison module 290 may compare the live fingerprint of a segment of the live version to the reference fingerprints of segments of the reference versions. In some example embodiments, the comparison module 290 compares the live fingerprint exclusively (e.g., only) to the set of reference fingerprints or a subset thereof. This may have the effect of reducing computational complexity, increasing computational speed, increasing accuracy, or any suitable combination thereof. Based on this comparison, the comparison module 290 may identify a match between the live fingerprint and the reference fingerprint (e.g., the particular reference fingerprint) of the reference version of the audio piece of which the live version is currently being performed. Based on this identifying of the match, the comparison module 290 may identify the audio piece while its live version is being performed. In some example embodiments, the identified match between the live fingerprint and the reference fingerprint may be an imperfect match (e.g., a fuzzy match or a near match).
According to various example embodiments, operation 550 includes performing an analysis of musically meaningful and unique features of the audio piece, and then performing a loose comparison that allows for differences in the playing and interpretation of the audio piece (e.g., different instrumentation, tempo, or intonation). In some example embodiments, operation 550 includes determining harmonic and rhythmic elements from the live fingerprint and the set of reference fingerprints and comparing these elements to find a most likely candidate match among the set of reference fingerprints. Such an analysis and comparison may be performed within a predetermined period of time (e.g., a 10 second window). In some situations, the analysis and comparison are performed in short segments (e.g., 3 second segments). The analysis and comparison may be performed until a single match (e.g., best candidate) is found, or until the analysis and comparison converge to obtain a stabilized list of a few candidate matches. For example, multiple candidate matches may be identified in situations where the set of reference fingerprints includes reference fingerprints from multiple different recordings of the audio piece (e.g., studio recordings, live recordings, and variations, such as acoustic versions or extended remixes).
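A loose, illustrative sketch of such a comparison loop is shown below; the scoring function, the lead margin, and the convergence rule are assumptions intended only to show how a single best candidate or a short list of candidates might be obtained.

def identify_audio_piece(live_fp, reference_fps, score_fn, lead_margin=0.15):
    # reference_fps: dict mapping audio-piece identifier -> reference fingerprint.
    # score_fn: assumed similarity function returning a value in [0, 1].
    scores = {piece_id: score_fn(live_fp, ref_fp) for piece_id, ref_fp in reference_fps.items()}
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    best_id, best_score = ranked[0]
    # Treat the match as settled only if the best candidate clearly outscores the runner-up.
    if len(ranked) == 1 or best_score - ranked[1][1] >= lead_margin:
        return best_id, [best_id]
    # Otherwise, return a stabilized short list of candidates (e.g., multiple recordings of the piece).
    candidates = [pid for pid, s in ranked if best_score - s < lead_margin]
    return None, candidates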
In operation 560, the result module 260 of the identification machine 110 provides the identifier (e.g., as assigned in operation 330) of the identified audio piece to the device 140 (e.g., by the network 190). The identifier may be provided with some or all of the metadata for the audio piece. For example, the result module 260 may communicate all or part of the identifier obtained in operation 310 and all or part of the metadata accessed in operation 332 to the device 140 (e.g., for presentation thereon, in whole or in part, to the user 142). In corresponding operation 561, the device 140 accesses the information that was communicated from the result module 260 in operation 560. This may have the effect of providing the identifier of the audio piece in a response to the query for identification of the audio piece, during the performance of the live version of the audio piece. In example embodiments where the identified match between the live fingerprint and the reference fingerprint is an imperfect match (e.g., fuzzy match), the identifier may be provided as a candidate identifier (e.g., a proposed identifier) among multiple candidate identifiers (e.g., for confirmation by the user 142 via the device 140). For example, a candidate identifier may be provided as part of a game (e.g., a trivia quiz) in which multiple users (e.g., users 132, 142, and 152) attempt to identify the audio piece by selecting the correct candidate identifier from among multiple candidate identifiers presented.
As mentioned above, the audio fingerprinting technique used (e.g., by the device 140) for identifying the live version of the audio piece may be particularly well-suited for generating fingerprints that are robust to both audio degradations and audio variations, while still being compact enough for efficient matching. Such a fingerprint may be derived from a segment of an audio piece (e.g., a live segment or a reference segment) by first using a log-frequency spectrogram to capture the melodic similarity and handle key variations, and then using adaptive thresholding to reduce the feature size and handle noise degradations and local variations.
First, the segment is transformed into a time-frequency representation, such as a log-frequency spectrogram based on the Constant Q Transform (CQT). The CQT is a transform with a logarithmic frequency resolution, similar to the human auditory system and consistent with the notes of the Western music scale. Accordingly, the CQT may be well-suited for music analysis. The CQT may handle key variations relatively easily, since pitch deviations correspond to frequency translations in the transform. According to certain example embodiments, the CQT is computed by using a fast algorithm based on the Fast Fourier Transform (FFT) in conjunction with the use of a kernel. Thus, a CQT-based spectrogram may be derived by using a time resolution of around 0.13 seconds per time frame and a frequency resolution of one quarter tone per frequency channel, with a frequency range spanning from C3 (130.81 Hz) to C8 (4186.01 Hz), resulting in 120 frequency channels.
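For illustration, a CQT-based spectrogram with these approximate parameters could be computed with the librosa library as in the following sketch; the library choice, the sample rate, and the exact hop length are assumptions, not requirements of this disclosure.

import librosa
import numpy as np

def cqt_spectrogram(samples, sr=22050):
    # 5 octaves (C3-C8) at 24 bins per octave (quarter-tone resolution) -> 120 channels.
    hop = 2816  # about 0.128 s per frame at 22.05 kHz; divisible by 2**(n_octaves - 1) as librosa expects
    cqt = librosa.cqt(
        samples,
        sr=sr,
        hop_length=hop,
        fmin=librosa.note_to_hz("C3"),   # 130.81 Hz
        n_bins=120,
        bins_per_octave=24,
    )
    return np.abs(cqt)   # magnitude spectrogram, shape (120, n_frames)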
Next, the CQT-based spectrogram may be transformed into a binary image. According to various example embodiments, this is performed using an adaptive thresholding method based on two-dimensional median filtering. Thresholding is a technique for image segmentation that uses a threshold value to turn a grayscale image into a binary image. In adaptive thresholding, the threshold value for each pixel of an image may be adapted based on local statistics of the pixel's neighborhood. For each time-frequency bin in the CQT-based spectrogram, given a window size, the median of the neighborhood may be computed. As an example, the window size may be 35 frequency channels by 15 time frames. Then, the value of the bin may be compared with the value of its median. If the value of the bin is higher than its median, the value of the bin may be assigned to 1. Otherwise, the value of the bin may be assigned to 0. This process may be restated as the following equation: B(i, j)=1 if X(i, j)>M(i, j), and B(i, j)=0 otherwise, where X is the CQT-based spectrogram, M is the median computed over the window centered at bin (i, j), and B is the resulting binary image.
Accordingly, the CQT-based spectrogram may be clustered into foreground (e.g., with assigned values of one), where the energy is locally high, and background (e.g., with assigned values of zero), where the energy is locally low. The result may therefore be used as a compact fingerprint (e.g., a CQT-based fingerprint) that can handle noise degradations while still allowing local variations.
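The adaptive-thresholding step described above could be sketched as follows, using a two-dimensional median filter from SciPy; the library choice and the boundary-handling mode are assumptions.

import numpy as np
from scipy.ndimage import median_filter

def binarize_spectrogram(spec, window=(35, 15)):
    # spec: array of shape (frequency_channels, time_frames).
    # window: neighborhood of 35 frequency channels by 15 time frames, as in the example above.
    local_median = median_filter(spec, size=window, mode="reflect")
    # 1 = foreground (bin value above its local median), 0 = background.
    return (spec > local_median).astype(np.uint8)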
Such compact (e.g., CQT-based) fingerprints may be used to perform comparisons and matching between a query fingerprint and one or more reference fingerprints. As an example, template matching may be performed (e.g., by the comparison module 290 during operation 550) between query and reference fingerprints by first using Hamming similarity to compare all pairs of time frames at different pitch shifts and handle key variations, and then using the Hough Transform to find the best alignment and handle tempo variations.
First, a similarity matrix may be computed between a query fingerprint and a reference fingerprint. As noted above, Hamming similarity may be calculated between all pairs of time frames in the query fingerprint and the reference fingerprints. The Hamming similarity is the percentage of bins that match between two arrays (e.g., arrays of ones and zeroes). In some example embodiments, the query and reference fingerprints are converted according to the function ƒ(x)=2x−1. Then, the matrix product of the query and reference fingerprints may be computed. This matrix product may then be converted according to the function ƒ−1(x)=(x+1)/2, and each value may be normalized by the number of frequency channels in one fingerprint. Each bin in the resulting matrix then measures the Hamming similarity between any two pairs of time frames in the query and reference fingerprints. The similarity matrix for different pitch shifts in the query may also be computed. In some cases, a number of ±10 pitch shifts may be used (e.g., assuming a maximum key variation of ±5 semitones between a live performance and its studio version). This may have the effect of measuring the similarity of both the foregrounds and the backgrounds between the query and reference fingerprints, which may be beneficial in identifying an audio piece.
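As an illustrative sketch of this similarity-matrix computation (without the additional pitch-shifted variants), the per-frame Hamming similarity could be computed as follows; combining the conversion and the normalization into one expression is an implementation choice, not a requirement of this disclosure.

import numpy as np

def hamming_similarity_matrix(query_fp, reference_fp):
    # query_fp, reference_fp: binary arrays of shape (frequency_channels, time_frames).
    n_channels = query_fp.shape[0]
    q = 2.0 * query_fp - 1.0            # f(x) = 2x - 1, mapping {0, 1} to {-1, +1}
    r = 2.0 * reference_fp - 1.0
    product = q.T @ r                    # shape (query_frames, reference_frames)
    # Applying f^-1 and normalizing by the channel count yields the fraction of matching bins.
    return (product / n_channels + 1.0) / 2.0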
Next, the best alignment between the query fingerprint and the reference fingerprint may be identified. For example, the best alignment may correspond to a line that is at or near an angle of 45° in the similarity matrix and that intersects the bins with the largest calculated Hamming similarity. Such a line may be parametrically represented as ρ=x cos θ+y sin θ. As noted above, the Hough Transform may be used to determine the best alignment. The Hough Transform is a technique for detecting shapes (e.g., lines) in an image by building a parameter space matrix and identifying the parameter candidates that give the largest values. In some example embodiments, the similarity matrix computed above may be binarized based on a threshold value. The Hough Transform may then be computed, and the (ρ,θ) candidate that gives the largest normalized value in the parameter space matrix may be identified (e.g., as the highest overall Hamming similarity). As examples, the threshold value may be 0.6; a range for ρ may be equal to the number of time frames in the reference fingerprints; and a range for θ may be around −45°±5°, which may correspond to a number of ±10 time shifts (e.g., assuming a maximum tempo variation of ±20% between a live performance and its studio version). This may have the effect of identifying a short and noisy excerpt (e.g., recorded from a smartphone at a live performance) by comparing the excerpt to a database of studio recordings from a known performer or known artist. According to certain example embodiments, no hash functions are used in the above fingerprinting and matching techniques. This may have the effect of obtaining greater accuracy. In situations with relatively short queries (e.g., segments of audio less than 10 seconds in duration) and relatively small databases (e.g., 50-100 songs per artist or performer), the lack of hash functions may provide such increased accuracy without sacrificing system performance.
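One possible, non-authoritative way to implement this alignment search is sketched below using the Hough line transform from scikit-image; the library choice, the angle sampling, and the peak-selection details are assumptions rather than prescribed steps.

import numpy as np
from skimage.transform import hough_line, hough_line_peaks

def best_alignment(similarity, threshold=0.6, angle_window_deg=5.0):
    # Binarize the similarity matrix, then look for the dominant line near -45 degrees
    # (roughly a +/-20% tempo variation, per the example above).
    binary = similarity >= threshold
    thetas = np.deg2rad(np.linspace(-45.0 - angle_window_deg, -45.0 + angle_window_deg, 41))
    hspace, angles, dists = hough_line(binary, theta=thetas)
    _, best_angles, best_dists = hough_line_peaks(hspace, angles, dists, num_peaks=1)
    return best_dists[0], best_angles[0]   # (rho, theta) of the best-aligned line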
As shown in FIG. 6, the method 300 or portions thereof may include one or more of operations 610, 620, 630, 640, 650, and 660. One or more of operations 610, 620, 630, 640, 650, and 660 may be performed as part (e.g., a precursor task, a subroutine, or a portion) of operation 310, in which the identifier module 210 obtains the identifier of the audio piece. In operation 610, according to some example embodiments, the identifier module 210 receives the identifier in a user submission from the user 122 via the device 120 (e.g., the first device). For example, the user 122 may be a manager, promoter, moderator, or other authoritative person for the event in which the live performance occurs, and the user 122 may submit the identifier to the network-based system 105 (e.g., so that other users 132, 142, and 152 may be able to receive the identifier on their respective devices 130, 140, and 150). In some example embodiments, the identifier is received from the device 160 of the performer 162, the mixer 161, or any suitable combination thereof.
In operation 620, according to certain example embodiments, the identifier module 210 receives some metadata of the audio piece (e.g., without the identifier of the audio piece) from the device 120 (e.g., the first device, as a user submission). Such metadata may include one or more descriptors of the audio piece (e.g., an artist name, an album name, a release year, or a genre). For example, the user 122 may be an audience member that does not know the identifier of the audio piece, but knows at least some metadata of the audio piece (e.g., the artist name, the album name, the release year, the genre, or even a portion of the identifier of the audio piece). In such a situation, the user 122 may submit what he knows to the network-based system 105. This operation may be repeated for additional users (e.g., user 152) to obtain additional metadata of the audio piece. The metadata received in operation 620 (e.g., from one or more users 122 and 152) may be a basis (e.g., a sufficient basis) for the identifier module 210 to obtain the identifier of the audio piece (e.g., from the database 115, which may correlate the metadata with the identifier of the audio piece). In some example embodiments, the metadata is received from the device 160 of the performer 162, the mixer 161, or any suitable combination thereof.
In operation 630, the identifier module 210 detects a geolocation of the device 120 (e.g., the first device). This may be performed based on an indication that the user 122 has made the device 120 available for location-based services (e.g., stored by the database 115 in a user profile for the user 122). The detected geolocation may be a basis (e.g., a sufficient basis) for the identifier module 210 to obtain the identifier of the audio piece (e.g., from the database 115, which may correlate the location of the venue at which the audio piece is being performed with the identifier of the audio piece).
In operation 640, the identifier module 210 queries the database 115 for the identifier of the audio piece. This query may be made based on the metadata of the audio piece received in operation 620 (e.g., one or more descriptors of the audio piece), the geolocation of the device 120 (e.g., the first device) detected in operation 630, or any suitable combination thereof.
In some cases, the identifier module 210 may have performed multiple instances of operation 610 and received multiple submissions that attempt to submit the identifier of the audio piece (e.g., submissions that include both correct and incorrect identifiers). In situations where the multiple submissions are not unanimous, the identifier module 210 performs operation 650 by tallying votes for the identifier of the audio piece. For example, the identifier module 210 may count the quantity of submissions received for each distinct identifier. In some example embodiments, the identifier with the most votes is selected by the identifier module 210 as the identifier of the audio piece in operation 310. In alternative example embodiments, an identifier with less than the largest number of votes is selected based on results from one or more of operations 620, 630, and 640. In some example embodiments, one or more of the devices 120, 130, 140, and 150 may execute software that implements a game (e.g., a multiplayer quiz or trivia game) that solicits the multiple submissions that attempt to submit the identifier of the audio piece. For example, a game may challenge the users 122, 132, 142, and 152 to win a prize (e.g., an album on CD) by correctly identifying multiple audio pieces performed by the performer 162 or by being the first to correctly identify all songs released on a specific album.
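As a minimal illustration of the vote-tallying approach, the counting step could look like the following sketch; the case-folding normalization and the helper name are assumptions.

from collections import Counter

def tally_identifier_votes(submissions):
    # submissions: iterable of identifier strings submitted by audience members.
    votes = Counter(s.strip().lower() for s in submissions if s)
    identifier, count = votes.most_common(1)[0]   # identifier with the most votes
    return identifier, count

# Example: tally_identifier_votes(["Song A", "song a", "Song B"]) -> ("song a", 2)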
In operation 660, the identifier module 210 accesses the identifier of the audio piece (e.g., directly or indirectly) from the device 160 of the performer 162, the mixer 161, or any suitable combination thereof. For example, in a nightclub environment, the device 160 may be a computer operated by a disc jockey (DJ) and configured to play the audio piece (e.g., execute the performance of the audio piece). As another example, the mixer 161 may be or include a computer that executes audio mixing software (e.g., programmed with a list of song names and start times). The identifier module 210 may thus obtain (e.g., read) the identifier of the audio piece based on a playlist, current date, current time, or any suitable combination thereof. In some example embodiments, the identifier module 210 receives the identifier in response to an event within an audio renderer that is executing on the device 160, the mixer 161, or both. Examples of such an event include a play event, a stop event, a pause event, a scratch event, a playback position timer event, or any suitable combination thereof.
As shown in FIG. 7, the method 300 or portions thereof may include one or more of operations 710, 720, 722, 724, and 726. In particular, example embodiments of the method 300 that include one or more of operations 328 and 428 may include operations 710 and 720. As noted above, operations 328 and 428 involve the determination module 230 of the identification machine 110 determining that the performance of the audio piece is not done. This determination may be made by determining that one or more live fingerprints of segments of the audio piece being performed fail to indicate an end of the audio piece, an end of the performance of the audio piece, or both.
In operation 710, the reception module 220 of the identification machine 110 accesses (e.g., receives) one or more live fingerprints of segments of the audio piece. These live fingerprints may be received from one or more devices (e.g., devices 120, 130, 140, and 150), and these received live fingerprints may be used by the determination module 230 in performing operation 328, operation 428, or both. Accordingly, operation 710 may be performed any number of times between operations 310 and 320 and any number of times between operations 310 and 420.
Operation 720 may be performed as part of operation 328, in which the determination module 230 determines that the performance of the audio piece is not done. In some example embodiments, operation 720 may be performed as part of operation 428, which is similar to operation 328. In operation 720, the determination module 230 determines that the live fingerprints received in operation 710 fail to indicate an end of the audio piece (e.g., that the fingerprints fail to indicate that the performance of the audio piece has ended). One or more of operations 722, 724, and 726 may be performed as part of operation 720.
In operation 722, the determination module 230 fails to detect silence beyond a threshold period of time (e.g., a first threshold duration corresponding to a period of silence indicative of an end of a performance). Thus, the determination in operation 720 that the performance is not over may be based on an absence of silence that lasts longer than this threshold period of time.
In operation 724, the determination module 230 fails to detect applause beyond a threshold period of time (e.g., a second threshold duration corresponding to a period of clapping or cheering indicative of an end of the performance). Thus, the determination in operation 720 that the performance is not over may be based on an absence of applause that lasts longer than this threshold period of time.
In operation 726, the determination module 230 fails to detect booing beyond a threshold period of time (e.g., a third threshold duration corresponding to a period of groaning or jeering indicative of an end of the performance). Thus, the determination in operation 720 that the performance is not over may be based on an absence of booing that lasts longer than this threshold period of time.
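As a hedged sketch of how the absence of one such end-of-performance cue (here, sustained silence) might be checked on raw audio, the following illustration treats the performance as ongoing unless a long low-energy stretch is found; the frame length, the RMS threshold, and the minimum silence duration are assumptions, not parameters specified by this disclosure.

import numpy as np

def performance_still_ongoing(samples, sr, silence_rms=0.01, min_silence_seconds=3.0):
    # samples: 1-D numpy array of audio samples; sr: sample rate in Hz.
    frame = int(0.1 * sr)                      # 100 ms analysis frames
    n_frames = len(samples) // frame
    rms = np.array([
        np.sqrt(np.mean(samples[i * frame:(i + 1) * frame] ** 2)) for i in range(n_frames)
    ])
    silent = rms < silence_rms
    # Find the longest run of consecutive silent frames.
    longest, run = 0, 0
    for flag in silent:
        run = run + 1 if flag else 0
        longest = max(longest, run)
    # True means no sufficiently long silence has been observed yet (performance not over).
    return longest * 0.1 < min_silence_seconds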
As shown in FIG. 8, the method 300 or portions thereof may include one or more of operations 831, 832, 833, 834, 835, 836, and 837. One or more of operations 831-837 may be performed as part of operation 530, in which the performer module 270 of the identification machine 110 may identify the performer by detecting the venue of the performance of the audio piece (e.g., a live performance of a live version of the audio piece).
In operation 831, the performer module 270 accesses a geolocation (e.g., GPS coordinate) of the device 140 (e.g., the third device) from which the live fingerprint was received in operation 520. In some example embodiments, the geolocation is received with the live fingerprint in operation 520.
In operation 832, the performer module 270 accesses an identifier of a network at the venue (e.g., an IP address or a domain name of the network 190) from the device 140 (e.g., the third device) from which the live fingerprint was received in operation 520. Such a network may be or include a local wireless network at the venue. For example, the identifier may identify the network 190 to which the device 140 is communicatively coupled. In some example embodiments, the identifier of the network 190 is received with the live fingerprint in operation 520.
In operation 833, the performer module 270 accesses an image (e.g., a photo) of a ticket stub for an event that includes the live performance of the audio piece. For example, such an image may be generated (e.g., captured or taken) by a built-in camera within the device 140 (e.g., the third device) from which the live fingerprint was received in operation 520. In some example embodiments, the image of the ticket stub is received with the live fingerprint in operation 520.
In operation 834, the performer module 270 accesses a user preference for the venue (e.g., stored in a user profile of the user 142 within the database 115). For example, the database 115 may store a user profile that indicates the venue is the closest of multiple available venues to a residence of the user 142, who is associated with (e.g., corresponds to) the device 140 (e.g., the third device) from which the live fingerprint was received in operation 520. In some example embodiments, the user preference for the venue is received with the live fingerprint in operation 520.
In operation 835, the performer module 270 accesses social network data of the user 142 (e.g., stored within the database 115 or accessible via the network 190 from a third-party social network server). For example, the database 115 may store social network data descriptive of the user 142 (e.g., status updates, microblog posts, images, comments, likes, favorites, or other public, private, or semiprivate publications to friends of the user 142), and some or all of the social network data may reference the venue or otherwise indicate that the user 142 is located at the venue where the live performance is taking place at the current date and current time. Since the user 142 is associated with (e.g., corresponds to) the device 140 (e.g., the third device) from which the live fingerprint was received in operation 520, the performer module 270 may detect the venue of the live performance based on the social network data of the user 142. In some example embodiments, the social network data is received with the live fingerprint in operation 520.
In operation 836, the performer module 270 accesses a calendar event of the user 142 (e.g., stored within the database 115 or accessible via the network 190 from a third-party calendar server). For example, the database 115 may store calendar data for the user 142 (e.g., meetings, appointments, or other scheduled events), and the accessed calendar event may indicate that the user 142 is located at the venue where the live performance is taking place at the current date and current time. Since the user 142 is associated with (e.g., corresponds to) the device 140 (e.g., the third device) from which the live fingerprint was received in operation 520, the performer module 270 may detect the venue of the live performance based on the calendar event of the user 142. In some example embodiments, the calendar event is received with the live fingerprint in operation 520.
In operation 837, the performer module 270 accesses a purchase record (e.g., a transaction record) of the user 142 (e.g., stored within the database 115 or accessible via the network 190 from a third-party transaction server). For example, the database 115 may store purchase data for the user 142 (e.g., transaction records for purchases made by the user 142), and the purchase record may indicate that the user 142 purchased a ticket (e.g., from the venue) for an event at which the live performance is taking place at the current date and current time. Since the user 142 is associated with (e.g., corresponds to) the device 140 (e.g., the third device) from which the live fingerprint was received in operation 520, the performer module 270 may detect the venue of the live performance based on the purchase record of the user 142. In some example embodiments, the purchase record is received with the live fingerprint in operation 520.
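Any one or more of the signals gathered in operations 831-837 may be combined when detecting the venue. The following Python sketch illustrates one such combination as a simple weighted vote; the particular weights and parameter names are hypothetical and are not prescribed by any embodiment.

    from collections import Counter
    from typing import Optional

    def detect_venue(geolocation_venue: Optional[str] = None,
                     network_venue: Optional[str] = None,
                     ticket_stub_venue: Optional[str] = None,
                     preferred_venue: Optional[str] = None,
                     social_venue: Optional[str] = None,
                     calendar_venue: Optional[str] = None,
                     purchase_venue: Optional[str] = None) -> Optional[str]:
        # Each argument is the venue suggested by one of operations 831-837, or None when
        # that signal is unavailable. Signals tied to the current date and time are given
        # more weight here than a standing user preference; the weights are illustrative.
        weighted_signals = [
            (3.0, geolocation_venue), (2.0, network_venue), (3.0, ticket_stub_venue),
            (1.0, preferred_venue), (2.0, social_venue), (2.0, calendar_venue),
            (2.0, purchase_venue),
        ]
        votes: Counter = Counter()
        for weight, venue in weighted_signals:
            if venue is not None:
                votes[venue] += weight
        return votes.most_common(1)[0][0] if votes else None

    # Example: geolocation and calendar agree, outweighing the stored venue preference.
    print(detect_venue(geolocation_venue="Riverside Amphitheater",
                       calendar_venue="Riverside Amphitheater",
                       preferred_venue="Harbor Hall"))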
As shown in FIG. 9, the method 300 or portions thereof may include operation 910, which in turn may include one or more of operations 911, 912, 913, 914, 915, and 916. According to various example embodiments, operation 910 may be performed at any point prior to operation 540, in which the reference module 280 of the identification machine 110 accesses the reference fingerprints. For example, operation 910 may be performed prior to the beginning of the performance itself. In some example embodiments, operation 910 is performed each time the performer 162 or an artist that originally recorded the audio piece releases new material (e.g., new recordings of audio pieces). In certain example embodiments, operation 910 is performed periodically (e.g., at regularly scheduled intervals of time).
In operation 910, the reference module 280 of the identification machine 110 builds the set of reference fingerprints to be accessed in operation 540. The reference module 280 may do this by generating some or all of the database 115. One or more of operations 911-916 may be performed as part of operation 910.
In operation 911, the reference module 280 accesses a schedule for a venue at which an event that includes the live performance will take place. For example, the reference module 280 may access a venue schedule in the form of an event calendar (e.g., a concert calendar) for the venue, a playlist for the venue, an agenda for the venue, an advertisement (e.g., a poster) for the venue, or any suitable combination thereof. The schedule may be accessed from information previously collected and stored in the database 115 or from a third-party server corresponding to the venue itself. According to various example embodiments, the accessed schedule may correlate the venue with the performer 162 of the audio piece, correlate the venue with an artist that recorded a reference version of the audio piece (e.g., an original artist that recorded a studio recording of the audio piece or a live recording of the audio piece), correlate the venue with a period of time during which the live fingerprint is received in operation 520, or any suitable combination thereof.
In operation 912, the reference module 280 determines (e.g., identifies) the performer 162 based on the schedule accessed in operation 911. For example, the performer 162 may be determined based on the artist being correlated with the venue by the schedule accessed in operation 911. As another example, the performer 162 may be determined based on the period of time during which the live fingerprint is received in operation 520 being correlated with the artist by the schedule. This determination of the performer 162 may enable the identification machine 110 to infer the likely audio pieces to be played and thus significantly reduce the number of possible audio pieces that may be performed during the live performance.
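As a non-limiting illustration of operation 912, the following Python sketch selects the performer whose schedule entry correlates the detected venue with the time at which the live fingerprint was received; the schedule structure and the example values are hypothetical.

    from dataclasses import dataclass
    from datetime import datetime
    from typing import Iterable, Optional

    @dataclass
    class ScheduleEntry:
        venue: str
        performer: str
        start: datetime
        end: datetime

    def performer_from_schedule(schedule: Iterable[ScheduleEntry],
                                venue: str,
                                received_at: datetime) -> Optional[str]:
        # Operation 912, sketched: the performer is the artist that the schedule correlates
        # with the detected venue and with the time the live fingerprint was received.
        for entry in schedule:
            if entry.venue == venue and entry.start <= received_at <= entry.end:
                return entry.performer
        return None

    schedule = [ScheduleEntry("Riverside Amphitheater", "Example Band",
                              datetime(2014, 6, 21, 20, 0), datetime(2014, 6, 21, 23, 0))]
    print(performer_from_schedule(schedule, "Riverside Amphitheater",
                                  datetime(2014, 6, 21, 21, 15)))  # "Example Band"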
In operation 913, the reference module 280 accesses (e.g., retrieves) studio reference fingerprints of segments of studio recordings by an artist (e.g., an original artist). In some example embodiments, the artist is the performer 162, though this need not be the case. The studio reference fingerprints may be accessed from information previously collected and stored in the database 115 or from a third-party server (e.g., corresponding to the venue, to the artist, to the performer 162, or any suitable combination thereof).
In operation 914, the reference module 280 accesses (e.g., retrieves) live reference fingerprints of segments of live recordings by the artist (e.g., the original artist). As noted above, the artist may be the performer 162, though this need not be the case. The live reference fingerprints may be accessed from information previously collected and stored in the database 115 or from a third-party server (e.g., corresponding to the venue, to the artist, to the performer 162, or any suitable combination thereof). In some example embodiments where the performer 162 is the artist, the mixer 161 is the source of one or more segments of a reference version of the audio piece whose live version is being performed, and one or more of the live reference fingerprints are generated (e.g., by the reference module 280) from such segments received from the mixer 161. In addition, the mixer 161, the device 160 of the performer 162, or both, may provide the reference module 280 with metadata (e.g., at least some of the metadata accessed in operation 332) that describes or identifies the audio piece, one or more live recordings of the audio piece, one or more studio recordings of the audio piece, or any suitable combination thereof (e.g., for storage in the database 115 and for access by the query module 250).
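For illustration only, the following Python sketch shows a deliberately simplified fingerprint function that reduces an audio segment (such as one received from the mixer 161) to a compact token by hashing the dominant frequency bin of each frame. This toy function is not the fingerprinting algorithm of any embodiment; it only conveys the general data flow of turning a segment into a reference fingerprint.

    import hashlib
    import numpy as np

    def toy_fingerprint(samples: np.ndarray, frame_size: int = 4096) -> bytes:
        # Split the segment into fixed-size frames, find the dominant frequency bin of
        # each frame, and hash that sequence into a compact token. Real fingerprinting
        # is far more robust; the input/output shape is what this sketch illustrates.
        frames = [samples[i:i + frame_size]
                  for i in range(0, len(samples) - frame_size + 1, frame_size)]
        peak_bins = [int(np.argmax(np.abs(np.fft.rfft(frame)))) for frame in frames]
        return hashlib.sha1(str(peak_bins).encode("utf-8")).digest()

    # Example: fingerprint a 3-second, 440 Hz test tone standing in for a mixer-fed segment.
    sample_rate = 44100
    t = np.arange(0, 3 * sample_rate) / sample_rate
    print(toy_fingerprint(np.sin(2 * np.pi * 440.0 * t)).hex())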
In operation 915, the reference module 280 accesses (e.g., retrieves) a previously played playlist from a previously performed performance by the same artist (e.g., the performer 162). This may enable the identification machine 110 to further infer the most likely audio pieces to be played and thus even further reduce the number of possible audio pieces that may be performed during a live performance. According to some example embodiments, the previously played playlist may be a basis for weighting one or more of multiple candidate identifiers of the audio piece. Similarly, identifiers of audio pieces already performed during the current performance may be accorded lower weights or omitted from consideration, since it may be unlikely that the performer 162 will perform the same audio piece twice in one show, particularly back-to-back or within a short time window (e.g., 20 minutes).
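The weighting described for operation 915 might be sketched as follows in Python; the specific boost and penalty values are hypothetical and are used only to illustrate how a previously played playlist and the pieces already performed during the current show could influence candidate identifiers.

    from typing import Dict, Iterable, List

    def weight_candidates(candidates: Iterable[str],
                          previous_setlist: List[str],
                          already_performed: Iterable[str]) -> Dict[str, float]:
        # Illustrative weighting only: pieces that appeared on a previously played playlist
        # by the same artist are boosted, and pieces already performed tonight are heavily
        # penalized, since a repeat within the same show is unlikely.
        performed = set(already_performed)
        weights: Dict[str, float] = {}
        for piece in candidates:
            weight = 1.0
            if piece in previous_setlist:
                weight += 1.0
            if piece in performed:
                weight *= 0.1
            weights[piece] = weight
        return weights

    print(weight_candidates(candidates=["Song A", "Song B", "Song C"],
                            previous_setlist=["Song A", "Song C"],
                            already_performed=["Song C"]))
    # {'Song A': 2.0, 'Song B': 1.0, 'Song C': 0.2}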
In operation 916, the reference module 280 accesses (e.g., retrieves) fingerprints for segments of likely or most likely audio pieces to be played by the performer 162. These accessed fingerprints may then be designated by the reference module 280 as the set of reference fingerprints to be accessed in operation 540. As noted above, these accessed fingerprints may be stored in the database 115 for later use (e.g., in operation 540).
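Finally, a minimal Python sketch of operation 916 is shown below, assuming that stored fingerprints are keyed by audio piece; the dictionary-based storage is a simplification made for illustration and does not describe the structure of the database 115.

    from typing import Dict, Iterable, List

    def build_reference_set(likely_pieces: Iterable[str],
                            fingerprints_by_piece: Dict[str, List[bytes]]) -> Dict[str, List[bytes]]:
        # Operation 916, sketched: gather the stored fingerprints (studio and live) for the
        # audio pieces judged likely to be played, and designate them as the reference set
        # that operation 540 will search. Pieces with no stored fingerprints are skipped.
        return {piece: fingerprints_by_piece[piece]
                for piece in likely_pieces if piece in fingerprints_by_piece}

    stored = {"Song A": [b"\x01"], "Song B": [b"\x02"], "Song D": [b"\x04"]}
    print(sorted(build_reference_set(["Song A", "Song B", "Song C"], stored)))  # ['Song A', 'Song B']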
According to various example embodiments, one or more of the methodologies described herein may facilitate identification of an audio piece during its performance. Moreover, one or more of the methodologies described herein may facilitate identification of the audio piece during performance of a live version of the audio piece, even where the live version differs from previously recorded versions of the audio piece. Hence, one or more of the methodologies described herein may facilitate retrieval and presentation of information regarding the identified audio piece (e.g., its identifier and some or all of its metadata) to one or more audience members during performance of the same audio piece. Furthermore, one or more of the methodologies described herein may facilitate identification and tagging of recordings that were made during the performance.
When these effects are considered in aggregate, one or more of the methodologies described herein may obviate a need for certain efforts or resources that otherwise would be involved in identifying an audio piece during its performance. Efforts expended by a user may be reduced by one or more of the methodologies described herein. Computing resources used by one or more machines, databases, or devices (e.g., within the network environment 100) may similarly be reduced. Examples of such computing resources include processor cycles, network traffic, memory usage, data storage capacity, power consumption, and cooling capacity.
FIG. 10 is a block diagram illustrating components of a machine 1000, according to some example embodiments, able to read instructions 1024 from a machine-readable medium 1022 (e.g., a non-transitory machine-readable medium, a machine-readable storage medium, a computer-readable storage medium, or any suitable combination thereof) and perform any one or more of the methodologies discussed herein, in whole or in part. Specifically, FIG. 10 shows the machine 1000 in the example form of a computer system (e.g., a computer) within which the instructions 1024 (e.g., software, a program, an application, an applet, an app, or other executable code) for causing the machine 1000 to perform any one or more of the methodologies discussed herein may be executed, in whole or in part.
In alternative embodiments, the machine 1000 operates as a standalone device or may be connected (e.g., networked) to other machines. In a networked deployment, the machine 1000 may operate in the capacity of a server machine or a client machine in a server-client network environment, or as a peer machine in a distributed (e.g., peer-to-peer) network environment. The machine 1000 may be a server computer, a client computer, a personal computer (PC), a tablet computer, a laptop computer, a netbook, a cellular telephone, a smartphone, a set-top box (STB), a personal digital assistant (PDA), a web appliance, a network router, a network switch, a network bridge, or any machine capable of executing the instructions 1024, sequentially or otherwise, that specify actions to be taken by that machine. Further, while only a single machine is illustrated, the term “machine” shall also be taken to include any collection of machines that individually or jointly execute the instructions 1024 to perform all or part of any one or more of the methodologies discussed herein.
The machine 1000 includes a processor 1002 (e.g., a central processing unit (CPU), a graphics processing unit (GPU), a digital signal processor (DSP), an application specific integrated circuit (ASIC), a radio-frequency integrated circuit (RFIC), or any suitable combination thereof), a main memory 1004, and a static memory 1006, which are configured to communicate with each other via a bus 1008. The processor 1002 may contain microcircuits that are configurable, temporarily or permanently, by some or all of the instructions 1024 such that the processor 1002 is configurable to perform any one or more of the methodologies described herein, in whole or in part. For example, a set of one or more microcircuits of the processor 1002 may be configurable to execute one or more modules (e.g., software modules) described herein.
The machine 1000 may further include a graphics display 1010 (e.g., a plasma display panel (PDP), a light emitting diode (LED) display, a liquid crystal display (LCD), a projector, a cathode ray tube (CRT), or any other display capable of displaying graphics or video). The machine 1000 may also include an alphanumeric input device 1012 (e.g., a keyboard or keypad), a cursor control device 1014 (e.g., a mouse, a touchpad, a trackball, a joystick, a motion sensor, an eye tracking device, or other pointing instrument), a storage unit 1016, an audio generation device 1018 (e.g., a sound card, an amplifier, a speaker, a headphone jack, or any suitable combination thereof), and a network interface device 1020.
The storage unit 1016 includes the machine-readable medium 1022 (e.g., a tangible and non-transitory machine-readable storage medium) on which are stored the instructions 1024 embodying any one or more of the methodologies or functions described herein. The instructions 1024 may also reside, completely or at least partially, within the main memory 1004, within the processor 1002 (e.g., within the processor's cache memory), or both, before or during execution thereof by the machine 1000. Accordingly, the main memory 1004 and the processor 1002 may be considered machine-readable media (e.g., tangible and non-transitory machine-readable media). The instructions 1024 may be transmitted or received over the network 190 via the network interface device 1020. For example, the network interface device 1020 may communicate the instructions 1024 using any one or more transfer protocols (e.g., hypertext transfer protocol (HTTP)).
In some example embodiments, the machine 1000 may be a portable computing device, such as a smart phone or tablet computer, and have one or more additional input components 1030 (e.g., sensors or gauges). Examples of such input components 1030 include an image input component (e.g., one or more cameras), an audio input component (e.g., a microphone), a direction input component (e.g., a compass), a location input component (e.g., a global positioning system (GPS) receiver), an orientation component (e.g., a gyroscope), a motion detection component (e.g., one or more accelerometers), an altitude detection component (e.g., an altimeter), and a gas detection component (e.g., a gas sensor). Inputs harvested by any one or more of these input components may be accessible and available for use by any of the modules described herein.
As used herein, the term “memory” refers to a machine-readable medium able to store data temporarily or permanently and may be taken to include, but not be limited to, random-access memory (RAM), read-only memory (ROM), buffer memory, flash memory, and cache memory. While the machine-readable medium 1022 is shown in an example embodiment to be a single medium, the term “machine-readable medium” should be taken to include a single medium or multiple media (e.g., a centralized or distributed database, or associated caches and servers) able to store instructions. The term “machine-readable medium” shall also be taken to include any medium, or combination of multiple media, that is capable of storing the instructions 1024 for execution by the machine 1000, such that the instructions 1024, when executed by one or more processors of the machine 1000 (e.g., processor 1002), cause the machine 1000 to perform any one or more of the methodologies described herein, in whole or in part. Accordingly, a “machine-readable medium” refers to a single storage apparatus or device, as well as cloud-based storage systems or storage networks that include multiple storage apparatus or devices. The term “machine-readable medium” shall accordingly be taken to include, but not be limited to, one or more tangible (e.g., non-transitory) data repositories in the form of a solid-state memory, an optical medium, a magnetic medium, or any suitable combination thereof.
Throughout this specification, plural instances may implement components, operations, or structures described as a single instance. Although individual operations of one or more methods are illustrated and described as separate operations, one or more of the individual operations may be performed concurrently, and nothing requires that the operations be performed in the order illustrated. Structures and functionality presented as separate components in example configurations may be implemented as a combined structure or component. Similarly, structures and functionality presented as a single component may be implemented as separate components. These and other variations, modifications, additions, and improvements fall within the scope of the subject matter herein.
Certain embodiments are described herein as including logic or a number of components, modules, or mechanisms. Modules may constitute software modules (e.g., code stored or otherwise embodied on a machine-readable medium or in a transmission medium), hardware modules, or any suitable combination thereof. A “hardware module” is a tangible (e.g., non-transitory) unit capable of performing certain operations and may be configured or arranged in a certain physical manner. In various example embodiments, one or more computer systems (e.g., a standalone computer system, a client computer system, or a server computer system) or one or more hardware modules of a computer system (e.g., a processor or a group of processors) may be configured by software (e.g., an application or application portion) as a hardware module that operates to perform certain operations as described herein.
In some embodiments, a hardware module may be implemented mechanically, electronically, or any suitable combination thereof. For example, a hardware module may include dedicated circuitry or logic that is permanently configured to perform certain operations. For example, a hardware module may be a special-purpose processor, such as a field programmable gate array (FPGA) or an ASIC. A hardware module may also include programmable logic or circuitry that is temporarily configured by software to perform certain operations. For example, a hardware module may include software encompassed within a general-purpose processor or other programmable processor. It will be appreciated that the decision to implement a hardware module mechanically, in dedicated and permanently configured circuitry, or in temporarily configured circuitry (e.g., configured by software) may be driven by cost and time considerations.
Accordingly, the phrase “hardware module” should be understood to encompass a tangible entity, and such a tangible entity may be physically constructed, permanently configured (e.g., hardwired), or temporarily configured (e.g., programmed) to operate in a certain manner or to perform certain operations described herein. As used herein, “hardware-implemented module” refers to a hardware module. Considering embodiments in which hardware modules are temporarily configured (e.g., programmed), each of the hardware modules need not be configured or instantiated at any one instance in time. For example, where a hardware module comprises a general-purpose processor configured by software to become a special-purpose processor, the general-purpose processor may be configured as respectively different special-purpose processors (e.g., comprising different hardware modules) at different times. Software (e.g., a software module) may accordingly configure one or more processors, for example, to constitute a particular hardware module at one instance of time and to constitute a different hardware module at a different instance of time.
Hardware modules can provide information to, and receive information from, other hardware modules. Accordingly, the described hardware modules may be regarded as being communicatively coupled. Where multiple hardware modules exist contemporaneously, communications may be achieved through signal transmission (e.g., over appropriate circuits and buses) between or among two or more of the hardware modules. In embodiments in which multiple hardware modules are configured or instantiated at different times, communications between such hardware modules may be achieved, for example, through the storage and retrieval of information in memory structures to which the multiple hardware modules have access. For example, one hardware module may perform an operation and store the output of that operation in a memory device to which it is communicatively coupled. A further hardware module may then, at a later time, access the memory device to retrieve and process the stored output. Hardware modules may also initiate communications with input or output devices, and can operate on a resource (e.g., a collection of information).
The various operations of example methods described herein may be performed, at least partially, by one or more processors that are temporarily configured (e.g., by software) or permanently configured to perform the relevant operations. Whether temporarily or permanently configured, such processors may constitute processor-implemented modules that operate to perform one or more operations or functions described herein. As used herein, “processor-implemented module” refers to a hardware module implemented using one or more processors.
Similarly, the methods described herein may be at least partially processor-implemented, a processor being an example of hardware. For example, at least some of the operations of a method may be performed by one or more processors or processor-implemented modules. As used herein, “processor-implemented module” refers to a hardware module in which the hardware includes one or more processors. Moreover, the one or more processors may also operate to support performance of the relevant operations in a “cloud computing” environment or as a “software as a service” (SaaS). For example, at least some of the operations may be performed by a group of computers (as examples of machines including processors), with these operations being accessible via a network (e.g., the Internet) and via one or more appropriate interfaces (e.g., an application program interface (API)).
The performance of certain operations may be distributed among the one or more processors, not only residing within a single machine, but deployed across a number of machines. In some example embodiments, the one or more processors or processor-implemented modules may be located in a single geographic location (e.g., within a home environment, an office environment, or a server farm). In other example embodiments, the one or more processors or processor-implemented modules may be distributed across a number of geographic locations.
Some portions of the subject matter discussed herein may be presented in terms of algorithms or symbolic representations of operations on data stored as bits or binary digital signals within a machine memory (e.g., a computer memory). Such algorithms or symbolic representations are examples of techniques used by those of ordinary skill in the data processing arts to convey the substance of their work to others skilled in the art. As used herein, an “algorithm” is a self-consistent sequence of operations or similar processing leading to a desired result. In this context, algorithms and operations involve physical manipulation of physical quantities. Typically, but not necessarily, such quantities may take the form of electrical, magnetic, or optical signals capable of being stored, accessed, transferred, combined, compared, or otherwise manipulated by a machine. It is convenient at times, principally for reasons of common usage, to refer to such signals using words such as “data,” “content,” “bits,” “values,” “elements,” “symbols,” “characters,” “terms,” “numbers,” “numerals,” or the like. These words, however, are merely convenient labels and are to be associated with appropriate physical quantities.
Unless specifically stated otherwise, discussions herein using words such as “processing,” “computing,” “calculating,” “determining,” “presenting,” “displaying,” or the like may refer to actions or processes of a machine (e.g., a computer) that manipulates or transforms data represented as physical (e.g., electronic, magnetic, or optical) quantities within one or more memories (e.g., volatile memory, non-volatile memory, or any suitable combination thereof), registers, or other machine components that receive, store, transmit, or display information. Furthermore, unless specifically stated otherwise, the terms “a” or “an” are herein used, as is common in patent documents, to include one or more than one instance. Finally, as used herein, the conjunction “or” refers to a non-exclusive “or,” unless specifically stated otherwise.