BACKGROUND

A video is a stream of images that may be displayed to users to view entities in motion. A video may contain audio to be played when the image stream is being displayed. A video, including video data and audio data, may be stored in a video file in various forms. Examples of video file formats that store compressed video/audio data include MPEG (e.g., MPEG-2, MPEG-4), 3GP, ASF (advanced systems format), AVI (audio video interleaved), Flash Video, etc. Videos may be displayed by various devices, including computing devices and televisions that display the video based on video data stored in a storage medium (e.g., a digital video disc (DVD), a hard disk drive, a digital video recorder (DVR), etc.) or received over a network.
Closed captions may be displayed for videos to show a textual transcription of speech included in the audio portion of the video as it occurs. Closed captions may be displayed for various reasons, including to aid persons that are hearing impaired, to aid persons learning to read, to aid persons learning to speak a non-native language, to aid persons in an environment where the audio is difficult to hear or is intentionally muted, and to be used by persons who simply wish to read a transcript along with the program audio. Such closed captions, however, provide little other functionality with respect to a video being played.
SUMMARY

This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
Methods, systems, and computer program products are provided for enabling the content of a video to be accessed and searched. A textual transcript of audio associated with a video is displayed along with the video. For instance, the textual transcript may be displayed in the form of a series of textual captions (closed captions) or in other form. The textual transcript is enabled to be searched according to search criteria. Portions of the transcript that match the search criteria may be highlighted, enabling those portions of the transcript to be accessed and viewed relatively quickly. Locations/play times in the video corresponding to the portions of the transcript that match the search criteria may also be indicated, enabling rapid navigation to those locations/play times.
In one method implementation, a user interface is generated to display at a computing device. A video display region of the user interface is generated that displays a video. A transcript display region of the user interface is generated that displays at least a portion of a transcript. The transcript includes one or more textual captions of audio associated with the video. A search interface is generated to display in the user interface, and is configured to receive one or more search terms from a user to be applied to the transcript.
As such, one or more search terms may be provided to the search interface by a user. One or more textual captions of the transcript that include the search term(s) are determined. One or more indications are generated to display in the transcript display region that indicate the determined textual captions that include the search term(s).
Still further, a graphical feature may be generated to display in the user interface having a length that corresponds to a time duration of the video. One or more indications may be generated to display at positions on the graphical feature to indicate times of occurrence of audio corresponding to textual caption(s) determined to include the search term(s).
Still further, a graphical feature may be generated to display in the user interface having a length that corresponds to a length of the transcript. One or more indications may be generated to display at positions on the graphical feature that indicate positions of occurrence in the transcript of textual caption(s) determined to include the search term(s).
Still further, a user may be enabled to interact with a textual caption displayed in the transcript display region to provide an edit to text of the textual caption and/or to annotate the textual caption. Furthermore, a user interface element may be displayed that enables a user to select a language from a plurality of languages for text of the transcript to be displayed in the transcript display region.
In another implementation, a video searching media player system is provided. The video searching media player system includes a media player, a transcript display module, and a search interface module. The media player plays a video in a video display region of a user interface. The video is included in a media object that further includes a transcript of audio associated with the video. The transcript includes a plurality of textual captions. The transcript display module displays at least a portion of the transcript in a transcript display region of the user interface. The displayed transcript includes at least one of the textual captions. The search interface module generates a search interface displayed in the user interface that is configured to receive one or more search terms from a user to be applied to the transcript.
The system may further include a search module. The search module determines one or more textual captions of the transcript that match the received search terms. The transcript display module generates one or more indications to display in the transcript display region that indicate the determined textual caption(s) that include the search term(s).
Computer program products containing computer readable storage media are also described herein that store computer code/instructions for enabling the content of videos to be searched, as well as enabling additional embodiments described herein.
Further features and advantages of the invention, as well as the structure and operation of various embodiments of the invention, are described in detail below with reference to the accompanying drawings. It is noted that the invention is not limited to the specific embodiments described herein. Such embodiments are presented herein for illustrative purposes only. Additional embodiments will be apparent to persons skilled in the relevant art(s) based on the teachings contained herein.
BRIEF DESCRIPTION OF THE DRAWINGS/FIGURES

The accompanying drawings, which are incorporated herein and form a part of the specification, illustrate the present invention and, together with the description, further serve to explain the principles of the invention and to enable a person skilled in the pertinent art to make and use the invention.
FIG. 1 shows a block diagram of a user interface for playing a video, displaying a transcript of the video, and enabling a search of the transcript, according to an example embodiment.
FIG. 2 shows a block diagram of a system that generates a transcript of a video, according to an example embodiment.
FIG. 3 shows a block diagram of a communications environment in which a media object is delivered to a computing device having a video searching media player system, according to an example embodiment.
FIG. 4 shows a block diagram of a computing device that includes a video searching media player system, according to an example embodiment.
FIG. 5 shows a flowchart providing a process for generating a user interface that displays a video, displays a transcript, and provides a transcript search interface, according to an example embodiment.
FIG. 6 shows a block diagram of a video searching media player system, according to an example embodiment.
FIG. 7 shows a flowchart providing a process for highlighting textual captions of a transcript of a video to indicate search results, according to an example embodiment.
FIG. 8 shows a block diagram of an example of the user interface of FIG. 1, according to an embodiment.
FIG. 9 shows a flowchart providing a process for indicating play times of search results in a video, according to an example embodiment.
FIG. 10 shows a flowchart providing a process for indicating locations of search results in a transcript of a video, according to an example embodiment.
FIG. 11 shows a process that enables a user to edit a textual caption of a transcript of a video, according to an example embodiment.
FIG. 12 shows a process that enables a user to select a language of a transcript of a video, according to an example embodiment.
FIG. 13 shows a block diagram of an example computer that may be used to implement embodiments of the present invention.
The features and advantages of the present invention will become more apparent from the detailed description set forth below when taken in conjunction with the drawings, in which like reference characters identify corresponding elements throughout. In the drawings, like reference numbers generally indicate identical, functionally similar, and/or structurally similar elements. The drawing in which an element first appears is indicated by the leftmost digit(s) in the corresponding reference number.
DETAILED DESCRIPTION

I. Introduction

The present specification discloses one or more embodiments that incorporate the features of the invention. The disclosed embodiment(s) merely exemplify the invention. The scope of the invention is not limited to the disclosed embodiment(s). The invention is defined by the claims appended hereto.
References in the specification to “one embodiment,” “an embodiment,” “an example embodiment,” etc., indicate that the embodiment described may include a particular feature, structure, or characteristic, but every embodiment may not necessarily include the particular feature, structure, or characteristic. Moreover, such phrases are not necessarily referring to the same embodiment. Further, when a particular feature, structure, or characteristic is described in connection with an embodiment, it is submitted that it is within the knowledge of one skilled in the art to effect such feature, structure, or characteristic in connection with other embodiments whether or not explicitly described.
Furthermore, it should be understood that spatial descriptions (e.g., “above,” “below,” “up,” “left,” “right,” “down,” “top,” “bottom,” “vertical,” “horizontal,” “upper,” “lower,” etc.) used herein are for purposes of illustration only, and that practical implementations of the structures described herein can be spatially arranged in any orientation or manner.
Numerous exemplary embodiments of the present invention are described as follows. It is noted that any section/subsection headings provided herein are not intended to be limiting. Embodiments are described throughout this document, and any type of embodiment may be included under any section/subsection.
II. Example Embodiments

Consumers of videos face challenges with respect to the videos, especially technical videos. For instance, how does a user know whether information desired by the user (e.g., an answer to a question, etc.) is included in the information provided by a video? Furthermore, if the desired information is included in the video, how does the user navigate directly to the information? Still further, if the voice audio of a video is not in a language that is familiar to the user, how can the user even use the video? Video content is locked into a timeline of the video, so even if a user believes the information that they desire is included in the video, the user has to guess where the content is in time in the video, and manually advance the video to the guessed location. Due to these deficiencies of videos, content publishers suffer from low return on investment (ROI) on their video content because search engines can only access limited metadata associated with the video (e.g., a record time and date for the video, etc.).
Embodiments overcome these deficiencies of videos, enabling users and search engines to quickly and confidently view, search, and share the content contained in videos. According to embodiments, a user interface is provided that enables a textual transcript of audio associated with a video to be searched according to search criteria. Text in the transcript that matches the search criteria may be highlighted, enabling the text to be accessed quickly. Furthermore, locations in the video corresponding to the text matching the search criteria may be indicated, enabling rapid navigation to those locations in the video. As such, users are enabled to rapidly find information located in a video by searching through the transcript of the audio content.
Embodiments provide content publishers with benefits, including improved crawling and indexing of their content, which can improve content ROI through discoverability. Search, navigation, community, and social features are provided that can be applied to a video through the power of captions.
Embodiments enable various features, including time-stamped search relevancy, tools that enhance discovery of content within videos, aggregation of related content based on video content, deep linking to other content, and multiple layers of additional metadata that drive a rich user experience.
As described above, in embodiments, users may be enabled to search the content of videos, such as by interacting with a user interface. Such a user interface may be implemented in various ways. For instance, FIG. 1 shows a block diagram of a user interface 102 for playing a video, displaying a transcript of the video, and enabling a search of the transcript, according to an example embodiment. As shown in FIG. 1, user interface 102 includes a video display region 104, a transcript display region 106, and a search interface 108. User interface 102 and its features are described as follows.
User interface 102 may be displayed by a display screen associated with a device. As shown in FIG. 1, video display region 104 displays a video 110 that is being played. In other words, a stream of images of a video is displayed in video display region 104 as video 110. Transcript display region 106 displays a transcript 112, which is a textual transcript of audio associated with video 110. For instance, transcript 112 may include one or more textual captions of the audio associated with video 110, such as a first textual caption 114a, a second textual caption 114b, and optionally further textual captions (e.g., closed captions). Each textual caption may correspond to a full spoken sentence, or a portion of a spoken sentence. Depending on the length of transcript 112, all of transcript 112 may be visible in transcript display region 106 at any particular time, or a portion of transcript 112 may be visible in transcript display region 106 (e.g., a subset of the textual captions of transcript 112). During normal operation, when video 110 is playing in video display region 104, a textual caption of transcript 112 may be displayed in transcript display region 106 that corresponds to the audio of video 110 that is concurrently/synchronously playing. For instance, the textual caption of currently playing audio may be displayed at the top of transcript display region 106, and may automatically scroll downward (e.g., in a list of textual captions) when a next textual caption is displayed that corresponds to the next currently playing audio. The textual caption corresponding to currently playing audio may also optionally be displayed in video display region 104 over a portion of video 110.
Search interface 108 is displayed in user interface 102, and is configured to receive one or more search terms (search keywords) from a user to be applied to transcript 112. For instance, a user that is interacting with user interface 102 may type or otherwise enter search criteria that includes one or more search terms into a user interface element of search interface 108 to have transcript 112 accordingly searched. Simple word searches may be performed, such that the user may enter one or more words into search interface 108, and those one or more words are searched for in transcript 112 to generate search results. Alternatively, more complex searches may be performed, such that the user may enter one or more words as well as one or more search operators (e.g., Boolean operators such as “OR”, “AND”, “ANDNOT”, etc.) to form a search expression (that may or may not be nested) that is applied to transcript 112 to generate search results. As described in further detail below, the search results may be indicated in transcript 112, such as by highlighting specific text and/or specific textual captions that match the search criteria.
Search interface 108 may have any form suitable to enable a user to provide search criteria. For instance, search interface 108 may include one or more of any type of suitable graphical user interface element, such as a text entry box, a button, a pull down menu, a pop-up menu, a radio button, etc., to enable search criteria to be provided, and a corresponding search to be executed. A user may interact with search interface 108 in any manner, including via a keyboard, a thumb wheel, a pointing device, a roller ball, a stick pointer, a touch sensitive display, any number of virtual interface elements, a voice recognition system, etc.
User interface 102 may be a user interface generated by any type of application, including a web browser, a desktop application, a mobile “app” or other mobile device application, and/or any other application. For instance, in a web browser example, user interface 102 may be shown on a web page, and video display region 104, transcript display region 106, and search interface 108 may each be portions of the web page (e.g., panels, frames, etc.). In the example of FIG. 1, video display region 104 is positioned in a left side of user interface 102, transcript display region 106 is shown positioned in a bottom-right side of user interface 102, and search interface 108 is shown positioned in a top-right side of user interface 102. This arrangement of video display region 104, transcript display region 106, and search interface 108 in user interface 102 is provided for purposes of illustration, and is not intended to be limiting. In further embodiments, video display region 104, transcript display region 106, and search interface 108 may be positioned and sized in user interface 102 in any manner, as desired for a particular application.
Transcript 112 may be generated in any manner, including being generated offline (e.g., prior to playing of video 110 to a user) or in real-time (e.g., during play of video 110 to a user). FIG. 2 shows a block diagram of a transcript generation system 200 that generates a transcript of a video, according to an example embodiment. As shown in FIG. 2, system 200 includes a transcript generator 202 that receives a video object 204. Video object 204 is formed of one or more files that contain a video and audio associated with the video. Examples of compressed video file formats for video object 204 include MPEG (e.g., MPEG-2, MPEG-4), 3GP, ASF (advanced systems format) (which may encapsulate video in WMV (Windows Media Video) format and audio in WMA (Windows Media Audio) format), AVI (audio video interleaved), Flash Video, etc. Transcript generator 202 receives video object 204, and generates a transcript of the audio of video object 204. For instance, as shown in FIG. 2, transcript generator 202 may generate a media object 206 that includes video 208, audio 210, and a transcript 212. Video 208 is the video of video object 204, audio 210 is the audio of video object 204, and transcript 212 is a textual transcription of the audio of video object 204. Transcript 212 is an example of transcript 112 of FIG. 1, and may include the audio of video object 204 in the form of text in any manner, including as a list of textual captions. Transcript generator 202 may generate media object 206 in any form, including according to file formats such as MPEG, 3GP, ASF, AVI, Flash Video, etc.
Transcript generator 202 may generate media object 206 in any manner, including according to commercially available or proprietary transcription techniques. For instance, in an embodiment, transcript generator 202 may implement a speech-to-text translator and/or speech recognition techniques to generate transcript 212 from audio of video object 204. In embodiments, transcript generator 202 may implement speech recognition based on Hidden Markov Models, dynamic time warping, and/or neural networks. In one embodiment, transcript generator 202 may implement the Microsoft® Research Audio Video Indexing System (MAVIS), developed by Microsoft Corporation of Redmond, Wash. MAVIS includes a set of software components that use speech recognition technology to recognize speech, and thereby can be used to generate transcript 212 to include a series of closed captions. In an embodiment, confidence ratings may also be generated (e.g., by MAVIS, or by another technique) that indicate a confidence in an accuracy of a translation of speech-to-text by transcript generator 202. A confidence rating may be generated for and associated with each textual caption or other portion of transcript 212, for instance. A confidence rating may or may not be displayed with the corresponding textual caption in transcript display region 106, depending on the particular implementation.
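By way of a non-limiting illustration, a media object such as media object 206 might be modeled in code with a structure along the following lines. The TypeScript type and field names here are hypothetical and chosen for clarity; actual container formats such as MPEG or ASF encode this information differently.

    // Hypothetical model of a media object carrying video, audio, and a transcript.
    interface TextualCaption {
      id: number;               // ordinal position of the caption within the transcript
      startTimeSeconds: number; // play time at which the spoken audio begins
      endTimeSeconds: number;   // play time at which the spoken audio ends
      text: string;             // the transcribed text of the caption
      confidence?: number;      // optional speech-to-text confidence rating (0..1)
    }

    interface MediaObject {
      videoUrl: string;             // location of the encoded video stream
      audioUrl: string;             // location of the encoded audio stream
      durationSeconds: number;      // total play time of the video
      transcript: TextualCaption[]; // ordered list of textual captions
    }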
Media objects that include video, audio, and audio transcripts may be received at devices for playing and searching in any manner. For instance, FIG. 3 shows a block diagram of a communications environment 300 in which a media object 312 is delivered to a computing device 302 having a video searching media player system 314, according to an example embodiment. As shown in FIG. 3, environment 300 includes computing device 302, a content server 304, storage 306, and a network 308. Environment 300 is provided as an example embodiment, and embodiments may be implemented in alternative environments. Environment 300 is described as follows.
Content server 304 is configured to serve content to user computers, and may be any type of computing device capable of serving content. Computing device 302 may be any type of stationary or mobile computing device, including a desktop computer (e.g., a personal computer, etc.), a mobile computer or computing device (e.g., a Palm® device, a RIM Blackberry® device, a personal digital assistant (PDA), a laptop computer, a notebook computer, a tablet computer (e.g., an Apple iPad™), a netbook, etc.), a mobile phone (e.g., a cell phone, a smart phone such as an Apple iPhone, a Google Android™ phone, a Microsoft Windows® phone, etc.), or other type of stationary or mobile device.
A single content server 304 and a single computing device 302 are shown in FIG. 3 for purposes of illustration. However, any number of computing devices 302 and content servers 304 may be present in environment 300, including tens, hundreds, thousands, and even greater numbers of computing devices 302 and/or content servers 304.
Computing device 302 and content server 304 are communicatively coupled by network 308. Network 308 may include one or more communication links and/or communication networks, such as a PAN (personal area network), a LAN (local area network), a WAN (wide area network), or a combination of networks, such as the Internet. Computing device 302 and content server 304 may be communicatively coupled to network 308 using various links, including wired and/or wireless links, such as IEEE 802.11 wireless LAN (WLAN) wireless links, Worldwide Interoperability for Microwave Access (Wi-MAX) links, cellular network links, wireless personal area network (PAN) links (e.g., Bluetooth™ links), Ethernet links, USB links, etc.
As shown in FIG. 3, storage 306 is coupled to content server 304. Storage 306 stores any number of media objects 310. At least some of media objects 310 may be similar to media object 206, including video, associated audio, and an associated textual transcript of the audio. Content server 304 may access storage 306 for media objects 310 to transmit to computing devices in response to requests.
For instance, in an embodiment, computing device 302 may transmit a request (not shown in FIG. 3) through network 308 to content server 304 for a media object. A user of computing device 302 may desire to play and/or interact with the media object using video searching media player system 314. In response, content server 304 may access the media object identified in the request from storage 306, and may transmit the media object to computing device 302 through network 308 as media object 312. As shown in FIG. 3, computing device 302 receives media object 312, which may be provided to video searching media player system 314. Media object 312 may be transmitted by content server 304 according to any suitable communication protocol, such as TCP/IP (Transmission Control Protocol/Internet Protocol), User Datagram Protocol (UDP), etc., and according to any suitable file transfer protocol, such as FTP (File Transfer Protocol), HTTP (Hypertext Transfer Protocol), etc.
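As a simple illustration of such a request in a web-based embodiment, a client might fetch a JSON manifest describing the media object over HTTP. The endpoint shape and manifest fields below are assumptions made for this sketch; a real content server might instead stream a container format such as MPEG-4 or ASF.

    // Hypothetical client-side retrieval of a media object manifest over HTTP.
    interface MediaObjectManifest {
      videoUrl: string;
      transcript: { id: number; startTimeSeconds: number; text: string }[];
    }

    async function requestMediaObject(serverUrl: string, mediaId: string): Promise<MediaObjectManifest> {
      const response = await fetch(`${serverUrl}/media/${encodeURIComponent(mediaId)}`);
      if (!response.ok) {
        throw new Error(`Media object request failed: ${response.status}`);
      }
      return (await response.json()) as MediaObjectManifest;
    }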
Video searching media player system 314 is capable of playing a video of media object 312, playing the associated audio, and displaying the transcript of media object 312. Furthermore, video searching media player system 314 provides search capability for searching the transcript of media object 312. For instance, in an embodiment, video searching media player system 314 may generate a user interface similar to user interface 102 of FIG. 1 to enable searching of video content.
Video searching media player system 314 may be configured in various ways to perform its functions. For instance, FIG. 4 shows a block diagram of a computing device 400 that enables searching of video content, according to an example embodiment. As shown in FIG. 4, computing device 400 includes a video searching media player system 402 and a display device 404. Furthermore, video searching media player system 402 includes a media player 406, a transcript display module 408, and a search interface module 410. Video searching media player system 402 is an example of video searching media player system 314 of FIG. 3, and computing device 400 is an example of computing device 302 of FIG. 3.
As shown in FIG. 4, video searching media player system 402 receives media object 312. Video searching media player system 402 is configured to generate user interface 102 to display a video of media object 312, to view a transcript of audio associated with the displayed video, and to search the transcript for information. Video searching media player system 402 is further described as follows with respect to FIG. 5. FIG. 5 shows a flowchart 500 providing a process for generating a user interface that displays a video, displays a transcript, and provides a transcript search interface, according to an example embodiment. In an embodiment, video searching media player system 402 may operate according to flowchart 500. Video searching media player system 402 and flowchart 500 are described as follows. Further structural and operational embodiments will be apparent to persons skilled in the relevant art(s) based on the following description of video searching media player system 402 and flowchart 500.
Flowchart 500 begins with step 502. In step 502, a user interface is displayed at a computing device. As described above, in an embodiment, video searching media player system 402 may generate user interface 102 to be displayed by display device 404. Display device 404 may include any suitable type of display, such as a cathode ray tube (CRT) display (e.g., in the case where computing device 400 is a desktop computer), a liquid crystal display (LCD) display, a light emitting diode (LED) display, a plasma display, or other display type. User interface 102 enables a video of media object 312 to be played, displays a textual transcript of the playing video, and enables the transcript to be searched. Steps 504, 506, and 508 further describe these features of step 502 (and therefore steps 504, 506, and 508 may be considered to be processes performed during step 502 of flowchart 500, in an embodiment).
In step 504, a video display region of the user interface is generated that displays a video. For instance, in an embodiment, media player 406 may play video 110 (of media object 312) in a region designated as video display region 104 of user interface 102. Media player 406 may be configured in any suitable manner to play video 110. For instance, media player 406 may include a proprietary video player or a commercially available video player, such as Windows Media Player developed by Microsoft Corporation of Redmond, Wash., QuickTime® developed by Apple Inc. of Cupertino, Calif., etc. Media player 406 may also play the audio associated with video 110 synchronously with video 110.
In step 506, a transcript display region of the user interface is generated that displays at least a portion of a transcript. For instance, in an embodiment, transcript display module 408 may display all or a portion of transcript 112 (of media object 312) in a region designated as transcript display region 106 of user interface 102. Transcript display module 408 may be configured in any suitable manner to display transcript 112. For instance, transcript display module 408 may include a proprietary or commercially available module configured to display scrollable text.
In step 508, a search interface is generated that is displayed in the user interface, and that is configured to receive one or more search terms from a user to be applied to the transcript. For example, in an embodiment, search interface module 410 may generate search interface 108 to be displayed in user interface 102. As described above, search interface 108 is configured to receive one or more search terms and/or other search criteria from a user to be applied to transcript 112. Search interface module 410 may be configured in any suitable manner to generate search interface 108 for display, including using user interface elements that are included in commercially available operating systems and/or browsers, and/or according to other techniques.
In this manner, a user interface may be generated for playing a selected video, displaying a transcript associated with the selected video, and displaying a search interface for searching the transcript. The above example embodiments of user interface 102, video searching media player system 314, video searching media player system 402, and flowchart 500 are provided for illustrative purposes, and are not intended to be limiting. User interfaces for accessing video content, methods for generating such user interfaces, and video searching media player systems may be implemented in other ways, as would be apparent to persons skilled in the relevant art(s) from the teachings herein.
It is noted that as shown in FIG. 4, video searching media player system 402 may be included in computing device 400 that is accessed locally by a user. In other embodiments, one or more of the components of video searching media player system 402 may be located remotely from computing device 400 (e.g., in content server 304), such as in a cloud-based implementation.
In embodiments, video searching media player system 402 may be configured with further functionality, including search capability, caption editing capability, and techniques for indicating the locations of search terms in videos. For instance, FIG. 6 shows a block diagram of video searching media player system 402, according to an example embodiment. As shown in FIG. 6, video searching media player system 402 includes media player 406, transcript display module 408, search interface module 410, a search module 602, a caption play time indicator 604, a caption location indicator 606, and a caption editor 608. The elements of video searching media player system 402 shown in FIG. 6 are described as follows.
Search module 602 is configured to apply the search criteria received at search interface 108 (FIG. 1) from a user to transcript 112 to determine search results. Search module 602 may be configured in various ways to apply search criteria to transcript 112 to generate search results. In embodiments, simple word searches may be performed by search module 602. For instance, in an embodiment, search module 602 may determine one or more textual captions of transcript 112 that include one or more search terms that are provided by the user to search interface 108. The determined one or more textual captions may be provided as search results.
Alternatively, more complex searches may be performed by search module 602. For instance, a user may enter search operators (e.g., Boolean operators such as “OR”, “AND”, “ANDNOT”, etc.) in addition to search terms to form a search expression that may be applied to transcript 112 by search module 602 to generate search results. In still further embodiments, search module 602 may index transcript 112 in a similar manner to a search engine indexing a document. In this manner, the media object (e.g., video) that is associated with transcript 112 may show up in search results for searches performed by a search engine. In such an embodiment, search module 602 may include a search engine that indexes a plurality of documents (e.g., documents of the World Wide Web) including transcript 112.
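For purposes of illustration only, such a nested search expression might be represented and evaluated along the following lines. The SearchExpr representation and function below are hypothetical; a deployed search module would typically first parse the user's query string into such a tree.

    // A hypothetical search expression: a bare term, or a Boolean operator node.
    type SearchExpr =
      | { kind: "term"; word: string }
      | { kind: "AND" | "OR" | "ANDNOT"; left: SearchExpr; right: SearchExpr };

    // Evaluate a search expression against the text of one textual caption.
    function matches(expr: SearchExpr, captionText: string): boolean {
      const text = captionText.toLowerCase();
      switch (expr.kind) {
        case "term":
          return text.includes(expr.word.toLowerCase());
        case "AND":
          return matches(expr.left, text) && matches(expr.right, text);
        case "OR":
          return matches(expr.left, text) || matches(expr.right, text);
        case "ANDNOT":
          return matches(expr.left, text) && !matches(expr.right, text);
      }
    }

    // Example expression: captions that mention "Javascript" but not "engine".
    const example: SearchExpr = {
      kind: "ANDNOT",
      left: { kind: "term", word: "Javascript" },
      right: { kind: "term", word: "engine" },
    };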
In an embodiment, search module 602 may operate according to FIG. 7. FIG. 7 shows a flowchart 700 providing a process for highlighting textual captions of a transcript of a video that include search results, according to an example embodiment. In an embodiment, search module 602 may perform flowchart 700. Search module 602 and flowchart 700 are described as follows. Further structural and operational embodiments will be apparent to persons skilled in the relevant art(s) based on the following description of flowchart 700.
Flowchart 700 begins with step 702. In step 702, at least one search term provided to the search interface is received. For instance, as described above, a user may input one or more search terms to search interface 108. For example, the user may type in the words “red corvette,” or other search terms of interest.
In step 704, one or more textual captions of the transcript is/are determined that include the at least one search term. Referring to FIG. 6, in an embodiment, search module 602 may receive the search term(s) from search interface module 410. Search module 602 may search through the transcript displayed by transcript display module 408 for any occurrences of the search term(s), and may generate search results that indicate the occurrences of the search term(s). Search module 602 may indicate the location(s) in the transcript of the search term(s) in any manner, including by timestamp, word-by-word, by textual caption (e.g., where each textual caption has an associated identifier), by sentence, by paragraph, and/or in another manner. Furthermore, search module 602 may indicate the play time in video 110 at which the search term is found by the play time (timestamp) of the corresponding word, textual caption, sentence, paragraph, etc., in video 110. Search module 602 may store the determined locations and play times for each search result in storage associated with video searching media player system 402 (e.g., memory, etc.), as described elsewhere herein.
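As a non-limiting sketch of step 704, the captions containing every supplied search term might be collected together with their timestamps, so that both the transcript locations and the play times are available to the indicators described below. The Caption and SearchResult shapes mirror the hypothetical structures sketched earlier, and single-word, case-insensitive terms are assumed.

    interface Caption {
      id: number;
      startTimeSeconds: number; // play time of the caption's audio in the video
      text: string;
    }

    interface SearchResult {
      captionId: number; // location of the match in the transcript
      playTime: number;  // timestamp usable by the caption play time indicator
    }

    // Determine the textual captions that contain every supplied search term.
    function findMatchingCaptions(transcript: Caption[], terms: string[]): SearchResult[] {
      const results: SearchResult[] = [];
      for (const caption of transcript) {
        const words = caption.text.toLowerCase().split(/\W+/);
        if (terms.every((term) => words.includes(term.toLowerCase()))) {
          results.push({ captionId: caption.id, playTime: caption.startTimeSeconds });
        }
      }
      return results;
    }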
In step 706, one or more indications are generated to display in the transcript display region that indicate the determined one or more textual captions. Referring to FIG. 6, in an embodiment, search module 602 may provide the search results to transcript display module 408. Transcript display module 408 may receive the search results, and may generate one or more indications for display in transcript display region 106 to display the search results. For instance, in embodiments, transcript display module 408 may show each occurrence of the search term(s), and/or may highlight the sentence, textual caption, paragraph, and/or other transcript portion that includes one or more occurrences of the search term(s). Transcript display module 408 may indicate the search results in transcript display region 106 in any manner, including by applying an effect to transcript 112 such as bold text, italicized text, a color of text, a size of text, highlighting a block of text such as a sentence, a textual caption, a paragraph, etc. (e.g., by showing the text in a rectangular or other shaped shaded/colored block, etc.), and/or using any other technique to highlight the search results in transcript 112.
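In a web browser embodiment, for instance, generating such indications might reduce to toggling a highlight style on the matched caption elements. In the following sketch, the element id scheme ("caption-<id>") and the CSS class name are assumptions.

    // Highlight the caption elements named in the search results.
    // Assumes each caption is rendered as an element with id "caption-<id>" and
    // that a CSS class "search-hit" draws the shaded rectangular block.
    function highlightCaptions(matchedIds: number[]): void {
      document.querySelectorAll(".search-hit").forEach((el) => el.classList.remove("search-hit"));
      for (const id of matchedIds) {
        document.getElementById(`caption-${id}`)?.classList.add("search-hit");
      }
    }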
For example, FIG. 8 shows a block diagram of a user interface 800, according to an embodiment. User interface 800 is an example of user interface 102 of FIG. 1. As shown in FIG. 8, user interface 800 includes video display region 104, transcript display region 106, and search interface 108. Video display region 104 displays a video 110 that is being played. As shown in FIG. 8, video display region 104 may include one or more user interface controls, such as a “play” button 814 and/or other user interface elements (e.g., a pause button, a fast forward button, a rewind button, a stop button, etc.) that may be used to control the playing of video 110. Furthermore, video display region 104 may display a textual caption 818 (e.g., overlaid on video 110, or elsewhere) that corresponds to audio currently being played synchronously with video 110 (e.g., via one or more speakers). Transcript display region 106 displays an example of transcript 112, where transcript 112 includes first-sixth textual captions 114a-114f. Furthermore, search interface 108 includes a text entry box 802 and a search button 804. According to step 702 of FIG. 7, a user may enter one or more search terms into text entry box 802, and may interact with (e.g., click on, using a mouse, etc.) search button 804 to cause a search of transcript 112 to be performed.
In the example of FIG. 8, a user entered the search term “Javascript” into text entry box 802 and interacted with search button 804 to cause a search of transcript 112 to be performed. As a result, according to step 704 of FIG. 7, search module 602 performs a search of transcript 112 for the search term “Javascript.”
In the example of FIG. 8, three search results were found by search module 602 in transcript 112 for the search term “Javascript.” According to step 706 of FIG. 7, transcript display module 408 has generated rectangular gray boxes to indicate the search results in transcript 112 for the user to see. As shown in FIG. 8, textual caption 114a includes the text “and Javascript is only one of the eight subsystems,” textual caption 114c includes the text “We completely re-architected our Javascript engine,” and textual caption 114d includes the text “so that Javascript applications are extremely fast,” each of which includes an occurrence of the word “Javascript.” As such, transcript display module 408 has generated first-third indications 814a-814c as rectangular gray boxes that overlay textual captions 114a, 114c, and 114d, respectively, to indicate that the search term “Javascript” was found in each of textual captions 114a, 114c, and 114d.
As such, a user is enabled to perform a search of a transcript associated with a video, thereby enabling the user to search the contents of the video. As described above, results of the search may be indicated in the transcript, and the user may be enabled to scroll, page, or otherwise move forwards and/or backwards through the transcript to view the search results. In embodiments, further features may be provided to enable the user to more rapidly ascertain a frequency of search terms appearing in the transcript, to determine a location of the search terms in the transcript, and to move to locations of the transcript that include the search terms.
For example, in an embodiment, a user interface element may be displayed that indicates locations of search results in a timeline of the video associated with the transcript. For instance, FIG. 9 shows a flowchart 900 providing a process for indicating play times of a video for search results, according to an example embodiment. In an embodiment, flowchart 900 may be performed by caption play time indicator 604. Caption play time indicator 604 and flowchart 900 are described as follows. Further structural and operational embodiments will be apparent to persons skilled in the relevant art(s) based on the following description of caption play time indicator 604 and flowchart 900.
Flowchart 900 begins with step 902. In step 902, a graphical feature is generated to display in the user interface having a length that corresponds to a time duration of the video. For example, FIG. 8 shows a first graphical feature 806 having a rectangular shape, being positioned below video 110 in video display region 104, and having a length that is approximately the same as a width of the displayed video 110 in video display region 104. In an embodiment, the length of first graphical feature 806 corresponds to a time duration of video 110. For instance, if video 110 has a total time duration of 20 minutes, each position along the length of first graphical feature 806 corresponds to a time during the time duration of 20 minutes. The leftmost position of first graphical feature 806 corresponds to a time zero of video 110, the rightmost position of first graphical feature 806 corresponds to the 20 minute time of video 110, and each position in between of first graphical feature 806 corresponds to a time of video 110 between zero and 20 minutes, with the time of video 110 increasing when moving from left to right along first graphical feature 806.
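The mapping between a play time and a position along such a feature is linear. A brief illustrative sketch follows; the function names are hypothetical.

    // Convert a play time to a horizontal pixel offset on the timeline feature,
    // and back, given the feature's rendered width in pixels.
    function timeToPosition(timeSeconds: number, durationSeconds: number, widthPx: number): number {
      return (timeSeconds / durationSeconds) * widthPx;
    }

    function positionToTime(xPx: number, durationSeconds: number, widthPx: number): number {
      return (xPx / widthPx) * durationSeconds;
    }

    // Example: audio spoken 5 minutes into a 20-minute video sits a quarter of the
    // way along a 640-pixel-wide feature: timeToPosition(300, 1200, 640) === 160.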
In step 904, at least one indication is generated to display at a position on the graphical feature that indicates a time of occurrence of audio corresponding to a textual caption determined to include the at least one search term. In an embodiment, caption play time indicator 604 may receive the play time(s) in video 110 for the search result(s) from search module 602 (or directly from storage). For instance, caption play time indicator 604 may receive a timestamp in video 110 for each textual caption that includes a search term. In an embodiment, caption play time indicator 604 is configured to generate an indication that is displayed on first graphical feature 806 for the search result(s) at each play time. Any type of indication may be displayed on first graphical feature 806, including an arrow, a letter, a number, a symbol, a color, etc., to indicate the play time for a search result. For instance, as shown in FIG. 8, first-third vertical bar indications 808a-808c are shown displayed on first graphical feature 806 to indicate the play times for textual captions 114a, 114c, and 114d, each of which was determined to include the search term “Javascript.”
Thus, first graphical feature 806 indicates the locations/play times in a video corresponding to the portions of a transcript of the video that match search criteria. A user can view the indications displayed on first graphical feature 806 to easily ascertain the locations in the video of matching search terms. In an embodiment, the user may be enabled to interact with first graphical feature 806 to cause the display/playing of video 110 to switch to a location of a matching search term. For instance, the user may be enabled to “click” on an indication displayed on first graphical feature 806 to cause play of video 110 to occur at the location of the indication. In another embodiment, the user may be enabled to “slide” a video play position indicator along first graphical feature 806 to the location of an indication to cause play of video 110 to occur at the location of the indication. In other embodiments, the user may be enabled to cause the display/playing of video 110 to switch to a location of a matching search term in other ways.
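In a browser embodiment that hosts video 110 in an HTML video element, clicking an indication might simply seek the player, as in the following sketch; the element lookup is an assumption.

    // Seek the player to a search result's play time when its timeline
    // indication is clicked.
    function attachSeekHandler(indication: HTMLElement, playTimeSeconds: number): void {
      const player = document.querySelector<HTMLVideoElement>("video");
      indication.addEventListener("click", () => {
        if (player) {
          player.currentTime = playTimeSeconds; // jump to the matching caption's audio
          void player.play();
        }
      });
    }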
For instance, in the example of FIG. 8, the user may be enabled in this manner to cause the display/playing of video 110 to switch to a play time of any of indications 808a, 808b, and 808c (FIG. 8), where a corresponding textual caption of transcript 112 of video 110 contains the search term “Javascript.”
In another embodiment, a user interface element may be displayed that indicates locations of search results in the transcript. For instance, FIG. 10 shows a flowchart 1000 providing a process for indicating locations of search results in a transcript of a video, according to an example embodiment. In an embodiment, flowchart 1000 may be performed by caption location indicator 606. Caption location indicator 606 and flowchart 1000 are described as follows. Further structural and operational embodiments will be apparent to persons skilled in the relevant art(s) based on the following description of caption location indicator 606 and flowchart 1000.
Flowchart 1000 begins with step 1002. In step 1002, a graphical feature is generated to display in the user interface having a length that corresponds to a length of the transcript. For example, FIG. 8 shows a second graphical feature 810 having a rectangular shape, being positioned adjacent to transcript 112 in transcript display region 106, and having a length that is approximately the same as a height of the displayed portion of transcript 112 in transcript display region 106. In an embodiment, the length of second graphical feature 810 corresponds to a length of transcript 112 (including a portion of transcript 112 that is not displayed in transcript display region 106). For instance, if transcript 112 includes one hundred textual captions, each position along the length of second graphical feature 810 corresponds to a particular textual caption of the one hundred textual captions. A first (e.g., uppermost) position of second graphical feature 810 corresponds to a first textual caption of transcript 112, a last (e.g., lowermost) position of second graphical feature 810 corresponds to the one hundredth textual caption of transcript 112, and each position in between of second graphical feature 810 corresponds to a textual caption of transcript 112 between the first and last textual captions, with the number of the textual caption (in order) in transcript 112 increasing when moving from top to bottom along second graphical feature 810.
In step 1004, at least one indication is generated to display at a position on the graphical feature that indicates a position of occurrence in the transcript of the textual caption determined to include the at least one search term. In an embodiment, caption location indicator 606 may receive the location of the textual captions (e.g., by identifier and/or timestamp) in transcript 112 for the search result(s) from search module 602 (or directly from storage). In an embodiment, caption location indicator 606 is configured to generate an indication that is displayed on second graphical feature 810 at each of the locations. Any type of indication may be displayed on second graphical feature 810, including an arrow, a letter, a number, a symbol, a color, etc., to indicate the location for a search result. For instance, as shown in FIG. 8, first-third horizontal bar indications 812a-812c are shown displayed on second graphical feature 810 to indicate the locations of textual captions 114a, 114c, and 114d in transcript 112, each of which was determined to include the search term “Javascript.”
Thus, second graphical feature 810 indicates the locations in a transcript that match search criteria. A user can view the indications displayed on second graphical feature 810 to easily ascertain the locations in the transcript of the matching search terms. In an embodiment, the user may be enabled to interact with second graphical feature 810 to cause the display of transcript 112 in transcript display region 106 to switch to a location of a matching search term. For instance, the user may be enabled to “click” on an indication displayed on second graphical feature 810 to cause transcript display region 106 to display the portion of transcript 112 at the location of the indication. In another embodiment, the user may be enabled to “slide” a scroll bar along second graphical feature 810 to overlap the location of an indication to cause the portion of transcript 112 at the location of the indication to be displayed. For instance, one or more textual captions may be displayed, including a textual caption that includes a search term indicated by the indication. In other embodiments, the user may be enabled to cause the display of transcript 112 to switch to a location of a matching search term in other ways.
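An analogous browser sketch for the transcript side scrolls the matched caption into view; the element id scheme is again an assumption.

    // Scroll the transcript display region so the matched caption becomes visible
    // when its indication on the transcript-length feature is clicked.
    function attachScrollHandler(indication: HTMLElement, captionId: number): void {
      indication.addEventListener("click", () => {
        document.getElementById(`caption-${captionId}`)?.scrollIntoView({ behavior: "smooth", block: "start" });
      });
    }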
For instance, in the example of FIG. 8, the user may be enabled in this manner to cause the display of transcript 112 to switch to displaying the textual caption associated with any of indications 812a, 812b, and 812c (FIG. 8).
In another embodiment, users may be enabled to edit textual captions of a transcript. In this manner, the accuracy of the speech-to-text transcription of transcripts may be improved. For instance, FIG. 11 shows a step 1102 that enables a user to edit a textual caption of a transcript of a video, according to an example embodiment. In an embodiment, step 1102 may be performed by caption editor 608.
In step 1102, a user is enabled to interact with a textual caption displayed in the transcript display region to provide an edit to text of the textual caption. In embodiments, caption editor 608 may enable a textual caption to be edited in any manner. For instance, in an embodiment, the user may use a mouse pointer or other mechanism for interacting with a textual caption displayed in transcript display region 106. The user may hover the mouse pointer over a textual caption that the user selects to be edited, such as textual caption 114b shown in FIG. 8, which may cause caption editor 608 to generate an editor interface for editing text of textual caption 114b, or the user may interact in another suitable way. The user may edit the text of textual caption 114b in any manner, including by deleting text and/or adding new text (e.g., by typing, by voice input, etc.). The user may be enabled to save the edited text by interacting with a “save” button or other user interface element. The edited text may be saved in transcript 112 in place of the previous text, with the previous text either deleted or saved in an edit history for transcript 112, in embodiments. During subsequent viewings of textual caption 114b in transcript 112, the edited text may be displayed.
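One illustrative way to retain such an edit history is to store prior revisions alongside the current caption text. The following sketch assumes that design choice; the shapes are hypothetical.

    // A caption that remembers prior revisions rather than discarding them.
    interface EditableCaption {
      id: number;
      text: string;
      editHistory: { previousText: string; editedAt: Date }[];
    }

    // Apply a user's edit, preserving the replaced text in the edit history.
    function applyCaptionEdit(caption: EditableCaption, newText: string): void {
      caption.editHistory.push({ previousText: caption.text, editedAt: new Date() });
      caption.text = newText; // displayed on subsequent viewings of the transcript
    }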
In another embodiment, users may be enabled to select a display language for a transcript. In this manner, users that understand various different languages may all be enabled to read textual captions of a displayed transcript. For instance, FIG. 12 shows a step 1202 for enabling a user to select a language of a transcript of a video, according to an example embodiment. In an embodiment, step 1202 may be performed by transcript display module 408.
In step 1202, a user interface element is generated that enables a user to select a language of a plurality of languages for text of the transcript to be displayed in the transcript display region. In embodiments, transcript display module 408 (e.g., a language selector module of transcript display module 408) may generate any suitable type of user interface element described elsewhere herein or otherwise known to enable a language to be selected from a list of languages for transcript 112. For instance, as shown in FIG. 8, transcript display module 408 may generate a user interface element 820 that is a pull down menu. A user may interact with user interface element 820 by clicking on user interface element 820 with a mouse pointer (or in another manner), which causes a pull down list of languages to be displayed, from which the user can select (by mouse pointer) a language in which the text of transcript 112 shall be displayed. For instance, the user may be enabled to select English, Spanish, French, German, Chinese, Japanese, etc., as a display language for transcript 112.
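If the media object stores one caption list per language, the language selection can reduce to choosing which list to render. The following sketch assumes that storage layout; the fallback to English is an illustrative design choice, not a requirement.

    // Transcript text stored per language, keyed by a BCP 47-style language tag.
    type MultilingualTranscript = Map<string, { id: number; text: string }[]>;

    // Return the caption list for the selected language, falling back to English
    // when no version exists for that language.
    function captionsForLanguage(
      transcript: MultilingualTranscript,
      languageTag: string,
    ): { id: number; text: string }[] {
      return transcript.get(languageTag) ?? transcript.get("en") ?? [];
    }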
As such, transcript 112 may be stored in a media object in the form of one or multiple languages. Each language version of transcript 112 may be generated by manual or automatic translation. Furthermore, in embodiments, textual edits may be separately received for each language version of transcript 112 (using caption editor 608), or may be received for one language version of transcript 112 and automatically translated to the other language versions of transcript 112.
In another embodiment, a user may be enabled to share a video and the related search information that the user generated by interacting with search interface 108. In this manner, users may be provided with information regarding searches performed on video content by other users in a quick and easy fashion.
For instance, in an embodiment, as shown in FIG. 8, video display region 104 may display a “share” button 816 or other user interface element. When a first user interacts with share button 816, media player 406 may generate a link (e.g., a uniform resource locator (URL)) that may be provided to other users by email, text message (e.g., by a tweet), instant message, or other communication medium, as designated by the user (e.g., by providing email addresses, etc.). The generated link may include a link/address for video 110, may include a timestamp for a current play time of video 110, and may include search terms and/or other search criteria used by the first user, to be automatically applied to video 110 when a user clicks on the link. When a second user clicks on the link (e.g., on a web page, in an email, etc.), video 110 may be displayed (e.g., in a user interface similar to user interface 102), and may be automatically forwarded to the play time indicated by the timestamp included in the link. Furthermore, transcript 112 may be displayed, with the textual captions of transcript 112 highlighted (as described above) to indicate the search results for the search criteria (e.g., highlighting textual captions that include search terms) applied by the first user.
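Such a link might encode the current play time and search terms as URL query parameters, as in the following sketch. The parameter names and example address are illustrative only, not a defined scheme.

    // Build a shareable link carrying the current play time and search terms.
    function buildShareLink(videoUrl: string, playTimeSeconds: number, searchTerms: string[]): string {
      const url = new URL(videoUrl);
      url.searchParams.set("t", String(Math.floor(playTimeSeconds))); // current play time
      url.searchParams.set("q", searchTerms.join(" "));               // search criteria to reapply
      return url.toString();
    }

    // Example: buildShareLink("https://example.com/watch?v=abc", 312.7, ["Javascript"])
    // yields "https://example.com/watch?v=abc&t=312&q=Javascript".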
In further embodiments, additional and/or alternative user interface elements may be present to enable functions to be performed with respect to video 110, transcript 112, and search interface 108. For instance, a user interface element may be present that may be interacted with to automatically generate a “remixed” version of video 110. The remixed version of video 110 may be a shorter version of video 110 that includes portions of video 110 and transcript 112 centered around the search results. For instance, the shorter version of video 110 may include the portions of video 110 and transcript 112 that include the textual captions determined to include search terms.
Furthermore, in embodiments, transcript display module 408 may be configured to automatically add links to text in transcript 112. For instance, transcript display module 408 may include a map that relates links to particular text, may parse transcript 112 for the particular text, and may apply links (e.g., displayed in transcript display region 106 as clickable hyperlinks) to the particular text. In this manner, users that view transcript 112 may click on links in transcript 112 to be able to view further information that is not included in video 110, but that may enhance the experience of the user. For instance, if speech in video 110 discusses a particular website or other content (e.g., another video, a snippet of computer code, etc.), a link to the content may be shown on the particular text in transcript 112, and the user may be enabled to click on the link to be navigated to the content. Links to help sites and other content may also be provided.
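A map-driven linker of this kind might replace known phrases in caption text with anchor markup. In the following sketch the phrase-to-URL map contents are assumptions, and matching is case-insensitive.

    // Replace occurrences of known phrases in caption text with hyperlinks.
    function linkifyCaption(text: string, linkMap: Map<string, string>): string {
      let result = text;
      for (const [phrase, href] of linkMap) {
        // Escape regex metacharacters in the phrase before matching it.
        const escaped = phrase.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
        result = result.replace(new RegExp(escaped, "gi"), (m) => `<a href="${href}">${m}</a>`);
      }
      return result;
    }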
In further embodiments, a group of textual captions may be tagged with metadata to indicate the group of textual captions as a “chapter,” to provide increased relevancy for searches in textual captions.
One or more videos related to video 110 may be determined by search module 602, and may be displayed adjacent to video 110 (e.g., by title, as thumbnails, etc.). For instance, search module 602 may search a library of videos according to the criteria that the user applied to video 110 for one or more videos that are most relevant to the search criteria, and may display these most relevant videos. Furthermore, other content than videos (e.g., web pages, etc.) that is related to video 110 may be determined by search module 602, and may be displayed adjacent to video 110 in a similar fashion. For instance, search module 602 may include a search engine to which the search terms are applied as search keywords, or may apply the search terms to a remote search engine, to determine the related content.
Still further, the search terms input by users to search interface 108 may be collected, analyzed, and compared with those of other users to provide enhancements. For instance, content hotspots may be determined by analyzing search terms, and these content hotspots may be used to drive additional related content with higher relevance, to select advertisements for display in user interface 102, and/or may be used for further enhancements.
In another embodiment, caption editor 608 may enable a user to annotate one or more textual captions. For instance, in a similar manner as described above with respect to editing textual captions, caption editor 608 may enable a user to add text as metadata to a textual caption as a textual annotation. When the textual caption is shown in transcript display region 106 by transcript display module 408, the textual annotation may be shown associated with the textual caption in transcript display region 106 (e.g., may be displayed next to or below the textual caption, may become visible if a user interacts with the textual caption, etc.).
III. Example Computing Device Embodiments
Transcript generator 202, video searching media player system 314, video searching media player system 402, media player 406, transcript display module 408, search interface module 410, search module 602, caption play time indicator 604, caption location indicator 606, caption editor 608, flowchart 500, flowchart 700, flowchart 900, flowchart 1000, step 1102, and step 1202 may be implemented in hardware, or hardware combined with any combination of software and/or firmware. For example, transcript generator 202, video searching media player system 314, video searching media player system 402, media player 406, transcript display module 408, search interface module 410, search module 602, caption play time indicator 604, caption location indicator 606, caption editor 608, flowchart 500, flowchart 700, flowchart 900, flowchart 1000, step 1102, and/or step 1202 may be implemented as computer program code configured to be executed in one or more processors and stored in a computer readable storage medium. Alternatively, transcript generator 202, video searching media player system 314, video searching media player system 402, media player 406, transcript display module 408, search interface module 410, search module 602, caption play time indicator 604, caption location indicator 606, caption editor 608, flowchart 500, flowchart 700, flowchart 900, flowchart 1000, step 1102, and/or step 1202 may be implemented as hardware logic/electrical circuitry.
For instance, in an embodiment, one or more of transcript generator 202, video searching media player system 314, video searching media player system 402, media player 406, transcript display module 408, search interface module 410, search module 602, caption play time indicator 604, caption location indicator 606, caption editor 608, flowchart 500, flowchart 700, flowchart 900, flowchart 1000, step 1102, and/or step 1202 may be implemented together in a system-on-chip (SoC). The SoC may include an integrated circuit chip that includes one or more of a processor (e.g., a microcontroller, microprocessor, digital signal processor (DSP), etc.), memory, one or more communication interfaces, and/or further circuits and/or embedded firmware to perform its functions.
FIG. 13 depicts an exemplary implementation of a computer 1300 in which embodiments of the present invention may be implemented. For example, transcript generation system 200, computing device 302, content server 304, and computing device 400 may each be implemented in one or more computer systems similar to computer 1300, including one or more features of computer 1300 and/or alternative features. Computer 1300 may be a general-purpose computing device in the form of a conventional personal computer, a mobile computer, a server, or a workstation, for example, or computer 1300 may be a special purpose computing device. The description of computer 1300 provided herein is provided for purposes of illustration, and is not intended to be limiting. Embodiments of the present invention may be implemented in further types of computer systems, as would be known to persons skilled in the relevant art(s).
As shown in FIG. 13, computer 1300 includes one or more processors 1302, a system memory 1304, and a bus 1306 that couples various system components including system memory 1304 to processor 1302. Bus 1306 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. System memory 1304 includes read only memory (ROM) 1308 and random access memory (RAM) 1310. A basic input/output system 1312 (BIOS) is stored in ROM 1308.
Computer 1300 also has one or more of the following drives: a hard disk drive 1314 for reading from and writing to a hard disk, a magnetic disk drive 1316 for reading from or writing to a removable magnetic disk 1318, and an optical disk drive 1320 for reading from or writing to a removable optical disk 1322 such as a CD ROM, DVD ROM, or other optical media. Hard disk drive 1314, magnetic disk drive 1316, and optical disk drive 1320 are connected to bus 1306 by a hard disk drive interface 1324, a magnetic disk drive interface 1326, and an optical drive interface 1328, respectively. The drives and their associated computer-readable media provide nonvolatile storage of computer-readable instructions, data structures, program modules and other data for the computer. Although a hard disk, a removable magnetic disk and a removable optical disk are described, other types of computer-readable storage media can be used to store data, such as flash memory cards, digital video disks, random access memories (RAMs), read only memories (ROM), and the like.
A number of program modules may be stored on the hard disk, magnetic disk, optical disk, ROM, or RAM. These programs include an operating system 1330, one or more application programs 1332, other program modules 1334, and program data 1336. Application programs 1332 or program modules 1334 may include, for example, computer program logic (e.g., computer program code or instructions) for implementing transcript generator 202, video searching media player system 314, video searching media player system 402, media player 406, transcript display module 408, search interface module 410, search module 602, caption play time indicator 604, caption location indicator 606, caption editor 608, flowchart 500, flowchart 700, flowchart 900, flowchart 1000, step 1102, and/or step 1202 (including any step of flowcharts 500, 700, 900, and 1000), and/or further embodiments described herein.
A user may enter commands and information into the computer 1300 through input devices such as keyboard 1338 and pointing device 1340. Other input devices (not shown) may include a microphone, joystick, game pad, satellite dish, scanner, a touch screen and/or touch pad, a voice recognition system to receive voice input, a gesture recognition system to receive gesture input, or the like. These and other input devices are often connected to processor 1302 through a serial port interface 1342 that is coupled to bus 1306, but may be connected by other interfaces, such as a parallel port, game port, or a universal serial bus (USB).
A display device 1344 is also connected to bus 1306 via an interface, such as a video adapter 1346. In addition to display device 1344, computer 1300 may include other peripheral output devices (not shown) such as speakers and printers.
Computer 1300 is connected to a network 1348 (e.g., the Internet) through an adaptor or network interface 1350, a modem 1352, or other means for establishing communications over the network. Modem 1352, which may be internal or external, may be connected to bus 1306 via serial port interface 1342, as shown in FIG. 13, or may be connected to bus 1306 using another interface type, including a parallel interface.
As used herein, the terms “computer program medium,” “computer-readable medium,” and “computer-readable storage medium” are used to generally refer to media such as the hard disk associated with hard disk drive 1314, removable magnetic disk 1318, removable optical disk 1322, as well as other media such as flash memory cards, digital video disks, random access memories (RAMs), read only memories (ROM), and the like. Such computer-readable storage media are distinguished from and non-overlapping with communication media (do not include communication media). Communication media typically embodies computer-readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave. The term “modulated data signal” means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media includes wireless media such as acoustic, RF, infrared and other wireless media. Embodiments are also directed to such communication media.
As noted above, computer programs and modules (including application programs 1332 and other program modules 1334) may be stored on the hard disk, magnetic disk, optical disk, ROM, or RAM. Such computer programs may also be received via network interface 1350, serial port interface 1342, or any other interface type. Such computer programs, when executed or loaded by an application, enable computer 1300 to implement features of embodiments of the present invention discussed herein. Accordingly, such computer programs represent controllers of the computer 1300.
The invention is also directed to computer program products comprising software stored on any computer useable medium. Such software, when executed in one or more data processing devices, causes the data processing device(s) to operate as described herein. Embodiments of the present invention employ any computer-useable or computer-readable medium, known now or in the future. Examples of computer-readable mediums include, but are not limited to, storage devices such as RAM, hard drives, floppy disks, CD ROMs, DVD ROMs, zip disks, tapes, magnetic storage devices, optical storage devices, MEMs, nanotechnology-based storage devices, and the like.
VI. Conclusion
While various embodiments of the present invention have been described above, it should be understood that they have been presented by way of example only, and not limitation. It will be understood by those skilled in the relevant art(s) that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined in the appended claims. Accordingly, the breadth and scope of the present invention should not be limited by any of the above-described exemplary embodiments, but should be defined only in accordance with the following claims and their equivalents.