US20110022387A1


Info

Publication number: US20110022387A1
Application number: US12/746,352
Authority: US
Inventors: Paul M. Hager
Original assignee: Vovision LLC
Current assignee: III Holdings 1 LLC
Priority date: 2007-12-04
Filing date: 2008-12-04
Publication date: 2011-01-27
Also published as: US20140136199A1; US9715876B2; WO2009073768A1

Abstract

Methods and systems for requesting a transcription of audio data. One method includes displaying a send-for-transcription button within an email-client interface on a computer-controlled display, and automatically sending a selected email message and associated audio data to a transcription server as a request for a transcription of the associated audio data when a user selects the send-for-transcription button.

Description

RELATED APPLICATIONS

The present application is a continuation-in-part of International Application PCT/US2007/066791 filed on Apr. 17, 2007, which claims priority to U.S. Provisional Application 60/792,640 filed on Apr. 17, 2006, the entire contents of both of which are hereby incorporated by reference. The present application also claims priority to U.S. Provisional Application 60/992,187 filed on Dec. 4, 2007; U.S. Provisional Application 61/005,456 filed on Dec. 4, 2007; and U.S. Provisional Application 61/076,054 filed on Jun. 26, 2008, the entire contents of all of which are hereby incorporated by reference.

BACKGROUND OF THE INVENTION

Each day individuals and companies receive multiple audio messages. These audio messages can include personal greetings and information or business-related instructions and information. In either case, it may be useful or required that the audio messages be transcribed in order to create written records of the messages.

Software currently exists that generates written text based on audio data. For example, Nuance Communications, Inc. provides a number of software programs, trademarked “Dragon,” that take audio files in .WAV format, .MP3 format, or other audio formats and translate such files into text files. The Dragon software also provides mechanisms for comparing audio files to text files in order to “learn” and improve future transcriptions. The “learning” mechanism included in the Dragon software, however, is only intended to learn based on a voice-dependent model, which means that the same person trains the software program over time. In addition, learning mechanisms in existing transcription software are often non-continuous and include set training parameters that limit the amount of training that is performed.

SUMMARY OF THE INVENTION

Embodiments of the present invention provide methods and systems for correcting transcribed text. One method includes a user sending, via an email-client interface, one or more emails that include audio data to a transcription server. The emails may be sent from one or more data sources running email-clients and include audio data to be transcribed. The audio data is transcribed based on a voice model to generate text data. The method also includes making the text data available to the user over at least one computer network and receiving corrected text data over the at least one computer network from the user. In addition, the method includes modifying the voice model based on the corrected text data.

Embodiments of the present invention also provide systems for correcting transcribed text. One system includes a transcription server, at least one translation server, an email-client correction interface, and at least one training server. The transcription server receives audio data from one or more audio data sources, and the translation server can transcribe the audio data based on a voice model to generate text data. The email-client correction interface is accessible by a user from within an email-client and provides the user with access to the text data. The transcription server also receives corrected text data from one or more users. The training server then modifies the voice model based on the corrected text data.

Additional embodiments of the invention also provide methods of performing audio data transcription. One method includes obtaining audio data from at least one audio data source, such as a voice over IP system or a voicemail system, transcribing the audio data based on a voice-independent model to generate text data, and sending the text data to an owner of the audio data as an email message.

Embodiments of the invention also provide a method of requesting a transcription of audio data. The method includes displaying a send-for-transcription button within an email-client interface on a computer-controlled display, and automatically sending a selected email message and associated audio data to a transcription server as a request for a transcription of the associated audio data when a user selects the send-for-transcription button.

Further embodiments of the invention provide a system for requesting a transcription of audio data. The system includes a transcription server and an email-client interface. The email-client interface displays at least one email message associated with audio data to a user, displays a send-for-transcription button to the user, receives a selection of the at least one email message from the user, receives a selection of the send-for-transcription button from the user, and automatically sends the at least one email message and associated audio data to the transcription server as a request for a transcription of the associated audio data in response to the user's selection of the send-for-transcription button.

Additional embodiments of the invention also provide a system for generating a transcription of audio data. The system includes a transcription server and a translation server. The transcription server is configured to receive at least one email message and associated audio data from an email-client, identify an account based on the at least one email message, and obtain stored account settings associated with the identified account. The translation server is configured to generate a transcription of the associated audio data based on the account settings and a voice-independent model.

BRIEF DESCRIPTION OF THE DRAWINGS

In the drawings:

FIGS. 1 and 2 schematically illustrate systems for transcribing audio data according to various embodiments of the invention.

FIG. 3 illustrates an email-client interface according to an embodiment of the invention.

FIG. 4 illustrates a process for transcribing audio data using the email-client interface according to an embodiment of the invention.

FIG. 5 illustrates the transcription server of FIGS. 1 and 2 according to an embodiment of the invention.

FIG. 6 illustrates a file transcription, correction, and training method according to an embodiment of the invention.

FIG. 7 illustrates another file transcription, correction, and training method according to an embodiment of the invention.

FIG. 8 illustrates a correction method according to an embodiment of the invention.

FIGS. 9-10 illustrate a correction notification according to an embodiment of the invention.

FIGS. 11-14 illustrate an email-client correction interface according to an embodiment of the invention.

FIG. 15 illustrates a message notification according to an embodiment of the invention.

DETAILED DESCRIPTION

Before any embodiments of the invention are explained in detail, it is to be understood that the invention is not limited in its application to the details of construction and the arrangement of components set forth in the following description or illustrated in the following drawings. The invention is capable of other embodiments and of being practiced or of being carried out in various ways.

In addition, it should be understood that embodiments of the invention include hardware, software, and electronic components or modules that, for purposes of discussion, may be illustrated and described as if the majority of the components were implemented solely in hardware. However, based on a reading of this detailed description, one of ordinary skill in the art would recognize that, in at least one embodiment, the electronic based aspects of the invention may be implemented in software. As such, it should be noted that a plurality of hardware and software based devices, as well as a plurality of different structural components, may be utilized to implement the invention. Furthermore, and as described in subsequent paragraphs, the specific configurations illustrated in the drawings are intended to exemplify embodiments of the invention. Other alternative configurations are possible.

FIG. 1 illustrates a transcription system 10 for transcribing audio data according to an embodiment of the invention. As shown in FIG. 1, the system 10 includes a transcription server 20, a data source running an email-client 30, and a third party device 40. The transcription server 20 includes, among other things, a voice file directory 52, a queue server 54, and a translation server 56. The transcription server 20 is described in more detail below. The data source email-client 30 and the third party device 40 can be connected to the transcription server 20 via a wide area network 50, such as a cellular network or the Internet.

Information flow through the system 10 begins in the data source email-client 30. The data source email-client 30 can include a stand-alone email-client, such as Outlook manufactured by Microsoft™ or Lotus Notes manufactured by IBM™. In other embodiments, the data source email-client 30 can include a browser-based email-client, such as Hotmail, Gmail, Yahoo, AOL, etc. As described below, in addition to providing standard emailing operations, the data source email-client 30 can provide one or more email-client interfaces (e.g., via one or more plug-ins or additional software modules installed and used as part of the email-client 30) that allow a user to request, view, manage, and correct transcribed text data.

A user sends information from the data source email-client 30 through the wide area network 50 (e.g., a cellular network, the Internet, etc.) to the transcription server 20. The transcription server 20 places the information in the voice file directory 52 related to an account for the user that sent the information. The information to be transcribed is placed in the queue server 54 before being routed to the translation server 56 to be transcribed. After the information has been transcribed, it is sent back through the wide area network 50 and may, optionally, be sent to a third party device 40 for correction. In some embodiments, if the information is not sent to a third party device 40 for correction or if the third party device 40 has finished correcting the transcription, the information is sent back to the data source email-client 30.

FIG. 2 illustrates an exemplary embodiment of the system 10 from FIG. 1. The transcription server 20 can include or can be connected to an email server 20a that receives email messages from a client computer 30a or other devices running email-clients, such as a personal digital assistant (“PDA”) 30b, a Blackberry device 30c, or a mobile phone 30d. In other embodiments, additional devices that support email-clients may also be used. The system 10 also includes a third party device 40. The third party device 40 can receive messages including transcribed text to be corrected or checked before the text is sent back to the user. As described below, in some embodiments, the third party device 40 provides one or more email-client interfaces for viewing and correcting transcribed text.

FIG. 3 illustrates an embodiment of an email-client interface 60. The email-client interface 60 allows a user to interact with the transcription server 20 from FIGS. 1 and 2. In some embodiments, the email-client interface 60 is provided through an email-client, such as the data source email-client 30. The email-client can include a stand-alone email-client, such as Outlook manufactured by Microsoft™ or Lotus Notes manufactured by IBM™. In other embodiments, the email-client can include a browser-based email-client, such as Hotmail, Gmail, Yahoo, AOL, etc. In some embodiments, the email-client interface 60 is provided by a plug-in or additional software module that is installed and used with the email-client, which allows a user to access and manage transcribed text from within a standard email-client and without having to launch and access a separate interface for managing transcribed text.

As shown in FIG. 3, the email-client interface 60 includes a send button 62, a quick play button 64, a search field 66, and an options button 68. The send button 62 allows the user to send one or more selected email messages that include audio data to the transcription server 20. The search field 66 allows a user to search messages that have already been sent to the transcription server 20. As a result, the search field 66 allows a user to access information within the transcription system 10 without having to access a web interface. The quick play button 64 allows the user to play audio data related to a message that has already been sent to the transcription server 20. The options button 68 allows a user to modify features related to the email-client interface 60 and an email-client correction interface described below. In some embodiments, the options button 68 allows a user to modify account settings related to delivery settings, transcription settings, format settings, and the like. In other embodiments, the email-client interface 60 includes additional buttons and functionality.

In conjunction with the email-client interface 60, the email-client correction interface is also accessed from within an email-client, such as the data source email-client 30 or an email-client executed by the third party device 40. In some embodiments, the email-client correction interface is also provided by a plug-in or additional software module that is installed and used with the email-client. The email-client correction interface can be part of the same plug-in that provides the email-client interface 60.

The email-client correction interface allows a user to access a web-based correction interface from within an email-client, eliminating the need to launch a separate web browsing application or interface. Aspects of the email-client correction interface include, among other things, the ability to view and correct transcriptions of audio data, monitor the transcription status of audio data sent to the transcription server 20, and modify account settings. The email-client correction interface is described in greater detail below with respect to FIGS. 11-14.

FIG. 4 illustrates a process 70 for using the email-client interface 60 to send messages including audio data through the transcription system 10. The user selects one or more email messages including audio data to be transcribed (step 72). In some embodiments, the selected email messages include attached audio data representing voice mail messages. Selecting the email messages may include highlighting the messages, opening individual messages, or any other acceptable selection technique. After step 72, the user selects the send button 62 from the email-client interface 60 to forward the selected email messages to the transcription server 20 (step 74). Additionally or alternatively, the user can reply to a message from the transcription server 20, make changes or corrections to the transcribed text, and send the message back to the transcription server 20, as described below.

When the messages arrive at the transcription server 20, identifying information is taken from the email messages to identify a user account (step 76). In some embodiments, the identifying information is metadata taken from the email message. The metadata may include, among other things, information such as a sender's email address and IP address. In other embodiments, identifying information is included in the body of the email message and extracted to identify a user account. After the account is identified, the message is sent to a voice file directory 52 related to that account (step 78). Account settings, such as, for example, destination information and formatting information, may be modified for each account. The account settings can be modified or accessed through a system interface, such as the email-client correction interface.
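The account lookup of step 76 can be sketched in Python under two assumptions that go beyond the text: the sender address in the From header is the identifying metadata, and ACCOUNTS is a hypothetical in-memory stand-in for the server's account store. This is an illustrative sketch, not the implementation described here.

```python
from email import message_from_string
from email.utils import parseaddr

# Hypothetical account store keyed by lower-cased sender address.
ACCOUNTS = {
    "alice@example.com": {"account_id": "acct-001", "voice_file_dir": "voice/acct-001"},
}


def identify_account(raw_email, accounts=ACCOUNTS):
    """Return the stored settings for the sender of raw_email, or None if unknown."""
    msg = message_from_string(raw_email)
    # parseaddr splits 'Alice <alice@example.com>' into (display name, address).
    _name, address = parseaddr(msg.get("From", ""))
    return accounts.get(address.lower())
```

Lower-casing the address treats "Alice@Example.com" and "alice@example.com" as one account, which is common practice though not guaranteed by the email standards. An embodiment that reads an identifier from the message body would change only how the lookup key is obtained.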

The messages stored in the voice file directory 52 awaiting transcription are polled into a queue server 54 (step 80). The queue server 54 holds the messages until a translation server 56 becomes available. When a translation server 56 becomes available, the queue server 54 routes the messages to the available translation server 56 (step 82). The messages enter the translation server 56 and the audio data associated with the message is transcribed (step 84). As described below, the transcription server 20 can also receive messages with corrected transcribed text. If the transcription server 20 receives a message including corrected transcribed text, the transcription server 20 compares the original transcribed text with the user-corrected transcribed text. After the transcription server 20 has compared the original and the user-corrected text, a message including the user-corrected text or the differences between the original text and the user-corrected text is sent to a training queue to update the voice model, as described below.
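The hold-until-available behavior of the queue server 54 (steps 80 and 82) can be illustrated with a minimal first-in, first-out sketch. The QueueServer class and its method names are hypothetical, and a real deployment would dispatch work over a network rather than record pairs in a list.

```python
from collections import deque


class QueueServer:
    """Hypothetical sketch: hold messages until a translation server is free."""

    def __init__(self, translation_servers):
        self.pending = deque()                  # messages polled from the voice file directory
        self.idle = deque(translation_servers)  # translation servers ready for work
        self.assignments = []                   # (message, server) pairs routed so far

    def poll(self, messages):
        """Step 80: pull waiting messages into the queue, then try to route them."""
        self.pending.extend(messages)
        self._dispatch()

    def release(self, server):
        """Called when a translation server finishes and becomes available again."""
        self.idle.append(server)
        self._dispatch()

    def _dispatch(self):
        """Step 82: route queued messages, oldest first, to available servers."""
        while self.pending and self.idle:
            message = self.pending.popleft()
            server = self.idle.popleft()
            self.assignments.append((message, server))
```

With two translation servers and three messages, the third message waits in the queue until one of the servers is released.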

After the audio data has been transcribed, the transcribed text may be sent to a third party for correction or may be sent directly to one or more destinations specified in the user's account settings (step 86). As described above, the transcribed text can be sent to a destination in an email message (e.g., embedded or as an attached file). In some embodiments, if the transcribed text is not sent to a third party, it is sent directly to the training queue to update the voice model (step 90). If the transcribed text is sent to a third party for correction, the third party corrects the transcription using, for example, the email-client correction interface described below (step 88). After step 88, the transcribed and/or corrected text is sent to the training queue to update the voice model (step 90). The transcribed text is then sent back to the user (step 92). A more detailed description of the transcription server 20 is provided below.

Once the transcription server 20 or a separate polling computer receives one or more messages (received by request or otherwise), the transcription server 20 or separate polling computer places the messages and/or the associated audio data to be transcribed into one or more queue servers 54. The queue servers 54 look for an open or available processor or translation server 56. As shown in FIG. 5, the transcription server 20 includes multiple translation servers 56, although a different number of translation servers 56 (e.g., physical or virtual) is possible. Upon identifying an available translation server 56, the queue servers 54 route audio data to the available translation server 56. The translation server 56 transcribes the audio data to generate text data and, in some embodiments, indexes the message. The translation servers 56 index the messages using a database to identify discrete words. For example, the translation server 56 can use an extensible markup language (“XML”), structured query language (“SQL”), mySQL, idx, or other database language to identify discrete words or phrases within the transcribed text.

In addition to transcribing audio data included in messages as just described, some embodiments of a translation server 56 generate an index of keywords based upon the transcribed text. For example, in some embodiments, the translation server 56 removes words that are less commonly searched and/or less useful for searching (e.g., I, the, a, an, but, and the like) from transcribed text, which leaves a number of keywords that can be stored in memory available to the translation servers 56. The resulting “keyword index” includes the exact position of each keyword in the transcribed text and, in some cases, the exact location of each keyword in the corresponding audio data. This keyword index enables users to perform searches on transcribed text. For example, a user accessing the transcribed text associated with particular audio data (whether for purposes of correcting any errors in the transcribed text or for searching within the transcribed text) can select one or more words from the keyword index of the message generated earlier. In so doing, the exact locations (e.g., page and/or line numbers) of such words can be provided quickly and efficiently, in many cases significantly faster and with less processing power than performing a standard search for the word through the entire transcribed text. The system 10 can provide the keyword index to a user in any suitable manner, such as in a pop-up or pull-down menu included in an interface of the system 10 (e.g., the email-client correction interface) during text correction or searching of transcribed text (described below).
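A keyword index of this kind can be sketched as a mapping from each non-stop word to its positions. The stop-word list extends the examples given in the text and is illustrative only. The optional word_times argument is a hypothetical per-word start time (in seconds) that a recognizer might report, and is used here to record each keyword's location in the audio.

```python
import re
from collections import defaultdict

# Illustrative stop words; the text names "I, the, a, an, but" as examples.
STOP_WORDS = {"i", "the", "a", "an", "but", "and", "of", "to"}


def build_keyword_index(text, word_times=None):
    """Map each keyword to a list of (word_position, audio_time) tuples.

    Positions count every token, stop words included, so they line up with
    the full transcript. word_times, if given, is a hypothetical list of
    start times aligned one-to-one with those tokens.
    """
    index = defaultdict(list)
    words = re.findall(r"[A-Za-z']+", text)
    for position, word in enumerate(words):
        key = word.lower()
        if key in STOP_WORDS:
            continue
        time = word_times[position] if word_times else None
        index[key].append((position, time))
    return dict(index)
```

Looking up a selected keyword is then a single dictionary access rather than a scan of the whole transcript, which is the efficiency gain the paragraph describes.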

Also, in some embodiments, a translation server 56 generates two or more possible candidates for a transcription of a spoken word or phrase from audio data. The most likely candidate is displayed or otherwise used to generate the transcribed text, and the less likely candidate(s) are saved in a memory accessible by the translation server 56 and/or by another server or third party device 40 as needed. This capability can be useful, for example, during correction of the transcribed text (described below). In particular, if a word in the transcribed text is wrong, a user can obtain other candidate(s) identified by the translation server 56 during transcription, which can speed up and/or simplify the correction process.
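The following sketch shows one way to retain runner-up candidates. It assumes the recognizer reports (word, score) pairs for each word slot, with higher scores meaning more likely. The WordSlot structure is hypothetical and is not the data format of any particular recognizer.

```python
from dataclasses import dataclass


@dataclass
class WordSlot:
    """One recognized word position plus the alternatives kept for correction."""
    candidates: list  # (word, score) pairs; higher score = more likely

    def best(self):
        """The most likely candidate, used to build the transcribed text."""
        return max(self.candidates, key=lambda c: c[1])[0]

    def alternatives(self):
        """Less likely candidates, most likely first, offered during correction."""
        ranked = sorted(self.candidates, key=lambda c: c[1], reverse=True)
        return [word for word, _score in ranked[1:]]


def render(slots):
    """Join the best candidate of every slot into the displayed transcript."""
    return " ".join(slot.best() for slot in slots)
```

When a user marks "there" as wrong, the interface can offer alternatives() ("their", "the") without re-running recognition.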

Once audio data is transcribed, the system 10 can allow a user to search transcribed text for particular words and/or phrases. This searching capability can be used during correction of transcribed text as described below or when a transcribed text file is searched for particular words (whether a search for such words is performed on the file alone or in combination with one or more other files). For example, using the indexed message, a user viewing generated text data can select a word or phrase included in the text data and, in some embodiments, can hear the corresponding portion of the audio data from which the text data was generated. In some embodiments, the system 10 is adapted to enable a user to search some or all transcribed text files accessible by the transcription server 20, regardless of whether such files have been corrected. Also, the system 10 can enable a user to search transcribed text using Boolean and/or other search terms.

Search results can be generated in a number of manners, such as in a table form enabling a user to select one or more files in which a word or phrase has been found and/or one or more locations at which a word or phrase has been found in particular text data. The search results can also be sorted in one or more manners according to one or more rules (e.g., date, relevance, number of instances in which the word or phrase has been found in text data, and the like) and can be printed, displayed, or exported as desired. In some embodiments, the search results also provide the text around the found word or phrase. The search results can also include additional information, such as the number of instances in which a word or phrase has been found in a transcribed text file and/or the number of transcribed text files in which a word or phrase has been found.
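A simple phrase search that produces results in the form described above can be sketched as follows: one row per file, a per-file instance count, surrounding text for each hit, and sorting by number of instances. The transcripts mapping and the context width are assumptions made for illustration. Boolean operators are not handled.

```python
def search_transcripts(transcripts, phrase, context=3):
    """Search transcribed text files for a phrase.

    transcripts maps a hypothetical file name to its transcribed text.
    Returns one row per matching file, sorted by number of hits (most first),
    with `context` words on each side of every hit.
    """
    target = phrase.lower().split()
    n = len(target)
    results = []
    for name, text in transcripts.items():
        words = text.split()
        # Compare case-insensitively and ignore trailing punctuation.
        lowered = [w.lower().strip(".,;:!?") for w in words]
        snippets = []
        for i in range(len(words) - n + 1):
            if lowered[i:i + n] == target:
                start, end = max(0, i - context), i + n + context
                snippets.append(" ".join(words[start:end]))
        if snippets:
            results.append({"file": name, "count": len(snippets), "snippets": snippets})
    results.sort(key=lambda r: r["count"], reverse=True)
    return results
```

Sorting by date or relevance would only require a different sort key.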

After the translation servers 56 index and translate audio data, the audio data and/or the generated text data is stored. The audio data and text data can be stored internally by the transcription server 20 or can be stored externally to one or more data storage devices (e.g., databases, servers, and the like). In some embodiments, a user (e.g., a user associated with a particular audio data source email-client 30) decides how long audio data and/or text data is stored by the transcription server 20, after which time the audio data and/or text data can be automatically deleted, overwritten, or stored in another storage device (e.g., a relatively low-accessibility mass storage device). An interface of the system 10 (e.g., the email-client correction interface) enables a user to specify a time limit for audio data and/or text data stored by the transcription server 20.
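Enforcing a user-specified retention limit reduces to comparing each item's storage time against a cutoff. In this sketch the items mapping and the retention_days setting are hypothetical. Whether expired items are deleted, overwritten, or moved to mass storage is left to the caller.

```python
from datetime import datetime, timedelta


def expired_items(items, retention_days, now=None):
    """Return ids of stored audio/text items older than the retention limit.

    items is a hypothetical mapping of item id -> datetime the item was stored.
    """
    now = now or datetime.now()
    cutoff = now - timedelta(days=retention_days)
    return sorted(item_id for item_id, stored_at in items.items() if stored_at < cutoff)
```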

As shown in FIGS. 1 and 2, a data source email-client 30 connects to the transcription server 20 over a network, such as the Internet, one or more local or wide area networks 50, or the like, in order to obtain audio data and/or corresponding generated text data. A user uses the data source email-client 30 to access the email-client correction interface associated with the transcription server 20 to obtain generated text data and/or corresponding audio data. For example, using the email-client correction interface, the user can request particular audio data and/or the corresponding text data. The requested data is obtained from the transcription server 20 and/or a separate data storage device and is transmitted to the user for display via the interface.

The transcription server 20 sends audio data and/or corresponding generated text data to the user as an email message. The transcription server 20 can send an email message to a user that includes the audio data and the text data as attached files. In other embodiments, the transcription server 20 sends an email message to a user that includes a notification that audio data and/or text data is available for the user. A user uses the email-client correction interface in order to listen to the audio data, view the text data, and/or correct the text data. As described above, in some embodiments, a user can reply to the email message sent from the transcription server 20, correct the transcription, and send the corrected transcription back to the transcription server 20. The transcription server 20 then updates the voice model based on a comparison of the original transcribed text and the user-corrected transcribed text. If the user replies directly to the transcription server 20, the user does not need to access the email-client correction interface, web interface, or other interfaces of the system 10.

In other embodiments, the user can choose to correct only parts of transcribed text. If the user corrects only a portion of the transcribed text, the email-client (e.g., the email-client correction interface) recognizes that only a portion of the text has changed and transmits only the corrected portion of the text to the transcription server 20 for use in training the voice model. Submitting only the corrected or changed portion of the transcribed text reduces the amount of data transmitted to the transcription server 20 for processing. In other embodiments, another email-client interface, a web-based interface, the transcription server 20, or another device included in the system 10 can determine what portions of transcribed text have been changed and can limit transmission and/or processing of the changed text accordingly.
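Detecting which portions changed can be approximated with a word-level diff. This sketch uses Python's standard difflib.SequenceMatcher as a stand-in for whatever comparison the email-client actually performs. It returns only the non-matching spans, which are the pieces that would be transmitted for training.

```python
from difflib import SequenceMatcher


def corrected_portions(original, corrected):
    """Return only the changed spans as (original_words, corrected_words) pairs."""
    a, b = original.split(), corrected.split()
    changes = []
    # get_opcodes() labels each span 'equal', 'replace', 'delete', or 'insert'.
    for tag, i1, i2, j1, j2 in SequenceMatcher(a=a, b=b).get_opcodes():
        if tag != "equal":
            changes.append((" ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return changes
```

For a long transcript with one misrecognized word, only that single pair is sent rather than the full corrected text.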

If a user forwards or sends an email message that includes audio data to the transcription server 20, the transcription server 20 can send a return email message to the user after the transcription server 20 transcribes the submitted audio file. The email message can inform the user that the submitted audio data was transcribed and that corresponding text data is available. As previously noted, the email message from the transcription server 20 can include the submitted audio data and/or the generated text data.

The system 10 can also enable a user to provide destination settings for audio data and/or text data on a per-generated-text-data basis. In some embodiments, before or after audio data is transcribed, a user specifies a particular destination for the text data. As described above, certain implementations allow a user to specify destination settings in an email message. For example, if the user sends an email message that includes audio data to the transcription server 20, the user can specify destination information in the email message. After the audio message is transcribed and the generated text data is corrected (if applicable), the transcription server 20 sends an email message to the identified recipient (e.g., via an SMTP server).

In some embodiments, to protect the privacy and security of the audio and text data, the transcription server 20 transmits data (e.g., audio data and/or text data) to the third party device 40 or another destination device using file transfer protocol (“FTP”). The transmitted data can also be protected by a Secure Sockets Layer (“SSL”) mechanism (e.g., a bank-level certificate).

In one embodiment, the system 10 includes an email-client correction interface and a streaming translation server 102 that a user accesses (e.g., via the data source email-client 30) to view generated text. As described below with respect to FIG. 11, in some embodiments, the email-client correction interface and the streaming translation server 102 also enable a user to stream the entire audio data corresponding to the generated text data and/or to stream any desired portion of the audio data corresponding to selected text data. For example, the email-client correction interface and the streaming translation server 102 enable a user to select (e.g., click on, highlight, mouse over, etc.) a portion of the text in order to hear the corresponding audio data. In addition, in some embodiments, the email-client correction interface and the streaming translation server 102 enable a user to specify a number of seconds that the user desires to hear before and/or after a selected portion of text data.
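Streaming the audio behind a text selection, with extra lead-in and trailing seconds, requires only per-word timing. The sketch assumes a hypothetical list of (start, end) times in seconds for each word. It clamps the window to the start of the recording and, when the total duration is known, to its end.

```python
def playback_window(word_times, first, last, before=0.0, after=0.0, duration=None):
    """Compute the (start, end) audio span in seconds to stream for selected words.

    word_times: hypothetical list of (start, end) times, one per word.
    first/last: indexes of the first and last selected words.
    before/after: extra seconds the user asked to hear around the selection.
    """
    start = max(0.0, word_times[first][0] - before)
    end = word_times[last][1] + after
    if duration is not None:
        end = min(end, duration)  # never run past the end of the recording
    return start, end
```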

The email-client correction interface also enables a user to correct generated text data. For example, if a user listens to audio data and determines that a portion of the corresponding generated text data is incorrect, the user can correct the generated text data via the email-client correction interface. In some embodiments, the email-client correction interface automatically identifies potentially incorrect portions of generated text data by displaying them in a particular color or other format (e.g., via a different font, bold, italics, underline, highlighting, or any other manner). The email-client correction interface also displays portions of the generated text in various colors or other formats depending on the confidence that the portion of the generated text is correct. The email-client correction interface also inserts a placeholder (e.g., an image, an icon, etc.) into text that marks portions of the generated text where text is missing (i.e., the transcription server 20 could not generate text based on the audio data). A user selects the placeholder in order to hear the audio data corresponding to the missing text and can insert the missing text accordingly.
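Confidence-dependent formatting and missing-text placeholders can be sketched as HTML rendering. The confidence thresholds, CSS class names, and "[?]" placeholder are hypothetical choices made for illustration. An entry whose text is None stands for audio from which no text could be generated.

```python
import html

PLACEHOLDER = "[?]"  # hypothetical marker where no text could be generated


def style_for(confidence):
    """Map a recognizer confidence (0 to 1) to a display class; thresholds are illustrative."""
    if confidence >= 0.85:
        return None          # likely correct: plain text
    if confidence >= 0.5:
        return "uncertain"   # e.g. shown in a distinct color
    return "suspect"         # e.g. shown in bold


def render_html(words):
    """words: list of (text, confidence); text None marks missing text."""
    parts = []
    for text, confidence in words:
        if text is None:
            parts.append(f'<span class="missing">{PLACEHOLDER}</span>')
            continue
        style = style_for(confidence)
        escaped = html.escape(text)
        parts.append(escaped if style is None else f'<span class="{style}">{escaped}</span>')
    return " ".join(parts)
```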

In order to assist a user in correcting generated text data, some embodiments of the email-client correction interface automatically generate words similar to incorrectly-generated words. In this regard, a user selects a word (e.g., by highlighting, clicking, or by any other suitable manner) within generated text data that is or appears to be incorrect. Upon such selection, the email-client correction interface suggests similar words, such as in a pop-up menu, pull-down menu, or in any other format. The user selects a word or words from the list of suggested words in order to make a desired correction.
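Generating similar words for a selected word can be approximated with string similarity. This sketch substitutes Python's standard difflib.get_close_matches for whatever method the interface actually uses, and VOCABULARY is a hypothetical word list. A production system might instead draw suggestions from the recognizer's lexicon or from the stored runner-up candidates.

```python
from difflib import get_close_matches

# Hypothetical vocabulary; a real system might use the recognizer's lexicon.
VOCABULARY = ["meeting", "meting", "melting", "matting", "greeting", "seating"]


def suggest(word, vocabulary=VOCABULARY, limit=3):
    """Return up to `limit` vocabulary words similar to `word`, best match first."""
    return get_close_matches(word.lower(), vocabulary, n=limit, cutoff=0.6)
```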

In some embodiments, the translation server(s) 56 are configured to automatically determine speakers in an audio file. For example, the translation server 56 processes audio files for drastic changes in voice or audio patterns. The translation server 56 then analyzes the patterns in order to identify the number of individuals or sources speaking in an audio file. In other embodiments, a user or information associated with the audio file (e.g., information included in the email message containing the audio data, or stored in a separate text file associated with the audio data) identifies the number of speakers in an audio file before the audio file is transcribed. For example, a user uses an interface of the system 10 (e.g., the email-client correction interface) to specify the number of speakers in an audio file before or after the audio file is transcribed.

After identifying the number of speakers in an audio file, the translation server(s) 56 can generate a speaker list that marks the number of speakers and/or the times in the audio file where each speaker speaks. The translation server(s) 56 can use the speaker list when creating or formatting the corresponding text data to provide markers or identifiers of the speakers (e.g., Speaker 1, Speaker 2, etc.) within the generated text data. In some embodiments, a user can update the speaker list in order to change the number of speakers included in an audio file, change the identifiers of the speakers (e.g., to the names of the speakers), and/or specify that two or more speakers identified by the translation server(s) 56 relate to a single speaker or audio source. Also, in some embodiments, a user can use an interface of the system 10 (e.g., the email-client correction interface) to modify the speaker list or to upload a new speaker list. For example, a user can change the identifiers of the speakers by updating a field of the email-client correction interface that identifies a particular speaker; each speaker identifier displayed within the generated text data can be placed in a user-editable field. In some embodiments, changing the identifier of a speaker in one field automatically changes the identifier for that speaker throughout the generated text data.
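A speaker list of this kind might be modeled as time-stamped segments plus a single label map. Because every segment looks up its label in one place, renaming a speaker once renames it throughout the text, and merging two detected speakers is a matter of reassigning segments. This is a sketch under those assumptions; the `Segment` and `SpeakerList` names and their methods are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    speaker_id: int  # speaker number assigned during speaker detection
    start_s: float   # where this speaker starts talking, in seconds
    text: str


@dataclass
class SpeakerList:
    segments: list[Segment]
    labels: dict[int, str] = field(default_factory=dict)  # user-edited identifiers

    def label(self, speaker_id: int) -> str:
        # Default identifiers follow the "Speaker 1, Speaker 2" pattern.
        return self.labels.get(speaker_id, f"Speaker {speaker_id}")

    def rename(self, speaker_id: int, name: str) -> None:
        # One map entry, so the new name appears at every occurrence.
        self.labels[speaker_id] = name

    def merge(self, keep_id: int, drop_id: int) -> None:
        # Two detected speakers are really one source: reassign all of drop_id's segments.
        for seg in self.segments:
            if seg.speaker_id == drop_id:
                seg.speaker_id = keep_id
        self.labels.pop(drop_id, None)

    def count(self) -> int:
        return len({seg.speaker_id for seg in self.segments})

    def render(self) -> str:
        return "\n".join(f"{self.label(s.speaker_id)}: {s.text}" for s in self.segments)


sl = SpeakerList([Segment(1, 0.0, "Hi, it's Dana."), Segment(2, 3.5, "Hello."),
                  Segment(3, 5.0, "Can you hear me?")])
sl.merge(2, 3)       # speakers 2 and 3 were the same person
sl.rename(1, "Dana")
print(sl.count())
print(sl.render())
```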

In some embodiments, the system 10 also formats transcribed text data based on one or more templates, such as templates adapted for particular users or businesses (e.g., medical, legal, engineering, or other fields). For example, after generating text data, the system 10 (e.g., the translation server(s) 56) compares the text data with one or more templates. If the format or structure of the text data corresponds to the format or structure of a template and/or if the text data includes one or more keywords associated with a template, the system 10 formats the text data based on the template. For example, if the system 10 includes a template specifying the following format:

Date:

Type of Illness:

and text data generated by the system 10 is “the date today is September the 12th, the year is 2007, the illness is flu,” the system 10 automatically applies the template to the text data in order to create the following formatted text data:

Date: Sep. 12, 2007

Type of Illness: Flu
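The template example above can be reproduced with a small sketch. Here each template field pairs a label with a keyword pattern and a formatter, and the template applies only if every field is found; otherwise the unformatted text is left alone. The regular expressions, the `MEDICAL_TEMPLATE` structure, and the `apply_template` helper are hypothetical stand-ins for whatever matching the system 10 actually performs.

```python
import calendar
import re
from datetime import date
from typing import Optional

MONTHS = {name.lower(): i for i, name in enumerate(calendar.month_name) if name}


def format_date(m: re.Match) -> str:
    """Turn the spoken month/day/year into an abbreviated date such as 'Sep. 12, 2007'."""
    d = date(int(m["year"]), MONTHS[m["month"].lower()], int(m["day"]))
    abbr = calendar.month_abbr[d.month]
    dot = "." if abbr != calendar.month_name[d.month] else ""  # "May" takes no period
    return f"{abbr}{dot} {d.day}, {d.year}"


# Hypothetical medical template: (field label, keyword pattern, value formatter).
MEDICAL_TEMPLATE = [
    ("Date",
     re.compile(r"date today is (?P<month>[a-z]+) the (?P<day>\d{1,2})(?:st|nd|rd|th)?,"
                r" the year is (?P<year>\d{4})", re.IGNORECASE),
     format_date),
    ("Type of Illness",
     re.compile(r"illness is (?P<illness>[a-z ]+)", re.IGNORECASE),
     lambda m: m["illness"].strip().capitalize()),
]


def apply_template(text: str, template=MEDICAL_TEMPLATE) -> Optional[str]:
    """Fill every field of the template, or return None if the text does not match it."""
    lines = []
    for label, pattern, fmt in template:
        m = pattern.search(text)
        if m is None:
            return None  # no match: the caller keeps the unformatted text
        lines.append(f"{label}: {fmt(m)}")
    return "\n".join(lines)


print(apply_template("the date today is September the 12th, the year is 2007, the illness is flu"))
```

Returning `None` on a partial match mirrors the description's condition that a template is applied only when the text corresponds to it.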

In some embodiments, the system 10 is configured to automatically apply a template to text data if the text data corresponds to the template. Therefore, as the system 10 “learns” and improves its transcription quality, as described below, the system 10 also “learns” and improves its application of templates. In other embodiments, a user uses an interface of the system 10 (e.g., the email-client correction interface) to manually specify a template to be applied to text data. For example, a user can select a template to apply to text data from a drop-down menu or other selection mechanism included in the interface.

The system 10 can store the formatted text data and can make the formatted text data available for review and correction, as described below. In some embodiments, the system 10 stores or retains the unformatted text data separately from the formatted text data. Retaining the unformatted text data allows it to be applied to new or different templates. In addition, the system 10 can use the unformatted text data to train the system 10, as described below.

The system 10 is configured to allow a user to create a customized template and upload the template to the system. For example, a user uses a word processing application, such as Microsoft® Word®, to create a text file that defines the format and structure of a customized template. The user then uploads the text file to the system 10 using an interface of the system 10 (e.g., the email-client interface 60 and/or the email-client correction interface). In some embodiments, the system 10 reformats uploaded templates. For example, the system 10 can store predefined templates and/or customized templates in a markup language, such as XML or HTML.

Templates can be associated with a particular user or a group of users. For example, only users with certain permissions may be allowed to use or apply particular templates. In other embodiments, a user can upload one or more templates that only he or she can use or apply. Settings and restrictions for predefined and/or customized templates can be configured by a user or an administrator using an interface of the system 10.

In some embodiments, alternatively or in addition to configuring templates, the system 10 enables a user to configure one or more commands that replace transcribed text with different text. For example, a user configures the system 10 to insert the current date into text data whenever the audio data and/or corresponding text data includes the word “date” or the phrases “today's date,” “current date,” or “insert today's date.” Similarly, in another embodiment, the system 10 is configured to start a new paragraph within transcribed text data each time the audio data and/or corresponding text data includes the word “paragraph,” the phrase “new paragraph,” or a similar identifier. The commands can be defined on a per-user basis and/or a per-group basis, and settings or restrictions for the commands can be set by a user or an administrator using the system 10.
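Such commands can be sketched as a phrase-to-replacement table applied in one pass over the transcribed text. The table below, the date style, and the `build_commands`/`apply_commands` helpers are hypothetical; the date is passed in explicitly so the result is reproducible. Longer phrases are tried first so that “insert today's date” and “new paragraph” are consumed whole rather than as “date” or “paragraph” alone. Note that, exactly as configured in the example above, a bare “date” is replaced wherever it appears; a real configuration would probably restrict that trigger.

```python
import re
from datetime import date


def build_commands(today: date) -> dict[str, str]:
    """Hypothetical per-user command table: trigger phrase -> replacement text."""
    stamp = f"{today:%B} {today.day}, {today.year}"
    return {
        "insert today's date": stamp,
        "today's date": stamp,
        "current date": stamp,
        "date": stamp,
        "new paragraph": "\n\n",
        "paragraph": "\n\n",
    }


def apply_commands(text: str, commands: dict[str, str]) -> str:
    # Longest phrases first, so the regex alternation prefers "new paragraph" over "paragraph".
    phrases = sorted(commands, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, phrases)) + r")\b", re.IGNORECASE)
    out = pattern.sub(lambda m: commands[m.group(1).lower()], text)
    # Drop the spaces the speaker left on either side of a paragraph break.
    return re.sub(r"[ \t]*\n\n[ \t]*", "\n\n", out)


cmds = build_commands(date(2007, 9, 12))
print(apply_commands("Meeting notes insert today's date new paragraph Budget is approved", cmds))
```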

Some embodiments of the system 10 also enable a user correcting text data via the email-client correction interface to create commands and/or keyboard shortcuts. In one example, the system is configured so that a user can use the commands and/or keyboard shortcuts to stream audio data, add common words or phrases to text data, play audio data, pause audio data, or start or select objects or functions provided through the email-client correction interface or other interfaces of the system 10. In some embodiments, a user uses the email-client correction interface to configure the commands and/or keyboard shortcuts. The commands and/or keyboard shortcuts can be stored on a user level and/or a group level. An administrator can also configure commands and/or keyboard shortcuts that can be made available to one user or multiple users. For example, users with particular permissions may be allowed to use particular commands and/or keyboard shortcuts.

In one embodiment, the email-client correction interface reacts to commands spoken by the user. In another embodiment, the system 10 is configured to permit a user to create commands that, when spoken by the user, cause the email-client correction interface to perform certain actions. In some embodiments, the user can say “play,” “pause,” “forward,” “backward,” etc. to control the playing of the audio data by the email-client correction interface. Other commands insert, delete, or edit text in transcribed text data. For example, when the user says “date,” the email-client correction interface inserts date information into the transcribed text data.

In some embodiments, the system 10 also performs translations of transcribed text data. For example, the email-client correction interface or another interface of the system 10 includes features that permit a user to request a translation of transcribed text data into another language. The transcription server 20 includes one or more language translation modules configured to create text data in a particular language based on generated text data in another language. The system is also configured to process a request from an audio source (e.g., an individual submitting an email message with an attached audio file to the transcription server 20) to translate the file into a specific language when the audio file is submitted to the transcription server 20.

With continued reference to the illustrated embodiment of FIG. 5, corrections made by a user through the email-client correction interface are transmitted to the transcription server 20. As shown in FIG. 5, the transcription server 20 includes a training server 104. The training server 104 can use the corrections made by a user to “learn” so that future incorrect translations are avoided. In some embodiments, since audio data is received from one or more audio data sources 30 representing multiple “speakers,” and since the email-client correction interface can be accessible over a network by multiple users, the training server 104 receives corrections from multiple users and, therefore, uses a voice independent model to learn from multiple speakers or audio data sources.

The voice independent model developed by the transcription server 20 can be shared and used by multiple transcription servers 20. For example, in some embodiments, the voice independent model developed by a transcription server 20 can be copied to or shared with other transcription servers 20. The model can be copied to other transcription servers 20 based on a predetermined schedule, anytime the model is updated, on a manual basis, etc. In some embodiments, a lead transcription server 20 collects audio and text data from other transcription servers 20 (e.g., audio and text data that has not been applied to a training server) and transfers the data to a lead training server 104. The lead transcription server 20 can collect the audio and text data during periods of low network or processor usage. The individual training servers 104 of one or more transcription servers 20 can also take turns processing batches of audio data and copying updated voice models to other transcription servers 20 (e.g., in a predetermined sequence or schedule), which can ensure that each transcription server 20 is using the most up-to-date voice model.
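The turn-taking arrangement can be sketched as follows, assuming a fixed rotation among servers. On each turn the designated trainer collects every server's untrained corrections, performs a training pass, and copies the resulting model to all servers so they share one version. The training pass itself is only a placeholder that bumps a version number; the `TranscriptionServer` class and `training_turn` function are hypothetical.

```python
from dataclasses import dataclass, field
from itertools import cycle


@dataclass
class TranscriptionServer:
    name: str
    model_version: int = 0
    trained_by: str = ""  # which server produced the model currently in use
    pending: list[str] = field(default_factory=list)  # corrected files not yet trained on


def training_turn(servers: list[TranscriptionServer], trainer: TranscriptionServer) -> None:
    """One turn: `trainer` collects every server's pending batch, trains, and shares the model."""
    batch = [f for s in servers for f in s.pending]
    if not batch:
        return  # nothing new to learn from on this turn
    for s in servers:
        s.pending.clear()
    # Placeholder for a real training pass over `batch`: only the version changes here.
    new_version = max(s.model_version for s in servers) + 1
    for s in servers:  # copy the updated model to every server
        s.model_version = new_version
        s.trained_by = trainer.name


servers = [TranscriptionServer("east"), TranscriptionServer("west"), TranscriptionServer("central")]
turns = cycle(servers)  # a fixed, predetermined sequence of trainers
servers[1].pending.append("msg-001")
training_turn(servers, next(turns))
servers[2].pending.extend(["msg-002", "msg-003"])
training_turn(servers, next(turns))
print([(s.name, s.model_version, s.trained_by) for s in servers])
```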

In some embodiments, individuals may be hired to correct transcribed audio files (“correctors”), and the correctors may be paid on a per-line, per-word, per-file, time-based, or similar basis. The transcription server 20 can track performance data for the correctors. The performance data can include line counts, usage counts, word counts, etc. for individual correctors and/or groups of correctors. In some embodiments, the transcription server 20 enables a user (e.g., an administrator) to access the performance data via an interface of the system 10 (e.g., an email-client correction interface or a website). The user can use the interface to input personal information associated with the performance data, such as the correctors' names, employee numbers, etc. In some embodiments, the user can also use the interface to initiate and/or specify payments to be made to the correctors. The performance data (and any related information provided by a user, such as an administrator) can be stored in a database and/or can be exported to an external accounting system, such as accounting systems and solutions provided by Paychex, Inc. or QuickBooks® provided by Intuit, Inc. The transcription server 20 can send the performance data to an external accounting system via a direct connection or an indirect connection, such as the Internet. The transcription server 20 can also generate a file that can be stored to a portable data storage medium (e.g., a compact disk, a jump drive, etc.). The file can then be uploaded to an external accounting system from the portable data storage medium. An external accounting system can use the performance data to pay the correctors, generate financial documents, etc.
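Tracking and exporting corrector performance data might look like the sketch below: per-corrector totals are accumulated from individual correction records and written to CSV along with a per-word amount due. CSV is only an assumed interchange format; the description does not specify the export file, and no Paychex or QuickBooks interface is modeled here. The record fields, the rate, and the function names are hypothetical. `Decimal` is used so currency amounts round predictably.

```python
import csv
import io
from collections import defaultdict
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class Correction:
    corrector: str
    file_id: str
    lines: int
    words: int


def tally(corrections: list[Correction]) -> dict[str, dict[str, int]]:
    """Accumulate file, line, and word counts per corrector."""
    totals: dict[str, dict[str, int]] = defaultdict(lambda: {"files": 0, "lines": 0, "words": 0})
    for c in corrections:
        t = totals[c.corrector]
        t["files"] += 1
        t["lines"] += c.lines
        t["words"] += c.words
    return dict(totals)


def export_csv(totals: dict[str, dict[str, int]], rate_per_word: Decimal) -> str:
    """Write one row per corrector with a per-word amount due (hypothetical pay basis)."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["corrector", "files", "lines", "words", "amount_due"])
    for name in sorted(totals):
        t = totals[name]
        writer.writerow([name, t["files"], t["lines"], t["words"], f"{t['words'] * rate_per_word:.2f}"])
    return buf.getvalue()


records = [Correction("alice", "f1", 10, 120), Correction("alice", "f2", 6, 80),
           Correction("bob", "f3", 4, 50)]
report = export_csv(tally(records), Decimal("0.015"))
print(report)
```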

In some embodiments, a user may not desire or need transcribed text data to be corrected. For example, a user may not want text data that is substantially accurate to be corrected. In these situations, the system 10 can allow a user to designate an accuracy threshold, and the system 10 can apply the threshold to determine whether text data should be corrected. For example, if generated text data has a percentage or other measurement of accurate words (as determined by the transcription server 20) that is equal to or greater than the accuracy threshold specified by the user, the system 10 can allow the text data to skip the correction process (and the associated training or learning process). The system 10 can deliver any generated text data that skips the correction process directly to its destination (e.g., sent directly to a user via an email message, stored directly to a database, etc.). In some embodiments, the accuracy threshold can be set by a user using any described interface of the system 10. The threshold can be applied to all text data or only to particular text data (e.g., only text data generated based on audio data received from a particular audio source, only text data that is associated with a particular destination, etc.).
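The threshold check can be sketched as below. The description leaves the accuracy measurement to the transcription server 20; here it is assumed, hypothetically, to be the fraction of words whose confidence clears a fixed cutoff. A rule may be scoped to a particular source and/or destination, with `None` meaning “any,” and text skips correction when any applicable rule's threshold is met or exceeded.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Transcript:
    source: str        # audio data source, e.g. a caller's address
    destination: str   # e.g. the email account the text is delivered to
    word_confidences: list[float]


@dataclass
class ThresholdRule:
    threshold: float                   # required fraction of accurate words, 0.0 to 1.0
    source: Optional[str] = None       # None means the rule applies to any source
    destination: Optional[str] = None  # None means the rule applies to any destination

    def applies_to(self, t: Transcript) -> bool:
        return self.source in (None, t.source) and self.destination in (None, t.destination)


def estimated_accuracy(t: Transcript, word_cutoff: float = 0.85) -> float:
    # Hypothetical measure: share of words whose confidence clears the cutoff.
    if not t.word_confidences:
        return 0.0
    return sum(c >= word_cutoff for c in t.word_confidences) / len(t.word_confidences)


def skips_correction(t: Transcript, rules: list[ThresholdRule]) -> bool:
    """True if any applicable rule's threshold is met, so the text goes straight to its destination."""
    acc = estimated_accuracy(t)
    return any(rule.applies_to(t) and acc >= rule.threshold for rule in rules)


t = Transcript("alice@example.com", "bob@example.com", [0.9, 0.95, 0.7, 0.99])  # 3 of 4 words clear 0.85
print(skips_correction(t, [ThresholdRule(0.7, source="alice@example.com")]))
```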

FIG. 6 illustrates an exemplary transcription, correction, and training method or process performed by the system 10. The transcription, correction, and training process of the system 10 can be a continual process by which files enter the system 10 and are moved through the series of steps shown in FIG. 6. As shown in FIG. 6 (also with reference to FIGS. 1-3), the transcription server 20 receives audio data 100 from one or more data source email-clients 30. Next, the transcription server 20 places the audio data 100 into one or more queues 54 (step 120). Once a translation server or processor 56 is available, the audio data 100 is transmitted from a queue 54 to a translation server 56. The translation server 56 transcribes the audio data to generate text data, and indexes the audio data (step 122).

After the audio data is indexed and transcribed, the audio data and/or generated text data is made available to a user for review and/or correction via the email-client correction interface (step 124). If the text data needs to be corrected (step 126), the user makes the corrections and submits the corrections to the training server 104 of the transcription server 20 (step 128). The corrections are placed in a training queue and are prepared for archiving (step 130). Periodically, the training server 104 obtains all the corrected files from the training queue and begins a training cycle for a voice independent model (step 132). In other embodiments, the training server 104 obtains such corrected files immediately, rather than periodically. The training server 104 can be a server that is separate from the transcription server 20, and can update the transcription server 20 and/or other servers on a continuous or periodic basis. In other embodiments, the training server 104, transcription server 20, and any other servers associated with the system 10 are defined by the same computer. It should be understood that, as used herein and in the appended claims, the terms “server,” “queue,” “module,” etc. are intended to encompass hardware and/or software adapted to perform a particular function.

Any portion or all of the transcription, correction, and training process performed by the system 10 can be performed by one or more polling managers (e.g., associated with the transcription server 20, the training server 104, or other servers). In some embodiments, the transcription server 20 and/or the training server 104 utilizes one or more “flags” to indicate a stage of a file. By way of example only, these flags can include: (1) waiting for transcription; (2) transcription in progress; (3) waiting for correction; (4) correction completed; (5) waiting for training; (6) training in progress; (7) retention; (8) move to history pending; and (9) history.

In some embodiments, the only action required by a user as a message moves through different stages of the system 10 is to indicate that correction of the message has been completed. In other embodiments, a less automated system can exist, requiring more input from a user during the transcription, correction, and training process.

Another example of a method by which messages are processed in the system 10 is illustrated in FIG. 7. In this embodiment, a polling manager is used to control the timing of file processing in the system. In particular, at least a portion of the transcription, correction, and training process is moved along by alternating actions of a polling manager. In some embodiments, the polling manager runs on a relatively short time interval to move files from stage to stage within the transcription, correction, and training process. Although not required, the polling manager can move multiple files in different stages to their next stages at the same time.
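The nine example flags and a single polling pass can be sketched together. The sketch assumes the flags form a strict sequence and that some stages wait on work outside the polling manager (a translation server finishing, a user marking correction complete, a training cycle ending, a retention rule expiring); a file in such a stage advances only once that work is reported done. Every other file moves to its next stage on each pass, so files in different stages advance together. The `Stage`, `BLOCKING`, and `poll` names are hypothetical.

```python
from enum import IntEnum


class Stage(IntEnum):
    """The nine example flags, in the order a file moves through them."""
    WAITING_FOR_TRANSCRIPTION = 1
    TRANSCRIPTION_IN_PROGRESS = 2
    WAITING_FOR_CORRECTION = 3
    CORRECTION_COMPLETED = 4
    WAITING_FOR_TRAINING = 5
    TRAINING_IN_PROGRESS = 6
    RETENTION = 7
    MOVE_TO_HISTORY_PENDING = 8
    HISTORY = 9


# Stages that wait on work outside the polling manager.
BLOCKING = {Stage.TRANSCRIPTION_IN_PROGRESS, Stage.WAITING_FOR_CORRECTION,
            Stage.TRAINING_IN_PROGRESS, Stage.RETENTION}


def poll(files: dict[str, Stage], done: set[str]) -> dict[str, Stage]:
    """One polling pass; `done` holds ids whose blocking work finished since the last pass."""
    updated = {}
    for file_id, stage in files.items():
        ready = stage not in BLOCKING or file_id in done
        updated[file_id] = Stage(stage + 1) if ready and stage < Stage.HISTORY else stage
    return updated


files = {"a": Stage.WAITING_FOR_TRANSCRIPTION, "b": Stage.WAITING_FOR_CORRECTION,
         "c": Stage.CORRECTION_COMPLETED, "d": Stage.HISTORY}
files = poll(files, done=set())  # "b" still waits for the user to finish correcting
files = poll(files, done={"b"})  # the user has now marked "b" as corrected
print({k: v.name for k, v in files.items()})
```

In this sketch, marking correction complete is the only user-driven event, which matches the most automated embodiment described above.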

The archival process allows files to move out of the system 10 immediately or based at least in part upon set retention rules. Archived or historical files allow the system 10 to keep current files quickly available while older files can be encrypted, compressed, and stored. Archived files can also be returned to a user (step 222) in any manner described above.

In some embodiments, the email-client correction interface shows the stage of one or more files in the transcription, correction, and training process. This process can be automated and database driven so that all files are used to build and train the voice independent model.

It should be noted that a database-driven system 10 allows redundancy within the system. Multiple servers can share the load of the process described above. Also, multiple servers across different geographic regions can provide backup in the event of a natural disaster or other problem at one or more sites.

FIG. 8 illustrates a correction method according to an embodiment of the invention. The correction process of FIG. 8 begins when audio data is received by the transcription server 20 and is transcribed (step 250). As described above with respect to FIGS. 1-2, the transcription server 20 can receive audio data from one or more devices running email-clients 30, such as a computer 30a, a PDA 30b, a BlackBerry device 30c, a mobile phone 30d, etc.

The transcription server 20 can send the correction notification to a user who is assigned to the correction of transcribed audio data associated with a particular owner or destination. For example, as the transcription server 20 transcribes voicemail messages for a particular member of an organization, the transcription server 20 can send a notification to a secretary or assistant of the member. An administrator can use an interface of the system 10 (e.g., the email-client interface 60) to configure one or more recipients who are to receive the correction notifications for a particular destination (e.g., a particular email account). An administrator can also specify settings for notifications, such as the type of notification to send (e.g., email, text, audio, etc.), the addresses or identifiers of the notification recipients (e.g., email addresses), the information to be included in the notifications, etc. For example, an administrator can establish rules for sending correction notifications, such as a rule that transcriptions associated with audio data received by the transcription server 20 from a particular audio data source should be corrected by particular users. In addition, as described above, an administrator can set one or more accuracy thresholds, which can dictate when transcribed audio data skips the correction process.

FIG. 9 illustrates an email correction notification 254 according to an embodiment of the invention that is listed in an inbox 255 of an email application. As shown in FIG. 9, the email correction notification 254 is listed as an email message in the inbox 255 similar to other email messages 256 received from other sources. For example, the inbox 255 can display the sender of the email correction notification 254 (i.e., the transcription server 20), an account or destination associated with the audio data and generated text data (e.g., an account number), and an identifier of the source of the audio data (e.g., the name of an individual that sent the message). As shown in FIG. 9, the identifier of the source of the audio data can optionally include an address or location of the audio data source. In some embodiments (e.g., depending on the email application used), the inbox 255 lists additional information about the notification 254, such as the size of the email correction notification 254, the time the notification 254 was sent, and/or the date that the notification 254 was sent.

To read the email correction notification 254, a user can select the notification 254 (e.g., by clicking on, highlighting, etc.) in the inbox 255. After the user selects the notification 254, the email application can display the contents of the notification 254, as shown in FIG. 10. The contents of the email correction notification 254 can include information similar to that displayed in the inbox 255. The contents of the email correction notification 254 can also indicate the length of the audio data transcribed by the transcription server 20 and the day, date, and/or time that the audio data was received by the transcription server 20. To correct the transcription, the user can access the email-client correction interface from his or her email-client. However, if the user does not have access to the email-client correction interface, a link 257 to a web interface is provided in the email correction notification.

FIGS. 11-14 illustrate the email-client correction interface 260 according to an embodiment of the invention. After a user receives a correction notification 254, the user can access the email-client correction interface 260 to review and correct the generated text data (if needed) (step 262). The email-client correction interface 260 is accessed from within the email-client. For example, when a user receives a correction notification indicating that the user has messages that either have been corrected or are ready to be corrected, the user can access the email-client correction interface 260 without launching a separate web browsing application. Additionally, a user can reply directly to a correction notification that includes transcribed text, correct the transcribed text in the body of the message, and send the corrected transcribed text back to the transcription server 20. After the corrected transcribed text is sent back to the transcription server 20, the voice model is updated accordingly.

As shown in FIG. 11, to access the email-client correction interface 260, the user may first be prompted to enter credentials and/or identifying information via a login screen 264 of the interface 260. For example, the login screen 264 can include one or more selection mechanisms and/or input mechanisms 266 that enable a user to select or enter credentials and/or identifying information. As shown in FIG. 11, the login screen 264 can include input mechanisms 266 for entering a username and a password. The input mechanisms 266 can be case-sensitive and/or can be limited to a predetermined set and/or number of characters. For example, the input mechanisms 266 can be limited to approximately 30 non-space characters. A user can enter his or her username and password (e.g., as set by the user or an administrator) and can select a log in selection mechanism 268. Alternatively, a user can select a help selection mechanism 270 in order to access instructions, tips, help web pages, electronic manuals, etc. for the email-client correction interface 260.

After the user enters his or her credentials and/or identifying information, the email-client correction interface 260 verifies the entered information, and, if verified, the email-client correction interface 260 displays a main page 272, as shown in FIG. 12. The main page 272 includes a navigation area 274 and a view area 276. The navigation area 274 includes one or more selection mechanisms for accessing standard functions of the email-client correction interface 260. For example, as shown in FIG. 12, the navigation area 274 includes a help selection mechanism 278 and a log off selection mechanism 280. As described above, a user can select the help selection mechanism 278 in order to access instructions, tips, help web pages, electronic manuals, etc. for the email-client correction interface 260. A user selects the log off selection mechanism 280 in order to exit the email-client correction interface 260. In some embodiments, if a user selects the log off selection mechanism 280, the email-client correction interface 260 returns the user to the login screen 264.

As shown in FIG. 12, the navigation area 274 also includes an inbox selection mechanism 282, a my history selection mechanism 284, a settings selection mechanism 286, a help selection mechanism 288, and/or a log off selection mechanism 290. A user selects the inbox selection mechanism 282 in order to view the main page 272. The user selects the my history selection mechanism 284 in order to access previously corrected transcriptions. In some embodiments, if a user selects the my history selection mechanism 284, the email-client correction interface 260 displays a history page (not shown) similar to the main page 272 that lists previously corrected transcriptions. Alternatively or in addition to displaying the information displayed in the main page 272 (e.g., file name, checked out by, checked in by, creation date, priority), the history page can display correction date(s) for each transcription.

A user can select the settings selection mechanism 286 in order to access one or more settings pages (not shown) of the email-client correction interface 260. The settings pages can enable a user to change his or her notification preferences, email-client correction interface 260 preferences (e.g., change a username and/or password, set a time limit for transcriptions displayed in a history page), etc. For example, as described above, a user can use the settings pages to specify destination settings for audio data and/or generated text data, configure commands and keyboard shortcuts, specify accuracy thresholds, turn on or off particular features of the email-client correction interface 260 and/or the system 10, etc. In some embodiments, the number and degree of settings configurable by a particular user via the settings pages are based on the permissions of the user. An administrator can use the settings pages to specify global settings, group settings (e.g., associated with particular permissions), and individual settings. In addition, an administrator can use a settings page of the email-client correction interface 260 to specify users of the email-client correction interface 260 and can establish usernames and passwords for users. Furthermore, as described above with respect to FIGS. 9 and 10, an administrator can use a settings page of the email-client correction interface 260 to specify notification parameters, such as who receives particular notifications, what type of notifications are sent, what information is included in the notifications, etc.

As shown in FIG. 12, the view area 276 lists transcriptions (e.g., associated with the logged-in user) that need attention (e.g., correction). In some embodiments, the view area 276 includes one or more filter selection mechanisms 292 that a user can use to filter and/or sort the listed transcriptions. For example, a user can use a filter selection mechanism 292 to filter and/or sort transcriptions by creation date, priority, etc.

The view area 276 can also list additional information for each transcription. For example, as shown in FIG. 12, the view area 276 can list a file name, a checked out by parameter, a checked out on parameter, a creation date, and a priority for each listed transcription. The view area 276 can also include an edit selection mechanism 294 and a complete selection mechanism 296 for each transcription.

Returning to FIG. 8, after a user accesses the email-client correction interface 260, the user can select a transcription to correct (step 298). As shown in FIG. 12, to correct a particular transcription, the user selects the edit selection mechanism 294 associated with the transcription. When a user selects an edit selection mechanism 294, the email-client correction interface 260 displays a correction page 300, an example of which is shown in FIG. 13. The correction page 300 includes the navigation area 274, as described above with respect to FIG. 12, and a correction view area 302. The correction view area 302 displays the text data 303 generated by the transcription. A user can edit the text data 303 by deleting text, inserting text, cutting text, copying text, etc. within the correction view area.

In some embodiments, the correction view area 302 also includes a recording control area 304. The recording control area 304 can include one or more selection mechanisms for listening to or playing the audio data associated with the text data 303 displayed in the correction view area 302. For example, as shown in FIG. 13, the recording control area 304 can include a play selection mechanism 306, a stop selection mechanism 308, and a pause selection mechanism 310. A user can select the play selection mechanism 306 to play the audio data from the beginning and can select the stop selection mechanism 308 to stop the audio data. Similarly, a user can select the pause selection mechanism 310 to pause the audio data. In some embodiments, selecting the pause selection mechanism 310 after pausing the audio data causes the correction interface 260 to continue playing the audio data (e.g., from the point at which the audio data was paused).

As shown in FIG. 13, the recording control area 304 can also include a continue from cursor selection mechanism 312. A user can select the continue from cursor selection mechanism 312 in order to start playing the audio data at a location corresponding to the position of the cursor within the text data 303. For example, if a user places a cursor within the text data 303 before the word “Once” and selects the continue from cursor selection mechanism 312, the email-client correction interface 260 plays the audio data starting from the word “Once.” In some embodiments, the recording control area 304 also includes a playback control selection mechanism 314 that a user can use to specify how many seconds of audio preceding the cursor position to play before continuing from the cursor position. For example, as shown in FIG. 13, a user can specify 1 to 8 seconds using the playback control selection mechanism 314 (e.g., by dragging an indicator along the timeline or in another suitable manner). After setting the playback control selection mechanism 314, the user can select the continue from cursor selection mechanism 312, which causes the email-client correction interface 260 to play the audio data starting at the cursor position minus the number of seconds specified by the playback control selection mechanism 314.
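The “continue from cursor” calculation can be sketched as follows, assuming the indexing step (step 122) yields, for each word, its character offset in the displayed text and its start time in the audio data. The cursor is mapped to the word that contains it, and playback starts that many seconds earlier, clamped at the beginning of the recording. The word timings below are invented for illustration, and `IndexedWord` and `playback_start` are hypothetical names.

```python
import bisect
import re
from dataclasses import dataclass


@dataclass
class IndexedWord:
    text: str
    char_offset: int  # where the word starts in the displayed text data
    start_s: float    # where the word starts in the audio data, in seconds


def playback_start(words: list[IndexedWord], cursor: int, back_up_s: float) -> float:
    """Audio offset for "continue from cursor", moved back by the playback control setting."""
    if not words:
        return 0.0
    offsets = [w.char_offset for w in words]
    # Word containing the cursor, or the first word if the cursor precedes all text.
    i = max(bisect.bisect_right(offsets, cursor) - 1, 0)
    return max(0.0, words[i].start_s - back_up_s)


text = "Hi Dana. Once you get this, call me."
starts = [0.0, 0.5, 3.5, 3.9, 4.1, 4.3, 4.8, 5.0]  # invented word timings, one per word
words = [IndexedWord(m.group(), m.start(), s) for m, s in zip(re.finditer(r"\S+", text), starts)]
cursor = text.index("Once")  # cursor placed just before "Once"
print(playback_start(words, cursor, back_up_s=2))  # 1.5
```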

In some embodiments, the recording control area 304 also includes a speed control mechanism (not shown) that allows a user to decrease and increase the playback speed of audio data. For example, the recording control area 304 includes a speed control mechanism that includes one or more selection mechanisms (e.g., buttons, timelines, etc.). A user can select (e.g., click, drag, etc.) the selection mechanisms in order to increase or decrease the playback speed of audio data by a particular amount. In some embodiments, the speed control mechanism can also include a selection mechanism that a user can select in order to play audio data at normal speed.

In some embodiments, a user can hide the recording control area 304. For example, as shown in FIG. 13, the correction view area 302 can include one or more selection mechanisms 315 (e.g., tabs) that enable a user to choose whether to view the text data 303 only (e.g., by selecting a full text tab 315a) or to view the text data 303 and the recording control area 304 (e.g., by selecting a listen/text tab 315b).

The correction view area 302 can also include a save selection mechanism 316. A user can select the save selection mechanism 316 in order to save the current state of the corrected text data 303. A user can select the save selection mechanism 316 at any time during the correction process.

The correction view area 302 can also include a table 318 that lists, among other things, the system's confidence in its transcription quality. For example, as shown in FIG. 13, the correction view area 302 can list the total number of words in the text data 303, the number of low-confidence words in the text data 303, the number of medium-confidence words in the text data 303, and/or the number of high-confidence words in the text data 303. “Low” words can include words that are least likely to be correct. “Medium” words can include words that are moderately likely to be correct. “High” words can include words that are very likely to be correct. In some embodiments, if the number of low words in the text data 303 is close to the total number of words in the text data 303, it may be more efficient for the user to delete the text data 303 and manually retype it while listening to the corresponding audio data. This situation may occur if the audio data was received from an audio data source from which the system 10 has not previously received data, or has not previously received significant data.
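The counts in table 318 and the “retype instead of correct” situation can be sketched as below. The description names the low, medium, and high bands but gives no numeric boundaries, so the cutoffs and the 80% ratio used to decide that low words are “close to” the total are hypothetical.

```python
from collections import Counter


def band(confidence: float) -> str:
    # Hypothetical band boundaries; only the band names come from the description.
    if confidence < 0.5:
        return "low"
    if confidence < 0.85:
        return "medium"
    return "high"


def confidence_table(confidences: list[float]) -> dict[str, int]:
    """Total word count plus the number of words in each confidence band."""
    counts = Counter(band(c) for c in confidences)
    return {"total": len(confidences), "low": counts["low"],
            "medium": counts["medium"], "high": counts["high"]}


def suggest_retype(table: dict[str, int], ratio: float = 0.8) -> bool:
    """True when low-confidence words make up most of the text (hypothetical 80% cutoff)."""
    return table["total"] > 0 and table["low"] / table["total"] >= ratio


table = confidence_table([0.2, 0.6, 0.9, 0.95])
print(table, suggest_retype(table))
```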

Returning to FIG. 8, after a user selects a transcription to correct, the user corrects the transcription as necessary via the email-client correction interface 260 (step 320) and submits or saves the corrected transcription (step 322). As described above with respect to FIG. 13, to submit or save corrected text data 303, a user can select the save selection mechanism 316 included in the correction page 300. In some embodiments, when a user selects the save selection mechanism 316, the email-client correction interface 260 displays a save options page 330, as shown in FIG. 14. The save options page 330 can include the navigation area 274, as described above with respect to FIGS. 12 and 13, and a save options view area 332. The save options view area 332 can display one or more selection mechanisms for saving the current state of the corrected text data 303. For example, as shown in FIG. 14, the save options view area 332 can include a save recording selection mechanism 334, a save and mark as complete selection mechanism 336, and a save, mark as complete and send to owner selection mechanism 338. A user can select the save recording selection mechanism 334 in order to save the current state of the text data 303 with any corrections made by the user. The user is then returned to the main page 272. A user may select the save recording selection mechanism 334 if the user has not finished correcting the text data 303 but wants to stop working on the corrections for now. A user may also select the save recording selection mechanism 334 to save corrections periodically when working on a long transcription. In some embodiments, the save recording selection mechanism 334 is the default selection.

A user can select the save and mark as complete selection mechanism 336 in order to save the corrections made by the user and move the transcription to the user's history folder. Once the corrections are saved and the transcription is moved to the history folder, the user can access the corrected transcription (e.g., via the history page of the email-client correction interface 260) but may not be able to edit it.

A user can select the save, mark as complete and send to owner selection mechanism 338 in order to save the corrected transcription, move the corrected transcription to the user's history folder, and send the corrected transcription and/or the associated audio data to the owner or destination of the audio data (e.g., the owner's email address). As described above, a destination for corrected transcriptions can include files and multiple devices running email clients. For example, the email-client correction interface 260 can send a message notification to the owner of the transcription that includes the corrected transcription (e.g., as text within the message or as an attached file). FIG. 15 illustrates an email message notification 339 according to an embodiment of the invention. As shown in FIG. 15, the notification 339 includes the corrected transcription.
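A notification such as message 339 could be assembled with Python's standard `email.message.EmailMessage`. The sketch below covers both delivery forms mentioned above: text within the message, or an attached file. It can optionally attach the associated audio data. The function name, subject line, attachment file names and the assumption that the audio is WAV are all hypothetical. Actual delivery (e.g., via `smtplib`) is not shown.

```python
from email.message import EmailMessage
from typing import Optional


def build_notification(
    sender: str,
    owner: str,
    subject: str,
    corrected_text: str,
    audio: Optional[bytes] = None,
    audio_filename: str = "recording.wav",
    text_as_attachment: bool = False,
) -> EmailMessage:
    """Build a hypothetical owner notification such as message 339."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = owner
    msg["Subject"] = f"Corrected transcription: {subject}"
    if text_as_attachment:
        # Deliver the transcription as an attached text file.
        msg.set_content("The corrected transcription is attached.")
        msg.add_attachment(corrected_text, filename="transcription.txt")
    else:
        # Deliver the transcription as text within the message body.
        msg.set_content(corrected_text)
    if audio is not None:
        # Optionally include the associated audio data (assumed WAV).
        msg.add_attachment(audio, maintype="audio", subtype="wav",
                           filename=audio_filename)
    return msg
```

Adding the first attachment converts the message to `multipart/mixed`, so the same builder covers the text-only and with-audio cases.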

Once a user selects a save option, the user can select an accept selection mechanism 340 in order to accept the selected option or can select a cancel selection mechanism 342 in order to cancel the selected option. In some embodiments, if a user selects the cancel selection mechanism 342, the email-client correction interface 260 returns the user to the correction page 300.

A user can also select a complete selection mechanism 296 included in the main page 272 of the email-client correction interface 260 in order to submit or save transcriptions. In some embodiments, if a user selects a complete selection mechanism 296 included in the main page 272, the email-client correction interface 260 displays the save options page 330 as described above with respect to FIG. 14. In other embodiments, if a user selects a complete selection mechanism 296 included in the main page 272, the email-client correction interface 260 automatically saves any previous corrections made to the transcription associated with the complete selection mechanism 296, moves the corrected transcription to the user's history folder, and sends the completed transcription and/or the corresponding audio data to the owner or destination associated with the transcription.
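The three save options, and the read-only state of completed transcriptions, can be modeled as sketched below. All class and function names are hypothetical. The `send` callback stands in for whatever delivery path an embodiment uses. The second embodiment of the complete selection mechanism 296 corresponds to calling `accept_save` with `SAVE_COMPLETE_AND_SEND` and the already-saved text.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List, Optional


class SaveOption(Enum):
    SAVE = "save recording"                                  # 334 (default)
    SAVE_AND_COMPLETE = "save and mark as complete"          # 336
    SAVE_COMPLETE_AND_SEND = "save, mark as complete and send to owner"  # 338


@dataclass(eq=False)  # compare transcriptions by identity, not by field values
class Transcription:
    owner: str
    text: str
    complete: bool = False


@dataclass
class Workspace:
    in_progress: List[Transcription] = field(default_factory=list)
    history: List[Transcription] = field(default_factory=list)


def accept_save(
    ws: Workspace,
    item: Transcription,
    corrected_text: str,
    option: SaveOption = SaveOption.SAVE,
    send: Optional[Callable[[str, str], None]] = None,
) -> None:
    """Apply the save option the user accepted (hypothetical model)."""
    if item.complete:
        raise ValueError("completed transcriptions are read-only")
    if option is SaveOption.SAVE_COMPLETE_AND_SEND and send is None:
        raise ValueError("a send callback is required to notify the owner")
    item.text = corrected_text          # every option saves the corrections
    if option is SaveOption.SAVE:
        return                          # still editable; back to the main page
    item.complete = True                # moved to history, no further edits
    ws.in_progress.remove(item)
    ws.history.append(item)
    if option is SaveOption.SAVE_COMPLETE_AND_SEND:
        send(item.owner, item.text)
```

Both validation checks run before any state changes. As a result, a rejected request leaves the transcription exactly as it was, which matches the cancel path returning the user to the correction page 300.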

The embodiments described above and illustrated in the figures are presented by way of example only and are not intended as a limitation upon the concepts and principles of the invention. As such, it will be appreciated by one having ordinary skill in the art that various changes in the elements and their configuration and arrangement are possible without departing from the spirit and scope of the present invention. For example, in some embodiments the transcription server 20 uses multiple threads to transcribe multiple files concurrently. This process can use a single database or a cluster of databases holding temporary information to support multithreaded transcription on the same machine or on different machines. Each system or device included in embodiments of the present invention can also be implemented by one or more machines and/or one or more virtual machines.
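Multithreaded transcription of several files can be sketched with a thread pool. In this sketch, a lock-protected in-memory dictionary stands in for the shared database of temporary information. The `transcribe` callable is a placeholder for the actual recognizer. `TempStore`, `transcribe_all` and the status strings are hypothetical names. A real deployment spanning several machines would use an external database rather than process memory.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


class TempStore:
    """In-memory stand-in for the shared temporary-information database."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._data: Dict[str, str] = {}

    def put(self, key: str, value: str) -> None:
        with self._lock:  # serialize writes from worker threads
            self._data[key] = value

    def snapshot(self) -> Dict[str, str]:
        with self._lock:
            return dict(self._data)


def transcribe_all(files: List[str], transcribe: Callable[[str], str],
                   store: TempStore, workers: int = 4) -> Dict[str, str]:
    """Transcribe files concurrently, recording per-file status in store."""
    def job(name: str) -> str:
        store.put(name, "in_progress")
        text = transcribe(name)
        store.put(name, "done")
        return text

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order, so zip pairs them correctly.
        return dict(zip(files, pool.map(job, files)))
```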

Various features and advantages of the invention are set forth in the following claims.

Claims

1. A method of requesting a transcription of audio data, the method comprising:

displaying a send-for-transcription button within an email-client interface on a computer-controlled display; and

automatically sending a selected email message and associated audio data to a transcription server as a request for a transcription of the associated audio data when a user selects the send-for-transcription button.

2. The method of claim 1, further comprising displaying a status of the selected email message within the email-client interface, wherein the status indicates at least one of whether the selected email message has been sent to the transcription server, whether transcribed text based on the associated audio data has been received, and whether corrected text data has been received associated with the transcribed text.

3. The method of claim 2, further comprising playing the associated audio data within the email-client interface so that the audio data is audible to a user.

4. The method of claim 1, further comprising receiving the transcription of the associated audio data from the transcription server.

5. The method of claim 4, further comprising displaying the transcription of the associated audio data to a user within the email-client interface.

6. The method of claim 5, further comprising receiving corrected text data associated with the transcription of the associated audio data from a user within the email-client interface.

7. The method of claim 6, further comprising sending the corrected text data to the transcription server.

8. A system for requesting a transcription of audio data, the system comprising:

a transcription server;

an email-client interface displaying at least one email message associated with audio data to a user, displaying a send-for-transcription button to the user, receiving a selection of the at least one email message from the user, receiving a selection of the send-for-transcription button from the user, and automatically sending the at least one email message and associated audio data to the transcription server as a request for a transcription of the associated audio data in response to the user's selection of the send-for-transcription button.

9. The system of claim 8, wherein the email-client interface displays a status associated with the at least one email message, wherein the status includes at least one of whether the at least one email message has been sent to the transcription server, whether transcribed text based on the associated audio data has been received, and whether corrected text data has been received associated with the transcribed text.

10. The system of claim 8, wherein the email-client interface plays the associated audio data so that the associated audio data is audible to a user.

11. The system of claim 8, wherein the transcription server generates the transcription of the associated audio data based on a voice independent model.

12. The system of claim 8, wherein the transcription server identifies an account associated with the at least one email message based on at least one of an email address and an internet protocol address associated with the at least one email message.

13. The system of claim 12, wherein the transcription server obtains stored account settings associated with the identified account, the account settings including at least one of transcribed text delivery settings, transcription settings, and transcription format settings.

14. The system of claim 13, wherein the transcription server generates the transcription of the associated audio data based on the account settings.

15. The system of claim 8, wherein the transcription server generates the transcription of the associated audio data and sends the transcription of the associated audio data to the email-client interface.

16. The system of claim 15, wherein the email-client interface displays the transcription of the associated audio data to a user, receives corrected text data associated with the transcription of the associated audio data from the user, and sends the corrected text data to the transcription server.

17. The system of claim 16, wherein the transcription server modifies a voice-independent model based on the corrected text data.

18. A system for generating a transcription of audio data, the system comprising:

a transcription server configured to receive at least one email message and associated audio data from an email-client, identify an account based on the at least one email message, and obtain stored account settings associated with the identified account; and

a translation server configured to generate a transcription of the associated audio data based on the account settings and a voice-independent model.

19. The system of claim 18, wherein the account settings include at least one of transcribed text delivery settings, transcription settings, and transcription format settings.

20. The system of claim 18, wherein the transcription server identifies an account based on at least one of an email address and an internet protocol address associated with the at least one email message.