HK1206836B

Info

Publication number: HK1206836B
Application number: HK15107325.4A
Authority: HK
Inventors: C‧B‧弗莱扎克; T‧R‧格鲁伯
Original assignee: Apple Inc.
Priority date: 2012-06-29
Filing date: 2013-06-25
Publication date: 2019-10-18

Description

Device, Method, and User Interface for Voice-Activated Navigation and Browsing of Documents

Technical Field

The disclosed embodiments relate generally to digital assistant systems, and more particularly to digital assistant systems that perform voice-activated navigation and browsing of documents.

Background

Like a human personal assistant, a digital assistant system can perform requested tasks and provide requested advice, information, or services. The ability of a digital assistant system to fulfill a user's request depends on its correct understanding of the request or instruction. Recent advances in natural language processing have enabled users to interact with digital assistant systems using natural language, in spoken or textual form. Such digital assistant systems can interpret the user's input to infer the user's intent, translate the inferred intent into executable tasks and parameters, execute operations or deploy services to perform the tasks, and produce output that the user can understand.

Such digital assistant systems can be configured to assist users who have limited ability to interact with electronic devices. For example, people with reduced vision, such as low-vision and blind users, users with dyslexia or other learning disabilities, and even sighted users who simply want or need to use a device without looking at it during operation can benefit from a digital assistant system that reads information to the user. In another example, on electronic devices with touch screens, people with limited motor skills, such as those with an impaired finger or hand, may find it difficult, if not impossible, to perform touch gestures on the touch screen. A digital assistant system, however, can receive voice commands, thereby eliminating the need for touch gestures.

Nevertheless, navigating and browsing documents remains cumbersome and inefficient, placing a significant cognitive burden on users with impaired vision and/or limited motor skills.

Summary

As described above, there is a need for digital assistant systems that provide users who have impaired vision and/or limited motor skills with an improved user interface for navigating and browsing documents, so that such users can navigate through and browse documents efficiently.

The embodiments disclosed herein provide methods, systems, and computer-readable storage media for voice-activated navigation and browsing of documents.

Some embodiments provide a method for navigating through documents, performed at an electronic device having memory and one or more processors. The method includes receiving a first document that contains a plurality of links, outputting a spoken reading of at least a portion of the first document, outputting audible information identifying a link of the plurality of links, and, in response to outputting the audible information identifying the link, receiving from a user a voice command of a first type regarding the link. The method also includes, in response to receiving the voice command from the user, outputting a spoken reading of at least a portion of a second document associated with the link.
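The sequence of steps in this method can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the `Document` class, the `navigate` function, the "Link: ..." announcement format, and the "follow" command word are all hypothetical, and speech output and recognition are replaced by plain strings.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """Hypothetical document: text plus links (label -> title of target document)."""
    title: str
    text: str
    links: dict = field(default_factory=dict)


def navigate(library: dict, start: str, commands: list) -> list:
    """Return the utterances that would be spoken during a scripted session.

    Each entry in `commands` stands in for a recognized voice command that
    arrives right after a link is announced: "follow" opens the link,
    anything else skips it.
    """
    spoken = []
    pending = list(commands)
    first = library[start]
    spoken.append(first.text)                    # spoken reading of the first document
    for label, target in first.links.items():
        spoken.append(f"Link: {label}")          # audible information identifying the link
        command = pending.pop(0) if pending else "skip"
        if command == "follow":
            spoken.append(library[target].text)  # spoken reading of the second document
            break
    return spoken
```

In this sketch the link announcement always precedes the command, mirroring the "in response to outputting the audible information" ordering in the paragraph above.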

According to some embodiments, a method for browsing a document, performed at an electronic device having memory and one or more processors, includes receiving a document having a plurality of portions, where at least some of the portions are associated with respective metadata. The method also includes outputting a spoken reading of respective portions of the document, including audibly distinguishing the respective portions based on the respective metadata. The method also includes receiving from a user a voice command requesting navigation to a particular portion associated with particular metadata and, in response to receiving the voice command, outputting a spoken reading of the particular portion associated with the particular metadata.
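A rough Python sketch of metadata-based browsing follows. The `Portion` class, the metadata labels ("heading", "quote", "body"), and the bracketed cue strings are assumptions made for illustration; the source does not specify how portions are audibly distinguished.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Portion:
    """Hypothetical document portion tagged with a metadata label."""
    text: str
    metadata: str  # e.g. "heading", "body", "quote"


# Hypothetical audible cues; a real system might change voice, pitch, or tone instead.
CUES = {"heading": "[chime] ", "quote": "[quote voice] "}


def read_portion(portion: Portion) -> str:
    """Render one portion, prefixed with a cue chosen from its metadata."""
    return CUES.get(portion.metadata, "") + portion.text


def jump_to(portions: List[Portion], start: int, wanted: str) -> Optional[int]:
    """Return the index of the next portion after `start` tagged `wanted`, if any."""
    for i in range(start + 1, len(portions)):
        if portions[i].metadata == wanted:
            return i
    return None
```

A command such as "next heading" would map to `jump_to(portions, current_index, "heading")`, after which the reader resumes with `read_portion` at the returned index.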

According to some embodiments, a method for identifying a set of documents, performed at an electronic device having memory and one or more processors, includes outputting a spoken reading of at least a portion of a document of a plurality of documents. The method also includes, while outputting the spoken reading, receiving from a user a voice command requesting documents that correspond to a particular criterion. The method also includes, in response to receiving the voice command from the user, identifying one or more documents of the plurality of documents that correspond to the particular criterion, and outputting a spoken reading of at least a portion of a respective document of the one or more identified documents.
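The identification step can be sketched in Python as a simple filter. The `Message` class, its fields, and the case-insensitive equality match are illustrative assumptions; the source does not say what kinds of criteria are supported or how matching is performed.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Message:
    """Hypothetical document type: an e-mail message."""
    sender: str
    subject: str
    body: str


def find_matching(messages: List[Message], field_name: str, value: str) -> List[Message]:
    """Return the messages whose `field_name` equals `value`, ignoring case."""
    wanted = value.lower()
    return [m for m in messages if getattr(m, field_name).lower() == wanted]


def read_first(messages: List[Message]) -> str:
    """Text that would be spoken for the first identified message."""
    if not messages:
        return "No matching documents."
    return f"{messages[0].subject}. {messages[0].body}"
```

For instance, a command such as "messages from Alice" received during a reading would translate to `find_matching(inbox, "sender", "alice")`, and the reading would continue with the first result.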

According to some embodiments, an electronic device includes one or more processors and memory storing one or more programs for execution by the one or more processors. The one or more programs include instructions for performing the operations of any of the methods described above. According to some embodiments, a graphical user interface on an electronic device having a display, memory, and one or more processors that execute one or more programs stored in the memory includes one or more of the elements displayed in any of the methods described above, which are updated in response to inputs as described in any of those methods. According to some embodiments, a computer-readable storage medium has stored therein instructions that, when executed by an electronic device having memory and one or more processors, cause the device to perform the operations of any of the methods described above. According to some embodiments, an electronic device includes means for performing the operations of any of the methods described above. According to some embodiments, an information processing apparatus for use in an electronic device includes means for performing the operations of any of the methods described above. According to some embodiments, an electronic device includes a processing unit configured to perform the operations of any of the methods described above.

According to some embodiments, an electronic device includes an audio input unit configured to receive audio input. The electronic device also includes an audio output unit configured to output audible information. The electronic device includes a processing unit coupled to the audio input unit and the audio output unit. The processing unit is configured to receive a first document that includes a plurality of links. The processing unit is configured to output a spoken reading of at least a portion of the first document. The processing unit is configured to output audible information identifying a link of the plurality of links. The processing unit is configured to, in response to outputting the audible information identifying the link, receive from a user a voice command regarding the link. The processing unit is configured to, in response to receiving the voice command from the user, output a spoken reading of at least a portion of a second document associated with the link.

According to some embodiments, an electronic device includes an audio input unit configured to receive audio input. The electronic device also includes an audio output unit configured to output audible information. The electronic device includes a processing unit coupled to the audio input unit and the audio output unit. The processing unit is configured to receive a document having a plurality of portions, where at least some of the portions are associated with respective metadata. The processing unit is configured to output a spoken reading of respective portions of the document, including audibly distinguishing the respective portions based on the respective metadata. The processing unit is configured to receive from a user a voice command requesting navigation to a particular portion associated with particular metadata. The processing unit is configured to, in response to receiving the voice command, output a spoken reading of the particular portion associated with the particular metadata.

According to some embodiments, an electronic device includes an audio input unit configured to receive audio input. The electronic device also includes an audio output unit configured to output audible information. The electronic device includes a processing unit coupled to the audio input unit and the audio output unit. The processing unit is configured to output a spoken reading of at least a portion of a document of a plurality of documents. The processing unit is configured to, while outputting the spoken reading, receive from a user a voice command requesting documents that correspond to a particular criterion. The processing unit is configured to, in response to receiving the voice command from the user, identify one or more documents of the plurality of documents that correspond to the particular criterion, and output a spoken reading of at least a portion of a respective document of the one or more identified documents.

Thus, digital assistant systems are provided with new and improved methods for navigating and browsing documents, thereby improving the user interface for users with limited accessibility. Such methods and systems may complement or replace existing methods and systems.

Brief Description of the Drawings

FIG. 1 is a block diagram illustrating an environment in which a digital assistant operates, according to some embodiments.

FIG. 2 is a block diagram illustrating a digital assistant client system according to some embodiments.

FIG. 3 is a block diagram illustrating a stand-alone digital assistant system or a digital assistant server system according to some embodiments.

FIGS. 4A-4N illustrate exemplary user interfaces displayed on an electronic device according to some embodiments.

FIG. 5 is a flow diagram illustrating operations, performed by an electronic device, for navigating through documents according to some embodiments.

FIG. 6 is a flow diagram illustrating operations, performed by an electronic device, for browsing a document according to some embodiments.

FIG. 7 is a flow diagram illustrating operations, performed by an electronic device, for identifying a set of documents according to some embodiments.

FIG. 8 is a functional block diagram of an electronic device according to some embodiments.

FIG. 9 is a functional block diagram of an electronic device according to some embodiments.

FIG. 10 is a functional block diagram of an electronic device according to some embodiments.

Like reference numerals refer to corresponding parts throughout the drawings.

Detailed Description

FIG. 1 is a block diagram of an operating environment 100 of a digital assistant according to some embodiments. The terms "digital assistant," "virtual assistant," "intelligent automated assistant," and "automatic digital assistant" refer to any information processing system that interprets natural language input in spoken and/or textual form to infer user intent (e.g., identifies a task type that corresponds to the natural language input) and performs actions based on the inferred user intent (e.g., performs a task corresponding to the identified task type). For example, to act on an inferred user intent, the system can perform one or more of the following: identifying a task flow with steps and parameters designed to accomplish the inferred user intent (e.g., identifying a task type); inputting specific requirements from the inferred user intent into the task flow; executing the task flow by invoking programs, methods, services, APIs, or the like (e.g., sending a request to a service provider); and generating output responses to the user in audible (e.g., speech) and/or visual form.
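The intent-to-action pipeline described above can be sketched in Python as follows. This is a toy illustration only: the keyword rules in `infer_intent`, the task type names, and the response strings are assumptions, and a real digital assistant would use natural language processing models rather than string matching.

```python
from typing import Callable, Dict, Tuple


def infer_intent(utterance: str) -> Tuple[str, Dict[str, str]]:
    """Map an utterance to a (task type, parameters) pair using toy keyword rules."""
    text = utterance.lower().strip()
    if text.startswith("read "):
        return "read_document", {"title": text[len("read "):]}
    if "where am i" in text:
        return "report_location", {}
    return "unknown", {}


# Hypothetical task flows keyed by task type; each takes the inferred parameters.
TASK_FLOWS: Dict[str, Callable[[Dict[str, str]], str]] = {
    "read_document": lambda params: f"Reading {params['title']}",
    "report_location": lambda params: "Looking up your current location",
}


def handle(utterance: str) -> str:
    """Infer the intent, select the matching task flow, run it, and return the response."""
    task_type, params = infer_intent(utterance)
    flow = TASK_FLOWS.get(task_type)
    if flow is None:
        return "Sorry, I did not understand that."
    return flow(params)
```

Keeping task flows in a lookup table separates intent inference from task execution, which matches the division of steps listed in the paragraph above.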

Specifically, a digital assistant system is capable of accepting a user request at least partially in the form of a natural language command, request, statement, narrative, and/or inquiry. Typically, the user request seeks either an informational answer or the performance of a task by the digital assistant system. A satisfactory response to the user request is generally the requested informational answer, the performance of the requested task, or a combination of the two. For example, a user may ask the digital assistant system a question, such as "Where am I right now?" Based on the user's current location, the digital assistant may answer, "You are in Central Park near the west gate." The user may also request the performance of a task, for example, by stating "Please invite my friends to my girlfriend's birthday party next week." In response, the digital assistant may acknowledge the request by generating a voice output, "Okay, right away," and then send a suitable calendar invite from the user's email address to each of the user's friends listed in the user's electronic address book. There are numerous other ways of interacting with a digital assistant to request information or the performance of various tasks. In addition to providing verbal responses and taking programmed actions, the digital assistant can also provide responses in other visual or audio forms (e.g., as text, alerts, music, videos, or animations).

As shown in FIG. 1, in some embodiments, a digital assistant system is implemented according to a client-server model. The digital assistant system includes a client-side portion (e.g., 102a and 102b) (hereinafter "digital assistant (DA) client 102") executed on a user device (e.g., 104a and 104b), and a server-side portion 106 (hereinafter "digital assistant (DA) server 106") executed on a server system 108. The DA client 102 communicates with the DA server 106 through one or more networks 110. The DA client 102 provides client-side functionality, such as user-facing input and output processing, and communicates with the DA server 106. The DA server 106 provides server-side functionality for any number of DA clients 102, each residing on a respective user device 104 (also called a client device).

In some embodiments, the DA server 106 includes a client-facing I/O interface 112, one or more processing modules 114, data and models 116, and an I/O interface 118 to external services. The client-facing I/O interface 112 facilitates client-facing input and output processing for the DA server 106. The one or more processing modules 114 use the data and models 116 to determine the user's intent based on natural language input and perform tasks based on the inferred user intent.

In some embodiments, the DA server 106 communicates with external services (e.g., one or more navigation services, one or more messaging services, one or more information services, a calendar service, a telephony service, etc.) through the one or more networks 110 to complete tasks or acquire information. The I/O interface 118 to the external services facilitates such communications.

Examples of the user device 104 include, but are not limited to, a handheld computer, a personal digital assistant (PDA), a tablet computer, a laptop computer, a desktop computer, a cellular telephone, a smartphone, an enhanced general packet radio service (EGPRS) mobile phone, a media player, a navigation device, a game console, a television, a remote control, a combination of any two or more of these data processing devices, or any other suitable data processing device. More details on the user device 104 are provided with reference to the exemplary user device 104 shown in FIG. 2.

Examples of the one or more communication networks 110 include local area networks ("LAN") and wide area networks ("WAN"), e.g., the Internet. The one or more communication networks 110 can be implemented using any known network protocol, including various wired or wireless protocols, such as Ethernet, Universal Serial Bus (USB), FIREWIRE, Global System for Mobile Communications (GSM), Enhanced Data GSM Environment (EDGE), code division multiple access (CDMA), time division multiple access (TDMA), Bluetooth, Wi-Fi, voice over Internet Protocol (VoIP), Wi-MAX, or any other suitable communication protocol.

The server system 108 can be implemented on at least one data processing apparatus and/or a distributed network of computers.

Although the digital assistant system shown in FIG. 1 includes both a client-side portion (e.g., the DA client 102) and a server-side portion (e.g., the DA server 106), in some embodiments, a digital assistant system refers only to the server-side portion (e.g., the DA server 106). Alternatively, in some embodiments, the functions of a digital assistant can be implemented as a stand-alone application installed on a user device. In addition, the division of functionality between the client and server portions of the digital assistant can vary between embodiments. For example, in some embodiments, the DA client 102 is a thin client that provides only user-facing input and output processing functions and delegates all other functions of the digital assistant to the DA server 106. In some other embodiments, the DA client 102 is configured to perform or assist with one or more functions of the DA server 106.

FIG. 1 also shows web servers 120 (e.g., web server 1 (122-1), web server 2 (122-2), web server 3 (122-3), etc.). Although the one or more web servers 120 are not part of the digital assistant system, the server system 108 and/or one or more user devices 104 can communicate with one or more of the web servers 120 to retrieve one or more documents and/or information associated with the one or more documents.

FIG. 2 is a block diagram of a user device 104 according to some embodiments. The user device 104 includes a memory interface 202, one or more processors 204, and a peripherals interface 206. The various components in the user device 104 are coupled by one or more communication buses or signal lines. The user device 104 includes various sensors, subsystems, and peripheral devices that are coupled to the peripherals interface 206. The sensors, subsystems, and peripheral devices gather information and/or facilitate various functions of the user device 104.

For example, in some embodiments, a motion sensor 210, a light sensor 212, and a proximity sensor 214 are coupled to the peripherals interface 206 to facilitate orientation, lighting, and proximity sensing functions. In some embodiments, other sensors 216, such as a positioning system (e.g., a GPS receiver), a temperature sensor, or a biometric sensor, are connected to the peripherals interface 206 to facilitate related functions.

In some embodiments, the user device 104 includes a camera subsystem 220 coupled to the peripherals interface 206. In some embodiments, an optical sensor 222 of the camera subsystem 220 facilitates camera functions, such as taking photographs and recording video clips. In some embodiments, the user device 104 includes one or more wired and/or wireless communication subsystems 224 that provide communication functions. The communication subsystems 224 typically include various communication ports, radio frequency receivers and transmitters, and/or optical (e.g., infrared) receivers and transmitters. In some embodiments, the user device 104 includes an audio subsystem 226 coupled to one or more speakers 228 and one or more microphones 230 to facilitate voice-enabled functions, such as voice recognition, voice replication, digital recording, and telephony functions.

In some embodiments, an I/O subsystem 240 is also coupled to the peripherals interface 206. In some embodiments, the user device 104 includes a touch screen 246, and the I/O subsystem 240 includes a touch screen controller 242 coupled to the touch screen 246. When the user device 104 includes the touch screen 246 and the touch screen controller 242, they are typically configured to detect contact and movement, or a break thereof, using any of a plurality of touch-sensing technologies, such as capacitive, resistive, infrared, and surface acoustic wave technologies, proximity sensor arrays, and the like. In some embodiments, the user device 104 includes a display that does not have a touch-sensitive surface. In some embodiments, the user device 104 includes a separate touch-sensitive surface. In some embodiments, the user device 104 includes one or more other input controllers 244. When the user device 104 includes the one or more other input controllers 244, they are typically coupled to other input/control devices 248, such as one or more buttons, rocker switches, thumb wheels, infrared ports, USB ports, and/or pointer devices such as a stylus.

The memory interface 202 is coupled to memory 250. In some embodiments, the memory 250 includes a non-transitory computer-readable medium, such as high-speed random access memory and/or non-volatile memory (e.g., one or more magnetic disk storage devices, one or more flash memory devices, one or more optical storage devices, and/or other non-volatile solid-state memory devices).

In some embodiments, the memory 250 stores an operating system 252, a communications module 254, a graphical user interface module 256, a sensor processing module 258, a phone module 260, and applications 262, or a subset or superset thereof. The operating system 252 includes instructions for handling basic system services and for performing hardware-dependent tasks. The communications module 254 facilitates communicating with one or more additional devices, one or more computers, and/or one or more servers. The graphical user interface module 256 facilitates graphical user interface processing. The sensor processing module 258 facilitates sensor-related processing and functions (e.g., processing voice input received with the one or more microphones 230). The phone module 260 facilitates phone-related processes and functions. The application module 262 facilitates various functions of user applications, such as electronic messaging, web browsing, media processing, navigation, imaging, and/or other processes and functions. In some embodiments, the application module 262 includes or interacts with a web browser application 270. In some embodiments, the application module 262 includes or interacts with an electronic messaging application.

As described above, in some embodiments, the memory 250 also stores client-side digital assistant instructions (e.g., in a digital assistant client module 264) and various user data 266 (e.g., user-specific vocabulary data, preference data, and/or other data such as the user's electronic address book, to-do lists, and shopping lists) to provide the client-side functionality of the digital assistant.

In various embodiments, the digital assistant client module 264 is capable of accepting voice input, text input, touch input, and/or gesture input through various user interfaces of the user device 104 (e.g., the I/O subsystem 240). The digital assistant client module 264 is also capable of providing output in audio, visual, and/or tactile forms. For example, output can be provided as voice, sound, alerts, text messages, menus, graphics, videos, animations, vibrations, and/or combinations of two or more of the above. During operation, the digital assistant client module 264 communicates with the digital assistant server (e.g., the digital assistant server 106, FIG. 1) using the communication subsystems 224.

In some embodiments, the digital assistant client module 264 uses various sensors, subsystems, and peripheral devices to gather additional information from the surrounding environment of the user device 104 to establish a context associated with a user input. In some embodiments, the digital assistant client module 264 provides the context information, or a subset thereof, together with the user input to the digital assistant server (e.g., the digital assistant server 106, FIG. 1) to help infer the user's intent.

In some embodiments, the context information that can accompany the user input includes sensor information, e.g., lighting, ambient noise, ambient temperature, or images or videos of the surrounding environment. In some embodiments, the context information also includes the physical state of the device, e.g., device orientation, device location, device temperature, power level, speed, acceleration, motion patterns, or cellular signal strength. In some embodiments, information related to the software state of the user device 104, e.g., running processes, installed programs, past and present network activities, background services, error logs, or resource usage of the user device 104, is also provided to the digital assistant server (e.g., the digital assistant server 106, FIG. 1) as context information associated with the user input.
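One way to picture the context information that accompanies a user input is as a structured payload. The Python sketch below is purely illustrative: the `ContextInfo` class, its field names, the `build_request` helper, and the JSON wire format are assumptions and are not described in the source.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ContextInfo:
    """Hypothetical container for the three kinds of context named above."""
    sensor: dict = field(default_factory=dict)          # e.g. lighting, ambient noise
    physical_state: dict = field(default_factory=dict)  # e.g. orientation, power level
    software_state: dict = field(default_factory=dict)  # e.g. running processes


def build_request(user_input: str, context: ContextInfo,
                  include: tuple = ("sensor", "physical_state", "software_state")) -> str:
    """Serialize the user input together with the selected subset of context."""
    selected = {key: value for key, value in asdict(context).items() if key in include}
    return json.dumps({"input": user_input, "context": selected}, sort_keys=True)
```

The `include` parameter reflects the statement that the client may send the context information or only a subset of it.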

在一些实施例中，DA客户端模块264响应于来自数字助理服务器的请求而选择性地提供存储在用户设备104上的信息(例如，用户数据266的至少一部分)。在一些实施例中，数字助理客户端模块264还当数字助理服务器106(图1)进行请求时经由自然语言对话或其他用户界面来引出来自用户的附加输入。数字助理客户端模块264将附加输入传送至数字助理服务器106以帮助数字助理服务器106进行意图推断和/或满足在用户请求中所表达的用户的意图。In some embodiments, the DA client module 264 selectively provides information stored on the user device 104 (e.g., at least a portion of the user data 266) in response to a request from the digital assistant server. In some embodiments, the digital assistant client module 264 also elicits additional input from the user via a natural language dialog or other user interface when requested by the digital assistant server 106 ( FIG. 1 ). The digital assistant client module 264 transmits the additional input to the digital assistant server 106 to assist the digital assistant server 106 in inferring intent and/or satisfying the user's intent expressed in the user request.

在一些实施例中，存储器250可包括附加指令或更少的指令。此外，用户设备104的各种功能可在硬件和/或在软件中实施，该硬件和/或软件包括在一个或多个信号处理集成电路和/或专用集成电路中，因此用户设备104不需要包括图2中示出的所有模块和应用程序。例如，在一些实施例中，用户设备104不包括触摸屏246。In some embodiments, memory 250 may include additional instructions or fewer instructions. Furthermore, various functions of user device 104 may be implemented in hardware and/or software, which may be included in one or more signal processing integrated circuits and/or application-specific integrated circuits, so user device 104 need not include all of the modules and applications shown in FIG. 2. For example, in some embodiments, user device 104 does not include touch screen 246.

图3为根据一些实施例的示例性数字助理系统300的框图。在一些实施例中，数字助理系统300在独立式计算机系统上实施。在一些实施例中，数字助理系统300跨多个计算机而分布。在一些实施例中，数字助理的模块和功能中的一些被划分成服务器部分和客户端部分，其中客户端部分驻留在用户设备(例如，用户设备104)上并通过一个或多个网络与服务器部分(例如，服务器系统108)进行通信，例如如图1中所示。在一些实施例中，数字助理系统300为图1中所示的服务器系统108(和/或数字助理服务器106)的实施例。在一些实施例中，数字助理系统300在用户设备(例如，用户设备104，图1)中实施，从而消除了对客户端-服务器系统的需求。应当指出的是，数字助理系统300仅为数字助理系统的一个实例，且该数字助理系统300可具有比示出更多或更少的部件、可组合两个或更多个部件、或可具有部件的不同配置或布置。图3中所示的各种部件可在硬件、软件、固件(包括一个或多个信号处理集成电路和/或专用集成电路)，或它们的组合中实施。FIG. 3 is a block diagram of an exemplary digital assistant system 300 according to some embodiments. In some embodiments, the digital assistant system 300 is implemented on a stand-alone computer system. In some embodiments, the digital assistant system 300 is distributed across multiple computers. In some embodiments, some of the modules and functions of the digital assistant are divided into a server portion and a client portion, wherein the client portion resides on a user device (e.g., user device 104) and communicates with the server portion (e.g., server system 108) through one or more networks, for example as shown in FIG. 1. In some embodiments, the digital assistant system 300 is an embodiment of the server system 108 (and/or digital assistant server 106) shown in FIG. 1. In some embodiments, the digital assistant system 300 is implemented in a user device (e.g., user device 104, FIG. 1), thereby eliminating the need for a client-server system. It should be noted that the digital assistant system 300 is only one example of a digital assistant system, and the digital assistant system 300 may have more or fewer components than shown, may combine two or more components, or may have a different configuration or arrangement of components. The various components shown in FIG. 3 may be implemented in hardware, software, firmware (including one or more signal processing integrated circuits and/or application-specific integrated circuits), or a combination thereof.

数字助理系统300包括存储器302、一个或多个处理器304、输入/输出(I/O)接口306，以及网络通信接口308。这些部件通过一条或多条通信总线或信号线310彼此通信。The digital assistant system 300 includes a memory 302, one or more processors 304, an input/output (I/O) interface 306, and a network communication interface 308. These components communicate with each other via one or more communication buses or signal lines 310.

在一些实施例中，存储器302包括非暂态计算机可读介质，诸如高速随机存取存储器和/或非易失性计算机可读存储介质(例如，一个或多个磁盘存储设备、一个或多个闪存存储器设备、一个或多个光学存储设备、和/或其他非易失性固态存储器设备)。In some embodiments, memory 302 includes non-transitory computer-readable media, such as high-speed random access memory and/or non-volatile computer-readable storage media (e.g., one or more disk storage devices, one or more flash memory devices, one or more optical storage devices, and/or other non-volatile solid-state memory devices).

I/O接口306将数字助理系统300的输入/输出设备316诸如显示器、键盘、触摸屏和麦克风耦接至用户界面模块322。I/O接口306与用户界面模块322结合，接收用户输入(例如，语音输入、键盘输入、触摸输入等)并相应地对这些输入进行处理。在一些实施例中，当数字助理在独立式用户设备上实施时，数字助理系统300包括相对于图2中的用户设备104所描述的部件和I/O接口及通信接口中的任一者(例如，一个或多个麦克风228)。在一些实施例中，数字助理系统300代表数字助理具体实施的服务器部分，并且通过驻留在用户设备(例如，图2中所示的用户设备104)上的客户端侧部分与用户进行交互。The I/O interface 306 couples the input/output devices 316 of the digital assistant system 300, such as a display, keyboard, touch screen, and microphone, to the user interface module 322. The I/O interface 306, in conjunction with the user interface module 322, receives user input (e.g., voice input, keyboard input, touch input, etc.) and processes the input accordingly. In some embodiments, when the digital assistant is implemented on a stand-alone user device, the digital assistant system 300 includes any of the components and I/O interfaces and communication interfaces described with respect to the user device 104 in Figure 2 (e.g., one or more microphones 228). In some embodiments, the digital assistant system 300 represents the server portion of a specific implementation of the digital assistant and interacts with the user through a client-side portion residing on a user device (e.g., the user device 104 shown in Figure 2).

在一些实施例中，网络通信接口308包括一个或多个有线通信端口312和/或无线传输和接收电路314。一个或多个有线通信端口经由一个或多个有线接口，例如以太网、通用串行总线(USB)、火线等来接收和发送通信信号。无线电路314通常从通信网络及其他通信设备接收RF信号和/或光学信号以及将RF信号和/或光学信号发送至通信网络及其他通信设备。无线通信可使用多种通信标准、协议和技术中的任一种，这些通信标准、协议和技术诸如GSM、EDGE、CDMA、TDMA、蓝牙、Wi-Fi、VoIP、Wi-MAX、或任何其他合适的通信协议。网络通信接口308实现数字助理系统300与网络以及其他设备之间的通信，该网络诸如互联网、内联网和/或无线网络诸如蜂窝电话网络、无线局域网(LAN)和/或城域网(MAN)。In some embodiments, the network communication interface 308 includes one or more wired communication ports 312 and/or wireless transmission and reception circuitry 314. The one or more wired communication ports receive and send communication signals via one or more wired interfaces, such as Ethernet, Universal Serial Bus (USB), FireWire, etc. The wireless circuitry 314 typically receives RF signals and/or optical signals from a communication network and other communication devices and sends RF signals and/or optical signals to a communication network and other communication devices. Wireless communication can use any of a variety of communication standards, protocols, and technologies, such as GSM, EDGE, CDMA, TDMA, Bluetooth, Wi-Fi, VoIP, Wi-MAX, or any other suitable communication protocol. The network communication interface 308 enables communication between the digital assistant system 300 and a network and other devices, such as the Internet, an intranet, and/or a wireless network such as a cellular telephone network, a wireless local area network (LAN), and/or a metropolitan area network (MAN).

在一些实施例中，存储器302的非暂态计算机可读存储介质存储程序、模块、指令和数据结构，这些程序、模块、指令和数据结构包括以下中的全部或子集：操作系统318、通信模块320、用户界面模块322、一个或多个应用程序324、以及数字助理模块326。一个或多个处理器304执行这些程序、模块和指令，并从数据结构读取数据或将数据写到数据结构。In some embodiments, the non-transitory computer-readable storage medium of memory 302 stores programs, modules, instructions, and data structures, including all or a subset of the following: operating system 318, communication module 320, user interface module 322, one or more application programs 324, and digital assistant module 326. One or more processors 304 execute these programs, modules, and instructions, and read data from or write data to the data structures.

操作系统318(例如，Darwin、RTXC、LINUX、UNIX、OS X、WINDOWS、或嵌入式操作系统诸如VxWorks)包括用于控制和管理一般系统任务(例如，存储器管理、存储设备控制、电力管理等)的各种软件部件和/或驱动器，并有助于各种硬件、固件与软件部件之间的通信。The operating system 318 (e.g., Darwin, RTXC, LINUX, UNIX, OS X, WINDOWS, or an embedded operating system such as VxWorks) includes various software components and/or drivers for controlling and managing general system tasks (e.g., memory management, storage device control, power management, etc.) and facilitates communication between various hardware, firmware, and software components.

通信模块320有助于通过网络通信接口308来进行数字助理系统300与其他设备之间的通信。例如，通信模块320可与图2中所示的设备104的通信模块254进行通信。通信模块320还包括用于处理被无线电路314和/或有线通信端口312所接收的数据的各种软件部件。The communication module 320 facilitates communication between the digital assistant system 300 and other devices via the network communication interface 308. For example, the communication module 320 can communicate with the communication module 254 of the device 104 shown in FIG. 2. The communication module 320 also includes various software components for processing data received by the wireless circuitry 314 and/or the wired communication port 312.

在一些实施例中，用户界面模块322经由I/O接口306来从用户(例如，来自键盘、触摸屏、和/或麦克风)接收命令和/或输入，并将用户界面对象提供在显示器上。In some embodiments, user interface module 322 receives commands and/or input from a user (e.g., from a keyboard, touch screen, and/or microphone) via I/O interface 306 and provides user interface objects on a display.

应用程序324包括被配置为由所述一个或多个处理器304执行的程序和/或模块。例如，如果数字助理系统在独立式用户设备上实施，则应用程序324可包括用户应用程序，诸如游戏、日历应用程序、导航应用程序、web浏览器应用程序、或邮件应用程序。如果数字助理系统300在服务器场上实施，则应用程序324可包括例如资源管理应用程序、诊断应用程序、或调度应用程序。Applications 324 include programs and/or modules configured to be executed by the one or more processors 304. For example, if the digital assistant system is implemented on a stand-alone user device, applications 324 may include user applications such as games, calendar applications, navigation applications, web browser applications, or email applications. If the digital assistant system 300 is implemented on a server farm, applications 324 may include, for example, resource management applications, diagnostic applications, or scheduling applications.

存储器302还存储数字助理模块(或数字助理的服务器部分)326。在一些实施例中，数字助理模块326包括以下子模块、或者它们的子集或超集：输入/输出处理模块328、语音文本转换(STT)处理模块330、自然语言处理模块332、对话流处理模块334、任务流处理模块336、以及服务处理模块338。这些处理模块中的每一个处理模块均具有对数字助理326的以下数据与模型中的一者或多者，或者它们的子集或超集的访问权限：知识本体360、词汇索引344、用户数据348、任务流模型354、以及服务模型356。The memory 302 also stores a digital assistant module (or the server portion of the digital assistant) 326. In some embodiments, the digital assistant module 326 includes the following submodules, or subsets or supersets thereof: an input/output processing module 328, a speech-to-text (STT) processing module 330, a natural language processing module 332, a dialog flow processing module 334, a task flow processing module 336, and a service processing module 338. Each of these processing modules has access to one or more of the following data and models of the digital assistant 326, or subsets or supersets thereof: a knowledge ontology 360, a vocabulary index 344, user data 348, a task flow model 354, and a service model 356.

在一些实施例中，使用处理模块(例如，输入/输出处理模块328、STT处理模块330、自然语言处理模块332、对话流处理模块334、任务流处理模块336、和/或服务处理模块338)、数据以及在数字助理模块326中实施的模型，数字助理系统300执行以下操作中的至少一些操作：识别在从用户接收的自然语言输入中表达的用户意图；主动引出并获得推断用户意图所需的信息(例如，通过消除词、姓名、意图的歧义等)；确定用于满足推断出的意图的任务流；以及执行该任务流以满足推断出的意图。在一些实施例中，当出于各种原因而未向或不能向用户提供令人满意的响应时，数字助理还采取适当的行动。In some embodiments, using processing modules (e.g., input/output processing module 328, STT processing module 330, natural language processing module 332, dialog flow processing module 334, task flow processing module 336, and/or service processing module 338), data, and models implemented in digital assistant module 326, digital assistant system 300 performs at least some of the following operations: identifying user intent expressed in natural language input received from the user; proactively eliciting and obtaining information needed to infer user intent (e.g., by disambiguating words, names, intents, etc.); determining a task flow for satisfying the inferred intent; and executing the task flow to satisfy the inferred intent. In some embodiments, the digital assistant also takes appropriate action when a satisfactory response is not or cannot be provided to the user for various reasons.

在一些实施例中，I/O处理模块328通过I/O设备316与用户进行交互，或通过网络通信接口308与用户设备(例如，图1中的用户设备104)进行交互以获得用户输入(例如，语音输入)并提供对用户输入的响应。I/O处理模块328随同接收到用户输入一起或在接收到用户输入之后不久任选地获得与来自用户设备的用户输入相关联的上下文信息。上下文信息包括用户专用的数据、词汇、和/或与用户输入有关的偏好。在一些实施例中，上下文信息还包括当接收到用户请求时所述设备(例如，图1中的用户设备104)的软件和硬件状态，和/或与当接收到用户请求时用户的周围环境相关的信息。在一些实施例中，I/O处理模块328还向用户发送有关用户请求的跟进问题，并从用户接收回答。在一些实施例中，当用户请求被I/O处理模块328接收到且用户请求包含语音输入时，I/O处理模块328将语音输入转发至语音文本转换(STT)处理模块330以用于语音文本转换。In some embodiments, the I/O processing module 328 interacts with the user via the I/O device 316 or with the user device (e.g., the user device 104 in FIG. 1 ) via the network communication interface 308 to obtain user input (e.g., voice input) and provide a response to the user input. The I/O processing module 328 optionally obtains contextual information associated with the user input from the user device along with or shortly after receiving the user input. The contextual information includes user-specific data, vocabulary, and/or preferences related to the user input. In some embodiments, the contextual information also includes the software and hardware status of the device (e.g., the user device 104 in FIG. 1 ) at the time the user request was received, and/or information related to the user's surrounding environment at the time the user request was received. In some embodiments, the I/O processing module 328 also sends follow-up questions about the user request to the user and receives answers from the user. In some embodiments, when a user request is received by the I/O processing module 328 and the user request includes voice input, the I/O processing module 328 forwards the voice input to the speech-to-text (STT) processing module 330 for speech-to-text conversion.

在一些实施例中，语音文本转换处理模块330通过I/O处理模块328来接收语音输入(例如，在语音记录中捕捉的用户话语)。在一些实施例中，语音文本转换处理模块330使用各种声音和语言模型来将语音输入识别为音素的序列，并最终将其识别为以一种或多种语言书写的词或符号的序列。语音文本转换处理模块330使用任何合适的语音识别技术、声音模型以及语言模型，诸如隐马尔可夫(Hidden Markov)模型、基于动态时间规整(DTW)的语音识别以及其他统计和/或分析技术来加以实施。在一些实施例中，语音文本转换处理可至少部分地由第三方服务执行或在用户的设备上执行。一旦语音文本转换处理模块330获得语音文本转换处理的结果(例如，词或符号的序列)，其便将结果传送至自然语言处理模块332以用于意图推断。In some embodiments, the speech-to-text conversion processing module 330 receives speech input (e.g., a user utterance captured in a voice recording) through the I/O processing module 328. In some embodiments, the speech-to-text conversion processing module 330 uses various acoustic and language models to recognize the speech input as a sequence of phonemes, and ultimately as a sequence of words or symbols written in one or more languages. The speech-to-text conversion processing module 330 is implemented using any suitable speech recognition techniques, acoustic models, and language models, such as Hidden Markov Models, dynamic time warping (DTW)-based speech recognition, and other statistical and/or analytical techniques. In some embodiments, the speech-to-text conversion processing may be performed at least in part by a third-party service or on the user's device. Once the speech-to-text conversion processing module 330 obtains the result of the speech-to-text conversion processing (e.g., a sequence of words or symbols), it passes the result to the natural language processing module 332 for intent inference.
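Dynamic time warping, one of the techniques named above, can be illustrated with a toy sketch. This is the textbook DTW recurrence applied to one-dimensional sequences, not the implementation of module 330; a practical recognizer would compare sequences of acoustic feature vectors rather than single numbers. The function names, templates, and values are hypothetical.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping distance between two numeric sequences."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a[i-1] aligned with an already-used b[j-1]
                cost[i][j - 1],      # b[j-1] aligned with an already-used a[i-1]
                cost[i - 1][j - 1],  # both sequences advance together
            )
    return cost[n][m]


def recognize(utterance, templates):
    """Return the label of the stored template closest to the utterance under DTW."""
    return min(templates, key=lambda label: dtw_distance(utterance, templates[label]))


# Toy templates: each word is represented by a short 1-D feature track.
templates = {"yes": [1, 3, 1], "no": [5, 5, 5]}
slow_yes = [1, 1, 3, 3, 1]  # the same shape as "yes", spoken more slowly
best = recognize(slow_yes, templates)
```

The point of the warping step is that `slow_yes` aligns with the `"yes"` template at zero cost even though the two sequences have different lengths.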

数字助理326的自然语言处理模块332(“自然语言处理器”)取得由语音文本转换处理模块330生成的词或符号的序列(“符号序列”)，并尝试将该符号序列与由数字助理所识别的一个或多个“可执行意图”相关联。如本文所用，“可执行意图”表示可由数字助理326和/或数字助理系统300执行并且具有在任务流模型354中实施的相关联的任务流的任务。相关联的任务流是数字助理系统300为了执行任务而采取的一系列经编程的动作和步骤。数字助理系统的能力范围取决于已在任务流模型354中实施并存储的任务流的数量和种类，或换言之，取决于数字助理系统300所识别的“可执行意图”的数量和种类。然而，数字助理系统300的有效性还取决于数字助理系统从以自然语言表达的用户请求中推断出正确的“一种或多种可执行意图”的能力。The natural language processing module 332 ("natural language processor") of the digital assistant 326 takes the sequence of words or symbols ("symbol sequence") generated by the speech-to-text conversion processing module 330 and attempts to associate the symbol sequence with one or more "executable intents" recognized by the digital assistant. As used herein, an "executable intent" means a task that can be performed by the digital assistant 326 and/or the digital assistant system 300 and has an associated task flow implemented in the task flow model 354. The associated task flow is a series of programmed actions and steps taken by the digital assistant system 300 to perform the task. The scope of the capabilities of the digital assistant system depends on the number and variety of task flows that have been implemented and stored in the task flow model 354, or in other words, on the number and variety of "executable intents" recognized by the digital assistant system 300. However, the effectiveness of the digital assistant system 300 also depends on the ability of the digital assistant system to infer the correct "one or more executable intents" from user requests expressed in natural language.

在一些实施例中，除从语音文本转换处理模块330获得的词或符号的序列之外，自然语言处理器332还接收与用户请求相关联的上下文信息(例如，来自I/O处理模块328)。自然语言处理器332任选地使用上下文信息来明确、补充和/或进一步定义包含在从语音文本转换处理模块330接收的符号序列中的信息。上下文信息包括例如用户偏好、用户设备的硬件和/或软件状态，在用户请求之前、期间或之后不久收集的传感器信息，数字助理与用户之间的先前交互(例如，对话)，等等。In some embodiments, the natural language processor 332 receives contextual information associated with the user request (e.g., from the I/O processing module 328) in addition to the sequence of words or symbols obtained from the speech-to-text processing module 330. The natural language processor 332 optionally uses the contextual information to clarify, supplement, and/or further define the information contained in the sequence of symbols received from the speech-to-text processing module 330. The contextual information includes, for example, user preferences, the hardware and/or software state of the user's device, sensor information collected before, during, or shortly after the user's request, previous interactions (e.g., conversations) between the digital assistant and the user, and the like.

在一些实施例中，自然语言处理基于知识本体360。知识本体360是一种包含多个节点的层级结构，每个节点要么代表“可执行意图”、要么代表与“可执行意图”中的一者或多者有关的一种“属性”或其他“属性”。如上所指出的，“可执行意图”代表数字助理系统300能够执行的任务(例如，“可执行”或者可对其采取行动的任务)。“属性”代表与可执行意图或另一属性的子方面相关联的参数。知识本体360中可执行意图节点与属性节点之间的连接定义由属性节点所代表的参数如何从属于由可执行意图节点所代表的任务。In some embodiments, natural language processing is based on a knowledge ontology 360. The knowledge ontology 360 is a hierarchical structure comprising a plurality of nodes, each of which represents either an "executable intent" or an "attribute" relevant to one or more of the "executable intents" or to other "attributes." As noted above, an "executable intent" represents a task that the digital assistant system 300 can perform (e.g., a task that is "executable" or can be acted upon). An "attribute" represents a parameter associated with an executable intent or with a sub-aspect of another attribute. A connection between an executable intent node and an attribute node in the knowledge ontology 360 defines how the parameter represented by the attribute node pertains to the task represented by the executable intent node.

在一些实施例中，知识本体360由可执行意图节点和属性节点组成。在知识本体360内，每个可执行意图节点直接连接至或通过一个或多个中间属性节点连接至一个或多个属性节点。类似地，每个属性节点直接连接至或通过一个或多个中间属性节点连接至一个或多个可执行意图节点。In some embodiments, the knowledge ontology 360 is composed of executable intent nodes and attribute nodes. Within the knowledge ontology 360, each executable intent node is linked to one or more attribute nodes, either directly or through one or more intermediate attribute nodes. Similarly, each attribute node is linked to one or more executable intent nodes, either directly or through one or more intermediate attribute nodes.

可执行的意图节点连同其所链接的概念节点一起可被描述为“域”。在本讨论中，每个域与相应的可执行的意图相关联，并涉及与特定可执行的意图相关联的一组节点(以及其间的关系)。在一些实施例中，知识本体360由多个域组成。每个域可与一个或多个其他域共享一个或多个属性节点。Executable intent nodes, along with the concept nodes to which they are linked, can be described as "domains." In this discussion, each domain is associated with a corresponding executable intent and refers to a set of nodes (and the relationships between them) associated with a particular executable intent. In some embodiments, the knowledge ontology 360 is composed of multiple domains. Each domain can share one or more attribute nodes with one or more other domains.

在一些实施例中，知识本体360包括数字助理能够理解并对其起作用的所有域(因此可执行的意图)。在一些实施例中，知识本体360可诸如通过添加或移除域或节点，或者通过修改知识本体360内的节点之间的关系来进行修改。In some embodiments, ontology 360 includes all domains that the digital assistant can understand and act upon (thus, executable intents). In some embodiments, ontology 360 can be modified, such as by adding or removing domains or nodes, or by modifying the relationships between nodes within ontology 360.

在一些实施例中，可将与多个相关的可执行意图相关联的节点群集在知识本体360中的“超级域”下。例如，“旅行”超级域可包括与旅行相关的属性节点和可执行的意图节点的群集。与旅行相关的可执行的意图节点可包括“机票预订”、“酒店预订”、“汽车租赁”、“获取路线”、“寻找兴趣点”，等等。同一超级域(例如，“旅行”超级域)下的可执行的意图节点可具有多个共用的属性节点。例如，针对“机票预订”、“酒店预订”、“汽车租赁”、“获取路线”、“寻找兴趣点”的可执行的意图节点可共享属性节点“起始位置”、“目的地”、“出发日期/时间”、“到达日期/时间”及“同行人数”中的一个或多个。In some embodiments, nodes associated with multiple related executable intents may be clustered under a "superdomain" in the knowledge ontology 360. For example, the "travel" superdomain may include a cluster of attribute nodes and executable intent nodes related to travel. The executable intent nodes related to travel may include "ticket booking", "hotel booking", "car rental", "get directions", "find points of interest", and so on. The executable intent nodes under the same superdomain (e.g., the "travel" superdomain) may have multiple shared attribute nodes. For example, the executable intent nodes for "ticket booking", "hotel booking", "car rental", "get directions", "find points of interest" may share one or more of the attribute nodes "starting location", "destination", "departure date/time", "arrival date/time" and "number of people traveling together".
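The node-and-link structure described in the preceding paragraphs (executable intent nodes, attribute nodes, domains, and attribute nodes shared between domains) can be sketched as a small graph. This is only one possible representation under stated assumptions: the class names `Node` and `Ontology`, the traversal rule, and the attribute nodes "hotel", "room type", and "vehicle class" are hypothetical additions; the intents and the shared attributes come from the "travel" example above.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    kind: str                      # "intent" or "attribute"
    links: set = field(default_factory=set)


class Ontology:
    """Hypothetical graph of executable intent nodes and attribute nodes."""

    def __init__(self):
        self.nodes = {}

    def add(self, name, kind):
        return self.nodes.setdefault(name, Node(name, kind))

    def link(self, a, b):
        self.nodes[a].links.add(b)
        self.nodes[b].links.add(a)

    def domain(self, intent):
        """The intent node plus every attribute node reachable through attribute nodes."""
        seen, stack = {intent}, [intent]
        while stack:
            for neighbor in self.nodes[stack.pop()].links:
                if neighbor not in seen and self.nodes[neighbor].kind == "attribute":
                    seen.add(neighbor)
                    stack.append(neighbor)
        return seen


onto = Ontology()
for intent in ("hotel booking", "car rental"):
    onto.add(intent, "intent")
for attr in ("destination", "arrival date/time", "hotel", "room type", "vehicle class"):
    onto.add(attr, "attribute")

onto.link("hotel booking", "destination")
onto.link("hotel booking", "arrival date/time")
onto.link("hotel booking", "hotel")
onto.link("hotel", "room type")            # reached via an intermediate attribute node
onto.link("car rental", "destination")
onto.link("car rental", "arrival date/time")
onto.link("car rental", "vehicle class")

# Attribute nodes shared by the two domains of the "travel" super domain.
shared = onto.domain("hotel booking") & onto.domain("car rental")
```

Because the traversal never passes through another intent node, each domain stays scoped to its own intent while still exposing the attribute nodes it shares with neighboring domains.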

在一些实施例中，知识本体360中的每个节点和与由节点所代表的属性或可执行意图有关的一组词和/或短语相关联。与每个节点相关联的相应组的词和/或短语是与节点相关联的所谓的“词汇”。可将与每个节点相关联的相应组的词和/或短语存储在与由节点所代表的属性或可执行意图相关联的词汇索引344中。例如，与“餐厅”的属性的节点相关联的词汇可包括词诸如“食物”、“饮品”、“菜系”、“饥饿”、“吃”、“比萨”、“快餐”、“一餐”等。又如，与“发起电话呼叫”的可执行意图的节点相关联的词汇可包括词和短语诸如“呼叫”、“打电话”、“拨打”、“与......通电话”、“呼叫该号码”、“打电话给”等。词汇索引344任选地包括不同语言的词和短语。In some embodiments, each node in the knowledge ontology 360 is associated with a set of words and/or phrases related to the attribute or executable intent represented by the node. The corresponding set of words and/or phrases associated with each node is the so-called "vocabulary" associated with the node. The corresponding set of words and/or phrases associated with each node can be stored in a vocabulary index 344 associated with the attribute or executable intent represented by the node. For example, the vocabulary associated with the node of the attribute of "restaurant" may include words such as "food", "drinks", "cuisine", "hunger", "eat", "pizza", "fast food", "meal", etc. For another example, the vocabulary associated with the node of the executable intent of "initiate a phone call" may include words and phrases such as "call", "make a phone call", "dial", "talk to...", "call the number", "call", etc. The vocabulary index 344 optionally includes words and phrases in different languages.

在一些实施例中，自然语言处理器332从语音文本转换处理模块330接收符号序列(例如，文本串)，并确定符号序列中的词牵涉哪些节点。在一些实施例中，如果发现符号序列中的词或短语与知识本体360中的一个或多个节点相关联(经由词汇索引344)，则所述词或短语将“触发”或“激活”这些节点。当多个节点被“触发”时，基于已激活节点的数量和/或相对重要性，自然语言处理器332将选择可执行意图中的一个作为用户意图让数字助理执行的任务(或任务类型)。在一些实施例中，选择具有最多“已触发”节点的域。在一些实施例中，选择具有最高置信度(例如，基于其各个已触发节点的相对重要性)的域。在一些实施例中，基于已触发节点的数量和重要性的组合来选择域。在一些实施例中，在选择节点的过程中还考虑附加因素，诸如数字助理系统300先前是否已正确解释来自用户的类似请求。In some embodiments, the natural language processor 332 receives a symbol sequence (e.g., a text string) from the speech-to-text conversion processing module 330 and determines which nodes the words in the symbol sequence involve. In some embodiments, if a word or phrase in the symbol sequence is found to be associated with one or more nodes in the knowledge ontology 360 (via the vocabulary index 344), the word or phrase will "trigger" or "activate" these nodes. When multiple nodes are "triggered," based on the number and/or relative importance of the activated nodes, the natural language processor 332 will select one of the executable intentions as the task (or task type) that the user intends the digital assistant to perform. In some embodiments, the domain with the most "triggered" nodes is selected. In some embodiments, the domain with the highest confidence (e.g., based on the relative importance of its various triggered nodes) is selected. In some embodiments, the domain is selected based on a combination of the number and importance of the triggered nodes. In some embodiments, additional factors are also considered in the process of selecting the node, such as whether the digital assistant system 300 has previously correctly interpreted a similar request from the user.
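The triggering and selection step described above can be sketched as follows. The index contents, the numeric weights, and the scoring rule (sum of the weights of distinct triggered nodes, with the node count as a tie-breaker) are assumptions made for illustration; the source only says that the number and/or relative importance of triggered nodes is used.

```python
from collections import defaultdict

# Hypothetical vocabulary index: word -> list of (node, domain, importance weight).
VOCAB_INDEX = {
    "call": [("initiate phone call", "phone call", 2.0)],
    "dial": [("initiate phone call", "phone call", 2.0)],
    "pizza": [("restaurant", "restaurant reservation", 1.0)],
    "eat": [("restaurant", "restaurant reservation", 1.0)],
    "table": [("party size", "restaurant reservation", 1.5)],
}


def select_domain(tokens):
    """Pick the domain whose triggered nodes carry the most combined weight."""
    triggered = defaultdict(dict)  # domain -> {node: weight}; each node counts once
    for token in tokens:
        for node, domain, weight in VOCAB_INDEX.get(token.lower(), []):
            triggered[domain][node] = weight
    if not triggered:
        return None
    # Rank by total importance first, then by the number of triggered nodes.
    return max(
        triggered,
        key=lambda d: (sum(triggered[d].values()), len(triggered[d])),
    )


first = select_domain("book a table to eat pizza".split())
second = select_domain("call the pizza place".split())
```

In the second request the word "pizza" still triggers the restaurant node, but the more heavily weighted "initiate phone call" node wins, which mirrors the combined count-and-importance selection mentioned above.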

在一些实施例中，数字助理系统300还将特定实体的名称存储在词汇索引344中，使得当在用户请求中检测到这些名称中的一个名称时，自然语言处理器332将能够识别该名称涉及知识本体中的属性或子属性的特定实例。在一些实施例中，特定实体的名称是企业、餐厅、人、电影等的名称。在一些实施例中，数字助理系统300可从其他数据源中搜索并识别特定实体名称，所述其他数据源诸如用户的通讯录、电影数据库、音乐家数据库和/或餐厅数据库。在一些实施例中，当自然语言处理器332识别出符号序列中的词是特定实体的名称(诸如用户通讯录中的名称)时，在于用户请求的知识本体内选择可执行意图的过程中，为该词赋予附加的重要性。In some embodiments, the digital assistant system 300 also stores the names of specific entities in the vocabulary index 344 so that when one of these names is detected in a user request, the natural language processor 332 will be able to recognize that the name refers to a specific instance of an attribute or sub-attribute in the knowledge ontology. In some embodiments, the names of specific entities are names of businesses, restaurants, people, movies, and the like. In some embodiments, the digital assistant system 300 can search for and identify specific entity names from other data sources, such as the user's address book, a movie database, a musician database, and/or a restaurant database. In some embodiments, when the natural language processor 332 recognizes that a word in the symbol sequence is the name of a specific entity (such as a name in the user's address book), that word is given additional importance when selecting an executable intent within the knowledge ontology for the user request.

例如，当从用户请求中识别出词“Santo先生”且当词汇索引344中发现姓“Santo”是用户联系人列表中的联系人之一时，则用户请求可能对应于“发送消息”或“发起电话呼叫”域。又如，当在用户请求中发现词“ABC咖啡馆”且在词汇索引344中发现词语“ABC咖啡馆”是用户所在城市中的特定餐厅的名称时，则用户请求可能对应于“餐厅预订”域。For example, when the words "Mr. Santo" are recognized in the user request and the last name "Santo" is found in the vocabulary index 344 as one of the contacts in the user's contact list, the user request likely corresponds to the "send message" or "initiate phone call" domain. As another example, when the words "ABC Cafe" are found in the user request and the term "ABC Cafe" is found in the vocabulary index 344 to be the name of a specific restaurant in the user's city, the user request likely corresponds to the "restaurant reservation" domain.
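The two examples above amount to a lookup from recognized entity names to candidate domains. A minimal sketch, assuming a hypothetical `ENTITY_INDEX` mapping and naive case-insensitive substring matching (a real system would match against tokenized input and weigh the result together with other triggered nodes):

```python
# Hypothetical entity entries: name -> (entity type, domains the name suggests).
ENTITY_INDEX = {
    "santo": ("contact", ["send message", "initiate phone call"]),
    "abc cafe": ("restaurant", ["restaurant reservation"]),
}


def candidate_domains(request):
    """Return the domains suggested by entity names found in the request text."""
    text = request.lower()
    found = []
    for name, (_entity_type, domains) in ENTITY_INDEX.items():
        if name in text:
            for domain in domains:
                if domain not in found:  # keep order, avoid duplicates
                    found.append(domain)
    return found


contact_hits = candidate_domains("Tell Mr. Santo I am running late")
restaurant_hits = candidate_domains("Get me a table at ABC Cafe")
```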

用户数据348包括用户专用的信息，诸如用户专用的词汇、用户偏好、用户地址、用户的默认语言和第二语言、用户的联系人列表，以及每位用户的其他短期或长期信息。自然语言处理器332可使用用户专用的信息来补充包含在用户输入中的信息以进一步限定用户意图。例如，针对用户请求“邀请我的朋友参加我的生日派对”，自然语言处理器332能够访问用户数据348以确定“朋友”是哪些人以及“生日派对”将于何时何地举行，而不需要用户在其请求中明确地提供此类信息。User data 348 includes user-specific information such as a user-specific vocabulary, user preferences, user address, the user's default and second languages, the user's contact list, and other short-term or long-term information about each user. Natural language processor 332 can use this user-specific information to supplement the information included in the user input to further define the user's intent. For example, in response to a user request to "invite my friends to my birthday party," natural language processor 332 can access user data 348 to determine who "friends" are and when and where the "birthday party" will be held, without requiring the user to explicitly provide such information in their request.

一旦自然语言处理器332基于用户请求识别出可执行意图(或域)，自然语言处理器332便生成结构化查询以表示所识别的可执行意图。在一些实施例中，结构化查询包括针对可执行意图的域内的一个或多个节点的参数，并且所述参数中的至少一些参数填充有在用户请求中指定的特定信息和要求。例如，用户可以说：“通知寿司店预定晚上7点的座位。”在该情况下，自然语言处理器332能够基于用户输入将可执行意图正确地识别为“餐厅预订”。根据知识本体，“餐厅预订”域的结构化查询可包括参数诸如{菜系}、{时间}、{日期}、{同行人数}等。基于包含在用户话语中的信息，自然语言处理器332可针对餐厅预订域生成部分结构化的查询，其中部分结构化的查询包括参数{菜系＝“寿司类”}和{时间＝“下午7点”}。然而，在该实例中，用户话语包含不足以完成与域相关联的结构化查询的信息。因此，其他必要参数诸如{同行人数}和{日期}未基于当前可用的信息在结构化查询中指定。在一些实施例中，自然语言处理器332利用所接收的上下文信息来填充结构化查询中的一些参数。例如，如果用户请求“我附近”的寿司餐厅，则自然语言处理器332可利用来自用户设备104的GPS坐标来填充结构化查询中的{位置}参数。Once the natural language processor 332 identifies an executable intent (or domain) based on the user request, it generates a structured query to represent the identified executable intent. In some embodiments, the structured query includes parameters for one or more nodes within the domain of the executable intent, and at least some of the parameters are populated with the specific information and requirements specified in the user request. For example, the user may say, "Please tell the sushi restaurant to make a reservation for 7 PM." In this case, the natural language processor 332 is able to correctly identify the executable intent as "restaurant reservation" based on the user input. According to the knowledge ontology, a structured query for the "restaurant reservation" domain may include parameters such as {cuisine}, {time}, {date}, {number of people in the group}, etc. Based on the information contained in the user utterance, the natural language processor 332 may generate a partial structured query for the restaurant reservation domain, where the partial structured query includes the parameters {cuisine = "sushi"} and {time = "7 PM"}. In this example, however, the user utterance does not contain enough information to complete the structured query associated with the domain. Therefore, other necessary parameters such as {number of people in the group} and {date} are not specified in the structured query based on the currently available information. In some embodiments, the natural language processor 332 uses the received context information to populate some parameters of the structured query. For example, if the user requests a sushi restaurant "near me," the natural language processor 332 may populate the {location} parameter in the structured query with GPS coordinates from the user device 104.
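The partial structured query from the sushi example can be sketched as a dictionary whose unfilled parameters are left as `None`. The function names, the `REQUIRED` table, the `near_me` flag, and the coordinate values are hypothetical; the parameter names follow the example in the text.

```python
# Hypothetical list of parameters each domain needs before its task can run.
REQUIRED = {
    "restaurant reservation": ["cuisine", "time", "date", "party_size"],
}


def build_query(intent, extracted, context=None):
    """Create a structured query, leaving unknown required parameters as None."""
    params = {name: extracted.get(name) for name in REQUIRED[intent]}
    # Context can fill parameters the utterance only implies, e.g. "near me".
    if extracted.get("near_me") and context and "gps" in context:
        params["location"] = context["gps"]
    return {"intent": intent, "params": params}


def missing_params(query):
    """Names of parameters that still need a value."""
    return [name for name, value in query["params"].items() if value is None]


# "Make a reservation at a sushi place for 7 PM" yields only cuisine and time.
query = build_query("restaurant reservation", {"cuisine": "sushi", "time": "7 PM"})
still_needed = missing_params(query)

# A "near me" request picks up the location from device context instead.
nearby = build_query(
    "restaurant reservation",
    {"cuisine": "sushi", "near_me": True},
    context={"gps": (37.33, -122.03)},
)
```

Representing missing values explicitly gives the downstream task flow processor a simple way to tell which parameters it still has to obtain.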

在一些实施例中，自然语言处理器332将结构化查询(包括任何已完成的参数)传送至任务流处理模块336(“任务流处理器”)。任务流处理器336被配置为执行以下中的一者或多者：从自然语言处理器332接收结构化查询，完成结构化查询，以及执行“完成”用户的最终请求所需的动作。在一些实施例中，完成这些任务所必需的各种过程在任务流模型354中提供。在一些实施例中，任务流模型354包括用于获取来自用户的附加信息的过程，以及用于执行与可执行意图相关联的动作的任务流。In some embodiments, the natural language processor 332 passes the structured query (including any completed parameters) to the task flow processing module 336 ("task flow processor"). The task flow processor 336 is configured to perform one or more of the following: receive the structured query from the natural language processor 332, complete the structured query, and perform the actions required to "complete" the user's final request. In some embodiments, the various processes necessary to complete these tasks are provided in the task flow model 354. In some embodiments, the task flow model 354 includes processes for obtaining additional information from the user, as well as a task flow for performing actions associated with an executable intent.

As described above, in order to complete a structured query, the task flow processor 336 may need to initiate additional dialog with the user to obtain additional information and/or to disambiguate potentially ambiguous utterances. When such interaction is necessary, the task flow processor 336 invokes the dialog processing module 334 ("dialog processor") to engage in a dialog with the user. In some embodiments, the dialog processing module 334 determines how (and/or when) to ask the user for the additional information, and receives and processes the user's responses. In some embodiments, the questions are provided to the user, and the answers are received from the user, through the I/O processing module 328. For example, the dialog processing module 334 presents dialog output to the user via audio and/or visual output, and receives input from the user via spoken or physical (e.g., touch gesture) responses. Continuing with the example above, when the task flow processor 336 invokes the dialog processing module 334 to determine the "party size" and "date" information for the structured query associated with the domain "restaurant reservation," the dialog processor 334 generates questions such as "How many people are dining?" and "On which date?" to pass to the user. Once answers are received from the user, the dialog processing module 334 populates the structured query with the missing information, or passes the information to the task flow processor 336 to complete the missing information in the structured query.
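The exchange between the task flow processor and the dialog processor can be illustrated with a minimal sketch. It assumes a structured query is a dictionary and that the dialog processor is represented by a callable `ask`; the names `REQUIRED_SLOTS`, `QUESTIONS`, `missing_slots`, and `complete_query` are hypothetical and do not come from the described embodiments.

```python
# Hypothetical required parameters per domain (illustrative assumption).
REQUIRED_SLOTS = {
    "restaurant reservation": ["restaurant", "date", "time", "party_size"],
}

# Hypothetical question text for each parameter.
QUESTIONS = {
    "date": "On which date?",
    "party_size": "How many people are dining?",
}


def missing_slots(query):
    """Return the required parameters that the structured query still lacks."""
    required = REQUIRED_SLOTS[query["domain"]]
    return [slot for slot in required if slot not in query["params"]]


def complete_query(query, ask):
    """Ask one question per missing parameter and store each answer.

    `ask` stands in for the dialog processor: it takes a question string
    and returns the user's answer.
    """
    for slot in missing_slots(query):
        question = QUESTIONS.get(slot, f"What is the {slot}?")
        query["params"][slot] = ask(question)
    return query
```

In a real system, `ask` would route the question through speech output and speech recognition; here any function that returns a string is sufficient to exercise the flow.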

In some cases, the task flow processor 336 may receive a structured query that has one or more ambiguous attributes. For example, a structured query for the "send message" domain may indicate that the intended recipient is "Bob," and the user may have multiple contacts named "Bob." The task flow processor 336 then requests that the dialog processor 334 disambiguate this attribute of the structured query. In turn, the dialog processor 334 may ask the user "Which Bob?" and display (or read out) a list of the contacts named "Bob," from which the user can choose.

Once the task flow processor 336 has completed the structured query for an executable intent, the task flow processor 336 proceeds to perform the ultimate task associated with the executable intent. Accordingly, the task flow processor 336 executes the steps and instructions in the task flow model according to the specific parameters contained in the structured query. For example, the task flow model for the executable intent "restaurant reservation" may include steps and instructions for contacting a restaurant and actually requesting a reservation for a particular party size at a particular time. Using a structured query such as {restaurant reservation, restaurant = ABC Cafe, date = 2012/3/12, time = 7 pm, party size = 5}, the task flow processor 336 may perform the following steps: (1) logging on to a server of ABC Cafe, or to a restaurant reservation system configured to accept reservations for multiple restaurants such as ABC Cafe, (2) entering the date, time, and party size information in a form on the website, (3) submitting the form, and (4) making a calendar entry for the reservation in the user's calendar.
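The four steps above can be sketched as a small task flow. This is an illustration only: `StubReservationSystem`, `run_reservation_flow`, and the dictionary layout of the structured query are assumptions made for the example, and the stub merely records calls instead of contacting a real server.

```python
from dataclasses import dataclass, field


@dataclass
class StubReservationSystem:
    """Hypothetical stand-in for a restaurant server or reservation system."""
    log: list = field(default_factory=list)

    def login(self, restaurant):
        self.log.append(("login", restaurant))

    def fill_form(self, date, time, party_size):
        self.log.append(("fill_form", date, time, party_size))

    def submit(self):
        self.log.append(("submit",))
        return "confirmed"


def run_reservation_flow(query, system, calendar):
    """Execute steps (1) to (4) using the parameters of a completed query."""
    system.login(query["restaurant"])                                    # step (1)
    system.fill_form(query["date"], query["time"], query["party_size"])  # step (2)
    status = system.submit()                                             # step (3)
    calendar.append({                                                    # step (4)
        "title": f"Dinner at {query['restaurant']}",
        "date": query["date"],
        "time": query["time"],
    })
    return status


query = {
    "intent": "restaurant reservation",
    "restaurant": "ABC Cafe",
    "date": "2012/3/12",
    "time": "7 pm",
    "party_size": 5,
}
```

Keeping the steps in one function that reads every value from the query reflects the idea that the task flow model supplies the procedure while the structured query supplies the parameters.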

In some embodiments, the task flow processor 336 completes the task requested in the user input, or provides the informational answer requested in the user input, with the assistance of a service processing module 338 ("service processor"). For example, the service processor 338 can act on behalf of the task flow processor 336 to make a phone call, set a calendar entry, invoke a map search, invoke or interact with other applications installed on the user device, and invoke or interact with third-party services (e.g., a restaurant reservation portal, a social networking website, a banking portal, etc.). In some embodiments, the protocols and application programming interfaces (APIs) required by each service can be specified by a respective service model among the service models 356. The service processor 338 accesses the appropriate service model for a service and generates requests for the service in accordance with the protocols and APIs that the service model specifies.

For example, if a restaurant has enabled an online reservation service, the restaurant can submit a service model that specifies the parameters necessary for making a reservation and the APIs for communicating the values of those parameters to the online reservation service. When requested by the task flow processor 336, the service processor 338 can establish a network connection with the online reservation service using the web address stored in the service model 356, and send the necessary parameters of the reservation (e.g., time, date, party size) to the online reservation interface in a format that conforms to the API of the online reservation service.
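One way to picture a service model is as a small record holding the web address, the required parameters, and the parameter names the service expects. The sketch below is a hedged illustration: the layout of `SERVICE_MODEL`, the example URL, and the function `build_request_url` are all hypothetical, and no network connection is made.

```python
from urllib.parse import urlencode

# Hypothetical service model submitted by a restaurant (illustrative only).
SERVICE_MODEL = {
    "url": "https://reservations.example.com/book",
    "required": ["date", "time", "party_size"],
    # Names the service's API expects, where they differ from ours.
    "param_names": {"party_size": "guests"},
}


def build_request_url(model, params):
    """Format the reservation parameters according to the service model."""
    missing = [name for name in model["required"] if name not in params]
    if missing:
        raise ValueError(f"missing required parameters: {missing}")
    wire_params = {
        model["param_names"].get(name, name): params[name]
        for name in model["required"]
    }
    return model["url"] + "?" + urlencode(wire_params)
```

Because the mapping lives in the model rather than in the code, a different service could be supported by supplying a different record, which is the role the description assigns to the service models 356.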

In some embodiments, the natural language processor 332, the dialog processor 334, and the task flow processor 336 are used collectively and iteratively to infer and define the user's intent, to obtain information that further clarifies and refines that intent, and finally to generate a response (e.g., providing an output to the user, or completing a task) that fulfills the user's intent.

In some embodiments, after all of the tasks needed to fulfill the user's request have been performed, the digital assistant 326 formulates a confirmation response and sends the response back to the user through the I/O processing module 328. If the user request seeks an informational answer, the confirmation response presents the requested information to the user. In some embodiments, the digital assistant also asks the user to indicate whether the user is satisfied with the response produced by the digital assistant 326.

Figures 4A-4N illustrate exemplary user interfaces displayed on an electronic device (e.g., the user device 104 described with respect to Figure 2, or the standalone device 300 described with respect to Figure 3) in accordance with some embodiments.

Figures 4A-4J illustrate exemplary user interfaces associated with navigating through and browsing one or more documents in accordance with some embodiments.

Figure 4A illustrates an exemplary user interface that includes a portion of a first document. In some embodiments, the first document includes a plurality of links, indicated for example by underlining (e.g., "New Country," "Civil War," "A Large Battlefield," and "A Corner of the Battlefield").

Figure 4A also illustrates that, in some embodiments, before outputting a voice reading of the first document, the electronic device outputs audible information indicating the current position in the first document and/or audible information that includes information about the first document. For example, the electronic device may output an audio signal (e.g., a "beep") or a spoken indication that the portion about to be read is the beginning of the first document, or that the electronic device is about to output the voice reading. In addition, or instead, the electronic device may output information about the first document, such as its title, author, date, and source.

Figure 4B illustrates the electronic device outputting a voice reading of a portion of the first document, the voice reading corresponding to the text "Eighty-seven years ago, our ancestors established on this continent."

Figure 4C illustrates the electronic device outputting audible information that identifies the link associated with the text "New Country." For example, the electronic device outputs a voice signal of the word "link" and/or an audio signal such as a "beep." In some embodiments, the term "link" refers to metadata that contains the location of the linked document. In some embodiments, the term "link" includes a hypertext anchor. The voice signal and the audio signal may be output before or after the voice reading of the text associated with the link. In some embodiments, the electronic device also outputs information about the link, such as the title, author, date, and source of the linked document.

In some embodiments, the electronic device pauses for a predefined period (e.g., one, two, three, four, or five seconds) after outputting the voice signal, the audio signal, the information about the link, and/or the voice reading of the text associated with the link.

Figure 4D illustrates the electronic device outputting a voice reading of the portion of the first document that follows the text associated with the link.

In some embodiments, the electronic device receives a voice command from the user. In some embodiments, the electronic device receives the voice command while it is outputting the voice signal, the audio signal, the information about the link, and/or the voice reading of the text associated with the link; during the pause; or while it is outputting a voice reading of the portion of the document that follows the text associated with the link. In some embodiments, the electronic device receives the voice command before it outputs a voice signal and/or audio signal for a second link that is distinct from the link.

In some embodiments, the voice command is a request to navigate to the linked document (e.g., "follow the link"). In some embodiments, the electronic device stores the last portion of the document that was output as speech. In some embodiments, instead of storing that last portion, the electronic device stores information about the link (e.g., the location and/or identity of the link).

Figure 4E illustrates the electronic device displaying a portion of the linked document and outputting a voice reading of a portion of the linked document. In some embodiments, before outputting the voice reading of the linked document, the electronic device outputs audible information indicating the current position in the document and/or audible information that includes information about the document. For example, the electronic device may output an audio signal (e.g., a "beep") or a spoken indication that the portion about to be read is the beginning of the document, or that the electronic device is about to output the voice reading. In addition, or instead, the electronic device may output information about the document, such as its title, author, date, and source. The electronic device may also output an audio signal or spoken indication that the portion about to be read is a title, a new line, a new sentence, and/or a new paragraph.

Figure 4F illustrates the electronic device further outputting a voice reading of a subsequent portion of the linked document. Similarly, the electronic device may continue to output a voice reading of the remainder of the linked document.

Figure 4G illustrates the electronic device outputting a voice reading of the last sentence of the linked document.

In some embodiments, while the electronic device is outputting a voice reading of one or more portions of the linked document, the electronic device receives from the user a voice command requesting navigation back to the first document. Figure 4H illustrates that, in response, the electronic device displays one or more portions of the first document and outputs a voice reading of a portion of the first document.

In some embodiments, the electronic device outputs a spoken inquiry about navigating back to the first document. In some embodiments, the electronic device outputs this inquiry after outputting the voice reading of the last sentence of the linked document. When the user provides a voice command requesting navigation back to the first document, the electronic device navigates back to the first document and outputs a voice reading of a portion of it, as described above. Alternatively, in some embodiments, the electronic device navigates back to the first document automatically, without receiving a voice command from the user, for example when it has finished outputting the voice reading of the linked document.

In some embodiments, when the electronic device resumes the voice reading of the first document, it begins with the portion of the first document that corresponds to the last portion output as speech. In some embodiments, the electronic device resumes by outputting a voice reading of the text associated with the link. In some embodiments, the electronic device resumes by outputting a voice reading of the portion of the first document that follows the text associated with the link. In some embodiments, the electronic device resumes by outputting a voice reading of the sentence that includes the text associated with the link. In some embodiments, the electronic device resumes by outputting a voice reading of the paragraph that includes the text associated with the link.
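The alternative resume points can be compared with a short sketch. It assumes the first document is a plain string, that the link is stored as a pair of character offsets, that sentences end with a period followed by a space, and that paragraphs are separated by a blank line; the function `resume_offset` and the policy names are hypothetical.

```python
def resume_offset(text, link_start, link_end, policy):
    """Return the character offset at which reading resumes.

    policy is one of:
      "link"       - resume at the text associated with the link
      "after_link" - resume just after that text
      "sentence"   - resume at the start of the sentence containing the link
      "paragraph"  - resume at the start of the paragraph containing the link
    """
    # Start of the enclosing paragraph (paragraphs separated by a blank line).
    para_break = text.rfind("\n\n", 0, link_start)
    para_start = para_break + 2 if para_break >= 0 else 0

    if policy == "link":
        return link_start
    if policy == "after_link":
        return link_end
    if policy == "paragraph":
        return para_start
    if policy == "sentence":
        # Simplifying assumption: a sentence ends with ". ".
        sent_break = text.rfind(". ", 0, link_start)
        sent_start = sent_break + 2 if sent_break >= 0 else 0
        return max(sent_start, para_start)
    raise ValueError(f"unknown policy: {policy}")
```

Storing only the link offsets is enough to support all four behaviours, which is consistent with the embodiment in which information about the link is stored instead of the last spoken portion.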

Figure 4I illustrates the electronic device outputting a voice reading of a portion of the next paragraph in the first document. In some embodiments, before outputting the voice reading of that portion, the electronic device outputs audible information indicating the current position in the document. For example, the electronic device may output an audio signal (e.g., a "beep") or a spoken indication that the portion about to be read is a new line, a new sentence, and/or a new paragraph.

Figure 4J illustrates that, in some embodiments, the document includes a plurality of portions having respective styles. In other words, a first portion of the document may have a first style and a second portion of the document a second style. For example, the first paragraph of the document shown in Figure 4J has a first style (e.g., the paragraph is in a non-italic font) and the subsequent paragraph has a second style that is distinct from the first style (e.g., the paragraph is in an italic font). The electronic device outputs a voice reading of the first portion of the document using a first set of speech characteristics (e.g., an adult female voice) and a voice reading of the second portion using a second set of speech characteristics (e.g., an adult male voice). The speech characteristics may include one or more of pitch, speed, and volume, and/or characteristics typical of a particular group of speakers classified by, for example, gender (e.g., male or female) and age (e.g., adult or child).
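A mapping from text style to speech characteristics can be sketched as a lookup table. The sketch is illustrative only: the class `Voice`, the table `VOICES`, the numeric values, and the function `plan_speech` are assumptions, and no actual speech synthesis is performed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Voice:
    """A hypothetical set of speech characteristics."""
    speaker: str
    pitch: float   # relative to a neutral baseline of 1.0
    speed: float
    volume: float


# Hypothetical style-to-voice table following the Figure 4J example.
VOICES = {
    "normal": Voice("adult female", pitch=1.0, speed=1.0, volume=1.0),
    "italic": Voice("adult male", pitch=0.8, speed=1.0, volume=1.0),
}


def plan_speech(portions):
    """Pair each (style, text) portion with the voice used to read it.

    Portions with an unknown style fall back to the "normal" voice.
    """
    return [(VOICES.get(style, VOICES["normal"]), text) for style, text in portions]
```

A synthesizer back end would then consume each `(Voice, text)` pair in order, so that a change of style is heard as a change of voice.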

Figures 4K-4N illustrate exemplary user interfaces associated with identifying one or more documents in accordance with some embodiments. Although the exemplary user interfaces shown in Figures 4K-4N include email messages, analogous methods can be used with other types of documents, such as book chapters and encyclopedia entries.

Figure 4K illustrates an exemplary user interface for an electronic message. The electronic device outputs a voice reading of at least a portion of the message (e.g., "There is an email from David to you, John, Karen, and Paul. The subject is New Features. The message says, 'This looks great. What do you all think?'").

In some embodiments, the electronic device receives from the user a voice command requesting one or more documents that correspond to a particular criterion. For example, in some embodiments, the particular criterion is that the one or more documents are authored by one or more authors (e.g., "find email messages from David," "find email messages from David, John, Karen, and Paul," "find email messages from the address of this email," etc.). In other embodiments, the particular criterion is that the one or more documents are associated with a particular document (e.g., "find email messages in this thread," "find replies to this email," etc.). In further embodiments, the particular criterion is that the one or more documents are replies to a particular document. In further embodiments, the particular criterion is that the one or more documents include the latest message from a respective author (e.g., "find the latest email from David"). In some embodiments, the particular criterion is that the one or more documents include the first message from a respective author.

In some embodiments, the particular criterion is that the one or more documents correspond to a particular date range (e.g., "find emails received last week," "find email messages received between January 1 and March 31," etc.). In some embodiments, the particular date range corresponds to a single date.

In some embodiments, the particular criterion combines one or more of the criteria mentioned above (e.g., "find emails received from David and Karen last week").
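How several criteria might be combined in one search is shown in the following minimal sketch. The `Email` record, its fields, and the function `find_emails` are hypothetical; the sketch assumes that the voice command has already been interpreted into keyword arguments.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Email:
    id: str
    sender: str
    received: date
    reply_to: Optional[str] = None  # id of the message this one replies to


def find_emails(emails, senders=None, start=None, end=None,
                reply_to=None, latest_only=False):
    """Return the messages that satisfy every criterion that was supplied."""
    hits = []
    for message in emails:
        if senders is not None and message.sender not in senders:
            continue
        if start is not None and message.received < start:
            continue
        if end is not None and message.received > end:
            continue
        if reply_to is not None and message.reply_to != reply_to:
            continue
        hits.append(message)
    hits.sort(key=lambda m: m.received)
    if latest_only:
        # Keep only the most recent message from each sender.
        latest = {}
        for message in hits:
            latest[message.sender] = message
        hits = sorted(latest.values(), key=lambda m: m.received)
    return hits
```

A single date is handled by passing the same value as `start` and `end`, matching the embodiment in which the date range corresponds to a single date.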

Figure 4L illustrates an exemplary user interface displayed in response to a user request for email messages authored by a particular author, for example David. The electronic device outputs a voice reading of the list of email messages. In some embodiments, the electronic device outputs audible information indicating the number of email messages in the list.

Figure 4M illustrates an alternative user interface displayed in response to a user request for replies to the previously displayed email message. The electronic device outputs a voice reading of the list of replies. In some embodiments, the electronic device outputs audible information indicating the number of email messages in the list.

In some embodiments, the electronic device outputs a voice reading of the email messages in a list (e.g., the list shown in Figure 4L or Figure 4M). In some embodiments, when the electronic device outputs a voice reading of the last email in the list, it also outputs audible information indicating that the email being read is the last email in the list.

Although Figures 4A-4N illustrate exemplary user interfaces displayed on an electronic device, in some embodiments the electronic device may output the audible information (e.g., the voice readings and/or audible signals) without displaying the exemplary user interfaces. In some embodiments, the electronic device does not include a display at all.

Figure 5 is a flow chart illustrating operations, performed by an electronic device (e.g., the user device 104 described with respect to Figure 2, or the standalone device 300 described with respect to Figure 3), for navigating through a document in accordance with some embodiments. In some embodiments, the electronic device includes a portable electronic device. In some embodiments, the electronic device includes a computer system.

These operations are merely exemplary, and in various embodiments the electronic device may perform more or fewer operations.

In some embodiments, the electronic device receives (502) a first document.

The electronic device outputs (504) a voice reading of a portion of the document (e.g., Figure 4B).

In some embodiments, the electronic device determines (506) whether the portion of the document includes a link. If the portion includes a link, the electronic device outputs (508) audible information identifying one of the links in the document (e.g., a spoken output of the word "link" or an audio "beep," and/or information about the link such as the title, author, date, and source of the linked document, as described with respect to Figure 4C).

In some embodiments, the electronic device receives a voice command from the user. In some embodiments, the electronic device determines (510) whether it has received a voice command.

In some embodiments, when the electronic device receives a voice command, it determines (512) whether the received voice command is a navigation command (e.g., whether the received voice command includes a request to navigate to a second document associated with the link, that is, the linked document). If the received voice command includes a request to navigate to the linked document, the electronic device retrieves (514) the linked document. The electronic device then outputs (504) a voice reading of one or more portions of the linked document (e.g., Figure 4E).

In some embodiments, when the electronic device receives a voice command, it determines (516) whether the received voice command is an information command (e.g., whether the received voice command includes a request for information about the linked document). If the received voice command includes a request for information about the linked document (e.g., title, author, date, source, abstract, summary, first sentence, first paragraph, etc.), such as "who is the author," the electronic device retrieves (518) the requested information. In some embodiments, at least a portion of the requested information (e.g., title, author, date, and abstract) is obtained from the linked document. In some embodiments, at least a portion of the requested information is obtained by processing the linked document (e.g., summarizing the linked document). In some embodiments, at least a portion of the requested information is provided by a third-party server (e.g., a review of the linked document). In other embodiments, the electronic device obtains at least a portion of the requested information from metadata of the linked document, or retrieves details from other sources associated with the linked document. The electronic device outputs (520) a voice reading of the requested information.

In some embodiments, when the user provides a voice command between two adjacent links, or immediately after two adjacent links, the electronic device identifies both links as candidate links for the voice command. Because the voice command may be directed to either the first or the second of the two adjacent links, it may be necessary to clarify which link the voice command should be performed on. In some embodiments, the electronic device outputs audible information about the candidate links and/or an audible inquiry about which link the user wants the voice command performed on. For example, while the electronic device is outputting a voice reading of the sentence "The next game between School A and School B will be held on March 3," in which "School A" and "School B" are links, the user provides the voice command "follow the link." In some embodiments, the electronic device outputs the audible inquiry "Which link?" and receives a subsequent voice command from the user. When the subsequent voice command is "School B," the electronic device retrieves the document linked to the text "School B" and outputs a voice reading of a portion of the linked document.
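The candidate-link logic can be sketched as follows. This is an illustration under stated assumptions: links are stored as character offsets into the text being read, the reading position at the moment of the command is known, and "adjacent" is approximated by a fixed character window. The names `Link`, `candidate_links`, `resolve_link`, and the window size are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Link:
    text: str
    start: int  # offset of the first character of the link text
    end: int    # offset just past the last character


def candidate_links(links, position, window=20):
    """Links read shortly before, or starting shortly after, the position."""
    return [
        link for link in links
        if 0 <= position - link.end <= window
        or 0 <= link.start - position <= window
    ]


def resolve_link(links, position, ask, window=20):
    """Pick the link a 'follow the link' command refers to.

    `ask` stands in for the audible inquiry: it takes a question string
    and returns the user's spoken reply as text.
    """
    candidates = candidate_links(links, position, window)
    if len(candidates) <= 1:
        return candidates[0] if candidates else None
    reply = ask("Which link?").strip().lower()
    for link in candidates:
        if link.text.lower() == reply:
            return link
    return None
```

When only one link is near the reading position no question is asked, so the inquiry is reserved for the ambiguous case the paragraph describes.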

In some embodiments, after outputting the voice reading of a respective portion of the document, the electronic device determines (522) whether the end of the document has been reached (e.g., whether the electronic device has output a voice reading of the entire document). If the end of the document has not been reached, the electronic device outputs (504) a voice reading of a subsequent portion of the document. In some embodiments, if the end of the document has been reached, the electronic device waits for a navigation command from the user. When the electronic device receives (524) from the user a navigation command requesting navigation to a previous document or to a new document, the electronic device retrieves (526) the previous document or the new document. As previously described, these operations (e.g., operations 522, 524, and 526) may be performed by a server or by a mobile device. In some embodiments, before receiving a navigation command from the user, the electronic device outputs a spoken inquiry about whether the user wants to navigate to another document (e.g., as described with respect to Figure 4G). In some embodiments, when the user does not want to navigate to another document, the electronic device stops outputting the voice reading.
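The overall loop of Figure 5 can be pictured with a simplified simulation. The sketch makes several assumptions that are not part of the described embodiments: documents are lists of `(text, link_target)` portions, voice commands are simulated by a dictionary keyed by document name and portion index, only the "follow the link" command is modeled, and the device navigates back automatically at the end of a linked document. The function `read_documents` is hypothetical.

```python
def read_documents(docs, first, commands):
    """Simulate reading with link announcements and link following.

    docs:     {name: [(text, link_target_or_None), ...]}
    commands: {(name, portion_index): "follow the link"}
    Returns the list of items that would be spoken, in order.
    """
    spoken = []
    history = []          # where to resume in earlier documents
    name, index = first, 0
    while True:
        if index >= len(docs[name]):          # 522: end of document reached
            if not history:
                break
            name, index = history.pop()       # automatic navigation back
            continue
        text, target = docs[name][index]
        spoken.append(text)                   # 504: read the portion
        if target is not None:                # 506: portion includes a link
            spoken.append("link")             # 508: identify the link
        command = commands.get((name, index))  # 510: command received?
        if command == "follow the link" and target is not None:  # 512
            history.append((name, index + 1))  # remember the resume point
            name, index = target, 0            # 514: retrieve linked document
            continue
        index += 1
    return spoken
```

The resume point pushed onto `history` corresponds to resuming after the text associated with the link; the other resume behaviours described earlier would push a different index.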

Figure 6 is a flow chart illustrating operations, performed by an electronic device (e.g., the user device 104 described with respect to Figure 2, or the standalone device 300 described with respect to Figure 3), for browsing a document in accordance with some embodiments. In some embodiments, the electronic device includes a portable electronic device. In some embodiments, the electronic device includes a computer system.

In some embodiments, the electronic device receives (602) a document having a plurality of portions. At least some of the portions are associated with respective metadata. In some embodiments, the respective metadata indicates the structure of the document (e.g., paragraphs, sentences, headings, styles, etc.).

The electronic device outputs (604) a voice reading of respective portions of the document. While outputting the voice reading, the electronic device audibly distinguishes each respective portion based on that portion's metadata. For example, the electronic device determines (606) whether a respective portion corresponds to a landmark. As used herein, a landmark refers to a predefined location, or a predefined type of location, in a document. In some embodiments, when the respective portion corresponds to a landmark, the electronic device outputs (608) a voice signal or an audio signal to indicate that the portion corresponds to the landmark. For example, in some embodiments, at the beginning of a new paragraph, the electronic device outputs a voice signal or an audio signal to indicate the beginning of the new paragraph. Similarly, in some embodiments, a voice signal or an audio signal is used to indicate the beginning or end of a chapter and/or to indicate a bookmark.

After outputting the voice signal or audio signal indicating that the portion corresponds to a landmark (e.g., after going on to output a voice reading of several further paragraphs), the electronic device receives a voice command from the user and determines (610) whether the voice command is a navigation command. When the voice command is a navigation command requesting navigation to the landmark, the electronic device navigates (614) to the portion corresponding to the landmark and outputs (604) a voice reading of that portion.

In some embodiments, after outputting the spoken reading of the respective portion of the document, the electronic device determines (616) whether the end of the document has been reached (e.g., whether the electronic device has output a spoken reading of the entire document). If the end of the document has not been reached, the electronic device outputs (604) a spoken reading of the subsequent portion of the document.
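The loop formed by operations 604 through 616 can be sketched as follows. This is only an illustrative sketch, not the claimed implementation: the `Portion` type, the `read_document` function, the `[cue:...]` event strings, and the `commands` mapping (which stands in for spoken navigation commands) are all hypothetical names introduced here, and the sketch assumes each landmark label is unique within the document.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Portion:
    text: str
    landmark: Optional[str] = None  # hypothetical label, e.g. "chapter 2"


def read_document(portions: List[Portion],
                  commands: Optional[Dict[int, str]] = None) -> List[str]:
    """Return the ordered audio events for one reading session.

    `commands` maps a portion index to the landmark the user asks for after
    that portion has been read. Each command is consumed once, so the loop
    always terminates.
    """
    pending = dict(commands or {})
    events: List[str] = []
    i = 0
    while i < len(portions):                        # 616: end of document?
        portion = portions[i]
        if portion.landmark is not None:            # 606: landmark here?
            events.append(f"[cue:{portion.landmark}]")  # 608: audible cue
        events.append(portion.text)                 # 604: spoken reading
        target = pending.pop(i, None)               # 610: navigation command?
        if target is not None:
            matches = [j for j, p in enumerate(portions)
                       if p.landmark == target]
            if matches:
                i = matches[0]                      # 614: go to the landmark
                continue
        i += 1
    return events
```

In this sketch a command naming an unknown landmark is simply ignored and reading continues with the next portion; a real system would more likely ask the user to clarify.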

FIG. 7 is a flow chart illustrating operations, performed by an electronic device (e.g., user device 104 described with respect to FIG. 2, or standalone device 300 described with respect to FIG. 3), for identifying a set of documents, according to some embodiments. In some embodiments, the electronic device comprises a portable electronic device. In some embodiments, the electronic device comprises a computer system.

In some embodiments, the electronic device outputs (702) a spoken reading of at least a portion of a document.

The electronic device receives (704) a voice command from a user requesting documents corresponding to a specific criterion. In some embodiments, the electronic device receives at least a portion of the voice command while outputting the spoken reading. In some embodiments, the specific criterion requires that the one or more identified documents be authored by one or more authors identified by the user (706). In some embodiments, the specific criterion requires that the one or more identified documents be associated with a specific document (708). In some embodiments, the specific criterion requires that the one or more identified documents be replies to a specific document (710). In some embodiments, the specific criterion requires that the one or more identified documents include the latest message from a respective author (712). In some embodiments, the specific criterion requires that the one or more identified documents correspond to a specific date range identified by the user (714).

In some embodiments, before receiving the voice command requesting documents corresponding to a specific criterion, the electronic device receives a request from the user for information about the document (e.g., author, word count, date of last update, number of chapters in the document, etc.). In some embodiments, in response to the request for information about the document, the electronic device outputs audible information including the requested information about the document. In some embodiments, the electronic device retrieves the requested information from one or more remote computer systems (e.g., one or more search engines and/or database servers). In some embodiments, the electronic device determines the requested information from the document.

In some embodiments, the electronic device receives a request from the user to store the requested information and, in response, stores the requested information (e.g., on the electronic device or on a remote storage device). In some embodiments, the electronic device stores the requested information without a request from the user to do so.

In one example, the request is for the length of the document (e.g., "How long is this article?"), and in response to the request, the electronic device determines the length of the document (e.g., the number of words or pages) and outputs audible information including the length of the document. In another example, the request asks whether the document has been updated (e.g., "Has this article been updated?"), and in response to the request, the electronic device determines whether the document has been updated and outputs audible information indicating whether the document has been updated (e.g., "There are corrections to this article"). In yet another example, the request is for the author of the document (e.g., "Who wrote this article?"). In some embodiments, in response to the request for the author of the document, the electronic device outputs audible information including the author of the document. In some cases, the author information is extracted from the document (e.g., from a text portion such as "Written by ..." or from an author field in a markup language). In some cases, the author information is extracted from a web page linked to the document.
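As a rough illustration of extracting author information from the document itself, the sketch below first looks for an author field in the markup and then falls back to a byline in the text. It uses only the Python standard library. The function name `find_author`, the choice of an HTML `<meta name="author">` element as the "author field", and the byline pattern are assumptions made for this example; the description does not specify any of them.

```python
import re
from html.parser import HTMLParser
from typing import Optional


class _AuthorMetaParser(HTMLParser):
    """Collects the content of the first <meta name="author"> element."""

    def __init__(self) -> None:
        super().__init__()
        self.author: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag == "meta" and self.author is None:
            fields = dict(attrs)
            name = (fields.get("name") or "").lower()
            if name == "author" and fields.get("content"):
                self.author = fields["content"].strip()


# Assumed byline shape: "Written by" or "By" followed by capitalized words.
_BYLINE = re.compile(
    r"\b(?:Written by|By)\s+([A-Z][\w.'-]*(?:\s+[A-Z][\w.'-]*)*)"
)


def find_author(html: str) -> Optional[str]:
    """Return the author from the markup field, else from a text byline."""
    parser = _AuthorMetaParser()
    parser.feed(html)
    if parser.author:
        return parser.author
    text = re.sub(r"<[^>]+>", " ", html)   # crude tag stripping for the sketch
    match = _BYLINE.search(text)
    return match.group(1).rstrip(".") if match else None
```

The byline pattern is deliberately naive and will miss many real bylines; when neither source yields a name, the function returns `None`, which is where a device might fall back to a web page linked to the document or to a remote lookup.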

In some embodiments, the electronic device receives a request for additional information based on the requested information, obtains the additional information, and outputs the additional information. In some of the examples described above with respect to author information, after outputting the audible information including the author of the document, the electronic device receives a request for other documents authored by the same author (e.g., "What other articles has this author written?") and, in response, obtains information identifying other documents authored by the author of the document and outputs that information. In some embodiments, the electronic device obtains the information by sending a request to one or more remote computer systems (e.g., one or more search engines and/or database servers) and receiving at least a portion of the information from at least a subset of the one or more remote computer systems.

In some embodiments, while outputting the spoken reading of at least a portion of the document, the electronic device receives a request from the user to store information about the document. In some embodiments, in response to receiving the request, the electronic device stores the information about the document. For example, in some embodiments, the request is to bookmark the document (e.g., "Bookmark this article"), and the electronic device stores access information for the document (e.g., a uniform resource locator of the document) as a bookmark. In some embodiments, the electronic device stores the access information for the document on the electronic device. In some embodiments, the electronic device stores the access information at a remote server.

It will be apparent to one of ordinary skill in the art that, in some embodiments, the electronic device provides the requested information without performing the subsequent operations (e.g., operations 716, 718, and 720 described below).

The electronic device identifies (716) one or more documents corresponding to the specific criterion (e.g., FIG. 4L and FIG. 4M).

The electronic device outputs (718) a spoken reading of at least a portion of a respective document of the one or more identified documents.

In some embodiments, after outputting the spoken reading of the respective portion of the document, the electronic device determines (720) whether the end of the document has been reached (e.g., whether the electronic device has output a spoken reading of the entire document). If the end of the document has not been reached, the electronic device outputs (718) a spoken reading of the subsequent portion of the document.

In some embodiments, if the end of the document has been reached, the electronic device determines whether all of the one or more identified documents have been read. If not all of the one or more identified documents have been read, the electronic device outputs a spoken reading of a portion of the next document of the one or more identified documents.
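The identification and reading steps (704 through 720) can be pictured as filtering a collection of documents with a predicate built from the spoken criterion and then reading each match portion by portion. The sketch below is one possible arrangement under stated assumptions: the `Document` fields, the helper names (`by_author`, `replies_to`, `in_date_range`, `latest_from`, `identify`, `read_identified`), and the representation of a criterion as a Python callable are all hypothetical and are not taken from the description.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, List, Optional


@dataclass
class Document:
    doc_id: str
    author: str
    sent: date
    portions: List[str]
    reply_to: Optional[str] = None  # doc_id of the document replied to


Criterion = Callable[[Document], bool]


def by_author(*names: str) -> Criterion:            # criterion 706
    wanted = {n.lower() for n in names}
    return lambda d: d.author.lower() in wanted


def replies_to(doc_id: str) -> Criterion:           # criterion 710
    return lambda d: d.reply_to == doc_id


def in_date_range(start: date, end: date) -> Criterion:   # criterion 714
    return lambda d: start <= d.sent <= end


def identify(documents: List[Document], criterion: Criterion) -> List[Document]:
    """Operation 716: keep the documents that satisfy the criterion."""
    return [d for d in documents if criterion(d)]


def latest_from(documents: List[Document], author: str) -> Optional[Document]:
    """Criterion 712: the most recent document from one author, if any."""
    mine = identify(documents, by_author(author))
    return max(mine, key=lambda d: d.sent) if mine else None


def read_identified(documents: List[Document], criterion: Criterion) -> List[str]:
    """Operations 718/720: read every portion of every identified document."""
    spoken: List[str] = []
    for doc in identify(documents, criterion):
        for portion in doc.portions:    # continue until the end of the document
            spoken.append(portion)
    return spoken
```

Representing each criterion as a small predicate keeps the identification step independent of how the voice command was parsed, which is a design choice of this sketch rather than something the description requires.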

In accordance with some embodiments, FIG. 8 shows a functional block diagram of an electronic device 800 configured in accordance with the principles of the invention as described above. The functional blocks of the device may be implemented by hardware, software, or a combination of hardware and software to carry out the principles of the invention. Those skilled in the art will appreciate that the functional blocks described in FIG. 8 may be combined or separated into sub-blocks to implement the principles of the invention as described above. Therefore, the description herein may support any possible combination or separation or further definition of the functional blocks described herein.

As shown in FIG. 8, electronic device 800 includes an audio input unit 804 and an audio output unit 806. In some embodiments, electronic device 800 includes a display unit 802 configured to display one or more portions of an electronic document. In some embodiments, electronic device 800 includes a touch-sensitive surface unit 808 configured to detect one or more gestures on touch-sensitive surface unit 808. Electronic device 800 also includes a processing unit 810 coupled to audio input unit 804 and audio output unit 806. In some embodiments, processing unit 810 is also coupled to display unit 802 and touch-sensitive surface unit 808. In some embodiments, processing unit 810 includes a document receiving unit 812, an output unit 814, a voice command receiving unit 816, a storage unit 818, a retrieval unit 820, an identifying unit 822, and a user selection receiving unit 824.

Processing unit 810 is configured to receive a first document that includes a plurality of links (e.g., with document receiving unit 812). Processing unit 810 is configured to output a spoken reading of at least a portion of the first document (e.g., with output unit 814 through audio output unit 806). Processing unit 810 is configured to output audible information identifying one link of the plurality of links (e.g., with output unit 814 through audio output unit 806). Processing unit 810 is configured to, in response to outputting the audible information identifying the link, receive a voice command from the user regarding the link (e.g., with voice command receiving unit 816 through audio input unit 804). Processing unit 810 is configured to, in response to receiving the voice command from the user, output a spoken reading of at least a portion of a second document associated with the link (e.g., with output unit 814 through audio output unit 806).

In some embodiments, the link is associated with text in the portion of the first document.

In some embodiments, processing unit 810 is configured to, in response to receiving the voice command regarding the link from the user, store information about the link (e.g., with storage unit 818) and output a spoken reading of one or more portions of the second document (e.g., with output unit 814 through audio output unit 806).

In some embodiments, processing unit 810 is configured to, after outputting the spoken reading of at least a portion of the second document, receive a voice command from the user requesting navigation back to the first document (e.g., with voice command receiving unit 816 through audio input unit 804); and, in response to receiving the voice command requesting navigation back to the first document, output a spoken reading of one or more portions of the first document that follow the text associated with the link in the first document (e.g., with output unit 814 through audio output unit 806).

In some embodiments, processing unit 810 is configured to, after outputting the spoken reading of at least a portion of the second document, output a spoken query about navigating back to the first document (e.g., with output unit 814 through audio output unit 806); in response to outputting the spoken query, receive a voice command from the user requesting navigation back to the first document (e.g., with voice command receiving unit 816 through audio input unit 804); and, in response to receiving the voice command requesting navigation back to the first document, output a spoken reading of one or more portions of the first document that include the text associated with the link in the first document (e.g., with output unit 814 through audio output unit 806).

In some embodiments, processing unit 810 is configured to, after outputting the spoken reading of at least a portion of the second document and in response to receiving a voice command from the user requesting navigation back to the first document, automatically output a spoken reading of one or more portions of the first document that follow the text associated with the link in the first document (e.g., with output unit 814 through audio output unit 806).
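One way to support navigating back to the first document is to remember, at the moment a link is followed, where the link text sits in the document being left. The sketch below keeps that information on a stack; the class `ReadingSession`, its methods, and the `include_link_text` flag are hypothetical names used only for illustration. The flag covers the two variants described above: resuming after the portion containing the link text, or resuming with that portion included.

```python
from typing import List, Optional, Tuple


class ReadingSession:
    """Tracks where to resume reading when the user navigates back."""

    def __init__(self) -> None:
        # Each entry: (document id, index of the portion holding the link text)
        self._history: List[Tuple[str, int]] = []

    def follow_link(self, doc_id: str, link_portion_index: int) -> None:
        """Record the position of the link before reading the linked document."""
        self._history.append((doc_id, link_portion_index))

    def go_back(self, include_link_text: bool = False) -> Optional[Tuple[str, int]]:
        """Return (document id, portion index) at which to resume, if any."""
        if not self._history:
            return None
        doc_id, index = self._history.pop()
        return (doc_id, index if include_link_text else index + 1)
```

A stack is used so that links followed from within the second document unwind in reverse order; a device that supports only one level of navigation could store a single entry instead.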

In some embodiments, the voice command is a voice command of a first type. In some embodiments, processing unit 810 is configured to receive from the user a voice command of a second type regarding the link, the second type being distinct from the first type (e.g., with voice command receiving unit 816 through audio input unit 804); in response to receiving the voice command of the second type regarding the link, retrieve information from the second document (e.g., with retrieval unit 820); and, based on the information from the second document, output spoken information corresponding to the voice command of the second type (e.g., with output unit 814 through audio output unit 806).

In some embodiments, the voice command of the second type includes a request for information about the author of the second document.

In some embodiments, the voice command of the second type includes a request for a summary of the second document.

In some embodiments, processing unit 810 is configured to identify two or more links as candidate links corresponding to the voice command from the user (e.g., with identifying unit 822); output audible information about the candidate links (e.g., with output unit 814 through audio output unit 806); and receive from the user a selection of a single link among the candidate links (e.g., with user selection receiving unit 824 through audio input unit 804 or touch-sensitive surface unit 808).
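The candidate-link behavior can be illustrated with a small sketch: when a spoken phrase matches more than one link, the device lists the candidates audibly and waits for a choice. The matching rule used here (every spoken word must appear in the link text), the wording of the prompt, the `example.com` URLs, and the names `Link`, `candidate_links`, `disambiguation_prompt`, and `select` are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Link:
    text: str
    url: str


def candidate_links(links: List[Link], spoken: str) -> List[Link]:
    """Links whose text contains every word of the spoken phrase."""
    words = spoken.lower().split()
    return [l for l in links if all(w in l.text.lower().split() for w in words)]


def disambiguation_prompt(candidates: List[Link]) -> str:
    """Text to be spoken when two or more links match; empty otherwise."""
    if len(candidates) < 2:
        return ""
    options = "; ".join(f"{i}: {l.text}" for i, l in enumerate(candidates, 1))
    return f"I found {len(candidates)} links. {options}. Which one?"


def select(candidates: List[Link], choice: int) -> Optional[Link]:
    """Resolve a 1-based choice (spoken ordinal or touch selection)."""
    if 1 <= choice <= len(candidates):
        return candidates[choice - 1]
    return None
```

For example, with links titled "June jobs report", "Jobs report revisions", and "Weather", the phrase "jobs report" yields two candidates, and a choice of 2 resolves to the second one.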

In accordance with some embodiments, FIG. 9 shows a functional block diagram of an electronic device 900 configured in accordance with the principles of the invention as described above. The functional blocks of the device may be implemented by hardware, software, or a combination of hardware and software to carry out the principles of the invention. Those skilled in the art will appreciate that the functional blocks described in FIG. 9 may be combined or separated into sub-blocks to implement the principles of the invention as described above. Therefore, the description herein may support any possible combination or separation or further definition of the functional blocks described herein.

As shown in FIG. 9, electronic device 900 includes an audio input unit 904 and an audio output unit 906. In some embodiments, electronic device 900 includes a display unit 902 configured to display one or more portions of an electronic document. In some embodiments, electronic device 900 includes a touch-sensitive surface unit 908 configured to detect one or more gestures on touch-sensitive surface unit 908. Electronic device 900 also includes a processing unit 910 coupled to audio input unit 904 and audio output unit 906. In some embodiments, processing unit 910 is also coupled to display unit 902 and touch-sensitive surface unit 908. In some embodiments, processing unit 910 includes a document receiving unit 912, an output unit 914, and a voice command receiving unit 916.

Processing unit 910 is configured to receive a document having a plurality of portions (e.g., with document receiving unit 912), where at least some of the portions are associated with respective metadata. Processing unit 910 is configured to output a spoken reading of respective portions of the document, including audibly distinguishing the respective portions based on the respective metadata (e.g., with output unit 914 through audio output unit 906). Processing unit 910 is configured to receive a voice command from the user requesting navigation to a specific portion associated with specific metadata (e.g., with voice command receiving unit 916 through audio input unit 904). Processing unit 910 is configured to, in response to receiving the voice command, output a spoken reading of the specific portion associated with the specific metadata (e.g., with output unit 914 through audio output unit 906).

In some embodiments, processing unit 910 is configured to output to the user a spoken reading of respective portions of text in accordance with the respective styles of those portions, by: outputting to the user a spoken reading of a first portion of text in the document using a first set of speech characteristics (e.g., with output unit 914 through audio output unit 906); and outputting to the user a spoken reading of a second portion of text in the document using a second set of speech characteristics distinct from the first set (e.g., with output unit 914 through audio output unit 906).
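Reading differently styled portions with different sets of speech characteristics can be pictured as a lookup from style to voice settings that is applied before each portion is synthesized. The sketch below only plans the reading and does not call any text-to-speech engine. The `VoiceFeatures` fields (rate, pitch, volume), the style names, and the numeric values are illustrative assumptions; the description does not enumerate which characteristics are varied.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class VoiceFeatures:
    rate: float = 1.0     # relative speaking rate
    pitch: float = 1.0    # relative pitch
    volume: float = 1.0   # relative loudness


DEFAULT_VOICE = VoiceFeatures()

# Hypothetical mapping from a portion's style to the voice used to read it.
STYLE_VOICES: Dict[str, VoiceFeatures] = {
    "heading": VoiceFeatures(rate=0.9, pitch=0.9, volume=1.1),
    "quote": VoiceFeatures(rate=1.0, pitch=1.1, volume=1.0),
}


def plan_reading(portions: List[Tuple[str, str]]) -> List[Tuple[str, VoiceFeatures]]:
    """Pair each (text, style) portion with the voice features to use."""
    return [(text, STYLE_VOICES.get(style, DEFAULT_VOICE))
            for text, style in portions]
```

Styles without an entry fall back to the default voice, so only portions whose style differs in a way the mapping knows about are audibly distinguished.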

In some embodiments, processing unit 910 is configured to receive at least a portion of the voice command from the user (e.g., with voice command receiving unit 916 through audio input unit 904) while outputting the spoken reading of one or more portions of the document.

In some embodiments, at least one of the respective portions of the document is a respective paragraph of the document.

In some embodiments, at least one of the respective portions of the document is a respective heading of the document.

In some embodiments, at least one of the respective portions of the document is a respective sentence of the document.

In some embodiments, a first portion of the document has a first style and a second portion of the document has a second style that is distinct from the first style.

In some embodiments, each portion of the document is associated with a single link.

In accordance with some embodiments, FIG. 10 shows a functional block diagram of an electronic device 1000 configured in accordance with the principles of the invention as described above. The functional blocks of the device may be implemented by hardware, software, or a combination of hardware and software to carry out the principles of the invention. Those skilled in the art will appreciate that the functional blocks described in FIG. 10 may be combined or separated into sub-blocks to implement the principles of the invention as described above. Therefore, the description herein may support any possible combination or separation or further definition of the functional blocks described herein.

As shown in FIG. 10, electronic device 1000 includes an audio input unit 1004 and an audio output unit 1006. In some embodiments, electronic device 1000 includes a display unit 1002 configured to display one or more portions of an electronic document. In some embodiments, electronic device 1000 includes a touch-sensitive surface unit 1008 configured to detect one or more gestures on touch-sensitive surface unit 1008. Electronic device 1000 also includes a processing unit 1010 coupled to audio input unit 1004 and audio output unit 1006. In some embodiments, processing unit 1010 is also coupled to display unit 1002 and touch-sensitive surface unit 1008. In some embodiments, processing unit 1010 includes an output unit 1012, a voice command receiving unit 1014, and an identifying unit 1016.

Processing unit 1010 is configured to output a spoken reading of at least a portion of one document of a plurality of documents (e.g., with output unit 1012 through audio output unit 1006). Processing unit 1010 is configured to, while outputting the spoken reading, receive a voice command from the user requesting documents corresponding to a specific criterion (e.g., with voice command receiving unit 1014 through audio input unit 1004). Processing unit 1010 is configured to, in response to receiving the voice command from the user, identify one or more documents of the plurality of documents that correspond to the specific criterion (e.g., with identifying unit 1016); and output a spoken reading of at least a portion of a respective document of the one or more identified documents (e.g., with output unit 1012 through audio output unit 1006).

In some embodiments, the specific criterion requires that the one or more identified documents be authored by one or more authors identified by the user.

In some embodiments, the specific criterion requires that the one or more identified documents be associated with a specific document.

In some embodiments, the specific criterion requires that the one or more identified documents be replies to a specific document.

In some embodiments, the specific criterion requires that the one or more identified documents include the latest message from a respective author.

In some embodiments, the specific criterion requires that the one or more identified documents correspond to a specific date range identified by the user.

In some embodiments, processing unit 1010 is configured to output audible information indicating the number of the one or more identified documents (e.g., with output unit 1012 through audio output unit 1006).

In some embodiments, processing unit 1010 is configured to output audible information indicating that the respective document is the last document of the one or more identified documents (e.g., with output unit 1012 through audio output unit 1006).

The foregoing description, for purposes of explanation, has been given with reference to specific embodiments. However, the illustrative discussion above is not intended to be exhaustive or to limit the invention to the precise forms disclosed. Many modifications and variations are possible in view of the above teachings. The embodiments were chosen and described in order to best explain the principles of the invention and its practical applications, and thereby to enable others skilled in the art to best utilize the invention and the various embodiments with various modifications as are suited to the particular use contemplated.

For example, one or more aspects of the operations described with respect to FIG. 6 may be used with the operations described with respect to FIG. 7 and FIG. 8 (e.g., outputting an audible query after reaching the end of a document). Similarly, one or more aspects of the operations described with respect to FIG. 7 may be used with the operations described with respect to FIG. 6 and FIG. 8, and one or more aspects of the operations described with respect to FIG. 8 may be used with the operations described with respect to FIG. 6 and FIG. 7. For brevity, these details are not repeated.