HK1240350B - Headless task completion within digital personal assistants - Google Patents

Headless task completion within digital personal assistants

Info

Publication number
HK1240350B
Authority
HK
Hong Kong
Prior art keywords
voice
user
application
task
personal assistant
Prior art date
Application number
HK17113502.5A
Other languages
Chinese (zh)
Other versions
HK1240350A1 (en)
Inventor
V‧S‧坎南
A‧乌瑟拉克
D‧J‧黄
R‧L‧钱伯斯
T‧索米欧
A‧M‧特鲁芬尼斯库
K‧沙希德
A‧艾玛米
Original Assignee
Microsoft Technology Licensing, LLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Microsoft Technology Licensing, LLC
Publication of HK1240350A1
Publication of HK1240350B

Description

Translated from Chinese
Headless task completion within digital personal assistants

Background

As computing technology has advanced, increasingly powerful computing devices have become available. For example, computing devices are increasingly adding features such as speech recognition. Speech can be an effective way for a user to communicate with a computing device, and voice-controlled applications, such as voice-controlled digital personal assistants, are being developed.

A digital personal assistant can be used to perform tasks or services for an individual. For example, the digital personal assistant can be a software module running on a mobile device or a desktop computer. Examples of tasks and services that can be performed by the digital personal assistant include retrieving weather conditions and forecasts, sports scores, traffic directions and conditions, local and/or national news stories, and stock prices; managing a user's schedule by creating new schedule entries and reminding the user of upcoming events; and storing and retrieving reminders.

However, a digital personal assistant is unlikely to be able to perform every task that a user may want performed. Therefore, there exists ample opportunity for improvement in technologies related to voice-controlled digital personal assistants.

Summary

This Summary is provided to introduce, in a simplified form, a selection of concepts that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.

Techniques and tools are described for headlessly completing a task of an application in the background of a digital personal assistant. For example, a method can be implemented by a computing device comprising a microphone. The method can comprise receiving, by a voice-controlled digital personal assistant, a digital voice input generated by a user. The digital voice input can be received via the microphone. Natural language processing can be performed using the digital voice input to determine a user voice command. The user voice command can comprise a request to perform a predefined function of a third-party voice-enabled application. The predefined function can be identified using a data structure that defines the functions supported by available third-party voice-enabled applications using voice input. The third-party voice-enabled application can be caused to execute the predefined function as a background process without a user interface of the third-party voice-enabled application appearing on a display of the computing device. A response can be received from the third-party voice-enabled application indicating a state associated with the predefined function. A user interface of the voice-controlled digital personal assistant can provide a response to the user, based on the received state associated with the predefined function, so that the response comes from within a context of the user interface of the voice-controlled digital personal assistant without the user interface of the third-party voice-enabled application appearing.

As another example, a computing device comprising a processing unit, memory, and one or more microphones can be provided for performing the operations described herein. For example, a method performed by the computing device can include receiving speech input generated by a user via the one or more microphones. Speech recognition can be performed using the speech input to determine a spoken command. The spoken command can comprise a request to perform a task of a third-party application. The task can be identified using a data structure that defines tasks of third-party applications that can be invoked by spoken command. It can be determined whether the task of the third-party application is capable of being executed headlessly. When it is determined that the task is capable of being executed headlessly, the third-party application can be caused to execute as a background process to headlessly execute the task. A response can be received from the third-party application indicating a state associated with the task. A user interface of the voice-controlled digital personal assistant can provide a response to the user, based on the received state associated with the task, so that the response comes from within a context of the user interface of the voice-controlled digital personal assistant without a user interface of the third-party application appearing.

As another example, a computing device comprising a processing unit and memory can be provided for performing the operations described herein. For example, a computing device can perform operations for completing a task of a voice-enabled application within the context of a voice-controlled digital personal assistant. The operations can comprise receiving a digital voice input generated by a user at the voice-controlled digital personal assistant. The digital voice input can be received via a microphone. Natural language processing can be performed using the digital voice input to determine a user voice command. The user voice command can comprise a request to perform a task of the voice-enabled application. The task can be identified using an extensible data structure that maps user voice commands to tasks of voice-enabled applications. It can be determined whether the task of the voice-enabled application is a foreground task or a background task. When it is determined that the task is a background task, the voice-enabled application can be caused to execute the task as a background task, within the context of the voice-controlled digital personal assistant, without a user interface of the voice-enabled application appearing. A response can be received from the voice-enabled application. The response can indicate a state associated with the task. A response can be provided to the user based on the received state associated with the task. When it is determined that the task is a background task, the response can be provided within the context of the voice-controlled digital personal assistant without the user interface of the voice-enabled application appearing.

As described herein, a variety of other features and advantages can be incorporated into the technologies as desired.

Brief Description of the Drawings

FIG. 1 is a diagram depicting an example of a system for headlessly completing a task of an application in the background of a digital personal assistant.

FIG. 2 is a diagram depicting an example software architecture for headlessly completing a task of an application in the background of a digital personal assistant.

FIG. 3 is a diagram of an example state machine for an application interfacing with a digital personal assistant.

FIG. 4 is an example of a command definition that can be used to create a data structure for enabling an interface between an application and a digital personal assistant.

FIG. 5 is an example sequence diagram illustrating the communication of multiple threads for headlessly performing a task of an application from within a digital personal assistant.

FIG. 6 is a flowchart of an example method for headlessly completing a task of an application in the background of a digital personal assistant.

FIG. 7 is a flowchart of an example method for determining whether to warm up an application while a user is speaking to a digital personal assistant.

FIG. 8 is a diagram of an example computing system in which some described embodiments can be implemented.

FIG. 9 is an example mobile device that can be used in conjunction with the technologies described herein.

FIG. 10 is an example cloud-support environment that can be used in conjunction with the technologies described herein.

Detailed Description

Overview

As a user grows more comfortable with using a digital personal assistant, the user may prefer to perform more actions within the context of the digital personal assistant. However, the provider of the digital personal assistant cannot anticipate, or spend the time to develop, every application that a user may want to use. Thus, it is desirable for the digital personal assistant to be able to invoke or launch third-party applications created by entities other than the provider of the digital personal assistant.

In a typical solution, the user interface of an application appears when the digital personal assistant launches the application, and program control passes from the digital personal assistant to the application. Once the application's user interface appears, the user can verify the status of the request and can perform additional tasks from within the application. To return to the user interface of the digital personal assistant, the user must exit the application before control can be returned to the digital personal assistant.

As one specific example of using a digital personal assistant on a mobile phone, the user can request that a movie be added to the user's queue using a movie application installed on the mobile phone. For example, the user can say "Movie-Application, add Movie-X to my queue" to the user interface of the digital personal assistant. After the command is spoken and recognized by the assistant, the assistant can launch the movie application, which presents its own user interface. The movie can be added to the user's queue, and the queue can be presented to the user as verification that the movie was added. The user can continue to use the movie application, or the user can close the movie application to return to the user interface of the digital personal assistant.

When the digital personal assistant transfers control to an application, loading the application and its user interface into memory can take a perceptible amount of time. The delay can potentially affect the user's productivity, such as by delaying the user from accomplishing a follow-on task and/or by interrupting the user's train of thought. For example, the user's attention can be diverted to closing the application before returning to the user interface of the digital personal assistant. Furthermore, when control is transferred to the application, contextual information available to the digital personal assistant may not be available to the application. For example, the digital personal assistant may know the identity and contact information of the user's spouse, the location of the user's home or office, or the location of the user's daycare provider, but the application may not have access to that contextual information.

In the techniques and solutions described herein, a digital personal assistant can determine whether a task of a third-party application can be performed in the background, so that the operations for performing the task are performed within the context of the digital personal assistant and the user interface of the voice-enabled application does not appear. Thus, the user can experience a given set of tasks as being performed within the context of the digital personal assistant, as opposed to the context of the application that is performing the user's task. Additionally, the power consumed by the device can potentially be reduced (and battery life extended) because the application's user interface is not loaded into memory while the application's task is performed in the background.

Applications can register with a digital personal assistant to extend the list of native capabilities that the assistant provides. An application can be installed on the device or invoked as a service over a network (such as the Internet). A schema definition can enable an application to register a voice command together with a request that the application be launched headlessly when a user requests that command/task. For example, an application can include a voice command definition (VCD) file accessible by the digital personal assistant, where the VCD file identifies tasks that can be launched headlessly. The definition can specify that a task of the application is always to be launched headlessly, or the definition can specify that a task of the application is to be launched headlessly under particular circumstances. For example, an application might choose to do something headlessly if the user is requesting that the task be performed on a device without a display surface (such as a wireless fitness band), or when the user is operating in a hands-free mode (such as when the user is connected to a Bluetooth headset).
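The registration idea above can be illustrated with a minimal Python sketch. It is not the VCD file format, which this passage does not specify; the names `LaunchMode`, `CommandDefinition`, `DeviceContext`, and `should_run_headless` are hypothetical and chosen only for illustration. The sketch assumes that each registered command carries a launch mode and that the assistant decides, from the device context, whether to launch the task headlessly.

```python
from dataclasses import dataclass
from enum import Enum


class LaunchMode(Enum):
    """Hypothetical launch modes a command definition might declare."""
    FOREGROUND = "foreground"    # the application's user interface appears
    HEADLESS = "headless"        # always launched in the background
    CONDITIONAL = "conditional"  # headless only in particular circumstances


@dataclass(frozen=True)
class CommandDefinition:
    """One registered voice command of an application (illustrative only)."""
    app_name: str
    task_name: str
    phrase_template: str
    launch_mode: LaunchMode


@dataclass(frozen=True)
class DeviceContext:
    """Circumstances named in the text: no display surface, hands-free mode."""
    has_display: bool
    hands_free: bool


def should_run_headless(definition: CommandDefinition,
                        context: DeviceContext) -> bool:
    """Decide whether the task is launched without the application's UI."""
    if definition.launch_mode is LaunchMode.HEADLESS:
        return True
    if definition.launch_mode is LaunchMode.CONDITIONAL:
        return (not context.has_display) or context.hands_free
    return False


# Example registrations for the movie application used in the text.
add_to_queue = CommandDefinition(
    "Movie-Application", "add_to_queue",
    "add {movie} to my queue", LaunchMode.HEADLESS)
play_movie = CommandDefinition(
    "Movie-Application", "play_movie",
    "play {movie}", LaunchMode.CONDITIONAL)

phone = DeviceContext(has_display=True, hands_free=False)
headset = DeviceContext(has_display=True, hands_free=True)
```

With these definitions, `should_run_headless(add_to_queue, phone)` is true regardless of context, while `play_movie` runs headlessly only for the `headset` context.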

An application can provide responses regarding the progress, failure, and successful completion of a requested task, and the state-related output can be presented by the user interface of the digital personal assistant. An application can return many different types of data to the digital personal assistant, including, for example, display text, text that can be read aloud, a deep link back into the application, a link to a web page or website, and web content based on Hypertext Markup Language (HTML). Data passed from the application to the assistant can be presented through the assistant's user interface as if it came from a native function of the assistant.

If the user gives the application a request that can have multiple meanings or results, the application can provide the digital personal assistant with a list of choices, and the assistant's user interface can be used to disambiguate among the choices. If the user gives the application a request that can be destructive or important (such as when the user asks a banking application to perform a balance transfer), the assistant's confirmation interface can be used to confirm the request before the destructive or important task is completed.
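The disambiguation and confirmation flows can be sketched as follows. This is a hedged illustration rather than the interface defined by the patent: `AppRequest`, `handle_app_request`, and the `choose` and `confirm` callbacks (standing in for the assistant's user interface) are all assumed names.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class AppRequest:
    """What a background application asks the assistant to do (hypothetical)."""
    kind: str                      # "disambiguate", "confirm", or "done"
    message: str
    choices: List[str] = field(default_factory=list)


def handle_app_request(request: AppRequest,
                       choose: Callable[[str, List[str]], str],
                       confirm: Callable[[str], bool]) -> str:
    """Route the request through the assistant's own UI callbacks.

    The application's user interface never appears; the assistant asks
    the user and returns the outcome to the application.
    """
    if request.kind == "disambiguate":
        return choose(request.message, request.choices)
    if request.kind == "confirm":
        return "confirmed" if confirm(request.message) else "cancelled"
    return request.message


# Example: two movies match the spoken title, so the user must pick one.
which_movie = AppRequest(
    "disambiguate", "Which movie did you mean?",
    ["Movie-X (1999)", "Movie-X (2014)"])
transfer = AppRequest("confirm", "Transfer the balance now?")
```

In a real assistant the callbacks would present a list or a yes/no prompt; here they can be plain functions, for example `lambda message, choices: choices[0]`.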

An application can be speculatively loaded, or warmed up, as the command is being spoken. For example, when the user finishes the phrase "Movie-Application" from the command "Movie-Application, add Movie-X to my queue," memory can be allocated, and subroutines of the installed movie application can be retrieved from storage and loaded into the allocated memory in preparation for using them once the command is complete. When the application is a web service, warming up can include, for example, establishing a communication session and retrieving user-specific information from a database at a remote server. Warming up the application can potentially reduce the time needed to respond to the user, so that the interaction feels more natural and the user can move on to the next task sooner, making the user more productive.

Using the technologies herein, a user who wants to add a movie to the user's queue with the movie application can have a different experience than with the typical solution of launching the movie application and passing control to it. In this example, the movie application's add-movie-to-queue command can be defined as headless in a command data structure, such as a VCD file. When the user says "Movie-Application" from the command "Movie-Application, add Movie-X to my queue," the movie application can be warmed up so that the response time to the user can be reduced. When the command is complete, the movie can be added to the user's queue using the movie application, but without the user interface of the movie application appearing. The movie can be added to the user's queue, and the digital personal assistant can confirm (using the assistant's user interface) that the movie was added. The user can experience a faster response time and can perform fewer steps to complete the task (for example, the movie application does not need to be closed).

Example System Including a Digital Personal Assistant

FIG. 1 is a system diagram depicting an example of a system 100 for headlessly completing a task 112 of a voice-enabled application 110 in the background of a digital personal assistant 120. The voice-enabled application 110 and the digital personal assistant 120 can be software modules installed on a computing device 130. The computing device 130 can be, for example, a desktop computer, a laptop, a mobile phone, a smartphone, a wearable device (such as a watch or wireless electronic band), or a tablet computer. The computing device 130 can include a command data structure 140 for identifying applications, and tasks of applications, that can be launched by the digital personal assistant 120. An application can be launched by the digital personal assistant 120 in the foreground (such as where the application's user interface appears when the application is launched) and/or in the background (such as where the application's user interface does not appear when the application is launched). For example, some tasks of an application can be launched in the foreground and different tasks of the same application can be launched in the background. The command data structure 140 can define how the application and/or the tasks of the application should be launched from the digital personal assistant 120.

The computing device 130 can include a microphone 150 for converting sound into an electrical signal. The microphone 150 can be a dynamic, condenser, or piezoelectric microphone that produces an electrical signal from variations in air pressure using, respectively, electromagnetic induction, a change in capacitance, or piezoelectricity. The microphone 150 can include an amplifier, one or more analog or digital filters, and/or an analog-to-digital converter to produce a digital sound input. The digital sound input can include a reproduction of the user's voice, such as when the user is commanding the digital personal assistant 120 to complete a task. The computing device 130 can include a touch screen or keyboard (not shown) for enabling the user to enter text input.

The digital sound input and/or the text input can be processed by a natural language processing module 122 of the digital personal assistant 120. For example, the natural language processing module 122 can receive the digital sound input and convert the words spoken by the user into text. The extracted text can be semantically analyzed to determine a user voice command. By analyzing the digital sound input and taking actions in response to spoken commands, the digital personal assistant 120 can be voice-controlled. For example, the digital personal assistant 120 can compare the extracted text to a list of possible user commands to determine the command that most likely matches the user's intent. The match can be based on statistical or probabilistic methods, decision trees or other rules, other suitable matching criteria, or combinations thereof. The possible user commands can be native commands of the digital personal assistant 120 and/or commands defined in the command data structure 140. Thus, by defining commands in the command data structure 140, the range of tasks that the digital personal assistant 120 can perform on behalf of the user can be extended. The possible commands can include performing the task 112 of the voice-enabled application 110, which can be defined in the command data structure 140 as a headless or background task.
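The comparison of extracted text against a list of possible commands can be sketched as follows. The text leaves the matching method open (statistical, rule-based, or a combination); this sketch substitutes plain string similarity from Python's standard `difflib` module as a stand-in, and the command identifiers, example phrases, and the `best_command` function are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Dict, Optional

# Hypothetical command list: native commands plus commands that
# applications registered in the command data structure.
COMMANDS: Dict[str, str] = {
    "native.weather": "what is the weather today",
    "movie.add_to_queue": "movie-application add movie to my queue",
}


def best_command(text: str,
                 commands: Dict[str, str],
                 threshold: float = 0.6) -> Optional[str]:
    """Return the id of the command whose phrase is most similar to text.

    Returns None when no phrase reaches the similarity threshold, so the
    assistant can fall back to another behavior instead of guessing.
    """
    normalized = text.lower().strip()
    best_id: Optional[str] = None
    best_score = 0.0
    for command_id, phrase in commands.items():
        score = SequenceMatcher(None, normalized, phrase.lower()).ratio()
        if score > best_score:
            best_id, best_score = command_id, score
    return best_id if best_score >= threshold else None
```

For the utterance "Movie-Application, add Movie-X to my queue", the function selects `movie.add_to_queue`. A production assistant would likely use a trained language-understanding model rather than character-level similarity.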

The natural language processing module 122 can generate a stream of text as the speech is processed, so that intermediate strings of text can be analyzed before the user's utterance is complete. Thus, if the user begins a command with the name of an application, the application can be identified early in the utterance, and the application can be warmed up before the user finishes speaking the command. Warming up the application can include retrieving instructions of the application from relatively slower non-volatile memory (such as a hard disk drive or flash memory) and storing the instructions in relatively faster volatile memory (such as main memory or cache memory).
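A minimal sketch of this early identification, assuming the recognizer emits a growing partial transcript: each partial string is checked for a leading application name, and a warm-up is triggered once per application. The functions `app_to_warm_up` and `process_stream` are hypothetical, and the warm-up itself (loading instructions into memory) is represented only by a recorded event.

```python
from typing import Iterable, List, Optional, Set


def app_to_warm_up(partial_text: str,
                   app_names: Iterable[str]) -> Optional[str]:
    """Return the application whose name begins the partial transcript."""
    normalized = partial_text.lower().strip()
    for name in app_names:
        if normalized.startswith(name.lower()):
            return name
    return None


def process_stream(partials: Iterable[str],
                   app_names: List[str]) -> List[str]:
    """Scan intermediate transcripts and record each warm-up exactly once."""
    warmed: Set[str] = set()
    events: List[str] = []
    for partial in partials:
        name = app_to_warm_up(partial, app_names)
        if name is not None and name not in warmed:
            warmed.add(name)
            # A real system would start loading the application here.
            events.append(f"warm up {name}")
    return events


# Partial transcripts as they might arrive while the user is speaking.
partials = [
    "movie",
    "movie-application",
    "movie-application add",
    "movie-application add movie-x to my queue",
]
events = process_stream(partials, ["Movie-Application"])
```

The warm-up event fires at the second partial transcript, before the rest of the command has been spoken.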

When the digital personal assistant 120 determines that a command is associated with a task of an application, the task of the application can be executed. If the digital personal assistant 120 determines that the task of the application is to be executed as a background process (such as by analyzing the definition in the command data structure 140), the application can execute in the background. The application, such as the voice-enabled application 110, can communicate with the digital personal assistant 120. For example, the application can sequence through a set of states associated with completing the task, and the state of the application can be communicated to the digital personal assistant 120. For example, the application can begin in an "initial" state, transition to a "progress" state while the task is being performed, and then transition to a "final" state when the task is complete.
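The three states named above can be sketched as a small state machine that reports every state to the assistant. This covers only the states mentioned in this paragraph; the `HeadlessTask` class, the `TaskState` enum, and the `report` callback are assumptions made for illustration.

```python
from enum import Enum
from typing import Callable


class TaskState(Enum):
    INITIAL = "initial"
    PROGRESS = "progress"
    FINAL = "final"


# Allowed transitions: initial -> progress -> final.
_ALLOWED = {
    TaskState.INITIAL: {TaskState.PROGRESS},
    TaskState.PROGRESS: {TaskState.FINAL},
    TaskState.FINAL: set(),
}


class HeadlessTask:
    """Tracks a background task and reports each state to the assistant."""

    def __init__(self, report: Callable[[TaskState], None]) -> None:
        self.state = TaskState.INITIAL
        self._report = report
        report(self.state)  # the assistant learns the starting state

    def advance(self, new_state: TaskState) -> None:
        """Move to new_state, rejecting transitions that skip or repeat."""
        if new_state not in _ALLOWED[self.state]:
            raise ValueError(
                f"cannot go from {self.state.value} to {new_state.value}")
        self.state = new_state
        self._report(new_state)


# Example: the assistant side simply collects the reported states.
seen = []
task = HeadlessTask(seen.append)
task.advance(TaskState.PROGRESS)
task.advance(TaskState.FINAL)
```

After the example runs, `seen` holds the three states in order, which is the information the assistant would use to update its own user interface.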

The digital personal assistant 120 can report on the progress of the task via a user interface 124. The user interface 124 can communicate information to the user in various ways, such as by presenting text, graphics, or hyperlinks on a display of the computing device 130, generating audio output from a speaker of the computing device 130, or generating other sensory output, such as vibration from a motor connected to an off-center weight of the computing device 130. For example, the user interface 124 can cause a spinning wheel to be presented on a display screen of the computing device 130 while the task is in the progress state. As another example, the user interface 124 can generate simulated speech indicating successful completion of the task when the task is in the final state and was completed successfully. By using the user interface 124 of the digital personal assistant 120 to report the status of the task, the response can come from within the context of the user interface 124 without the user interface of the application appearing.
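How the assistant's user interface might turn a reported state into output can be sketched as below. Only the spinning wheel and the spoken success message come from the text; the `assistant_outputs` function, the string labels, and the spoken failure message are assumptions.

```python
from typing import List, Optional


def assistant_outputs(state: str,
                      succeeded: Optional[bool] = None,
                      has_display: bool = True) -> List[str]:
    """Map a task state to the outputs the assistant's own UI would produce."""
    outputs: List[str] = []
    if state == "progress":
        if has_display:
            outputs.append("display: spinning wheel")
    elif state == "final":
        # The failure wording is an assumption; the text only describes
        # simulated speech for successful completion.
        message = "task completed" if succeeded else "task failed"
        if has_display:
            outputs.append(f"display: {message}")
        outputs.append(f"speech: {message}")
    return outputs
```

On a device without a display surface, `assistant_outputs("final", succeeded=True, has_display=False)` yields only the spoken message, so the task can still be reported entirely within the assistant.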

It should be noted that the voice-enabled application 110 can be created by the producer of the digital personal assistant 120 or by a third party that is different from the producer. Interoperation of the digital personal assistant 120 and the voice-enabled application 110 can be achieved by complying with an application-to-application software contract and by defining functionality in the command data structure 140. The voice-enabled application 110 can be capable of operating as a stand-alone application or only as a component of the digital personal assistant 120. As a stand-alone application, the voice-enabled application 110 can be launched outside of the digital personal assistant 120 as a foreground process, such as by tapping or double-clicking an icon associated with the voice-enabled application 110 and displayed on a display screen of the computing device 130. The voice-enabled application 110 can present a user interface when it is launched, and the user can interact with the user interface to perform tasks. The interaction can use voice input only, or other modes of input can also be used, such as text input or gesturing. An application invoked by the digital personal assistant 120 can be installed on the computing device 130 or can be a web service.

The digital personal assistant 120 can invoke web services, such as a web service 162 executing on a remote server computer 160. A web service is a software function provided at a network address over a network, such as a network 170. The network 170 can include a local area network (LAN), a wide area network (WAN), the Internet, an intranet, a wired network, a wireless network, a cellular network, combinations thereof, or any network suitable for providing a channel for communication between the computing device 130 and the remote server computer 160. It should be appreciated that the network topology illustrated in FIG. 1 has been simplified and that multiple networks and networking devices can be used to interconnect the various computing systems disclosed herein. The web service 162 can be invoked as part of, or as a main part of, the kernel of the digital personal assistant 120. For example, the web service 162 can be invoked as a subroutine of the natural language processing module 122. Additionally or alternatively, the web service 162 can be an application defined in the command data structure 140 and can be capable of being launched headlessly from the digital personal assistant 120.

Example Software Architecture Including a Digital Personal Assistant

FIG. 2 is a diagram depicting an example software architecture 200 for headlessly completing a task of an application in the background of the digital personal assistant 120. When a task of an application is performed headlessly, the task can execute in the background, and the user interface of the application does not appear as a result of the task being performed. Instead, the user interface of the digital personal assistant 120 can be used to provide output to, and/or receive input from, the user, so that the user interacts within the context of the digital personal assistant 120 rather than the context of the application. Thus, a headlessly executed task of an application can run in the background for the duration of the task's execution, and the application's user interface never appears. A computing device, such as the computing device 130, can execute software for the digital personal assistant 120, an operating system (OS) kernel 210, and an application 230, organized according to the architecture 200.

The OS kernel 210 generally provides an interface between the software components and the hardware components of the computing device 130. The OS kernel 210 can include components for rendering (e.g., rendering visual output to a display, generating audio output and other sounds for a speaker, and generating vibration output for a motor), components for networking, components for process management, components for memory management, components for location tracking, and components for speech recognition and other input processing. The OS kernel 210 can manage user input functions, output functions, storage access functions, network communication functions, memory management functions, process management functions, and other functions of the computing device 130. The OS kernel 210 can provide access to these functions to the digital personal assistant 120 and the application 230, such as through various system calls.

A user can generate user input (such as voice, tactile, and motion input) to interact with the digital personal assistant 120. The digital personal assistant 120 can be made aware of the user input via the OS kernel 210, which can include functionality for creating messages in response to user input. The messages can be used by the digital personal assistant 120 or other software. The user input can include tactile input such as touchscreen input, button presses, or key presses. The OS kernel 210 can include functionality for recognizing taps, finger gestures, and similar touchscreen input, as well as button input and key press input. The OS kernel 210 can receive input from the microphone 150 and can include functionality for recognizing spoken commands and/or words from voice input. The OS kernel 210 can receive input from an accelerometer and can include functionality for recognizing orientation or motion, such as shaking.

The user interface (UI) input processing engine 222 of the digital personal assistant 120 can wait for user input event messages from the OS kernel 210. The UI event messages can indicate recognized words from voice input; a panning gesture, flicking gesture, dragging gesture, or other gesture on a touchscreen of the device; a tap on the touchscreen; keystroke input; a shaking gesture; or another UI event (e.g., directional button or trackball input). The UI input processing engine 222 can translate the UI event messages from the OS kernel 210 into information sent to the control logic 224 of the digital personal assistant 120. For example, the UI input processing engine 222 can include natural language processing capabilities and can indicate that a particular application name has been spoken or typed, or that the user has given a voice command. Alternatively, the natural language processing capabilities can be included in the control logic 224.

The control logic 224 can receive information from various modules of the digital personal assistant 120, such as the UI input processing engine 222, the personalized information store 226, and the command data structure 140, and the control logic 224 can make decisions and perform operations based on the received information. For example, the control logic 224 can determine whether the digital personal assistant 120 should perform a task on behalf of the user, such as by parsing a stream of spoken text to determine whether a voice command has been given.

The control logic 224 can wait for the entire user command to be spoken before acting on it, or the control logic 224 can begin acting on the command while it is still being spoken and before it is finished. For example, the control logic 224 can analyze intermediate strings of the spoken command and attempt to match these strings to one or more applications defined in the command data structure 140. When the probability that an application will be called exceeds a threshold, the application can be warmed up so that it can respond to the user more promptly. Multiple applications and/or functions can be speculatively warmed up in anticipation of being called, and an application can be halted if it is determined that it will not be called. For example, when a user begins a spoken command with the name of a particular application, there is a high probability that the particular application will be called, and so that application can be warmed up. As another example, some partial command strings can be limited to a small set of applications defined in the command data structure 140, and that set of applications can be warmed up in parallel when there is a match on the partial command string. Specifically, the command data structure 140 may have only two applications with commands containing the word "take," such as a camera application with the command "take a picture" and a memo application with the command "take a memo." The control logic 224 can begin warming up both the camera application and the memo application when the word "take" is recognized, and can then halt the memo application when the full command "take a picture" is recognized. Warming up an application can include allocating memory, prefetching instructions, establishing a communication session, retrieving information from a database, starting a new execution thread, raising an interrupt, or other suitable application-specific operations. Services of the OS kernel 210 can be called during warm-up, such as process management services, memory management services, and network services.
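
The "take" example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `COMMANDS` table, the function names, and the `max_apps` limit are hypothetical, and a simple cap on the number of candidate applications stands in for the probability threshold described above.

```python
# Hypothetical command table: application name -> commands it supports.
COMMANDS = {
    "camera": ["take a picture"],
    "memo": ["take a memo"],
    "bus-app": ["tell me the buses home"],
}


def candidates(partial):
    """Return the applications with a command that starts with the words heard so far."""
    words = partial.lower().split()
    return {
        app
        for app, cmds in COMMANDS.items()
        if any(cmd.split()[: len(words)] == words for cmd in cmds)
    }


def update_warm_set(warm, partial, max_apps=2):
    """Warm up newly matching applications and halt those that no longer match."""
    matched = candidates(partial)
    if len(matched) > max_apps:  # too ambiguous: warm nothing (assumed policy)
        matched = set()
    for app in sorted(matched - warm):
        print(f"warm up {app}")
    for app in sorted(warm - matched):
        print(f"halt {app}")
    return matched


warm = set()
warm = update_warm_set(warm, "take")            # warms up camera and memo
warm = update_warm_set(warm, "take a picture")  # halts memo, keeps camera
```

A real implementation would replace the `print` calls with the application-specific warm-up operations (allocating memory, opening sessions, and so on).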

The spoken text can include contextual information, and the control logic 224 can resolve the contextual information so that the user voice command becomes context-free. Contextual information can include the current location of the computing device 130, the current time, the orientation of the device, and personal information stored in the personalized information store 226. Personal information can include: user relationships, such as the names of the user, the user's spouse, or the user's children; user-specific locations, such as home, work, school, daycare, or doctor addresses; information from the user's contact list or calendar; the user's favorite color, restaurant, or method of transportation; important birthdays, anniversaries, or other dates; and other user-specific information. The user can give a command with contextual information, and the control logic 224 can translate the command into a context-free command. For example, the user can give the command "Bus-app, tell me the buses home within the next hour." In this example, the contextual information in the command is the current date and time, the current location, and the location of the user's home.

The control logic 224 can obtain the current time from the OS kernel 210, which can maintain or have access to a real-time clock. The control logic 224 can obtain current location data for the computing device 130 from the OS kernel 210, which can in turn obtain the data from a local component of the computing device 130. For example, the location can be determined from Global Positioning System (GPS) data, by triangulation between cell towers of a cellular network, by reference to the physical locations of nearby Wi-Fi routers, or by another mechanism. The control logic 224 can obtain the location of the user's home from the personalized information store 226. The personalized information store 226 can be kept in secondary or other non-volatile storage of the computing device 130. Thus, the control logic 224 can receive the personalized information via the OS kernel 210, which can access the storage resource (e.g., the personalized information store 226). Once the contextual information has been resolved, the command can be translated into a context-free command. For example, if it is Friday at 6:00 p.m., the user is at 444 Main Street, and the user's home is at 123 Pleasant Drive, the context-free command can be "Bus-app, tell me the buses arriving near 444 Main Street and passing near 123 Pleasant Drive between 6:00 and 7:00 p.m. on Fridays."
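
As a rough illustration of this translation step, the Python sketch below fills the three pieces of context (time, current location, home location) into a context-free query. The function name, its parameters, and the 24-hour time format are assumptions made for the example; the date used is arbitrary and simply happens to fall on a Friday.

```python
from datetime import datetime, timedelta

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def resolve_context(now, current_address, home_address):
    """Build a context-free bus query from the current time and two addresses."""
    end = now + timedelta(hours=1)  # "within the next hour"
    day = WEEKDAYS[now.weekday()]
    return (
        f"Bus-app, tell me the buses arriving near {current_address} "
        f"and passing near {home_address} "
        f"between {now:%H:%M} and {end:%H:%M} on {day}s"
    )


# In the architecture described above, the time and location would come from
# the OS kernel and the home address from the personalized information store.
query = resolve_context(datetime(2024, 1, 5, 18, 0), "444 Main Street", "123 Pleasant Drive")
print(query)
```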

The user command can be performed by the control logic 224 (such as when the command is a native command of the digital personal assistant 120), by an application 230 installed on the computing device 130 (such as when the command is associated with the application 230), or by the web service 162 (such as when the command is associated with the web service 162). The command data structure 140 can specify which commands are associated with which applications and whether a command can be performed in the foreground or in the background. For example, the command data structure 140 can map user voice commands to functions supported by available third-party voice-enabled applications.
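
One way to picture such a command data structure is a lookup table keyed by application and command. The sketch below is hypothetical throughout: the field names, the table entries, and the `route` function are invented for illustration and are not taken from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CommandEntry:
    target: str       # "native", "application", or "web_service"
    handler: str      # hypothetical name of the function to invoke
    background: bool  # True if the command may be performed headlessly


# Hypothetical contents of a command data structure.
COMMAND_TABLE = {
    ("assistant", "set a reminder"): CommandEntry("native", "create_reminder", True),
    ("movie-app", "add to queue"): CommandEntry("application", "add_to_queue", True),
    ("movie-app", "play"): CommandEntry("application", "play_movie", False),
    ("weather", "forecast"): CommandEntry("web_service", "get_forecast", True),
}


def route(app, command):
    """Describe where and how a recognized command should be performed."""
    entry = COMMAND_TABLE.get((app, command))
    if entry is None:
        return "unknown command"
    mode = "background" if entry.background else "foreground"
    return f"{entry.target}:{entry.handler} ({mode})"


print(route("movie-app", "add to queue"))
print(route("movie-app", "play"))
```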

When the control logic 224 determines that the user command is associated with a predefined function 232 of the application 230, the control logic 224 can cause the predefined function 232 to be executed. If the control logic 224 determines that the predefined function 232 is to be executed as a background process, the predefined function 232 can be executed in the background. For example, the control logic 224 can send a request 240 to the predefined function 232 by raising an interrupt, writing to shared memory, writing to a message queue, passing a message, or starting a new execution thread (such as via the process management component of the OS kernel 210). The application 230 can perform the predefined function 232 and return a response 242 to the control logic 224 by raising an interrupt, writing to shared memory, writing to a message queue, or passing a message. The response can include the state of the application 230 and/or other information responsive to the user command.
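
Of the mechanisms listed, the message-queue variant is easy to sketch with Python's standard `queue` and `threading` modules. The message fields (`task`, `item`, `state`, `message`) and function names below are assumptions for illustration; a worker thread stands in for the application's background process.

```python
import queue
import threading


def predefined_function(requests, responses):
    """Stand-in for an application function running in the background."""
    request = requests.get()
    responses.put({"state": "success", "message": f"added {request['item']}"})


def send_request(item, timeout=5.0):
    """Send a request over one queue and wait for the response on another."""
    requests = queue.Queue()
    responses = queue.Queue()
    worker = threading.Thread(
        target=predefined_function, args=(requests, responses), daemon=True
    )
    worker.start()
    requests.put({"task": "add_to_queue", "item": item})
    try:
        return responses.get(timeout=timeout)
    except queue.Empty:
        # No response in time: report a timeout instead of blocking forever.
        return {"state": "timeout", "message": ""}


response = send_request("Movie-X")
print(response)
```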

When the control logic 224 determines that the command is associated with the web service 162, the control logic 224 can cause the web service 162 to be called. For example, a request 260 can be sent to the web service 162 through the networking component of the OS kernel 210. The networking component can format the request and forward it over the network 170 to the web service 162 (such as by encapsulating the request in network packets according to a protocol of the network 170) so that the user command is performed. The request 260 can include multiple steps, such as opening a communication channel (e.g., a socket) between the control logic 224 and the web service 162 and sending information related to the user command. The web service 162 can answer the request 260 with a response that is transmitted over the network 170 and forwarded by the networking component to the control logic 224 as a reply 262. The response from the web service 162 can include the state of the web service 162 and other information responsive to the user command.

The control logic 224 can generate output to be presented to the user based on the responses from the applications, with the assistance of the UI output rendering engine 228 and the rendering component of the OS kernel 210. For example, the command data structure 140 can map states received from the functions to responses provided to the user by the voice-controlled digital personal assistant 120. In general, the control logic 224 can provide high-level output commands to the UI output rendering engine 228, which can produce lower-level output primitives for the rendering component of the OS kernel 210 for visual output on a display, audio and/or voice output over a speaker or headphones, and vibration output from a motor. For example, the control logic 224 can send a text-to-speech command with a text string to the UI output rendering engine 228, which can generate digital audio data simulating a spoken voice.

The control logic 224 can determine what information to provide to the user based on the state of the application. The states can correspond to the beginning, processing, confirmation, disambiguation, or completion of a user command. The command data structure 140 can map the states of the application to different responses to be provided to the user. The types of information that can be provided include, for example, display text, simulated speech, a deep link back to the application, a link to a web page or website, and Hypertext Markup Language (HTML) based web content.

Example application states

FIG. 3 is a diagram of an example state machine 300 for an application interfacing with the digital personal assistant 120 in a headless manner. The application can begin in either a warm-up state 310 or an initial state 320. The warm-up state 310 can be entered when the digital personal assistant 120 causes the application to warm up, such as when the application name is known but the spoken command is not yet complete. The application remains in the warm-up state 310 until the warm-up operations are complete. When the warm-up operations are complete, the application can transition to the initial state 320.

The initial state 320 can be entered after the warm-up state 310 is completed or after the digital personal assistant 120 provides the user command to the application. During the initial state 320, the user command is processed by the application. If the command is unambiguous but will take more than a predetermined amount of time (such as 5 seconds) to complete, the state can transition to the in-progress state 330 while the command is being performed. If the command is unambiguous and may result in an important or destructive operation being performed, the state can transition to the confirmation state 340. If the command is somewhat ambiguous, but the ambiguity can be clarified by choosing among a few options, the state can transition to the disambiguation state 350. If the command is ambiguous and cannot be disambiguated with a few options, the state can transition to a final state 360, such as a failure state or a redirection state. If the command cannot be performed, the state can transition to a final state 360, such as a failure state. If the command can be completed in less than the predetermined amount of time and does not require confirmation from the user, the state can transition to a final state 360, such as a success state. It should be noted that the final state 360 can be a single state with multiple conditions (where the conditions are success, failure, redirection, and timeout) or a group of final states (where the states are success, failure, redirection, and timeout).
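
The transitions out of the initial state can be summarized as a small decision function. The sketch below is one possible reading, not the patent's implementation: the parameter names are hypothetical, the ambiguity of a command is modeled as the number of matching interpretations (`options`), the limit of 10 options is borrowed from the disambiguation discussion, and the order in which the conditions are checked is an assumption.

```python
from enum import Enum


class State(Enum):
    WARM_UP = 310
    INITIAL = 320
    IN_PROGRESS = 330
    CONFIRMATION = 340
    DISAMBIGUATION = 350
    FINAL = 360


def next_state(executable, options, destructive, expected_seconds,
               limit=5.0, max_options=10):
    """Return (state, final_condition) for a command processed in the initial state."""
    if not executable or options == 0:
        return State.FINAL, "failure"      # the command cannot be performed
    if options > max_options:
        return State.FINAL, "redirect"     # too ambiguous to resolve with a short list
    if options > 1:
        return State.DISAMBIGUATION, None  # ask the user to choose
    if destructive:
        return State.CONFIRMATION, None    # ask the user to confirm first
    if expected_seconds > limit:
        return State.IN_PROGRESS, None     # long-running but unambiguous
    return State.FINAL, "success"


print(next_state(True, 3, False, 1.0))
print(next_state(True, 1, False, 30.0))
```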

The in-progress state 330 can indicate that an operation of the user command is being performed or attempted. During the in-progress state 330, the application can provide information to the user by sending a text-to-speech (TTS) string or a graphical user interface (GUI) string to the digital personal assistant 120, so that the information can be presented to the user through the user interface of the digital personal assistant 120. Additionally or alternatively, default information (such as a spinning wheel, an hourglass, and/or a cancel button) can be presented to the user through the user interface of the digital personal assistant 120 during the in-progress state 330.

During the in-progress state 330, the application can monitor the progress of the operations and determine whether it can stay in the in-progress state 330 or should transition to a final state 360. In one embodiment, the application can start a timer (such as for 5 seconds), and if the application has not made sufficient progress before the timer expires, the state can transition to a final state 360, such as a timeout state. If the application has made sufficient progress, the timer can be restarted and progress can be checked again when the timer next expires. The application can have a maximum time limit for staying in the in-progress state 330, and if that limit is exceeded, the state can transition to a final state 360, such as a timeout state. The operations associated with the user command can complete (successfully or unsuccessfully), and the state can transition to the appropriate final state 360. While the application is in the in-progress state 330, the user can terminate the application by giving a command through the user interface of the digital personal assistant 120. For example, the user can press or click a "cancel" or "back" button on the display, or say "cancel." Cancelling the command can cause the digital personal assistant 120 to stop the application and either display its home screen or exit.
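
The timer-based progress check can be modeled without real clocks by treating each entry in a list as the progress observed when the timer expires. The function below is a simplified sketch; the names, the 30-second maximum, and the definition of "sufficient progress" as a minimum increase in the completed fraction are all assumptions.

```python
def monitor(progress_samples, interval=5.0, max_time=30.0, min_step=0.05):
    """Return the outcome given the progress measured at each timer expiry.

    `progress_samples` holds the completed fraction (0.0 to 1.0) observed each
    time the timer expires.
    """
    elapsed = 0.0
    previous = 0.0
    for progress in progress_samples:
        elapsed += interval
        if progress >= 1.0:
            return "success"
        if elapsed >= max_time or progress - previous < min_step:
            return "timeout"   # overall limit reached or not enough progress
        previous = progress    # enough progress: restart the timer
    return "in progress"


print(monitor([0.2, 0.5, 1.0]))  # completes on the third check
print(monitor([0.2, 0.21]))      # stalls between the first and second checks
```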

The confirmation state 340 can indicate that the application is waiting for confirmation from the user before completing a task. When the digital personal assistant 120 detects that the application is in the confirmation state 340, a prompt for a yes/no response can be presented to the user through the user interface of the digital personal assistant 120. The application can provide the digital personal assistant 120 with a TTS string phrased as a question with a yes or no answer. The digital personal assistant 120 can speak the TTS string provided by the application and listen for a "yes" or "no" answer. If the user's response does not resolve to a yes or no answer, the digital personal assistant 120 can continue to ask the question up to a predefined number of times (such as three times). If all attempts have been exhausted, the digital personal assistant 120 can say a default phrase, such as "I'm sorry, I don't understand. Tap below to choose an answer," and can stop listening. If the user taps yes or no, the digital personal assistant 120 can send the user's choice to the application. If the user taps a microphone icon, the digital personal assistant 120 can again attempt to recognize a spoken answer (such as by resetting a counter that counts the number of attempts to answer verbally). The digital personal assistant 120 can loop until there is a match or until the user cancels or presses the back button on the display. If the application receives an affirmative response from the digital personal assistant 120, the application can attempt to complete the task. If the task completes successfully, the state can transition to the final state 360 with a success condition. If the task fails to complete successfully or the application is cancelled, the state can transition to the final state 360 with a failure condition. If the task will take more than a predetermined amount of time to complete, the state can transition to the in-progress state 330 while the task is being performed.
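
The retry behavior of the confirmation prompt can be sketched as a bounded loop over recognized answers. The word lists and the `confirm` function are hypothetical; a returned `None` represents the point at which the assistant would speak the default phrase, stop listening, and wait for a tap.

```python
YES = {"yes", "yeah", "yep", "sure"}
NO = {"no", "nope", "nah"}


def confirm(spoken_answers, max_attempts=3):
    """Return True or False once an answer is understood, or None if attempts run out."""
    for attempt, answer in enumerate(spoken_answers, start=1):
        if attempt > max_attempts:
            break
        word = answer.strip().lower()
        if word in YES:
            return True
        if word in NO:
            return False
    return None  # caller falls back to the tap prompt and stops listening


print(confirm(["maybe", "Yes"]))              # understood on the second attempt
print(confirm(["uh", "hmm", "what", "yes"]))  # attempts exhausted before "yes"
```

Tapping the microphone icon would correspond to calling `confirm` again, which resets the attempt count.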

The disambiguation state 350 can indicate that the application is waiting for the user to clarify among a limited number of options (such as ten or fewer) before completing a task. The application can provide the digital personal assistant 120 with a TTS string, a GUI string, and/or a list of items for the user to choose from. The list of items can be provided as a template with one or more pieces of information to present to the user for each item, such as a title, a description, and/or an icon. The digital personal assistant 120 can present the list of items to the user using the information provided by the application, and can prompt for and listen for a selection from the user. The user can select from the list using flexible or non-flexible selection. Non-flexible selection means that the user can select from the list in only one way, whereas flexible selection means that the user can select from the list in multiple different ways. For example, the user can select from the list based on the numerical order in which the items are listed, such as by saying "first" or "second" to select the first or second item, respectively. As another example, the user can select from the list based on spatial relationships between the items, such as "the bottom one," "the top one," "the one on the right," or "the second from the bottom." As another example, the user can select from the list by saying the title of an item.
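
Flexible selection can be illustrated with a small resolver that accepts a title, an ordinal, or a simple spatial phrase. This sketch assumes a vertical list and covers only a handful of phrasings; the `ORDINALS` table and `select` function are hypothetical.

```python
ORDINALS = {"first": 0, "second": 1, "third": 2, "fourth": 3, "fifth": 4}


def select(items, utterance):
    """Return the chosen item, or None if the utterance cannot be resolved."""
    text = utterance.strip().lower()
    if not items:
        return None
    # 1. Selection by title.
    for item in items:
        if item.lower() == text:
            return item
    # 2. Selection by ordinal, counted from the top or from the bottom.
    from_bottom = "from the bottom" in text
    for word, index in ORDINALS.items():
        if word in text.split():
            position = len(items) - 1 - index if from_bottom else index
            return items[position] if 0 <= position < len(items) else None
    # 3. Selection by position in a vertical list.
    if "bottom" in text or "last" in text:
        return items[-1]
    if "top" in text:
        return items[0]
    return None


movies = ["Movie-X I", "Movie-X II", "Movie-X III"]
print(select(movies, "second"))
print(select(movies, "the one at the bottom"))
```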

As a specific example of disambiguation, the user can say to the digital personal assistant 120, "Movie-Application, add Movie-X to my queue." However, there may be three versions of Movie-X, such as an original and two sequels: Movie-X I, Movie-X II, and Movie-X III. In response to the spoken command, the digital personal assistant 120 can launch the Movie-Application in the background with the command to add Movie-X to the queue. The Movie-Application can search for Movie-X and determine that there are three versions. Thus, the Movie-Application can transition to the disambiguation state 350 and send the three alternative choices to the digital personal assistant 120. The digital personal assistant 120 can present the three choices to the user through its user interface, and the user can select one from the list. When the user makes a valid selection, the digital personal assistant 120 can send the response to the Movie-Application, and the correct movie can be added to the queue.

If the user's response cannot be resolved to an item on the list, the digital personal assistant 120 can continue to ask the question up to a predefined number of times. If all attempts have been exhausted, the digital personal assistant 120 can say a default phrase, such as "I'm sorry, I don't understand. Tap below to choose an answer," and can stop listening. If the user taps one of the items on the displayed list, the digital personal assistant 120 can send the user's choice to the application. If the user taps a microphone icon, the digital personal assistant 120 can again attempt to recognize a spoken answer (such as by resetting a counter that counts the number of attempts to answer verbally). The digital personal assistant 120 can loop until there is a match or until the user cancels or presses the back button on the display. If the application receives a valid response from the digital personal assistant 120, the application can attempt to complete the task. If the task requires user confirmation before action is taken, the state can transition to the confirmation state 340. If the task completes successfully, the state can transition to the final state 360 with a success condition. If the task fails to complete successfully or the application is cancelled, the state can transition to the final state 360 with a failure condition. If the task will take more than a predetermined amount of time to complete, the state can transition to the in-progress state 330 while the task is being performed.

It should be understood that the example state machine 300 can be extended with additional or alternative states to enable various multi-turn conversations between the user and an application. Disambiguation (via the disambiguation state 350) and confirmation (via the confirmation state 340) are specific examples of multi-turn conversations. Generally, in a multi-turn conversation, a headless application can request additional information from the user without surfacing its own user interface. Instead, the information can be obtained from the user by the digital personal assistant 120 on behalf of the application. Thus, the digital personal assistant 120 can act as a conduit between the user and the application.

The final state 360 may indicate that the application has successfully completed the task, failed to complete the task, timed out, or is requesting that the application be launched in the foreground (redirection). As described above, the final state 360 may be a single state with multiple conditions (e.g., success, failure, redirection, and timeout) or a group of final states, one for each of those conditions. The application may provide a TTS string, a GUI string, a list of items (provided via a template), and/or launch parameters to the digital personal assistant 120. The digital personal assistant 120 may present the information provided by the application to the user using the user interface of the digital personal assistant 120. Additionally or alternatively, the digital personal assistant 120 may present predefined or recorded responses associated with the different conditions. For example, if a timeout occurs or the task fails, the digital personal assistant 120 may say, "Sorry! I couldn't get that done for you. Can you please try again later?" As another example, if the application is requesting a redirect, the digital personal assistant 120 may say, "Sorry. <appName> is not responding. Launching <appName>" and the digital personal assistant 120 may attempt to launch the application in the foreground using the original voice command and the launch parameters (if the application provided any). As another example, if the application successfully completes the task, the digital personal assistant 120 may say, "I've done that for you."
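The mapping from final-state conditions to spoken responses can be sketched in a few lines. The following is a minimal illustration only, not the patented implementation: the names `FinalCondition`, `AppResult`, and `final_response` are hypothetical, and the rule that an application-provided TTS string takes precedence over a predefined response is an assumption made for the sketch.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class FinalCondition(Enum):
    SUCCESS = "success"
    FAILURE = "failure"
    REDIRECT = "redirect"
    TIMEOUT = "timeout"


@dataclass
class AppResult:
    """Hypothetical shape of what a headless application returns in the final state."""
    condition: FinalCondition
    tts_string: Optional[str] = None     # application-provided text-to-speech string
    launch_params: Optional[str] = None  # only meaningful for a redirect


def final_response(app_name: str, result: AppResult) -> str:
    """Choose what the assistant says for a final state."""
    # Assumption: text supplied by the application wins over predefined responses.
    if result.tts_string:
        return result.tts_string
    if result.condition is FinalCondition.SUCCESS:
        return "I've done that for you."
    if result.condition is FinalCondition.REDIRECT:
        return f"Sorry. {app_name} is not responding. Launching {app_name}"
    # Timeout and failure share one predefined response in the description.
    return "Sorry! I couldn't get that done for you. Can you please try again later?"
```

A caller handling a redirect would additionally launch the application in the foreground with `launch_params`; that step is omitted here because it is platform specific.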

Example command definition

FIG. 4 is an example of a command definition 400 that conforms to a schema and that can be used to create a data structure (such as the command data structure 140) for enabling interfacing between a third-party application and the digital personal assistant 120. The command definition 400 can be written in various languages, such as Extensible Markup Language (XML) or a subset of XML defined by a schema. For example, the schema can define the structure of the command definition, such as the valid elements, the hierarchy of elements, the valid and optional attributes for each element, and other suitable criteria. The command definition 400 can be used by the digital personal assistant 120 to assist in parsing a user utterance into different components, such as an application, a command or task, and a data item or slot, where the data item is optional. For example, the command "MovieAppService, add MovieX to my queue" may be parsed into an application ("MovieAppService"), a command ("Add"), and a data item ("MovieX"). The command definition 400 may include elements for defining an application name, the tasks or commands of the application, alternative phrases for natural language processing, and responses associated with different application states.

One or more applications can be defined in the command definition 400. An application can be a third-party or other application that is installed on the computing device or that is a web service. Information related to an application can be delimited by the element that defines the application. For example, the application name can be defined by an <AppName> element, and the elements between <AppName> elements can be associated with the leading <AppName> element. In the command definition 400, the application name is "MovieAppService", and the elements after the <AppName> element are associated with the "MovieAppService" application.

应用名之后的命令是该应用的命令。命令可以用<Command>(命令)元素来标识。命令元素的属性可包括该命令的名称(例如,“Name”)和该命令的激活类型(例如,“ActivationType”)。例如,对于要在前台启动的命令,激活类型可以是“前台”,且对于要在后台启动的命令,激活类型可以是“后台”。“ActivationType”属性可以是可任选的,其中默认激活类型是前台。The commands following the application name are the commands for that application. A command can be identified using the <Command> element. Attributes of the command element can include the name of the command (e.g., "Name") and the activation type of the command (e.g., "ActivationType"). For example, for a command to be launched in the foreground, the activation type can be "foreground," and for a command to be launched in the background, the activation type can be "background." The "ActivationType" attribute can be optional, with the default activation type being foreground.

The <ListenFor> element can be nested within a <Command> element and can be used to define one or more ways in which the command can be spoken. When natural language processing is performed, optional or carrier words can be provided to the digital personal assistant 120 as hints. Carrier words can be identified within square brackets: []. Data items can be identified within curly braces: {}. In the command definition 400, there are generally two alternative ways to invoke the "Add" command, as defined by the two <ListenFor> elements. For example, saying either "add MovieX to my queue" or "add MovieX to my MovieAppService queue" can cause the digital personal assistant 120 to launch the "Add" command of the MovieAppService in the background. It should be noted that predefined phrases can be identified using the keyword "builtIn:" within a set of braces: {builtIn:<phrase identifier>}.
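To make the bracket and brace notation concrete, the sketch below converts a <ListenFor>-style phrase into a regular expression. It is an illustration under stated assumptions, not the assistant's actual matcher: the pattern strings, the function names `listen_for_to_regex` and `match_listen_for`, and the choice of regular expressions are all hypothetical, slot names are assumed to be valid Python identifiers, and {builtIn:...} phrases are not handled.

```python
import re
from typing import Dict, Optional

# One token is a [carrier] word, a {data item}, or a plain literal word.
_TOKEN = re.compile(r"\[[^\]]*\]|\{[^}]*\}|[^\s\[\]{}]+")


def listen_for_to_regex(pattern: str) -> re.Pattern:
    """Translate a ListenFor-style phrase into a compiled regular expression."""
    pieces = []
    for token in _TOKEN.findall(pattern):
        if token.startswith("["):
            # Carrier word: may be present or absent in the utterance.
            pieces.append(r"(?:\s+" + re.escape(token[1:-1].strip()) + ")?")
        elif token.startswith("{"):
            # Data item: captured under its slot name, as short as possible.
            pieces.append(r"\s+(?P<" + token[1:-1].strip() + ">.+?)")
        else:
            pieces.append(r"\s+" + re.escape(token))
    return re.compile("^" + "".join(pieces) + "$", re.IGNORECASE)


def match_listen_for(pattern: str, utterance: str) -> Optional[Dict[str, str]]:
    """Return the captured data items, or None if the utterance does not match."""
    # A leading space lets every token, including the first, require whitespace.
    match = listen_for_to_regex(pattern).match(" " + utterance.strip())
    return match.groupdict() if match else None
```

With the hypothetical pattern `add {movie} to [my] queue`, the utterances "add MovieX to my queue" and "add MovieX to queue" both yield the data item `MovieX`, while a run-together "addMovieX to my queue" does not match.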

<Feedback>(反馈)元素可被嵌套在<Command>元素内并且可被用来定义在数字个人助理120已成功识别出来自用户的所讲命令时要向用户说出的短语。作为补充或替换,<Feedback>元素可以定义在所讲命令正被数字个人助理120解析时要显示给用户的文本串。The <Feedback> element can be nested within a <Command> element and can be used to define a phrase to be spoken to the user when the digital personal assistant 120 has successfully recognized a spoken command from the user. Additionally or alternatively, the <Feedback> element can define a text string to be displayed to the user while the spoken command is being parsed by the digital personal assistant 120.

<Response>(响应)元素可被嵌套在<Command>元素内且可被用来定义由数字个人助理120提供给用户的一个或多个响应。每一响应与由“State(状态)”属性所定义的应用状态相关联。状态可以用于最终状态(诸如成功和失败)或用于中间状态(诸如进行中)。可定义多种类型的响应,诸如例如用于将文本显示在屏幕上的<DisplayString>(显示串)、用于将被说给用户的文本的<TTSString>、用于到网站的深链接的<AppDeepLink>、以及用于到网站的较不深的链接的<WebLink>。由<Response>元素定义的响应可以用由该应用提供的附加响应信息来扩充。The <Response> element can be nested within a <Command> element and can be used to define one or more responses provided to the user by the digital personal assistant 120. Each response is associated with an application state defined by the "State" attribute. States can be for final states (such as success and failure) or for intermediate states (such as in progress). Multiple types of responses can be defined, such as, for example, <DisplayString> for text displayed on the screen, <TTSString> for text to be spoken to the user, <AppDeepLink> for a deep link to a website, and <WebLink> for a less deep link to a website. The response defined by the <Response> element can be augmented with additional response information provided by the application.
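Because FIG. 4 is not reproduced in this text, the following sketch uses a hypothetical command definition assembled only from the element and attribute names mentioned above (<AppName>, <Command>, <ListenFor>, <Feedback>, <Response>, <DisplayString>, <TTSString>, "Name", "ActivationType", "State"). The root element name `CommandSet`, the exact nesting, the phrase texts, and the function `load_commands` are assumptions for illustration. It shows how such a definition could be loaded into a simple lookup structure, including the default foreground activation type.

```python
import xml.etree.ElementTree as ET

# Hypothetical command definition; layout and wording are illustrative only.
COMMAND_DEFINITION = """
<CommandSet>
  <AppName>MovieAppService</AppName>
  <Command Name="Add" ActivationType="background">
    <ListenFor>add {movie} to [my] queue</ListenFor>
    <ListenFor>add {movie} to [my] MovieAppService queue</ListenFor>
    <Feedback>Adding {movie} to your queue</Feedback>
    <Response State="success">
      <DisplayString>{movie} is in your queue</DisplayString>
      <TTSString>I added {movie} to your queue</TTSString>
    </Response>
    <Response State="failure">
      <TTSString>Sorry, I could not add {movie}</TTSString>
    </Response>
  </Command>
  <Command Name="Browse">
    <ListenFor>browse [my] queue</ListenFor>
  </Command>
</CommandSet>
"""


def load_commands(xml_text):
    """Parse a command definition into (application name, {command name: details})."""
    root = ET.fromstring(xml_text.strip())
    app_name = root.findtext("AppName")
    commands = {}
    for cmd in root.findall("Command"):
        commands[cmd.get("Name")] = {
            # ActivationType is optional; foreground is the default.
            "activation": cmd.get("ActivationType", "foreground"),
            "listen_for": [e.text for e in cmd.findall("ListenFor")],
            "feedback": cmd.findtext("Feedback"),
            # Map each application state to its response strings by tag name.
            "responses": {
                resp.get("State"): {child.tag: child.text for child in resp}
                for resp in cmd.findall("Response")
            },
        }
    return app_name, commands
```

The resulting dictionary plays the role of a very small command data structure: the assistant could look up a matched command, read its activation type, and pick the response for the state the application reports.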

Example sequence diagram

FIG. 5 is an example sequence diagram 500 illustrating the communication of multiple execution threads (510, 520, and 530) for headlessly executing a function of a third-party application from within the digital personal assistant 120. The UI thread 510 and the control thread 520 may be parallel threads of a multi-threaded embodiment of the digital personal assistant 120. The UI thread 510 may be primarily responsible for capturing input from the user interface of the digital personal assistant 120 and displaying output to that user interface. For example, voice input, tactile input, and/or text input may be captured by the UI thread 510. In one embodiment, the UI thread 510 may perform natural language processing on the input and may match the user's spoken command with a command in the command data structure 140. When the spoken command is determined to match a command in the command data structure 140, the command may be passed to the control thread 520 for further processing. In an alternative embodiment, the UI thread 510 may capture speech-to-text input and pass the individual words to the control thread 520, which may perform natural language processing on the input and may match the user's spoken command with a command in the command data structure 140.

The control thread 520 may be primarily responsible for communicating with the application, tracking the application's progress, and interfacing with the UI thread 510. For example, the UI thread 510 may notify the control thread 520 that the user has spoken to the user interface of the digital personal assistant 120. Words or commands may be received by the control thread 520, and the control thread 520 may notify the UI thread 510 when it has recognized the user command. The UI thread 510 may indicate to the user, via the user interface of the digital personal assistant 120, the progress made on the command. The UI thread 510 or the control thread 520 may determine that the command is to be launched headlessly by retrieving the command's attributes from the command data structure 140. When a command is to be launched headlessly, the control thread 520 may start a new thread or communicate with an existing thread, such as the AppService thread 530. To reduce the response time to the user, it may be desirable for the AppService thread 530 to be an existing thread, rather than having the control thread 520 start a new thread. For example, the AppService thread 530 may be started when the application is warmed up or during booting of the computing device 130.

AppService线程530可以在计算设备130上执行或可以在远程服务器(诸如远程服务器计算机160)上执行。AppService线程530可主要负责完成由用户命令指定的功能。AppService线程530可维持状态机(诸如状态机300)来跟踪该功能的执行进度,并且可以向控制线程520提供与状态有关的更新。通过向控制线程520提供状态更新,AppService线程530可以是无头的,其中给用户的输出由数字个人助理120提供而非AppService线程530的用户接口。AppService thread 530 may execute on computing device 130 or may execute on a remote server (such as remote server computer 160). AppService thread 530 may be primarily responsible for completing the function specified by the user command. AppService thread 530 may maintain a state machine (such as state machine 300) to track the progress of the execution of the function and may provide updates related to the state to control thread 520. By providing state updates to control thread 520, AppService thread 530 may be headless, where the output to the user is provided by digital personal assistant 120 rather than the user interface of AppService thread 530.

控制线程520可以通过接收来自应用的状态更新并检查该应用是否有进展来跟踪该应用(例如,AppService线程530)的进度。例如,控制线程520可以每次它与AppService线程530通信(发送信息给AppService线程530或从AppService线程530接收信息)时就启动预定义历时(诸如5秒)的定时器。如果定时器在AppService线程530作出响应之前期满,则控制线程520可以向UI线程510指示该应用未能作出响应并且UI线程510可以经由数字个人助理120的用户接口向用户呈现失败消息。在定时器期满之后,AppService线程530可被控制线程520终止或忽略。或者,如果AppService线程530在定时器期满之前作出响应,则在预期来自该应用的另一响应的情况下(诸如在应用以进行中状态来作出响应时),定时器可被复位,或者定时器可被取消(诸如在该应用已完成功能(最终状态)时或在请求用户响应时(确认或消歧状态))。The control thread 520 can track the progress of an application (e.g., AppService thread 530) by receiving status updates from the application and checking whether the application has made progress. For example, the control thread 520 can start a timer of a predefined duration (e.g., 5 seconds) each time it communicates with the AppService thread 530 (sending information to or receiving information from the AppService thread 530). If the timer expires before the AppService thread 530 responds, the control thread 520 can indicate to the UI thread 510 that the application failed to respond, and the UI thread 510 can present a failure message to the user via the user interface of the digital personal assistant 120. After the timer expires, the AppService thread 530 can be terminated or ignored by the control thread 520. Alternatively, if the AppService thread 530 responds before the timer expires, the timer can be reset if another response from the application is expected (e.g., if the application responds in the in-progress state), or the timer can be canceled (e.g., when the application has completed a function (a final state) or when a user response is requested (a confirmation or disambiguation state)).
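The timer rules in the preceding paragraph (start on each communication, reset on an in-progress update, cancel on a final, confirmation, or disambiguation state, and ignore the application after expiry) can be sketched as a small watchdog. This is a minimal illustration, not the control thread's actual code: the class name `ProgressWatchdog`, the state strings, and the use of an explicitly passed clock value (so the logic can be checked without real waiting) are assumptions.

```python
from typing import Optional

TIMEOUT_SECONDS = 5.0  # example duration given in the description


class ProgressWatchdog:
    """Tracks whether a headless application is still making progress."""

    def __init__(self, timeout: float = TIMEOUT_SECONDS) -> None:
        self.timeout = timeout
        self.deadline: Optional[float] = None  # None means no timer is running

    def on_message_sent(self, now: float) -> None:
        """Start the timer whenever the control thread sends to the application."""
        self.deadline = now + self.timeout

    def expired(self, now: float) -> bool:
        return self.deadline is not None and now >= self.deadline

    def on_app_response(self, state: str, now: float) -> bool:
        """Handle a status update; return False if it arrived too late to be used."""
        if self.expired(now):
            return False  # caller may ignore or terminate the application
        if state == "in_progress":
            # Another response is expected, so the timer is reset.
            self.deadline = now + self.timeout
        else:
            # Final, confirmation, or disambiguation: the timer is cancelled.
            self.deadline = None
        return True
```

In a real control thread the `now` argument would come from a monotonic clock such as `time.monotonic()`, and an expired watchdog would trigger the failure message via the UI thread.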

在控制线程520从AppService线程530接收到确认或消歧状态时,控制线程520可以向UI线程510指示向用户请求确认或消歧。UI线程510可以经由数字个人助理120的用户接口将该确认或消歧选择呈现给用户。在用户作出响应或未能作出响应时,UI线程510可以向控制线程520提供该用户响应或者确定没有响应。控制线程520可将用户响应传递给AppService线程530以使得AppService线程530可以执行功能。如果用户未能作出响应,则控制线程520可以终止AppService线程530。When the control thread 520 receives the confirmation or disambiguation status from the AppService thread 530, the control thread 520 may indicate to the UI thread 510 that confirmation or disambiguation is being requested from the user. The UI thread 510 may present the confirmation or disambiguation selection to the user via the user interface of the digital personal assistant 120. When the user responds or fails to respond, the UI thread 510 may provide the user response to the control thread 520 or determine that there is no response. The control thread 520 may pass the user response to the AppService thread 530 so that the AppService thread 530 can perform the function. If the user fails to respond, the control thread 520 may terminate the AppService thread 530.

UI线程510可以经由数字个人助理120的用户接口显示各种类型的输出。例如,UI线程510可以生成音频输出,诸如来自文本的数字仿真语音输出。数字仿真语音可被发送给可将该数字仿真语音转换成模拟信号(诸如使用数模转换器)的音频处理芯片,该模拟信号可经由扬声器或耳机被输出为声音。作为另一示例,UI线程510可以提供视觉输出,诸如用于由用户在计算设备130的显示屏上查看的图像、动画、文本输出、以及超链接。如果超链接被轻击或点击,UI线程510可启动浏览器应用以查看与所选超链接相对应的网站。作为另一示例,UI线程510可以生成触觉输出,诸如通过向可使得计算设备130振动的电机发送振动信号。The UI thread 510 can display various types of outputs via the user interface of the digital personal assistant 120. For example, the UI thread 510 can generate audio outputs, such as digital simulated voice outputs from text. The digital simulated voice can be sent to an audio processing chip that can convert the digital simulated voice into an analog signal (such as using a digital-to-analog converter), and the analog signal can be output as sound via a speaker or headphones. As another example, the UI thread 510 can provide visual outputs, such as images, animations, text outputs, and hyperlinks for viewing on a display screen of the computing device 130 by a user. If a hyperlink is tapped or clicked, the UI thread 510 can start a browser application to view the website corresponding to the selected hyperlink. As another example, the UI thread 510 can generate tactile outputs, such as by sending a vibration signal to a motor that can cause the computing device 130 to vibrate.

Example method for headless task completion

FIG. 6 is a flowchart of an example method 600 for headlessly completing a task of an application in the background of the digital personal assistant 120. At 610, voice input generated by a user may be received by the digital personal assistant 120. The voice input may be captured locally at the computing device 130 or remotely from the computing device 130. As one example, the voice input generated by the user may be captured locally by the microphone 150 of the computing device 130 and digitized by an analog-to-digital converter. As another example, the voice input generated by the user may be captured remotely by a microphone that is wirelessly connected to the computing device 130 (such as by a Bluetooth companion device). The digital personal assistant 120 may be controlled by voice and/or text entered at the user interface of the digital personal assistant 120.

在620,可以执行对语音输入的自然语言处理以确定用户语音命令。用户语音命令可包括执行应用(诸如第三方启用语音的应用)的预定义功能的请求。预定义功能可以使用数据结构来标识,该数据结构定义由数字个人助理120支持的应用和应用的功能。例如,兼容应用可被标识在命令定义文件中,诸如命令定义400。通过使用可扩展命令定义文件来定义可由数字个人助理120无头地执行的第三方应用的功能,数字个人助理120可以使用户能够使用数字个人助理120的用户接口执行更多任务。At 620, natural language processing of the speech input can be performed to determine a user voice command. The user voice command can include a request to execute a predefined function of an application (such as a third-party voice-enabled application). The predefined functions can be identified using a data structure that defines the applications and functions of the applications supported by the digital personal assistant 120. For example, compatible applications can be identified in a command definition file, such as command definition 400. By using extensible command definition files to define functions of third-party applications that can be executed headlessly by the digital personal assistant 120, the digital personal assistant 120 can enable the user to perform more tasks using the user interface of the digital personal assistant 120.

在630,数字个人助理120可以使该应用无头地执行预定义功能,而不使得该应用的用户接口出现在计算设备130的显示器上。数字个人助理120可以确定无头地执行该应用,因为该应用在命令数据结构140中被定义为无头的或者因为用户正以免手模式使用计算设备且在前台执行应用可能潜在地使用户分心。例如,数字个人助理120可以调用web服务来执行该应用的预定义功能。作为另一示例,数字个人助理120可以在确定了用户命令之后在计算设备130上启动新线程来执行该应用的预定义功能。作为又一示例,数字个人助理120可以与现有线程(诸如在该应用的热身期间启动的线程)通信,以执行该应用的预定义功能。预定义功能可作为后台进程来执行。应用可以监视预定义功能的进度,诸如通过跟踪预定义功能的状态。At 630, the digital personal assistant 120 may cause the application to execute the predefined function headlessly without causing the user interface of the application to appear on the display of the computing device 130. The digital personal assistant 120 may determine to execute the application headlessly because the application is defined as headless in the command data structure 140 or because the user is using the computing device in hands-free mode and executing the application in the foreground may potentially distract the user. For example, the digital personal assistant 120 may call a web service to execute the predefined function of the application. As another example, the digital personal assistant 120 may start a new thread on the computing device 130 to execute the predefined function of the application after determining the user command. As yet another example, the digital personal assistant 120 may communicate with an existing thread (such as a thread started during the warm-up period of the application) to execute the predefined function of the application. The predefined function may be executed as a background process. The application may monitor the progress of the predefined function, such as by tracking the status of the predefined function.

在640,可以从应用接收指示与预定义功能相关联的状态的响应。例如,状态可包括热身、初始、进行中、确认、消歧以及最终状态。响应可包括附加信息,诸如模板化列表、文本串、文本-到-语音串、图像、超链接、或可经由数字个人助理120的用户接口显示给用户的其他合适信息。At 640, a response may be received from the application indicating a status associated with the predefined function. For example, the status may include warm-up, initial, in-progress, confirmation, disambiguation, and final status. The response may include additional information, such as a templated list, a text string, a text-to-speech string, an image, a hyperlink, or other suitable information that can be displayed to the user via the user interface of the digital personal assistant 120.

在650,数字个人助理120的用户接口可以基于接收到的与预定义功能相关联的状态来向用户提供响应。以此方式,响应可来自数字个人助理120的用户接口的上下文内,而不出现该应用的用户接口。此外,数字个人助理120的确认和消歧能力可被用来确认和/或澄清针对该应用的用户命令。At 650, the user interface of the digital personal assistant 120 can provide a response to the user based on the received status associated with the predefined function. In this way, the response can come from within the context of the user interface of the digital personal assistant 120 without the user interface of the application appearing. In addition, the confirmation and disambiguation capabilities of the digital personal assistant 120 can be used to confirm and/or clarify the user command to the application.
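Steps 630 through 650 can be pictured as a loop in which the application only reports states and the assistant does all presentation. The sketch below is a simplified illustration under stated assumptions: the application is modeled as a Python generator, the task `add_to_queue`, the state strings, the response texts other than "I've done that for you.", and the helper `run_headless` are all hypothetical.

```python
from typing import Callable, Generator, List, Optional, Tuple

Update = Tuple[str, str]  # (state, text for the assistant to present)


def add_to_queue(movie: str) -> Generator[Update, Optional[str], None]:
    """Hypothetical headless task: it never shows its own UI, it only reports state."""
    yield ("in_progress", f"Adding {movie} to your queue")
    answer = yield ("confirmation", f"Add {movie} to your queue?")
    if answer == "yes":
        yield ("final", "I've done that for you.")
    else:
        yield ("final", "OK, I did not add it.")


def run_headless(task: Generator[Update, Optional[str], None],
                 ask_user: Callable[[str], str]) -> List[str]:
    """Drive the task and collect everything the assistant's own UI would present."""
    shown: List[str] = []
    reply: Optional[str] = None
    try:
        while True:
            state, text = task.send(reply)  # the first send(None) starts the task
            reply = None
            shown.append(text)
            if state == "confirmation":
                # The assistant, not the application, asks the user.
                reply = ask_user(text)
            elif state == "final":
                break
    except StopIteration:
        pass
    finally:
        task.close()
    return shown
```

Calling `run_headless(add_to_queue("MovieX"), lambda question: "yes")` walks through an in-progress update, a confirmation handled by the assistant, and a final state, without the task ever producing output of its own.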

Example method for determining whether to warm up an application

FIG. 7 is a flowchart of an example method 700 for determining whether to warm up an application while a user is speaking to the digital personal assistant 120. At 710, the user may type, utter, or speak to the digital personal assistant 120. Natural language processing techniques may be used to analyze the user's text or speech, and individual words may be recognized from the speech. The individual words may be analyzed separately and at intermediate stages, while they are still being spoken. For example, the user may say "hey Assistant, MyApp, do..." The word "hey" may be a carrier word and may be discarded. The word "Assistant" may be used to let the digital personal assistant 120 know that the user is requesting it to perform an action. The word "MyApp" may be interpreted as an application.

在720,所键入或讲出的词语可以与数字个人助理120的本机功能以及可扩展命令定义中提供的功能相比较。本机功能和命令定义文件中定义的功能可被统称为“已知AppService(应用服务)”。所讲词语可在各词语正被说出时被分析并与已知AppService相比较。换言之,对语音的分析可以发生在整个短语被用户讲出或键入之前。如果没有已知AppService是匹配的,则在730,数字个人助理120可打开web浏览器以使用与未被识别的所讲短语相对应的搜索串来检索搜索引擎网页。程序控制可被转移到web浏览器以使得用户可以细化web搜索和/或查看结果。然而,如果已知AppService是匹配的,则方法700可在740继续。At 720, the typed or spoken words can be compared with the native functions of the digital personal assistant 120 and the functions provided in the extensible command definition. The native functions and the functions defined in the command definition file can be collectively referred to as "known AppServices". The spoken words can be analyzed and compared with known AppServices as each word is being spoken. In other words, the analysis of the speech can occur before the entire phrase is spoken or typed by the user. If no known AppService is a match, at 730, the digital personal assistant 120 can open a web browser to retrieve a search engine web page using a search string corresponding to the unrecognized spoken phrase. Program control can be transferred to the web browser so that the user can refine the web search and/or view the results. However, if a known AppService is a match, method 700 can continue at 740.

在740,可以确定AppService应用是前台还是后台任务。例如,命令定义可包括将AppService应用定义为前台或后台应用的属性。如果AppService应用是前台任务,则在750,AppService应用可以在前台启动并且控制可被转移到AppService应用以完成该命令。如果AppService应用是后台任务,则方法700可以用并行步骤760和770继续。At 740, it can be determined whether the AppService application is a foreground or background task. For example, the command definition can include a property defining the AppService application as a foreground or background application. If the AppService application is a foreground task, then at 750, the AppService application can be launched in the foreground and control can be transferred to the AppService application to complete the command. If the AppService application is a background task, method 700 can continue with parallel steps 760 and 770.

在760,数字个人助理120可以向用户提供与语音分析有关的信息。具体而言,数字个人助理120可以生成用于数字个人助理120的用户接口的进行中屏幕的输出。输出可被定义在例如命令定义的嵌套在<Command>元素内的<Feedback>元素中。输出可以是文本串且可以随着用户继续讲话而持续更新。At 760, the digital personal assistant 120 may provide the user with information related to the speech analysis. Specifically, the digital personal assistant 120 may generate output for the in-progress screen of the user interface of the digital personal assistant 120. The output may be defined, for example, in a <Feedback> element nested within a <Command> element of a command definition. The output may be a text string and may be continuously updated as the user continues to speak.

在770,数字个人助理120可以使AppService应用热身而不等待用户话语结束。使AppService应用热身可包括分配存储器、预取指令、建立通信会话、从数据库检索信息、启动新执行线程、唤起中断、或其他合适的因应用而异的操作。该应用可基于投机性功能来热身。例如,与投机性功能相对应的指令可被获取,即使不确信该功能是所知道的。通过在用户完成所讲命令之前使应用热身,对用户作出响应的时间可潜在地降低。At 770, the digital personal assistant 120 may warm up the AppService application without waiting for the user to finish speaking. Warming up the AppService application may include allocating memory, prefetching instructions, establishing a communication session, retrieving information from a database, starting a new execution thread, invoking an interrupt, or other suitable application-specific operations. The application may be warmed up based on speculative functionality. For example, instructions corresponding to a speculative functionality may be retrieved even if there is no certainty that the functionality is known. By warming up the application before the user completes the spoken command, the time to respond to the user can potentially be reduced.
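The benefit of warming up before the utterance ends can be seen in a short sketch that scans words as they arrive and triggers warm-up the moment a background AppService is recognized. It is illustrative only: the registry `KNOWN_APP_SERVICES`, its entries, and the function `warm_up_early` are hypothetical, and real warm-up work (allocating memory, opening sessions, and so on) is represented by a caller-supplied callback.

```python
from typing import Callable, Iterable, Optional

# Hypothetical registry of known AppServices and their activation types.
KNOWN_APP_SERVICES = {
    "movieappservice": "background",
    "textapp": "foreground",
}


def warm_up_early(words: Iterable[str],
                  warm_up: Callable[[str], None]) -> Optional[int]:
    """Scan words as they arrive; warm up as soon as a background app is matched.

    Returns how many words had been heard when warm-up was triggered,
    or None if no background AppService was recognized.
    """
    for count, word in enumerate(words, start=1):
        key = word.strip(",.").lower()
        if KNOWN_APP_SERVICES.get(key) == "background":
            warm_up(key)  # runs before the rest of the utterance is heard
            return count
    return None
```

For the utterance "hey Assistant, MovieAppService, add MovieX to my queue", warm-up would be triggered after the third word, leaving the remaining five words of speaking time for the application to prepare.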

在780,数字个人助理120可继续解析部分语音识别结果,直至话语完成。可基于被解析的命令和/或基于来自用户的暂停达预定时间量以上,来检测话语的结束。例如,可在识别出词语“queue(队列)”时检测到命令“MovieAppService,add MovieX to my queue”的结束。作为另一示例,命令“TextApp,text my wife that I will be home late fordinner(文本应用,用文本通知我妻子我将晚回家吃饭)”的结束可能更难以检测,因为该命令以未知长度的数据项结束。因而,暂停可被使用以向数字个人助理120指示该命令完成。At 780, the digital personal assistant 120 may continue parsing the partial speech recognition results until the utterance is complete. The end of the utterance may be detected based on the command being parsed and/or based on a pause from the user for more than a predetermined amount of time. For example, the end of the command "MovieAppService, add MovieX to my queue" may be detected when the word "queue" is recognized. As another example, the end of the command "TextApp, text my wife that I will be home late for dinner" may be more difficult to detect because the command ends with a data item of unknown length. Thus, a pause may be used to indicate to the digital personal assistant 120 that the command is complete.
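The two end-of-utterance signals described above, a completed command grammar and a pause, can be combined in a simple check. The sketch below is an assumption-laden illustration: the pattern syntax reuses the brace notation for data items, the function name `utterance_ended` and the 0.8 second pause threshold are hypothetical, and the pattern is assumed to end in either a literal word or a data item.

```python
from typing import List


def utterance_ended(pattern: str, heard: List[str], silence_s: float,
                    pause_threshold_s: float = 0.8) -> bool:
    """Decide whether the spoken command is complete."""
    # A long enough pause always ends the utterance.
    if silence_s >= pause_threshold_s:
        return True
    last_token = pattern.split()[-1]
    if last_token.startswith("{"):
        # The command ends with a data item of unknown length,
        # so only a pause can signal completion.
        return False
    # The command ends with a fixed word: finish as soon as it is recognized.
    return bool(heard) and heard[-1].lower() == last_token.lower()
```

A limitation of this sketch is that a fixed final word such as "queue" could also occur inside a data item, so a production recognizer would need the full grammar match rather than a last-word comparison.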

At 790, the end of the spoken command may be detected and the final speech recognition result may be passed to the application. The application and the digital personal assistant 120 may communicate with each other to complete the spoken command, as described with reference to the preceding figures.

Computing system

FIG. 8 depicts a generalized example of a suitable computing system 800 in which the described innovations may be implemented. The computing system 800 is not intended to suggest any limitation as to scope of use or functionality, as the innovations may be implemented in diverse general-purpose or special-purpose computing systems.

Referring to FIG. 8, the computing system 800 includes one or more processing units 810, 815 and memory 820, 825. In FIG. 8, this basic configuration 830 is included within a dashed line. The processing units 810, 815 execute computer-executable instructions. A processing unit may be a general-purpose central processing unit (CPU), a processor in an application-specific integrated circuit (ASIC), or any other type of processor. In a multi-processing system, multiple processing units execute computer-executable instructions to increase processing power. For example, FIG. 8 shows a central processing unit 810 as well as a graphics processing unit or co-processing unit 815. The tangible memory 820, 825 may be volatile memory (e.g., registers, cache, RAM), non-volatile memory (e.g., ROM, EEPROM, flash memory, etc.), or some combination of the two, accessible by the processing unit(s). The memory 820, 825 stores software 880 implementing one or more innovations described herein, in the form of computer-executable instructions suitable for execution by the processing unit(s).

计算系统可具有附加的特征。例如,计算系统800包括存储840、一个或多个输入设备850、一个或多个输出设备860以及一个或多个通信连接870。诸如总线、控制器或网络之类的互连机制(未示出)将计算系统800的各组件互连。通常,操作系统软件(未示出)为在计算系统800中执行的其它软件提供操作环境,并协调计算系统800的各组件的活动。The computing system may have additional features. For example, the computing system 800 includes storage 840, one or more input devices 850, one or more output devices 860, and one or more communication connections 870. An interconnection mechanism (not shown), such as a bus, controller, or network, interconnects the components of the computing system 800. Typically, operating system software (not shown) provides an operating environment for other software executed in the computing system 800 and coordinates the activities of the components of the computing system 800.

The tangible storage 840 may be removable or non-removable, and includes magnetic disks, magnetic tapes or cassettes, CD-ROMs, DVDs, or any other medium that can be used to store information and that can be accessed within the computing system 800. The storage 840 stores instructions for the software 880 implementing one or more innovations described herein.

The input device(s) 850 may be a touch input device (such as a keyboard, mouse, pen, or trackball), a voice input device, a scanning device, or another device that provides input to the computing system 800. For video encoding, the input device(s) 850 may be a camera, video card, TV tuner card, or similar device that accepts video input in analog or digital form, or a CD-ROM or CD-RW that reads video samples into the computing system 800. The output device(s) 860 may be a display, printer, speaker, CD writer, or another device that provides output from the computing system 800.

(诸)通信连接870允许在通信介质上到另一计算实体的通信。通信介质传达诸如计算机可执行指令、音频或视频输入或输出、或已调制数据信号中的其他数据之类的信息。已调制数据信号是使其一个或多个特征以在信号中编码信息的方式设置或改变的信号。作为示例而非限制,通信介质可以使用电的、光学的、RF或其它载体。Communication connection(s) 870 allow communication to another computing entity over a communication medium. The communication medium conveys information such as computer-executable instructions, audio or video input or output, or other data in a modulated data signal. A modulated data signal is a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, the communication medium may employ electrical, optical, RF, or other carriers.

各创新可在计算机可执行指令(诸如包括在程序模块中的在目标现实或虚拟处理器上在计算系统中执行的那些计算机可执行指令)的一般上下文中描述。一般而言,程序模块包括执行特定任务或实现特定抽象数据类型的例程、程序、库、对象、类、组件、数据结构等。如各实施例中描述的,这些程序模块的功能可以被组合,或者在这些程序模块之间拆分。针对各程序模块的计算机可执行指令可以在本地或分布式计算系统中执行。Each innovation can be described in the general context of computer-executable instructions (such as those included in program modules that are executed in a computing system on a target real or virtual processor). Generally speaking, program modules include routines, programs, libraries, objects, classes, components, data structures, etc. that perform specific tasks or implement specific abstract data types. As described in the various embodiments, the functionality of these program modules can be combined or split between these program modules. The computer-executable instructions for each program module can be executed in a local or distributed computing system.

术语“系统”和“设备”在此被互换地使用。除非上下文明确指示,否则,术语并不暗示对计算系统或计算设备的类型的任何限制。一般说来,计算系统或计算设备可以是本地的或分布式的,并且可以包括具有实现本文中描述的功能的软件的专用硬件和/或通用硬件的任意组合。The terms "system" and "device" are used interchangeably herein. Unless the context clearly indicates otherwise, the terms do not imply any limitation on the type of computing system or computing device. Generally speaking, a computing system or computing device can be local or distributed and can include any combination of special-purpose hardware and/or general-purpose hardware with software that implements the functionality described herein.

For the sake of presentation, this detailed description uses terms like "determine" and "use" to describe computer operations in a computing system. These terms are high-level abstractions for operations performed by a computer, and should not be confused with acts performed by a human being. The actual computer operations corresponding to these terms vary depending on the implementation.

Mobile Device

FIG. 9 is a system diagram depicting an example mobile device 900 including a variety of optional hardware and software components, shown generally at 902. Any component 902 in the mobile device can communicate with any other component, although not all connections are shown, for ease of illustration. The mobile device can be any of a variety of computing devices (e.g., a cellular phone, a smartphone, a handheld computer, a personal digital assistant (PDA), etc.) and can allow wireless two-way communication with one or more mobile communication networks 904, such as a cellular, satellite, or other network.

The illustrated mobile device 900 can include a controller or processor 910 (e.g., a signal processor, microprocessor, ASIC, or other control and processing logic circuitry) for performing such tasks as signal coding, data processing, input/output processing, power control, and/or other functions. An operating system 912 can control the allocation and usage of the components 902 and support the digital personal assistant 120 and one or more application programs 914. The application programs can include common mobile computing applications (e.g., email applications, calendars, contact managers, web browsers, messaging applications, movie applications, banking applications), or any other computing application. The application programs 914 can include applications having tasks that can be executed headlessly by the digital personal assistant 120. For example, the tasks can be defined in the command data structure 140. Functionality 913 for accessing an application store can also be used for acquiring and updating the application programs 914.
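As an illustration only, a command data structure of the kind referred to above might be sketched as follows. This is a minimal sketch, not the format used by any actual implementation: the names `TaskDefinition`, `CommandDefinitions`, `ExecutionMode`, and `find_task`, the exact-phrase matching, and the example "MovieApp" entries are all hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional


class ExecutionMode(Enum):
    FOREGROUND = "foreground"  # the application's own user interface is shown
    BACKGROUND = "background"  # the task runs headlessly


@dataclass
class TaskDefinition:
    """One task an application exposes to the assistant (hypothetical shape)."""
    name: str
    phrases: List[str]
    mode: ExecutionMode
    # Maps a task state to the response the assistant gives the user.
    responses: Dict[str, str] = field(default_factory=dict)


@dataclass
class CommandDefinitions:
    """All tasks registered by one application (hypothetical shape)."""
    app_name: str
    tasks: List[TaskDefinition]

    def find_task(self, spoken: str) -> Optional[TaskDefinition]:
        # Simplification: exact, case-insensitive phrase match only.
        normalized = spoken.strip().lower()
        for task in self.tasks:
            if any(normalized == phrase.lower() for phrase in task.phrases):
                return task
        return None


# Example registration for an assumed movie application.
movie_app = CommandDefinitions(
    app_name="MovieApp",
    tasks=[
        TaskDefinition(
            name="add_to_queue",
            phrases=["add movie to my queue"],
            mode=ExecutionMode.BACKGROUND,
            responses={"final": "Added to your queue."},
        ),
        TaskDefinition(
            name="play_movie",
            phrases=["play movie"],
            mode=ExecutionMode.FOREGROUND,
        ),
    ],
)
```

In this sketch, an assistant would look up the recognized command with `movie_app.find_task(...)` and consult the task's `mode` to decide whether the task may run headlessly.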

The illustrated mobile device 900 can include memory 920. The memory 920 can include non-removable memory 922 and/or removable memory 924. The non-removable memory 922 can include RAM, ROM, flash memory, a hard disk, or other well-known memory storage technologies. The removable memory 924 can include flash memory or a Subscriber Identity Module (SIM) card, which is well known in GSM communication systems, or other well-known memory storage technologies, such as "smart cards." The memory 920 can be used for storing data and/or code for running the operating system 912 and the applications 914. Example data can include web pages, text, images, sound files, video data, or other data sets to be sent to and/or received from one or more network servers or other devices via one or more wired or wireless networks. The memory 920 can be used to store a subscriber identifier, such as an International Mobile Subscriber Identity (IMSI), and an equipment identifier, such as an International Mobile Equipment Identifier (IMEI). Such identifiers can be transmitted to a network server to identify users and equipment.

The mobile device 900 can support one or more input devices 930, such as a touchscreen 932, a microphone 934, a camera 936, a physical keyboard 938, and/or a trackball 940, and one or more output devices 950, such as a speaker 952 and a display 954. Other possible output devices (not shown) can include piezoelectric or other haptic output devices. Some devices can serve more than one input/output function. For example, the touchscreen 932 and the display 954 can be combined in a single input/output device.

The input devices 930 can include a natural user interface (NUI). An NUI is any interface technology that enables a user to interact with a device in a "natural" manner, free from artificial constraints imposed by input devices such as mice, keyboards, remote controls, and the like. Examples of NUI methods include those relying on speech recognition, touch and stylus recognition, gesture recognition both on screen and adjacent to the screen, air gestures, head and eye tracking, voice and speech, vision, touch, gestures, and machine intelligence. Other examples of an NUI include motion gesture detection using accelerometers/gyroscopes, facial recognition, 3D displays, head, eye, and gaze tracking, and immersive augmented reality and virtual reality systems, all of which provide a more natural interface, as well as technologies for sensing brain activity using electric field sensing electrodes (EEG and related methods). Thus, in one specific example, the operating system 912 or the applications 914 can include speech-recognition software as part of a voice user interface that allows a user to operate the device 900 via voice commands. Further, the device 900 can include input devices and software that allow for user interaction via a user's spatial gestures, such as detecting and interpreting gestures to provide input to a gaming application.

A wireless modem 960 can be coupled to an antenna (not shown) and can support two-way communication between the processor 910 and external devices, as is well understood in the art. The modem 960 is shown generically and can include a cellular modem for communicating with the mobile communication network 904 and/or other radio-based modems (e.g., Bluetooth 964 or Wi-Fi 962). The wireless modem 960 is typically configured for communication with one or more cellular networks, such as a GSM network, for data and voice communications within a single cellular network, between cellular networks, or between the mobile device and a public switched telephone network (PSTN).

The mobile device can further include at least one input/output port 980, a power supply 982, a satellite navigation system receiver 984 (such as a Global Positioning System (GPS) receiver), an accelerometer 986, and/or a physical connector 990, which can be a USB port, an IEEE 1394 (FireWire) port, and/or an RS-232 port. The illustrated components 902 are not required or all-inclusive, as any component can be deleted and other components can be added.

Cloud-Supported Environment

FIG. 10 illustrates a generalized example of a suitable cloud-supported environment 1000 in which the described embodiments, techniques, and technologies may be implemented. In the example environment 1000, various types of services (e.g., computing services) are provided by a cloud 1010. For example, the cloud 1010 can comprise a collection of computing devices, which may be located centrally or distributed, that provide cloud-based services to various types of users and devices connected via a network such as the Internet. The implementation environment 1000 can be used in different ways to accomplish computing tasks. For example, some tasks (e.g., processing user input and presenting a user interface) can be performed on local computing devices (e.g., connected devices 1030, 1040, 1050), while other tasks (e.g., storage of data to be used in subsequent processing) can be performed in the cloud 1010.

In the example environment 1000, the cloud 1010 provides services for connected devices 1030, 1040, 1050 with a variety of screen capabilities. Connected device 1030 represents a device with a computer screen 1035 (e.g., a mid-size screen). For example, connected device 1030 can be a personal computer such as a desktop computer, laptop, notebook, netbook, or the like. Connected device 1040 represents a device with a mobile device screen 1045 (e.g., a small-size screen). For example, connected device 1040 can be a mobile phone, smartphone, personal digital assistant, tablet computer, or the like. Connected device 1050 represents a device with a large screen 1055. For example, connected device 1050 can be a television screen (e.g., a smart television) or another device connected to a television (e.g., a set-top box or gaming console). One or more of the connected devices 1030, 1040, 1050 can include touchscreen capabilities. Touchscreens can accept input in different ways. For example, a capacitive touchscreen detects touch input when an object (e.g., a fingertip or stylus) distorts or interrupts an electrical current running across the surface. As another example, a touchscreen can use optical sensors to detect touch input when beams from the optical sensors are interrupted. Physical contact with the surface of the screen is not necessary for input to be detected by some touchscreens. Devices without screen capabilities can also be used in the example environment 1000. For example, the cloud 1010 can provide services for one or more computers (e.g., server computers) without displays.

Services can be provided by the cloud 1010 through service providers 1020, or through other providers of online services (not depicted). For example, cloud services can be customized to the screen size, display capability, and/or touchscreen capability of a particular connected device (e.g., connected devices 1030, 1040, 1050).
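One way such customization could look is sketched below, purely as an assumption-laden illustration. The `DeviceProfile` type, the `choose_layout` function, the layout names, and the pixel thresholds are all hypothetical and are not taken from the described embodiments.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeviceProfile:
    """Capabilities a connected device might report (hypothetical fields)."""
    screen_width_px: int  # 0 means the device has no display
    has_touch: bool


def choose_layout(device: DeviceProfile) -> str:
    """Pick a response layout from screen size and touch capability.

    The thresholds are arbitrary placeholders chosen for the example.
    """
    if device.screen_width_px == 0:
        return "no-display"  # e.g., a server computer without a display
    if device.screen_width_px < 800:
        return "small-touch" if device.has_touch else "small"
    if device.screen_width_px < 2000:
        return "medium"
    return "large"
```

A service provider could call `choose_layout` once per request and render the response with the returned layout name.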

In the example environment 1000, the cloud 1010 provides the technologies and solutions described herein to the various connected devices 1030, 1040, 1050 using, at least in part, the service providers 1020. For example, the service providers 1020 can provide a centralized solution for various cloud-based services. The service providers 1020 can manage service subscriptions for users and/or devices (e.g., for the connected devices 1030, 1040, 1050 and/or their respective users).

Example Implementations

Although the operations of some of the disclosed methods are described in a particular, sequential order for convenient presentation, it should be understood that this manner of description encompasses rearrangement, unless a particular ordering is required by specific language set forth below. For example, operations described sequentially may in some cases be rearranged or performed concurrently. Moreover, for the sake of simplicity, the attached figures may not show the various ways in which the disclosed methods can be used in conjunction with other methods.

Any of the disclosed methods can be implemented as computer-executable instructions or a computer program product stored on one or more computer-readable storage media and executed on a computing device (e.g., any available computing device, including smartphones or other mobile devices that include computing hardware). Computer-readable storage media are any available tangible media that can be accessed within a computing environment (e.g., one or more optical media discs such as DVD or CD, volatile memory components (such as DRAM or SRAM), or non-volatile memory components (such as flash memory or hard drives)). By way of example and with reference to FIG. 8, computer-readable storage media include memory 820 and 825, and storage 840. By way of example and with reference to FIG. 9, computer-readable storage media include memory and storage 920, 922, and 924. The term computer-readable storage media does not include signals and carrier waves. In addition, the term computer-readable storage media does not include communication connections (e.g., 870, 960, 962, and 964).

Any of the computer-executable instructions for implementing the disclosed techniques, as well as any data created and used during implementation of the disclosed embodiments, can be stored on one or more computer-readable storage media. The computer-executable instructions can be part of, for example, a dedicated software application or a software application that is accessed or downloaded via a web browser or other software application (such as a remote computing application). Such software can be executed, for example, on a single local computer (e.g., any suitable commercially available computer) or in a network environment (e.g., via the Internet, a wide-area network, a local-area network, a client-server network (such as a cloud computing network), or other such network) using one or more network computers.

For clarity, only certain selected aspects of the software-based implementations are described. Other details that are well known in the art are omitted. For example, it should be understood that the disclosed technology is not limited to any specific computer language or program. For instance, the disclosed technology can be implemented by software written in C++, Java, Perl, JavaScript, Adobe Flash, or any other suitable programming language. Likewise, the disclosed technology is not limited to any particular computer or type of hardware. Certain details of suitable computers and hardware are well known and need not be set forth in detail in this disclosure.

Furthermore, any of the software-based embodiments (comprising, for example, computer-executable instructions for causing a computer to perform any of the disclosed methods) can be uploaded, downloaded, or remotely accessed through a suitable communication means. Such suitable communication means include, for example, the Internet, the World Wide Web, an intranet, software applications, cable (including fiber optic cable), magnetic communications, electromagnetic communications (including RF, microwave, and infrared communications), electronic communications, or other such communication means.

The disclosed methods, apparatus, and systems should not be construed as limiting in any way. Instead, the present disclosure is directed toward all novel and non-obvious features and aspects of the various disclosed embodiments, alone and in various combinations and sub-combinations with one another. The disclosed methods, apparatus, and systems are not limited to any specific aspect or feature or combination thereof, nor do the disclosed embodiments require that any one or more specific advantages be present or problems be solved.

The technologies from any example can be combined with the technologies described in any one or more of the other examples. In view of the many possible embodiments to which the principles of the disclosed technology may be applied, it should be recognized that the illustrated embodiments are only examples of the disclosed technology and should not be taken as a limitation on the scope of the disclosed technology.

Claims (17)

1. A computing device comprising:
a processing unit;
memory; and
one or more microphones;
the computing device being configured with a voice-controlled digital personal assistant, the operations comprising:
receiving, by a UI thread, speech input generated by a user via the one or more microphones;
performing speech recognition using the speech input to determine a spoken command, wherein the spoken command comprises a request to perform a task of a third-party application, and wherein the task is identified using a data structure that defines tasks of third-party applications that can be invoked by spoken command;
initiating a warm-up sequence of the third-party application while performing the speech recognition and before completion of the determination of the spoken command, the warm-up sequence comprising an application service thread communicating with a control thread, the control thread interfacing with the UI thread;
determining whether the task of the third-party application can be executed headlessly;
upon determining that the task of the third-party application can be executed headlessly, causing the control thread to start a new application service thread as a background process to execute the task headlessly;
receiving a response from the third-party application indicating a state associated with the task; and
providing a response to the user, through a user interface of the voice-controlled digital personal assistant, based on the received state associated with the task, so that the response comes from within the context of the user interface of the voice-controlled digital personal assistant without the user interface of the third-party application appearing.

2. The computing device of claim 1, wherein determining that the task of the third-party application can be executed headlessly comprises determining that the data structure defines the task as a background task.

3. The computing device of claim 1, wherein determining that the task of the third-party application can be executed headlessly comprises determining that the user is using the computing device in a hands-free mode.

4. The computing device of claim 1, wherein the warm-up sequence comprises allocating a portion of the memory, pre-fetching instructions, establishing a communication session, retrieving information from a database, starting a new execution thread, or raising an interrupt.

5. The computing device of claim 1, wherein the data structure that defines tasks of third-party applications that can be invoked by spoken command includes a mapping from states associated with the tasks to responses to the user.

6. The computing device of claim 1, wherein the operations further comprise:
starting a timer when the third-party application is caused to execute as a background process; and
terminating the background process if the timer expires.

7. The computing device of claim 1, wherein the response from the third-party application indicates a confirmation state, and the response to the user based on the confirmation state prompts the user to respond with a yes or no answer.

8. A method implemented by a computing device comprising a microphone, the method comprising:
receiving, by a UI thread of a voice-controlled digital personal assistant, a digital voice input generated by a user, wherein the digital voice input is received via the microphone;
performing natural language processing using the digital voice input to determine a user voice command, wherein the user voice command comprises a request to perform a pre-defined function of a third-party voice-enabled application, and wherein the pre-defined function is identified using a data structure that defines the functions supported by available third-party voice-enabled applications using voice input;
initiating a warm-up sequence of the third-party voice-enabled application while performing the natural language processing and before completion of the determination of the user voice command, the warm-up sequence comprising an application service thread communicating with a control thread, the control thread interfacing with the UI thread;
causing the control thread to start a new application service thread as a background process to execute the pre-defined function without a user interface of the third-party voice-enabled application appearing on a display of the computing device;
receiving a response from the third-party voice-enabled application indicating a state associated with the pre-defined function; and
providing a response to the user, through a user interface of the voice-controlled digital personal assistant, based on the received state associated with the pre-defined function, so that the response comes from within the context of the user interface of the voice-controlled digital personal assistant without the user interface of the third-party voice-enabled application appearing.

9. The method of claim 8, wherein initiating the warm-up sequence comprises sending a speculative function to the third-party voice-enabled application.

10. The method of claim 8, wherein the state associated with the pre-defined function is selected from a warm-up, initial, in-progress, confirmation, disambiguation, or final state.

11. The method of claim 8, wherein the data structure is extensible via a command schema usable to associate requests from the user with pre-defined functions of one or more third-party voice-enabled applications.

12. The method of claim 8, wherein the data structure comprises:
a first mapping of voice commands to functions supported by available third-party voice-enabled applications; and
a second mapping of states received from the functions to responses provided to the user from the voice-controlled digital personal assistant.

13. The method of claim 8, wherein the responses provided to the user from the voice-controlled digital personal assistant are selected from the following types: display text, text-to-speech, Hypertext Markup Language (HTML), a list template, text that can be read aloud, a deep link back to the application, a link to a webpage or website, HTML-based content, an image, a hyperlink, or other suitable information that can be displayed to the user via the user interface of the digital personal assistant.

14. The method of claim 8, wherein the third-party voice-enabled application is a remote web service.

15. The method of claim 8, wherein performing natural language processing using the digital voice input to determine the user voice command comprises resolving contextual information so that the user voice command is context-free.

16. A computer-readable storage medium storing computer-executable instructions that cause a computing device to perform operations for completing a task of a voice-enabled application within the context of a voice-controlled digital personal assistant, the operations comprising:
receiving, by a UI thread of the voice-controlled digital personal assistant, a digital voice input generated by a user, wherein the digital voice input is received via a microphone;
performing natural language processing using the digital voice input to determine a user voice command, wherein the user voice command comprises a request to perform the task of the voice-enabled application, and wherein the task is identified using an extensible data structure that maps user voice commands to tasks of voice-enabled applications;
initiating a warm-up sequence of the voice-enabled application while performing the natural language processing and before completion of the determination of the user voice command, the warm-up sequence comprising an application service thread communicating with a control thread, the control thread interfacing with the UI thread;
determining whether the task of the voice-enabled application is a foreground task or a background task;
upon determining that the task is a background task, causing the control thread to start a new application service thread as a background task that executes within the context of the voice-controlled digital personal assistant without a user interface of the voice-enabled application appearing;
receiving a response from the voice-enabled application indicating a state associated with the task; and
providing a response to the user based on the received state associated with the task, wherein, when the task is determined to be a background task, the response is provided within the context of the voice-controlled digital personal assistant without the user interface of the voice-enabled application appearing.

17. The computer-readable storage medium of claim 16, wherein determining whether the task of the voice-enabled application is a foreground task or a background task comprises referencing the extensible data structure.
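For illustration only, the timer of claim 6 and the state-to-response mapping of claims 5, 7, and 12 might be sketched as below. This is a minimal sketch under stated assumptions, not the claimed implementation: `RESPONSES`, `run_headless`, `slow_task`, and the response wording are hypothetical, and a Python thread cannot be forcibly terminated, so on timeout this sketch only stops waiting for the background work rather than terminating it.

```python
import concurrent.futures
import time
from typing import Callable, Dict

# Hypothetical mapping from a task state to what the assistant tells the user.
RESPONSES: Dict[str, str] = {
    "final": "Done.",
    "confirmation": "Are you sure? Please answer yes or no.",
    "timeout": "Sorry, the app took too long to respond.",
}


def run_headless(task: Callable[[], str], timeout_s: float) -> str:
    """Run `task` on a background thread and map its state to a response.

    `task` stands in for the application's background service and returns
    a state name. If it does not finish within `timeout_s` seconds, the
    state is treated as "timeout".
    """
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(task)
    try:
        state = future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        # Assumption: a real system would terminate the background process
        # here; this sketch merely gives up waiting for it.
        state = "timeout"
    finally:
        pool.shutdown(wait=False)  # do not block the caller on slow work
    return RESPONSES.get(state, "Something went wrong.")


def slow_task() -> str:
    """Example task that takes longer than a short timer allows."""
    time.sleep(0.3)
    return "final"
```

Calling `run_headless(slow_task, 0.05)` yields the timeout response, while a task that returns promptly yields the response mapped to its state. Actual termination of the background work would need a separate process or cooperative cancellation, which this sketch leaves out.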
HK17113502.5A  2015-01-09  2015-12-29  Headless task completion within digital personal assistants  HK1240350B (en)

Applications Claiming Priority (1)

Application Number  Priority Date  Filing Date  Title
US14/593,584  2015-01-09

Publications (2)

Publication Number  Publication Date
HK1240350A1 (en)  2018-05-18
HK1240350B (en)  2022-01-07

