CN112470216B - Voice Application Platform - Google Patents

Voice Application Platform

Info

Publication number
CN112470216B
CN112470216B
Authority
CN
China
Prior art keywords
voice
request
response
platform
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201980049296.7A
Other languages
Chinese (zh)
Other versions
CN112470216A (en)
Inventor
R. T. Norton
N. G. Laidlaw
A. M. Dunn
J. K. McMahon
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
SOUND LLC
Original Assignee
SOUND LLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US16/000,799 (external priority: US10636425B2)
Priority claimed from US16/000,789 (external priority: US10803865B2)
Priority claimed from US16/000,805 (external priority: US11437029B2)
Priority claimed from US16/000,798 (external priority: US10235999B1)
Application filed by SOUND LLC
Publication of CN112470216A
Application granted
Publication of CN112470216B
Legal status: Active (current)
Anticipated expiration


Abstract

Requests are received from voice assistant devices expressed according to the different respective protocols of one or more voice assistant frameworks. Each request represents a voice input by a user to the corresponding voice assistant device. The received requests are re-expressed according to a common request protocol. Based on the received requests, responses to the requests are expressed according to a common response protocol. Each response is re-expressed according to the protocol of the framework in which the corresponding request was expressed. The responses are sent to the voice assistant devices for presentation to the users.

Description

Translated from Chinese
Voice Application Platform

Technical Field

This application relates to voice application platforms.

Background

Voice application platforms provide services to voice assistants and voice assistant devices that enable them to listen and respond to the speech of end users. Responses can be spoken or presented as text, images, audio, or video (content items). In some cases, a response involves an action, such as turning off a device.

Voice assistants such as Apple's Siri, Amazon's Alexa, Microsoft's Cortana, and Google's Assistant are served from servers and accessed through dedicated voice assistant devices such as the Amazon Echo and Apple HomePod, or sometimes on general-purpose workstations and mobile devices.

A voice assistant device typically has a microphone, a speaker, a processor, memory, communication facilities, and other hardware and software. The device can detect and process human speech to derive information representing an end user's request, express that information as a request message (sometimes called, or containing, an intent) according to a predefined protocol, and transmit the request message to a server over a communication network.
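
As a sketch of the idea (the payload shape and field names below are hypothetical, loosely modeled on Alexa-style intent requests rather than taken from this document), a request message expressing an intent per a framework's predefined protocol might look like:

```python
# Hypothetical request message expressing an end user's intent according to a
# framework's predefined protocol (all field names are illustrative only).
request_message = {
    "version": "1.0",
    "request": {
        "type": "IntentRequest",
        "intent": {
            "name": "GetDailyUpdate",
            "slots": {"topic": {"value": "news"}},
        },
    },
}

def extract_intent(message: dict) -> tuple[str, dict]:
    """Pull the intent name and slot values out of a request message."""
    intent = message["request"]["intent"]
    slots = {k: v["value"] for k, v in intent.get("slots", {}).items()}
    return intent["name"], slots

print(extract_intent(request_message))  # ('GetDailyUpdate', {'topic': 'news'})
```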

At the server, a voice application receives and processes the request message and determines an appropriate response. The response is carried in a response message expressed according to a predefined protocol, which is sent back to the voice assistant device over the communication network. The voice assistant interprets the response message and speaks or presents the response (or takes the action the response specifies). The voice application's work is supported by the operating system infrastructure and other processes running on the server.

The services that servers provide to client voice assistant devices to enable their interactions with end users are sometimes called voice assistant services (and are sometimes also called, or include, skills, actions, or voice applications).

An interaction between an end user and a voice assistant can comprise a series of requests and responses. In some cases, a request is a question posed by the end user and the response is the answer to the question.

Typically, the servers, voice assistant devices, voice assistants, voice assistant services, predefined protocols, and basic voice applications are designed together as parts of a proprietary voice assistant framework. To enable third parties, such as brands that want to interact with end users through voice assistants, to create their own voice applications, each framework provides a proprietary API.

Summary of the Invention

In some implementations, the universal voice application platform we describe here gives brands and organizations the ability to create and maintain, in one place, voice applications that serve Amazon Alexa, Google Assistant, Apple HomePod, Microsoft Cortana, and other devices. The platform is designed to let brands and organizations deploy voice applications quickly while providing flexibility through customization capabilities.

The platform provides features that process voice requests and are packaged in modules. The features include handlers for voice requests about events, FAQs, daily updates, reminders, lists, surveys, and latest news, among other predefined features. Modules package related features based on common use cases tied to industry-specific needs and include sample content that enables rapid time-to-market for brands and organizations.

Brand authors can manage voice content within the platform's voice content management system, which provides an intuitive interface for creating, modifying, and removing the content that shapes the voice experience without requiring technical knowledge. The platform's content management system also gives brand managers guidance and insights via end-user usage analytics captured over time. Guidance includes cues such as visual indicators of the media types supported by particular devices (for example, the video and image media supported by the Amazon Echo Show). Insights include analysis of the success rates of responses to a given question across device types (for example, the insight that Google's assistant responds successfully to the same question more often than Amazon's Alexa).

Behind the scenes, the platform is cloud-based, eliminating the need for brands and organizations to invest in additional infrastructure. The cloud-based offering also yields regular updates and enhancements that become automatically available to the brands and organizations that use the platform.

The platform uses a layered architecture in which a given layer does not depend on the other layers of the system. The layers include a voice API layer, a business logic layer, a feature and module layer, a CMS layer, and a data layer.
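
A minimal sketch of that separation, with each of the five layers as an independent callable composed into a pipeline (the layer names come from the text; everything inside each function is invented for illustration):

```python
# Each layer is a standalone callable with no knowledge of the others'
# internals; handle() composes them. All internals here are illustrative.
def voice_api_layer(raw_request: dict) -> dict:
    """Framework-facing I/O: lift the raw payload into a common shape."""
    return {"intent": raw_request["intent"], "framework": raw_request["framework"]}

def business_logic_layer(common_request: dict) -> str:
    """Routing decisions: pick which feature should handle the intent."""
    return common_request["intent"]

def feature_module_layer(intent: str) -> str:
    """Feature handlers packaged in modules (one stub handler here)."""
    handlers = {"daily_update": "Here is today's update."}
    return handlers.get(intent, "Sorry, I can't help with that.")

def cms_layer(text: str) -> dict:
    """Wrap managed content into a response structure."""
    return {"speech": text}

def data_layer(response: dict) -> dict:
    """Persistence/analytics capture (stubbed as a pass-through)."""
    return response

def handle(raw_request: dict) -> dict:
    common = voice_api_layer(raw_request)
    intent = business_logic_layer(common)
    return data_layer(cms_layer(feature_module_layer(intent)))

print(handle({"intent": "daily_update", "framework": "alexa"}))
```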

Unique aspects of the platform include the following:

1. The platform processes data from multiple voice assistant frameworks (such as Alexa, Google Home, Apple HomePod, and chatbots) into a single API/business-logic layer. The platform refines the data and processes it to enhance understanding of the end user's intent. In contrast to rule-based engines, the platform uses graph-based pattern matching, which allows a consistent and confident way of managing the mapping of cross-assistant intents to the platform features being used. This makes voice applications more manageable and updateable while still leaving the flexibility for machine learning to update the positions of nodes in the graph. The graph-based approach requires only one step to support a newly added voice assistant framework: new nodes (data points) are added to the graph database to create the connections for voice intents coming from end users.

2. Because the platform receives data from multiple voice assistant frameworks, it can compare how certain frameworks perform relative to others. For example, the platform can see the failure rates of different voice applications and features across the various voice assistant frameworks and, as a result, can use machine learning and algorithms to understand end users' intents better than the particular voice assistant framework they happen to be using does. This is possible by detecting patterns of success and failure in each framework for the same kind of content and determining which changes would make it more successful, which in turn allows finding the best superset of content changes that fits all supported frameworks.

3. Because the platform collects performance data across multiple devices through a single API, it can aggregate and analyze performance and effectively provide content recommendations. The platform uses machine learning and its own algorithms to gauge how one voice application is doing relative to another and to make real-time, dynamic content suggestions to voice application developers directly within the platform's user interface. This can optimize the performance of voice applications and enhance the overall end-user experience.

4. The platform supports collections of dynamic content, providing more than one way to answer a question or give a response. Because prompts and responses can change from session to session, this creates a more engaging voice experience. It also allows personas to be created, and the voice experience to be changed, depending on the end user's preferences and demographics. By contrast, if ten end users ask Alexa the same question, the voice assistant will interact in the same way all ten times. The platform described here lets a voice application developer set an unlimited number of different responses for each of the ten users, and responses can even be personalized for each particular individual. For example, if the platform determines that the end user is a 35-year-old woman living in Georgia, the developer may decide that this end user would be more comfortable talking to another woman who has a southern accent and uses local colloquialisms and local references. The platform lets developers change the words a particular voice platform uses when speaking with an end user. Developers can also use the platform to record amateur or professional voice talent with the relevant gender, accent, dialect, and so on. The result is a more believable, human interaction between end users and their voice assistant devices.
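
A sketch of that variant selection, assuming a pool of response variants tagged with a persona and a toy rule keyed on the end user's region (the variants, the "persona" field, and the region-to-persona rule are all invented for illustration):

```python
import random

# Hypothetical response variants; the same prompt can be answered several
# ways, and a persona filter narrows them by end-user demographics.
variants = [
    {"text": "Good morning, y'all! Here's your update.", "persona": "southern"},
    {"text": "Good morning! Here's your update.", "persona": "neutral"},
    {"text": "Hey there! Ready for your update?", "persona": "neutral"},
]

def pick_response(user_profile: dict, rng: random.Random) -> str:
    """Pick a variant; the choice varies session to session via rng."""
    persona = "southern" if user_profile.get("region") == "GA" else "neutral"
    candidates = [v["text"] for v in variants if v["persona"] == persona]
    return rng.choice(candidates)

print(pick_response({"region": "GA", "age": 35}, random.Random(0)))
```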

5. The platform natively supports multi-language content for prompts and responses. This is useful for reaching larger audiences in the United States and worldwide. It also creates a more inclusive and human experience between end users and their voice assistant devices. Multi-language support is built into the interface for non-English-speaking managers, along with the ability to add, modify, and remove multi-language content.

6. The platform provides rapid time-to-market through predefined modules with sample content, and flexibility through customization. The platform lets developers create customized voice experiences using the predefined modules and the platform's content management system, or using a combination of their own modules and content that interface with the platform via APIs. This is important because it enables voice application creators and managers to create and manage more customized and believable voice experiences, which ultimately benefits end users.

7. Using human voices for prompts and responses, as opposed to AI computer voices, yields a more believable and engaging experience. The platform allows managers to create and edit audio and video content directly within the platform; there is no need to leave the platform to create new content. Managers can build the voice interactions of a voice application, including rich media (audio and video) content, all in one place. Conventionally, managers are expected to create audio and video assets outside the voice application platform. This platform enables managers to add media directly within the platform and its user interface, increasing efficiency and speeding time-to-market. It also ultimately yields a deeper, richer voice experience for end users.

8. Voice assistant devices vary in how they handle multimedia, based on their internal hardware. One device may support video, audio, images, and text, while another may support only text and audio. The platform provides media guidance in real time, directly in its user interface, about whether particular content within the platform is supported by a particular voice assistant device and framework. This gives the user important information about which content he or she should focus on while learning how to optimize the experience on a particular voice assistant device.
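
That guidance amounts to checking requested content types against a per-device capability matrix, as in this sketch (the device names and their capability sets are illustrative; e.g., the Echo Show supporting video and images follows the text, the speaker-only entry is assumed):

```python
# Hypothetical capability matrix: which media types each device can present.
CAPABILITIES: dict[str, set[str]] = {
    "echo_show": {"text", "audio", "image", "video"},
    "echo_dot": {"text", "audio"},  # speaker-only device, assumed
}

def media_guidance(device: str, content_types: set[str]) -> dict:
    """Report which of the requested media types the device can present."""
    supported = CAPABILITIES.get(device, set())
    return {
        "supported": sorted(content_types & supported),
        "unsupported": sorted(content_types - supported),
    }

print(media_guidance("echo_dot", {"audio", "video"}))
# {'supported': ['audio'], 'unsupported': ['video']}
```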

Therefore, in general, in one aspect, requests are received from voice assistant devices expressed according to the respective protocols of one or more voice assistant frameworks. Each request represents a voice input by a user to the corresponding voice assistant device. The received requests are re-expressed according to a common request protocol. Based on the received requests, responses to the requests are expressed according to a common response protocol. Each response is re-expressed according to the protocol of the framework in which the corresponding request was expressed. The responses are sent to the voice assistant devices for presentation to the users.
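
The re-expression in both directions can be sketched as a pair of adapters (the per-framework payload shapes below are invented stand-ins, loosely echoing Alexa- and Dialogflow-style fields, not the patent's actual protocols):

```python
# Adapters between framework-specific payloads and a common protocol.
# Payload field names are illustrative assumptions only.
def to_common_request(framework: str, payload: dict) -> dict:
    """Re-express a framework-specific request per the common request protocol."""
    if framework == "alexa":
        return {"framework": "alexa", "intent": payload["request"]["intent"]["name"]}
    if framework == "google":
        return {"framework": "google", "intent": payload["queryResult"]["intent"]}
    raise ValueError(f"unsupported framework: {framework}")

def to_framework_response(common_response: dict) -> dict:
    """Re-express a common-protocol response in the originating framework's protocol."""
    speech = common_response["speech"]
    if common_response["framework"] == "alexa":
        return {"response": {"outputSpeech": {"type": "PlainText", "text": speech}}}
    return {"fulfillmentText": speech}

common = to_common_request("google", {"queryResult": {"intent": "GetFAQ"}})
print(to_framework_response({"framework": common["framework"], "speech": "Hi!"}))
```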

Implementations may include one of the following features or a combination of two or more of them. The requests are expressed according to the respective protocols of two or more voice assistant frameworks. The voice assistant frameworks include frameworks of at least one of Amazon, Apple, Google, Microsoft, or a chatbot developer. Generating the responses includes traversing a graph using information from the requests. Traversing the graph includes identifying features to be used to implement the responses. The features are organized in modules. At least one module is predefined. At least one module is custom defined. At least one module includes a set of predefined features and predefined content items adapted to a particular industry or organization. The features include information about content items to be included in the responses. The features include information about dynamic content items to be included in the responses. At least one content item is predefined. At least one content item is custom defined. Generating the responses to the requests includes executing a voice application. The voice application includes a set of functions that generate responses to requests spoken by people. The generated responses include spoken-word outputs. The generated responses trigger other functions while the spoken-word outputs are being provided. Instructions are executable by a processor to: receive data about the requests and corresponding responses of two or more frameworks, and analyze the received data to determine the comparative performance of the responses for the frameworks. The performance includes the performance of one or more voice assistant frameworks. The performance includes the performance of one or more features used to implement the responses. The performance includes the performance of one or more content items included in the responses. The performance includes the performance of one or more voice applications.

Instructions are executable by a processor to present, in a user interface of the voice application platform, features for the selection and management of content items to be included in the responses. In real time, as the content items are being selected or managed, information about the relative performance of the respective content items, associated with characteristics of the content items, is presented through the user interface. Information about the selected or managed content items is received through the user interface. The voice application is executed to generate responses that include presentation of the selected and managed content items. The user interface is configured to enable people without technical training to select or manage the content items and to provide and receive the information about the content items. Instructions are executable by a processor to enable selection of the content items to be included in a given one of the responses from among alternative possible content items. The selection of the content items to be included in a given response is based on a context of the end user's voice input. The context of the end user's voice input includes the geographic location of the voice assistant device to which the response is to be sent. The context of the end user's voice input includes demographic characteristics of the end user.

Instructions are executable by a processor to present a user interface configured to (a) enable creation of voice applications for processing the requests and generating the corresponding responses, (b) maintain modules of features against which requests can be matched to generate responses, including standard modules and custom modules, (c) include in each module a set of features corresponding to a context in which the responses are presented to end users, and (d) expose the modules through the user interface.

Instructions are executable by a processor to present, in a user interface of the voice application platform, features enabling selection and management of content items to be included in the responses. Each content item requires the voice assistant device to have a corresponding content presentation capability. During the selection and management of the content items, information is simultaneously presented through the user interface about the capabilities of voice assistant devices conforming to the respective different voice assistant frameworks to present the selected and managed content items. The voice application platform guides users without technical training about the capabilities of the voice assistant frameworks and how they will render images, audio, video, and other forms of media.

In general, in one aspect, requests are received through a communication network from voice assistant devices conforming to one or more different voice assistant frameworks. The requests are for services based on end users' speech. The end users' speech expresses intents. Data derived from the requests for services is used to traverse a graph of nodes and edges to reach features that match the respective requests for services. The features are executed to generate responses. The responses are sent through the communication network to the voice assistant devices to cause them to respond to the respective end users.

Implementations may include one of the following features or a combination of two or more of them. The voice assistant devices from which the requests are received conform to two or more different voice assistant frameworks. The data is derived from the requests for services by abstracting the information in the requests into a data format common across the two or more different voice assistant frameworks. The nodes of the graph are updated using the output of a machine learning algorithm. Information about a request is used to identify an initial node of the graph at which the traversal begins. Nodes are automatically added to the graph to serve as the initial nodes at which traversal begins for requests conforming to an additional voice assistant framework.

In general, in one aspect, requests are received through a communication network from voice assistant devices conforming to one or more different voice assistant frameworks. The requests are for services based on end users' speech. The end users' speech expresses intents. Responses to the received requests are determined. The responses are configured to be sent through the communication network to the voice assistant devices to cause them to respond to the respective end users. Measures of success of the determination of the responses are evaluated. Based on the relative measures of success of the responses, a user can manage, through a user interface, subsequent responses to the requests for services.

Implementations may include one of the following features or a combination of two or more of them. The voice assistant devices from which the requests are received conform to two or more different voice assistant frameworks. Proposed responses are presented to the user through the user interface based on the evaluated measures of success, and the user can select, based on the proposed responses, the responses to be sent to the voice assistant devices. Evaluating the measures of success includes evaluating, across two or more different voice assistant frameworks, the success of content items carried by the responses. Evaluating the measures of success includes evaluating the success of the responses with respect to the respective voice assistant frameworks of the voice assistant devices to which the responses are to be sent. Evaluating the measures of success includes evaluating the success of the responses with respect to two or more different voice applications configured to receive the requests and determine the responses. Content items to be carried in subsequent responses are managed based on the measures of success.
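
One plausible measure of success, sketched under the assumption that the platform logs each request/response exchange with an outcome flag (the log schema and records below are invented), is a per-framework rate of successfully understood exchanges:

```python
# Hypothetical request/response log; "understood" flags whether the exchange
# succeeded. Computing per-framework success rates supports comparison.
log = [
    {"framework": "alexa", "content": "faq_1", "understood": True},
    {"framework": "alexa", "content": "faq_1", "understood": False},
    {"framework": "google", "content": "faq_1", "understood": True},
    {"framework": "google", "content": "faq_1", "understood": True},
]

def success_rates(records: list[dict]) -> dict[str, float]:
    """Fraction of understood exchanges per voice assistant framework."""
    totals: dict[str, list[int]] = {}
    for r in records:
        ok, n = totals.setdefault(r["framework"], [0, 0])
        totals[r["framework"]] = [ok + r["understood"], n + 1]
    return {fw: ok / n for fw, (ok, n) in totals.items()}

print(success_rates(log))  # {'alexa': 0.5, 'google': 1.0}
```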

In general, in one aspect, features are presented in a user interface of a voice application platform enabling selection and management of content items to be included in responses to be provided by a voice application to voice assistant devices conforming to one or more different voice assistant frameworks. While the content items are being selected and managed, information about the relative performance of the respective content items, associated with characteristics of the content items, is presented through the user interface. Information about the selected and managed content items is received through the user interface. The voice application is executed to generate responses that include the selected and managed content items.

Implementations may include one of the following features or a combination of two or more of them. Usage data is aggregated from voice assistant devices conforming to two or more different voice assistant frameworks. The information about the relative performance of the respective content items is generated from the aggregated usage data. The usage data is aggregated through a universal API. The information about the relative performance is generated by a machine learning algorithm.

In general, in one aspect, requests for services are received through a communication network from voice assistant devices conforming to one or more different voice assistant frameworks. The requests for services are based on end users' speech. The end users' speech expresses intents. Responses to the received requests are determined. The responses are configured to be sent through the communication network to the voice assistant devices to cause them to respond to the respective end users. The responses include content items. The content items included in a given one of the responses are selected from among alternative possible content items. The selection of the content items to be included in the given response is based on a context of the end user's expressed intent.

Implementations may include one of the following features or a combination of two or more of them. The voice assistant devices from which the requests are received conform to two or more different voice assistant frameworks. One of the voice assistant frameworks includes a chatbot framework. The context of the end user's expressed intent can include the geographic location of the voice assistant device to which the response is to be sent. The context of the end user's expressed intent can include demographic characteristics of the end user. The demographic characteristics include language characteristics inferred from the geographic location of the voice assistant device to which the response is to be sent or from characteristics of words included in the received request. The demographic characteristics can include age. The language characteristics include local colloquialisms or local references. The demographic characteristics can include gender. The content items to be included in a given response can be selected based on preferences of the end user.

In general, in one aspect, a user interface is presented for the development of voice applications. The user interface is configured to enable the creation of voice applications for processing requests received from voice assistant devices and for generating corresponding responses for the voice assistant devices to present to end users. Modules of features against which requests can be matched to generate responses are maintained. Each module includes a set of features corresponding to a context in which the responses are presented to end users. The maintenance of the modules includes (a) maintaining standard modules for corresponding contexts, and (b) enabling the generation and maintenance of custom modules of features against which requests can be matched to generate custom responses for the voice assistant devices. The modules are exposed through the user interface.

Implementations may include one of the following features or a combination of two or more of them. Content items are maintained for use with the features in generating the responses. The maintenance of the content items includes (a) maintaining standard content items, and (b) enabling the generation and maintenance of custom content items to be used with the features to generate custom responses for the voice assistant devices. The context relates to products or services in a defined market segment. The context relates to the demographics of a target group of end users. The context relates to the capabilities of the voice assistant devices. The context relates to the types of content items to be used with the features in generating the responses.

总的来说,在一方面,呈现用户界面以用于语音应用的开发。用户界面配置为使能用于处理从语音助手装置接收到的请求和用于生成用于语音助手装置呈现给终端用户的相应响应的语音应用的创建。确定对接收到的请求的响应。响应配置为通过通信网络发送到语音助手装置以使得它们响应于各个终端用户。响应包括内容项。用户界面使能以富媒体格式的内容项的创建和编辑以包括在响应中。In general, in one aspect, a user interface is presented for development of voice applications. The user interface is configured to enable creation of voice applications for processing requests received from a voice assistant device and for generating corresponding responses for the voice assistant device to present to an end user. Responses to the received requests are determined. The responses are configured to be sent to the voice assistant device via a communication network so that they are responsive to respective end users. The responses include content items. The user interface enables creation and editing of content items in a rich media format for inclusion in the responses.

实现可以包括如下特征之一或者两个或更多个的组合。富媒体格式包括图像、音频和视频格式。通过使能语音应用的创建的平台呈现用户界面。平台使能通过用户界面区域在平台内内容项的直接记录和编辑。Implementations may include one or a combination of two or more of the following features. Rich media formats include image, audio, and video formats. User interfaces are presented through the platform enabling the creation of voice applications. The platform enables direct recording and editing of content items within the platform through the user interface area.

In general, in one aspect, features are exposed in a user interface of a voice application platform. The features enable selection and management of content items to be included in responses to be provided by a voice application to voice assistant devices that conform to one or more different voice assistant frameworks. Each of the content items requires a voice assistant device to have a corresponding content presentation capability. While the content items are being selected and managed, information about capabilities of voice assistant devices conforming to the various different voice assistant frameworks to present the content items being selected and managed is simultaneously exposed through the user interface.

Implementations may include one of the following features or a combination of two or more of them. The voice assistant devices to which the responses are to be provided conform to two or more different voice assistant frameworks. The content presentation capabilities include capabilities of hardware and software of the voice assistant devices. The content presentation capabilities relate to types of the content items. The types of the content items include text, image, audio, and video.

In general, in one aspect, a user interface is presented for development of voice applications. The user interface is configured to enable creation of voice applications for processing requests received from voice assistant devices and for generating corresponding responses for the voice assistant devices to present to end users. Responses to the received requests are determined. The responses are configured to be sent through communication networks to the voice assistant devices to cause them to respond to respective end users, the responses including content items expressed in natural language. The user interface enables a user to select and manage expression of one or more of the content items in any of two or more natural languages.

Implementations may include one of the following features or a combination of two or more of them. The user interface is presented in any of two or more different natural languages. Each of the content items is represented according to a data model. The representation of each content item inherits from an object that includes a natural-language attribute of the content item.
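The inheritance just described can be sketched in a few lines. The class names below (LocalizedObject, ContentItem) are hypothetical illustrations; the specification does not name its data-model types.

```python
class LocalizedObject:
    """Base object carrying the natural-language attribute that items inherit."""
    def __init__(self, language: str):
        self.language = language  # e.g., "en-US" or "es-MX"

class ContentItem(LocalizedObject):
    """A content item whose representation inherits the language attribute."""
    def __init__(self, text: str, language: str):
        super().__init__(language)
        self.text = text
```

Because every content item inherits the language attribute, the same item can be stored and selected in any of two or more natural languages.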

These and other aspects, features, and implementations can be expressed as methods, apparatus, systems, components, program products, methods of doing business, means or steps for performing functions, and in other ways.

These and other aspects, features, and implementations will become apparent from the following description, including the claims.

BRIEF DESCRIPTION OF THE DRAWINGS

FIGS. 1, 2 through 10, 14 through 21, and 29 through 32 are block diagrams.

FIGS. 11A, 11B, 12, and 13 are examples of code.

FIGS. 22 through 28, 30, and 33 are user interface screens.

DETAILED DESCRIPTION

As shown in FIG. 1, here we describe a technology 10 that provides a universal voice application platform 12 (which we sometimes simply call the "platform" or the "universal platform" or the "cross-device platform"). The platform is configured to (among other activities) create, store, manage, control, and execute voice applications 14 and to provide voice assistant services 11 to voice assistants 13 and voice assistant devices 18. The platform serves two types of users.

One type includes end users 28 of voice assistant devices and voice assistants. The end users are served by universal voice applications that can process requests from voice assistant devices conforming to any framework and formulate corresponding universal responses that can be translated into responses usable in any framework.

The other type of user includes platform participant users 45, who use the platform in a software-as-a-service mode through user interfaces 39 to create, store, and manage universal voice applications and related content items, among other things. The platform is configured to enable platform participant users to quickly create, store, and manage standardized universal voice applications based on predefined standard content items and other components that voice applications need. In other modes of use, the platform is configured to enable platform participant users to create, store, manage, and control custom universal voice applications and related content items, among other things.

Standardized universal voice applications, content items, and other components can be stored on platform servers 222. Custom universal voice applications, content items, and other components can be stored on custom servers.

At run time, requests (e.g., intents) 26 spoken by end users are received by voice assistant devices 18, which process them and formulate request messages 34. The request messages 34 are passed through communication networks 29 to voice assistant servers 31 operated, for example, by the parties that control the particular frameworks (such as Amazon with respect to the Alexa framework). The voice assistant servers process the incoming messages, parse them to derive request message elements, and pass the processed request information to the platform servers. The platform servers use the received message elements to determine the best responses according to the given standardized or custom voice applications being executed. For this purpose, the platform servers can refer to standard voice applications, content items, and other components stored and managed on the platform servers, or can refer to the custom servers for custom voice applications, custom content items, and other custom components. The platform servers formulate corresponding appropriate response message elements 35 and return them to the voice assistant servers, which use them to generate formal voice response messages 32. The responses 34 can be spoken or presented as text, images, audio, or video. The platform stores content items 52 in a variety of media formats for use in responses. In some cases, a response can involve a responsive action, such as turning off a device.

The three sets of servers (the platform servers, the custom servers, and the voice assistant servers) can be created, managed, operated, owned, or controlled (or a combination of those activities), respectively, by three different parties: (a) a platform owner that operates the platform as a business, (b) platform participants that control their own custom servers, and (c) framework developers (such as Microsoft, Amazon, Google, Apple, and developers of chatbots) that operate their own voice assistant servers to control the manner in which request and response messages of their frameworks are processed. In some implementations, two or more of the three sets of servers can be controlled by a single party for its own benefit or for the benefit of itself and another party.

Because the platform is cloud-based (e.g., implemented using one or more servers that communicate with client voice assistant devices through communication networks), platform participants need not invest in additional infrastructure to be able to create, edit, manage, and own robust voice applications. The cloud-based approach also enables regular updates and enhancements to be added by the party that controls the universal voice application platform. The updates and enhancements become automatically and immediately available to platform participants.

Examples of platform participants as mentioned above include brands, advertisers, developers, and other entities that use the platform.

In some instances, a person using the platform as a representative of or on behalf of a platform participant is sometimes called a "platform participant user," a "platform user," or a "participant user." Participant users interact with the platform through one or more "participant user interfaces" 39, or simply "user interfaces."

As suggested earlier, certain voice applications, which we sometimes call "standard voice applications," are designed, developed, and stored by the party that controls the platform and are made openly available for use by platform participants. Certain voice applications, which we call "custom voice applications," include custom content items, custom features, or other custom components and are designed, developed, stored, and controlled for particular purposes or by particular platform participants. In some cases, these custom voice applications can be shared with other platform participants. In some cases, a custom voice application is dedicated to a single platform participant and is not shared.

We use the term "voice application" broadly to include, for example, any application that can accept requests of users of voice assistant devices and formulate elements of responses to the requests to be returned to the voice assistant devices where the responses are to be implemented. A voice application can be created by any method that involves specifying how information about incoming requests is to be accepted and used and how elements of appropriate responses are to be generated based on the information about the incoming requests. The responses can include content items, and the elements of the responses can be generated by executing correspondingly defined functions based on the information about the incoming requests. In typical known systems, a voice application is "hard-wired" as code that accepts requests as inputs and executes pre-specified methods or functions based on the requests to generate responses. Among the advantages of the platform and user interfaces that we describe here, they provide participant users an easy-to-use, robust, effective, time-saving, highly flexible, cross-framework way to develop, update, control, maintain, measure the effectiveness of, and deploy the voice applications and content items that they use. Fine-grained cross-framework, cross-content, and cross-feature analytics are available to users and also work in the background to improve the effectiveness of the voice applications. The resulting applications are robust, adaptive, dynamic, and effective, among other advantages.

The platform 12 is configured to accept request message elements that conform to any type of voice assistant framework, execute universal voice applications using those message elements, and return response message elements of universal representations of responses that can be used to formulate response messages for any type of voice assistant framework.

In other words, the universal voice application platform can communicate with voice assistant devices belonging to (e.g., conforming to) a number of different current and future voice assistant frameworks, using, for each voice assistant device, request messages and response messages that conform to the native protocol of its framework. At the same time, the universal application platform enables platform participants to develop, maintain, and deploy robust universal voice applications that can interpret requests of and formulate responses for voice assistant devices belonging to a variety of different frameworks, without having to develop, maintain, and deploy multiple parallel, functionally similar voice applications, one for each framework to be served.

Therefore, among the benefits of some implementations of the platform, platform participants can formulate, maintain, and deploy effective, robust voice applications through a single, easy-to-use, consistent participant user interface. The resulting voice applications can universally serve Amazon Alexa, Google Assistant, Apple HomePod, Microsoft Cortana, and any other kinds of current or future voice assistants and voice assistant devices. The platform is designed to enable platform participants to deploy voice applications quickly and easily while providing flexibility through customization capabilities.

Features and advantages of the technology and the platform also include the following:

Graph-based. The platform can interact with, provide services for, and process data associated with any voice assistant framework, including existing proprietary and non-proprietary frameworks developed by Amazon, Google, Apple, Microsoft, and others, through a single universal API and a universal business logic layer. The platform abstracts received request messages and processes them using graph-based pattern matching rather than a rules-based engine (although combining graph-based pattern matching with a rules-based approach is possible) to understand end users' requests (e.g., intents). Graph-based pattern matching enables a consistent and confident way of mapping request messages to the features to be used in formulating responses, across multiple voice assistant frameworks. The graph-based approach is manageable, updatable, and flexible enough to enable machine learning to update the positions of nodes in the graph. A new voice assistant framework can be accommodated simply in the graph-based approach by adding new nodes (data points) to the graph database to create reachable connections based on request messages received from voice assistant devices that conform to the new voice assistant framework.

Cross-framework analytics. Because the universal voice application platform has access to usage data from multiple different voice assistant frameworks, the platform can compare relative performance between frameworks. For example, the platform can analyze the failure rates of different voice applications in processing and responding to received request messages, and the failure rates of particular features or content items across multiple voice assistant frameworks. As a result, the platform can use machine learning and platform algorithms to understand an end user's request (intent) better than the request could be understood by the particular voice assistant framework being used (which has access only to that framework's usage data). This advantage is achieved, for example, by detecting patterns of success and failure of each framework for a given type of feature or content item and determining changes that would make the content item or feature more successful. This analysis enables the platform to identify an optimal superset of content item and feature changes across the supported frameworks.

Robust content suggestions. Because the platform collects usage data across multiple voice assistant devices and multiple frameworks through a single API and can analyze their relative performance, the platform can provide effective feature and content recommendations to platform participants. The platform uses machine learning and algorithms to report to platform participants the relative performance of different voice applications (including different voice applications of a given platform participant or of different platform participants) and to make real-time, dynamic content suggestions directly to platform users within the platform's user interface. These suggestions can help platform users optimize the performance of their voice applications and enhance the overall end-user experience.

Dynamic content. The platform supports sets of items of dynamic content, for example, to provide more than one possible response to a request, such as alternative answers to a question. Dynamic content can enable a more engaging end-user experience, for example, because responses can change from session to session. Dynamic content also enables creating one or more personas for a voice assistant and varying the end-user experience depending on end users' preferences and demographics. On typical existing platforms, if ten end users ask a given voice assistant the same question, the voice assistant will interact in the same way all ten times. The universal voice application platform can formulate a potentially unlimited variety of responses for each of the ten end users and personalize each response to the particular end user. For example, if the platform determines that an end user is a 35-year-old woman living in Georgia, a particular response can be selected based on a developer's decision that this end user may be more comfortable talking with another woman (the voice assistant) who has a Southern accent and speaks using local colloquialisms and local references. The platform enables developers to change the words that a given voice assistant framework uses when speaking to end users and to record amateur or professional voice talent having a relevant gender, accent, dialect, or other speech characteristic. The result is a more authentic and acceptable interaction between a given end user and the voice assistant.
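Selection among dynamic-content variants can be sketched as follows. The persona rules below (state equal to "GA", gender equal to "female") and the response pools are invented examples of the kind of developer-defined choices described above, not rules defined by the platform itself.

```python
import random

# Response pools keyed by persona; "southern_female_voice" is a
# hypothetical persona name used only for this illustration.
RESPONSES = {
    "default": ["Here is today's update.", "Hi! Here's what's new."],
    "southern_female_voice": ["Hey y'all, here's what's new today."],
}

def pick_response(profile: dict) -> str:
    """Choose a response variant based on an end user's profile."""
    if profile.get("state") == "GA" and profile.get("gender") == "female":
        pool = RESPONSES["southern_female_voice"]
    else:
        pool = RESPONSES["default"]
    return random.choice(pool)  # responses can vary between sessions
```

Because the default pool holds several variants and one is chosen at random, repeated sessions need not produce identical interactions.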

Typically, the platform cannot "hear" an end user's accent, because the request messages from the voice assistant frameworks do not carry audio files. The platform receives only text and can look for keywords that provide clues that an end user may have an accent. An example would be "y'all" in text, which can be attributed to a Southern American accent. If available, the platform can also couple the identification of keywords with geographic information. The keyword "y'all" received from a voice assistant device in Atlanta, GA, can suggest a Southern accent.
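The keyword-plus-geography heuristic described above can be sketched in a few lines. The keyword table and city list below are invented examples for illustration; only "y'all" and Atlanta come from the text.

```python
# Hypothetical tables of accent-hinting keywords and corroborating cities.
ACCENT_HINTS = {"y'all": "southern_us", "wicked": "new_england"}
SOUTHERN_CITIES = {"atlanta", "nashville", "savannah"}

def accent_clues(text: str, city: str = "") -> set:
    """Collect accent hints from request text, optionally reinforced by geography."""
    clues = {hint for kw, hint in ACCENT_HINTS.items() if kw in text.lower()}
    if "southern_us" in clues and city.lower() in SOUTHERN_CITIES:
        clues.add("southern_us_strong")  # keyword corroborated by location
    return clues
```

A clue found in text alone is weak evidence; the same keyword coupled with a matching device location yields a stronger inference, as in the Atlanta example above.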

Multilingual content. The platform natively supports multilingual content for responses, enabling platform participants to reach larger audiences in the United States and worldwide. The platform also enables a more inclusive and human experience between end users and voice assistants. Multilingual support, along with the ability to add, modify, and remove multilingual content, is built into the interface for non-English-speaking participant users.

Pre-stored and custom modules and content. The platform provides both: (a) accelerated time to market for brand owners and other platform participants through the use of predefined (e.g., standard) features, feature modules, and sample content items, and (b) the flexibility of custom creation of features, modules, content items, and the like. Platform participants can use the standard features, modules, and content items 23 through an easy-to-use content management system to speed development, or can create customized end-user experiences by creating their own custom features, modules, content items, and the like that operate with the platform using the API. This arrangement enables platform participants to create and manage customized and authentic end-user experiences to better serve end users.

Human voices. Using human voices for responses, rather than only synthesized computer voices, produces more authentic and engaging end-user experiences. The platform enables participant users to create and edit audio and video content items directly within the platform through the user interface, without needing to resort to other cross-platform content creation applications (although cross-platform content creation applications can also be used). Platform participants can create voice applications that use and include rich media (audio and video) content items through a single participant user interface. Advantages of this arrangement include greater efficiency, rapid time to market, and deeper, richer end-user experiences.

Media guidance about device capabilities. Voice assistant frameworks (and the voice assistant devices that conform to them) vary in how they handle various types of content items based on their internal hardware and software. For example, one framework may support video, audio, images, and text, while another may support only text and audio. The universal voice application platform provides media guidance about whether a particular type of content item is supported by a particular voice assistant device or voice assistant framework, and provides that guidance directly and in real time in the platform's participant user interface. This guidance enables brands and other platform participants to determine which content to emphasize while learning how to optimize the end-user experience on particular voice assistant devices or voice assistant frameworks.

As explained earlier, in some implementations of the technology described here, a voice assistant device 18 processes speech 26 of an end user 28, interprets the speech as a corresponding request 48, includes the request (e.g., an intent) in a request message expressed according to the protocol of the voice assistant framework to which the voice assistant device belongs, and forwards the request message through one or more communication networks to servers that process the received request messages. As also shown in FIG. 1, the servers formulate responses using relevant features 43 of voice applications 14 and (in most cases) send corresponding response messages back to the voice assistant devices. The universal voice application platform includes modules 46 that organize and provide the features 43 that enable the voice applications to process requests. In some implementations of the platform, the features of such modules are implemented as request handlers 41 that process possibly many different types of requests (e.g., intents) for voice applications, for example, requests associated with features such as events, FAQs, daily updates, reminders, lists, surveys, and latest news.
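One way to picture features implemented as request handlers is a small dispatch table. The handler names below mirror two of the feature types listed above (FAQs and reminders), but the dispatch mechanism itself and the request/response dictionary shapes are illustrative assumptions, not the platform's actual interfaces.

```python
def handle_faq(request: dict) -> dict:
    return {"speech": "FAQ answer for: " + request["query"]}

def handle_reminder(request: dict) -> dict:
    return {"speech": "Reminder set: " + request["query"]}

# One handler per feature type; a real module would register handlers
# for events, daily updates, lists, surveys, latest news, and so on.
REQUEST_HANDLERS = {
    "faq": handle_faq,
    "reminder": handle_reminder,
}

def dispatch(request: dict) -> dict:
    """Route a request to the handler registered for its feature type."""
    handler = REQUEST_HANDLERS.get(request["feature"])
    if handler is None:
        return {"speech": "Sorry, I can't help with that yet."}
    return handler(request)
```

Grouping such handlers into a module lets a set of related features be reused together, for example across platform participants in the same industry.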

The features implemented as request handlers in a given module can represent a bundle of features all of which are useful with respect to, for example, a set of platform participants that share common characteristics, such as common use cases of entities belonging to an industry or market. Each module can also include or be associated with pre-stored items of sample content 23 that can be invoked and used by the request handlers in formulating responses to requests. The availability of pre-stored items of sample content can improve speed to market for platform participants.

Participant users (e.g., people working on behalf of the interests of particular companies, brands, organizations, or other platform participants) can use the platform's content management system 54 to create, edit, and manage custom content items 22 through the platform's user interface. The content management system provides an intuitive user interface that requires no technical knowledge to create, modify, and remove the content items that shape the end-user experience.

The platform's content management system also provides guidance and insights to participants by collecting usage data and applying analytics 56 to the collected usage data 55. In the user interface, guidance can be provided through cues such as visual indicators of the media formats of content items 653 supported by particular frameworks of voice assistant devices (e.g., video and image media supported by the Amazon Echo Show). Insights include, for example, analyses of the success rates of responses formulated by voice applications for given requests across voice assistant devices of different frameworks (e.g., Google Assistant responds to a given request more successfully than Amazon Alexa).

As shown in FIG. 2, the universal voice application platform 12 uses an architecture 70 of independent functional layers. The layers include: an API layer 72, a business logic layer 74, a feature and module layer 76, a CMS (content management system) layer 78, and a data layer 80.

API layer

API层处理从语音助手装置接收到的请求消息73和从定制模块和特征接收到的请求75。API层接受根据与任何可能的专用或者非专用语音助手框架相关联的协议82表示的请求消息及其他请求。当API层接收符合任何定义的协议的请求消息或者其他请求时,API层将接收到的请求消息或者请求抽象(例如,翻译、变换或者映射)为根据公共通用协议84表示的请求以用于进一步处理。该抽象使能使用通用商业逻辑及其他逻辑层(比如特征和模块层和CMS层)支持多种专用和非专用语音助手框架、语音助手装置和语音助手,而不需要用于每个语音助手框架的逻辑层的单独的堆栈。The API layer processes request messages 73 received from voice assistant devices and requests 75 received from custom modules and features. The API layer accepts request messages and other requests expressed according to a protocol 82 associated with any possible dedicated or non-dedicated voice assistant framework. When the API layer receives a request message or other request that conforms to any defined protocol, the API layer abstracts (e.g., translates, transforms, or maps) the received request message or request into a request expressed according to a common general protocol 84 for further processing. This abstraction enables the use of common business logic and other logic layers (such as feature and module layers and CMS layers) to support a variety of dedicated and non-dedicated voice assistant frameworks, voice assistant devices, and voice assistants without the need for a separate stack of logic layers for each voice assistant framework.

As an example, Amazon Alexa and Google Assistant each provide request messages expressed in JSON to the platform's API layer for processing. The protocol used to express a request message is broadly the same regardless of which framework the voice assistant device conforms to, but the objects and key-value pairs included in the request message differ between the two frameworks supported by Google and Amazon respectively. For example, both represent within the JSON whether the user and the session are new; the relevant key-value pairs for Google Assistant are "userid|Unique Number" and "type|New", while the corresponding keys for Alexa are "userid|GUID" and "new|True". The platform detects which framework is associated with the voice assistant device that sent the request message to determine how the message should be further processed, reconciles the differences, and normalizes the information into a common format for additional processing.
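The reconciliation described above might be sketched as follows. This is a minimal illustration only: the key-value pairs mirror the examples in the text, but the `normalize` helper and the common-format field names are assumptions, not the platform's actual protocol.

```python
# Sketch: re-expressing framework-specific JSON request messages in a common
# format. Only the native key names ("userid", "type", "new") come from the
# text; the common-format field names are illustrative assumptions.

def normalize(framework: str, message: dict) -> dict:
    """Map a native request message onto a common request shape."""
    if framework == "google":
        # Google Assistant: "userid|Unique Number" and "type|New"
        return {
            "user_id": str(message["userid"]),
            "new_session": message.get("type") == "New",
        }
    if framework == "alexa":
        # Alexa: "userid|GUID" and "new|True"
        return {
            "user_id": message["userid"],
            "new_session": message.get("new") is True,
        }
    raise ValueError(f"unrecognized framework: {framework}")
```

Downstream layers then work only with the normalized shape, regardless of which framework produced the original message.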

Business Logic Layer

The business logic layer applies business logic to handle a key operation of the platform: mapping the message elements of each incoming request to the particular appropriate modules and features that can, and will, process the request. In some implementations, the business logic layer performs this mapping by traversal of a graph database 86 stored as one of the databases on the servers. In some cases, the graph traversal determines which module and feature most likely match a given request (e.g., are most likely to process it and formulate an appropriate response). A graph database holds data representing a graph of nodes connected by edges, and graph traversal is a search technique for finding patterns within the graph based on the relationships among items; a pattern corresponds to the edges connecting one or more nodes. For example, a request message from an Amazon Alexa device carrying the literal phrase "stop" as one of its message elements is mapped to the graph's "stop" feature node based on the Alexa edge value and the stop instruction. Based on the results of the traversal, the business logic layer processes the request, already expressed in the abstract universal protocol, to identify the most likely matching modules and features of the feature and module layer 76.
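The simple case of this mapping can be pictured with a toy edge table standing in for the graph database 86. Apart from the "stop" example above, the node and feature names are hypothetical, and a real deployment would traverse the graph database rather than a dictionary.

```python
# Toy stand-in for the graph: (framework, message element) pairs connected by
# edges to feature nodes. Names other than "stop" are illustrative.

EDGES = {
    ("alexa", "stop"): "stop",            # the "stop" feature node from the text
    ("alexa", "event_search"): "events",
    ("google", "event"): "events",
}

def map_to_feature(framework, element):
    """Return the feature node reached from this framework/element edge, if any."""
    return EDGES.get((framework, element))
```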

Feature and Module Layer

Features 81 within the feature and module layer represent functions or processes 83 invoked as a result of processing requests in the voice API layer and the business logic layer. For example, a function that returns a list of events expects message elements, parsed from the request message and received from the business logic layer, representing the date of an event, the type of event (such as a basketball game), or both. Features within the platform are segmented according to the type of request to be processed: for example, all requests for information about events can be handled by the functions of an events feature 85, while all requests for the latest general updates are handled by the functions of a daily updates feature 87. Segmenting features by request type provides a structured format for processing requests and framing responses. The functions of each feature, and the content items they use, can be stored and managed by the party controlling the platform, by participant users, or both. Because features and modules are closely tied to and use content items, the feature and module layer is one of the two layers (the other being the CMS layer) that participant users can view by name and work with directly in the platform's user interface.
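The segmentation by request type might be sketched as a registry of handler functions. The decorator, handler names, and response strings below are assumptions for illustration, not the platform's API.

```python
# Sketch: features registered per request-type segment, so event requests
# reach the events feature and update requests reach the daily updates
# feature. All names are illustrative.

FEATURES = {}

def feature(request_type):
    """Register a handler function for one segment of request types."""
    def register(fn):
        FEATURES[request_type] = fn
        return fn
    return register

@feature("events")
def events_feature(slots):
    date = slots.get("date", "any date")
    return f"Looking up events for {date}."

@feature("daily_update")
def daily_update_feature(slots):
    return "Here is the latest update."

def handle(request_type, slots):
    """Dispatch a normalized request to the feature for its segment."""
    return FEATURES[request_type](slots)
```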

A module 89 provides a structure for referencing or packaging a collection 91 of features 81 commonly used by or relevant to a group of platform participants (for example, companies belonging to a given industry), or a collection of features relevant to a given use case. More than one module may reference a given feature or include it in its packaging. Because features reference and use content items, references to modules and their features are relative to references to particular content items (e.g., pre-stored sample or standard content items 23 managed by the platform for use by platform participants). For example, a module for the higher-education field and a module for the health industry may both include references to (e.g., package) the same events feature, but the feature's behavior will differ based on the content items (e.g., sample or standard content items or custom content items) loaded when the feature is invoked by the two different references in the two different modules. A higher-education events module may formulate responses concerning a particular sports team or school department; a health events module may formulate responses concerning activities of a city or an office.

As discussed later, the universal voice application platform includes a search engine that retrieves particular content items when a feature is invoked, by performing a content search against a search index. For example, an incoming request message stating "what is happening on campus next Tuesday" is handled by the events feature searching against the index to return a list of events in the database having that Tuesday's date value.
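The search step can be pictured as filtering indexed content by the slot value extracted from the request. The event records below are invented sample data; a production system would query the search index rather than an in-memory list.

```python
# Sketch of the events feature searching content items by date. Sample data
# is illustrative; a real deployment queries the search index.

EVENTS = [
    {"title": "Basketball game", "date": "2018-06-12"},
    {"title": "Career fair",     "date": "2018-06-12"},
    {"title": "Orientation",     "date": "2018-06-19"},
]

def events_on(date):
    """Return all indexed events whose date value matches the request's slot."""
    return [e for e in EVENTS if e["date"] == date]
```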

CMS Layer

Standard and custom content items 23 are created, stored, and managed by participant users through the main portion of the platform's user interface, which exposes the features and functions of the CMS layer 78. The CMS layer also enables participant users to control administration and access rights, and it is designed to be easy enough for non-technical administrators to use. The CMS layer supports content items in a variety of formats, including audio such as .mp3, video such as .mp4, images such as .png, raw text, and marked-up text such as SSML (Speech Synthesis Markup Language). For interoperability, in addition to supporting requests from the feature and module layer 76, the CMS layer provides its own API 90 to support requests from external applications. For example, platform participants can repurpose content items stored in the CMS layer for external voice applications and for other distribution channels, such as presentation through a mobile application; in the latter case, the mobile application retrieves the stored content items through the API.

Data Layer

The data layer is the repository for the data used by all of the layers, the user interface, and the other functions of the platform. It employs a variety of storage mechanisms 92, such as a graph database 101, file storage 103, a search index 105, and relational and non-relational database storage. The data layer holds data for at least the following users, mechanisms, and uses: participant users, system permissions, mappings for modules and features, content items related to features and to the responses formulated by features, and usage data for analytics, among others.

Important Aspects of the Technology and the Platform

Among the important aspects of the technology and the platform, including its layers and user interface, are the following, some of which have been mentioned earlier.

Support for a Variety of Voice Assistant Devices Using the API Layer

The API layer can handle request messages from any type of voice assistant device, including any device belonging to or conforming to one or more voice assistant frameworks, such as those provided by Amazon, Google, Microsoft, and Apple, among others. New or custom voice assistant devices, voice assistants, and voice assistant frameworks developed in the future can be accommodated in a consistent manner. Thus, by using a single API layer, voice assistant devices of various types (frameworks) can be accommodated without developing an entirely separate set of code bases for each framework.

Graph Database Technology That Maps Sentence Structures to Features

A request message received at the platform (e.g., at the API layer) carries information about the utterance of the voice assistant device's user, typically expressed as part of a loosely structured sentence pattern. An important function of the platform (and, in some implementations, of its business logic layer) is to determine, based on the information carried in the loosely structured sentence pattern, the correct, most appropriate, relevant, or effective features (we sometimes call these the "appropriate features") that should be invoked for the message elements included in a given request message. Although graph database technology is typically used for pattern matching that identifies entity relationships across large sets of highly connected data, the platform here uses it to match loosely structured sentence patterns against defined functions. For example, graph databases are commonly used to determine relationship patterns within the large data sets of a social network: an individual represented by a node may have several relationships with other individuals, as well as shared interests, represented within the graph. The platform instead leverages the graph database to match patterns in user request types against the features within the platform, enabling the graph to work with a manageable data set.

Analytics Across Voice Assistant Frameworks

The platform can capture usage data in a single repository (e.g., a database within the data layer) for voice applications used across a variety of voice assistant devices, voice assistants, and frameworks. Using the stored usage data, the platform can perform analytics and provide the results to participant users and platform participants, for example, results concerning the overall performance of a voice application across multiple types of devices or multiple frameworks, and results concerning the performance of individual request-and-response interactions of a particular voice application. At the voice application level, the platform can compute, accumulate, store, and provide the results of analytics covering metrics including: the number of voice application downloads, the number of voice application sessions, the number of unique application sessions, the average length of an application session, the most frequently received requests, the average rate at which requests are successfully mapped to features, and the requests that could not be successfully mapped to features.
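Computing a few of the metrics listed above from raw usage records might look like the following sketch. The record fields and the `summarize` helper are assumptions for illustration only.

```python
# Sketch: aggregating session counts, the most frequent request type, and the
# request-to-feature mapping success rate from stored usage records.
from collections import Counter

def summarize(records):
    sessions = {r["session_id"] for r in records}
    mapped = sum(1 for r in records if r["mapped_to_feature"])
    top_request, _ = Counter(r["request_type"] for r in records).most_common(1)[0]
    return {
        "sessions": len(sessions),
        "requests": len(records),
        "map_success_rate": mapped / len(records),
        "top_request": top_request,
    }
```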

The usage data for each analytics metric can be segmented by the type of voice assistant, voice assistant device, or voice assistant framework, by date range, or by various other parameters.

API Layer and SDKs

As explained earlier and shown in FIG. 3, a voice assistant device 98 expresses a request 99 spoken by an end user as structured data (a request message) according to the device's native protocol. The native protocol may be determined by the framework with which the device is associated. In some cases, the request message is expressed according to a generic protocol that applies to types of voice assistant devices or frameworks not supported by the platform.

To enable the API layer (identified in FIG. 3 as the voice experience API 110) to process request messages 73 expressed according to particular protocols, the platform supports a collection of SDKs 112 for different programming languages, voice assistant devices, and voice assistant frameworks.

The SDKs enable voice assistant devices of all types (conforming to any framework) to easily access the API layer, and give developers and other platform participants the expected format (protocol) for expressing communications with the platform. The SDKs include tools that enable developers to: authorize and authenticate voice assistant devices so they may access the API layer by submitting request messages in the expected format; authorize voice applications registered with the platform; formulate a raw request message into a data structure conforming to the applicable protocol for presentation to the API layer; formulate a response received from the API into an appropriate data structure (a response message) according to the applicable protocol expected by the target voice assistant device; ensure that request messages are applied to the correct version of the API after updates are rolled out; and support multiple programming languages.
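The request-side portion of this tooling might be sketched as an envelope builder. The envelope field names, the version string, and the token handling below are all assumptions; the actual SDK wire formats are framework-specific.

```python
# Sketch of an SDK helper that wraps a native request message in the envelope
# the API layer expects: framework id, registered application, authorization
# token, and a pinned API version. Field names are illustrative.
import json

def build_api_request(framework, app_id, token, native_message):
    envelope = {
        "framework": framework,      # which native protocol the payload uses
        "application": app_id,       # voice application registered with the platform
        "auth_token": token,         # device/application authorization
        "api_version": "v1",         # keep requests on the expected API version
        "payload": native_message,   # the unmodified native request message
    }
    return json.dumps(envelope)
```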

The platform SDKs can support common programming languages used to create skills, actions, extensions, and voice applications for various types of voice assistant devices and frameworks, such as JavaScript and TypeScript, C#, Java and Kotlin, Swift, and Go, among others.

For types of voice assistant devices (frameworks) whose processing is not written in one of the programming languages supported by the SDKs, the API layer can be accessed directly, enabling developers to develop other SDKs or to present request messages directly to the API layer. The SDKs can be open source, to help support members of the development community working in programming languages beyond those of the supported SDKs, by demonstrating design patterns and code architectures that satisfy the requirements of the native protocols of the various frameworks and the requirements of the API layer.

Once an SDK forwards a request message from a voice assistant device to the API layer, the API layer maps the message to the platform's internal universal protocol. The API layer also expresses the response 113 formulated by the feature servers 115 as a response message 117 conforming to the protocol accepted by the voice assistant device that sent the request. The SDK can then accept the formulated response message from the API layer, validate it, and forward it over the network to the voice assistant device, which renders or presents the response 119 (e.g., the content items carried in the response) to the end user. The response may be presented by the voice assistant device's native AI voice reading the text included in the response, by playing an audio file directly, by presenting a video file, and so on, or a combination of these, if the voice assistant device supports those richer formats.
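The reverse mapping, from the common-format response back to a framework-shaped response message, can be sketched as follows. The shapes below are loosely modeled on the Alexa and Google Assistant JSON response formats but are simplified assumptions, not exact renderings of either protocol.

```python
# Sketch: re-expressing a common-format response as a response message in the
# protocol of the framework that sent the request. Shapes are simplified.

def to_response_message(framework, response):
    text = response["text"]
    if framework == "alexa":
        return {"response": {"outputSpeech": {"type": "PlainText", "text": text}}}
    if framework == "google":
        return {"payload": {"google": {"richResponse": {
            "items": [{"simpleResponse": {"textToSpeech": text}}]}}}}
    raise ValueError(f"unrecognized framework: {framework}")
```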

For example, a request message processed by the Amazon Alexa SDK is sent to the API layer for further processing. The API layer maps the processed request to a normalized (e.g., common) format. The normalized request is then further processed using the mapping to a particular feature, as explained further below. The response returned from the feature is then formulated as a response message in the appropriate framework format and sent back through the Amazon Alexa SDK for presentation as spoken text, audio, images, or video.

The availability of the SDKs does not, however, limit developers or other platform participants to developing voice applications using only the features provided by the platform. For example, if a developer wants to provide response behavior that cannot be implemented by any available feature, the developer can skip using the SDK to send the incoming request to the API layer and simply use the SDK to implement an explicit response to the request. This capability enables developers to migrate to the platform using their existing skills and voice application experience without having to start from scratch.

For types of voice assistant devices or frameworks not supported by the platform, such as third-party chatbots or non-mainstream voice assistants, a developer can register the unsupported type of device or framework in the platform's CMS layer. Doing so generates a unique identifier for the voice assistant device or framework, enabling better analysis of the types of requests from particular types of devices or frameworks that work better than others, and comparison of the usage data of a given type of voice assistant device or framework against others.

Business Logic Layer Graph Traversal

To support different voice assistant devices, the business logic layer handles the patterns of request message elements included in the request messages provided by each kind of voice assistant device or framework. As shown in FIG. 3, to be able to process the request elements 107 of request messages 108 from various types of voice assistant devices (voice assistant frameworks) 98, and to map the patterns of the request elements to the appropriate features 115, the business logic layer uses a traversal 117 of a graph database 116 of the relationships between request-element patterns and the features supported by the platform. The graph includes nodes for the request messages corresponding to each voice assistant device or framework and information about each feature the platform supports. The graph database can begin a search at any node to find a match between the request elements and the appropriate feature to use.

The traversal 117 of the graph database to match a request message and its request elements with the appropriate features includes at least the following steps: API consumption, node endpoint search, graph traversal 117, and output processing.

API Consumption

A preliminary step toward finding the appropriate features to apply in formulating a response to a given request message is to create a RESTful API 110 for the business logic layer with unique endpoints that consume the request message elements of the native request messages from voice assistant devices associated with particular frameworks. Each unique endpoint in the RESTful API knows the protocol of the request elements included in message requests received from voice assistant devices conforming to a particular framework. For example, one endpoint may exist to consume the request elements included in request messages received from the Amazon Alexa SDK 112, while a separate set of API endpoints consumes the types of request elements that the Google Assistant SDK 112 sends with its request messages. REST (representational state transfer) is a technical architectural style for APIs that leverages the hypertext transfer protocol (HTTP) for communication between systems.
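The per-framework endpoints plus a generic fallback can be pictured as a small route table. The paths below are invented; a real implementation would sit behind an HTTP framework and bind each path to a protocol-aware handler.

```python
# Sketch: each framework gets its own endpoint that understands its native
# request-element protocol, plus a generic endpoint for unregistered types.
# Paths are illustrative assumptions.

ROUTES = {
    "/v1/requests/alexa":   "alexa",
    "/v1/requests/google":  "google",
    "/v1/requests/generic": "generic",   # unsupported frameworks / other apps
}

def resolve_protocol(path):
    """Pick the request-element protocol to apply, based on the endpoint hit."""
    try:
        return ROUTES[path]
    except KeyError:
        raise ValueError(f"unknown endpoint: {path}")
```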

These RESTful API endpoints enable tracking of request elements that conform to the protocol of each voice assistant device framework, and they provide a generic set of endpoints for a generic set of request elements, so that voice assistant devices of unregistered types (unsupported frameworks), and other applications, can also interact with the features supported by the platform.

By having a collection of understood protocols associated with each of the different voice assistant frameworks and corresponding voice assistant devices, along with a generic set of protocols, the system can search the appropriate set of nodes in the graph database for a match to find the appropriate features to formulate a response to a received request.

Node Endpoint Search

Typically, the request elements of a request message from a voice assistant device of a given framework can be decomposed into a relationship between a generic type of request and internal request elements known as slots. (A slot is an optional placeholder for a value conveyed by the end user as part of a request. An example of a slot and a slot value is US_City and Seattle: US_City is the slot and Seattle is the value.) Based on this structure, a graph database of the relationships of request elements to features can be built. The relationships captured by such a graph database can include common types of relationships.
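The decomposition into a generic request type plus slots can be sketched with a small data structure, reusing the US_City/Seattle example from the text. The class itself and the intent name are illustrative assumptions.

```python
# Sketch: a request element decomposed into a generic request type (intent)
# plus named slots mapping slot names to spoken values.
from dataclasses import dataclass, field

@dataclass
class RequestElement:
    intent: str                                  # the generic type of the request
    slots: dict = field(default_factory=dict)    # slot name -> spoken value

# The example from the text: US_City is the slot, Seattle is the value.
element = RequestElement("event_location_search", {"US_City": "Seattle"})
```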

As shown in FIG. 4, the relationship between a message element (which in some contexts we call an intent) and a feature can be as simple as a single type of message element 142 (intent 1), received from one type of voice assistant (assistant 1), relating to a particular feature 140, or (FIG. 5) it can be more complex, for example, message elements 142 from two different assistants (assistant 1 and assistant 2) of different types (i.e., frameworks) of voice assistant devices relating to the same feature 140. Example message element types are an Alexa event search, which would share edge 143 with the events feature node 140 in the graph, and an Alexa event location search, which would also share edge 145 with the events feature node 140. The edge descriptor for the edge from a given message element to a given feature is "directs to"; the message element is the parent node directing to the child feature node.

As shown in FIG. 6, the relationships can be more complex if a slot type 150 can be shared by two different message elements 152, 154 originating from a particular type 153 of voice assistant device, and if each of the two message elements also has its own slot type 156, 158 not shared with the other. Continuing the example of the Alexa event search and Alexa event location search message elements relating to the events feature, these two message elements 152, 154 would have internal (i.e., shared) slots. Some slots 150 may be shared between the two message elements and some slots 156, 158 may not be, for example, a date slot type and a location name slot type: the Alexa event search message element type would include both the date and location name slot types, while the Alexa event location search would include only the location name slot type. The edge descriptor for a message element's edge to a slot is "includes", because a message element includes one or more slots.

As shown in FIG. 7, in a more complex example, a feature 702 may also be related to multiple types of message elements from different types of voice assistant devices, and the slots they include. In the example of the Alexa event search message element type (intent 1) relating to the events feature 702, a voice assistant device other than Alexa (assistant 1), such as Google Assistant (assistant 2), may have a framework supporting its own similar message element, called a Google event 701 (intent 1). The Google event node 701 in the graph would then share a directing edge 711 to the same events feature 702 with which the Alexa event search 703 and the Alexa event location search 704 also share edges.

A node for a given message element can have edges directing to multiple different features. For this to work, however, there must be a way to determine which of the different features a given actual message element directs to. The determination can be made, for example, if there are two different slot types for the two different features, each relating to only one of the two features.

As shown in FIG. 7, if a first message element 703 relates to a feature 702 and has a slot type 706 that it shares with a second message element 704 also relating to the same feature 702, and if the first message element has another slot type 708 not shared with the second message element, then the relationship 709 between the first message element 703 and the feature 702 is stronger than the relationship 711 between the second message element 704 and the feature 702. How this determination is made is discussed in more detail below with respect to graph traversal.

For example, consider two features supported by the platform: an events feature and a daily message feature. The two features formulate response messages that include different types of content items. One type of content item (for events) can be event information including a date, time, location, event type, and description. The other type of content item (for daily messages) can be an audio or video message to be broadcast to a group of people according to a schedule. There are many different types of request message elements that can relate to, i.e., share directed edges with, the nodes of the graph representing these two features. There are also message elements that can lead to either feature but not both. Both features can be active in a voice application at a given time, so the only way to know which feature a request message element leads to is to examine the slots that the message element shares with each of the two features. For example, Alexa's what's new message element can lead to either the events feature or the daily message feature. However, Alexa's what's new message element can include multiple slot types, such as date and person name slots. The date slot shares edges with both features, but the person name slot relates only to the daily message feature.
Thus, if the message element in a received request message is Alexa's what's new message element and the request message includes a person name slot, the relationship between the request message and the daily message feature is stronger than its relationship with the events feature. On the other hand, if a feature node has more slot relationships with one intent node than with another intent node, and a request reaches the graph without the slots related to the first intent node filled, the feature node's relationships to the other intent nodes are stronger. In the same example, if the received request includes Alexa's what's new intent and has only a filled date slot, the intent can lead to the events feature.
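The slot-overlap rule just described can be sketched as a small scoring function. This is a minimal illustration, not the platform's actual code; the feature names, slot names, and the `strongest_feature` helper are hypothetical, and ties here fall back to dictionary order as a stand-in for the graph's additional tie-breaking logic:

```python
# Map each feature to the slot types that share an edge with it in the graph.
# These example features and slots mirror the what's-new scenario above.
FEATURE_SLOTS = {
    "events": {"date"},
    "daily_message": {"date", "person_name"},
}

def strongest_feature(filled_slots):
    """Pick the feature that shares the most filled slots with the request."""
    def overlap(feature):
        return len(FEATURE_SLOTS[feature] & filled_slots)
    return max(FEATURE_SLOTS, key=overlap)

# A what's-new request with a filled person name slot leads to the daily
# message feature, the only feature sharing edges with both slot types.
print(strongest_feature({"date", "person_name"}))  # daily_message

# With only a date slot filled, the overlap ties; here insertion order stands
# in for the richer graph relationships that would select the events feature.
print(strongest_feature({"date"}))  # events
```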

Using these types of relationships, the graph database can include any simple or complex combination of nodes, edges, features, and slots. Once a request message is received through the API layer, processing begins at the node in the graph that matches the type of the message element, and the slot types included in the message element are used to determine the best path to the most applicable feature.

Graph traversal

To find the most appropriate feature matching a message element, the traversal begins at the node found in the endpoint search step, along with the included slot nodes. The logic of the business logic layer uses the graph to find all features directly connected to that node by edges. As shown in Figure 8, in the case of a simple relationship between a message element (intent 1) and a feature, the traversed path is a single hop 190 along a single edge to a single feature 192, which is then selected to formulate the response message element.

For more complex graph relationships in which a message element has multiple related features, the search process must consider the slots related to the message element. If the message element includes only slots related to a given feature type, the traversal path continues to the strongest relationship, i.e., the one including the most slot relationships. In the example above of the events and daily message features sharing Alexa's what's new message element, if the request message includes that message element along with a date slot and a person name slot, the traversal path leads to the daily message feature, which is the only feature node sharing edges with both the person name and date slots, while the events feature shares an edge only with the date slot.

Message elements can relate to other message elements, even when the related message elements carry data for message element types of different types of voice assistant devices. Chaining these relationships together can produce a stronger path to the selected feature. The goal of the traversal logic is to determine the shortest path to a feature. If two features are the same number of edges away from the message element node (i.e., have the same path length to the message element node), the traversed path must lead to the feature with the strongest relationship, i.e., the feature known to have the most connected short edges. For example, instead of leading directly to the events feature, the Alexa event search message element can share an edge with the Google event message element. The Google event message element can then have a directed edge to the events feature. The edge descriptor for the relationship between the Alexa event search message element and the Google event message element would be called "about". The traversal path from the Alexa event search to the events feature is then: Alexa event search, about, Google event, leading to the events feature.
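The shortest-path behavior can be sketched as a breadth-first search that stops at the nearest feature node. The tiny graph below encodes the hypothetical Alexa event search, "about", Google event, events feature chain from the example; the node names and the `nearest_feature` helper are illustrative, not the platform's actual data model:

```python
from collections import deque

# Directed edges of a small example graph: intent nodes can relate to other
# intent nodes ("about") or lead to feature nodes.
EDGES = {
    "alexa_event_search": ["google_event"],
    "google_event": ["events_feature"],
    "events_feature": [],
}
FEATURES = {"events_feature"}

def nearest_feature(start):
    """Breadth-first search: return (feature, hops) for the closest feature node."""
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        node, hops = queue.popleft()
        if node in FEATURES:
            return node, hops
        for nxt in EDGES.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, hops + 1))
    return None, None  # no feature reachable: the not-found case discussed below

print(nearest_feature("alexa_event_search"))  # ('events_feature', 2)
```

The hop count returned here is also the raw material for the confidence scoring discussed later.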

Complex graph traversal

As shown in Figure 9, a more complex example graph 300 includes multiple message elements carried in request messages from multiple types of voice assistant devices (corresponding to various frameworks) and multiple features. Several of the message elements can each map to, and relate back to, multiple features. Depending on which slot value is filled (i.e., has a value) based on the message element of the request message, the traversal from Alexa's speaker search intent node 302 can terminate at the FAQ feature node 304 or at the events feature node 306.

For example, if the message element is expressed as Alexa's speaker search intent 302 and a person name slot 308 value is filled, the traversal follows the path 314 to Alexa's person information intent 310 and then to the FAQ feature 304.

On the other hand, if the message element is expressed as Alexa's speaker search intent 302, but instead of a person name slot value an event type slot is filled, the traversal follows the path 312 to the events feature 306 by way of Alexa's event location search intent 316 and Alexa's event search intent 318, with which it shares edges.

A similar traversal path analysis applies to the traversal paths from the Google event 320, Google location information 322, Google universal search 324, and Alexa universal search 326 message elements to the events feature 306 and the FAQ feature 304.

Note that each of the two features 304 and 306 can be reached, and can formulate response message elements, in response to request message elements received from voice assistant devices conforming to two different frameworks (Amazon's and Google's).

Output processing

After the appropriate matching feature has been found through graph traversal, the business logic layer next formulates a data structure for the message elements to fit the feature. Once the data structure is in a form usable by the feature, the platform invokes the feature with the structured data, formulates a formal response message conforming to the appropriate protocol, and sends the response message derived from the feature to the originating voice assistant device. This processing can include a reverse mapping of the data structure returned by the feature into the formal response message.

Managing not-found nodes and confidence scoring

If the search for the appropriate node at which the traversal path should begin shows that no node matches the message element of the received request message, the platform returns, through the API layer, a response message to the originating voice assistant device indicating that the request is not valid or not supported.

Beyond the simple not-found case, the number of edges from the initial message element to the appropriate feature may be too large for the traversed path to be logically considered a sound selection of the feature. The number of edges that must be traversed to reach the feature can be treated as a so-called "confidence score" for the traversal path. A threshold for the confidence score can be configured, beyond which the resulting feature is not considered an appropriate selection and the request is considered bad or unsupported. For example, if the confidence score threshold is set to 10 edges, a message element requiring a traversal of only one edge may have a confidence score of 100%, a traversal of five edges may have a confidence score of 50%, and a traversal of ten edges may have a confidence score of 0%. Any request at or beyond the confidence threshold is considered invalid.
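One plausible way to turn the traversed edge count into the percentage scale described above is a linear falloff against the configurable threshold. This is a sketch of the idea, not the platform's actual formula; under this linearization a one-edge traversal scores 90% rather than exactly 100%:

```python
CONFIDENCE_THRESHOLD_EDGES = 10  # configurable, as in the example above

def confidence_score(edges_traversed):
    """Linear falloff: fewer traversed edges yield a higher confidence score (%)."""
    score = (CONFIDENCE_THRESHOLD_EDGES - edges_traversed) / CONFIDENCE_THRESHOLD_EDGES
    return max(0.0, score) * 100

def is_valid_request(edges_traversed):
    """Requests at or beyond the edge threshold are treated as invalid/unsupported."""
    return edges_traversed < CONFIDENCE_THRESHOLD_EDGES

print(confidence_score(5))   # 50.0
print(confidence_score(10))  # 0.0
print(is_valid_request(12))  # False
```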

Feature and module layer

The platform supports features that can formulate responses to request messages and in this way serve end users interacting with voice assistant devices. In effect, an end user can trigger a feature to formulate a response by speaking; the speech is interpreted by the natural language processor of the voice assistant device as a message element representing the end user's intent. For example, the intent can be a question to be answered or an action to be performed, such as turning on a light. The message element is sent in a request message to the API layer for mapping to a particular feature by the business logic layer. The feature processes the intent and generates a response, as explained earlier.

A feature is a collection of one or more functional methods that can perform one or more of a variety of actions, such as retrieving data, sending data, invoking other functional methods, and formulating responses to request messages to be returned to the originating voice assistant device.

An example of such a feature is the events feature mentioned earlier. A user can speak to a voice assistant device to ask a question, such as "Are there any health events at the Seattle office tomorrow?". The question is sent as a message element (an intent) in a request message from the voice assistant device to the platform. At the platform, the events feature parses the words and other parameters of the message element and uses them to retrieve a list of actual events from the platform database (or from a web service call to a custom server), in some cases based on a direct mapping of the words and other parameters to a database query, or based on business logic.

Each feature makes use of a variety of data inputs and custom business logic to generate a response. In the events feature example discussed previously, the events feature can be configured to expect a message element (e.g., a question) having values for any number of placeholder parameters (e.g., slots). The events feature parses the question to extract the placeholder parameter values for further processing of the question. That processing can apply the parsed parameter values against a search index, a database, custom business logic, or a custom server to obtain one or more values of parameters that characterize one or more answers to the question. The response formulated by the events feature can express the answer to the question using a combination of content items including one or more of text, images, video, or audio. The content items are included as message elements in the formulated response message to be returned to the originating voice assistant device. Based on the message elements included in the formulated response message, the voice assistant at the voice assistant device can speak a text response or play an audio or video clip along with an image (if the device supports images and video).

Execution modes supported by a feature enable, for example, the events feature to use the same methods and processing (represented by the execution mode) to handle a variety of different message elements of request messages. For example, an end user can ask "What time does the football team play next?" or "What's happening at TD Garden?", and the corresponding message elements of the request messages can be handled by the same execution mode of the events feature. The events feature looks for patterns of event types or time frames in order to search for corresponding items. In the examples above, the event type processing equates the values "football team" and "TD Garden" with an event type and a location. The word "next" in the end user's question implies a search for future events. The statement "What's happening at TD Garden?" includes no time frame, and the feature handles the statement by defaulting to a question about future events.

In addition, a given feature can support industry-specific uses. For this reason, the platform supports modules, each of which packages one or more features, including execution modes and content items (such as sample content items), for participant users. The features packaged in a given module will typically be related to one another on an industry basis (or some other logical basis). In some implementations, within the platform's code stack, a module is represented as a container referencing particular features and content items. As presented to participant users through the platform's user interface, a module includes the features and content items needed to create, manage, update, and implement a voice experience for end users.

Feature processing

Examples of methods executed by features are an events processor and an FAQ processor. A user can ask a voice assistant device a question, such as "Are there any health events at the Seattle office tomorrow?". The FAQ feature parses the message elements in the corresponding request message and, based on them, retrieves a list of events using a database, custom business logic, or the response from a custom web service call.

The business logic used by the business logic layer to process the message elements of a request message breaks down into three main steps: feature location search and discovery, the feature server request, and response processing.

At the end of this processing, a response message is sent to the originating voice assistant device.

Feature location discovery

As shown in Figure 10, when the voice experience server 110 receives a request message from a voice assistant device 521 and parses the message elements in the request message, the server sends a request 523 for graph traversal. Once the graph has been traversed 501 for the supported type of voice assistant device, the feature and module layer knows the type of the feature 527 represented by the message elements of the request message. The feature type can be represented by a unique identifier, such as a GUID, a UUID, or a keyword. With this unique ID, the feature and module layer can search 502 a feature database 504 to find all of the information defining the feature (including execution modes and other information). Once the feature and module layer has the information about the feature, it can find where the given voice application has registered that feature. The registration, or metadata about the feature, may reside on a server 505, which may be internal, a management server of the platform, or a custom server controlled by a platform participant. Each of these servers can be scaled independently of the platform to properly handle fluctuations in the lookup requests that it must handle separately from any other feature.

For example, if the traversal of the graph 501 results in selection of the events feature, that feature type (in this case, feature type "events") will have a unique identifier, such as a592a403-16ff-469a-8e91-dec68f5513b5. Using this identifier, the feature and module layer processing searches against a feature management database 504, such as a PostgreSQL database. That database includes a table of records containing the events feature type, the related voice application, and the feature server location that the voice application has selected for the events feature. The feature server location record includes a URL for the location of the server 505, such as https://events-feature.voicify.com/api/eventSearch, and the expected HTTP method accepted by the feature server, such as HTTP GET. The feature server location record need not include a URL managed by the platform. The server location can be external, by way of a custom feature implementation, such as https://thirdpartywebsite.com/api/eventSearch.
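The lookup just described can be sketched as a keyed query against the feature management table. The in-memory dictionary, record shape, and `find_feature_server` helper below are hypothetical stand-ins for the PostgreSQL table; only the example GUID and URL come from the text:

```python
# Hypothetical in-memory stand-in for the PostgreSQL feature management table.
# Keyed by (feature type identifier, voice application identifier).
FEATURE_SERVER_LOCATIONS = {
    ("a592a403-16ff-469a-8e91-dec68f5513b5", "app-1"): {
        "url": "https://events-feature.voicify.com/api/eventSearch",
        "http_method": "GET",  # the method the feature server expects
    },
}

def find_feature_server(feature_type_id, app_id):
    """Return the feature server location this voice app registered, if any."""
    return FEATURE_SERVER_LOCATIONS.get((feature_type_id, app_id))

record = find_feature_server("a592a403-16ff-469a-8e91-dec68f5513b5", "app-1")
print(record["url"])  # https://events-feature.voicify.com/api/eventSearch
```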

Once the platform has found the appropriate feature server 505, it sends a service request 529 for the feature server 505 to execute the feature type using parameters derived from the message elements of the request message, and awaits a response 499.

Feature server request

Once the feature server 505 is found, a service request is sent to it by creating an HTTP request including an HTTP header and an HTTP body; the header identifies the request as coming from the platform's feature and module layer, and the body includes the words and parameters parsed from the message elements of the request message from the voice assistant device, expressed according to the corresponding feature request protocol. The service request is then processed on the feature server, for example, by using the words and parameters from the message elements to search for matching content items. The search results, expressed according to the service response protocol defined for the feature, are returned to the feature and module layer.

Each feature defines a feature request protocol and a feature response protocol. These protocols define the format and structure of the service requests and service responses used to send requests to, and receive responses from, the feature server. The feature request and feature response protocols define rigid formulation requirements. Figures 11A and 11B are an example of a JSON version of a feature request protocol, and Figure 12 is an example of a JSON version of a feature response protocol. By defining strict feature request and feature response protocols, the platform can be confident that a feature server will be able to properly handle each feature request and provide a proper feature response that the platform's feature and module layer can properly process. This structure also enables custom feature servers to be built against the platform: developers can create their own custom feature servers to handle requests and responses for a given type of feature.

The general structure of the feature request protocol includes information about the feature that is the subject of the service request, the content of the service request, and information about the message elements, included in the message request from the voice assistant device, that were used to traverse the graph to find the feature. This structure enables feature servers, whether managed by the platform's owner or created as custom feature servers on behalf of platform participants, to process requests and responses as they would naturally be processed by or from voice assistant devices. This enables both custom and platform servers to exploit the full capabilities of the framework API of each type of voice assistant device.

For example, when sending a service request to an events feature server (whether managed internally in the platform or on a third-party server), the feature and module layer will send an HTTP request with the headers listed below and an HTTP body as in the example feature request protocol of Figures 11A and 11B:
-Authorization: 1d91e3e1-f3de-4028-ba19-47bd4526ca94
-Application: 2e1541dd-716f-4369-b22f-b9f6f1fa2c6d

The -Authorization header value is a unique identifier that is automatically generated for, and unique to, the voice application and feature type. This value can be regenerated by the platform participant so that the feature server can be assured that the request does not come from a malicious third party. The -Application header value is a unique identifier for the voice application that enables the feature server to verify that the request comes from an authorized voice application.
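Assembling a service request carrying these two headers can be sketched as follows. The body shape, the `build_feature_request` helper, and the slot dictionary are illustrative assumptions; a real body must follow the feature request protocol of Figures 11A and 11B, and the HTTP method comes from the registered location record:

```python
import json
import urllib.request

def build_feature_request(server_url, http_method, auth_token, app_id, parsed_slots):
    """Assemble (but do not send) the HTTP service request for a feature server."""
    body = json.dumps({"slots": parsed_slots}).encode("utf-8")
    return urllib.request.Request(
        server_url,
        data=body,
        headers={
            "Authorization": auth_token,  # unique to the voice app + feature type
            "Application": app_id,        # identifies the authorized voice application
        },
        method=http_method,
    )

req = build_feature_request(
    "https://events-feature.voicify.com/api/eventSearch",
    "POST",  # the registered method could also be GET, per the location record
    "1d91e3e1-f3de-4028-ba19-47bd4526ca94",
    "2e1541dd-716f-4369-b22f-b9f6f1fa2c6d",
    {"date": "tomorrow", "location": "Seattle"},
)
print(req.get_header("Application"))  # 2e1541dd-716f-4369-b22f-b9f6f1fa2c6d
```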

Response processing

Once the feature server 505 has finished processing the feature service request, it needs to return data expressed according to the feature response protocol. The feature service response 499 includes information about the content items found by the feature server, and may include information about rich media content items for voice assistant devices capable of presenting richer content items. The feature service response can include URL pointers to file locations of, for example, image, video, or audio files. The data included in the feature service response is validated by the feature and module layer to ensure conformance with the service response protocol and to ensure that the data includes valid information.

If there is an error in the validation of the feature service response, or if the initial feature service request times out or is invalid, an error response message is sent to the voice assistant device for the initial request message received by the API layer.

If the feature server returns a successful feature service response that passes validation, the feature service response 519 is processed by the feature and module layer of the voice experience layer 110 to formulate a response message to be sent to the voice assistant device. This processing involves mapping the feature service response to the protocol of the framework of the voice assistant device 521, including mapping media files and other content items into the appropriate form. If the voice assistant device supports rich media item formats, such as video, the processing gives priority to rich media items. Otherwise, for example if no rich media is included in the response, the processing falls back to simple text content to be spoken or read by the voice assistant to the end user. Using the message elements included in the response message, the originating voice assistant device will be able to render or present the response to the end user. If the initial request message came from a generic or unsupported AI device or voice assistant device, a generic response message including the original versions of the content items from the feature service response is returned, so that the unsupported AI device or voice assistant device can itself determine whether and how to use or render each content item.

For example, if the voice assistant device that initiated the request supports rendering content richer than speech alone, such as images or video (as the Amazon Echo Show does), the response formulation processing of the feature and module layer will map the URLs for rich media items included in the feature service response to the rich media properties in the message elements of a message response conforming to the framework protocol of the voice assistant device. Certain features can enable the voice assistant device to present multiple types of media items, such as images and text, while reading an answer to the end user. The platform's business logic layer knows the configurations of the supported voice assistant devices in order to formulate the response message according to the optimal configuration. For voice assistant devices that do not support rich media items, the default behavior of the feature and module layer is to formulate the message elements of the response message as a speech response, causing the voice assistant device to speak the text sent to it in the response message.
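The prefer-rich-media, fall-back-to-text decision can be sketched as follows. The field names (`video_url`, `image_url`, `text`) and the response shapes are hypothetical simplifications of the feature service response and framework protocols:

```python
def formulate_response(feature_response, supports_rich_media):
    """Prefer rich media when the device can render it; otherwise fall back to text."""
    if supports_rich_media and feature_response.get("video_url"):
        return {"type": "video", "url": feature_response["video_url"]}
    if supports_rich_media and feature_response.get("image_url"):
        # Devices like the Echo Show can show a card image while speaking the text.
        return {
            "type": "card",
            "image_url": feature_response["image_url"],
            "text": feature_response["text"],
        }
    # Speech-only devices (e.g., an Echo Dot) just get text to speak aloud.
    return {"type": "speech", "text": feature_response["text"]}

svc = {"text": "The hockey game starts at 3 PM.",
       "image_url": "https://example.com/rink.jpg"}
print(formulate_response(svc, supports_rich_media=False)["type"])  # speech
print(formulate_response(svc, supports_rich_media=True)["type"])   # card
```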

For example, if the request message came from a voice assistant device that supports images and text, such as the Echo Show, the feature service response provided for the events feature can be as shown in Figure 13. The feature service response shown in the example of Figure 13 enables the result in the text response to be spoken and shown in the card area of the voice assistant device, and also maps the image URL to the appropriate card image URL according to the Alexa response message protocol.

Now consider the same example feature response, but assume that the voice assistant device making the request is an Alexa Echo Dot, which does not support the presentation of visual content items. More simply, the Alexa response protocol can be:
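For illustration, a minimal speech-only response in the general shape of the Alexa response format, with illustrative wording in the text field, is:

```json
{
  "version": "1.0",
  "response": {
    "outputSpeech": {
      "type": "PlainText",
      "text": "The hockey game starts at 3 PM on May 2nd."
    },
    "shouldEndSession": true
  }
}
```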

This example simply maps the text from the feature response to the text of the outputSpeech property of the Alexa protocol, which is then spoken to the user by the Alexa Echo Dot.

Feature content search

When a feature processes the message elements of a request message routed to it as a result of the graph traversal, the feature's processing must search for content items to include in the response, as shown in Figure 14. The feature server 505 is responsible for finding and including content items that are relevant based on the feature service request. In some implementations, the feature server searches for content items 510 within a search index 511 of managed content items authored or otherwise controlled by other platform participants. The content search index 511 provides an efficient repository of structured content items for feature server queries. The content items identified in the search results are any content items that exactly match the query, or that are likely matches based on search confidence; no content items are included when zero content items are returned or when the returned items have low confidence scores.

There are two key aspects that enable the feature server to return appropriate content items: content indexing 512 and content search 531. Content indexing and content search work together to create content items in the content database 504 that can be searched by the feature server 505, so as to provide content items to the feature and module layer for formulating responses to the voice assistant device.

Content indexing

As stored in the platform's database, each content item has certain fields and properties containing simple information that can be easily searched when placed into an Elasticsearch index 511, such as text content, identifiers, URLs, and so on. To improve the performance of feature servers, all content items reachable by feature processing should be added to the Elasticsearch index 511. Certain content items used by features can have one or more particular properties that are more valuable in the index, and weights can be added to those properties in the fields of the index. Weighting enables the Elasticsearch index to prioritize searches across fields in decreasing order of field weight. The weights produce a score when a search against the index has multiple hits on different fields of a given content item.

For example, if an event content item has the following fields, the indicated weight values (on a scale of 1 to 5) may be associated with them: event name: 4, event location: 2, event start date/time: 5, event end date/time: 1, event details: 2, and event summary: 2.

These weights prioritize searches against the event's start date/time and the event's name. Therefore, if there are two events with similar descriptions but different start times, and the request includes a specific date to search for, such as tomorrow or March 3rd, the top result will be the event content item whose start date and time match the requested date. If there are two events occurring at the same time, the next field to prioritize in the search is the name. For example, if there are two events with the same start date, 5/2/2018 3:00 PM, but one is named "Basketball Game" and the other "Hockey Game", then a search for a message element such as "What time is the hockey game on May 2nd?" will find the second event, named Hockey Game, as the top result and return it instead of the Basketball Game event.

Content items are automatically added to, updated in, and removed from the Elasticsearch index when participant users update them using the content management system 513. If a participant user deletes a content item by marking it as removed from the database (or deleting it entirely), the content indexer process 512 removes that content item from every Elasticsearch index that includes it. Likewise, if a participant user updates a content item's properties or adds new content items, those items 535 are updated 533 in, or added to, the Elasticsearch index. The index can also be manually populated or reset. Doing so forces the content indexer process to rebuild the index by querying the database 504 for the content items that should be indexed, and then using the data to resynthesize the index and cache.

For example, assume a platform participant adds a new content item for an event feature with the following properties and values: Event Name: Basketball Game; Event Location: Stadium; Event Start Date/Time: May 2, 3:00 PM; Event End Date/Time: May 2, 5:30 PM; Event Details: The third Rams vs. Lions game of the year; Event Summary: Tickets start at $15 and doors open at 1 PM! Buy some merchandise to support your team!

Once the participant user marks the content item as active or publishes the event, the content item is added directly to the Elasticsearch index, and the event can be found in searches performed by the feature server 505 on behalf of the event feature. Suppose the participant user returns to the content item in the CMS and updates a property, for example: Event Location: Stadium on 100th Street. The update process updates the content item's record in the database and also updates the content item in the Elasticsearch index. Suppose a disconnection that could cause desynchronization occurs between the voice experience server 110 or the content management system 513 and the Elasticsearch index 511, such as an Elasticsearch index maintenance failure. When the connection is restored, the Elasticsearch index is flushed, that is, all content items in the index are removed. Once this is done, the index processor 512 communicates between the database 504 and the Elasticsearch index 511 to re-add all appropriate content items. Finally, if the participant user removes the basketball game event from the CMS, the event is marked as purged in the database and completely deleted from the index, ensuring it will not be found by any feature server.
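The publish, update, flush-and-rebuild, and delete lifecycle described above can be sketched as follows. This is a minimal in-memory stand-in for the content database 504 and the Elasticsearch index 511; the class and method names are illustrative, not the platform's actual API:

```python
class ContentIndexer:
    """Keeps a search index in sync with the content database (sketch)."""

    def __init__(self, database):
        self.database = database          # {item_id: item_dict}
        self.index = {}                   # mirrors indexable items

    def publish(self, item_id):
        # Marking an item active adds it directly to the index.
        item = self.database[item_id]
        item["active"] = True
        self.index[item_id] = dict(item)

    def update(self, item_id, **changes):
        # Updates touch both the database record and the indexed copy.
        self.database[item_id].update(changes)
        if item_id in self.index:
            self.index[item_id].update(changes)

    def delete(self, item_id):
        # Deletion purges the record and removes it from the index.
        self.database[item_id]["purged"] = True
        self.index.pop(item_id, None)

    def rebuild(self):
        # After a flush, re-add every active, non-purged item from the database.
        self.index = {
            item_id: dict(item)
            for item_id, item in self.database.items()
            if item.get("active") and not item.get("purged")
        }
```

After a simulated flush (clearing `index`), a single `rebuild()` call restores the index from the database, mirroring the desynchronization-recovery path described above.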

Content Search

Once a content item has been added to the database and the Elasticsearch index by the content indexer, the item is ready to be found in searches by the feature server. If the index is not synthesized (contains no data), whether due to an intentional flush of the cache and index or for any other reason, the feature server 505 falls back to querying the content database 504 directly using traditional fuzzy search techniques 514. Fuzzy search produces lower-confidence results for content items, but guarantees that content items remain reachable while updates are being made to the system or if the index 511 becomes corrupted. In some implementations, the content database is a relational database 504 that includes the information managed in the content management system 513 and includes the content items; the content management system 513 includes information about which features have been enabled for a given voice application.
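The fallback path can be sketched as follows, using Python's standard-library `difflib` as a stand-in for the database's fuzzy matching. The actual fuzzy search techniques 514 are not specified in the text, and the function names and cutoff value here are illustrative:

```python
import difflib

def fuzzy_db_search(database, query, cutoff=0.6):
    """Lower-confidence direct database search used when the index is empty."""
    scored = []
    for item in database:
        ratio = difflib.SequenceMatcher(
            None, query.lower(), item["name"].lower()).ratio()
        if ratio >= cutoff:
            scored.append((ratio, item))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [item for _, item in scored]

def search_content(index, database, query):
    """Search the index; fall back to fuzzy database search if it is empty."""
    if not index:                         # index flushed or corrupted
        return fuzzy_db_search(database, query)
    return [item for item in index if query.lower() in item["name"].lower()]
```

With an empty index, even a misspelled query such as "hocky game" still reaches the hockey event through the fuzzy path, at a lower confidence than an index hit.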

When the index is populated and reachable, the feature server performs searches against the index. Primary filters can enable fast searches, such as searches restricted to content items matching the feature type the feature server represents. This enforces the rule that a given feature server will not return content items associated with another feature. A search against the index returns a collection of results matching the search request. If there is no match, the message element of the request message cannot be successfully processed, and an appropriate response message is returned from the feature server to the voice experience server, explaining that the feature server is not sure what to do with the message element. When a single content item is found in the search, also known as an exact match, that one content item is returned to the voice experience server as a message element of the response. If many content items are found to match the message element, the content item with the highest score, based on the weights of the searched fields, is returned as a message element to be included in the response message.

In the example above involving the basketball game and hockey game events, the total possible score for a perfect match would be the sum of the weights of all indexable fields: 16. If the feature service request processed by the feature server includes information about the start date/time and the name and nothing else, the maximum achievable score is 9. If the search query includes the start time shared by both events and the name of the hockey game, the basketball game's score will be 5 and the hockey game's score will be 9, and the hockey game's event information will be returned as a message element to be included in the response message sent to the voice assistant device.
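The worked example above can be reproduced with a short scoring function. This is a sketch of the weighted-field scoring described here, not the platform's actual Elasticsearch configuration; field names and exact-equality matching are simplifications:

```python
# Field weights from the event example (scale of 1-5).
WEIGHTS = {
    "event_name": 4, "event_location": 2, "event_start": 5,
    "event_end": 1, "event_details": 2, "event_summary": 2,
}

def score(item, query):
    """Sum the weights of every indexed field the query hits."""
    return sum(
        weight for field, weight in WEIGHTS.items()
        if query.get(field) is not None and item.get(field) == query[field]
    )

def best_match(items, query):
    """Return the highest-scoring item, or None when nothing matches."""
    ranked = sorted(items, key=lambda item: score(item, query), reverse=True)
    return ranked[0] if ranked and score(ranked[0], query) > 0 else None
```

For a query carrying the shared start time and the name "Hockey Game", the basketball item scores 5 (start time only) and the hockey item scores 9 (start time plus name), so the hockey event is selected; a query matching nothing yields the no-match case described above.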

Feature and Module Customization

In addition to the platform's standard supported and managed modules, the platform enables platform participants to create custom modules. When building a custom module, participant users can select registered feature types to add to the module. The platform also enables developers to create custom feature servers that replace the supported feature servers during execution of a voice application.

There are two aspects to customizing the way content items are retrieved and managed in a custom context: custom modules and custom features.

Custom modules are non-technical elements and require no separate development or maintenance by platform participants, whereas custom features require developers to create and maintain a web server that the platform can communicate with in order to use the custom modules and enable execution of the custom features.

Creating a Custom Module

At a high level, a module 508 is a collection of features 509 and contextualized content items 510 within those features, as shown in Figure 15. As an example, the platform can be preconfigured to include a collection of industry modules, such as a higher education module or an employee health module, as shown in Figure 16. When any of these modules is added to a voice application 507, the platform can pre-populate the module's features 516, 541 with sample (e.g., standard) content items 517 that platform participants 506 can use, update, or remove in favor of their own content items. As examples, the pre-populated (e.g., standard or sample) features 516, 541 can include frequently asked questions, quick polls, and surveys. The platform maintains and manages the pre-populated modules 515; however, platform participants are not limited to these pre-populated modules and their features. If a platform participant wishes to mix and match features from different modules, or wants to create a collection of features with a different context than existing modules enable, the platform participant can create one or more custom modules, as shown in Figure 17.

A custom module 518 must be given a name that is unique within the context of the voice application to which it belongs. Platform users can also give their modules descriptions to help solidify the context for the features and content items created within them. When a developer creates a module with a unique name, it is registered within the platform. Once a platform participant has created a module with a unique name, the owner can begin adding features to it. Features can be pre-existing (e.g., standard or sample) platform-supported features 516 or custom features 520. If an added feature is a pre-existing feature 516, the owner can then begin adding content items to that feature within the custom module 518.
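The registration and uniqueness rule above can be sketched as a small registry. All names here are hypothetical; the platform's actual registration API is not described in the text:

```python
class ModuleRegistry:
    """Registers custom modules, enforcing per-application unique names (sketch)."""

    def __init__(self):
        self.modules = {}   # (app_id, module_name) -> module record

    def register(self, app_id, name, description=""):
        key = (app_id, name)
        if key in self.modules:
            # Names need only be unique within one voice application.
            raise ValueError(f"module {name!r} already exists in this voice application")
        self.modules[key] = {"name": name, "description": description, "features": []}
        return self.modules[key]

    def add_feature(self, app_id, name, feature_type):
        # Features (pre-existing or custom) are added to a registered module.
        self.modules[(app_id, name)]["features"].append(feature_type)
```

Two different voice applications could each register a module named "campus-life", but a second registration under the same application is rejected.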

In addition to creating a new custom module from scratch, a platform participant can also add an existing (e.g., standard or sample) industry module to a voice application and adjust the features within the module, by adding features, removing features, or using custom features in place of or in addition to the pre-existing features, to form a custom module 519, as shown in Figure 18. As with pre-existing features, adding a feature to an industry module will not populate the content items within the feature. For example, if a voice application already uses the employee health module and a participant user wants to add another feature that was not included or was previously removed, the participant user can view, through the platform's user interface, the remaining supported feature types that have not yet been added and can add the desired feature to the module. The participant user can then choose whether to use a pre-existing feature implementation or to register a custom feature from a third party or one that the participant user has developed.

Creating a Custom Feature

A platform feature is implemented by the combination of a feature server and the feature type it represents. The feature type defines the expected feature request protocol, the expected feature response protocol, and the location of the feature server to which an HTTP request is sent when the feature type is identified as the appropriate feature found during graph traversal. This structure applies both to supported, managed features and to custom features created to extend the platform. Platform participants may want to create custom features if they have pre-existing content items stored outside the platform's database or content items managed by another system, if their security standards prevent content items from being managed by an external system such as the platform, or if they want to enhance or change the platform's functionality or behavior.

If a platform participant wants to create a custom feature, the participant can create a publicly accessible web server (serving as the custom feature server). The custom feature server has an HTTP endpoint that accepts, in the HTTP body, an expected feature service request expressed according to the protocol, and returns an expected feature service response expressed according to the protocol. In some implementations, the endpoint must return the feature service response within a limited period of time to ensure that the end user's experience is not degraded by slow performance outside the platform's control.
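A minimal custom feature server endpoint might look like the following sketch, built on Python's standard-library `http.server`. The JSON field names ("featureType", "messageElements", "contentItems") are assumptions for illustration; the actual request and response protocols are defined by the platform:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def handle_feature_request(request):
    """Build a feature service response for a parsed feature service request.

    Field names here are illustrative stand-ins for the platform's protocol.
    """
    return {
        "featureType": request["featureType"],
        "contentItems": [
            {"text": f"Echoing element: {element}"}
            for element in request.get("messageElements", [])
        ],
    }

class CustomFeatureHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # The feature service request arrives as JSON in the HTTP body.
        body = self.rfile.read(int(self.headers["Content-Length"]))
        response = handle_feature_request(json.loads(body))
        payload = json.dumps(response).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(payload)

if __name__ == "__main__":
    # e.g. HTTPServer(("", 8080), CustomFeatureHandler).serve_forever()
    pass
```

Keeping `handle_feature_request` as a pure function makes it easy to test separately and to keep the handler fast, which matters given the response-time limit mentioned above.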

The custom feature server can use the data from the feature service request in any way, as long as the expected feature service response is returned. The custom feature server can use the message elements of the initial request message from the voice assistant device, track any internal analytics, decompose the message elements of the request message, and provide functionality unique to the voice assistant device or voice application that sent the request message. As shown in Figure 19, for example, if a platform participant already manages its event information using a third-party service and does not want to migrate the data to the platform, the participant can instead develop a custom event feature server 521 to replace the default (supported) event feature server 555. However, the custom event feature server 521 must accept feature service requests expressed according to the same protocol as the platform's event feature server 555 and return feature service responses expressed according to the same output protocol as the platform's server. Once the developer has created this publicly accessible custom event feature server, the developer can update the voice application in the CMS to change the feature server location to the URL of the custom event feature server.

Each custom feature server must be of an existing feature type; the platform needs to know which feature server to send each feature service request to. However, as shown in Figure 20, a feature server can also be registered as a custom fallback feature server 523, so that for a given voice application, if a request from a voice assistant device does not match any feature type registered for the voice application, the feature service request 524 can be sent to the fallback custom feature server 523. This arrangement enables full customization of how responses are handled, such as creating a voice application that includes a custom module with no features other than the fallback custom feature. As shown in Figure 21, all feature service requests 525 are then forwarded to the custom feature server 523, which can be designed to handle all message elements of the response message itself without using any platform-supported features 526. These types of custom features still require the feature service response returned to the voice experience server to match the protocol of the expected feature service response for the fallback type. The feature service request in this case can include the message elements of the initial request message from the voice assistant device and the information the message element is attempting to retrieve, such as the feature type it most closely matches. As shown in Figure 21, processing proceeds this way even if the feature type is not registered with the voice application.

For example, if a given voice application does not have the event feature enabled in any of its modules, but a request message arrives at the voice experience server that includes a message element for an Alexa event search, the graph traversal will not be able to find a matching feature, because the appropriate match would be the event feature. If the voice application has registered a custom fallback feature, processing skips the graph traversal step, instead finds the fallback feature server information from the content database, and sends the initial native Alexa event search message element to the custom fallback feature server. The custom feature server can then apply any desired processing to the original Alexa event search message element and return a structured feature service response specific to the fallback feature type. If no features are registered other than this custom fallback feature server, the graph traversal is always skipped in favor of proceeding directly to the custom feature server.
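The routing decision described above can be condensed into a small function. The graph traversal is reduced here to a dictionary lookup, and the URLs are hypothetical examples:

```python
def route_request(message_element, registered_features, fallback_server=None):
    """Pick the feature server URL for a message element (sketch).

    registered_features maps feature type -> feature server URL. The real
    platform performs a graph traversal; a dict lookup stands in for it here.
    """
    if fallback_server is not None and not registered_features:
        return fallback_server            # only a fallback exists: skip traversal
    feature_type = message_element.get("featureType")
    if feature_type in registered_features:
        return registered_features[feature_type]
    if fallback_server is not None:
        return fallback_server            # no registered match: use the fallback
    return None                           # no match and no fallback registered
```

The `None` outcome corresponds to the case where the feature server cannot process the message element and an explanatory response must be returned instead.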

Content Management Layer

The interactions between a voice assistant and an end user are provided by a voice application and given substance by the content items managed by the platform. The platform enables participant users to create, modify, and delete the content items used by features as needed. These participant users can work with feature-based content items through the platform's user interface using a web browser or a mobile device. As discussed earlier, a feature can be implemented as a handler for a particular type of message element, such as a request for information about an event. Features also provide a consistent structure for adding content items based on protocols defined by the platform.

For example, an event feature can include the following properties: event name, event location, event start date/time, event end date/time, event details, and event summary, among others. With such a feature, a participant user simply uses the fields presented within the platform's user interface to add, modify, or remove information about an event 622 (Figure 22). Content items for the feature are added to the search index that is queried when an end user of a voice assistant device asks an event-specific question.

As shown in Figure 23, a participant user can use the content management system user interface 611 to manage a voice application's content items for all selected feature types within a given module (whether a platform-managed module or a custom module). In addition, a participant user can view cross-device (e.g., cross-framework) analytics 612 based on usage data for a given voice application across multiple voice assistant device frameworks, because the universal voice application platform can process request messages from all such voice assistant devices.

For the purpose of adding content items, the user interface sends content management requests over HTTP to the CMS's API. The CMS API then manages where the content items are stored. Content items can include text or media assets, such as audio in mp3 format or video in mp4 format. Content items in the form of media assets are uploaded to a blob storage facility or file management system, and the metadata and related content items are stored in a scalable relational database.
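The storage routing in the paragraph above can be sketched as follows. Dicts stand in for the blob store and the relational database, and the `blob://` URL scheme and field names are invented for illustration:

```python
MEDIA_EXTENSIONS = {".mp3", ".mp4"}

def store_content_item(item, blob_store, relational_db):
    """Route a content item to the appropriate storage (sketch).

    Media assets go to blob storage; metadata and text go to the
    relational database, which keeps a pointer to any uploaded asset.
    """
    record = {"id": item["id"], "feature_type": item["feature_type"]}
    asset = item.get("asset")
    if asset and any(asset.endswith(ext) for ext in MEDIA_EXTENSIONS):
        blob_store[item["id"]] = asset                 # upload the media file
        record["asset_url"] = f"blob://{item['id']}"   # metadata keeps a pointer
    else:
        record["text"] = item.get("text", "")
    relational_db[item["id"]] = record
    return record
```

The key design point mirrored here is that the relational record always exists, whether or not a media asset does, so searches and metadata queries never need to touch blob storage.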

The CMS API is not exclusive to content items associated with feature types; it also enables participant users to manage their accounts, voice applications, modules, features, and other aspects of the platform, including registering custom modules and custom features. Each content item is structured specifically for its corresponding feature type, in that the content item's properties and fields consistently conform to the common protocol used to represent content items of any given feature type. Each content item is also associated with a specific voice application, to prevent platform participants other than those with access to the voice application from viewing or using the owner's content items in the user interface. Although a given feature type can be used across multiple modules, feature content items are directly associated with the modules that manage them. For example, a feature content item value that represents an answer to a frequently asked question and is identical for two modules is stored twice in the database.

Support and Guidance from the CMS

Voice assistant devices vary in how they handle content items based on their internal hardware and software. One voice assistant device may support video, audio, images, and text, while another may support only text and audio. The CMS can provide guidance and real-time feedback on the content items added by participant users. For example, as shown in Figure 24, a participant user can enter a text content item related to an event in addition to audio files and images that also relate to the event 652. The CMS interface will indicate the types of voice assistant devices that support the submitted types of content items 651, 661 (Figure 26).

Participant users who choose to include audio or video as part of a message element of a response message can produce the content items directly within the CMS through the platform's user interface 641. Thus, as shown in Figures 24 and 25, the platform enables platform users to produce and manage multiple types of content items in one place 642.

Questions and Answers

The platform is designed to store and provide the different phrases and sentences that a voice assistant device can speak, for example, to answer an end user's question. One option is to store them as collections of questions and answers. As shown in Figure 22, the CMS interface enables platform users to create collections of questions 621 and answers 623.

Comprehensive Multi-Language Support

The platform comprehensively supports multilingual content and voice interactions within the voice content management system interface. Because the voice content management system interface supports multiple languages, the interface is accessible to non-English platform users in their native languages. In some implementations, the platform can support the ability to publish non-English content. For this approach to be useful, the instructions and prompts within the interface also need to be provided in the platform user's native language.

The platform supports multilingual content for voice interactions from the data layer all the way up through the final response message to the voice assistant device, based on the data model representing a given content item. All content items within the platform inherit from an object that includes properties for language and version. Therefore, any content item in the system can have counterparts in other languages. For example, a question in the voice content management system stating "How big is the student body" with a language value of EN-US can have equivalent entries in Spanish and French with language values of ES-ES and FR-FR.
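The inheritance-based data model can be sketched with dataclasses. The base fields (language, version) come from the text; the `Question` subclass, the lookup helper, and the sample answer strings are illustrative:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ContentItem:
    """Base fields every platform content item inherits (per the text)."""
    language: str      # e.g. "EN-US", "ES-ES", "FR-FR"
    version: int

@dataclass
class Question(ContentItem):
    text: str
    answer: str

def for_language(items, language) -> Optional[ContentItem]:
    """Pick the entry matching the requested language, if one exists."""
    return next((item for item in items if item.language == language), None)
```

Because every content type inherits the language and version fields, the same lookup works uniformly for questions, events, or any other feature's content items.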

Analytics

The platform's analytics processing can analyze usage data representing many different aspects of the platform's operation and process this large amount of information to provide participant users with insight into the performance of their content items 624, features, modules, and voice applications. As shown in Figures 27 and 28, the data analysis can include metrics across different types (frameworks) of voice assistant devices 671, 681 and across the different specific voice assistant devices that are the sources of initial request messages, metrics on the types of features invoked by requested message elements 672, and comparisons of the performance of the individual content items used by a given feature. These types of analytics are separate from the analytics the platform itself uses to determine the performance of components, aspects, and the platform as a whole.

The key kinds of analytics provided by the platform include data accumulation, data analysis and processing, key performance indicators, and intelligent rendering.

Data Accumulation

Analyzing the performance of content items is key to enabling platform participants to create good voice experiences for end users of voice assistant devices. There are points in the data flow at which raw data can be analyzed especially effectively for this purpose. The platform applies machine learning methods to the raw data to classify the data into buckets and to compare the large amounts of data accumulated over time.

The types of data analyzed by the platform include each of the following (and combinations of two or more of them): the type (e.g., framework) of voice assistant from which a request message originated (e.g., Alexa, Google Assistant, Apple Siri, Microsoft Cortana, or a custom voice assistant); the type (e.g., framework) of voice assistant device from which the request message came (e.g., Echo Show, Google Home, a mobile device, Echo Dot, or others); the types of features invoked by the message elements of request messages; metadata for each processed content item; content items commonly found together; the success rate of message element types of request messages in invoking the appropriate features; misses in invoking content items; information about the end user whose speech initiated the request; information about the relevant applications; raw usage information; time of day; repeat versus new visitors; the geographic location and region from which the request message originated; and authenticated end user information, among others.

These data items can also be correlated with one another. As noted earlier, the relationships among data items provide insight into the performance of content items.

There are certain especially effective places in the platform's operational flow at which raw analytics data can be collected, and there are sub-flows for how to collect it. Once collected, the raw data can be processed into more easily understood structured data. Effective places for data collection include the initial receipt of a request message at the API layer, the performance of a content search by a feature server, and the processing of a response message by the voice experience server, among others.

Receipt of a Request by the Voice Experience API

The request messages sent by voice assistant devices to the voice experience server API include useful raw data. The raw data sent will depend on the type of voice assistant device, although the data sent by many types of voice assistant devices typically includes: a user identifier, information about the voice assistant device that is the source of the request message, information about the voice assistant that is the source of the request message, and certain data included in the request (e.g., message elements).

The platform's API layer translates the raw data into an abstract form expressed according to a set of protocols shared across the different frameworks. As shown in Figure 29, once the raw data has been structured and expressed according to the abstract protocols, it is sent to an accumulation data store implemented, for example, as a data lake 528, where it is stored for later processing 530 by one or more data analysis processes 529.
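The translation step can be sketched as follows. This is a minimal illustration, not the platform's actual implementation: the framework-specific field paths (e.g., session.user.userId) and the keys of the common-protocol record are assumptions chosen for the example.

```python
import json
import time

def normalize_request(framework: str, raw: dict) -> dict:
    """Map a raw, framework-specific voice-assistant request into an
    abstract form shared across frameworks (illustrative field names)."""
    if framework == "alexa":
        return {
            "framework": "alexa",
            "user_id": raw["session"]["user"]["userId"],
            "device": raw["context"]["System"]["device"]["deviceId"],
            "message_element": raw["request"]["intent"]["name"],
        }
    if framework == "google":
        return {
            "framework": "google",
            "user_id": raw["user"]["userId"],
            "device": raw.get("surface", "unknown"),
            "message_element": raw["inputs"][0]["intent"],
        }
    raise ValueError(f"unsupported framework: {framework}")

def store_in_data_lake(record: dict, lake: list) -> None:
    """Append the normalized record (as a JSON line) to the
    accumulation store, here stood in for by a plain list."""
    record["received_at"] = time.time()
    lake.append(json.dumps(record))
```

A request handler would call `normalize_request` once per incoming message and hand the result both to the business logic and to `store_in_data_lake` for later analysis.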

Feature Server Content Search

By creating a search index that uses weights on fields, and by allowing the message elements of a request message to reach multiple feature content results, the platform can track the results returned in corresponding response messages and the content items commonly found in the results across multiple request messages. This enables the platform to show platform participants, through the platform's user interface, which of their content items are being used most frequently and which are missing. Platform participants can then decide to change the wording or structure or other characteristics of content items or of the message elements of response messages to produce better results when interacting with end users.

As shown in Figure 30, when a feature server 505 queries the content index 511 and receives possible results (content items), the raw possible results 527 can be stored in the data lake 528. The stored data identifies the content items from the search results and the query for which those results were returned, such as relevant information from the feature service request. The data from the feature service request stored with the search result data relates to the request data originally sent from the API, because the feature service request includes the initial message elements of the request message received from the voice assistant device.

Response Processing

Once the message elements from the initial request message and the data from the content search results have been stored in the analytics data lake, the message elements to be included in the response message can be formulated by converting the feature service response from the feature server into a form that conforms to the protocol expected by the corresponding voice assistant device. The process of generating the message elements of the response message is a useful point at which to accumulate raw data for analysis.

For example, if the data lake 528 includes the message elements from a request message, information about the originating voice assistant device, and the request and response messages, then the analysis process 529 can combine those data sets into a tidier and leaner model 530, making it easier to show, for example, how many end users use various types of voice assistant devices or how many request messages have generated successful response messages for a given type of voice assistant device. For example, if a voice application has an Alexa skill and a Google action that use the SDK to send the message elements of request messages to the voice application, then the platform participant can learn how many end users use the Alexa skill versus the Google action for the voice application overall, how many end users use Alexa versus Google for a particular feature such as an events feature, or how many end users of the two different voice assistant devices asked for a particular content item.
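The reduction of raw request/response pairs into a leaner usage model can be sketched as below; the event record shape (`framework`, `response_ok`) is an assumption for illustration, not the platform's schema.

```python
from collections import defaultdict

def summarize(events):
    """Reduce raw request/response records into per-framework usage
    counts and success rates (a 'leaner model' suitable for display)."""
    stats = defaultdict(lambda: {"requests": 0, "successes": 0})
    for e in events:
        s = stats[e["framework"]]
        s["requests"] += 1
        if e["response_ok"]:
            s["successes"] += 1
    # Derive the success rate once all events are counted.
    return {
        fw: {**s, "success_rate": s["successes"] / s["requests"]}
        for fw, s in stats.items()
    }
```

The same reduction could be keyed by device type, feature, or content item to answer the other questions mentioned above.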

The analysis process can also track which message element types of a given type of voice assistant device match a given feature, enabling a platform participant to consider moving content items to a custom fallback feature server. Because the initial request message includes the initial message element type, the analysis process 529 can skip the graph traversal and find the feature directly. For example, if a platform participant notices that Google actions tend to use a particular message element type that the participant does not want mapped to the feature to which it is being mapped, the owner can disable that feature and customize where the message elements of request messages will go by using a custom fallback feature server or a custom feature server.

The types of analysis discussed above can be considered static analysis, and the processing of the data into abstract structures can be called static data analysis. As discussed later, static data analysis differs from what is called dynamic data analysis or intelligent data analysis, which uses machine learning to understand patterns in the analytics rather than directly displaying the data.

Once the message elements of request messages have been mapped from their original state as stored in the data lake 528 to the more structured form stored in the database 531, the raw data in the data lake can be deleted, or moved to a long-term archive by compressing the data into files and saving them to blob storage or file storage. Archiving certain types of data enables the training of new or revised machine learning algorithms without having to re-collect the training data, and also serves as a backup against data corruption or data loss in the analytics database 531.
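The archive-then-purge step can be sketched as follows, using gzip-compressed JSON lines written to a local directory as a stand-in for blob or file storage; the file name and record format are illustrative assumptions.

```python
import gzip
import json
import os

def archive_and_purge(lake_records, archive_dir,
                      name="analytics-archive.jsonl.gz"):
    """Compress raw data-lake records into one gzipped JSON-lines file
    (stand-in for blob/file storage), then return the archive path and
    an emptied lake, modeling deletion of the raw data after archiving."""
    os.makedirs(archive_dir, exist_ok=True)
    path = os.path.join(archive_dir, name)
    with gzip.open(path, "wt", encoding="utf-8") as f:
        for rec in lake_records:
            f.write(json.dumps(rec) + "\n")
    return path, []
```

Reading the archive back with `gzip.open(path, "rt")` recovers the original records, which is what makes it usable both for retraining and as a backup.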

Machine Learning and Intelligent Suggestions

The analytics engine uses machine learning and large amounts of analytics data to provide analyses and suggestions to platform participants. This dynamic or intelligent data analysis can be used to give platform participants intelligent suggestions about how to structure content items, where to place certain types of content items, and which content items work well and which do not.

As shown in Figure 31, the general flow of processing analytics data includes: storing raw data in the data lake, retrieving raw data from the data lake, sending the raw data to static analysis, sending the output from static analysis to machine learning 534, storing suggestions for platform participants in a separate database 535 for later use, requesting suggestions based on the output of the machine learning algorithms 536, and presenting the suggestions to platform participants through the platform's user interface.

Data Analysis and Processing

As shown in Figure 31, processing within the analytics engine uses the information generated by post-processing the statically analyzed data 531, together with the raw data from the pre-processing data lake, to infer relationships and observe patterns in those relationships. As with the static analysis step, the algorithms used for dynamic analysis 533 are also aimed at specific goals. The goals of dynamic analysis involve more than just static data such as usage across devices and success or failure rates; dynamic analysis uses these statistics about usage and rates to compare certain content items and features.

For example, as shown in Figure 32, dynamic analysis can detect the relative performance of content items. When dynamic analysis is performed using an amount of accumulated data that grows over time, it can achieve an increasingly deep understanding of why particular content items work better than others. The results of this dynamic analysis can be information about sentence structure, the types of data within content items, the quality of the voice assistant's use of words, and other factors.

Among other things, dynamic analysis of the analytics data includes collecting data at the voice application level and at the content item level. The data can include, for example: overall success 537 and failure 538 rates of content items, success and failure rates of content items when presented on particular types of voice assistant devices, comparisons of which content items are commonly returned together in feature content searches, and identification of queries in feature server content searches that return common sets of results, among others.

The main difference in the collection of analytics data between static analysis and dynamic analysis is that static analysis uses data only within the context of a particular voice application and feature. This limitation arises because the results of static analysis are data that apply only to a particular application and its own features and content items. Conversely, dynamic analysis can use, at once, the raw data derived from the execution of all voice applications of all platform participants. Therefore, a given platform participant can benefit from dynamic analysis of all content items of all voice applications of all platform participants, and can receive intelligent suggestions that enable the platform participant to provide effective content items to end users.

For example, the dynamic analysis and machine learning performed by the platform's analytics engine can classify 539 the analytics data of four voice applications of four different platform participants. Suppose the voice applications all use a survey feature, regardless of which module is the source of the feature. Within each survey feature content domain, each voice application asks a similar question, such as "How many undergraduates attend Hamilton College?" Suppose the question has a set of acceptable answers, such as 1878, 1800, approximately 1800, and eighteen hundred.

Based on this example, static analysis will collect information about how many responses succeeded and about the types of voice assistant devices or voice assistants 540 that produced the successes and failures. For example, a particular type of voice assistant, such as Siri, may have a much higher failure rate than other voice assistants. The analytics engine can collect information about the incorrect answers that were provided. During dynamic analysis of these statistics, the analytics engine may detect a large number of failed responses from Siri, most of which are "eighteen one hundred." This suggests that the language processing of a particular type of voice assistant device or voice assistant may perform worse than that of other types. The end user may actually have said "eighteen hundred," but Siri interpreted the speech as "eighteen one hundred." Dynamic analysis can track the types of words that certain voice assistants interpret less accurately than other types of voice assistants, and store that information in a structured database, just as static analysis does. In this example, the machine learning algorithm 534 would record that "eighteen hundred" is a phrase that is difficult for Siri to process correctly. With this knowledge, the analytics engine can provide intelligent suggestions to platform participants. Because the analytics engine can use usage data from all four applications of the different platform participants, it can store the processed information and provide it to all four platform participants, without each platform participant needing access to the private information used to train the machine and to produce the intelligent suggestions.
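The detection of systematically misheard phrases described above can be sketched as a simple frequency count over failed answers; the record shape (`assistant`, `transcript`) and the threshold are assumptions for the example.

```python
from collections import Counter

def flag_misheard_phrases(failed_answers, min_count=3):
    """Group failed survey answers by (assistant, transcript) and flag
    transcripts that recur often enough to suggest a systematic
    speech-recognition error (e.g. 'eighteen one hundred' on Siri)."""
    counts = Counter(
        (f["assistant"], f["transcript"]) for f in failed_answers
    )
    return [
        {"assistant": a, "transcript": t, "count": n}
        for (a, t), n in counts.items()
        if n >= min_count
    ]
```

Flagged entries are the kind of structured fact ("this assistant struggles with this phrase") that a suggestion store could later serve back to all participants.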

Intelligent Suggestions

Intelligent suggestions are suggestions, derived from the data generated by the machine learning and dynamic analysis stages of the analytics processing and provided to platform participants, about ways to structure, express, or change content items so as to achieve an effective voice experience for end users when the platform participant's voice applications and message elements are used on one or more types of voice assistant devices. These suggestions can include rephrasing sentences, removing words, adding wording variants, removing variants, or updating slot values, among others.

Suggestions are generated by sending an HTTP request to the CMS API to request suggestions when a content item is being updated. The CMS API checks a database for the latest information, such as success and failure rates of certain words for certain voice assistants or voice assistant devices, and returns a set of suggestions, if any exist. The CMS client (e.g., the analysis process) then presents these suggestions to the platform user through the platform's user interface, enabling the platform user to make changes to the wording based on the suggestions or to ignore the suggestions.
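The server-side lookup behind that HTTP endpoint might look like the sketch below; the suggestion-record shape (`trigger`, `advice`) is a hypothetical stand-in for whatever the suggestion database actually stores.

```python
def suggestions_for(content_text, suggestion_db):
    """Sketch of the CMS suggestion lookup: return the stored advice
    whose trigger phrase occurs in the content item being updated."""
    text = content_text.lower()
    return [s["advice"] for s in suggestion_db if s["trigger"] in text]
```

A CMS client would call the endpoint wrapping this function whenever a content item is saved, and show any returned advice in the editing UI.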

Continuing the example above, in which machine learning and dynamic analysis detect and track Siri's difficulty with certain kinds of numbers, such as "eighteen hundred," suppose a platform participant creates a new survey question, "When was the Declaration of Independence signed?", for which the accepted answers are 1776 and seventeen seventy-six. After the participant user enters the content items representing these answers, the CMS will request suggestions for the content items. Because the analytics engine knows that Siri will likely render "seventeen seventy-six" as "seventeen hundred and seventy-six," it will suggest that the platform participant add "seventeen hundred and seventy-six" as another answer variant, explaining that Siri may interpret certain numbers incorrectly and that adding the variant will help ensure that end users of the Apple HomePod have a better voice interaction experience. Such a phrase can appear in the user interface for these intelligent suggestions 631, for example, as shown in Figure 33.

Intelligent suggestions can be used for any type of feature or content item, because dynamic analysis can track data across features as well as within the context of a particular feature in order to provide the best intelligent suggestions.

Another type of intelligent suggestion, in addition to suggestions about content items and suggestions about features, is a recommendation to add a particular feature to a voice application. Such intelligent suggestions can be derived by tracking which features, when added to similar voice applications, are associated with greater success or greater usage of those voice applications. For example, by knowing which features are used most and are most successful in voice applications in the same industry, dynamic analysis can track data about those features and modules and suggest to platform participants that they add those features and modules.

For example, if there are two voice applications in the higher education industry and one of them experiences more usage and a higher success rate because it added a survey feature, dynamic analysis can detect that the feature is the reason for the first voice application's greater success and suggest adding a similar feature to the second application, along with the rationale that other platform participants in their industry experience greater success when the feature is included.

Data Layer

The data layer defines the types of storage used by the analytics engine and how those types of storage interact with the business logic or API and other parts of the application. The main stores include the content database, the analytics data lake, the analytics structured database, file and blob storage, the content index, and the graph database, among others.

Each main store is designed to be scalable through the use of cloud technologies, so that the stores can be replicated across regions of the world, keep data synchronized, and grow in size and throughput.

Content Database

The content database is responsible for storing and managing data related to the content items held by the platform. In some implementations, this database is a relational SQL-style database that associates data about platform participants, voice applications, modules, features, content items, and other data.

The content database is updated through the CMS API, using connections between the CMS server and the database. Requests made to the CMS by platform participants through the platform's user interface enable the platform participants to update content items.

The database can be implemented as a PostgreSQL database or any other SQL-style database.

File and Blob Storage

The file and blob storage can be implemented as traditional file storage in the cloud, enabling scalable storage with security. The file and blob storage holds files uploaded by platform participants, for example, audio recordings, video files, or images, or combinations of them. Each of these files is associated with a publicly accessible URL that enables voice assistant devices to access the files, for example, to stream audio recordings and video files or to render images on voice assistant devices that support those formats.

When a platform participant uploads a file, the file data passes through the CMS API to the file and blob storage. Once the upload is complete, the URL of the file is sent as a reply to the requesting client, and a reference to the file's URL is stored in the content database. Platform participants can also use the CMS, through the platform's user interface, to remove and update files in the storage.

In some implementations, the file and blob storage can be implemented as an Amazon Web Services S3 bucket.
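The upload flow described above can be sketched with an in-memory stand-in for a cloud blob store (an actual deployment would use a cloud SDK against, e.g., an S3 bucket); the base URL and key scheme are illustrative assumptions.

```python
import hashlib

class BlobStore:
    """In-memory stand-in for a cloud blob store (e.g. an S3 bucket)."""

    def __init__(self, base_url):
        self.base_url = base_url
        self.objects = {}

    def upload(self, filename, data: bytes) -> str:
        """Store the file and return the publicly accessible URL that
        the CMS would record in the content database."""
        # A content-hash prefix keeps keys unique across re-uploads.
        key = hashlib.sha256(data).hexdigest()[:12] + "-" + filename
        self.objects[key] = data
        return f"{self.base_url}/{key}"
```

The string returned by `upload` plays the role of the publicly accessible URL sent back to the requesting client and referenced from the content database.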

Content Index

The content index is a collection of Elasticsearch indexes containing data from the content items in the content database. The content index provides better-performing content searches for the feature servers. When a query from a feature server is made against the index, a set of best-matching results is returned. As described earlier, Elasticsearch indexes make it possible to add weights to certain characteristics of a given type of data being added to the index.
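The effect of field weights on ranking can be illustrated with a toy scorer (a real deployment would rely on Elasticsearch's own relevance scoring and field boosting; the field names and weights here are assumptions).

```python
def score(query_terms, item, weights):
    """Toy weighted-field scoring: a term match in a field counts
    weights[field] times toward the item's total score."""
    total = 0.0
    for field, weight in weights.items():
        text = item.get(field, "").lower()
        total += weight * sum(t.lower() in text for t in query_terms)
    return total

def search(query, items, weights, k=3):
    """Return the k items that best match the query under the weights."""
    terms = query.split()
    ranked = sorted(items, key=lambda it: score(terms, it, weights),
                    reverse=True)
    return ranked[:k]
```

Boosting a field such as the title means a match there outweighs the same match in the body, which is the intuition behind weighting fields in the index.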

When content items are added, updated, or deleted by platform participants, the content items in the content index are updated by the CMS API.

Graph Database

The graph database stores a graph of the relationships among features, the message elements of request messages, and message element slots. The graph database is used during the graph traversal phase of the business logic layer when a request message is received from a voice assistant device. The graph can be traversed using the edges between intents, slots, and features to find the most appropriate feature for the message elements of a request message from a voice assistant device.
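The traversal step can be sketched as a breadth-first search from an intent node to the nearest feature node; the dictionary-based graph representation and node labels are assumptions standing in for the actual graph database.

```python
from collections import deque

def find_feature(graph, start):
    """Breadth-first traversal from a message-element (intent) node
    over intent/slot edges until a feature node is reached."""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        if graph["nodes"][node] == "feature":
            return node
        for nxt in graph["edges"].get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return None  # no feature reachable from this message element
```

Breadth-first order means the nearest reachable feature wins, which matches the intuition of picking the most directly related feature for a given intent and its slots.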

The graph database is updated by participant users who manage the relationships for new or updated message element types from, for example, Amazon, Google, Apple, and Microsoft.

Analytics Data Lake

The analytics data lake is a big-data store for unstructured analytics data. It is used to add foundational information based on request messages from voice assistants and content searches from the feature servers. The static and dynamic analysis stages and tasks take the large amounts of data and structure it into smaller, more understandable units of information that are valuable to the analytics engine, such as usage, success/failure rates, and so on.

Analytics Structured Database

The analytics structured database is a SQL-style relational database used by the CMS to display and provide structured analytics data and to store intelligent suggestion data. This database is updated by the data analysis stages after information is retrieved from the data lake and mapped to the table relationships present in the structured database.

Other implementations are within the scope of the following claims.

Claims (35)

CN201980049296.7A | 2018-06-05 | 2019-06-03 | Voice Application Platform | Active | CN112470216B (en)

Applications Claiming Priority (11)

Application Number | Priority Date | Filing Date | Title
US16/000,799 | US10636425B2 (en) | 2018-06-05 | Voice application platform
US16/000,798 | 2018-06-05
US16/000,805 | 2018-06-05
US16/000,789 | 2018-06-05
US16/000,789 | US10803865B2 (en) | 2018-06-05 | Voice application platform
US16/000,799 | 2018-06-05
US16/000,805 | US11437029B2 (en) | 2018-06-05 | Voice application platform
US16/000,798 | US10235999B1 (en) | 2018-06-05 | Voice application platform
US16/353,977 | 2019-03-14
US16/353,977 | US10943589B2 (en) | 2018-06-05 | Voice application platform
PCT/US2019/035125 | WO2019236444A1 (en) | 2018-06-05 | Voice application platform

Publications (2)

Publication Number | Publication Date
CN112470216A (en) | 2021-03-09
CN112470216B (en) | 2024-08-02 | granted

Family

ID=68769419

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN201980049296.7A | Active | CN112470216B (en) | 2018-06-05 | Voice Application Platform

Country Status (4)

Country | Link
EP (1) | EP3803856A4 (en)
CN (1) | CN112470216B (en)
CA (1) | CA3102093A1 (en)
WO (1) | WO2019236444A1 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
IT202100012548A1 (en)* | 2021-05-14 | 2022-11-14 | Hitbytes Srl | Method for building cross-platform voice applications
CN114220425B (en)* | 2021-11-04 | 2025-03-04 | Fujian Yirong Information Technology Co., Ltd. | Chatbot system and conversation method based on speech recognition and Rasa framework
CN115757460A (en)* | 2022-11-02 | 2023-03-07 | Zhejiang Dahua Technology Co., Ltd. | Business processing method based on data lake system and computer readable storage medium
CN116893864B (en)* | 2023-07-17 | 2024-02-13 | Wuxi Chelian Tianxia Information Technology Co., Ltd. | Method and device for realizing voice assistant of intelligent cabin and electronic equipment

Citations (1)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN107277153A (en)* | 2017-06-30 | 2017-10-20 | Baidu Online Network Technology (Beijing) Co., Ltd. | Method, device and server for providing voice service

Family Cites Families (16)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US20050261907A1 (en)* | 1999-04-12 | 2005-11-24 | Ben Franklin Patent Holding Llc | Voice integration platform
US7340714B2 (en)* | 2001-10-18 | 2008-03-04 | Bea Systems, Inc. | System and method for using web services with an enterprise system
US7640160B2 (en)* | 2005-08-05 | 2009-12-29 | Voicebox Technologies, Inc. | Systems and methods for responding to natural language speech utterance
US8073681B2 (en)* | 2006-10-16 | 2011-12-06 | Voicebox Technologies, Inc. | System and method for a cooperative conversational voice user interface
KR100911312B1 (en)* | 2007-06-22 | 2009-08-11 | LG CNS Co., Ltd. | Voice portal service system and voice portal service method
US8437339B2 (en)* | 2010-04-28 | 2013-05-07 | Hewlett-Packard Development Company, L.P. | Techniques to provide integrated voice service management
US9159322B2 (en)* | 2011-10-18 | 2015-10-13 | GM Global Technology Operations LLC | Services identification and initiation for a speech-based interface to a mobile device
US20150066817A1 (en)* | 2013-08-27 | 2015-03-05 | Persais, Llc | System and method for virtual assistants with shared capabilities
US9516355B2 (en)* | 2013-09-04 | 2016-12-06 | Qualcomm Incorporated | Discovering and controlling multiple media rendering devices utilizing different networking protocols
US9548049B2 (en)* | 2014-02-19 | 2017-01-17 | Honeywell International Inc. | Methods and systems for integration of speech into systems
US10713005B2 (en)* | 2015-01-05 | 2020-07-14 | Google Llc | Multimodal state circulation
US9911412B2 (en)* | 2015-03-06 | 2018-03-06 | Nuance Communications, Inc. | Evidence-based natural language input recognition
US9886955B1 (en)* | 2016-06-29 | 2018-02-06 | EMC IP Holding Company LLC | Artificial intelligence for infrastructure management
US10115400B2 (en)* | 2016-08-05 | 2018-10-30 | Sonos, Inc. | Multiple voice services
US10783883B2 (en)* | 2016-11-03 | 2020-09-22 | Google Llc | Focus session at a voice interface device
US10235999B1 (en)* | 2018-06-05 | 2019-03-19 | Voicify, LLC | Voice application platform

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN107277153A (en)* | 2017-06-30 | 2017-10-20 | Baidu Online Network Technology (Beijing) Co., Ltd. | Method, device and server for providing voice service

Also Published As

Publication number | Publication date
EP3803856A4 (en) | 2021-07-21
EP3803856A1 (en) | 2021-04-14
CN112470216A (en) | 2021-03-09
CA3102093A1 (en) | 2019-12-12
WO2019236444A1 (en) | 2019-12-12

Similar Documents

Publication | Publication Date | Title
US11790904B2 (en) | Voice application platform
US11887597B2 (en) | Voice application platform
US11615791B2 (en) | Voice application platform
US11437029B2 (en) | Voice application platform
US11775254B2 (en) | Analyzing graphical user interfaces to facilitate automatic interaction
CN112470216B (en) | Voice Application Platform
US11769064B2 (en) | Onboarding of entity data
CN109271556B (en) | Method and apparatus for outputting information
US11107470B2 (en) | Platform selection for performing requested actions in audio-based computing environments
US11694688B2 (en) | Platform selection for performing requested actions in audio-based computing environments
US20150058417A1 (en) | Systems and methods of presenting personalized personas in online social networks
CN106407361A (en) | Method and device for pushing information based on artificial intelligence
CN105279168A (en) | Data query method supporting natural language, open platform, and user terminal
WO2022206307A1 (en) | Method for electronic messaging using image based noisy content

Legal Events

Date | Code | Title | Description
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant
