US11568148B1


Info

Publication number: US11568148B1
Application number: US16/183,270
Authority: US
Inventors: Nathan D. Nichols; Andrew R. Paley; Maia Lewis Meza; Santiago Santana
Original assignee: Narrative Science LLC
Current assignee: Salesforce Inc
Priority date: 2017-02-17
Filing date: 2018-11-07
Publication date: 2023-01-31
Also published as: US12423525B2; US20240211697A1

Abstract

Artificial intelligence (AI) technology can be used in combination with composable communication goal statements to facilitate a user's ability to quickly structure story outlines using “explanation” communication goals in a manner usable by an NLG narrative generation system without any need for the user to directly author computer code. This AI technology permits NLG systems to determine the appropriate content for inclusion in a narrative story about a data set in a manner that will satisfy a desired explanation communication goal such that the narratives will express various ideas that are deemed relevant to a given explanation communication goal.

Description

CROSS-REFERENCE AND PRIORITY CLAIM TO RELATED PATENT APPLICATIONS

This patent application claims priority to U.S. provisional patent application Ser. No. 62/585,809, filed Nov. 14, 2017, and entitled “Applied Artificial Intelligence Technology for Narrative Generation Based on Smart Attributes and Explanation Communication Goals”, the entire disclosure of which is incorporated herein by reference.

This patent application is related to U.S. patent application Ser. No. 16/183,230, filed this same day, and entitled “Applied Artificial Intelligence Technology for Narrative Generation Based on Smart Attributes”, the entire disclosure of which is incorporated herein by reference.

INTRODUCTION

There is an ever-growing need in the art for improved natural language generation (NLG) technology that harnesses computers to process data sets and automatically generate narrative stories about those data sets. NLG is a subfield of artificial intelligence (AI) concerned with technology that produces language as output on the basis of some input information or structure, in the cases of most interest here, where that input constitutes data about some situation to be analyzed and expressed in natural language. Many NLG systems are known in the art that use template approaches to translate data into text. However, such conventional designs typically suffer from a variety of shortcomings such as constraints on how many data-driven ideas can be communicated per sentence, constraints on variability in word choice, and limited capabilities of analyzing data sets to determine the content that should be presented to a reader.

As technical solutions to these technical problems in the NLG arts, the inventors note that the assignee of the subject patent application has previously developed and commercialized pioneering technology that robustly generates narrative stories from data, of which a commercial embodiment is the QUILL™ narrative generation platform from Narrative Science Inc. of Chicago, Ill. Aspects of this technology are described in the following patents and patent applications: U.S. Pat. Nos. 8,374,848, 8,355,903, 8,630,844, 8,688,434, 8,775,161, 8,843,363, 8,886,520, 8,892,417, 9,208,147, 9,251,134, 9,396,168, 9,576,009, 9,697,197, 9,697,492, 9,720,890, and 9,977,773, and U.S. patent application Ser. No. 14/211,444 (entitled “Method and System for Configuring Automatic Generation of Narratives from Data”, filed Mar. 14, 2014), Ser. No. 15/253,385 (entitled “Applied Artificial Intelligence Technology for Using Narrative Analytics to Automatically Generate Narratives from Visualization Data”, filed Aug. 31, 2016), Ser. No. 15/666,151 (entitled “Applied Artificial Intelligence Technology for Interactively Using Narrative Analytics to Focus and Control Visualizations of Data”, filed Aug. 1, 2017), Ser. No. 15/666,168 (entitled “Applied Artificial Intelligence Technology for Evaluating Drivers of Data Presented in Visualizations”, filed Aug. 1, 2017), and Ser. No. 15/666,192 (entitled “Applied Artificial Intelligence Technology for Selective Control over Narrative Generation from Visualizations of Data”, filed Aug. 1, 2017); the entire disclosures of each of which are incorporated herein by reference.

The inventors have further extended on this pioneering work with improvements in AI technology as described herein.

For example, the inventors disclose how AI technology can be used in combination with composable communication goal statements and an ontology to facilitate a user's ability to quickly structure story outlines in a manner usable by a narrative generation system without any need to directly author computer code.

Moreover, the inventors also disclose that the ontology used by the narrative generation system can be built concurrently with the user composing communication goal statements. Further still, expressions can be attached to objects within the ontology for use by the narrative generation process when expressing concepts from the ontology as text in a narrative story. As such, the ontology becomes a re-usable and shareable knowledge-base for a domain that can be used to generate a wide array of stories in the domain by a wide array of users/authors.

The inventors further disclose techniques for editing narrative stories whereby a user's editing of text in the narrative story that has been automatically generated can in turn automatically result in modifications to the ontology and/or a story outline from which the narrative story was generated. Through this feature, the ontology and/or story outline is able to learn from the user's edits and the user is alleviated from the burden of making further corresponding edits of the ontology and/or story outline.

The inventors further disclose how the narrative analytics that are linked to communication goal statements can employ a conditional outcome framework that allows the content and structure of resulting narratives to intelligently adapt as a function of the nature of the data under consideration.

Further still, the inventors also disclose how “analyze” communication goals can be supported by the system, including various examples of communication goal statements that drive the generation of narratives that express various ideas that are deemed relevant to a given analysis communication goal.

The inventors also disclose how the attribute structures within the ontology can include an explicit model for the subject attribute, regardless of whether that model is used to compute the value of the subject attribute itself. This explicit model can then be leveraged to support an investigation of drivers of the value for the subject attribute. Narrative analytics that perform such driver analysis can then be used to support narrative generation for communication goals relating to explanations, predictions, recommendations, and the like.

Furthermore, the inventors also disclose how “explain” communication goals can be supported by the system in combination with driver analysis supported by the explicit attribute models, including various examples of communication goal statements that drive the generation of narratives that express various ideas that are deemed relevant to a given explanation communication goal.

Through these and other features, example embodiments of the invention provide significant technical advances in the NLG arts by harnessing AI computing to improve how narrative stories are generated from data sets while alleviating users from a need to directly code and re-code the narrative generation system, thereby opening up use of the AI-based narrative generation system to a much wider base of users (e.g., including users who do not have specialized programming knowledge).

BRIEF DESCRIPTION OF THE DRAWINGS

FIGS. 1A-B and 2 depict various process flows for example embodiments.

FIG. 3A depicts an example process flow for composing a communication goal statement.

FIG. 3B depicts an example ontology.

FIG. 3C depicts an example process flow for composing a communication goal statement while also building an ontology.

FIG. 3D depicts an example of how communication goal statements can relate to an ontology and program code for execution by a process as part of a narrative generation process.

FIG. 4A depicts examples of base communication goal statements.

FIG. 4B depicts examples of parameterized communication goal statements corresponding to the base communication goal statements of FIG. 4A.

FIG. 5 depicts a narrative generation platform in accordance with an example embodiment.

FIGS. 6A-D depict a high level view of an example embodiment of a platform in accordance with the design of FIG. 5.

FIG. 7 depicts an example embodiment of an analysis component of FIG. 6C.

FIGS. 8A-H depict example embodiments for use in an NLG component of FIG. 6D.

FIG. 9 depicts an example process flow for parameterizing an attribute.

FIG. 10 depicts an example process flow for parameterizing a characterization.

FIG. 11 depicts an example process flow for parameterizing an entity type.

FIG. 12 depicts an example process flow for parameterizing a timeframe.

FIG. 13 depicts an example process flow for parameterizing a timeframe interval.

FIGS. 14A-D illustrate an example of how a communication goal statement can include subgoals that drive the narrative generation process.

FIG. 15A depicts an example conditional outcome data structure linked with one or more idea data structures.

FIG. 15B depicts an example of narrative analytics that employ a conditional outcome framework to determine ideas to be expressed in a narrative.

FIG. 16 depicts an example embodiment for a conditional outcome framework that can be used by the narrative analytics associated with a communication goal statement for “Analyze Entity Group by Attribute”.

FIGS. 17A and 17B depict examples of how ideas can be linked to and delinked from outcomes within a conditional outcome framework in response to user input.

FIGS. 18A and 18B depict examples of narratives that can be generated using the conditional outcome framework of FIG. 16.

FIGS. 19A and 19B depict an example embodiment for a conditional outcome framework that can be used by the narrative analytics associated with a communication goal statement for “Analyze Entity Group by Attribute 1 and Attribute 2” and examples of narrative stories that can be generated thereby.

FIG. 20A depicts an example embodiment for a conditional outcome framework that can be used by the narrative analytics associated with a communication goal statement for “Analyze Entity Group by a Change in Attribute (Over Time)” and an example of a narrative story that can be generated thereby.

FIGS. 20B-D depict another example embodiment for a conditional outcome framework that can be used by the narrative analytics associated with a communication goal statement for “Analyze Entity Group by a Change in Attribute (Over Time)” and examples of narrative stories that can be generated thereby.

FIGS. 21A and 21B depict an example embodiment for a conditional outcome framework that can be used by the narrative analytics associated with a communication goal statement for “Analyze Entity Group by Characterization” and examples of narrative stories that can be generated thereby.

FIG. 22A depicts an example structure for a smart attribute.

FIGS. 22B and 22C depict examples that show how smart attributes can have attribute models that are linked to other attributes and fields within source data.

FIG. 23 depicts an example process flow that shows how the smart attributes can be leveraged to support driver analysis.

FIGS. 24A-E depict an example embodiment for a conditional outcome framework that can be used by the narrative analytics associated with a communication goal statement for “Explain a Value of an Attribute” as used to generate various narratives.

FIG. 25A shows an example list of facts that can be learned about a data set by a narrative generation system using smart attributes in connection with a communication goal statement for “Explain a Change in Value of an Attribute”.

FIGS. 25B-D depict an example embodiment for a conditional outcome framework that can be used by the narrative analytics associated with a communication goal statement for “Explain a Change in Value of an Attribute” as used to generate various narratives.

FIGS. 26A and 26B depict an example embodiment for a recursive conditional outcome framework that can be recursively invoked by the narrative analytics associated with a communication goal statement for “Explain a Change in Value of an Attribute”.

FIGS. 27-298 illustrate example user interfaces for using an example embodiment to support narrative generation through composable communication goal statements and ontologies.

DETAILED DESCRIPTION OF EXAMPLE EMBODIMENTS

The example embodiments described herein further extend and innovate on the pioneering work described in the above-referenced and incorporated U.S. Pat. Nos. 9,576,009, 9,697,197, 9,697,492, 9,720,890, and 9,977,773, where explicit representations of communication goals are used by AI technology to improve how NLG technology generates narratives from data. With example embodiments described herein, AI technology is able to process a communication goal statement in relation to a data set in order to automatically generate narrative text about that data set such that the narrative text satisfies a communication goal corresponding to the communication goal statement. Furthermore, innovative techniques are disclosed that allow users to compose such communication goal statements in a manner where the composed communication goal statements exhibit a structure that promotes re-usability and robust story generation.

FIG. 1A depicts a process flow for an example embodiment. At step 100, a processor selects and parameterizes a communication goal statement. The processor can perform this step in response to user input as discussed below with respect to example embodiments. The communication goal statement can be expressed as natural language text, preferably as an operator in combination with one or more parameters, as elaborated upon below.

At step 102, a processor maps data within the data set to the parameters of the communication goal statement. The processor can also perform this step in response to user input as discussed below with respect to example embodiments.

At step 104, a processor performs NLG on the parameterized communication goal statement and the mapped data. The end result of step 104 is the generation of narrative text based on the data set, where the content and structure of the narrative text satisfy a communication goal corresponding to the parameterized communication goal statement.
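By way of illustration only, the three steps of the FIG. 1A process flow might be sketched in Python as follows. This sketch is not taken from the disclosure: the names `GoalStatement`, `parameterize`, `map_data`, and `generate_text`, the column name `sales`, and the single sentence pattern are all hypothetical, and only a “Present” operator is handled.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class GoalStatement:
    operator: str            # e.g. "Present"
    parameters: List[str]    # e.g. ["Sales of the Salesperson"]


def parameterize(operator: str, *parameters: str) -> GoalStatement:
    """Step 100 (sketch): select an operator and specify its parameters."""
    return GoalStatement(operator, list(parameters))


def map_data(goal: GoalStatement, rows: List[dict],
             field_for_parameter: Dict[str, str]) -> Dict[str, list]:
    """Step 102 (sketch): map each parameter to the values of a data set column."""
    return {p: [row[field_for_parameter[p]] for row in rows]
            for p in goal.parameters}


def generate_text(goal: GoalStatement, mapped: Dict[str, list]) -> str:
    """Step 104 (sketch): produce narrative text from the goal and mapped data."""
    if goal.operator != "Present":
        raise NotImplementedError(goal.operator)  # other operators omitted here
    parameter = goal.parameters[0]
    values = ", ".join(str(v) for v in mapped[parameter])
    return f"The {parameter.lower()} is {values}."


goal = parameterize("Present", "Sales of the Salesperson")
mapped = map_data(goal, [{"sales": 120000}], {"Sales of the Salesperson": "sales"})
text = generate_text(goal, mapped)
```

A practical system would replace the fixed sentence pattern with the narrative analytics and NLG components described elsewhere herein.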

While FIG. 1A describes a process flow that operates on a communication goal statement, it should be understood that multiple communication goal statements can be composed and arranged to create sections of an outline for a story that is meant to satisfy multiple communication goals. FIG. 1B depicts an example process flow for narrative generation based on multiple communication goal statements. At step 110, multiple communication goal statements are selected and parameterized to create sections of a story outline. At step 112, a processor maps data within a data set to these communication goal statements as with step 102 (but for multiple communication goal statements). Step 114 is likewise performed in a manner similar to that of step 104 but on the multiple communication goal statements and the mapped data associated therewith. The end result of step 114 is a narrative story about the data set that conveys information about the data set in a manner that satisfies the story outline and associated communication goals.

It should be understood that steps 102 and 104, as well as steps 112 and 114, need not be performed in lockstep order with each other where step 102 (or 112) maps all of the data before the system progresses to step 104 (or step 114). These steps can be performed in a more iterative manner if desired, where a portion of the data is mapped at step 102 (or step 112), followed by execution of step 104 (or step 114) on that mapped data, whereupon the system returns to step 102/112 to map more data for subsequent execution of step 104/114, and so on.

Furthermore, it should be understood that a system that executes the process flows of FIGS. 1A and/or 1B may involve multiple levels of parameterization. For example, not only is there parameterization in the communication goals to build story outlines, but there can also be parameterization of the resulting story outline with the actual data used to generate a story, as explained hereinafter with respect to example embodiments.

FIG. 2 depicts an example process flow that shows how a story outline can be composed as part of step 110. The process flow of FIG. 2 can be performed by a processor in response to user input through a user interface. To begin the process, a name is provided for a section (step 120). Within this section, step 100 is performed to define a communication goal statement for the subject section. At step 122, the section is updated to include this communication goal statement. The process flow then determines whether another communication goal statement is to be added to the subject section (step 124). If so, the process flow returns to steps 100 and 122. If not, the process flow proceeds to step 126. At step 126, the process flow determines whether another section is to be added to the story outline. If so, the process flow returns to step 120. Otherwise, the process flow concludes and the story outline is completed. Thus, through execution of the process flow of FIG. 2, a processor can generate a story outline comprising a plurality of different sections, where each section comprises one or more communication goal statements. This story outline in turn defines the organization and structure of a narrative story generated from a data set and determines the processes required to generate such a story.
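As a minimal sketch of the FIG. 2 process flow, the nested loops over sections and communication goal statements might be written in Python as shown below. The `Section` and `StoryOutline` classes and the `build_outline` function are hypothetical names introduced for illustration, and a list of (section name, statements) pairs stands in for the user input that would drive steps 120 through 126.

```python
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple


@dataclass
class Section:
    name: str
    goal_statements: List[str] = field(default_factory=list)


@dataclass
class StoryOutline:
    sections: List[Section] = field(default_factory=list)


def build_outline(spec: Sequence[Tuple[str, Sequence[str]]]) -> StoryOutline:
    """Compose a story outline from (section name, goal statements) pairs."""
    outline = StoryOutline()
    for name, statements in spec:          # step 126: is another section to be added?
        section = Section(name)            # step 120: name the section
        for statement in statements:       # step 124: is another statement to be added?
            # Steps 100 and 122: define the statement and update the section.
            section.goal_statements.append(statement)
        outline.sections.append(section)
    return outline


outline = build_outline([
    ("Overview", ["Present the Sales of the Salesperson"]),
    ("Comparison", [
        "Compare the Sales of the Salesperson to the Benchmark of the Salesperson",
        "Callout the Highest Ranked Salesperson by Sales",
    ]),
])
```

The resulting object holds one entry per section, each with its ordered communication goal statements, which mirrors the outline structure described above.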

The previous example shows how an outline can be built by adding sections and parameterizing goals completely from scratch. The user is generally not expected to start from scratch, however. A narrative generation system instance will generally include a library of prebuilt components that users can utilize to more easily and quickly build out their outline. The narrative generation system's library provides access to previously parameterized and composed goals, subsections, sections, and even fully defined outlines. These re-usable components come fully parameterized, but can be updated or adjusted for the specific project. These changes are initially isolated from the shared library of components.

Components from the system's shared library can be used in two ways. First, a new project can be created from an entire project blueprint providing all aspects of a project already defined. This includes sample data, data views, the ontology, outline, sections, parameterized goals, and data mappings. Second, a user can pull in predefined components from the system's library ad hoc while building a new project. For example, when adding a section to an outline, the user can either start from scratch with an empty section or use a predefined section that includes a set of fully parameterized goals.

The system's library of components can be expanded by users of the platform through a mechanism that enables users to share components they have built. Once a component (outline, ontology, section, etc.) is shared, other users can then use it from the system's library in their own projects.

Composable Communication Goal Statements:

As shown by FIG. 4A, base communication goal statement 402 is “Present the Value” where the word “Present” serves as the operator 410 and “Value” serves as the parameter placeholder 412. The operator 410 can be associated with a set of narrative analytics (discussed below) that define how the AI will analyze a data set to determine the content that is to be addressed by a narrative story that satisfies the “Present the Value” communication goal. The parameter placeholder 412 is a field through which a user specifies an attribute of an entity type to thereby define a parameter to be used as part of the communication goal statement and subsequent story generation process. As explained below, the process of parameterizing the parameter placeholders in the base communication goal statements can build and/or leverage an ontology that represents a knowledge base for the domain of the story generation process.

As shown by FIG. 4A, another example of a base communication goal statement is base communication goal statement 404, which is expressed as “Present the Characterization”, but could also be expressed as “Characterize the Entity”. In these examples, “Present” (or “Characterize”) can serve as operator 414 and “Characterization” (or “Entity”) can serve as a parameter placeholder 416. This base communication goal statement can be used to formulate a communication goal statement geared toward analyzing a data set in order to express an editorial judgment about data within the data set.

As shown by FIG. 4A, another example of a base communication goal statement is base communication goal statement 406, which is expressed as “Compare the Value to the Other Value”, where “Compare” serves as operator 418, “Value” serves as a parameter placeholder 420, and “Other Value” serves as parameter placeholder 422. The “Compare” operator 418 can be associated with a set of narrative analytics that are configured to compute various metrics indicative of a comparison between the values corresponding to specified attributes of specified entities to support the generation of a narrative that expresses how the two values compare with each other.

Another example of a base communication goal statement is “Callout the Entity” 408 as shown by FIG. 4A. In this example, “Callout” is operator 424 and “Entity” is the parameter placeholder 426. The “Callout” operator 424 can be associated with a set of narrative analytics that are configured to compute various metrics by which to identify one or more entities that meet a set of conditions to support the generation of a narrative that identifies such an entity or entities in the context of these conditions.

The system can store data representative of a set of available base communication goal statements in a memory for use as a library. A user can then select from among this set of base communication goal statements in any of a number of ways. For example, the set of available base communication goal statements can be presented as a menu (e.g., a drop down menu) from which the user makes a selection. As another example, a user can be permitted to enter text in a text entry box. Software can detect the words being entered by the user and attempt to match those words with one of the base communication goal statements as would be done with auto-suggestion text editing programs. Thus, as a user begins typing the character string “Compa . . . ”, the software can match this text entry with the base communication goal statement of “Compare the Value to the Other Value” and select this base communication goal statement at step 300.
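The auto-suggestion behavior described above could be approximated, as one possibility among many, by a case-insensitive prefix match against the stored library. In the Python sketch below, the `BASE_GOAL_STATEMENTS` list reuses the four base statements named in this description, while the `suggest` function and its matching rule are assumptions made for illustration.

```python
from typing import List, Sequence

# Library of base communication goal statements (the four examples named above).
BASE_GOAL_STATEMENTS = [
    "Present the Value",
    "Present the Characterization",
    "Compare the Value to the Other Value",
    "Callout the Entity",
]


def suggest(typed: str, library: Sequence[str] = BASE_GOAL_STATEMENTS) -> List[str]:
    """Return the base statements that begin with the text typed so far."""
    prefix = typed.strip().lower()
    if not prefix:
        return []  # nothing typed yet, so nothing to suggest
    return [statement for statement in library
            if statement.lower().startswith(prefix)]


matches = suggest("Compa")
```

When exactly one statement matches, as with “Compa”, the software could select it directly; when several match, as with “Pres”, the candidates could be offered as a menu.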

Returning to FIG. 3A, the process flow at steps 302-306 operates to parameterize the base communication goal statement by specifying parameters to be used in place of the parameter placeholders in the base communication goal statement. One of the technical innovations disclosed by the inventors is the use of an ontology 320 to aid this part of composing the communication goal statement. The ontology 320 is a data structure that identifies the types of entities that exist within the knowledge domain used by the narrative generation system to generate narrative stories in coordination with communication goal statements. The ontology also identifies additional characteristics relating to the entity types such as various attributes of the different entity types, relationships between entity types, and the like.

Step 302 allows a user to use the existing ontology to support parameterization of a base communication goal statement. For example, if the ontology 320 includes an entity type of “Salesperson” that has an attribute of “Sales”, a user who is parameterizing base communication goal statement 402 can cause the processor to access the existing ontology 320 at step 304 to select “Sales of the Salesperson” from the ontology 320 at step 306 to thereby specify the parameter to be used in place of parameter placeholder 412 and thereby create a communication goal statement of “Present the Sales of the Salesperson”.

Also, if the existing ontology 320 does not include the parameters desired by a user, step 306 can operate by a user providing user input that defines the parameter(s) to be used for parameterizing the communication goal statement. In this situation, the processor in turn builds/updates the ontology 320 to add the parameter(s) provided by the user. For example, if the ontology 320 did not already include “Sales” as an attribute of the entity type “Salesperson”, steps 306-308 can operate to add a Sales attribute to the Salesperson entity type, thereby adapting the ontology 320 at the same time that the user is composing the communication goal statement. This is a powerful innovation in the art that provides significant improvement with respect to how artificial intelligence can learn and adapt to the knowledge base desired by the user for use by the narrative generation system.

At step 310, the processor checks whether the communication goal statement has been completed. If so, the process flow ends, and the user has composed a complete communication goal statement. However, if other parameters still need to be specified, the process flow can return to step 302. For example, to compose a communication goal statement from the base communication goal statement 406 of “Compare the Value to the Other Value”, two passes through steps 302-308 may be needed for the user to specify the parameters for use as the Value and the Other Value.
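The following Python sketch illustrates, under simplifying assumptions, how parameterizing a base communication goal statement can update an ontology at the same time. The ontology is reduced here to a dictionary mapping entity type names to sets of attribute names, and the functions `specify_parameter` and `compose` are hypothetical; the actual ontology 320 carries considerably more structure, as discussed below.

```python
from typing import Dict, Sequence, Set, Tuple

Ontology = Dict[str, Set[str]]  # entity type name -> attribute names (simplified)


def specify_parameter(ontology: Ontology, entity_type: str, attribute: str) -> str:
    """Use the attribute if the ontology has it; otherwise add it (steps 306-308)."""
    attributes = ontology.setdefault(entity_type, set())  # adds a missing entity type
    attributes.add(attribute)                             # no effect if already present
    return f"{attribute} of the {entity_type}"


def compose(base_statement: str, placeholders: Sequence[str],
            choices: Sequence[Tuple[str, str]], ontology: Ontology) -> str:
    """Make one pass per placeholder until the statement is complete (step 310)."""
    statement = base_statement
    for placeholder, (entity_type, attribute) in zip(placeholders, choices):
        parameter = specify_parameter(ontology, entity_type, attribute)
        statement = statement.replace(f"the {placeholder}", f"the {parameter}", 1)
    return statement


ontology: Ontology = {"Salesperson": {"Sales"}}  # "Benchmark" is not yet known
statement = compose(
    "Compare the Value to the Other Value",
    ["Value", "Other Value"],
    [("Salesperson", "Sales"), ("Salesperson", "Benchmark")],
    ontology,
)
```

In this run the first pass reuses the existing “Sales” attribute, while the second pass adds “Benchmark” to the “Salesperson” entity type as a side effect of composing the statement.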

FIG. 4B shows examples of parameterized communication goal statements that can be created as a result of the FIG. 3A process flow. For example, the base communication goal statement 402 of FIG. 4A can be parameterized as communication goal statement 402b (“Present the Price of the Car”, where the parameter placeholder 412 has been parameterized as parameter 412b, namely “Price of the Car” in this instance, with “Price” being the specified attribute of a “Car” entity type). Similarly, the base communication goal statement 402 of FIG. 4A could also be parameterized as “Present the Average Value of the Deals of the Salesperson” (where the parameter placeholder 412 has been parameterized as parameter 412b, namely “Average Value of the Deals of the Salesperson” in this instance).

FIG. 4B also shows examples of how base communication goal statement 404 can be parameterized (see the relatively lengthy “Present the Characterization of the Highest Ranking Department in the City by Expenses in terms of the Difference Between its Budget and Expenses” statement 404b1 where the specified parameter 404b1 is the “Characterization of the Highest Ranking Department in the City by Expenses in terms of the Difference Between its Budget and Expenses”; see also its substantial equivalent in the form of statement 404b2). Also shown by FIG. 4B are examples of parameterization of base communication goal statement 406. A first example is the communication goal statement 406b of “Compare the Sales of the Salesperson to the Benchmark of the Salesperson” where the specified parameter for “Value” 420 is “Sales of the Salesperson” 420b and the specified parameter for “Other Value” 422 is “Benchmark of the Salesperson” 422b. A second example is the communication goal statement 406b of “Compare the Revenue of the Business to the Expenses of the Business” where the specified parameter for “Value” 420 is “Revenue of the Business” 420b and the specified parameter for “Other Value” 422 is “Expenses of the Business” 422b.

Also shown by FIG. 4B are examples of parameterization of base communication goal statement 408. A first example is the communication goal statement 408b of “Callout the Highest Ranked Salesperson by Sales” where the specified parameter for “Entity” 426 is the “Highest Ranked Salesperson by Sales” 426b. A second example is the communication goal statement 408b of “Callout the Players on the Winning Team” where the specified parameter for “Entity” 426 is “Players on the Winning Team” 426b. A third example is the communication goal statement 408b of “Callout the Franchises with More than $1000 in Daily Sales” where the specified parameter for “Entity” 426 is “Franchises with More than $1000 in Daily Sales” 426b.

As with the base communication goal statements, it should be understood that a practitioner may choose to employ more, fewer, or different parameterized communication goal statements in a narrative generation system. For example, a parameterized Review communication goal statement could be “Review the weekly cash balance of the company over the year”, and a parameterized Explain communication goal statement could be “Explain the profit of the store in the month”.

Ontology Data Structure:

FIG. 3B depicts an example structure for ontology 320. The ontology 320 may comprise one or more entity types 322. Each entity type 322 is a data structure associated with an entity type and comprises data that describes the associated entity type. An example of an entity type 322 would be a “salesperson” or a “city”. Each entity type 322 comprises metadata that describes the subject entity type such as a type 324 (to identify whether the subject entity type is, e.g., a person, place or thing) and a name 326 (e.g., “salesperson”, “city”, etc.). Each entity type 322 also comprises one or more attributes 330. For example, an attribute 330 of a “salesperson” might be the “sales” achieved by a salesperson. Additional attributes of a salesperson might be the salesperson's gender and sales territory.

Attributes 330 can be represented by their own data structures within the ontology and can take the form of a direct attribute 330a and a computed value attribute 330b. A direct attribute 330a is an attribute of an entity type that can be found directly within a data set (e.g., for a data set that comprises a table of salespeople within a company where the salespeople are identified in rows and where the columns comprise data values for information such as the sales and sales territory for each salesperson, the attribute “sales” would be a direct attribute of the salesperson entity type because sales data values can be found directly within the data set). A computed value attribute 330b is an attribute of an entity type that must be derived in some fashion from the data set. Continuing with the example above, a computed value attribute for the salesperson entity type might be a percentage of the company's overall sales that were made by the salesperson. This information is not directly present in the data set but instead must be computed from data within the data set (e.g., by summing the sales for all salespeople in the table and computing the percentage of the overall sales made by an individual salesperson).

Both the direct attributes 330a and computed value attributes 330b can be associated with metadata such as a type 340 (e.g., currency, date, decimal, integer, percentage, string, etc.), and a name 342. However, computed value attributes 330b can also include metadata that specifies how the computed value attribute is computed (a computation specification 348). For example, if a computed value attribute 330b is an average value, the computation specification 348 can be a specification of the formula and parameters needed to compute this average value.
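The distinction between direct attributes and computed value attributes can be illustrated with the short Python sketch below. The class names, the `field` and `computation` members, and the use of a Python callable to stand in for the computation specification 348 are assumptions made for this example; the sample table and its column names are likewise hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


@dataclass
class DirectAttribute:
    name: str    # cf. name 342
    type: str    # cf. type 340, e.g. "currency"
    field: str   # column of the data set that holds the value directly

    def value(self, row: Row, rows: List[Row]) -> Any:
        return row[self.field]


@dataclass
class ComputedValueAttribute:
    name: str
    type: str
    computation: Callable[[Row, List[Row]], Any]  # stands in for specification 348

    def value(self, row: Row, rows: List[Row]) -> Any:
        return self.computation(row, rows)


sales = DirectAttribute("sales", "currency", field="sales")
share_of_sales = ComputedValueAttribute(
    "percentage of overall sales", "percentage",
    # Sum the sales of all salespeople, then take this salesperson's percentage.
    computation=lambda row, rows: 100.0 * row["sales"] / sum(r["sales"] for r in rows),
)

table = [{"name": "Ann", "sales": 300.0}, {"name": "Bob", "sales": 100.0}]
ann_sales = sales.value(table[0], table)
ann_share = share_of_sales.value(table[0], table)
```

Giving both kinds of attribute the same `value` interface lets downstream analytics request an attribute value without knowing whether it is read from the data set or derived from it.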

Each entity type 322 may also comprise one or more characterizations 332. For example, a characterization 332 of a “salesperson” might be a characterization of how well the salesperson has performed in terms of sales (e.g., a good performer, an average performer, a poor performer). Characterizations can be represented by their own data structures 332 within the ontology. A characterization 332 can include metadata such as a name 360 (e.g., sales performance). Also, each characterization 332 can include a specification of the qualifications 364 corresponding to the characterization. These qualifications 364 can specify one or more of the following: (1) one or more attributes 330 by which the characterization will be determined, (2) one or more operators 366 by which the characterization will be determined, and (3) one or more value(s) 368 by which the characterization will be determined. For example, a “good performer” characterization for a salesperson can be associated with a qualification that requires the sales for the salesperson to exceed a defined threshold. With such an example, the qualifications 364 can take the form of a specified attribute 330 of “sales”, an operator 366 of “greater than”, and a value 368 that equals the defined threshold (e.g., $100,000).
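A qualification of the attribute, operator, and value form described above might be evaluated as in the following Python sketch. The tuple layout, the `OPERATORS` table, and the `qualifies` function are hypothetical conveniences; the $100,000 threshold is the example value given above.

```python
import operator
from typing import Any, Dict, Tuple

# Hypothetical mapping from operator names (cf. operators 366) to Python functions.
OPERATORS = {
    "greater than": operator.gt,
    "less than": operator.lt,
    "equal to": operator.eq,
}

Qualification = Tuple[str, str, Any]  # (attribute 330, operator 366, value 368)

GOOD_PERFORMER: Qualification = ("sales", "greater than", 100_000)


def qualifies(entity: Dict[str, Any], qualification: Qualification) -> bool:
    """Check whether an entity's attribute value satisfies the qualification."""
    attribute, operator_name, threshold = qualification
    return OPERATORS[operator_name](entity[attribute], threshold)


is_good = qualifies({"name": "Ann", "sales": 120_000}, GOOD_PERFORMER)
```

Because the operator is "greater than", a salesperson whose sales exactly equal the threshold would not qualify under this sketch.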

Each entity type 322 may also comprise one or more relationships 334. Relationships 334 are a way of identifying that a relationship exists between different entity types and defining how those different entity types relate to each other. Relationships can be represented by their own data structures 334 within the ontology. A relationship 334 can include metadata such as the related entity type 350 with respect to the subject entity type 322. For example, a “salesperson” entity type can have a relationship with a “company” entity type to reflect that the salesperson entity type belongs to a company entity type.

The ontological objects (e.g., entity types 322, direct attributes 330a, computed value attributes 330b, characterizations 332, and relationships 334) may also comprise data that represents one or more expressions that can be used to control how the corresponding ontological objects are described in narrative text produced by the narrative generation system.

For example, the entity type 322 can be tied to one or more expressions 328. When the narrative generation process determines that the subject entity type needs to be described in narrative text, the system can access the expression(s) 328 associated with the subject entity type to determine how that entity type will be expressed in the narrative text. The expression(s) 328 can be a generic expression for the entity type 322 (e.g., the name 326 for the entity type, such as the name “salesperson” for a salesperson entity type), but it should be understood that the expression(s) 328 may also or alternatively include alternate generic names (e.g., “sales associate”) and specific expressions. By way of example, a specific expression for the salesperson entity type might be the name of a salesperson. Thus, a narrative text that describes how well a specific salesperson performed can identify the salesperson by his or her name rather than the more general “salesperson”. To accomplish this, the expression 328 for the salesperson can be specified indirectly via a reference to a data field in a data set (e.g., if the data set comprises a table that lists sales data for various salespeople, the expression 328 can identify a column in the table that identifies each salesperson's name). The expression(s) 328 can also define how the subject entity type will be expressed when referring to the subject entity type as a singular noun, as a plural noun, and as a pronoun.

The expression(s) 346 for the direct attributes 330a and computed value attributes 330b can take a similar form as and operate in a manner similar to the expression(s) for the entity types 322; likewise for the expression(s) 362 tied to characterizations 332 (although it is expected that the expressions 362 will often include adjectives and/or adverbs in order to better express the characterization 332 corresponding to the subject entity type 322). The expression(s) 352 for relationships 334 can describe the nature of the relationship between the related entity types so that this relationship can be accurately expressed in narrative text if necessary. The expressions 352 can typically take forms such as “within” (e.g., a “city” entity type within a “state” entity type), “belongs to” (e.g., a “house” entity type that belongs to a “person” entity type), “is employed by” (e.g., a “salesperson” entity type who is employed by a “company” entity type), etc.

Another ontological object can be a timeframe 344. In the example of FIG. 3B, timeframes 344 can be tied to direct attributes 330a and/or computed value attributes 330b. A direct attribute 330a and/or a computed value attribute 330b can either be time-independent or time-dependent. A timeframe 344 can define the time-dependent nature of a time-dependent attribute. An example of a time-dependent attribute would be sales by a salesperson with respect to a data set that identifies each salesperson's sales during each month of the year. The timeframe 344 may comprise a timeframe type 356 (e.g., year, month, quarter, hour, etc.) and one or more expression(s) 358 that control how the subject timeframe would be described in resultant narrative text. Thus, via the timeframe 344, a user can specify a timeframe parameter in a communication goal statement that can be used, in combination with the ontology 320, to define a specific subset of data within a data set for consideration. While the example of FIG. 3B shows timeframes 344 being tied to direct attributes 330a and computed value attributes 330b, it should be understood that a practitioner might choose to make timeframes 344 only attachable to direct attributes 330a. Also, a practitioner might choose to make timeframes 344 also applicable to other ontological objects, such as characterizations 332, entity types 322, and/or even relationships 334.

As indicated in connection with FIG. 3A, users can create and update the ontology 320 while composing communication goal statements. An example embodiment for such an ability to simultaneously compose communication goal statements and build/update an ontology is shown by FIG. 3C. At step 370, the system receives a text string entry from a user (e.g., through a text entry box in a user interface (UI)). As indicated, this text entry can be a natural language text entry to facilitate ease of use by users. Alternative user interface models such as drag-and-drop graphical user interfaces or structured fill-in-the-blank templates could also be used for this purpose.

At step 372, the processor attempts to match the received text string to a base communication goal statement that is a member of a base communication goal statement library 504 (see FIG. 4A). This matching process can be a character-based matching process where the processor seeks to find a match on an ongoing basis as the user types the text string. Thus, as a user types the string “Comp”, the processor may be able to match the text entry to the “Compare the Value to the Other Value” base communication goal statement. Based on this matching, the system can auto-fill or auto-suggest a base communication goal statement that matches up with the received text entry (step 374). At this point, the system can use the base communication goal statement as a framework for guiding the user to complete the parameterization of the communication goal statement.
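
One simple way to realize such character-based matching is a case-insensitive prefix match that is re-run on each keystroke, as sketched below in Python. This is only an illustration under that assumption; the function name `match_base_statements` and every library entry other than “Compare the Value to the Other Value” are hypothetical.

```python
# Hypothetical library of base communication goal statements. Only the
# "Compare ..." entry is quoted from the description; the rest are placeholders.
BASE_GOAL_STATEMENTS = [
    "Present the Value",
    "Compare the Value to the Other Value",
    "Callout the Entity",
]


def match_base_statements(typed: str, library=BASE_GOAL_STATEMENTS) -> list:
    """Return library entries that start with the text typed so far."""
    prefix = typed.strip().lower()
    if not prefix:
        return []  # nothing typed yet, so nothing to suggest
    return [s for s in library if s.lower().startswith(prefix)]


suggestions = match_base_statements("Comp")
```

With the sample library, typing “Comp” yields the single suggestion “Compare the Value to the Other Value”, which the system could then auto-fill or auto-suggest.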

At step 376, the system continues to receive text string entry from the user. At step 378, the processor attempts to match the text string entry to an object in ontology 320. If there is a match (or multiple matches), the system can present a list of matching ontological objects for user selection (step 380). In this fashion, the system can guide the user to define parameters for the communication goal statement in terms of objects known within ontology 320. However, if the text string does not match any ontological objects, the system can provide the user with an ability to create a new object for inclusion in the ontology (steps 382-384). At step 382, the system provides the user with one or more UIs through which the user creates object(s) for inclusion in ontology 320 (e.g., defining an entity type, attribute, characterization, relationship, and/or timeframe). At step 384, the system receives the user input through the UI(s) that define the ontological objects. The ontology can thus be updated at step 308 in view of the text string entered by a user that defines a parameter for the communication goal statement.

If step 310 results in a determination that the communication goal statement has not been completed, the process flow returns to step 376 as the user continues entering text. Otherwise, the process flow concludes after step 310 if the communication goal statement has been fully parameterized (see FIG. 4B for examples of parameterized communication goal statements).

Through the use of composable communication goal statements and ontology 320, example embodiments are capable of generating a robust array of narrative stories about data sets that satisfy flexibly-defined communication goals without requiring a user to directly author any program code. That is, a user need not have any knowledge of programming languages and does not need to write any executable code (such as source code) in order to control how the narrative generation platform automatically generates narrative stories about data sets. To the extent that any program code is manipulated as a result of the user's actions, such manipulation is done indirectly as a result of the user's higher level compositions and selections through a front end presentation layer that are distinct from authoring or directly editing program code. Communication goal statements can be composed via an interface that presents them in natural language as disclosed herein, and ontologies can similarly be created using intuitive user interfaces that do not require direct code writing.

FIG. 3D illustrates this aspect of the innovative design. In an example embodiment, communication goal statements 390 (e.g., 390₁ and 390₂) are composed by a user using an interface that presents the base goal elements as natural language text where one or more words represent the goal operators and one or more words serve to represent the parameters as discussed above. These parameters, in turn, map into ontology 320 and thus provide the constraints necessary for the narrative generation platform to appropriately determine how to analyze a data set and generate the desired narrative text about the data set (described in greater detail below). Hidden from the user are code-level details. For example, a computed value attribute (such as 330b_n) is associated with parameterized computational logic 394 that will be executed to compute its corresponding computed value attribute. Thus, if the computed value attribute 330b_n is an average value of a set of data values, the computational logic 394 can be configured to (1) receive a specification of the data values as input parameters, (2) apply these data values to a programmed formula that computes an average value, and (3) return the computed average value as the average value attribute for use by the narrative generation platform. As another example, computational logic 392 and 396 can be configured to test qualifications for corresponding characterizations 332₁ and 332₂ respectively. The data needed to test the defined qualifications can be passed into the computational logic as input parameters, and the computational logic can perform the defined qualification tests and return an identification of the determined characterization for use by the narrative generation platform. Similar computational logic structures can leverage parameterization and the ontology 320 to perform other computations that are needed by the narrative generation platform.

The inventors also disclose that the ontology 320 can be re-used and shared to generate narrative stories for a wide array of users. For example, an ontology 320 can be built that supports generation of narrative stories about the performance of retail businesses. This ontology can be re-used and shared with multiple users (e.g., users who may have a need to generate performance reports for different retail businesses). Accordingly, as ontologies 320 are created for different domains, the inventors envision that technical value exists in maintaining a library of ontologies 320 that can be selectively used, re-used, and shared by multiple parties across several domains to support robust narrative story generation in accordance with user-defined communication goals.

Example Narrative Generation Architecture Using Composed Communication Goal Statements:

FIG. 5 depicts a narrative generation platform in accordance with an example embodiment. An example embodiment of the narrative generation platform can include two artificial intelligence (AI) components. A first AI component 502 can be configured to determine the content that should be expressed in a narrative story based on a communication goal statement (which can be referred to as “what to say” AI 502). A second AI component 504 can be configured to perform natural language generation (NLG) on the output of the first AI component 502 to produce the narrative story that satisfies the communication goal statement (where the AI component 504 can be referred to as “how to say it” AI 504).

The platform can also include a front end presentation layer 570 through which user inputs 572 are received to define the composed communication goal statement 390. This presentation layer 570 can be configured to allow user composition of the communication goal statement 390 using natural language inputs. As mentioned herein, it can also employ structured menus and/or drag/drop features for selecting elements of a communication goal statement. Examples of various user interfaces that can be used by the presentation layer 570 are shown in Appendix A. As can be seen from these sample UIs, the presentation layer 570 can also leverage the ontology 320 and source data 540 to facilitate its user interactions.

The “what to say” AI 502 can be comprised of computer-executable code resident on a non-transitory computer-readable storage medium such as computer memory. The computer memory may be distributed across multiple memory devices. One or more processors execute the computer code in cooperation with the computer memory. AI 502 operates on a composed communication goal statement 390 and ontology 320 to generate a computed story outline 528.

AI 502 includes a communication goal statement interpreter 506, which is configured to process and interpret the communication goal statement 390 to select a set of narrative analytics that are to be used to analyze a data set about which the narrative story will be generated. The computer memory may include a library 508 of narrative analytics 510 (e.g., 510₁, 510₂, 510₃, . . . ). The narrative analytics 510 may take the form of parameterized computer code that performs analytical operations on the data set in order to facilitate a determination as to what content should be included in the narrative story so that the communication goal(s) corresponding to the communication goal statement 390 are satisfied. Examples of narrative analytics 510 can be the computational logic 392, 394, and 396 shown in FIG. 3D.

AI 502 can maintain a mapping that associates the various operators that may be present in communication goal statements (e.g., “Present”, “Compare”, etc.) to a sequence or set of narrative analytics that are to be performed on data in order to support the data analysis needed by the platform to generate narrative stories that satisfy the communication goal statement 390. Thus, the “Compare” operator can be associated with a set of narrative analytics that do simple difference (a−b), absolute difference (abs(a−b)), or percent difference ((b−a)/b). In an example embodiment, the mapping can also be based on the parameters that are included in the communication goal statement 390. The mapping can take the form of a data structure (such as a table) that associates operators (and possibly also parameters) with sets of narrative analytics 510 from library 508. Interpreter 506 can then read and interpret the communication goal statement 390 to identify the operator included in the communication goal statement, access the mapping data structure to map the identified operator to its corresponding set of narrative analytics 510, and select the mapped narrative analytics. These selected narrative analytics 512 in turn drive downstream operations in AI 502.
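
The operator-to-analytics mapping can be pictured as a lookup table, as in the following minimal Python sketch. The three formulas are those given above; the names `ANALYTICS_BY_OPERATOR` and `select_analytics`, and the assumption that the operator is the first word of the statement, are illustrative only.

```python
def simple_difference(a: float, b: float) -> float:
    return a - b


def absolute_difference(a: float, b: float) -> float:
    return abs(a - b)


def percent_difference(a: float, b: float) -> float:
    return (b - a) / b   # formula as stated in the description


# Mapping data structure: operator -> set of narrative analytics.
ANALYTICS_BY_OPERATOR = {
    "Compare": [simple_difference, absolute_difference, percent_difference],
}


def select_analytics(statement: str) -> list:
    """Identify the operator (assumed to be the first word) and map it."""
    operator_word = statement.split()[0]
    return ANALYTICS_BY_OPERATOR.get(operator_word, [])


selected = select_analytics("Compare the Sales of the Salesperson to the Benchmark")
results = {fn.__name__: fn(80.0, 100.0) for fn in selected}
```

Here `results` holds one value per selected analytic for the hypothetical inputs a = 80 and b = 100. An operator with no entry in the table yields an empty selection in this sketch.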

AI 502 can also include computer code 516 that is configured to determine the data requirements that are needed by the system to generate a narrative story in view of the selected narrative analytics 512 and the parameters that are included in the communication goal statement 390. This code 516 can walk through the selected narrative analytics 512, the communication goal statement 390, and ontology 320 to identify any parameters and data values that are needed during execution of the selected narrative analytics 512. For example, the communication goal statement 390 may include parameters that recite a characterization of an entity. Computer code 516 can identify this characterization in the communication goal statement and access the ontology 320 to identify the data needed to evaluate the characterization of the subject entity such as the attribute(s) 330 and value(s) 368 needed for the subject characterization 332 in ontology 320. The ontology 320 can then be further parsed to determine the data requirements for the subject attribute(s) needed by the subject characterization 332, and so on until all data requirements for the communication goal statement 390 and selected narrative analytics 512 are determined. This ultimately yields a set of data requirements 518 that define the data needed by AI 502 in order to support the data analysis used to determine the content to be expressed in the narrative story. In situations where the input to AI 502 comprises multiple communication goal statements 390 in a story outline, code 516 can be configured to walk through the outline to assemble a list of the data requirements for all of the communication goal statements in the outline.
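
The recursive nature of this walk can be illustrated with the Python sketch below, which resolves a characterization to the attributes it depends on and then resolves computed attributes down to direct attributes. The dictionary layout of `ONTOLOGY` and the function name `data_requirements` are assumptions made purely for illustration.

```python
# Hypothetical, simplified ontology. A characterization lists the attributes it
# tests; an attribute lists the attributes it is computed from (empty = direct).
ONTOLOGY = {
    "characterizations": {
        "good performer": {"attributes": ["percent of total sales"]},
    },
    "attributes": {
        "sales": {"inputs": []},
        "percent of total sales": {"inputs": ["sales"]},
    },
}


def data_requirements(parameter: str, ontology=ONTOLOGY, found=None) -> list:
    """Walk the ontology until only direct attributes remain."""
    found = [] if found is None else found
    if parameter in ontology["characterizations"]:
        for attr in ontology["characterizations"][parameter]["attributes"]:
            data_requirements(attr, ontology, found)
    elif parameter in ontology["attributes"]:
        inputs = ontology["attributes"][parameter]["inputs"]
        if not inputs and parameter not in found:
            found.append(parameter)   # direct attribute: a data requirement
        for dependency in inputs:
            data_requirements(dependency, ontology, found)
    else:
        raise KeyError(f"{parameter!r} is not in the ontology")
    return found


requirements = data_requirements("good performer")
```

In this sketch the “good performer” characterization ultimately requires only the direct “sales” attribute, since the percentage it tests is itself computed from sales.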

Once the data requirements 518 have been determined, the AI 502 can execute computer code 522 that maps those data requirements 518 to source data 540. (This can be done either in a “batch” model wherein all the data requirements are determined first, and then the code to map those to source data is executed; or it can be done individually for each data requirement either as needed or as the other information necessary to make the determination becomes available.) The source data 540 serves as the data set from which the narrative story will be generated. Source data 540 can take the form of data in a database, data in spreadsheet files, or other structured data accessible to AI 502. Computer code 522 can use a data structure 520 (such as a table) that associates parameters from the data requirements to parameters in the source data to perform this mapping. For example, consider a scenario where the communication goal statement is “Present the Sales of the Salesperson”. The data requirements 518 for this communication goal statement may include a parameter that corresponds to the “sales” attribute of a salesperson. The source data 540 may include a data table where a column labeled as “Amount Sold ($)” identifies the sales amount for each salesperson in a company. The parameter mapping data structure 520 can associate the “Sales” parameter from the data requirements 518 to the “Amount Sold ($)” column in the source data 540 so that AI 502 accesses the proper data. This parameter mapping data structure 520 can be defined by an author when setting up the system, as discussed hereinafter. The output of computer code 522 can be a set of mapped source data 524 for use by the selected narrative analytics 512.
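
A minimal Python sketch of such a parameter mapping follows. The “Amount Sold ($)” column label comes from the example above; the “Salesperson” column, the sample rows, and the function name `map_requirements` are hypothetical.

```python
# Hypothetical source data: a table with author-facing column labels.
SOURCE_DATA = [
    {"Salesperson": "John Smith", "Amount Sold ($)": 15000},
    {"Salesperson": "Jane Doe", "Amount Sold ($)": 22000},
]

# Parameter mapping data structure: data requirement -> source column.
PARAMETER_MAPPING = {
    "name": "Salesperson",
    "sales": "Amount Sold ($)",
}


def map_requirements(requirements: list, mapping: dict, source_rows: list) -> list:
    """Re-key each source row by data requirement name."""
    unmapped = [r for r in requirements if r not in mapping]
    if unmapped:
        raise KeyError(f"no source mapping for: {unmapped}")
    return [{req: row[mapping[req]] for req in requirements} for row in source_rows]


mapped_source_data = map_requirements(["name", "sales"], PARAMETER_MAPPING, SOURCE_DATA)
```

After mapping, downstream analytics in this sketch can refer to `sales` without knowing that the underlying column is labeled “Amount Sold ($)”.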

Computer code 522 can also map data requirements to source data using story variable(s) 542. For example, the communication goal statement 390 might be “Compare the Sales of Salesperson “John Smith” to the Benchmark of the Salesperson”. The mapped source data 524 can identify where in the source data the sales and benchmark for salespeople can be found. If the source data 540 includes sales data for multiple salespeople (e.g., rows in a data table correspond to different salespeople while columns in the data table correspond to sales amounts and benchmarks for salespeople), the selection of a particular salesperson can be left as a story variable 542 such that the parameter mapping data structure 520 does not identify which specific row to use as the salesperson and instead identifies the salesperson data requirement as a story variable. When a user composes the communication goal statement such that “John Smith” is expressed in the statement where the salesperson parameter is located, the computer code 522 can use “John Smith” in the communication goal statement 390 as the story variable 542 that governs the selection of which row of source data 540 should be used. Similarly, the benchmark parameter might be expressed as a story variable 542. For example, the source data 540 may not include a benchmark field, but the composed communication goal statement might express a number to be used as the benchmark. In such a situation, this number could be a story variable 542 used by the system.
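
The role of story variables in row selection and in supplying a missing benchmark can be sketched in Python as follows. The function name `resolve_with_story_variables`, the dictionary keys, and the sample values are hypothetical.

```python
# Hypothetical mapped source data; note that no benchmark column is present.
MAPPED_ROWS = [
    {"name": "John Smith", "sales": 15000},
    {"name": "Jane Doe", "sales": 22000},
]


def resolve_with_story_variables(rows: list, story_variables: dict) -> dict:
    """Pick the row named by the 'salesperson' story variable and overlay
    any other values (e.g. a benchmark) supplied as story variables."""
    who = story_variables["salesperson"]
    matches = [row for row in rows if row["name"] == who]
    if not matches:
        raise LookupError(f"no source row for salesperson {who!r}")
    resolved = dict(matches[0])
    if "benchmark" in story_variables:
        resolved["benchmark"] = story_variables["benchmark"]
    return resolved


resolved = resolve_with_story_variables(
    MAPPED_ROWS, {"salesperson": "John Smith", "benchmark": 12000}
)
```

In this sketch the salesperson name taken from the composed statement governs which row is used, and the benchmark number expressed in the statement fills in a value that the source data lacks.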

FIGS. 46 and 225-237, described below with reference to Appendix A, depict example GUIs through which a user can map the determined data requirements for a story outline to source data and story variables. These GUIs can be configured to list each data requirement in association with a user input mechanism through which the user can identify where in the source data a data requirement can be found (and whether a data requirement is to be parameterized as a story variable). As explained in Appendix A with respect to an example embodiment, the source data can take a number of forms, such as tabular data and document-based data, and the data requirements GUIs can be configured to accommodate both types. FIGS. 238-255 and their supporting description in Appendix A further describe how source data can be managed in an example embodiment of the system.

AI 502 can also include computer code 526 that executes the selected narrative analytics 512 using the mapped source data 524 (and potentially any story variable(s) 542) to produce a computed story outline 528. The narrative analytics 512 specify at least four components: the input parameters (e.g., an entity to be ranked, a metric it is to be ranked by, and a group in which it is to be ranked); the code that will execute the narrative analytics (i.e., that will determine the rank of the entity in the group according to the metric); the output parameters (i.e., the rank of the entity); and a statement form containing the appropriate input and output parameters that will form the appropriate statement for inclusion in the computed outline (in this case, rank (entity, metric, group, rankvalue)).

The communication goal statement 390 can be associated with a general story outline that provides the basic structure for the narrative story to be generated. However, this general story outline will not be populated with any specific data, only general identifications of parameters. Through execution of the selected narrative analytics by computer code 526, this general story outline can be populated with specific data in the form of the computed story outline 528. For example, continuing with an example from above where the communication goal statement 390 is “Compare the Sales of Salesperson “John Smith” to the Benchmark of the Salesperson”, the selected narrative analytics may include parameterized code that computes data indicative of the difference between John Smith's sales amount and the benchmark in both absolute terms (e.g., performing a subtraction between the sales amount and the benchmark) and as a percentage (e.g., dividing the subtracted difference by the benchmark and multiplying by 100). Code 526 executes these narrative analytics to compute data values for use in the story outline. These data values are then embedded as values for the parameters in the appropriate statement forms associated with the narrative analytics to produce statements for inclusion in the computed outline. The statement will be included in the computed outline as a new element of the section containing the communication goal for which it was computed, under the node representing that communication goal. Code 526 will progress through the execution of the selected narrative analytics using mapped source data 524 and story variable(s) 542 (if any) until all elements of the story outline have been populated with statements.

Also associated with communication goals are characterizations that serve to express a characterization or editorialization of the facts reported in the statements in a manner that may have more narrative impact than just a reporting of the facts themselves. For example, rather than saying that an entity is ranked first, we might say that it is the best. (In another approach, these might be associated with sections rather than communication goals.) The characterizations associated with each communication goal are assessed with respect to the statements generated by the narrative analytics in response to that goal. This results in generating additional propositions or statements corresponding to those characterizations for inclusion in the computed outline in those cases when the conditions for those characterizations are met by the input statements. The characterizations are also linked to the statements which they characterize. The result of this process is a computed story outline 528 that serves to identify the content that is to be expressed in the narrative story.
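
To make the four components of a narrative analytic concrete, the following Python sketch implements the ranking example and a “best” characterization that is assessed against the resulting statement. Representing statements as tuples, and the names `rank_analytic` and `characterize`, are assumptions made for illustration; ties are not handled.

```python
# Hypothetical group of entities with a "sales" metric.
GROUP = [
    {"name": "John Smith", "sales": 15000},
    {"name": "Jane Doe", "sales": 22000},
    {"name": "Raj Patel", "sales": 9000},
]


def rank_analytic(entity: str, metric: str, group_name: str, group: list) -> tuple:
    """Inputs: entity, metric, group. Output: the rank value, embedded in
    the statement form rank(entity, metric, group, rankvalue)."""
    ordered = sorted(group, key=lambda member: member[metric], reverse=True)
    names = [member["name"] for member in ordered]
    rank_value = names.index(entity) + 1
    return ("rank", entity, metric, group_name, rank_value)


def characterize(statement: tuple):
    """Emit an additional 'best' statement when the rank statement qualifies."""
    kind, entity, metric, group_name, rank_value = statement
    if kind == "rank" and rank_value == 1:
        return ("best", entity, metric, group_name)
    return None


jane = rank_analytic("Jane Doe", "sales", "salespeople", GROUP)
john = rank_analytic("John Smith", "sales", "salespeople", GROUP)
computed_outline = [s for s in (jane, characterize(jane), john, characterize(john)) if s]
```

In this sketch the top-ranked entity contributes both a rank statement and a linked “best” statement to the computed outline, while the second-ranked entity contributes only its rank statement.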

The “how to say it” AI 504 can be comprised of computer-executable code resident on a non-transitory computer-readable storage medium such as computer memory. The computer memory may be distributed across multiple memory devices. One or more processors execute the computer code in cooperation with the computer memory. AI 504 employs NLG logic 530 to generate a narrative story 550 from the computed story outline 528 and ontology 320. As indicated above, objects in ontology 320 can be associated with expressions (e.g., expressions 328, 346, 352, 358, and 362) that can be used by NLG 530 to facilitate decision-making regarding the appropriate manner of expressing the content in the computed story outline 528. Thus, NLG 530 can access the ontology 320 when forming sentences from the computed story outline 528 for use in the narrative story 550. Example embodiments of NLG 530 are discussed below with reference to FIGS. 6D and 8A-H.

Once again, by leveraging predefined sets of parameterized narrative analytics 510, AI 502 is able to shield the low level program coding from users so that a user need only focus on composing communication goal statements 390 in a natural language in order to determine the content that is to be included in a narrative story. Further still, AI 504 also operates transparently to users so that a narrative story 550 can be generated from a composed communication goal statement 390 without requiring the user to directly write or edit program code.

Example Platform Operation:

FIG. 6A depicts a high level view of an example embodiment of a platform in accordance with the design of FIG. 5. The narrative generation can proceed through three basic stages: setup (an example of which is shown by FIG. 6B), analysis (an example of which is shown by FIG. 6C), and NLG (an example of which is shown by FIG. 6D). The operation of the FIG. 6A embodiment can be described in the context of a simple example where the project has an outline with a single section and a single communication goal statement in that section. The communication goal statement can be “Present the sales of the salesperson”. In this example, “salesperson” is an entity type in the ontology and it has an attribute of “sales”. Also, the project has a single data view backed by a static file that contains the names and sales data for the salespeople.

During setup, the system loads the story configuration from a configuration store. The configuration store is a database where configurations are maintained in persistent form, managed, and versioned. The configuration for a story includes items representing the outline (sections, communication goals, and their components), the ontology (entity types, relationships, timeframe types), and data connectors (sources, data mappings). Once the configuration for the story is loaded into memory, the story outline is constructed, as shown in FIG. 6B. The story outline is a hierarchical organization of sections and communication goals (see FIG. 2). At this time, along with constructing the story outline, the connectors to the data sources are initialized. These will be used as needed during the story generation process to access the data required by the narrative analytics specified in the outline. Specifically how this is accomplished can depend on whether the data is passed in via an API, in a static file managed by the system, or via a connection to a database.

Once the setup phase is complete, the outline can be used to govern the generation of a story. This is accomplished by traversing the outline and executing the analytics associated with each communication goal statement; the results serve to parameterize the associated statement forms of the communication goal in order to generate the facts of the story (see FIG. 6C). These facts are then organized into the computed outline as described above.

When this generation process is invoked by a client, e.g., via an API request, the client provides certain values for parameters of the configuration. In this instance, for example, the story is about the sales of some particular salesperson. So the client may need to provide a unique identifier for the specific salesperson which can be interpreted via the mapping provided between parameters of the story outline and the data source to be used.

As shown by FIG. 7, the narrative analytics can access source/customer data through Entity and Entity Collection objects. These objects provide an interface based on the project ontology 320 and hide the source of the data from other components. These objects can use Entity Types, mappings from relevant Attributes of the Entity Types to data sources and specifiers (e.g., columns or column names in tables or databases, or keypaths in documents, etc.) as previously specified by the user during configuration, and data interfaces to access the actual relevant data. Some computations that comprise aspects of the narrative analytics, such as sorting and certain aggregations, can be handled by the data stores themselves (e.g., as database operations). The specific Entity objects provide methods to invoke these external operations, such as parameterizable database queries.

Continuing with the example, the single communication goal statement in this case, “Present the Sales of the Salesperson”, is made up of two base communication goal statements, composed together by embedding one inside the other. The top level statement is AttributeOfEntity (AttributeName, <Entity>), and its Entity parameter is satisfied by the embedded statement EntityById(Id). EntityById is resolved first. This is computed by retrieving the entity's ID as provided by the client when invoking the generation process, e.g., via an API request. EntityById creates an (internal) Entity object corresponding to the (external) ID and returns that Entity object as its result. This internal Entity object is a new Entity of the appropriate Entity Type as specified in the configuration and with appropriate attributes as determined by the entity data mapping. In this instance, since we are talking about a Salesperson, these are relevant attributes of the Salesperson in question such as his or her name, gender, sales, and office, or whatever else the configuration specifies be retrieved or computed. This result is in the form of the embedded communication goal statement, namely, EntityById(Id, <Entity>); it is then, in turn, passed into the top-level AttributeOfEntity statement along with the attribute name “sales”. The AttributeOfEntity analytic comprises code that takes the entity object and returns the corresponding value for that attribute of the entity as its result. The analytic looks up where to get the attribute data based on the entity data mappings provided during configuration, and retrieves the specific relevant attribute data from the client's data. The results for both of these are wrapped up in statement forms to produce statements as described above, and these statements are then added to the Computed Outline. In this specific case, as mentioned above, the statements are composed by one being embedded inside the other. The resulting compound statement added to the Computed Outline in this instance, fully parameterized, would look something like the following: AttributeOfEntity (‘Sales’, EntityByID(1234, Salesperson1234), 15000).
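
The inside-out resolution of the composed statement can be sketched in Python as shown below. Representing statements as nested tuples, the contents of `CLIENT_DATA` and `ENTITY_DATA_MAPPING`, and the function names are all assumptions made for illustration.

```python
# Hypothetical client data keyed by external ID, plus an entity data mapping
# from attribute name to source column.
CLIENT_DATA = {1234: {"Name": "Pat Lee", "Amount Sold ($)": 15000}}
ENTITY_DATA_MAPPING = {"Sales": "Amount Sold ($)"}


def entity_by_id(entity_id: int) -> tuple:
    """Resolve the embedded statement: build an internal entity object for the
    external ID and wrap the result in the form EntityById(Id, <Entity>)."""
    entity = {
        "label": f"Salesperson{entity_id}",
        "record": CLIENT_DATA[entity_id],
    }
    statement = ("EntityById", entity_id, entity["label"])
    return entity, statement


def attribute_of_entity(attribute_name: str, resolved_entity: tuple) -> tuple:
    """Resolve the top-level statement using the already-resolved entity."""
    entity, inner_statement = resolved_entity
    column = ENTITY_DATA_MAPPING[attribute_name]
    value = entity["record"][column]
    return ("AttributeOfEntity", attribute_name, inner_statement, value)


compound = attribute_of_entity("Sales", entity_by_id(1234))
```

The resulting `compound` tuple mirrors the fully parameterized compound statement given above, with the inner entity statement embedded in the outer attribute statement.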

FIG. 6D shows a high level view of NLG being performed on a computed outline in order to generate a narrative story. FIGS. 8A-8H elaborate on this NLG process.

As shown by FIG. 8A, the NLG process starts with the Computed Outline. Each phase of the NLG process walks through the Computed Outline and processes each computed statement form individually. Some stages look across multiple statements at once, such as Model Muting (see FIG. 8B) and Entity Referencing (see FIG. 8F), described below.

The first phase, Model Generation, converts the compound statements in the computed outline into NLGModel graphs, as shown by FIG. 8A. Model graphs are similar to the compound statement structures, but are structured specifically for constructing sentences. For example, dependencies between nodes in the model graph will represent where dependent clauses should be placed on the sentence. An NLGModel provides a mechanism for generating sentences, phrases, and words needed to produce a story. There is a model type for each concept that needs to be expressed from authoring, mapping to each individual type of statement included in the computed outline. Examples include attributes, values, units, entities, relationships, rankings, filters, and comparisons. The models produced from the statements in the computed outline are organized into a graph based on how the ideas are related to each other. The shape of the graph provides a method for the NLG system to handle phrase muting, clause placement, anaphora, and connectives.

For example, the statement for AttributeOfEntity(‘Sales’, EntityByID(‘1234’, Salesperson1234), 15000) is converted into a model graph where the root is an EntityModel representing the Salesperson1234. The EntityModel has a dependent AttributeModel representing the Sales attribute since Sales is an attribute of that entity. The attribute Sales has a value of 15000, so a ValueModel representing 15000 is added as a dependent to the AttributeModel. Finally, the ValueModel has a UnitModel representing the type of value. In this case it is ‘dollars’. This model graph now provides the structure needed for the NLG system to construct a sentence for this statement. This was a simple example. The more complicated the statement, the more complicated the model graph will be. The system can also combine multiple statements into a single big model graph assuming they are related somehow, for example because each of them is about the same entity. This allows the system to express multiple sets of ideas in a single sentence. If the model graph is too big, i.e., there are too many ideas to express in one sentence, it is split up into reasonably sized subgraphs that make up individual sentences.
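The dependency chain described above (entity, then attribute, then value, then unit) can be pictured with a minimal sketch. The `Model` class and the helpers `build_graph` and `chain` are hypothetical names chosen for illustration; the actual NLGModel types are not reproduced here.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Model:
    kind: str   # e.g. "Entity", "Attribute", "Value", "Unit"
    label: str
    dependents: List["Model"] = field(default_factory=list)

    def add(self, child: "Model") -> "Model":
        """Attach a dependent model and return it for chaining."""
        self.dependents.append(child)
        return child


def build_graph(entity: str, attribute: str, value: int, unit: str) -> Model:
    root = Model("Entity", entity)                  # root of the model graph
    attr = root.add(Model("Attribute", attribute))  # attribute depends on entity
    val = attr.add(Model("Value", str(value)))      # value depends on attribute
    val.add(Model("Unit", unit))                    # unit depends on value
    return root


def chain(model: Optional[Model]) -> List[str]:
    """Flatten a single-dependent chain into 'Kind:label' strings."""
    out = []
    while model is not None:
        out.append(f"{model.kind}:{model.label}")
        model = model.dependents[0] if model.dependents else None
    return out


graph = build_graph("Salesperson1234", "Sales", 15000, "dollars")
print(chain(graph))
```

A real model graph would generally branch rather than form a single chain; the linear case is used here only because it matches the working example.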

After a model graph has been generated for each node, adjacent nodes are compared with each other to mute redundant facts. This can be referred to as Model Muting, as shown by FIG. 8B. Model Muting prevents redundant information from being expressed across sentences. Since the working example has only a single goal, there is only one node involved, and there will be nothing to mute in this phase with respect to the example. Suppose, though, that the goal also had a timeframe associated with it, so that it was instead “Present the sales in the month of the Sales Person”, and an adjacent goal was “Present the sales in the month of the top ranking Sales Person by sales”. Without muting, these goals would be expressed as “In August of 1993, Joe had sales of $15000. In August of 1993, Bob, the best seller, had sales of $430000”. The timeframe “In August of 1993” is redundant between these two sentences and will be dropped in the second sentence, resulting in language of “In August of 1993, Joe had sales of $15000. Bob, the best seller, had sales of $430000”.
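The timeframe muting just described can be approximated with a short sketch. It assumes, purely for illustration, that each node has already been reduced to a timeframe phrase and a sentence body; the function name `mute_timeframes` and this tuple representation are hypothetical and operate on strings rather than on model graphs.

```python
from typing import List, Optional, Tuple

# Each node: (timeframe phrase or None, remainder of the sentence).
Node = Tuple[Optional[str], str]


def mute_timeframes(nodes: List[Node]) -> List[str]:
    """Drop a timeframe phrase when it repeats the adjacent preceding node's."""
    sentences = []
    previous = None
    for timeframe, body in nodes:
        if timeframe is not None and timeframe != previous:
            sentences.append(f"{timeframe}, {body}")
        else:
            sentences.append(body)  # redundant (or absent) timeframe is muted
        previous = timeframe
    return sentences


nodes = [
    ("In August of 1993", "Joe had sales of $15000."),
    ("In August of 1993", "Bob, the best seller, had sales of $430000."),
]
print(" ".join(mute_timeframes(nodes)))
```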

Next, sentences are generated based on each model graph during Sentence Generation as shown by FIG. 8C. The base of the sentence is generated first. It is the core subject/verb/object constituents of a sentence. Initially this will not have expressed all of the models in the graph (those will be added later as clauses). Not all models in the graph can generate base sentences, but multiple models can add to the set of possible sentences for a node. Sentences almost always come from preferences set by the user in the ontology 320 through things like attribute expressions, rank expressions, and/or relationship expressions. The sentences generated in this phase will be built upon, and later one of these sentences will be picked to be used in the narrative story.

Continuing with the working example, only the Attribute model can generate sentences for this model graph. It will generate them based on the attribute expressions configured by the user for “sales”. Let's suppose the user configured three options: “the salesperson had sales of $100”, “the salesperson sells $100”, and “the salesperson's sales are $100”. The Attribute model would generate three sentences, one for each of these options.

After the base sentences have been generated, the models not expressed in that base sentence must then be expressed as clauses on the sentence. This can be referred to as Clause Placement (see FIG. 8D). Depending on where the unexpressed models are in the model graph, they will be placed as phrases on the sentence attached to the noun representing the model in the graph they are dependents of. This is done for each sentence from the list of sentences produced by the sentence generation phase. Clauses are generated similarly to how sentences were generated in the previous phase based on the user's expression preferences within the ontology.

In our example, there are no extra models that need to be added as clauses. However, to illustrate how the clause placement phase would work, let's say that the goal was actually “Present the sales of the salesperson working in the city.” A sentence from the Relationship model would be “Sally sells in Chicago.” This leaves the Attribute/Value/Unit models still needing to be expressed. The Attribute model can produce clauses for these. Based on the attribute expression configuration, it would generate clauses of “who has sales of $1000” or “who has sold $1000”. These would be added as a relative clause to “Sally” giving a complete sentence of “Sally, who has sales of $1000, sells in Chicago” (as one of the sentences among the several available permutations).

The next phase is Sentence Selection (see FIG. 8E). At this point, complete sentences have been built, and the system needs to pick one for use in the narrative story. The Sentence Selection phase can take into consideration several factors when selecting sentences. For example, the selected sentence should (1) correctly convey the intent of the goal, (2) only express what is necessary, and (3) prefer patterns that generally sound better. With these criteria, the system will likely still be left with more than one valid sentence. At this point, the system can choose from the remaining sentences that provide the best variability of expression. In an example embodiment, with all factors being equal, the system can randomly select a sentence from among the qualifying sentences. In our example, based on the goal, all three sentences are equally valid, so the system will randomly choose one to include in the final story. At the conclusion of the Sentence Selection phase, a sentence will have been selected for each node in the outline.

At this point, the system seeks to improve fluidity by looking across the nodes in the outline. At this stage, referred to as Entity Referencing (see FIG. 8F), entities that are repeated across nodes in the same section will be replaced with pronouns. The pronoun used will depend on the type of entity being replaced. If the base entity type is a Person and gender is available, the system will use gendered pronouns (e.g., he/she); otherwise, it will use a non-gendered pronoun (e.g., they).

In our example, since there is only a single goal, there would be no pronoun replacement. If instead there were two adjacent goals in the same section (e.g., “Present the sales of the salesperson” and “Present the title of the salesperson”), a pronoun would be used for the second sentence, resulting in the language “Sally had sales of $10000. She had the title VP of Sales.”
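The pronoun rule described for Entity Referencing can be sketched as follows. The helper names `pick_pronoun` and `reference_entities` are hypothetical, and the sketch makes the simplifying assumptions that sentences are plain strings and that the repeated entity appears at the start of the later sentence.

```python
from typing import List, Optional


def pick_pronoun(base_type: str, gender: Optional[str] = None) -> str:
    # Gendered pronoun only for a person whose gender is available;
    # otherwise fall back to a non-gendered pronoun.
    if base_type == "person" and gender == "female":
        return "She"
    if base_type == "person" and gender == "male":
        return "He"
    return "They"


def reference_entities(sentences: List[str], name: str,
                       base_type: str, gender: Optional[str] = None) -> List[str]:
    """Replace a leading repeat of `name` with a pronoun after its first mention."""
    pronoun = pick_pronoun(base_type, gender)
    result, seen = [], False
    for sentence in sentences:
        if seen and sentence.startswith(name + " "):
            sentence = pronoun + sentence[len(name):]
        elif name in sentence:
            seen = True
        result.append(sentence)
    return result


section = ["Sally had sales of $10000.", "Sally had the title VP of Sales."]
print(" ".join(reference_entities(section, "Sally", "person", "female")))
```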

At this point, the sentences have been finalized. The next thing to do is ensure that the sentences are grammatically correct. This phase can be referred to as Realization (see FIG. 8G). To perform realization, the system adds articles (definite “the” and indefinite “a/an”), conjugates verbs, and adds punctuation. After realization, the system has the final language for use in the story.

Wrapping up the example, the realized sentence ends up being “Sally has sales of $10,000.” To get to that, the verb “has” was conjugated into present tense because of the lack of a timeframe. The system can be configured to assume the timeframe is “now” in cases where no timeframe is specified in the communication goal statement. Also, the Realization phase inspects “sales” and determines that it is plural, so an indefinite article is not needed. Finally, “Sally” is determined to be a proper noun (a name), which accordingly means that a definite article is not needed before “Sally”.

As a last step, which can be referred to as Document Generation (see FIG. 8H), the system puts the realized language into a formatted document. Examples of suitable formats can include HTML, Microsoft Word documents, and JSON. The system returns the formatted document to the client.

Ontology Building:

FIGS. 9-13 depict example process flows that show how the ontology 320 can be built in response to user input, including user input during the process of composing communication goal statements. Appendix A included herewith is a user guide for an example narrative generation platform, where the user guide shows examples of GUI screens that demonstrate how the ontology 320 can be built in response to user input.

FIG. 9 depicts an example process flow for parameterizing a value in a communication goal statement, which relates to the attribute objects in the ontology 320. It should be understood that the order of many of the steps in this process flow could be changed if desired by a practitioner. At step 900, the processor determines in response to user input whether a new attribute should be created for the value to be parameterized or whether an existing attribute should be used. Appendix A depicts example GUI screens that can assist the user as part of this process (see, e.g., FIG. 164 et seq.). If an existing attribute is to be used, the system can access the ontology 320 to provide the user with a list of attributes available for selection by the user. The user can select an existing attribute from this list (step 918). The system can also use string matching technology to match any characters entered by a user through the GUI to existing attributes in the ontology 320. Upon detecting a match or partial match, the system can then suggest an existing attribute for selection.

If a new attribute is to be created for the value, the process flow proceeds to step 902. At step 902, the process flow makes a decision as to whether the new attribute should be a direct attribute or a computed value attribute.

If a direct attribute is to be created, the process flow proceeds to step 904. At step 904, the processor defines a label for the attribute in response to user input. This label can serve as the name for the attribute (e.g., “sales”; see FIG. 59). Next, at step 906, the processor defines a base type for the attribute in response to user input. Examples of base types for attributes can include currency, date, decimal, integer, percentage, and string. FIG. 60 shows an example GUI screen through which a user can set the type for the subject attribute.

Next, at step 908, the processor defines the expression(s) that are to be associated with the subject attribute. Through specification of one or more expressions for the subject attribute, the user can provide the system with a number of options for expressing the attribute in words when rendering a narrative story.

At step 910, the processor selects the entity type for the subject attribute in response to user input. FIGS. 61-66 show example GUI screens for step 910. Step 910 is further elaborated upon with reference to FIG. 11 discussed below.

If step 902 results in a determination that a computed value attribute is to be created, the process flow proceeds to step 912 from step 902. At step 912, the system presents the user with a choice of making the computed value attribute a function or an aggregation. If a function is selected at step 912, the process flow proceeds to step 914 where the processor sets the computed value attribute according to the user-selected function. If an aggregation is selected at step 912, the process flow proceeds to step 916 where the processor sets the computed value attribute according to the user-selected aggregation. Examples of available aggregations can include count, max, mean, median, min, range, and total. These aggregations can be associated with corresponding parameterized computational logic (see FIG. 3D) that is programmed to compute the desired aggregation. An example of an available function is a contribution function, which evaluates how much a component contributes to an aggregate. However, it should be understood that other functions can be available through the system. For example, additional functions could include a multiplication, a division, a subtraction, standard deviation, a first derivative, and a second derivative. FIGS. 171-172, described in greater detail below in Appendix A, illustrate some example GUI screens through which a user can define computed value attributes.
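As a rough illustration of how the named aggregations and the contribution function might map onto computational logic, consider the sketch below. The dictionary `AGGREGATIONS` and the function `contribution` are hypothetical names, and the contribution is assumed here to be measured against a total; the actual parameterized computational logic of FIG. 3D is not reproduced.

```python
import statistics
from typing import Callable, Dict, Sequence

# Hypothetical mapping from aggregation names to computational logic.
AGGREGATIONS: Dict[str, Callable[[Sequence[float]], float]] = {
    "count": len,
    "max": max,
    "mean": statistics.mean,
    "median": statistics.median,
    "min": min,
    "range": lambda values: max(values) - min(values),
    "total": sum,
}


def contribution(component: float, values: Sequence[float]) -> float:
    """Fraction of the aggregate (assumed here to be the total) due to one component."""
    return component / sum(values)


sales = [100, 200, 300, 400]  # illustrative values only
for name, aggregate in AGGREGATIONS.items():
    print(name, aggregate(sales))
print("contribution of 400:", contribution(400, sales))
```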

After the attribute has been defined via the process flow of FIG. 9, the ontology 320 can be updated by adding the details for attribute 330 to ontology 320.

It should be understood that additional operations can be included in the attribute definition process flow if desired by a practitioner. For example, if a practitioner wishes to attach timeframe details to attributes, a timeframe definition process flow can be added to the FIG. 9 process flow.

FIG. 10 depicts an example process flow for parameterizing a characterization object in a communication goal statement and ontology. Characterizations 332 are editorial judgments based on defined qualifications that determine the language used when certain conditions are met. Through a characterization 332, a user is able to associate descriptive language with an entity type based on the nature of one or more attributes of that entity type. At step 1000, the processor selects the entity type to be characterized in response to user input. FIG. 11 provides an example process flow that elaborates on how the entity type can be defined.

At step 1002, the system determines whether the user wants to create a new characterization or select an existing characterization. This step can be performed in a manner similar to step 900 in FIG. 9, but for characterizations rather than attributes. If an existing characterization is desired, the system can make a selection of an existing characterization in response to user input at step 1012. However, if a new characterization is desired, the process flow proceeds to step 1004.

At step 1004, the user selects the attribute(s) for use in the characterization. If the attribute needs to be defined, the process flow of FIG. 9 can be followed. For example, if the characterization 332 is meant to characterize the performance of a salesperson in terms of sales by the salesperson, step 1004 can result in the user selecting the attribute “sales” as the attribute by which the characterization will be determined.

At step 1006, the user sets the qualification(s) by which to evaluate the characterization. For example, these qualifications can be a series of thresholds by which the values of the sales attribute are judged (e.g., the characterization changes based on whether the sales amount is above or below a threshold of $10,000). Multiple thresholds can be defined for a characterization, which would then yield more than two potential outcomes of a characterization (e.g., three or more tiers of characterization outcomes). Also, the qualifications need not be defined in terms of fixed thresholds. The thresholds can also be flexibly defined in terms of direct attributes and/or computed value attributes (for example, a salesperson can be characterized as a satisfactory salesperson if the sales attribute for the subject salesperson has a value that exceeds the value of the benchmark attribute for the subject salesperson; as another example, a salesperson can be characterized as an above-average salesperson if the sales attribute for the subject salesperson has a value that exceeds the average value of the sales attributes for all of the salespeople within a company). As part of defining the qualifications, step 1006 can also involve the user specifying the operators by which to judge qualifications. Examples of operators may include “greater than”, “less than”, “greater than or equal to”, “equals”, etc. At step 1008, the user sets the expression(s) for the subject characterization. These expressions can then be used by the NLG process when articulating the subject characterization in a narrative story. For example, in a characterization relating to the performance of a salesperson in terms of sales, expressions such as “star performer”, “outperformed”, “high performer”, etc. can be used in situations where the sales exceeded the highest threshold, while expressions such as “laggard”, “poor performer”, “struggled”, etc. can be used in situations where the sales were below the lowest threshold.
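A characterization of this kind can be pictured as an ordered list of qualifications, each pairing an operator and a threshold with candidate expressions. The sketch below is illustrative only: `OPERATORS`, `SALES_CHARACTERIZATION`, and `characterize` are hypothetical names, the $5,000 lower threshold is an invented value, and "first matching qualification wins" is an assumed evaluation order.

```python
import operator
from typing import Callable, Dict, List, Tuple

# Operator names as they might be presented to the user at step 1006.
OPERATORS: Dict[str, Callable[[float, float], bool]] = {
    "greater than": operator.gt,
    "less than": operator.lt,
    "greater than or equal to": operator.ge,
    "equals": operator.eq,
}

# (operator name, threshold, expressions); the 5000 threshold is hypothetical.
SALES_CHARACTERIZATION: List[Tuple[str, float, List[str]]] = [
    ("greater than", 10000, ["star performer", "high performer"]),
    ("less than", 5000, ["laggard", "poor performer"]),
]


def characterize(value: float,
                 qualifications: List[Tuple[str, float, List[str]]]) -> List[str]:
    """Return the expressions of the first qualification the value satisfies."""
    for op_name, threshold, expressions in qualifications:
        if OPERATORS[op_name](value, threshold):
            return expressions
    return []  # no characterization applies


print(characterize(15000, SALES_CHARACTERIZATION))
```

Returning a list of expressions rather than a single string leaves room for the NLG process to vary its word choice, consistent with the role of expressions described above.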

FIGS. 77-80, 146-161, and 204-209 depict example GUIs through which a user can provide inputs for the process flow of FIG. 10. Upon the completion of the FIG. 10 process flow, the system can update the ontology 320 to add the details for the defined characterization 332. It should be understood that additional operations can be included in the characterization definition process flow if desired by a practitioner. For example, if a practitioner wishes to attach timeframe details to characterization, a timeframe definition process flow can be added to the FIG. 10 process flow.

FIG. 11 depicts an example process flow for parameterizing an entity type in a communication goal statement and ontology. Entity types are how the system knows what to talk about with respect to a communication goal statement. An entity type is a primary object in the ontology which has particular attributes (e.g., a department (entity type) has expenses (attribute)). An entity is a specific instance of an entity type, with data-driven values for each attribute (e.g., John Smith is a specific instance of a salesperson entity type, and this entity has a specific data value for the sales attribute of a salesperson entity type). Ontology 320 may include more than one entity type.

At step 1100, the processor decides, in response to user input, whether to create a new entity type or select an existing entity type. This step can be performed while a user is composing a communication goal statement. If step 1100 results in a determination that an existing entity type is to be used, the process flow can proceed to step 1150 where an existing entity type is selected.

If step 1100 results in a determination that a new entity type is to be created, the process flow proceeds to step 1102. At step 1102, the user provides a label for the entity type. This label can be used as the entity type's name (e.g., a “salesperson” entity type). Next, at step 1104, the user sets a base type for the subject entity type. Examples of available base types to choose from can include person, place, thing, and event. However, it should be understood that more, fewer, and/or different base types can be used. The specified base type can be used by the AI logic to inform decision-making about the types of pronouns that can be used to express the subject entity type, among other expressive qualities for the entity type.

At step 1106, the user sets one or more expressions in relation to the subject entity type. These expressions provide the NLG process with a variety of options for expressing the entity type in a story.

The FIG. 11 process flow can also include options for attaching a number of additional features to entity types.

For example, a relationship can be added to the subject entity type at steps 1108-1116. At step 1110, the user identifies the entity type to which the subject entity type is to be related. If the relating entity type does not exist, the process flow of FIG. 11 can be recursively invoked to create the relating entity type. An example of a relating entity type might be a “company” entity type with respect to a subject entity type of “salesperson”. Steps 1112-1116 operate to define the nature of the relationship between the subject entity type and the relating entity type. At step 1112, the process flow determines whether the user wants to create a new relationship or select an existing relationship. If create new is selected at step 1112, the process flow proceeds to step 1114 where the user provides an expression for the new relationship (e.g., the relating expression can be “employed by” to relate the subject entity type of “salesperson” to the relating entity type of “company”; thus, the “salesperson” is “employed by” the “company”). Multiple expressions may be provided at step 1114 to provide variability during story rendering. For example, the expressions “works for”, “is a member of”, and “belongs to” might be used as alternative expressions for the relationship between the “salesperson” entity type and the “company” entity type. If select existing is selected at step 1112, the process flow proceeds to step 1116 where a user can be presented with a list of existing relationship expressions known to the system or within the ontology. The user can then select one or more of these expressions to define the nature of the relationship between the subject entity type and the relating entity type.

Another example of a feature that can be added to an entity type is a rank. Steps 1120-1124 describe how a rank can be attached to an entity type. The rank feature provides the AI with a mechanism for notionally identifying entities to be discussed in a narrative story even if the user does not know in advance which specific entities are to be discussed. For example, a user may want the system to generate a story about the 3 top ranked salespeople in terms of sales, but does not know a priori who these salespeople are. The rank feature attached to the salesperson entity type allows for a user to easily compose a communication goal statement that can be used by the AI to generate an appropriate narrative story. At step 1122, the user sets the attribute by which the subject entity type is to be ranked. For example, if salespeople are to be ranked by sales, the user can specify the sales attribute at step 1122. The FIG. 9 process flow can be followed to specify the subject attribute for ranking. At step 1124, the user sets a rank slice for the rank feature. The rank slice defines a depth for the rank feature with respect to the subject entity type. If the rank slice is set to 1, only the top ranked entity would be applicable. If the rank slice is set to n, the n highest ranked entities would be returned.
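The effect of the rank attribute and rank slice can be sketched in a few lines. The function name `top_ranked` and the dictionary representation of entities are assumptions made for illustration, and the sales figures are example values.

```python
from typing import Any, Dict, List


def top_ranked(entities: List[Dict[str, Any]], attribute: str,
               rank_slice: int) -> List[Dict[str, Any]]:
    """Return the `rank_slice` highest-ranked entities by `attribute`."""
    ranked = sorted(entities, key=lambda entity: entity[attribute], reverse=True)
    return ranked[:rank_slice]


salespeople = [
    {"name": "Joe", "sales": 15000},
    {"name": "Bob", "sales": 430000},
    {"name": "Sally", "sales": 10000},
]

# A rank slice of 1 yields only the top-ranked entity.
print([person["name"] for person in top_ranked(salespeople, "sales", 1)])
```

Because the selection is computed from the data at generation time, the user composing the communication goal statement does not need to know in advance which entities will be returned.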

Another example of a feature that can be added to an entity type is a qualification. Steps 1130-1134 describe how a qualification can be attached to an entity type. Similarly to the rank feature, the qualification feature provides the AI with a mechanism for notionally identifying entities to be discussed in a narrative story even if the user does not know in advance which specific entities are to be discussed. For example, a user may want the system to generate a story about the salespeople who have 10 years or more of experience or who have been characterized as star performers in terms of sales, but does not know a priori who these salespeople are. The qualification feature attached to the salesperson entity type allows for a user to easily compose a communication goal statement that can be used by the AI to generate an appropriate narrative story. At step 1132, the user sets the attribute 330 and/or characterization 332 that will be used to filter/qualify the subject entity type. For example, if the user wants the story to focus on salespeople with at least 10 years of experience, the user can specify a “years worked” or “start date” attribute at step 1132. The FIG. 9 process flow can be followed to specify the subject attribute for qualification. If a user wants to specify a characterization at step 1132, the FIG. 10 process flow can be followed in order to specify a characterization for qualification. At step 1134, the user defines condition(s) for the qualification. For example, if a “years worked” attribute is set as the qualification and the user wants to qualify salespeople based on 10 years of experience, the user can define the condition on the attribute as 10 years.
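The qualification feature amounts to filtering entities by a condition on an attribute. The sketch below assumes a hypothetical `qualify` helper, a `years_worked` key standing in for the “years worked” attribute, and invented data values.

```python
import operator
from typing import Any, Callable, Dict, List


def qualify(entities: List[Dict[str, Any]], attribute: str,
            condition: Callable[[Any, Any], bool], threshold: Any) -> List[Dict[str, Any]]:
    """Keep only the entities whose attribute value satisfies the condition."""
    return [entity for entity in entities if condition(entity[attribute], threshold)]


staff = [
    {"name": "Joe", "years_worked": 12},
    {"name": "Bob", "years_worked": 3},
    {"name": "Sally", "years_worked": 10},
]

# Salespeople with at least 10 years of experience.
veterans = qualify(staff, "years_worked", operator.ge, 10)
print([person["name"] for person in veterans])
```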

FIGS. 121-161 depict example GUIs through which a user can provide inputs for the process flow of FIG. 11. Upon the completion of the FIG. 11 process flow, the system can update the ontology 320 to add the details for the defined entity type 322. It should be understood that additional operations can be included in the entity type definition process flow if desired by a practitioner. For example, if a practitioner wishes to attach timeframe details to characterization, a timeframe definition process flow can be added to the FIG. 11 process flow. As another example, the FIG. 11 process flow can include branching options for adding an attribute to an entity type directly from the FIG. 11 process flow if desired. Similarly, the FIG. 11 process flow can also include branching options for adding a characterization to an entity type directly from the FIG. 11 process flow if desired.

FIG. 12 depicts an example process flow for parameterizing a timeframe in a communication goal statement and ontology. A timeframe is a unit of time used as a parameter to constrain the values included in the expression of a communication goal statement or narrative story. Ontology 320 may include more than one timeframe.

At step 1200, the processor decides, in response to user input, whether to create a new timeframe or select an existing timeframe. This step can be performed while a user is composing a communication goal statement. If step 1200 results in a determination that an existing timeframe is to be used, the process flow can proceed to step 1212 where an existing timeframe is selected.

If step 1200 results in a determination that a new timeframe is to be created, the process flow proceeds to step 1202. At step 1202, the system determines whether the user wants to create a new timeframe type or select from among existing timeframe types. Examples of timeframe types include years, months, days, hours, etc.

If a new timeframe type is desired, the process flow proceeds to step 1204 where the user defines the timeframe type and step 1206 where the user sets the expression(s) for the timeframe type. The expression(s) provide the NLG process with a variety of options for expressing the timeframe in a story.

If an existing timeframe type is desired, the process flow proceeds to step 1208 where the user makes a selection from among existing timeframe types and step 1210 where the user defines a designation for the selected timeframe type. Through this designation, the user can define qualifications via a “when” statement or the like that defines time-based conditions (e.g., “the month of the year when the sales of the store were highest”).

FIGS. 67-69, 92-93, 101, 107, 167-170, 192, and 201-203 depict example GUIs through which a user can provide inputs for the process flow of FIG. 12. Upon the completion of the FIG. 12 process flow, the system can update the ontology 320 to add the details for the defined timeframe 344.

FIG. 13 depicts an example process flow for parameterizing a timeframe interval for use with a timeframe. The timeframe interval defines how the system should consider intervals of time within a timeframe (e.g., days of the month, weeks of the month, months of the year, quarters of the year, hours of the day, etc.). At step 1300, the processor decides, in response to user input, whether to create a new timeframe interval or select an existing timeframe interval. If step 1300 results in a determination that an existing timeframe interval is to be used, the process flow can proceed to step 1306 where an existing timeframe interval is selected. If step 1300 results in a determination that a new timeframe interval is to be created, the process flow proceeds to step 1302. At step 1302, the user defines the timeframe interval, and at step 1304 the user sets one or more expression(s) for the timeframe interval. The expression(s) provide the NLG process with a variety of options for expressing the timeframe interval in a story. Upon the completion of the FIG. 13 process flow, the system can update the ontology 320 to add the details for the defined timeframe interval.

As explained above, the ontology 320 defined via the process flows of FIGS. 9-13 can be leveraged by the AI in coordination with the composed communication goal statements to not only determine the content to be expressed in the narrative story but also to determine how that content should be expressed in the narrative story.

Subgoals within Communication Goal Statements:

The communication goal statements may be interpreted by the system to include a plurality of subgoals or related goals. Thus, in order for a narrative story to satisfy the communication goal associated with a communication goal statement, it may be desirable for the narrative story to first satisfy one or more subgoals related to the communication goal of the communication goal statement. An example of this is shown by FIGS. 14A-D. As shown by FIG. 14A, a communication goal statement 1400 may be associated with a parent or base communication goal. The interpreter 506 may be configured to interpret communication goal statement 1400 as being comprised of two or more communication goal statements 1402 and 1404, where these communication goal statements 1402 and 1404 are associated with subgoals relating to the parent/base goal. When the AI 502 seeks to determine the content for inclusion in the story, the interpreter 506 will process the communication goal statements 1402 and 1404 when generating the computed outline.

FIG. 14B shows an example of this. In this example, the base communication goal statement corresponding to the parent/base goal is “Compare Value 1 to Value 2” (see base communication goal statement 406). This base communication goal statement 406 can be comprised of a series of three base communication goal statements, each relating to subgoals of the parent/base goal. In this example, these three base communication goal statements are: (1) “Present Value 1” 402₁, (2) “Present Value 2” 402₂, and (3) “Characterize the Difference Between Value 1 and Value 2” 404. Thus, for the narrative story to accomplish the overall parent/base goal of comparing Value 1 to Value 2, it will be helpful for the narrative story to first present Values 1 and 2 and then provide a characterization of the difference between Values 1 and 2.

During the composition process, a user may parameterize the base communication goal statement 406 of FIG. 14B as shown by FIG. 14C. As shown by FIG. 14C, the parameterized communication goal statement 406b can read “Compare the Sales of the Salesperson during the Timeframe to the Benchmark of the Salesperson”, where Value 1 is the “Sales of the Salesperson during the Timeframe” and Value 2 is the “Benchmark of the Salesperson”. The interpreter 506 can be configured to interpret parameterized communication goal statement 406b for the purposes of story generation as the following three parameterized communication goal statements: (1) “Present the Sales of the Salesperson during the Timeframe” 402₁b, (2) “Present the Benchmark of the Salesperson” 402₂b, and (3) “Characterize the Difference Between the Sales of the Salesperson during the Timeframe and the Benchmark of the Salesperson” 404b. The system can then interact with ontology 320 to generate a narrative story as shown by FIG. 14D from these three parameterized communication goal statements. As can be seen by FIG. 14D, the NLG process created the first sentence of the narrative story in a compound form to satisfy the subgoals associated with the first two parameterized communication goal statements 402₁b and 402₂b. The final sentence of the narrative story satisfies the subgoal associated with the third parameterized communication goal statement 404b. Overall, the narrative story satisfies the parent/base goal associated with parameterized communication goal statement 406b.
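The expansion of a parameterized “Compare” statement into its three subgoal statements can be illustrated with a minimal Python sketch. The names `GoalStatement` and `expand_compare` are hypothetical and are not part of the disclosed system; the sketch only mirrors the string-level decomposition described above and does not model the interpreter 506 or the ontology 320.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class GoalStatement:
    """Hypothetical container for one communication goal statement."""
    operator: str  # e.g. "Present" or "Characterize"
    text: str      # the human-readable statement


def expand_compare(value_1: str, value_2: str) -> list[GoalStatement]:
    """Expand "Compare <value_1> to <value_2>" into its three subgoal statements."""
    return [
        GoalStatement("Present", f"Present {value_1}"),
        GoalStatement("Present", f"Present {value_2}"),
        GoalStatement(
            "Characterize",
            f"Characterize the Difference Between {value_1} and {value_2}",
        ),
    ]


# Parameterization taken from the FIG. 14C example in the text.
subgoals = expand_compare(
    "the Sales of the Salesperson during the Timeframe",
    "the Benchmark of the Salesperson",
)
for goal in subgoals:
    print(goal.text)
```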

During the process of composing communication goal statements for use in the narrative generation process, the system can provide GUI screens to a user that allow the user to expand a communication goal statement to show communication goal statements associated with subgoals. Furthermore, the GUI can be configured to respond to user input to selectively opt in and opt out of which subgoals are to be included in the narrative generation process for a section of the story outline. Thus, if a user wants the story to include a headline or a title that is drawn from the “Compare” communication goal statement, a user can use a GUI to expand the “Compare” communication goal statement into statements for its constituent subgoals. For the headline/title, a user can choose to selectively opt out of the first two “Present” statements but retain the “Characterize” statement so that the headline/title is focused on a desired main point. Then, in the body of the narrative story, the user can selectively retain all of the constituent subgoals for the “Compare” statement so that the body of the narrative story provides the context for the comparison. FIGS. 75-76 and 215 depict example GUIs through which a user can expand a communication goal statement to view its related subgoals and selectively choose which of the subgoals will be used during the narrative generation process.
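The opt-in/opt-out behavior for the headline and body sections can be sketched as a simple filter over the expanded subgoal list. This is only an illustration under assumed names (`COMPARE_SUBGOALS`, `select_subgoals`); the actual GUI-driven selection mechanism is not specified at this level of detail in the text.

```python
# Subgoal statements of the base "Compare" goal, as listed in the text.
COMPARE_SUBGOALS = [
    "Present Value 1",
    "Present Value 2",
    "Characterize the Difference Between Value 1 and Value 2",
]


def select_subgoals(subgoals, opted_in):
    """Keep only the subgoals whose position the user opted in to (hypothetical helper)."""
    return [goal for index, goal in enumerate(subgoals) if index in opted_in]


# Headline/title: opt out of both "Present" statements, keep "Characterize".
headline_goals = select_subgoals(COMPARE_SUBGOALS, opted_in={2})

# Body: retain all constituent subgoals so the comparison has its context.
body_goals = select_subgoals(COMPARE_SUBGOALS, opted_in={0, 1, 2})
```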

Example Embodiments for a Conditional Outcome Framework to Determine Narrative Content:

In another example embodiment, the system can employ a conditional outcome framework to support narrative generation. For example, AI 502 can employ a conditional outcome framework to determine content for inclusion in a narrative. FIG. 15A illustrates a simplified example where a conditional outcome data structure 1502 is linked with one or more idea data structures 1504, where each idea data structure 1504 represents an idea that is to be expressed in a narrative. The conditional outcome structure 1502 can comprise (1) a name corresponding to the conditional outcome, (2) one or more conditions that define when the conditional outcome is defined as true, and (3) one or more links to one or more content or idea structures 1502/1504. Thus, the conditional outcome data structure provides a mechanism for analyzing data to intelligently determine what ideas should be expressed in a narrative about that data. This can serve as a powerful building block for constructing the AI 502 in a manner so that the content expressed in a narrative will intelligently respond to the underlying data being considered.

FIG. 15B depicts an example that shows how the conditional outcome framework can be used in combination with a communication goal statement to intelligently adapt narratives to their underlying data in a manner that satisfies a desired communication goal. In FIG. 15B, narrative analytics 510 employ a conditional outcome framework 1500. As explained in connection with FIG. 5, the narrative analytics 510 can be associated with a communication goal statement 390. Thus, as the system processes a communication goal statement 390, an appropriate set of narrative analytics 510 tailored toward satisfying that communication goal statement can be selected. The conditional outcome framework 1500 can include one or more outcome data structures 1502 linked with one or more idea data structures 1504 as discussed above in connection with FIG. 15A. Furthermore, any of the outcome data structures 1502 and/or idea data structures 1504 can be associated with supporting analytics 1506. The supporting analytics provide logic that can be used by the system to compute information used for navigating the conditional outcome framework 1500 and identifying ideas during execution at 526 (see FIG. 5).

It should be understood that the outcome data structures 1502 can be tied together in numerous arrangements to define branching logic for the conditional outcome framework 1500. For example, there can be multiple layers of outcome data structures 1502 (each with associated conditions) to provide branching operations at multiple levels. Such branching structures allow for the conditional outcome framework 1500 to accommodate highly complex and intelligent decision-making as to what ideas should be expressed in a narrative in view of the nature of the data under consideration. Moreover, the outcome data structures 1502, idea data structures 1504, and supporting analytics 1506 can be parameterized to allow their re-use in a wide variety of contexts.

It should also be understood that the same idea data structure 1504 might be linked to multiple different outcome data structures 1502. Furthermore, a given outcome data structure 1502 might be linked to multiple idea data structures 1504. Examples of such arrangements are discussed below with reference to FIG. 16 et seq.
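One way to picture the linked outcome and idea data structures is the following Python sketch. It is an assumption-laden illustration rather than the disclosed implementation: the class names `Outcome` and `Idea`, the `children` field used for layering, and the `resolve_ideas` traversal are all hypothetical, and the “typical distribution” condition is a placeholder that always holds.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Sequence


@dataclass
class Idea:
    """An idea to be expressed in a narrative (cf. idea data structure 1504)."""
    name: str


@dataclass
class Outcome:
    """A named conditional outcome (cf. outcome data structure 1502)."""
    name: str
    condition: Callable[[Sequence[float]], bool]          # when the outcome is true
    ideas: list[Idea] = field(default_factory=list)       # links to ideas
    children: list[Outcome] = field(default_factory=list) # next layer of outcomes


def resolve_ideas(outcomes: Sequence[Outcome], values: Sequence[float]) -> list[str]:
    """Collect idea names for every outcome whose condition holds, layer by layer."""
    resolved: list[str] = []
    for outcome in outcomes:
        if not outcome.condition(values):
            continue
        linked = [idea.name for idea in outcome.ideas]
        linked += resolve_ideas(outcome.children, values)
        for name in linked:
            if name not in resolved:  # an idea shared by several outcomes is kept once
                resolved.append(name)
    return resolved


count = Idea("count of group members")
mean = Idea("mean of attribute values")

framework = [
    Outcome("tiny group", lambda v: len(v) <= 1, ideas=[count]),
    Outcome(
        "decent sized group",
        lambda v: 2 <= len(v) <= 50,
        children=[
            # Placeholder condition: a real framework would test the distribution.
            Outcome("typical distribution", lambda v: True, ideas=[count, mean]),
        ],
    ),
]

ideas_for_three = resolve_ideas(framework, [10.0, 20.0, 30.0])
ideas_for_one = resolve_ideas(framework, [10.0])
```

Here the same `count` idea is linked to two different outcomes, and the “typical distribution” outcome is linked to two ideas, matching the many-to-many arrangement described above.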

Example Embodiments for “Analyze” Communication Goal Statements:

As mentioned above, an operator such as “Analyze” can be used to identify a communication goal statement corresponding to an analysis communication goal. An example of a base communication goal statement for an analysis communication goal that could be supported by the system is “Analyze Entity Group by Attribute”, where “Entity Group” serves as a parameter for a group of entities in the ontology 320 and “Attribute” serves as a parameter for an attribute of the specified entity group in the ontology 320. Such a base communication goal statement could be parameterized into a communication goal statement as “Analyze the Salespeople by Sales”, where the Entity Group is specified as “Salespeople” (which can be a group of entities in the ontology 320 that have the entity type of “Salesperson”), and where the Attribute is specified as “Sales” (which can be an attribute of a “Salesperson” in the ontology 320). However, it should be understood that such a base communication goal statement could be parameterized in any of a number of different ways. Further still, it should be understood that different base communication goal statements could be used to satisfy other analysis-related communication goals, some examples of which are discussed below.

The system can link a base communication goal statement of “Analyze Entity Group by Attribute” with narrative analytics 510 that are linked to a story structure that aims to provide the reader with an understanding of the distribution of a particular value across a group of entities. Accomplishing this may involve expressing a variety of quantitative ideas (the number of entities in the group, the average value within a group, the median value within a group, the entities with the highest and lowest values, etc.) and more qualitative ideas (the values are distributed normally, the values are distributed exponentially, the values demonstrate a “long-tail” distribution, one entity in particular had a much higher value than the other entities, etc.). Accordingly, if desired by a practitioner, the system can directly map such a communication goal statement to parameterized narrative analytics and a parameterized story configuration that will express these concepts. However, the use of a conditional outcome framework 1500 by the relevant narrative analytics can provide additional flexibility where the resulting narrative story structure will adapt as a function of not only the specified communication goal but also as a function of the underlying data.

FIG. 16 discloses an example embodiment for a conditional outcome framework that can be used by the narrative analytics 510 associated with a communication goal statement 390 for “Analyze Entity Group by Attribute”. In this example, the conditional outcome framework can employ multiple levels or layers of outcomes 1502. For example, a first layer of outcomes 1502 can correspond to different conditional outcomes that characterize the size of the group specified in the communication goal statement 390. The second layer of outcomes 1502 can correspond to different conditional outcomes that characterize the distribution of group members within the group based on the attribute specified by the communication goal statement 390. The first layer conditional outcomes 1502 can include a “tiny group” outcome 1502, a “decent sized group” outcome 1502, and a “large group” outcome 1502. Each of these different conditional outcomes 1502 can be tied to the conditions that are evaluated by the system to assess whether that conditional outcome 1502 fits the underlying data.

To drive the assessments regarding group size, the supporting analytics 1506 for the conditional outcome framework can include group size characterization analytics 1600 for the various group size outcomes 1502. For example, the “tiny group” outcome 1502 can be associated with parameterized logic that determines whether the number of members of the group specified by the communication goal statement 390 is less than or equal to 1 (it should be understood that other thresholds could be used to define the boundary conditions for a “tiny group”). If so, the “tiny group” outcome 1502 would evaluate as true. As another example, the “decent sized group” outcome 1502 can be associated with parameterized logic that determines whether the number of members of the group specified by the communication goal statement 390 is between 2 and 50 (it should be understood that other thresholds could be used to define the boundary conditions for a “decent sized group”). If so, the “decent sized group” outcome 1502 would evaluate as true. As another example, the “large group” outcome 1502 can be associated with parameterized logic that determines whether the number of members of the group specified by the communication goal statement 390 exceeds 50 (it should be understood that other thresholds could be used to define the boundary conditions for a “large group”). If so, the “large group” outcome 1502 would evaluate as true.
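The group size thresholds described above can be expressed as a small parameterized function. This is a sketch only: the function name and the parameter names `tiny_max` and `decent_max` are assumptions, while the default values (1 and 50) come from the example thresholds in the text.

```python
def characterize_group_size(member_count: int, tiny_max: int = 1, decent_max: int = 50) -> str:
    """Return which first-layer group size outcome evaluates as true.

    Defaults follow the example thresholds in the text; both are parameters
    so that other boundary conditions can be supplied.
    """
    if member_count < 0:
        raise ValueError("member_count must be non-negative")
    if member_count <= tiny_max:
        return "tiny group"
    if member_count <= decent_max:
        return "decent sized group"
    return "large group"


print(characterize_group_size(0))    # empty group falls under "tiny group"
print(characterize_group_size(12))
print(characterize_group_size(51))
```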

To drive the assessments regarding distribution within the group, the supporting analytics 1506 for the conditional outcome framework can include group distribution characterization analytics 1602 for the various group distribution outcomes 1502. In this example, the system seeks to characterize (1) a “tiny group” as being an empty group (see the “empty” outcome 1502) or a single member group (see the “just one” outcome 1502), (2) a “decent sized group” as being a typical distribution (see “typical distribution” outcome 1502), a distribution that is clumpy at the top (see “clump at top” outcome 1502), or a flat distribution (see the “flat distribution” outcome 1502), and (3) a “large group” as being a normal distribution (see “normal distribution” outcome 1502) or a long-tail distribution (see the “long-tail distribution” outcome 1502). Each of these second level outcomes 1502 can be associated with parameterized analytics 1602 that specify the computations used for characterizing the nature of the distributions within the group. For example, the “clump at top” outcome 1502 can be associated with parameterized analytics 1602 that are configured to sort entities by a particular value, group entities with similar values, and then determine if the highest ranked entities constitute a subgroup of similar values. Any thresholds or parameters used in determining such subgroups may be built into the system, specified directly by users, or tuned automatically by the system. As another example, the “long-tail distribution” outcome 1502 can be associated with parameterized analytics 1602 that are configured to perform distribution analysis and then determine if a significant proportion of the entities contributed values well below the mean contribution. Again, any thresholds or parameters used could be built into the system, specified directly by users, or tuned automatically by the system.
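The text leaves the exact computations for the “clump at top” and “long-tail distribution” outcomes open, so the following Python sketch should be read as one possible interpretation under stated assumptions. The similarity rule (each value within 10% of the previous ranked value), the minimum clump size, and the long-tail cutoffs (`below_fraction`, `min_share`) are all hypothetical parameters chosen for illustration.

```python
from statistics import mean


def top_clump(values, similarity=0.10):
    """Return the leading run of ranked values that each lie within
    `similarity` (relative) of the previously ranked value."""
    ranked = sorted(values, reverse=True)
    if not ranked:
        return []
    clump = [ranked[0]]
    for value in ranked[1:]:
        if clump[-1] - value <= similarity * abs(clump[-1]):
            clump.append(value)
        else:
            break
    return clump


def is_clump_at_top(values, similarity=0.10, min_clump=2):
    """True when the highest ranked values form a subgroup of similar values
    that does not span the whole group (which would instead look flat)."""
    clump = top_clump(values, similarity)
    return min_clump <= len(clump) < len(values)


def is_long_tail(values, below_fraction=0.5, min_share=0.6):
    """True when at least `min_share` of the values fall below
    `below_fraction` times the mean, i.e. well below the mean."""
    if not values:
        return False
    cutoff = below_fraction * mean(values)
    low_count = sum(1 for value in values if value < cutoff)
    return low_count / len(values) >= min_share


clumpy = [100, 98, 97, 50, 40, 30]
flat = [10, 10, 10, 10]
long_tail = [1000, 10, 8, 5, 4, 3, 2, 2, 1, 1]
```

Because similarity is measured against the previously ranked value, a long chain of small steps can drift; a practitioner might prefer to compare against the top value instead.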

In FIG. 16, each second layer/level outcome 1502 is linked to one or more idea data structures 1504. Thus, the resolution of which ideas should be expressed in a given narrative that is generated to satisfy the communication goal statement 390 will depend on which outcomes 1502 were deemed true in view of the underlying data. The relationships between ideas for expression in a narrative to the nature of the underlying data in this example can be seen in the table below:


Outcome of Characterizing the Underlying Data	Ideas to be Expressed in the Narrative About the Underlying Data

Tiny Group (Empty Set)	Narrative should express the following idea:
	A count of the group members
Tiny Group (Single Member)	Narrative should express the following idea:
	A count of the group members
Decent Sized Group (Typical Distribution)	Narrative should express the following ideas:
	A count of the group members
	The total of the attribute values for the group
	The mean of the attribute values for the group
	The names and values of the top N group members as ranked according to the group members’ associated attribute values.
Decent Sized Group (Clump at Top Distribution)	Narrative should express the following ideas:
	A count of the group members
	The total of the attribute values for the group
	The mean of the attribute values for the group
	A discussion of the clumpy nature of the distribution of members within the group with respect to the attribute values.
	The names and values of the group members in the top clump (as ranked according to the group members’ associated attribute values).
Decent Sized Group (Flat Distribution)	Narrative should express the following ideas:
	A count of the group members
	The total of the attribute values for the group
	The mean of the attribute values for the group
	A discussion of the flat nature of the distribution of members within the group with respect to the attribute values.
Large Group (Normal Distribution)	Narrative should express the following ideas:
	A count of the group members
	The mean of the attribute values for the group
	The names and values of the group members in the top n percentile (as ranked according to the group members’ associated attribute values).
Large Group (Long Tail Distribution)	Narrative should express the following ideas:
	A count of the group members
	The total of the attribute values for the group
	A discussion of the long tail nature of the distribution of members within the group with respect to the attribute values.
	The names and values of the group members in the top n percentile (as ranked according to the group members’ associated attribute values).

Any ideas 1504 that are resolved based on the conditional outcome framework could then be inserted into the computed story outline 528 for use by AI 504 (together with their associated specifications in view of the underlying data) when rendering the desired narrative.
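The outcome-to-idea relationships for the “Analyze Entity Group by Attribute” example can be captured as a lookup keyed by the first and second layer outcomes. The dictionary name, the `ideas_for` helper, and the short idea labels are hypothetical shorthand for the ideas listed in the table; the story outline is modeled here as a plain list for illustration only.

```python
# Keys: (group size outcome, distribution outcome). Values: ideas to express.
IDEAS_BY_OUTCOME = {
    ("tiny group", "empty"): ["count"],
    ("tiny group", "just one"): ["count"],
    ("decent sized group", "typical distribution"): [
        "count", "total", "mean", "top n members",
    ],
    ("decent sized group", "clump at top"): [
        "count", "total", "mean", "clumpy distribution discussion", "top clump members",
    ],
    ("decent sized group", "flat distribution"): [
        "count", "total", "mean", "flat distribution discussion",
    ],
    ("large group", "normal distribution"): [
        "count", "mean", "top n percentile members",
    ],
    ("large group", "long-tail distribution"): [
        "count", "total", "long tail discussion", "top n percentile members",
    ],
}


def ideas_for(size_outcome: str, distribution_outcome: str) -> list:
    """Look up the ideas linked to the outcomes that were deemed true."""
    return list(IDEAS_BY_OUTCOME[(size_outcome, distribution_outcome)])


# Resolved ideas are appended to a (simplified) story outline.
story_outline = []
story_outline.extend(ideas_for("large group", "normal distribution"))
```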

To the extent that any of the ideas 1504 need additional computed values in order to be expressed (where such values were not previously computed by analytics 1600 or 1602), the supporting analytics 1506 can further include idea support analytics 1604. For example, if the analytics 1600 and 1602 do not compute a mean value for the attribute values within the group, the idea support analytics 1604 can include parameterized logic that computes such a mean value for the underlying data.

Thus, it can be seen that the example conditional outcome framework for a communication goal statement can define a hierarchical relationship among linked outcomes and ideas together with associated supporting analytics to drive a determination as to which ideas should be expressed in a narrative about a data set, where the selection of ideas for expression in the narrative can vary as a function of the nature of the data set.

In example embodiments, the conditional outcome framework can be designed so that it does not need any input or configuration from a user other than what is used to compose the communication goal statement 390 (e.g., for the “Analyze Entity Group by Attribute” communication goal statement, the system would only need to know the specified entity group and the specified attribute). However, for other example embodiments, a practitioner might want to expose some of the parameters of the conditional outcome framework to users to allow further configurations or adjustments of the conditional outcome framework.

For example, a practitioner might want to implement the thresholds used within the conditional outcome framework as user-defined values. In the context of FIG. 16, this could involve exposing the thresholds used for characterizing the size of the group to users so that a user can adjust the group size boundaries in a desired manner (e.g., in some contexts, a large group might have a minimum of 100 members, while in other contexts a large group might have a minimum of 1000 members). Similarly, the values for “n” used by the conditional outcome framework of FIG. 16 (e.g., the top “n” group members or the “nth percentile”) could be exposed to users to allow adjustments of the value used for n.

As another example, a practitioner might want to provide users with a capability to enable/disable the links between outcomes 1502 and ideas 1504 in a conditional outcome framework. For example, a GUI could present a user with lists of all of the outcomes 1502 and ideas 1504 that can be tied to a communication goal statement within a conditional outcome framework. The user could then individually select which ideas 1504 are to be linked to which outcomes 1502. If desired by a practitioner, that conditional outcome framework can include default linkages that are presented in the GUI, and the user could make adjustments from there. FIG. 17A shows an example where a user has adjusted the conditional outcome framework to add a linkage 1700 between the “present the mean” idea 1504 and the “long tail distribution” outcome 1502. FIG. 17B shows an example where a user has removed the linkages 1702 that had previously existed between the “present the mean” idea 1504 and the “typical distribution”, “clump at top”, “flat distribution”, and “normal distribution” outcomes 1502.
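The link adjustments of FIGS. 17A and 17B can be sketched as set operations on a mapping from outcomes to linked ideas. Only the “present the mean” idea and the five outcome names are taken from the text; the other idea labels, the `add_link` and `remove_link` helpers, and the choice of a dictionary of sets are assumptions made for illustration.

```python
import copy

MEAN = "present the mean"

# Default linkages (simplified). Idea labels other than MEAN are hypothetical.
default_links = {
    "typical distribution": {"present the count", "present the total", MEAN},
    "clump at top": {"present the count", "present the total", MEAN},
    "flat distribution": {"present the count", "present the total", MEAN},
    "normal distribution": {"present the count", MEAN},
    "long tail distribution": {"present the count", "present the total"},
}


def add_link(links, outcome, idea):
    """Enable a link between an outcome and an idea."""
    links.setdefault(outcome, set()).add(idea)


def remove_link(links, outcome, idea):
    """Disable a link between an outcome and an idea, if present."""
    links.get(outcome, set()).discard(idea)


# FIG. 17A style adjustment: add the mean to the long tail outcome.
fig_17a = copy.deepcopy(default_links)
add_link(fig_17a, "long tail distribution", MEAN)

# FIG. 17B style adjustment: remove the mean from four outcomes.
fig_17b = copy.deepcopy(default_links)
for outcome in ("typical distribution", "clump at top",
                "flat distribution", "normal distribution"):
    remove_link(fig_17b, outcome, MEAN)
```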

FIG. 18A shows an example of a narrative 1802 that can be generated using the conditional outcome framework of FIG. 16 as applied to a communication goal statement 1800 of “Analyze the salespeople by bookings” with respect to a data set that includes various salespeople and their associated bookings (e.g., the dollar values of their bookings). In this example, the narrative 1802 would be generated after an analysis of the data set arrived at a determination that the outcomes 1804 were true (the salespeople group was “decently sized” and has a “typical distribution” of salespeople with respect to their bookings). As can be seen in FIG. 18A, the narrative text 1802 expresses the following ideas 1806 that are tied to the outcomes 1804: (1) a count of the number of salespeople in the group, (2) the total amount of bookings for the salespeople in the group, (3) the mean value of bookings for the salespeople in the group, and (4) the names of the top 3 salespeople in the group (by the booking values) and the booking values for each of the top 3.

FIG. 18B shows an example of a narrative 1812 that can be generated using the conditional outcome framework of FIG. 16 as applied to a communication goal statement 1810 of “Analyze the citizens by their salary” with respect to a data set that includes various citizens and their associated salaries. In this example, the narrative 1812 would be generated after an analysis of the data set arrived at a determination that the outcomes 1814 were true (the citizens group was a “large group” and has a “normal distribution” of citizens with respect to their salaries). As can be seen in FIG. 18B, the narrative text 1812 expresses the following ideas 1816 that are tied to the outcomes 1814: (1) a count of the number of citizens in the group, (2) the mean value of the salaries for the citizens in the group, and (3) the average salary of the top decile of citizens (with respect to their salaries).

FIGS. 18A and 18B thus show how the same parameterized conditional outcome framework can be used to generate narrative stories across different content verticals (e.g., a story about salespeople and their bookings as in FIG. 18A versus a story about citizens and their salaries as in FIG. 18B), which demonstrates how the parameterized conditional outcome framework provides an effective technical solution to the technical problem of horizontal scalability in the NLG arts.

It should be understood that the system can also be designed to support other “analyze” communication goals. For example, another base communication goal statement that can be used by the system can be “Analyze Entity Group by Attribute 1 and Attribute 2”. Such a multi-attribute analysis goal can trigger the performance of tradeoff analysis as between the two attributes (and the expression of ideas that result from this analysis). For example, this goal may trigger analysis that results in quantitative ideas like the average values for Attribute 1, the average values for Attribute 2, the entity with the largest value for Attribute 1, etc. Assuming the system has an understanding of the relationship between Attribute 1 and Attribute 2 (for instance that “Attribute 1 is a driver of Attribute 2” or that higher values for Attribute 1 represent a positive outcome while higher values for Attribute 2 represent a negative outcome), the goal may also result in more qualitative ideas that capture intuitive understandings like “Entities that have high values for Attribute 1 also have high values for Attribute 2”, “The entity with the highest value for Attribute 1 actually has a really low value for Attribute 2”, or “There's no correlation between values for Attribute 1 and Attribute 2 in the group”. Accordingly, it should be understood that it may be desirable for the narratives produced in response to the “Analyze Entity Group by Attribute 1 and Attribute 2” communication goal statement to express different ideas than the narratives produced in response to the “Analyze Entity Group by Attribute” communication goal statement.

FIGS. 19A and B disclose an example embodiment for a conditional outcome framework that can be used by the narrative analytics 510 associated with a communication goal statement 390 for “Analyze Entity Group by Attribute 1 and Attribute 2”. In these examples, the outcomes can be associated with group size characterization analytics 1600 and group distribution characterization analytics 1602 as discussed above in connection with FIG. 16. However, these outcomes can be linked to different ideas (and associated idea support analytics 1604) as indicated by FIGS. 19A and B. For example, the ideas of FIGS. 19A and B can include totals, means, and names/values for the top n with respect to each attribute of the communication goal statement 390. The ideas can also express whether the distributions of salespeople with respect to the two attributes are similar to each other or different than each other.

FIG. 19A shows an example of a narrative 1902 that can be generated using the conditional outcome framework shown by the upper portions of FIGS. 19A-B as applied to a communication goal statement 1900 of “Analyze the salespeople by bookings and count of deals” with respect to a data set that includes various salespeople and their associated bookings (e.g., the dollar values of their bookings) and counts of their sales deals. In this example, the narrative 1902 would be generated after an analysis of the data set arrived at a determination that the outcomes 1904 were true (the salespeople group was a “tiny group” with only a single member). As can be seen in FIG. 19A, the narrative text 1902 expresses the following ideas 1906 that are tied to the outcomes 1904: (1) a count of the number of salespeople in the group, (2) the names of the top n salespeople in the group (by the first attribute, bookings value) and the booking values for each of the top n salespeople (which in this example is a single person's bookings), and (3) the names of the top n salespeople in the group (by the second attribute, deal count) and the count of deals for each of the top n salespeople (which in this example is a single person's deals).

FIG. 19B shows an example of a narrative 1912 that can be generated using the conditional outcome framework shown by the upper portions of FIGS. 19A-B as applied to the same communication goal statement 1900 shown by FIG. 19A (“Analyze the salespeople by bookings and count of deals”) but with respect to a different data set that includes various salespeople and their associated bookings (e.g., the dollar values of their bookings) and counts of their sales deals. In this example, the narrative 1912 would be generated after an analysis of the data set arrived at a determination that the outcomes 1914 were true (the salespeople group was a “decent sized group” and has similar distributions of values among the salespeople with respect to the two attributes, bookings and deal counts). As can be seen in FIG. 19B, the narrative text 1912 expresses the following ideas 1916 that are tied to the outcomes 1914: (1) a count of the number of salespeople in the group, (2) the total value of the first attribute (bookings) for the salespeople group, (3) the total value of the second attribute (deal counts) for the salespeople group, (4) the mean value of the first attribute (bookings) for the salespeople group, (5) the mean value of the second attribute (deal counts) for the salespeople group, (6) the names and attribute values for the top n of the salespeople group with respect to the first attribute (bookings), (7) the names and attribute values for the top n of the salespeople group with respect to the second attribute (deal counts), and (8) a statement that the distributions of salespeople with respect to the two attributes were similar to each other. FIGS. 19A and B thus show how the same conditional outcome framework and same communication goal statement can produce dramatically different stories based on the content of the data set under consideration.

Another example of a base communication goal statement for an “analyze” communication goal that can be used by the system can be “Analyze Entity Group by a Change in Attribute (Over Time)”. Such a communication goal statement can trigger analysis that eventually results in quantitative ideas representing the total change in value, the average change in value, the median change in value, which entity had the biggest change in values, the number of entities that had positive changes, etc. Such a goal might also produce more qualitative ideas that capture intuitive understandings such as “All members of the group had positive changes”, “About half of the group had positive changes and about half had negative changes”, or “The group as a whole had a positive change, but it was really a small group of entities that had large positive changes while the rest had smaller negative changes”. A practitioner may desire that narratives produced from this communication goal statement express different ideas than those generated from the other “analyze” communication goals discussed above.

FIG. 20A discloses an example embodiment for a conditional outcome framework that can be used by the narrative analytics 510 associated with a communication goal statement 390 for “Analyze Entity Group by a Change in Attribute (Over Time)”. In this example, the framework includes attribute change analytics 2008 that compute the changes/deltas in the specified attribute values for each member of the entity group over the relevant time period. These deltas can then be used as the attribute values for the conditional outcome framework, which can otherwise function as shown by FIG. 16.
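The delta computation can be sketched as follows. The function name `attribute_deltas`, the dictionary-based data layout, and the sample names and figures are hypothetical; restricting the result to members present at both ends of the time frame is also an assumption, since the text does not say how members who join or leave are handled.

```python
def attribute_deltas(start_values: dict, end_values: dict) -> dict:
    """Compute the per-member change in an attribute over a time frame.

    Assumption: only members with a value at both the start and the end
    of the time frame receive a delta.
    """
    return {
        member: end_values[member] - start_values[member]
        for member in start_values
        if member in end_values
    }


# Illustrative (made-up) bookings for Q1 and Q2.
q1_bookings = {"Ana": 100, "Ben": 250, "Cy": 80}
q2_bookings = {"Ana": 140, "Ben": 200, "Cy": 80, "Dee": 60}

deltas = attribute_deltas(q1_bookings, q2_bookings)

# The deltas then stand in for the attribute values in the FIG. 16 style analysis.
delta_values = sorted(deltas.values(), reverse=True)
```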

FIG. 20A shows an example of a narrative 2002 that can be generated using the conditional outcome framework shown by the upper portion of FIG. 20A as applied to a communication goal statement 2000 of “Analyze the salespeople by the change in their bookings” (where the relevant time frame can be a default time frame, a system-determined time frame, or a user-determined time frame, and in this case corresponds to a time frame of Q1 to Q2) with respect to a data set that includes various salespeople and their associated bookings (e.g., the dollar values of their bookings) over time. In this example, the narrative 2002 would be generated after an analysis of the data set arrived at a determination that the outcomes 2004 were true (the salespeople group was a “decent sized group” with a typical distribution of attribute delta values for the salespeople). As can be seen in FIG. 20A, the narrative text 2002 expresses the following ideas 2006 that are tied to the outcomes 2004: (1) a count of the number of salespeople in the group, (2) the total change in bookings from Q1 to Q2 for the salespeople group, (3) the mean value of changed bookings from Q1 to Q2 for the salespeople group, and (4) the names of the top n salespeople in the group (by their associated booking value deltas) and the booking value deltas for each of the top n salespeople.

FIG. 20B discloses another example embodiment for a conditional outcome framework that can be used by the narrative analytics 510 associated with a communication goal statement 390 for “Analyze Entity Group by a Change in Attribute (Over Time)”. In this example, the framework includes group size change characterization analytics 2010, where these analytics 2010 are configured to analyze the specified entity group to assess how its size changed over the relevant time period. In the example of FIG. 20B, there are three outcomes associated with these analytics 2010: a conclusion that the group size increased significantly, a conclusion that the group size stayed mostly consistent, and a conclusion that the group size decreased significantly. To reach these outcomes, the analytics 2010 can tie each outcome to thresholds that are applied to computed changes in group size for the relevant time frame. For example, a group size change of +25% or more can be characterized as a significant increase, a group size change of −25% or lower (i.e., a decrease of 25% or more) can be characterized as a significant decrease, and group size changes between these bounds can be characterized as consistent. Other outcomes within the conditional outcome framework can assess the nature of any change with respect to how the group members are ranked by the attribute over the relevant time frame. The analytics for these outcomes can also be parameterized to test whether their corresponding outcomes are applicable to the subject data. Furthermore, FIG. 20B shows how the various ideas tied to the outcomes can include various informational items tied to the starting and ending times for the subject time frame, as well as ideas that express how certain group members' rankings changed over the time frame.
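The group size change thresholds can be illustrated with a short function. The function name and the `threshold` parameter are assumptions; the default of 25% comes from the example in the text, and treating an empty starting group as an error is a simplification made here because a relative change is undefined in that case.

```python
def characterize_size_change(start_count: int, end_count: int, threshold: float = 0.25) -> str:
    """Characterize the relative change in group size over a time frame."""
    if start_count <= 0:
        raise ValueError("start_count must be positive to compute a relative change")
    change = (end_count - start_count) / start_count
    if change >= threshold:
        return "increased significantly"
    if change <= -threshold:
        return "decreased significantly"
    return "stayed mostly consistent"


print(characterize_size_change(20, 30))
print(characterize_size_change(20, 21))
print(characterize_size_change(20, 10))
```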

FIG. 20C shows an example of a narrative 2022 that can be generated using the conditional outcome framework shown by FIG. 20B as applied to the communication goal statement 2000 of “Analyze the salespeople by the change in their bookings (over Q1 and Q2)” with respect to a data set that includes various salespeople and their associated bookings (e.g., the dollar values of their bookings) over time. In this example, the narrative 2022 would be generated after an analysis of the data set arrived at a determination that the outcomes 2024 were true (the size of the salespeople group increased significantly over Q1 to Q2, with the leaders among the salespeople with respect to bookings being largely unchanged over Q1 to Q2). As can be seen in FIG. 20C, the narrative text 2022 expresses the following ideas 2026 that are tied to the outcomes 2024: (1) an identification of the change in size for the salespeople group from Q1 to Q2, (2) a count of the members of the salespeople group at Q1, (3) a count of the members of the salespeople group at Q2, (4) the total amount of bookings for the salespeople group at Q1, (5) the total amount of bookings for the salespeople group at Q2, (6) the mean value of bookings for the salespeople group at Q2, and (7) the names and booking values for the top n salespeople at Q2 (in terms of bookings value).

FIG. 20D shows an example of a narrative 2032 that can be generated using the conditional outcome framework shown by FIG. 20B as applied to the same communication goal statement 2000 shown by FIG. 20C (“Analyze the salespeople by the change in their bookings (over Q1 and Q2)”) but with respect to a different data set that includes various salespeople and their associated bookings (e.g., the dollar values of their bookings) over time. In this example, the narrative 2032 would be generated after an analysis of the data set arrived at a determination that the outcomes 2034 were true (the size of the salespeople group decreased significantly over Q1 to Q2, with the salespeople who were leaders at Q1 with respect to bookings having been surpassed in Q2). As can be seen in FIG. 20D, the narrative text 2032 expresses the following ideas 2036 that are tied to the outcomes 2034: (1) an identification of the change in size for the salespeople group from Q1 to Q2, (2) a count of the members of the salespeople group at Q1, (3) a count of the members of the salespeople group at Q2, (4) the total amount of bookings for the salespeople group at Q1, (5) the total amount of bookings for the salespeople group at Q2, (6) the names and booking values for the top n salespeople at Q1 (in terms of bookings value), (7) the names and booking values for the top n salespeople at Q2 (in terms of bookings value), (8) the positions at Q2 of the salespeople who were in the top n at Q1, (9) the positions at Q1 of the salespeople who were in the top n at Q2, and (10) a statement that notes the change in leadership for salespeople as between Q1 and Q2. FIGS. 20C and 20D thus show another example of how the same conditional outcome framework and same communication goal statement can produce dramatically different stories based on the content of the data set under consideration.

Yet another example of a base communication goal statement for an “analyze” communication goal that can be used by the system can be “Analyze Entity Group by Characterization”. Such communication goal statement can trigger analysis that eventually results in quantitative ideas representing the count and percentage of entities with each characterization, the most common characterization, etc. Such a goal might also produce more qualitative ideas that capture intuitive understandings such as “There was a roughly even distribution of characterizations across the group”, “Every entity in the group had the same characterization”, “Almost all of the entities in the group had the same characterization”, etc. A practitioner may desire that narratives produced from this communication goal statement express different ideas than those generated from the other “analyze” communication goals discussed above.

FIGS. 21A and B disclose an example embodiment for a conditional outcome framework that can be used by the narrative analytics 510 associated with a communication goal statement 390 for “Analyze Entity Group by Characterization”. In these examples, the outcomes can be associated with group size characterization analytics 1600 and group distribution characterization analytics 1602 as discussed above in connection with FIG. 16. However, these outcomes can be linked to different ideas (and associated idea support analytics 1604) as indicated by FIGS. 21A and B. For example, the ideas of FIGS. 21A and B can express concepts such as which characterizations are most common among members of the entity group, and corresponding counts and percentages for various characterizations within the entity group.

FIG. 21A shows an example of a narrative 2102 that can be generated using the conditional outcome framework shown by the upper portions of FIGS. 21A-B as applied to a communication goal statement 2100 of “Analyze the properties by their type” with respect to a data set that includes various properties and associated types for those properties (e.g., single unit homes, duplexes, commercial storefronts, etc.). In this example, the narrative 2102 would be generated after an analysis of the data set arrived at a determination that the outcomes 2104 were true (the size of the group of properties was a “large group” where almost all of the properties in that group shared the same characterization). As can be seen in FIG. 21A, the narrative text 2102 expresses the following ideas 2106 that are tied to the outcomes 2104: (1) an identification of the most common type characterization for the properties in the group (single unit homes in this case), (2) the percentage of properties in the group that have this type characterization, and (3) other common type characterizations that exist in the property group.

FIG. 21B shows an example of a narrative 2112 that can be generated using the conditional outcome framework shown by the upper portions of FIGS. 21A-B as applied to the same communication goal statement 2100 shown by FIG. 21A (“Analyze the properties by their type”) but with respect to a different data set that includes various properties and their associated type characterizations. In this example, the narrative 2112 would be generated after an analysis of the data set arrived at a determination that the outcomes 2114 were true (the size of the group of properties was a “decent sized group” where there was a relatively even distribution of properties in that group with respect to their type characterizations). As can be seen in FIG. 21B, the narrative text 2112 expresses the following ideas 2116 that are tied to the outcomes 2114: (1) an identification of the common type characterizations for the properties in the group (single family homes, duplex-style homes, and commercial storefronts in this case), (2) the count of properties in the group with each of these common type characterizations, (3) an identification of the uncommon type characterizations for the properties in the group (warehouses and parking lots in this case), and (4) the count of properties in the group with each of these uncommon type characterizations. Thus, FIGS. 21A and B show yet another example of how the same conditional outcome framework and same communication goal statement can produce dramatically different stories based on the content of the data set under consideration.

“Smart” Attributes:

In another example embodiment, the system can employ “smart” attributes to support narrative generation. For example, the attributes included in the ontology 320 can specify a model that identifies one or more drivers of the metrical values for the subject attribute and a functional relationship between the metrical values for the subject attribute and its drivers, even if the values for that attribute are directly referenced in the source data 540. Such a configuration for attributes provides an explicit model through which the system can readily discover and assess the drivers for the subject attribute. Accordingly, this explicit model for an attribute supports narrative generation relating to drivers (e.g., narratives that explain why an attribute may have a certain value, such as explaining whether increased revenue and/or decreased expenses may be the drivers for increased profit). Moreover, by incorporating the explicit model in the ontology's attribute data structure, the narrative generation system supports configurability and scalability such that the analytics for driver analysis need not be separately coded for each different use case.

FIG. 22A depicts an example structure for a smart attribute 2200. The smart attribute 2200 may specify a type 340, name 342, timeframe 344, and expression(s) 346 as discussed above with respect to direct and computed value attributes 330a and 330b. If the smart attribute 2200 corresponds to a direct attribute 330a, then the smart attribute 2200 can also include a location 2202 that identifies where the values for the subject attribute can be found in the source data 540. However, this location 2202 can be omitted if the smart attribute 2200 corresponds to a computed value attribute 330b.

Smart attribute 2200 can also specify a directional sentiment 2208, which flags whether larger values for the subject attribute are seen as good/positive outcomes or bad/negative outcomes. For example, with respect to an attribute such as “profit”, larger and/or increasing values (up) can be associated with a good sentiment, while smaller and/or decreasing values can be associated with a bad sentiment. Bounds and targets may also be used when defining directional sentiment. For instance, when considering a person's body temperature, 98.6 degrees Fahrenheit is better than 103.4 degrees Fahrenheit, but a temperature of 94.2 degrees Fahrenheit is definitely not better than 98.6 degrees Fahrenheit. To model directional sentiment in instances such as these, ranges can be used to define good/positive values (or bad/negative values as the case may be), with sentiment changing as the values diverge from the defined range (in either direction).
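One way to picture a directional sentiment that supports both simple directions and ranges is sketched below in Python. The class name, its fields, the scoring scheme, and the 97.0 to 99.0 degree range are all assumptions chosen for illustration; the text above does not prescribe any particular representation.

```python
from dataclasses import dataclass


@dataclass
class DirectionalSentiment:
    """Hypothetical representation of a directional sentiment."""
    direction: str        # "up" (larger is better), "down", or "range"
    low: float = 0.0      # lower bound of the good range (used only for "range")
    high: float = 0.0     # upper bound of the good range (used only for "range")

    def score(self, value):
        """Return a favorability score; a higher score is a better value."""
        if self.direction == "up":
            return value
        if self.direction == "down":
            return -value
        # "range": zero inside the range, increasingly negative as the
        # value diverges from the range in either direction.
        if value < self.low:
            return value - self.low
        if value > self.high:
            return self.high - value
        return 0.0

    def compare(self, old, new):
        """Say whether moving from old to new is better, worse, or the same."""
        delta = self.score(new) - self.score(old)
        return "better" if delta > 0 else "worse" if delta < 0 else "same"


profit = DirectionalSentiment("up")
body_temp = DirectionalSentiment("range", low=97.0, high=99.0)  # assumed bounds

print(profit.compare(100, 120))        # larger profit
print(body_temp.compare(103.4, 98.6))  # fever returning to the good range
print(body_temp.compare(98.6, 94.2))   # dropping below the good range
```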

Smart attribute 2200 also specifies one or more models 2204 and one or more model types 2206 corresponding to the model(s) 2204. Through the model(s) 2204 and model type(s) 2206, the smart attribute structure 2200 identifies one or more associated drivers for the subject attribute and the nature of the functional relationship between the driver(s) and the subject attribute. Examples of model types 2206 that can be used include quantitative models and qualitative models.

With a quantitative model, the model 2204 uses a formulaic and/or computational structure for expressing the model (e.g., Profit=Revenue−Expenses). If desired, a practitioner can also define different types of quantitative models (e.g., complex formulas (such as a quadratic equation), pure linear sum/difference formulas, pure linear product/quotient formulas, etc.). The functional relationship defined by a quantitative model can even be a “black box”, such as specifically in the case of deltas, as long as it is possible to relate changes in the values of the output. For example, a simple stock movement model can be represented as the formula Stock Movement=Closing Price−Opening Price. This stock movement model would allow the movement of a stock to be represented and discussed in a narrative story even if the closing and opening prices are not present in the data, so long as the stock movement data is received in the form of the delta values (where the actual stock movement values are present in the data).

With a qualitative model, the model 2204 identifies one or more drivers and the nature of their influence on the subject attribute (e.g., a positive influencer or negative influencer), but there is not a precise computational measure that functionally relates the driver(s) to the attribute. As an example, the number of customer visits to a store can be a positive influencer of revenue for that store. With qualitative influencers, some examples of narrative characterizations that can be developed include whether the outcome was expected and whether the outcome was unexpected, particularly when the subject attribute is analyzed over the course of a timeframe. For example, if a store foot traffic attribute is expected to be positively influenced by temperate weather and in-store promotions, but store foot traffic goes down despite increases in temperate weather and in-store promotions, this unexpected result can be a useful insight to capture and expose via automated narrative generation. Similarly, when outcomes go as expected, that can also be an interesting idea to capture and expose via automated narrative generation.
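The expected versus unexpected characterization can be illustrated with a small Python sketch. It assumes, purely for illustration, that each influencer is encoded as +1 (positive) or -1 (negative) and that the influencers "vote" by the sign of their change; the function name and this voting rule are hypothetical and are not drawn from the disclosure.

```python
def expectation_check(driver_changes, influences, attribute_change):
    """Compare the direction the influencers suggest with what was observed.

    driver_changes: driver name -> change in that driver over the timeframe
    influences: driver name -> +1 (positive influencer) or -1 (negative)
    attribute_change: observed change in the subject attribute
    """
    votes = 0
    for name, change in driver_changes.items():
        sign = (change > 0) - (change < 0)      # -1, 0, or +1
        votes += influences[name] * sign
    observed = (attribute_change > 0) - (attribute_change < 0)
    if votes == 0 or observed == 0:
        return "inconclusive"
    return "expected" if (votes > 0) == (observed > 0) else "unexpected"


# Foot traffic fell even though both positive influencers rose.
influences = {"temperate_days": +1, "in_store_promotions": +1}
changes = {"temperate_days": 3, "in_store_promotions": 2}
print(expectation_check(changes, influences, attribute_change=-150))
```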

Model 2204 can be configured to specify the drivers in terms of other attributes known within ontology 320. Thus, the system is able to use model 2204 to readily identify the drivers for attributes and then locate and interpret data for such drivers.

Also, it should be understood that smart attributes 2200 can specify multiple models and model types. For example, a smart attribute 2200 for an attribute can specify both a quantitative model and a qualitative model. Accordingly, such a smart attribute 2200 can be queried to assess both quantitative drivers and qualitative drivers with respect to the subject attribute (e.g., evaluating a store's revenue in terms of not only quantitative drivers such as the sum of revenues for individual products sold by the store but also a qualitative driver such as the number of customer visits).

FIG. 22B shows an example of how a smart attribute 2200 can be used in combination with source data 540 to support driver analysis. In this example, there is a smart attribute 2200 for “profit”, which has an attribute type 340 of “currency”, an attribute name 342 of “profit”, a timeframe 344 of “month”, and expressions 346 of “profit”, “net” (and possibly others). The location 2202 for “profit” is identified as Column C within the source data 540. In this example, source data 540 can be a table or spreadsheet that provides monthly financial information for various store locations (e.g., Column A that provides a store identifier 2252, Column B that provides a store address 2254, Column C that provides a store profit 2256, Column D that provides store revenue 2258, and Column E that provides store expenses 2260). Also, in this example, the smart attribute for profit has a quantitative formula model, via 2204 and 2206, that expresses profit as the difference between revenue and expenses. Because the values of profit are directly specified in Column C of source data 540, the system need not use the model 2204 to compute store profits. However, as indicated above and further elaborated upon below, this profit model does allow the system to readily identify and investigate the drivers of a store's profits. Furthermore, sentiment 2208 is identified to label up as good and down as bad for profit values.

The terms of the specified profit model point to smart attributes 2200 for “revenue” and “expenses” as also shown in FIG. 22B. Thus, if the system wants to assess the drivers of store profit, it can read the profit model 2204 to locate information about the revenue attribute 2200 and expenses attribute 2200, and use this information to locate data values for these attributes to be analyzed as part of the driver investigation.

The smart attribute 2200 for “revenue” has an attribute type 340 of “currency”, an attribute name 342 of “revenue”, a timeframe 344 of “month”, and expressions 346 of “revenue”, “income” (and possibly others). The location 2202 for “revenue” is identified as Column D within the source data 540. Also, in this example, the smart attribute for revenue has a quantitative aggregation model, via 2204 and 2206, that expresses revenue as a sum of component parts (e.g., an aggregation of the revenues attributable to the various products sold by the store). The sentiment 2208 for revenue is that up is good and down is bad.

The smart attribute 2200 for “expenses” has an attribute type 340 of “currency”, an attribute name 342 of “expenses”, a timeframe 344 of “month”, and expressions 346 of “expenses”, “costs” (and possibly others). The location 2202 for “expenses” is identified as Column E within the source data 540. Also, in this example, the smart attribute for expenses has a quantitative aggregation model, via 2204 and 2206, that expresses expenses as a sum of component parts (e.g., an aggregation of the costs attributable to various aspects of store operations, such as employee costs, rent, insurance costs, etc.). The sentiment 2208 for expenses is that up is bad and down is good.

Using these structures, the narrative analytics that support driver analysis can dive into the values for the revenues and expenses of one or more stores within the source data 540 to assess how revenues and expenses have impacted store profits. As a result of such analysis, the system can then draw conclusions such as whether and/or the extent to which increased profits were due to increased revenues and/or decreased expenses.
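The profit example can be sketched in Python as follows. The SmartAttribute class, the encoding of the formula model as signed coefficients, the ONTOLOGY dictionary, and the sample figures are all hypothetical; they only illustrate how a model that names its drivers lets generic analytics find the right columns and attribute a change in profit to revenue and expenses.

```python
from dataclasses import dataclass, field


@dataclass
class SmartAttribute:
    """Hypothetical, simplified stand-in for a smart attribute."""
    name: str
    location: str                    # column holding the values in the source data
    sentiment: str                   # "up_good" or "up_bad"
    model_type: str = ""             # e.g. "formula"
    model: dict = field(default_factory=dict)  # driver name -> signed coefficient


# Profit = Revenue - Expenses, encoded as coefficients +1 and -1.
ONTOLOGY = {
    "profit": SmartAttribute("profit", "C", "up_good", "formula",
                             {"revenue": +1, "expenses": -1}),
    "revenue": SmartAttribute("revenue", "D", "up_good"),
    "expenses": SmartAttribute("expenses", "E", "up_bad"),
}


def driver_contributions(attribute, before, after):
    """Return each driver's signed contribution to the attribute's change.

    `before` and `after` are rows of source data keyed by column letter.
    """
    contributions = {}
    for driver, coeff in ONTOLOGY[attribute].model.items():
        column = ONTOLOGY[driver].location
        contributions[driver] = coeff * (after[column] - before[column])
    return contributions


# Made-up monthly rows for one store.
january = {"C": 20_000, "D": 100_000, "E": 80_000}
february = {"C": 35_000, "D": 110_000, "E": 75_000}
print(driver_contributions("profit", january, february))
```

In this made-up data, the contributions sum to the change in profit, so a narrative could state that roughly two thirds of the increase came from higher revenue and one third from lower expenses.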

Furthermore, it should be understood that the use of attribute models for attributes 2200 within ontology 320 provides opportunities for the narrative analytics to perform deep analyses of data sets. For example, the narrative analytics can conduct not only driver analysis but also a recursive multi-level driver analysis to gain ever deeper insights into the data. For example, the narrative analytics can perform an analysis of the drivers of the drivers (e.g., by using the specified revenue model to assess the drivers of revenue). For example, the driver analysis shown in FIG. 22B can reveal that increased revenues may have been the driver for increased profits, and a further second level analysis into the drivers of revenue might reveal that the driver of increased revenues might have been increased sales for Products X and Y. By leveraging the structure of ontology 320 and the explicit quantitative and/or qualitative models within the attributes 2200, the system would be able to generate a narrative that explains to a reader that increases in sales of Products X and Y were the drivers of an increase in store profits.
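A recursive multi-level driver analysis of this kind might be sketched as below. The MODELS dictionary, the product names, and the change values are invented for illustration, and the sketch assumes simple additive models with signed coefficients.

```python
# Hypothetical additive models: attribute -> {driver: signed coefficient}.
MODELS = {
    "profit": {"revenue": +1, "expenses": -1},
    "revenue": {"product_x_sales": +1, "product_y_sales": +1, "product_z_sales": +1},
}


def explain(attribute, deltas, models=MODELS, depth=0):
    """Walk the driver hierarchy depth-first.

    Returns a list of (depth, driver, contribution) tuples, where the
    contribution is the driver's signed effect on its parent attribute.
    """
    findings = []
    for driver, coeff in models.get(attribute, {}).items():
        findings.append((depth, driver, coeff * deltas[driver]))
        # Recurse: the driver may itself have a model with its own drivers.
        findings.extend(explain(driver, deltas, models, depth + 1))
    return findings


# Made-up period-over-period changes for each attribute.
deltas = {
    "revenue": 12_000, "expenses": 1_000,
    "product_x_sales": 7_000, "product_y_sales": 4_500, "product_z_sales": 500,
}
for depth, driver, contribution in explain("profit", deltas):
    print("  " * depth + f"{driver}: {contribution:+}")
```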

FIG. 22C shows another example of how a smart attribute 2200 can be used in combination with source data 540 to support driver analysis. In this example, the smart attribute 2200 for revenue has a qualitative formula model, via 2204 and 2206, that expresses revenue as being positively influenced by foot traffic and negatively influenced by the number of cold days (e.g., for a store that sells popsicles). In this example, the source data also includes data that identifies the foot traffic 2262 for each store (see Column F) as well as the number of cold days 2264 for each store (see Column G). Because the values of revenue are directly specified in Column D of source data 540, the system need not use the model 2204 to derive values for store revenue. However, as indicated above and further elaborated upon below, this revenue model does allow the system to readily identify and investigate the drivers of a store's revenue.

The terms of the specified revenue model point to direct attributes 330b for “foot traffic” and “cold day count” as also shown in FIG. 22C. Thus, if the system wants to assess the drivers of store revenue, it can read the revenue model 2204 to locate information about the foot traffic attribute 330b and cold day count attribute 330b, and use this information to locate data values for these attributes to be analyzed as part of the driver investigation.

The direct attribute 330b for “foot traffic” has an attribute type 340 of “integer”, an attribute name 342 of “foot traffic”, a timeframe 344 of “month”, and expressions 346 of “foot traffic”, “customer visits” (and possibly others). The location 2202 for “foot traffic” is identified as Column F within the source data 540. The foot traffic attribute may also include a sentiment (not shown) to indicate that up is good and down is bad.

The direct attribute 330b for “cold day count” has an attribute type 340 of “integer”, an attribute name 342 of “cold day count”, a timeframe 344 of “month”, and expressions 346 of “cold days”, “chilly days”, “days of 40 degrees or less” (and possibly others). The location 2202 for “cold day count” is identified as Column G within the source data 540. The cold day count attribute may also include a sentiment (not shown) to indicate that up is bad and down is good.

Using these structures, the narrative analytics that support driver analysis can dive into the values for the foot traffic and cold days with respect to one or more stores within the source data 540 to draw insights such as whether an increase in foot traffic may have led to increased revenue, whether revenue increased despite a drop in foot traffic, whether a cold wave may have contributed to decreased revenues, etc.

It should be understood that FIGS. 22B and 22C show examples only, and that other models can be used, including more complicated models such as complex equations.

To support an understanding of how drivers impact the subject attribute, the smart attribute 2200 can also be associated with analytics that are executed to determine the nature of the relationship between the driver and the attribute. If the model 2204 is a simple quantitative model such as a linear sum or difference or linear product/quotient, then the analytics rules can be relatively simple (larger numbers have larger impacts in linear sums/differences, in both the positive and negative directions; larger numbers in a numerator drive a value up while larger numbers in a denominator drive a value down, etc.).

However, in some instances, particularly with complex formulas, it is not necessarily straightforward how a change in value for a driver will impact a change in value for the subject attribute. To gain such understandings, the system can perform multivariable calculus to draw conclusions about how drivers impact their subject attributes. For example, the narrative analytics can perform a perturbation or sensitivity analysis where the value of the input/driver under consideration is shifted while holding the other input(s)/driver(s) in the model constant to see how these shifts affect the value of the output. In general, the perturbation analysis can shift the input with small changes around the current value.
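A perturbation analysis of this kind can be sketched in Python as a symmetric finite difference around the current value. The function names, the 1% relative step, and the profit margin formula used as the example model are assumptions for illustration only.

```python
def sensitivity(model, inputs, driver, rel_step=0.01):
    """Estimate how the output responds to one driver near its current value.

    The driver is shifted slightly up and down while every other input is
    held constant, and the slope between the two outputs is returned.
    """
    # Use a step relative to the current value; fall back to rel_step at zero.
    step = abs(inputs[driver]) * rel_step or rel_step
    shifted_up = dict(inputs, **{driver: inputs[driver] + step})
    shifted_down = dict(inputs, **{driver: inputs[driver] - step})
    return (model(**shifted_up) - model(**shifted_down)) / (2 * step)


def margin(revenue, expenses):
    """Hypothetical non-linear model: profit margin as a fraction of revenue."""
    return (revenue - expenses) / revenue


inputs = {"revenue": 200.0, "expenses": 150.0}
for name in inputs:
    print(name, sensitivity(margin, inputs, name))
```

With these made-up inputs, the margin responds positively to revenue and negatively to expenses, which is the kind of directional conclusion that is hard to read off a complex formula by inspection.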

In scenarios where the model involves understanding what drove the change in a value, another approach is available. In these scenarios, the system may be designed to iteratively zero out the change in each input and determine how fixing each input value alters the calculated output value.
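The iterative approach can be illustrated with the following Python sketch, which reverts one input at a time to its earlier value and measures how much of the output change disappears. The function name and the sample figures are hypothetical.

```python
def change_attribution(model, before, after):
    """Attribute a change in output to each input by zeroing out its change.

    For each input in turn, the input is held at its 'before' value while
    the other inputs keep their 'after' values; the drop in output relative
    to the full 'after' output is that input's share of the change.
    """
    full_change = model(**after) - model(**before)
    attribution = {}
    for name in after:
        frozen = dict(after, **{name: before[name]})
        attribution[name] = model(**after) - model(**frozen)
    return full_change, attribution


def profit(revenue, expenses):
    return revenue - expenses


before = {"revenue": 100.0, "expenses": 80.0}
after = {"revenue": 110.0, "expenses": 75.0}
print(change_attribution(profit, before, after))
```

For a linear model such as this one the individual shares add up to the full change; for non-linear models they generally will not, because the inputs interact.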

Another technique is to use multivariable calculus to compute the rate of change of the output with respect to different inputs, using a symbolic or numeric equation solver such as Mathematica to directly compute the relevant derivatives. These derivatives can then be used to compute and explain how the values of the drivers affect the values of the attribute.

Further still, the functional relationship identified by model 2204 need not necessarily be of an input/output nature. The functional relationship specified by model 2204 may also be a correlation or anti-correlation relationship. With respect to anti-correlation, the driver and the attribute can be involved in a trade-off. In such a case, the system can also be configured to compute Pareto optimal frontiers to describe this trade-off. To assess correlations and/or anti-correlations, the system can receive inputs from a user regarding two or more attributes to be compared with each other to assess degrees of correlation/anti-correlation. Thresholds can be used to govern the levels of correlation or anti-correlation that are needed for two attributes to be judged correlated or anti-correlated (e.g., correlation coefficients above or below a specified value). However, it should be understood that the system can also be configured to automatically detect attributes that are correlated and/or anti-correlated by systematically cycling through multiple permutations of attributes within ontology 320 and computing correlation/anti-correlation scores for each. Then, the smart attribute structure 2200 for an attribute can be updated to identify other attributes within the ontology 320 with which it is correlated/anti-correlated. With such an approach, it may be desirable to employ a secondary classification with such assignments to allow users to remove correlation/anti-correlation assignments that may not be helpful with respect to narrative generation (such as flagging the revenue attribute 2200 as correlated with the profits attribute, which might be misinterpreted to mean that profits are a driver of revenue when it is the reverse that is true).
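Automatic detection of correlated and anti-correlated attributes might look like the Python sketch below. It uses the Pearson correlation coefficient and a threshold of 0.8, both of which are assumptions; the text does not name a specific score or cut-off. Unordered pairs are used because the coefficient is symmetric, and the column values are made up.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric series.

    Assumes neither series is constant (a constant series has zero variance
    and would cause a division by zero).
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def find_relationships(columns, threshold=0.8):
    """Label every attribute pair whose coefficient clears the threshold."""
    found = {}
    for a, b in combinations(columns, 2):
        r = pearson(columns[a], columns[b])
        if r >= threshold:
            found[(a, b)] = "correlated"
        elif r <= -threshold:
            found[(a, b)] = "anti-correlated"
    return found


# Made-up monthly values for one store.
columns = {
    "foot_traffic": [100, 120, 140, 160],
    "revenue": [10, 12, 15, 16],
    "cold_days": [8, 6, 3, 1],
}
print(find_relationships(columns))
```

The resulting labels could then be reviewed by a user before being written into the attribute structures, in line with the secondary classification mentioned above.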

User interfaces (for example, structured GUIs) can be used to permit users to control the content of smart attribute data structures 2200. For example, through such a user interface, a user can define the models 2204 and model types 2206 used by smart attributes 2200. Furthermore, the user can also define the sentiment data 2208. However, as indicated above, the models/model types 2204/2206 could also be learned automatically via statistical and other techniques.

FIG. 23 depicts an example process flow that shows how the smart attributes 2200 can be leveraged to support driver analysis. At step 2300, a processor determines whether the narrative analytics to be executed call for some level of driver analysis with respect to an attribute. If so, the process flow proceeds to step 2302. An example of narrative analytics that may call for driver analysis can be the narrative analytics associated with an “explain” communication goal. However, it should be understood that other communication goals may find driver analysis helpful. For example, the models 2204 could also be used to support communication goals relating to prediction and/or recommendation. For example, models 2204 based on perturbation or sensitivity analysis can be used to come up with recommendations in response to an inquiry such as “How can I increase the value of Attribute X?” or with predictions such as “What would likely happen to my revenue if there are 6 cold days next month?”. As such, communication goals relating to predictions and recommendations may also call for driver analysis.

At step 2302, a processor analyzes the ontology 320 to determine whether the subject attribute has an attribute model 2204. If so, the process flow proceeds to step 2304, where a processor determines one or more drivers from the attribute model 2204. Upon determination of the driver(s), the processor can access the ontology mappings to identify and access the data for the driver(s) (step 2306) (see, for example, the linkages into source data 540 shown by FIGS. 22B and 22C). Thereafter, at step 2308, the processor can perform a variety of analytics on the accessed driver data. These analytics can be analytics that support communication goals such as “explain”, “predict”, and/or “recommend”, etc.
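The process flow of steps 2300 through 2308 can be sketched in Python as shown below. The dictionary-based ontology, the column-letter row format, and the use of a simple first-to-last change as the step 2308 analytic are all placeholders assumed for illustration.

```python
def run_driver_analysis(attribute, ontology, source_rows, needs_driver_analysis=True):
    """Illustrative walk through the steps of the driver analysis flow."""
    # Step 2300: do the narrative analytics call for driver analysis at all?
    if not needs_driver_analysis:
        return None
    # Step 2302: does the subject attribute have an attribute model?
    model = ontology.get(attribute, {}).get("model")
    if not model:
        return None
    # Step 2304: determine the drivers from the attribute model.
    drivers = list(model)
    # Step 2306: use each driver's mapping to pull its values from the source data.
    driver_data = {
        driver: [row[ontology[driver]["location"]] for row in source_rows]
        for driver in drivers
    }
    # Step 2308: run analytics on the driver data (placeholder: net change).
    return {driver: values[-1] - values[0] for driver, values in driver_data.items()}


# Hypothetical ontology and two monthly rows keyed by column letter.
ontology = {
    "profit": {"location": "C", "model": {"revenue": +1, "expenses": -1}},
    "revenue": {"location": "D"},
    "expenses": {"location": "E"},
}
rows = [{"C": 20, "D": 100, "E": 80}, {"C": 35, "D": 110, "E": 75}]
print(run_driver_analysis("profit", ontology, rows))
```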

Example Embodiments for “Explain” Communication Goal Statements:

As mentioned above, an operator such as “Explain” can be used to identify a communication goal statement corresponding to an explanation communication goal. An example of a base communication goal statement for an explanation communication goal that could be supported by the system is “Explain (a Value of) an Attribute (of an Entity or Entity Group) (in a Timeframe)” (which can be labeled in shorthand as “Explain a Value”), where “Attribute” serves as a parameter for an attribute of the specified (or understood) “Entity” in the ontology 320 within a specified (or understood) “Timeframe” in the ontology 320. Such a base communication goal statement could be parameterized into a communication goal statement as “Explain the Profit of the Store in the Month”, where the Attribute is specified as “Profit” and where the entity or entity group is specified as “Store”. However, it should be understood that such a base communication goal statement could be parameterized in any of a number of different ways. Further still, it should be understood that different base communication goal statements could be used to satisfy other explanation-related communication goals, some examples of which are discussed below.

The system can link a base communication goal statement of “Explain an Attribute of an Entity” with narrative analytics 510 that are linked to a story structure that aims to provide the reader with an understanding of why an attribute has the value that it does. As discussed above, these narrative analytics 510 can perform driver analysis to gain an understanding of what the contributing and/or inhibiting factors with respect to the attribute's value are. Accomplishing this may involve expressing a variety of ideas that are characterizations of the data including the drivers, such as which drivers are the “biggest contributor(s)”, whether there was a “great team effort” (e.g., lots of drivers making similar positive contributions), whether there was a “wash” situation (e.g., Driver 1 went up but Driver 2 went down and they largely canceled each other out), whether there was a “held back” situation (e.g., there was a big contribution by a positive driver, but lots of small contributions by negative drivers held the subject value down), etc. Accordingly, if desired by a practitioner, the system can directly map such a communication goal statement to parameterized narrative analytics and a parameterized story configuration that will express these concepts. However, the use of a conditional outcome framework 1500 by the relevant narrative analytics can provide additional flexibility where the resulting narrative story structure will adapt as a function of not only the specified communication goal but also as a function of the underlying data.

FIG. 24A discloses an example embodiment for a conditional outcome framework that can be used by the narrative analytics 510 associated with a communication goal statement 390 for “Explain a Value”. In this example, the conditional outcome framework can employ multiple levels or layers of outcomes 1502 that serve as driver type characterization logic 2450 used by supporting analytics 1506. The driver type characterization logic 2450 can be configured to precisely categorize the model type data 2406 associated with the subject attribute, whereupon this categorization will control the type of ideas 1504 that will be considered and/or presented with respect to the narrative generation process for “Explain a Value”. For example, the logic 2450 can be configured to assess whether the model type 2406 corresponds to a formula, aggregation, or influencer(s). If the model type 2406 is a formula, the logic 2450 can also determine whether the formula is a complex formula or a pure sum formula (as governed by various predefined parameters applied to the formula in question or by metadata within the smart attribute structure 2200). If the formula is a pure sum formula, the logic 2450 can further categorize the pure sum formula based on how many operands are included in the pure sum formula. If the model type 2406 is an aggregation, the logic 2450 can also determine the size of the aggregated group (e.g., how many members are parts of the aggregation) and classify the aggregation accordingly. An aggregation can be distinguished from a pure sum because an aggregation works over a group. For example, an aggregation can be “the total bookings of all salespeople”, which can be modeled by summing the bookings of each member of the group “salespeople”. Another example of an aggregation can be “the average salary of people in the neighborhood”, which can be modeled as the average of the salary values for each member of the group “people in the neighborhood”. Accordingly, it should also be understood that aggregations can be values other than sums; for example, averages, medians, standard deviations, maximums, and minimums can be aggregations. By contrast, a pure sum has fixed operands with no group involved. An example of a pure sum can be “total costs=operating costs+cost of goods+salaries”, where that calculation will always have three operands. The model type 2206 can identify whether a corresponding model 2204 is an aggregation or pure sum, and this model type can be specified in response to user input when a smart attribute is created, or it could be determined via an automated process that classifies models based on their content (e.g., determining whether a group is present in the model 2204).
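The layered categorization performed by the driver type characterization logic can be sketched in Python as follows. The function signature, the string labels for model types, and in particular the cut-off of three members between a "very small" and a "decent-sized" group are assumptions for illustration; the text does not give numeric group size boundaries.

```python
def characterize_model(model_type, is_pure_sum=False, operand_count=0,
                       group_size=0, small_group_max=3):
    """Map a model type plus a few properties of the model to an outcome label."""
    if model_type == "formula":
        if not is_pure_sum:
            return "Complex Formula"
        if operand_count < 3:
            return "Pure Sum Formula (Less than 3 Operands)"
        return "Pure Sum Formula (3 or More Operands)"
    if model_type == "aggregation":
        if group_size == 0:
            return "Aggregation (Empty Group)"
        if group_size <= small_group_max:   # assumed boundary
            return "Aggregation (Very Small Group)"
        return "Aggregation (Decent-Sized Group)"
    if model_type == "influencers":
        return "Influencers"
    raise ValueError(f"unknown model type: {model_type!r}")


# "total costs = operating costs + cost of goods + salaries" has three operands.
print(characterize_model("formula", is_pure_sum=True, operand_count=3))
# "the total bookings of all salespeople" over a group of 40 salespeople.
print(characterize_model("aggregation", group_size=40))
```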

In FIG. 24A, various outcomes 1502 are linked to one or more idea data structures 1504. Thus, the resolution of which ideas should be expressed in a given narrative that is generated to satisfy the communication goal statement 390 will depend on which outcomes 1502 were deemed true in view of the underlying data. The relationships between ideas for expression in a narrative to the nature of the underlying data in this example can be seen in the table below:


Outcome of Characterizing the Underlying Data	Ideas to be Expressed in the Narrative About the Underlying Data
Complex Formula	Narrative should express the following ideas: the value for the attribute; the names and values of the drivers for the attribute.
Pure Sum Formula (Less than 3 Operands)	Narrative should express the following ideas: the value for the attribute; the names and values of the drivers for the attribute.
Pure Sum Formula (3 or More Operands)	Narrative should express the following ideas: the value for the attribute; the names and values of the most positive drivers for the attribute; the names and values for the most negative drivers of the attribute.
Aggregation (Decent-Sized Group)	Narrative should express the following ideas: the value for the attribute; the names and values of the most positive drivers for the attribute; the names and values for the most negative drivers of the attribute.
Aggregation (Very Small Group)	Narrative should express the following ideas: the value for the attribute; the names and values of the drivers for the attribute.
Aggregation (Empty Group)	Narrative should express the following idea: that the group is empty.
Influencers	Narrative should express the following ideas: the value for the attribute; the names and values of the influencers for the attribute.

To the extent that any of the ideas 1504 need additional computed values in order to be expressed (where such values were not previously computed by analytics 2450), the supporting analytics 1506 can further include idea support analytics 2452. For example, if the analytics 2450 do not compute or retrieve the names and/or values for the drivers, the idea support analytics 2452 can include parameterized logic that computes or retrieves such information.
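The characterization logic and the outcome-to-idea table above can be sketched as follows. This is a minimal illustration only, not the patented implementation: the names `AttributeModel`, `characterize`, `IDEAS_BY_OUTCOME`, and `ideas_for`, the outcome labels, and the `VERY_SMALL_GROUP_MAX` boundary are all hypothetical, and the sketch assumes the pure-sum flag and group size are already available as model metadata.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical boundary; the source leaves group-size thresholds to the practitioner.
VERY_SMALL_GROUP_MAX = 3


@dataclass
class AttributeModel:
    model_type: str                    # "formula", "aggregation", or "influencers"
    operands: Tuple[str, ...] = ()     # operand names (formulas only)
    is_pure_sum: bool = False          # assumed to be available as model metadata
    group_size: Optional[int] = None   # member count (aggregations only)


def characterize(model: AttributeModel) -> str:
    """Return an outcome label for the model, following the rows of the table."""
    if model.model_type == "formula":
        if not model.is_pure_sum:
            return "complex_formula"
        return "pure_sum_small" if len(model.operands) < 3 else "pure_sum_large"
    if model.model_type == "aggregation":
        if not model.group_size:       # None or 0 members
            return "aggregation_empty"
        if model.group_size <= VERY_SMALL_GROUP_MAX:
            return "aggregation_very_small"
        return "aggregation_decent"
    if model.model_type == "influencers":
        return "influencers"
    raise ValueError(f"unknown model type: {model.model_type!r}")


# Each outcome is linked to the ideas a narrative should express.
IDEAS_BY_OUTCOME = {
    "complex_formula": ["attribute_value", "all_drivers"],
    "pure_sum_small": ["attribute_value", "all_drivers"],
    "pure_sum_large": ["attribute_value", "most_positive_drivers", "most_negative_drivers"],
    "aggregation_decent": ["attribute_value", "most_positive_drivers", "most_negative_drivers"],
    "aggregation_very_small": ["attribute_value", "all_drivers"],
    "aggregation_empty": ["group_is_empty"],
    "influencers": ["attribute_value", "all_influencers"],
}


def ideas_for(model: AttributeModel) -> List[str]:
    """Look up the ideas linked to the outcome that holds for this model."""
    return IDEAS_BY_OUTCOME[characterize(model)]
```

Under these assumptions, a model for “Profit=Revenue−Expenses” flagged as a pure sum with two operands resolves to the “all drivers” ideas, while a six-operand pure sum resolves to the most positive and most negative driver ideas.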

In example embodiments, the conditional outcome framework can be designed so that it does not need any input or configuration from a user other than what is used to compose the communication goal statement 390 (e.g., for the “Explain a Value” communication goal statement, the system would only need to know the specified attribute and the entity for that attribute plus any applicable timeframe). However, for other example embodiments, a practitioner might want to expose some of the parameters of the conditional outcome framework to users to allow further configurations or adjustments of the conditional outcome framework.

For example, a practitioner might want to implement the thresholds used within the conditional outcome framework as user-defined values. In the context of FIG. 24A, this could involve exposing the thresholds used for characterizing the size of the aggregation group to users so that a user can adjust the group size boundaries in a desired manner (e.g., in some contexts, a large group might have a minimum of 100 members, while in other contexts a large group might have a minimum of 1000 members). Similarly, the thresholds for how many drivers are included in the groups “the most positive drivers” and “the most negative drivers” could be exposed to users to allow adjustments.
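One way to picture such user-adjustable thresholds is a small settings object consulted by the analytics. The sketch below is illustrative only; `FrameworkSettings`, its field names, and its default values are assumptions rather than anything specified in this description.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class FrameworkSettings:
    """Hypothetical user-adjustable thresholds for the conditional outcome framework."""
    large_group_min: int = 100   # minimum member count for a "large" group
    top_driver_count: int = 2    # size of the "most positive"/"most negative" groups


def is_large_group(member_count: int, settings: FrameworkSettings) -> bool:
    """Apply the user-configured group size boundary."""
    return member_count >= settings.large_group_min


def most_positive_and_negative(
    drivers: Dict[str, float], settings: FrameworkSettings
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Pick the top-N positive and top-N negative drivers by value."""
    ranked = sorted(drivers.items(), key=lambda kv: kv[1], reverse=True)
    positive = [(n, v) for n, v in ranked if v > 0][: settings.top_driver_count]
    # Walk the ranking from the bottom so the most negative driver comes first.
    negative = [(n, v) for n, v in reversed(ranked) if v < 0][: settings.top_driver_count]
    return positive, negative
```

With invented expense values in which every driver is positive, the negative group comes back empty, which mirrors the fixed-expenses example discussed below.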

As another example, a practitioner might want to provide users with a capability to enable/disable the links between outcomes 1502 and ideas 1504 in a conditional outcome framework. For example, a GUI could present a user with lists of all of the outcomes 1502 and ideas 1504 that can be tied to a communication goal statement within a conditional outcome framework. The user could then individually select which ideas 1504 are to be linked to which outcomes 1502. If desired by a practitioner, the conditional outcome framework can include default linkages that are presented in the GUI, and the user could make adjustments from there.
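The default linkages plus user adjustments could be modeled as a mapping from outcomes to sets of ideas, with user selections applied as overrides. This is a hedged sketch: `DEFAULT_LINKS`, `apply_user_overrides`, and the outcome and idea labels are hypothetical names chosen for illustration.

```python
from typing import Dict, Set, Tuple

# Hypothetical default linkages between outcomes and ideas.
DEFAULT_LINKS: Dict[str, Set[str]] = {
    "pure_sum_large": {"attribute_value", "most_positive_drivers", "most_negative_drivers"},
    "aggregation_empty": {"group_is_empty"},
}


def apply_user_overrides(
    default_links: Dict[str, Set[str]],
    overrides: Dict[Tuple[str, str], bool],
) -> Dict[str, Set[str]]:
    """Return a new linkage map; overrides maps (outcome, idea) to enabled/disabled."""
    # Copy the sets so the defaults stay untouched.
    links = {outcome: set(ideas) for outcome, ideas in default_links.items()}
    for (outcome, idea), enabled in overrides.items():
        ideas = links.setdefault(outcome, set())
        if enabled:
            ideas.add(idea)
        else:
            ideas.discard(idea)
    return links
```

Keeping the defaults immutable and deriving a per-user map makes it straightforward to reset a user's adjustments.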

FIG. 24A shows an example of a narrative 2402 that can be generated using the conditional outcome framework of FIG. 24A as applied to a communication goal statement 2400 of “Explain the Profit of the Store in the Month” with respect to a data set such as the ones shown in FIGS. 22B and 22C, and where the attribute model/model type 2204/2206 is a pure sum formula where “Profit=Revenue−Expenses”. In this example, the narrative 2402 would be generated after an analysis of the data set arrived at a determination that the outcomes 2404 were true (the model/model type 2204/2206 for “profit” is a pure sum formula with less than 3 operands). As can be seen in FIG. 24A, the narrative text 2402 expresses the following ideas 2406 that are tied to the outcomes 2404: (1) an identification of the value for the store's profit, and (2) the names and values for the store's profit drivers (revenue and expenses).

FIG. 24B shows an example of a narrative 2412 that can be generated using the conditional outcome framework of FIGS. 24A and 24B as applied to a communication goal statement 1810 of “Explain the fixed expenses of the person in the month” with respect to a data set that includes various people and data about their various expenses, and where the attribute model/model type 2204/2206 for the “fixed expenses” is a pure sum formula where “fixed expenses=rent+car payment+gas+electricity+internet+cell phone”. In this example, the narrative 2412 would be generated after an analysis of the data set arrived at a determination that the outcomes 2414 were true (the model/model type 2204/2206 for “fixed expenses” is a pure sum formula with more than 3 operands). As can be seen in FIG. 24B, the narrative text 2412 expresses the following ideas 2416 that are tied to the outcomes 2414: (1) an identification of the value for the person's fixed expenses, (2) the names and values for the person's two largest expense drivers (rent and car payments), and (3) the names and values for the person's most negative drivers (which in this case is an empty set).

FIG. 24C shows an example of a narrative 2422 that can be generated using the conditional outcome framework of FIGS. 24A-C as applied to a communication goal statement 1810 of “Explain the mpg of the car in the week” with respect to a data set that includes weekly data values for miles traveled and gallons consumed by a car, and where the attribute model/model type 2204/2206 for the “mpg” is a complex formula where “mpg=miles traveled/gallons consumed”. In this example, the narrative 2422 would be generated after an analysis of the data set arrived at a determination that the outcomes 2424 were true (the model/model type 2204/2206 for “mpg” is a complex formula). As can be seen in FIG. 24C, the narrative text 2422 expresses the following ideas 2426 that are tied to the outcomes 2424: (1) an identification of the value for the car's miles per gallon, and (2) the names and values for the car's mpg drivers (miles traveled and gallons consumed).

FIG. 24D shows an example of a narrative 2432 that can be generated using the conditional outcome framework of FIGS. 24A-D as applied to a communication goal statement 1810 of “Explain the profits of the company in the year” with respect to a data set that includes data that describes the company's profits in various regions, and where the attribute model/model type 2204/2206 for “profits” is an aggregation where “profits=sum(profits in each region)”. In this example, the narrative 2432 would be generated after an analysis of the data set arrived at a determination that the outcomes 2434 were true (the model/model type 2204/2206 for “profits” is an aggregation with a decent-sized group). As can be seen in FIG. 24D, the narrative text 2432 expresses the following ideas 2436 that are tied to the outcomes 2434: (1) an identification of the value for the company's profits, (2) the names and values for the regions which were the most positive drivers of profit, and (3) regions which were the most negative drivers of profit. In this example, there are two regions in each group (most positive and most negative). As indicated above, this size can be pre-set within the analytics or it can be derived as a function of the data.

FIG. 24E shows an example of a narrative 2442 that can be generated using the conditional outcome framework of FIGS. 24A-E as applied to a communication goal statement 1810 of “Explain the sales of the store in the quarter” with respect to a data set that includes data that describes various forms of store data, and where the attribute model/model type 2204/2206 for “sales” is an influencer model where foot traffic and in-store promotions are positive influencers of sales and where days with inclement weather is a negative influencer of sales. In this example, the narrative 2442 would be generated after an analysis of the data set arrived at a determination that the outcomes 2444 were true (the model/model type 2204/2206 for “sales” is an influencer model). As can be seen in FIG. 24E, the narrative text 2442 expresses the following ideas 2446 that are tied to the outcomes 2444: (1) an identification of the value for the store's sales, and (2) the names and values for the store's sales influencers (foot traffic, in-store promotions, and days of inclement weather).

FIGS. 24A-E thus show how the same parameterized conditional outcome framework can be used to generate narrative stories across different content verticals (e.g., a story about store profits as in FIG. 24A versus a story about car mileage efficiency as in FIG. 24C), which demonstrates how the parameterized conditional outcome framework provides an effective technical solution to the technical problem of horizontal scalability in the NLG arts.

It should be understood that the system can also be designed to support other “explain” communication goals. For example, another base communication goal statement that can be used by the system can be “Explain the Change in (a Value of) an Attribute (of an Entity or Entity Group) (over a Timeframe)” (which can be labeled in shorthand as “Explain a Change in a Value”). Such a goal can produce ideas that capture a variety of understandings such as which drivers gained or lost significantly (even if not necessarily the biggest magnitude driver), how main drivers may have changed over time, how the group size of the main drivers may have changed over time, etc. FIG. 25A depicts an example of various ideas that can be learned and presented by a narrative generation system with respect to a communication goal of “Explain a Change in Value” with respect to an example data set for store profits and drivers A-F. Accordingly, it should be understood that it may be desirable for the narratives produced in response to the “Explain a Change in a Value” communication goal statement to express different ideas than the narratives produced in response to the “Explain a Value” communication goal statement.

FIG. 25B discloses an example embodiment for a conditional outcome framework that can be used by the narrative analytics 510 associated with a communication goal statement 390 for “Explain the change in value” (where the relevant time frame can be either a default time frame, system-determined time frame, or user-determined time frame). In this example, the framework includes attribute change analytics 2550 that compute the changes/deltas in the specified attribute values (including the driver attributes) over the relevant time period. These deltas can then be used by the conditional outcome framework to identify ideas for possible expression in a narrative story. In this example, the attribute change analytics 2550 include a first level 2552 of conditional outcomes 1502 relating to changes in value for the subject attribute (store profits) and a second level 2554 of conditional outcomes relating to changes in value for the drivers of the subject attribute. For example, the first level 2552 can include analytics that determine whether the value of the subject attribute changed over the relevant time frame (which may include some thresholding to eliminate insignificant changes in value; e.g., changes of 2% or less could be deemed “no change”). Examples of analytics in the second level 2554 can include analytics that are configured to determine (1) which driver values changed the most over the relevant time frame, (2) whether any of the drivers were the main drivers of change for the subject attribute and/or drowned out the other drivers, (3) whether the changes in driver values effectively canceled each other out, and (4) whether the mix of significant drivers changed over the relevant time frame.
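The two levels of change analytics can be sketched roughly as follows. This is an illustrative approximation, not the disclosed logic: `analyze_change`, `percent_change`, and the `MAIN_DRIVER_SHARE` cutoff are hypothetical, the 2% threshold is taken from the example above, and the check on whether the mix of significant drivers changed is omitted.

```python
from typing import Dict, Optional

NO_CHANGE_THRESHOLD = 0.02  # from the example above: 2% or less is "no change"
MAIN_DRIVER_SHARE = 0.5     # hypothetical cutoff for calling one driver "the main driver"


def percent_change(old: float, new: float) -> float:
    """Relative change from old to new; a move away from zero counts as infinite."""
    if old == 0:
        return 0.0 if new == 0 else float("inf")
    return (new - old) / abs(old)


def analyze_change(
    attr_old: float,
    attr_new: float,
    drivers_old: Dict[str, float],
    drivers_new: Dict[str, float],
) -> dict:
    """First level: did the subject attribute change? Second level: what did the drivers do?"""
    deltas = {name: drivers_new[name] - drivers_old[name] for name in drivers_old}
    attribute_changed = abs(percent_change(attr_old, attr_new)) > NO_CHANGE_THRESHOLD

    # Driver whose value moved the most, in either direction.
    biggest_mover: Optional[str] = (
        max(deltas, key=lambda n: abs(deltas[n])) if deltas else None
    )
    total_movement = sum(abs(d) for d in deltas.values())

    # A single driver "drowns out" the rest if it accounts for most of the movement.
    main_driver = None
    if attribute_changed and biggest_mover is not None and total_movement > 0:
        if abs(deltas[biggest_mover]) / total_movement > MAIN_DRIVER_SHARE:
            main_driver = biggest_mover

    # Drivers moved, but the subject attribute effectively did not.
    canceled_out = (not attribute_changed) and total_movement > 0

    return {
        "attribute_changed": attribute_changed,
        "deltas": deltas,
        "biggest_mover": biggest_mover,
        "main_driver": main_driver,
        "canceled_out": canceled_out,
    }
```

For instance, with invented figures where revenue rises from 1000 to 1300 and expenses from 600 to 620, profit changes and revenue is reported as the main driver; when two regional profits move by equal and opposite amounts, the sketch reports that the changes canceled out.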

FIG. 25B shows an example of a narrative 2502 that can be generated using the conditional outcome framework shown by the upper portion of FIG. 25A as applied to a communication goal statement 390 of “Explain the change in the profit of the store between the previous month and the month” (where the relevant time frame is user-defined as previous month-to-current month) with respect to a data set that includes profits, revenues, and expenses for a store over time, and where the attribute model/model type 2204/2206 is a pure sum formula where “Profit=Revenue−Expenses”. In this example, the narrative 2502 would be generated after an analysis of the data set arrived at a determination that the outcomes 2504 were true (the model/model type 2204/2206 for “profit” is a pure sum formula, where the store profit changed over the timeframe, and where one driver was the main driver for this change in store profits). As can be seen in FIG. 25B, the narrative text 2502 expresses the following ideas 2506 that are tied to the outcomes 2504: (1) an identification of the value for the store's profit for the first month of the time frame, (2) an identification of the value for the store's profit for the last month of the time frame, (3) an identification of the value of the change in the store profits from the previous month to the current month, (4) an identification of the driver that drove the change in store profits, and (5) a description of the change and change direction for this driver over the timeframe.

FIG. 25C shows an example of a narrative 2512 that can be generated using the conditional outcome framework shown by the upper portion of FIGS. 25A and 25B as applied to a communication goal statement 390 of “Explain the change in profits of the company between last year and this year” (where the relevant time frame is user-defined as previous year-to-current year) with respect to a data set that includes data that describes the company's profits in various regions, and where the attribute model/model type 2204/2206 for “profits” is an aggregation where “profits=sum(profits in each region)”. In this example, the narrative 2512 would be generated after an analysis of the data set arrived at a determination that the outcomes 2514 were true (the model/model type 2204/2206 for “profit” is an aggregation, where the company profits did not change over the timeframe, and where the changes in various drivers of company profits canceled each other out). As can be seen in FIG. 25C, the narrative text 2512 expresses the following ideas 2516 that are tied to the outcomes 2514: (1) an identification of the value for the company's profits at the end of the timeframe, (2) an identification that the changes in the drivers canceled each other so as to result in no change in profits over the timeframe, (3) an identification of the driver with the biggest positive change in direction (and the values for this change), and (4) an identification of the driver with the biggest negative change in direction (and the values for this change).

FIG. 25D shows an example of a narrative 2522 that can be generated using the conditional outcome framework shown by the upper portion of FIGS. 25A-C as applied to a communication goal statement 390 of “Explain the change in sales of the store between last week and this week” (where the relevant time frame is user-defined as previous week-to-current week) with respect to a data set that includes data that describes various forms of store data, and where the attribute model/model type 2204/2206 for “sales” is an influencer model where foot traffic and in-store promotions are positive influencers of sales and where days with inclement weather is a negative influencer of sales. In this example, the narrative 2522 would be generated after an analysis of the data set arrived at a determination that the outcomes 2524 were true (the model/model type 2204/2206 for “sales” is an influencer model, and where the store sales changed over the timeframe). As can be seen in FIG. 25D, the narrative text 2522 expresses the following ideas 2526 that are tied to the outcomes 2524: (1) an identification of the value for the store's sales for the first week of the time frame, (2) an identification of the value for the store's sales for the last week of the time frame, (3) an identification of the value of the change in the store sales from the previous week to the current week, (4) an identification of the influencer driver with the biggest change in the same direction as the change in store sales (and the values for this change), and (5) an identification of the influencer driver with the biggest change in the opposite direction of the change in store sales (and the values for this change).

Furthermore, it should be understood that the narrative analytics tied to “Explain” communication goals can be executed recursively to analyze and assess things such as drivers of drivers. For example, as shown in FIGS. 26A and 26B, one or more of the ideas 1506 in the conditional outcome framework associated with an “explain” communication goal can include a feedback path 2650 for a recursive traversal of the conditional outcome framework using a new communication goal statement that includes one or more attributes from the subject idea 1506 in place of the attribute from the prior pass. FIGS. 26A and 26B show an example where the system employs two passes through the conditional outcome framework to perform not only driver analysis with respect to the subject attribute, but also a drivers of drivers analysis.

FIG. 26A shows an example first pass through such a conditional outcome framework with respect to a communication goal statement 390 of “Explain the change in profits of the company between last year and this year” (where the relevant time frame is user-defined as previous year-to-current year) with respect to a data set that includes data that describes the company's profits in various regions, and where the attribute model/model type 2204/2206 for “profits” is an aggregation where “profits=sum(profits in each region)”. In this example, analysis of the data set arrives at a determination that the outcomes 2604 are true (the model/model type 2204/2206 for “profit” is an aggregation, where the company profits changed over the timeframe, and where one driver drove this change in company profits). As can be seen in FIG. 26A, one of the ideas 2606 that results from such analysis is an idea that includes a feedback path 2650 (the idea for “biggest change in direction of overall change”).

Thus, via feedback path 2650, the system performs a second pass through the conditional outcome framework, as shown in FIG. 26B. With this second pass, the communication goal statement that is used is “Explain the change in value of the profit for the Asia region between last year and this year” (where the Asia region's profits serve as the driver of company profits that had the biggest change in the same direction as the overall change for the company's profits). The attribute model/model type 2204/2206 for regional profits is an aggregation of profits for each country in the subject region. In this example, after the second pass, the system would conclude that outcomes 2614 were true (the model/model type 2204/2206 for “regional profit” is an aggregation, where the regional profits changed over the timeframe, and where most of the drivers of regional profits changed during the time frame). As can be seen in FIG. 26B, the narrative text 2602 expresses the following ideas

It should be understood that FIGS. 26A and 26B are examples only, and that the recursive nature of the narrative analytics tied to “Explain” communication goals need not be limited to only two passes. For example, the analytics could be configured to recursively analyze drivers so long as further drill downs are available for drivers. As another example, a user-defined input can control the depth of recursiveness. Moreover, the system could define a default level of recursiveness if multiple levels of recursion are available. Also, while FIGS. 26A and 26B show a recursive conditional outcome framework with respect to an “Explain the Change in Value” communication goal, it should be understood that the conditional outcome frameworks for other “explain” communication goals could also be made recursive (such as the frameworks shown in FIGS. 24A-E with respect to the “Explain a Value” communication goal).
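The recursive “drivers of drivers” traversal with a configurable depth can be sketched as below. This is a simplified illustration under stated assumptions: `explain_change`, the `drivers_of` and `delta_of` lookups, and the `max_depth` parameter are hypothetical names, and each pass follows only the driver with the biggest change in the same direction as the overall change.

```python
from typing import Dict, List


def explain_change(
    attribute: str,
    drivers_of: Dict[str, List[str]],   # attribute -> its driver attributes
    delta_of: Dict[str, float],         # attribute -> change over the time frame
    max_depth: int = 2,                 # number of passes (user-defined or default)
    depth: int = 1,
) -> dict:
    """Explain a change, then re-enter the framework for the main driver."""
    result = {"attribute": attribute, "delta": delta_of[attribute], "main_driver": None}
    overall = delta_of[attribute]

    # Candidate drivers that moved in the same direction as the overall change.
    candidates = [d for d in drivers_of.get(attribute, []) if delta_of[d] * overall > 0]
    if not candidates:
        return result  # no further drill down available

    top = max(candidates, key=lambda d: abs(delta_of[d]))
    if depth < max_depth:
        # Feedback path: a new pass with the driver as the subject attribute.
        result["main_driver"] = explain_change(top, drivers_of, delta_of, max_depth, depth + 1)
    else:
        result["main_driver"] = {"attribute": top, "delta": delta_of[top], "main_driver": None}
    return result
```

With `max_depth=2` and invented data, a first pass over company profits identifies a region, and a second pass over that region identifies a country; the recursion also stops early whenever an attribute has no drivers to drill into.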

Live Story Editing:

Another innovative feature that may be included in a narrative generation platform is an editing feature whereby a user can use a story outline comprising one or more composed communication goal statements and an ontology to generate a narrative story from source data, where the narrative story can be reviewed and edited in a manner that results in automated adjustments to the narrative generation AI. For example, an author using the system in an editing mode can cause the system to generate a test narrative story from the source data using one or more composed communication goal statements and a related ontology. The author can then review the resulting test narrative story to assess whether the story was rendered correctly and whether any edits should be made. As an example, the author may decide that a different expression for an entity would work better in the story than the expression that was chosen by the system (e.g., the author may decide that a characterization expressed as “slow growth” in the narrative story would be better expressed as “sluggish growth”). The user can directly edit the text of the narrative story using text editing techniques (e.g., selecting and deleting the word “slow” and typing in the word “sluggish” in its place). Upon detecting this edit, the system can automatically update the ontology 320 to modify the subject characterization object 332 by adding “sluggish growth” to the expression(s) 364 for that characterization (and optionally removing the “slow growth” expression).

To accomplish this, words in the resultant test narrative story can be linked with the objects from ontology 320 that these words express. Further still, sentences and clauses can be associated with the communication goal statements that they serve. In this fashion, direct edits on words, clauses, and sentences by an author on the test narrative story can be traced back to their source ontological objects and communication goal statements.

Through the automated changes to the ontology 320 and/or story outline, the system can quickly adjust its story generation capabilities to reflect the desires of the author. Thus, during a subsequent execution of the story generation process, the system can use the updated ontology 320 and/or story outline to control the narrative generation process.
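The tracing of an edit back to its source ontological object can be pictured with a small sketch. The classes and function here (`Characterization`, `Span`, `apply_edit`) are hypothetical stand-ins for the ontology objects and linked story text described above, not the platform's actual data structures.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Characterization:
    """Stand-in for an ontology characterization object with its expressions."""
    name: str
    expressions: List[str] = field(default_factory=list)


@dataclass
class Span:
    """A piece of story text linked to the ontology object it expresses."""
    text: str
    source: Optional[Characterization] = None


def apply_edit(span: Span, new_text: str, remove_old: bool = False) -> None:
    """Apply an author's edit to the story and propagate it to the ontology."""
    old_text = span.text
    span.text = new_text
    obj = span.source
    if obj is None:
        return  # plain text with no ontology link; nothing to update
    if new_text not in obj.expressions:
        obj.expressions.append(new_text)
    if remove_old and old_text in obj.expressions:
        obj.expressions.remove(old_text)
```

After an edit from “slow growth” to “sluggish growth”, the characterization carries the new expression, so a later story generation run could choose it.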

FIGS. 256-278 and their supporting description in Appendix A describe aspects of such editing and other review features that can be included in an example embodiment of a narrative generation platform. Appendix A also describes a number of other aspects that may be included in example embodiments of a narrative generation platform.

While the invention has been described above in relation to its example embodiments, various modifications may be made thereto that still fall within the invention's scope. Such modifications to the invention will be recognizable upon review of the teachings herein.

Appendix A

This appendix describes a user guide for an example embodiment referred to as Quill, and it is organized into the following sections:

A1: Introduction

A1(i): What is Quill?

A1(ii): What is NLG?

A1(iii): How to use this Guide

A2: Getting Started

A2(i): Logging in

- A2(i)(a): Supported Browsers
- A2(i)(b): Hosted on-premises

A2(ii): General Structure

- A2(ii)(a): Creating an Organization
- A2(ii)(b): Creating Users

A2(iii): Creating Projects

- A2(iii)(a): Authoring
- A2(iii)(b): Data Manager
- A2(iii)(c): Project Administration

A3: Configure a Story from a Blueprint

A3(i): Configure a Sales Performance Report

- A3(i)(a): Headline
- A3(ii)(b): Overview
- A3(iii)(c): Drivers
- A3(iv)(d): Adding Data
- A3 (v)(e): Data Requirements

A4: Ontology Management

A4(i): Entity Types and Expressions

- A4(i)(a): Entities Tab
- A4(i)(b): Creating an Entity Type

A4(ii): Relationships

- A4(ii)(a): Creating a Relationship

A4(iii): Characterizations

- A4(iii)(a): Entity Characterizations
- A4(iii)(b): Assessment Characterizations

A4(iv): Attributes

- A4(iv)(a): Attribute Values
- A4(iv)(b): Computed Attributes

A5: Configure a Story from Scratch

A5(i): The Outline

- A5(i)(a): Sections
  - A5(i)(a)(1): Renaming a Section
  - A5(i)(a)(2): Deleting a Section
  - A5(i)(a)(3): Moving a Section
- A5(i)(b): Communication Goals
  - A5(i)(b)(1): Creating a Communication Goal
    - A5(i)(b)(1)(A): Entity Types
    - A5(i)(b)(1)(B): Creating an Entity Type
    - A5(i)(b)(1)(C): Creating a Relationship
    - A5(i)(b)(1)(D): Characterizations
  - A5(i)(b)(2): Deleting a Communication Goal
  - A5(i)(b)(3): Moving a Communication Goal
  - A5(i)(b)(4): Linked Goals
  - A5(i)(b)(5): Related Goals (Subgoals)
  - A5(i)(b)(6): Styling Communication Goals
  - A5(i)(b)(7): Charts
- A5(i)(c): Data Requirements
  - A5(i)(c)(1): Tabular Data
  - A5(i)(c)(2): Document-Based Data
- A5(i)(d): Data Formatting
- A5(i)(e): Data Validation

A6: Data Management

A6(i): Getting Data Into Quill

- A6(i)(a): Uploading a File
- A6(i)(b): Adding a Connection

A7: Reviewing Your Story

A7(i): Live Story

- A7(i)(a): Edit Mode
  - A7(i)(a)(1): Entity Expressions
  - A7(i)(a)(2): Characterization Expressions
  - A7(i)(a)(3): Language Guidance
- A7(i)(b): Review Mode

A7(ii): Logic Trace

A7(iii): Monitoring

A8: Managing Story Versions

A8(i): Drafts and Publishing

A8(ii): Change Log

A9: Writing Stories in Production

A9(i): API

A9(ii): Scheduling

A10: Sharing and Reuse

A11: Terminology

A12: Communication Goal Families

A13: Miscellaneous

A13(i): Supported Chart Types

A13(ii): Supported Document Structures

- A13(ii)(a): Single Document
- A13(ii)(b): Nested Documents
- A13(ii)(c): Unsupported Structures

A13(iii): Styling Rules

A13(iv): Using Multiple Data Views

A13(v): Permission Structure

The following sections can be read in combination with FIGS. 27-298 for an understanding of how the example embodiment of Appendix A can be used by users.

A1: Introduction

A1(i): What is Quill?

Quill is an advanced natural language generation (Advanced NLG) platform that transforms structured data into narratives. It is an intelligent system that starts by understanding what the user wants to communicate and then performs the relevant analysis to highlight what is most interesting and important, identifies and accesses the required data necessary to tell the story, and then delivers the analysis in the most intuitive, personalized, easy-to-consume way possible—a narrative.

Quill is used to automate manual processes related to data analysis and reporting. Its authoring capabilities can be easily integrated into existing platforms, generating narratives to explain insights not obvious in data or visualizations alone.

A1(ii): What is NLG?

Natural Language Generation (NLG) is a subfield of artificial intelligence (AI) which produces language as output on the basis of data input. Many NLG systems are basic in that they simply translate data into text, with templated approaches that are constrained to communicate one idea per sentence, have limited variability in word choice, and are unable to perform the analytics necessary to identify what is relevant to the individual reader.

Quill is an Advanced NLG platform that starts not with the data but with the user's intent: what they want to communicate. Unlike templated approaches that simply map language onto data, Quill performs complex assessments to characterize events and identify relationships, understands what information is especially relevant, learns about certain domains and utilizes specific analytics and language patterns accordingly, and generates language with the consideration of appropriate sentence length, structure, and word variability. The result is an intelligent narrative that can be produced at significant scale and customized to an audience of one.

A1(iii): How to use this Guide

Getting Started walks through how to log in to Quill and set up Organizations, Users, and Projects. It also provides an overview of the components of Quill.

Ontology Management is a high-level description of the conceptual elements on which stories in Quill are based. This section will help you understand the building blocks of writing a story.

Configuring a Story from Scratch and Configuring a Story from a Blueprint talk through the steps of configuring a story in Quill. Jump to one of these sections if you want to learn the basics of using Quill.

Data Management contains the necessary information for setting up data in Quill, discussing the accepted formats and connections.

Reviewing Your Story discusses the tools available to review, edit, and monitor the stories you configure in Quill.

Managing Story Versions covers publishing stories and tracking changes made to projects.

Writing Stories in Production addresses administrative aspects of story generation, including setting up an API endpoint and scheduling story runs.

Sharing and Reuse goes through how to make components of a particular project available across projects.

Common Troubleshooting offers simple, easy-to-follow steps for dealing with common questions that arise when working in Quill.

The Terminology section will help you understand the terms used in this manual and throughout Quill, while the Communication Goal Families section describes the available communication goals and how they relate to each other.

The Miscellaneous section presents an example of a state of Quill functionality.

A2: Getting Started

A2(i): Logging in

A2(i)(a): Supported Browsers

Quill is a web-based application that supports Firefox, versions 32 ESR and up, and all versions of Chrome. Logging in will depend on whether Narrative Science is hosting the application or Quill has been installed on-premises.

A2(i)(b): Hosted On-Premises

For on-premises installations of Quill, if you are an authenticated user, go to your custom URL to access Quill. You will be taken directly to your project dashboard. If you see an authentication error, contact your site administrator to be set up with access to Quill.

A2(ii): General Structure

Quill is made up of Organizations and Projects. An Organization is the base level of access in Quill. It includes Administrators and Members and is how Projects are grouped together. Projects are where narratives are built and edited. They exist within Organizations. Users exist at all levels of Quill: Site, Organization, and Project. Access privileges can be set on a per User basis and apply differently at the Site, Organization, and Project levels. (For more detail, refer to Permission Structure in the Miscellaneous section.)

A2(ii)(a): Creating an Organization

Creating an Organization is a Site Administrative privilege. At the time that Quill is installed, whether hosted by Narrative Science or on-premises, a Site Administrator is designated. Only a Site Administrator has the ability to create an Organization (see FIG. 27).

Site Administrators can add users, and users can only see the Organizations of which they are members. Site Administrators have access to all Organizations with the View All Dashboards option (see FIG. 28), but Organization Members do not.

Members only see the Organizations they have access to in the Organization dropdown and can toggle between them there (see FIG. 29).

Site Administrators can switch between Organizations using the Organization dropdown or from the Organizations page. Each Organization will have a dashboard listing Projects and People.

FIG. 30 shows where Organization Administrators and Members may create Projects; of the two, only Organization Administrators may create Users. Both Organization Administrators and Members may add Users to Projects and set their permissions. For both Administrators and Members, Quill will show the most recent Organization when first opened.

A2(ii)(b): Creating Users

Only an Administrator (either Site or Organization) may create a User (see FIG. 31). Users can be added to Organizations as Administrators or Members (see FIG. 32).

Administrative privileges cascade through the structure of Quill. (See Permission Structure in the Miscellaneous section for more information.) That is to say, an Administrator at the Organization level has Administrative privileges at the Project level as well. The Project permissions of Members are set at the Project level.

At the Project level, a user can be an Administrator, an Editor, or a Reviewer (seeFIG.33).

An Administrator on a Project has full access, including all aspects of Authoring, sharing, drafts and publishing, and the ability to delete the Project. An Editor has access to Authoring but cannot share, publish, create a new draft, or delete the Project. A Reviewer only has access to Live Story in Review mode. A user's access to a Project can be edited on the People tab of the Organization dashboard.
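As a rough illustration of the role structure described above, the following Python sketch maps each Project role to the actions it allows and applies the cascade from Organization Administrator to Project Administrator. It is not Quill's implementation: the action names, the `can` and `effective_project_role` helpers, and the assumption that an Editor can also review Live Story are all hypothetical.

```python
from enum import Enum
from typing import Optional


class ProjectRole(Enum):
    ADMINISTRATOR = "Administrator"
    EDITOR = "Editor"
    REVIEWER = "Reviewer"


# Hypothetical action names chosen for this sketch; the grouping follows the prose.
PROJECT_PERMISSIONS = {
    ProjectRole.ADMINISTRATOR: {
        "author", "review_live_story", "share",
        "create_draft", "publish", "delete_project",
    },
    # Assumption: Live Story review is treated as part of an Editor's Authoring access.
    ProjectRole.EDITOR: {"author", "review_live_story"},
    ProjectRole.REVIEWER: {"review_live_story"},
}


def can(role: ProjectRole, action: str) -> bool:
    """Return True if the given Project role allows the action."""
    return action in PROJECT_PERMISSIONS[role]


def effective_project_role(
    org_role: str, project_role: Optional[ProjectRole]
) -> Optional[ProjectRole]:
    """Organization Administrators cascade to Project Administrator;
    Members keep whatever role was set for them at the Project level."""
    if org_role == "Administrator":
        return ProjectRole.ADMINISTRATOR
    return project_role
```

For example, `can(ProjectRole.EDITOR, "publish")` is false in this sketch, while an Organization Administrator resolves to `ProjectRole.ADMINISTRATOR` on every Project.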

A2(iii): Creating Projects

Both Administrators and Members can create Projects from the Organization dashboard (seeFIG.34).

The creator of a Project is by default an Administrator. When creating a new Project, select from the list of blueprint options whether it will be an Employee History, Empty Project, Municipal Expenses, Network Analysis, or a Sales Performance report (seeFIG.35).

This is also where you can access components of existing Projects that members of an Organization have elected to share for reuse by other Organization members. As shown by FIG. 36, you can filter them by which parts have been shared: Outline, Ontology, and Data Sources; Outline and Ontology; or Outline only. (Refer to the Sharing and Reuse section for additional information.)

An Empty Project allows the user to configure a Project from the ground up, and a Sales Performance Report provides the framework for configuring a basic version of a sales performance report. A user can be added to a Project by clicking the plus symbol within the Project (see FIG. 37) and adding them by user name. To be added to a Project, a user should already be a member of the Organization.

You can set Project level permissions using the dropdown menu (seeFIG.38).

You can edit permissions and remove users here as well (seeFIG.39).

Users can also be added to Projects from the People tab of the Organization dashboard (seeFIG.40).

Each Project includes Authoring, a Data Manager, and Admin (seeFIG.41).

Authoring is where the narrative gets built and refined; the Data Manager is where the data for the story is configured; and Project Administration is where Monitoring, the Change Log, API documentation, Project Settings, and Scheduling are located.

A2(iii)(a): Authoring

The main view in Authoring is the Outline, as shown byFIG.42.

The Outline is where the narrative is built. Sections can be added to provide structure and organization to the story (seeFIG.43).

Communication Goals are then added to a Section (seeFIG.44).

Communication Goals are one of the main underpinnings of Quill. They are the primary building blocks a user interacts with to compose a story.

Authoring is also where Entities are managed (seeFIG.45).

An Entity is any primary “object” which has particular Attributes. It can be set to have multiple expressions for language variation within the narrative or have Relationships to other Entities for more complex representations. All of these things comprise an Ontology.

Data Requirements are how the data that supports a story is mapped to the various story elements.

Based on the Communication Goals in the Outline, the Data Requirements tab will specify what data points it needs in order to generate a complete story (seeFIG.46).

Live Story is a means of reviewing and editing a story generated from the Outline.

It has two modes, Review mode and Edit mode. Review mode allows the user to see a complete narrative based on specific data parameters (seeFIG.47). Edit mode allows the user to make changes to the story (seeFIG.48).

Drafts and Publishing are Quill's system of managing versions of your story (seeFIG.49).

This is how you publish your story configurations and keep a published version as read-only in order to request stories through the API or via the Scheduler. Each Project can only have one draft and one published version at a time.

A2(iii)(b): Data Manager

The Data Manager is the interface for adding the database connections or uploading the files that drive the story (see FIGS. 50 and 51).

A2(iii)(c): Project Administration

The Project Administration features of Quill are Monitoring, the Change Log, API documentation, Project Settings, and Scheduling. They are located in the Admin section of the Project.

Monitoring allows the user to see the status (success or failure) of generated stories (seeFIG.52). Stories run through the synchronous API or generated in Live Story will be listed here and can be filtered based on certain criteria (e.g. date, user).

The Change Log tracks changes made to the project (seeFIG.53).

Quill supports on-demand story generation through synchronous API access (seeFIG.54).

Project Settings are where you can change the name of the Project and set the project locale (seeFIG.55). This styles any currencies in your Project to the relevant locale (e.g. Japanese Yen).

You can set your story to run at regular intervals in Scheduling (seeFIG.56).

A3: Configure a Story from a Blueprint

The benefit of configuring a story from a project blueprint is the ability to reuse Sections, Communication Goals, Data Views, and Ontology as a starting point. These blueprints are available in the Create Project screen as discussed in the Getting Started section.

A3(i): Configure a Sales Performance Report

Select the Performance Project Blueprint and give your project a name. You can always change the name later by going to Admin > Project Settings. After the project is created, you'll be taken to Authoring and presented with an Outline that has “Headline”, “Overview”, and “Drivers” sections with associated Communication Goals within them (see FIG. 57).

A3(i)(a): Headline

To begin, set the Attributes in the Communication Goal in the Headline. Select “the value” (seeFIG.58) to open a sidebar on the right side of the screen.

Create an Attribute by entering “sales” and clicking “Create ‘sales’” (see FIG. 59).

Then specify “currency” from the list of Attribute types (seeFIG.60).

The next step in Attribute creation is to associate the Attribute with an Entity type. Since there are no existing Entity types in this blank Project, you'll have to create one (seeFIG.61).

Click “an entity or entity group” to bring out the Entity type creation sidebar (seeFIG.62).

Name the Entity type “salesperson” and click to create “salesperson” (seeFIG.63).

Set the base Entity type to Person (seeFIG.64).

Quill will make a guess at the singular and plural expressions of the Entity type. Make corrections as necessary and click “Okay” (seeFIG.65).

There are no designations on the Entity type you created, so click “Okay” to return to the Attribute editing sidebar (seeFIG.66). A designation modifies the Entity type to specify additional context such as relationships to other Entity types or group analysis.

Once an Entity type is created, it will be available for selection throughout the project. Additional Entity expressions can be added in the Entities tab (see Ontology Management).

Next, you'll specify a Timeframe for the Attribute (seeFIG.67).

Click “Timeframe” to create a new Timeframe (seeFIG.68).

Choose Month (seeFIG.69) to complete the creation of the Attribute (seeFIG.70).

Click “the other value” to set another Attribute (seeFIG.71).

Name it “benchmark” (seeFIG.72) and set its type to “currency” (seeFIG.73).

Associate it with the Entity type “salesperson” and set it to be in the “month” Timeframe (seeFIG.74).

Click on the arrow to the left of the Communication Goal in the headline section (seeFIG.75) to expose the list of related goals.

Check the box to opt in to the Characterization (seeFIG.77).

Quill has default thresholds to determine the comparative language for each outcome.

Entering different values into the boxes (see FIG. 78) changes these thresholds (see FIG. 79). Each value is a percentage comparison calculated against your data view, so these comparisons are made against numerical Attribute Values. If a value is changed to be less than the upper bound or greater than the lower bound of a different outcome, Quill will adjust the values so that there is no overlap.

A3(ii)(b): Overview

Configure the first Communication Goal in the Overview section (seeFIG.80) using the same steps as for the Communication Goal in the Headline section.

Set the Attribute of the first “Present the value” Communication Goal to be “sales in the month of the salesperson,” and the Attribute of the second “Present the value” Communication Goal to be “benchmark in the month of the salesperson” (seeFIG.81).

Link the two Present Communication Goals by dragging “Present the benchmark in the month of the salesperson” to overlap “Present the sales in the month of the salesperson” (see FIG. 83). To drag a Goal, use the gripper icon that is revealed on its right side when you hover your cursor over it (see FIG. 82).

A3(iii)(c): Drivers

Step One: Click “the value” in the first Communication Goal in the Drivers section to set the Attribute. Choose computed value in the Attribute creation sidebar and go into the functions tab in order to select “contribution” (seeFIG.84).

Set the Attribute to be “sales” (see FIGS. 85 and 86).

Click the first entity and create the new Entity type “sector” of type “Thing” (seeFIG.87).

Add a relationship (seeFIG.88) and set the related entity as “salesperson” (seeFIG.89).

Set the relationship as “managed by” (see FIGS. 90 and 91).

Add a group analysis and set the Attribute as “sales” and the Timeframe to “month” (seeFIG.92).

Set the second entity to “salesperson” and the timeframe to “month” (seeFIG.93).

Step Two: Follow the same steps as above to complete the second Communication Goal in the Drivers section, but set the position from top to be 2 in the group analysis (see FIGS. 94-95).

Step Three: Click into the “Search for a new goal” box and select “Call out the entity” (seeFIG.96).

Set the entity to be “highest ranking sector by sales in the month managed by the salesperson” (see FIG. 97).

Then move the goal to the first position in the section by dragging the gripper icon on its right side (see FIG. 98).

Step Four: Create another Call out the entity Communication Goal (seeFIG.99).

Create a new Entity type of “customer” and set the base entity type to “thing” (seeFIG.100).

Add a group analysis and set the Attribute to “sales” and the Timeframe to “month” (seeFIG.101).

Then add a relationship and set the related entity to be “highest ranking sector by sales in the month managed by the salesperson” and choose the relationship “within” (seeFIG.102). Then move it to the third position in the Drivers section, after the first Present goal (seeFIG.103).

Step Five: Create another Call out the entity Communication Goal and set the entity to “second highest ranking sector by sales in the month managed by the salesperson” (seeFIG.104).

Then move it to the fourth position in the Drivers section, before the second Present goal (see FIG. 105).

Step Six: Create another Call out the entity Communication Goal. Create a new entity type of customer following Step Four, but set the related entity to be “second highest ranking sector by sales in the month managed by the salesperson” (seeFIG.106).

Step Seven: Finally, create another Call out the entity Goal. Create a new plural Entity type of “regions” and set its type to be “place.” Add a group analysis and set the number from top to “3,” the Attribute to “sales,” and the Timeframe to “month” (seeFIG.107).

Then add a relationship, setting the related Entity type as “salesperson” and the relationship as “managed by” (seeFIG.108).

The completed outline should match FIGS. 109 and 110. Quill will update the “Data Requirements” tab with prompts asking for the information necessary to generate the story from that configuration.

A3(iv)(d): Adding Data

To complete the Data Requirements for the story, add a Data Source to the Project. Go to the Data Manager section of the Project to add a Data View (see FIG. 111).

Choose to Upload a file and name the Data View (see FIG. 112). Upload the Sales Performance Data CSV file that you were provided.

Once Quill has saved the Data View to the Project, you will be presented with the first few rows of the data (seeFIG.113).

A3(v)(e): Data Requirements

Go to the Data Requirements tab in Authoring. The Data Requirements will guide you through a series of questions to fill out the necessary parameters for Narrative Analytics and Communication Goals (see FIG. 114).

See the Data Requirements section of Configure a Story from Scratch for more detail. The completed Data Requirements can appear as shown by FIGS. 115-118.

Go to Live Story to see the story (seeFIG.119).

Toggles for “salesperson” (see FIG. 120) and “month” will show you different stories on the performance of an individual salesperson for a given month.

A4: Ontology Management

A4(i): Entity Types and Expressions

Entity types are how Quill knows what to talk about in a Communication Goal. An Entity type is any primary “object” which has particular Attributes. For example, a Department (Entity type) has Expenses (Attribute) (see FIG. 121). An Entity is a specific instance of an Entity type, with data-driven values for each Attribute.

In other words, if you have an Entity type of Department, Quill will express a specific instance of a Department from your data, such as Transportation. Likewise, Expenses will be replaced with the numerical value in your data. Quill also allows you to create Entity and Attribute designations, such as departments managed by the top salesperson or total expenses for the department of transportation (seeFIG.122).

When you generate a story with such designations, Quill replaces them with the appropriate calculated values.
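The distinction between an Entity type and an Entity can be pictured as a schema versus a populated record. The Python sketch below is only an illustration under that assumption; the `EntityType` and `Entity` classes, the `entities_from_rows` helper, and the sample row are hypothetical and do not come from Quill.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class EntityType:
    name: str                 # e.g. "Department"
    base_type: str            # "Person", "Place", "Thing", or "Event"
    attributes: List[str] = field(default_factory=list)


@dataclass
class Entity:
    entity_type: EntityType
    values: Dict[str, object]  # one data-driven value per Attribute


def entities_from_rows(entity_type: EntityType, rows: List[dict]) -> List[Entity]:
    """Build one Entity per data row, keeping only the type's Attributes."""
    return [
        Entity(entity_type, {name: row[name] for name in entity_type.attributes})
        for row in rows
    ]


# Illustrative data only: a Department type and one made-up row.
department = EntityType("Department", "Thing", ["name", "expenses"])
sample_rows = [{"name": "Transportation", "expenses": 1200.0, "city": "Example City"}]
departments = entities_from_rows(department, sample_rows)
```

Here `departments[0].values` holds the specific instance (Transportation and its expense figure) that would replace the generic "Department" and "Expenses" in generated text.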

A4(i)(a): Entities Tab

Entity types are managed in the Entities tab (seeFIG.123).

Quill defaults to showing all Entity types, but you can filter to only those that are in the story (seeFIG.124).

Clicking an Entity type tile allows you to view its details and edit it. Here, you can modify or add Entity expressions (seeFIG.125), edit or add Entity characterizations (seeFIG.126), add or edit Attributes associated with the Entity (seeFIG.127), and add Relationships (seeFIG.128).

A4(i)(b): Creating an Entity Type

Entity types can be created from the Entities tab (seeFIG.129) or from the Outline (seeFIG.130).

When you create an Entity type, you select its base Entity type from the options of Person, Place, Thing, or Event (seeFIG.131).

This gives Quill context for how to treat the Entity. In the case of the Person base Entity type, Quill knows to determine gender and supply an appropriate pronoun.

Entity types can have multiple expressions. These are managed in the Entities tab of a project (seeFIG.132).

They can be added either from the Entities tab (seeFIG.133) or from Live Story (seeFIG.134).

To add expressions, open the details for an Entity type (by clicking on “salesperson,” as shown above) and click in the text area next to the plus icon in the sidebar. Type in the expression you want associated with the Entity. You can add expressions for the Specific, Generic Singular, and Generic Plural instances of the Entity by clicking on the arrow dropdown in the sidebar to toggle between the expressions (seeFIG.135).

Attributes can be referenced in Specific entity expressions by setting the attribute name off in brackets. For example, if you would like the last name of the salesperson as an expression, set “last name” off in brackets as shown inFIG.136.
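A minimal sketch of how a bracketed attribute reference might be filled in is shown below. It assumes square brackets as the delimiter (FIG. 136 is not reproduced here, so the exact bracket style is an assumption), and the `render_expression` function and sample values are hypothetical.

```python
import re


def render_expression(expression: str, attribute_values: dict) -> str:
    """Replace each [attribute name] in the expression with that attribute's value.

    Raises KeyError if the expression references an attribute with no value.
    """
    def substitute(match):
        attribute_name = match.group(1).strip()
        return str(attribute_values[attribute_name])

    return re.sub(r"\[([^\[\]]+)\]", substitute, expression)


# Illustrative values only.
example = render_expression("Ms. [last name]", {"last name": "Rivera"})
```

Text outside the brackets is left untouched, so one expression can mix fixed wording with data-driven values.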

You can also opt into and out of particular expressions. If you have multiple expressions associated with the Entity, Quill will alternate between them at random to add Variability to the language, but you can always uncheck the box to turn an expression off (see FIG. 137) or click on the x icon to remove it completely. You cannot opt out of whichever expression is set as the primary expression, but if you want to make one you've added the primary expression, simply click and drag it to the top of the list.
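The selection behavior described above can be sketched as a random choice over the expressions that are still opted in, with the first (primary) expression always eligible. This is an illustrative Python sketch, not Quill's selection logic; the `Expression` class and `choose_expression` function are hypothetical, and uniform random choice is an assumption.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Expression:
    text: str
    enabled: bool = True  # unchecked box -> False


def choose_expression(
    expressions: List[Expression], rng: Optional[random.Random] = None
) -> str:
    """Pick one expression at random.

    The first item is treated as the primary expression and is always
    eligible, mirroring the rule that the primary cannot be opted out of.
    """
    rng = rng or random.Random()
    eligible = [
        expression
        for index, expression in enumerate(expressions)
        if index == 0 or expression.enabled
    ]
    return rng.choice(eligible).text
```

Passing a seeded `random.Random` makes the choice repeatable, which is convenient when comparing generated stories.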

A4(ii): Relationships

Entity types can be tied to each other through Relationships. For example, a City contains Departments, and Departments are within a City (seeFIG.138). Relationships are defined and created during Entity type creation in Authoring.

They can also be added to an existing Entity type by editing the Entity type in Authoring. FIG. 139 shows how a relationship can be added from the Entity type tile. FIG. 140 shows setting the related Entity type, and FIG. 141 shows choosing the relationships.

An Entity type can support multiple relationships. For example, Department has a relationship to City: “within cities”; and a relationship to Line Items: “that recorded line items” (seeFIG.142).

A4(ii)(a): Creating a Relationship

If the Relationships already set in Quill do not meet your needs, you can create your own. Type the relationship you want to create in the “search or create” textbox and click “Create new relationship” at the bottom of the sidebar (seeFIG.143).

After that, you will be taken through some steps that tell Quill how the new Relationship is expressed. Enter the present tense and past tense forms of the Relationship, and Quill automatically populates the noun phrase that describes the relationship between the Entities (see FIG. 144).

Once you complete the steps for both directions of the relationship (seeFIG.145), Quill will apply the relationship to your Entity types and add the relationship to its library. You can use the Relationship again anywhere else in the project.
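One way to picture a Relationship that is defined in both directions and then kept in a reusable library is sketched below in Python. The `Relationship` and `RelationshipLibrary` names, their fields, and the sample tense forms are hypothetical; the automatically populated noun phrase is not modeled.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Relationship:
    subject_type: str
    present: str      # present tense form, e.g. "contains"
    past: str         # past tense form, e.g. "contained"
    object_type: str


class RelationshipLibrary:
    """Stores relationships so they can be reused anywhere in a project."""

    def __init__(self) -> None:
        self._by_pair: Dict[Tuple[str, str], List[Relationship]] = {}

    def add(self, forward: Relationship, inverse: Relationship) -> None:
        """Register both directions of a relationship together."""
        mirrored = (inverse.object_type, inverse.subject_type)
        if (forward.subject_type, forward.object_type) != mirrored:
            raise ValueError("inverse must relate the same Entity types in reverse order")
        for relationship in (forward, inverse):
            key = (relationship.subject_type, relationship.object_type)
            self._by_pair.setdefault(key, []).append(relationship)

    def between(self, subject_type: str, object_type: str) -> List[Relationship]:
        """Return the relationships available from one Entity type to another."""
        return list(self._by_pair.get((subject_type, object_type), []))


library = RelationshipLibrary()
library.add(
    Relationship("City", "contains", "contained", "Department"),
    Relationship("Department", "is within", "was within", "City"),
)
```

Requiring both directions at registration time reflects the two-step flow described above, where the relationship is only applied once each direction has been completed.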

A4(iii): Characterizations

Characterizations are editorial judgments based on thresholds that determine the language used when certain conditions are met. Characterizations can be set on Entity types directly or when comparing Attributes on an Entity in a Communication Goal.

A4(iii)(a): Entity Characterizations

An Entity characterization allows you to associate descriptive language with an Entity type based on the performance of a particular Attribute. For example, you might want to characterize a Sales Person by her total sales (seeFIG.146).

Click “+Characterization” to create a Characterization (seeFIG.147).

Once you've named and created the Characterization, you'll have to set the expressions for the Default outcome. Click the grey parts of speech to edit the expression in the sidebar (seeFIG.148).

To add an Outcome, click “+Outcome” (seeFIG.149).

Change the Outcome label to describe the outcome. For this example, the Outcome label will be “Star” to reflect an exceptional sales performance. Again, edit the expressions by clicking on the grey parts of speech. In order for the outcome to be triggered under specific conditions, you need to add a Qualification (seeFIG.150).

Click “+Qualification”, then set the value to Sales (see FIG. 151) and the comparison to “greater than” (see FIG. 152).

You have a choice for comparing the value to an Attribute or a static value (seeFIG.153).

In this case, choose to keep it a static value and set the value to $10,000 (seeFIG.154).

Follow the same steps to create the lower bound outcome, setting the label as “laggard” and the static value to $1,000 (seeFIG.155).
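The “Star” and “laggard” outcomes configured above can be read as an ordered list of qualifications with a Default fallback. The following Python sketch illustrates that reading only; the `Outcome` class and `characterize` function are hypothetical, and the “less than” comparison for the laggard outcome is an assumption, since the text says only that it is the lower bound.

```python
import operator
from dataclasses import dataclass
from typing import Dict, List

COMPARISONS = {"greater than": operator.gt, "less than": operator.lt}


@dataclass
class Outcome:
    label: str
    attribute: str
    comparison: str   # key into COMPARISONS
    threshold: float  # static value to compare against


def characterize(
    entity_values: Dict[str, float],
    outcomes: List[Outcome],
    default_label: str = "Default",
) -> str:
    """Return the label of the first outcome whose qualification is met."""
    for outcome in outcomes:
        compare = COMPARISONS[outcome.comparison]
        if compare(entity_values[outcome.attribute], outcome.threshold):
            return outcome.label
    return default_label


sales_outcomes = [
    Outcome("Star", "sales", "greater than", 10_000),
    Outcome("laggard", "sales", "less than", 1_000),  # comparison assumed
]
```

With these thresholds, sales of 12,500 would be labeled "Star", sales of 500 would be labeled "laggard", and anything in between would fall through to the Default outcome.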

Once you have defined Characterizations on an Entity, you can include them in your story by using the Present the Characterization of the entity Communication Goal (seeFIG.156).

A4(iii)(b): Assessment Characterizations

To set the characterizations on a comparative Communication Goal, expand the arrow to the left of the Communication Goal (seeFIG.157).

This exposes the list of available subgoals (see section below). At the bottom of this list is a goal to assess the difference between the attributes. Check the box to expose the thresholds applied to the comparison (seeFIG.158).

Quill has default thresholds to determine the comparative language for each outcome. These thresholds can be changed by entering different values into the boxes. If a value is changed to be less than the upper bound or greater than the lower bound of a different outcome, Quill will adjust the values so that there is no overlap (seeFIG.159).
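The text does not say exactly how Quill adjusts neighboring thresholds, so the Python sketch below shows just one possible policy: the boundaries between outcomes are kept in ascending order, and any neighbor that would overlap an edited boundary is moved to meet it. The `set_threshold` function and the sample percentages are hypothetical.

```python
from typing import List


def set_threshold(boundaries: List[float], index: int, new_value: float) -> List[float]:
    """Return a copy of the ascending boundaries with one value changed.

    Neighbors that would overlap the edited boundary are pulled to the new
    value so the list stays in non-decreasing order.
    """
    adjusted = list(boundaries)
    adjusted[index] = new_value
    # Boundaries below the edited one may not exceed it.
    for i in range(index - 1, -1, -1):
        adjusted[i] = min(adjusted[i], adjusted[i + 1])
    # Boundaries above the edited one may not fall below it.
    for i in range(index + 1, len(adjusted)):
        adjusted[i] = max(adjusted[i], adjusted[i - 1])
    return adjusted


# Illustrative percentage boundaries separating four outcomes.
default_boundaries = [-10.0, 0.0, 10.0]
raised_lowest = set_threshold(default_boundaries, 0, 5.0)
```

In this example, raising the lowest boundary from -10 to 5 also lifts the middle boundary from 0 to 5, leaving the top boundary at 10.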

There is also default language to correspond with each of the possible outcomes. This can also be changed to suit your particular needs and the tone of your story. Click on the green, underlined text to open a sidebar to the right where you can add additional expressions and set which expression you would like to be the primary characterization (seeFIG.160).

If you have multiple expressions associated with the outcome (see FIG. 161), Quill will alternate between them at random to add Variability to the language. These additional expressions will be tied to the specific Communication Goal where you added them and will not appear for others. You can also opt into and out of particular expressions, as well as delete them using the x. However, in the example of Appendix A, you cannot opt out of whichever expression is set as the primary expression.

These expressions can also be edited in Edit mode in Live Story (see FIGS. 162 and 163).

A4(iv): Attributes

An Attribute is a data-driven feature on an Entity type. As described above, Quill will express a specified Attribute with the corresponding value in the data based on your Communication Goal. Quill also supports adding modifiers to attributes in order to perform calculations on the raw value in the data.

A4(iv)(a): Attribute Values

Attribute Values are values taken directly from your data; no computations are performed on them. An example is the Name of the City. Likewise, if there is a value in the data for the total expenses of the city, Quill pulls this value directly and performs no computations, unless a data validation rule is applied (e.g. “If null, replace with Static Value”). Such rules are set in the Data Requirements when mapping the Outline's information needs to your Data View. FIG. 164 shows an attribute creation sidebar. FIG. 165 shows creating an attribute value in the attribute creation sidebar. FIG. 166 shows setting the type of an attribute in the attribute creation sidebar. FIG. 167 shows a completed attribute in a communication goal.
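A data validation rule of the “If null, replace with Static Value” kind could look roughly like the Python sketch below. The `replace_nulls` function is hypothetical, and treating an empty string as null (as often happens with uploaded CSV files) is an assumption rather than documented Quill behavior.

```python
from typing import Any, Dict, List


def replace_nulls(
    rows: List[Dict[str, Any]], column: str, static_value: Any
) -> List[Dict[str, Any]]:
    """Return new rows in which a null value in `column` becomes `static_value`.

    A value counts as null here if it is missing, None, or an empty string.
    The input rows are left unmodified.
    """
    cleaned = []
    for row in rows:
        value = row.get(column)
        is_null = value is None or value == ""
        cleaned.append({**row, column: static_value if is_null else value})
    return cleaned
```

All other values pass through unchanged, which matches the idea that Attribute Values are otherwise taken directly from the data.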

You also have the option of specifying a Timeframe (see FIGS. 168 and 169).

This allows you to restrict the window of analysis to a particular day, month, or year.

Create a new Timeframe by selecting one of those three options. Once you've done this, Quill also recognizes the “previous” and “next” instances of that Timeframe (seeFIG.170). In other words, if you create a day Timeframe, Quill will populate the list of known Timeframes with day, along with previous day and next day.
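The way a single Timeframe expands into its “previous” and “next” instances can be sketched as follows. This Python example is illustrative only: `known_timeframes` and `shift_month` are hypothetical helpers, and the month arithmetic shows just one way a “previous month” could be resolved against a calendar.

```python
from typing import List, Tuple

SUPPORTED_UNITS = ("day", "month", "year")


def known_timeframes(unit: str) -> List[str]:
    """Return the Timeframe plus its previous and next instances."""
    if unit not in SUPPORTED_UNITS:
        raise ValueError(f"unsupported Timeframe: {unit!r}")
    return [unit, f"previous {unit}", f"next {unit}"]


def shift_month(year: int, month: int, offset: int) -> Tuple[int, int]:
    """Move a (year, month) pair by `offset` months, rolling over years."""
    index = year * 12 + (month - 1) + offset
    return index // 12, index % 12 + 1
```

For instance, `known_timeframes("day")` yields day, previous day, and next day, and `shift_month(2018, 1, -1)` resolves the month before January 2018 to December 2017.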

A4(iv)(b): Computed Attributes

On the other hand, if the total expenses of the city are calculated by taking the sum of the expenses for each department, Quill allows you to create a Computed Value. Computed Values allow you to compute new values from values in your data and use them for group analysis.

Computed Values can be aggregations or functions. Aggregations include count, max, mean, median, min, range, and total (see FIG. 171).

In the example of Appendix A, current functions are limited to contribution, which evaluates how much of an aggregate a component contributed (seeFIG.172).
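The listed aggregations and the contribution function correspond to familiar calculations, sketched here in Python with the standard `statistics` module. The `AGGREGATIONS` table and the `aggregate` and `contribution` helpers are hypothetical names, and expressing contribution as a fraction of the total is an assumption about how "how much of an aggregate a component contributed" would be measured.

```python
import statistics
from typing import Iterable, Sequence

AGGREGATIONS = {
    "count": len,
    "max": max,
    "mean": statistics.mean,
    "median": statistics.median,
    "min": min,
    "range": lambda values: max(values) - min(values),
    "total": sum,
}


def aggregate(name: str, values: Iterable[float]) -> float:
    """Apply the named aggregation to a group of values."""
    return AGGREGATIONS[name](list(values))


def contribution(component: float, values: Sequence[float]) -> float:
    """Return the share of the total that one component accounts for."""
    total = sum(values)
    if total == 0:
        raise ValueError("contribution is undefined when the total is zero")
    return component / total


# Illustrative department expenses only.
department_expenses = [100, 300, 600]
```

With these sample values, the total is 1000 and the largest department's contribution is 0.6, that is, 60 percent of the city's expenses.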

Computed Values can be created from Present or Callout Communication Goals. When you create the attribute you are presenting or using to filter the group of Entities, click into the Computed Value tab to access the list of aggregations and functions.

A5: Configure a Story from Scratch

Quill allows you to build a story based on an existing blueprint or entirely from the ground up. To build a story specific to your needs, choose to create a Blank Project Blueprint and name it.

A5(i): The Outline

Once you've created your project, you'll be taken to the Outline (seeFIG.173).

The Outline is a collection of building blocks that define an overall Story. This is where you do the work of building your story.

A5(i)(a): Sections

Create and name Sections to organize your story (seeFIG.174).

Once created, a Section can be renamed, deleted, or moved around within the outline. Sections are how Communication Goals are grouped together.

A5(i)(a)(1): Renaming a Section

Click the name of the Section and type in the new name.

A5(i)(a)(2): Deleting a Section

Hover your cursor over the Section you want to delete. On the right side, two icons will appear: an ellipsis and a gripper icon (see FIG. 175).

Click the ellipsis to reveal the option to delete the Section (see FIG. 176).

If deleted, the Section will disappear from the outline along with any Communication Goals it contains.

A5(i)(a)(3): Moving a Section

As above for deleting a Section, hover your cursor over the Section you want to move. Click and hold the gripper icon (seeFIG.177) to drag the Section where you want to move it and let go.

A5(i)(b): Communication Goals

Communication Goals provide a bridge between analysis of data and the production of concepts expressed as text. In other words, they are the means of expressing your data in language.

A5(i)(b)(1): Creating a Communication Goal

Click the text box where it says to Search for a new goal. Choose the Communication Goal you'd like to use (seeFIG.178).

A5(i)(b)(1)(A): Entity Types

Depending on the Communication Goal you choose, you will have to set the Entity type or types it is talking about. An Entity type is any primary “object” which has particular Attributes. An example is that a Department (Entity type) has Expenses (Attribute). An Entity is a specific instance of an Entity type, with data-driven values for each Attribute.

In the example of the Communication Goal “Call out the entity”, the example embodiment for Quill of Appendix A requires that an Entity type be specified. What, in your data, would you like to call out? Click “the entity” in the Communication Goal to open a sidebar to the right (seeFIG.179).

Here you can select among Entity types that already exist or create a new one. Available entities include entities created from the outline or the entities tab (including any characterizations).

A5(i)(b)(1)(B): Creating an Entity Type

Click “new” in the Entity sidebar (seeFIG.180). Then choose from existing Entity types or create a new one. Set whether the Entity type is singular or plural (seeFIG.181). Once you have created the Entity type, you will be asked to set its base Entity type: Event, Person, Place, or Thing (seeFIG.182). Next, set the plural and singular expressions of the Entity type (seeFIG.183). Quill takes an educated guess at this, but you have the opportunity to make changes. Next you will designate any relationships, group analysis, or qualification pertaining to the Entity type (seeFIG.184).

Quill lets you know the state of an Entity type, whether it is unset, in progress, or valid based on the appearance of the Entity type in the Communication Goal. The Entity type appears grey when unset (seeFIG.185), blue when being worked on (seeFIG.186), and green when valid (seeFIG.187).

Adding a relationship allows you to tell Quill that an Entity is related to another Entity. To do so, choose to Add Relationship as you create your Entity type. Then set or create the Entity type that this Entity has a relationship to (see FIG. 188). Quill suggests a number of relationships from which you can choose, including “lives in”, “managed by”, “within”, and more. FIG. 189 shows a list of available relationships between two entities (department and city). FIG. 190 shows an entity with a designated relationship. You can also create Relationships that will be added to the library.

When creating an Entity type of the base type Event (see FIG. 191), Quill will prompt you to set a Timeframe to associate with the event (see FIG. 192).

A5(i)(b)(1)(C): Creating a Relationship

If the Relationships already set in Quill do not meet your needs, you can create your own. Type the relationship you want to create in the “search or create” textbox and click “Create new relationship” at the bottom of the sidebar (seeFIG.193).

After that, you will be taken through some steps that tell Quill how the new Relationship is expressed. Enter the present tense and past tense forms of the Relationship, and Quill automatically populates the noun phrase that describes the relationship between the Entities (see FIG. 194).

Once you complete the steps for both directions of the relationship (seeFIG.195), Quill will apply the relationship to your Entity types and add the relationship to its library (seeFIG.196). You can use the Relationship again anywhere else in the project.

You can also apply Group Analysis to an Entity type (seeFIG.197).

In the example of Appendix A, rank is supported. This allows you to specify which Entity in a list of Entities to use in a Communication Goal. Select whether you are asking for the position from the top or the position from the bottom and the ranking of the Entity you want (see FIG. 198). FIG. 199 shows setting the attribute to perform the group analysis by. FIG. 200 shows an Entity type with group analysis applied.
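Rank-based group analysis amounts to sorting a group of Entities by an attribute and picking a position from the top or bottom. The Python sketch below illustrates this; the `ranked_entity` function, the dictionary representation of Entities, and the sample sectors are hypothetical.

```python
from typing import Any, Dict, List


def ranked_entity(
    entities: List[Dict[str, Any]],
    attribute: str,
    position: int,
    from_top: bool = True,
) -> Dict[str, Any]:
    """Return the Entity at a 1-based rank position by the given attribute.

    from_top=True ranks the highest value first; from_top=False ranks the
    lowest value first.
    """
    ordered = sorted(entities, key=lambda entity: entity[attribute], reverse=from_top)
    if not 1 <= position <= len(ordered):
        raise IndexError(f"position {position} is outside 1..{len(ordered)}")
    return ordered[position - 1]


# Illustrative data only.
sectors = [
    {"name": "Retail", "sales": 400},
    {"name": "Energy", "sales": 900},
    {"name": "Health", "sales": 650},
]
```

With this data, position 1 from the top selects Energy (the highest ranking sector by sales), position 2 from the top selects Health, and position 1 from the bottom selects Retail.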

You also have the option of specifying a Timeframe (seeFIG.201).

This allows you to restrict the window of analysis to a particular day, month, or year (seeFIG.202).

Create a new Timeframe by selecting one of those three options. Once you've done this, Quill also recognizes the “previous” and “next” instances of that Timeframe (seeFIG.203). In other words, if you create a day Timeframe, Quill will populate the list of known Timeframes with day, along with previous day and next day.

Once you have completed the steps to create an Entity type, Quill adds it to the list of Entity types available for use throughout the story. In other words, you can use it again in other parts of the Outline.

A5(i)(b)(1)(D): Characterizations

Refer to Characterizations in Ontology Management for more information on Entity Characterizations.

To set the characterizations on a comparative Communication Goal, expand the arrow to the left of the Communication Goal (seeFIG.204).

This exposes the list of available subgoals (see section below). At the bottom of this list is a goal to characterize the difference between the attributes. Check the box to expose the thresholds applied to the comparison (seeFIG.205).

Quill has default thresholds to determine the comparative language for each outcome. These thresholds can be changed by entering different values into the boxes. If a value is changed to be less than the upper bound or greater than the lower bound of a different outcome, Quill will adjust the values so that there is no overlap (see FIGS. 206 and 207).

There is also default language to correspond with each of the possible outcomes. This can also be changed to suit your particular needs and the tone of your story. Click on the green, underlined text to open a sidebar to the right where you can add additional expressions and set which expression you would like to be the primary expression (seeFIG.208).

If you have multiple expressions associated with the outcome (seeFIG.209), Quill will alternate between them at random to add Variability to the language. These additional expressions will be tied to the specific Communication Goal where you added them and will not appear for others. You can also opt into and out of particular expressions, as well as delete them using the x. However, you cannot opt out of whichever expression is set as the primary expression.
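
The behavior described above (random alternation among opted-in expressions, with a primary expression that cannot be opted out of) can be sketched as follows. The `Outcome` class and its method names are hypothetical, introduced only for illustration.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Outcome:
    """Hypothetical container for the expressions attached to one outcome."""
    primary: str
    extras: dict[str, bool] = field(default_factory=dict)  # expression -> opted in?

    def opt_out(self, expression: str) -> None:
        # The primary expression always stays active.
        if expression == self.primary:
            raise ValueError("the primary expression cannot be opted out of")
        self.extras[expression] = False

    def active(self) -> list[str]:
        """The primary expression followed by every opted-in extra."""
        return [self.primary] + [e for e, on in self.extras.items() if on]

    def choose(self, rng: random.Random) -> str:
        """Pick one active expression at random to vary the language."""
        return rng.choice(self.active())
```

Passing a seeded `random.Random` makes a given rewrite repeatable, which can be convenient when comparing outputs.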

A5(i)(b)(2): Deleting a Communication Goal

To delete a Communication Goal, hover your cursor over it to reveal a trash can icon (seeFIG.210). Click it to delete the Communication Goal.

A5(i)(b)(3): Moving a Communication Goal

Moving a Communication Goal is done the same way as moving a Section. Hover your cursor over the Communication Goal to reveal the gripper icon (seeFIG.211).

Click and move the Communication Goal within the Section or to another section (seeFIG.212). Be careful when you move Communication Goals to make sure there is space between them.

Communication Goals without space between them are Linked Goals, described below.

A5(i)(b)(4): Linked Goals

Quill supports linking Communication Goals. This allows the user to express ideas together. For example, you may wish to talk about the number of departments in a city along with the total budget for the city. Hover your cursor over the Communication Goal to reveal the gripper icon, then click and drag it above the goal you wish to link (see FIG. 213). Linked goals can always be unlinked: hover to reveal the gripper icon again and move the Communication Goal into an empty space on the Outline.

When you link the Communication Goal that expresses the number of departments and the Communication Goal that expresses the total budget for the city (seeFIG.214), Quill will attempt to express them together with smoother language such as combining them into one sentence with a conjunction.

A5(i)(b)(5): Related Goals (Subgoals)

Some goals support related goals, or subgoals. This allows you to include supporting language without having to create separate Communication Goals for each related idea. For example, if you have a Communication Goal comparing attributes on an entity—in this case, the budget and expenses of the highest ranking department by expenses within the city—you may also wish to present the values of those attributes. Expand the Communication Goal to expose those related goals and opt into them as you like (seeFIG.215).

A5(i)(b)(6): Styling Communication Goals

Quill allows for styling Communication Goals for better presentation in a story. Hover your cursor over a Communication Goal to reveal the “Txt” dropdown on the right side (seeFIG.216).

Here, you can choose whether the language expressed is styled as a headline (seeFIG.217), normal text (seeFIG.218), or bullets (seeFIG.219).

A5(i)(b)(7): Charts

Charts are supported for two Communication Goals: Present the [attribute] of [a group] and Present the [attribute] of a [group of events]. For either of these goals, to get a chart, go to the Txt dropdown and select Chart (seeFIG.220).

This will render the Communication Goal as a chart.

Present the [attribute] of [a group] (seeFIG.221) will result in a bar chart (seeFIG.222).

Present the [attribute] of [a group of events] (seeFIG.223) will result in a line chart (seeFIG.224).

A5(i)(c): Data Requirements

Once you have configured your story, Quill will ask where it can find the data to support the Entity types and Attributes you have specified in the Communication Goals. Go to the Data Requirements tab in Authoring to provide this information (seeFIG.225).

The Data Requirements will guide you through a series of questions to fill out the necessary parameters for Narrative Analytics and Communication Goals. For each question, select the data view where that data can be found and the appropriate column in the table.

A5(i)(c)(1): Tabular Data

FIG.226 shows an example where the data is tabular data.

A5(i)(c)(2): Document-Based Data

FIG.227 shows an example where the data is document-based data.

Where the value supplied is numerical, Quill will provide analytic options for cases where there are multiple values (see FIG. 228). “Sum” sums values in a column, like a Pivot Table in a spreadsheet. “Constant” applies when the value does not change for a particular entity. For example, the quarter may always be Q4 in the data.
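
The difference between the two options can be sketched as follows. The function name `resolve_values` is hypothetical, and raising an error when a supposedly constant value varies is an assumption of this sketch rather than documented Quill behavior.

```python
def resolve_values(values: list, option: str):
    """Collapse several values for one entity into a single value."""
    if not values:
        raise ValueError("no values supplied")
    if option == "sum":
        # Add up every value in the column, as a pivot table would.
        return sum(values)
    if option == "constant":
        # Assumption: a constant must be identical on every row for the entity.
        if any(v != values[0] for v in values):
            raise ValueError("values differ, so they cannot be treated as constant")
        return values[0]
    raise ValueError(f"unknown option: {option!r}")
```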

For each Entity type, Quill will ask for an identifier (seeFIG.229).

This is what Quill uses to join data views. An identifier has no validation options as it doesn't actually appear in the story. (Data Validation is discussed below.)

The final question in Data Requirements will be to identify the main Entity the story is about (seeFIG.230).

In the city budget example, Quill needs to know what city the story will be about. This can be set as a static value (e.g. Chicago) or as a Story Variable (seeFIG.231).

A Story Variable allows you to use a set of values to trigger stories. In other words, if your data contains city budget information for multiple cities, setting the city the story is about as a Story Variable will allow you to run multiple stories against the same dataset. The location of the value for the Story Variable is defined earlier in Data Requirements where Quill asks where to find the city.

If there is a Timeframe in the Headline of the story, Quill will need you to identify this in Data Requirements as well.

As with the entity, this can be a static value or a Story Variable. It can also be set as the run date (seeFIG.232), which will tell Quill to populate the value dynamically at the time the story is run. (See the Scheduling section for more information.)

A5(i)(d): Data Formatting

Quill allows you to specify the format that certain data points have in your data source so that they can be mapped to your Outline. These formats are set based on the ontology (Entities, Attributes, etc.) being used in your Communication Goals, with default styling applied to values. See the Miscellaneous section for specific styling information. As you configure the data formats present in your data view, validation rules can be applied for cases where the types do not match in a particular story run. For example, if Quill is expecting the expenses of a city to be a currency and receives a string, the user is provided with various options of actions to take. These are specified in the Data Validation section below. To select the format of any date fields you may have, go to the Data Requirements tab in Authoring and click the checkbox icon next to a date (see FIG. 233) to pull out the sidebar (see FIG. 234).

Click on the date value to open a list of date format options and make your selection (seeFIG.235).

A5(i)(e): Data Validation

Quill supports basic data validation. This functionality can be accessed in Data Requirements. Once you specify the location of the information in the data, a checkbox appears next to it. Click this to open the data validation sidebar (seeFIG.236).

You will be presented with a number of options in a dropdown menu for what to do in the case of a null value (seeFIG.237).

You can tell Quill to fail the story, drop the row with the null value, replace the null value with a value you provide in the text box below, or ignore the null value.
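
The four null-value options can be sketched as a small function over rows of data. The names used here (`handle_nulls` and the action strings "fail", "drop", "replace", and "ignore") are hypothetical labels for the options listed above, not Quill identifiers.

```python
from typing import Any, Optional

Row = dict[str, Any]


def handle_nulls(rows: list[Row], column: str, action: str,
                 replacement: Optional[Any] = None) -> list[Row]:
    """Apply one of the four null-value options to a column."""
    cleaned: list[Row] = []
    for row in rows:
        if row.get(column) is not None:
            cleaned.append(row)  # value present: keep the row unchanged
        elif action == "fail":
            raise ValueError(f"null value in column {column!r}")
        elif action == "drop":
            continue  # leave the row out of the result
        elif action == "replace":
            cleaned.append({**row, column: replacement})
        elif action == "ignore":
            cleaned.append(row)  # keep the row, null and all
        else:
            raise ValueError(f"unknown action: {action!r}")
    return cleaned
```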

A6: Data Management

Quill allows for self-service data management. It provides everything you need to upload files and connect to databases and API endpoints.

A6(i): Getting Data Into Quill

Quill supports data in tabular or document-based formats. Tabular data can be provided to Quill as CSV files or through table selections made against SQL connections (PostgreSQL, MySQL, and Microsoft SQL Server are supported). Document-based data can be provided by uploading a JSON file, creating Cypher queries against Neo4j databases, connecting to a MongoDB database, or through an HTTP API connection (which you can also set to return a CSV).

A6(i)(a): Uploading a File

You can upload a CSV or JSON file directly to Quill in the Data Manager. In the Views tab, choose to Upload a file from the Add a Data View tile (seeFIG.238).

Provide the name of the view and upload the file. This automatically populates the Source Name. The amount of time it takes to upload a file depends on the size of the file. The maximum file size is 50 MB, and operating against a database connection is recommended. FIG. 239 shows an example where a CSV file is uploaded. FIG. 242 shows an example where a JSON file is uploaded. You can edit the Source Name, which is helpful when file names are difficult to parse and for readability when selecting the file from the Live Story dropdown when previewing your story. Quill automatically detects whether the data is in tabular or document form and samples a view of the first few rows or lines of data. FIG. 240 shows an example of uploaded tabular data, and FIG. 241 shows a sample view of tabular data. FIG. 243 shows an example of uploaded document-based data, and FIG. 244 shows a sample view of document-based data.

Quill also supports uploading multiple data sources into one Data View. This functionality can be accessed in the Data View by clicking the three dots icon (seeFIG.245).

Here, you can upload additional files or add additional connections (seeFIG.246). If you have multiple data sources in a Data View, you can set a source as primary, edit, or delete it. New data files or tables can be added to an existing data view, but only tabular sources can be added to tabular views and document-based sources to document-based views. To make the newly uploaded source your primary dataset, click on the three dots icon and select it as primary. This makes it the file used during runtime story generation requests or Live Story previews.

A6(i)(b): Adding a Connection

You can also provide data to Quill by connecting to a SQL database, a Neo4j database (using a Cypher query), a MongoDB database, or an HTTP API endpoint. You can add a connection from the Data View tab by choosing Start from Connection from the Add a Data View tile (see FIGS. 247 and 248) or by choosing Add a Connection from the Connections tab (see FIG. 249).

Quill will ask for the appropriate information to set up each type of connection. FIG. 250 shows an example of credentials for a SQL database connection. FIG. 251 shows an example of credentials for a Neo4j database connection. FIG. 252 shows an example of credentials for a MongoDB database connection. FIG. 253 shows an example of credentials for an HTTP API connection.

The connection will be made, subject to network latency and the availability of the data source. Data Views from connections are made from the Views tab. Choose Start from a Connection and select the connection you created (seeFIG.254).

Quill will prompt you to specify the table to add as the data source. For Neo4j connections, you will have to enter a Cypher query to transform the data into tabular form (see FIG. 255). From there, Data Requirements can be satisfied using the same experience as for tabular and document-based views, allowing type validation rules to be set as needed.

A7: Reviewing Your Story

Once you have configured your story with Sections and Communication Goals, and satisfied the Data Requirements against a data source, you can review or edit its contents, understand the logic Quill used to arrive at the story, and monitor the status of stories you run.

A7(i): Live Story

Live Story is where you can see the narrative expression of the story you configured in the Outline (seeFIG.256).

If you have set up your story to be based on Story Variables (as opposed to a static value), you can toggle between them (seeFIG.257) and see how the narrative changes.

You can also switch between data sources (seeFIG.258).

Click the “rewrite” button to generate a new narrative to see how any additional expressions you have added affect the Variability of the story (seeFIG.259).

Live Story has two modes: Edit and Review.

A7(i)(a): Edit Mode

Edit mode allows you to make changes to the language in your story (seeFIG.260).

A7(i)(a)(1): Entity Expressions

You can add Entity expressions from Live Story (in addition to the Entities tab). If you click on any Entity (highlighted in blue under the cursor) (seeFIG.261), a sidebar will open on the right side (seeFIG.262).

You can add Entity expressions by typing in the area next to the plus sign. You can also opt into and out of particular expressions. If you have multiple expressions associated with the Entity, Quill will alternate between them at random to add Variability to the language. Click the rewrite button to see how your story changes. As described in the Ontology Management section, you can also click, hold, and drag an expression to the top of the list and opt out of the additional expressions to set it as primary.

A7(i)(a)(2): Characterization Expressions

You can edit the expressions in any Characterizations you have set on Compare Communication Goals from Edit mode in Live Story. As with Entity expressions, Characterization expressions will be highlighted in blue when you move the cursor over them (seeFIG.263).

Click on the expression to open a sidebar to the right where you can add additional expressions and set which expression you would like to be the primary expression (seeFIG.264).

Quill will alternate between them at random to add Variability to the language. These additional expressions will be tied to the specific Communication Goal where you added them and will not appear for others. You can also opt into and out of particular expressions, as well as delete them using the x. However, you cannot opt out of whichever expression is set as the primary expression. See Assessment Characterizations in Ontology Management for more detail.

A7(i)(a)(3): Language Guidance

You can set Language Preferences, such as word order choice, for your story in the Edit mode of Live Story using Language Guidance. Hover over a section (sections correspond to Sections in the Outline) of the story to reveal a Quill icon on the right side (see FIG. 265).

Click it to isolate the section from the rest of the story (seeFIG.266).

Click on a sentence to expose any additional expressions you can opt into (seeFIG.267).

Quill generates expressions using language patterns appropriate to the Communication Goal, so the number of additional expressions will vary and not all sentences will have additional expressions. Quill will alternate between them at random to give your story more language variation.

A7(i)(b): Review Mode

Project Reviewers have access to this aspect of Authoring. In review mode (seeFIG.268), you can read stories and switch datasets to see how they affect the story. You can also see if there are any errors in the story with Quill's logic trace (discussed below).

A7(ii): Logic Trace

Quill allows you to see the steps it takes to express Communication Goals as a story. If you click on any sentence in the story in Live Story in Review mode, Quill will show the underlying Communication Goal or Goals (seeFIG.269).

Expand the arrow on the left of the Goal to see the steps Quill took to retrieve data based on the Communication Goal and Data Requirements (seeFIG.270).

In this case, it created a Timeframe and an Entity Type. Then it “shows its work” of pulling the Attribute Value of “sales” constrained by the Timeframe of “month” and associated with the Entity Type “Salesperson 1.”

The Logic Trace can also be downloaded as a JSON file from the Monitoring tab in Admin (seeFIG.271).

A7(iii): Monitoring

You can monitor the status of any stories you run, whether they were written in Live Story or generated through API requests, in the Monitoring tab in Admin. Here, you can see whether stories succeeded or failed, and filter for specific stories using the available filters below (see FIG. 272).

Use the Newer and Older buttons to scroll through the stories (seeFIG.273), and use the arrows on the column headers to set search criteria. You can filter by story status (seeFIG.274), when the story completed writing (seeFIG.275), the user who requested the story (seeFIG.276), a run type for the story (seeFIG.277), and a version for the story (seeFIG.278).

A8: Managing Story Versions

Quill supports creating and keeping track of changes to and versions of the stories you configure.

A8(i): Drafts and Publishing

Once you have configured your story and are satisfied with its expression in Live Story, you can Publish the draft of your story (seeFIG.279).

Once Published, your story will go live and that version will be the one that Quill uses when stories are requested through an API connection. After a draft has been Published, any changes you wish to make to the Project should be made after creating a new draft (seeFIG.280).

Once a new draft has been created, it can be deleted. You can also switch to the Published version if you want to abandon the changes you have made in the new draft. The drafts and publishing dropdown is also where you can save the Project as a blueprint to share with others in the Organization (seeFIG.281). This is discussed in Sharing.

Project Administrators are the only ones with draft creation and publishing privileges. While Editors may make changes to active drafts, they cannot publish them or create new ones. Reviewers only have access to review mode in Live Story and cannot create, make changes to, or publish drafts.

A8(ii): Change Log

Quill tracks configuration changes made within a Project. Anytime a user makes a change or adds a new element to a Project, it's noted in the Change Log. The Change Log can be accessed in the Admin section of Quill (seeFIG.282).

Here, you can see a list of all changes in the Project, the users that made the changes, the date and time the changes were made, and the version of the project the changes were made to. As with Monitoring, you can page through the list of changes by clicking on the Newer and Older buttons (seeFIG.283).

The Time, User, and Version information can be used to filter the list by using the drop-downs next to the column headers. FIG. 284 shows an example dropdown to filter by time. FIG. 285 shows an example dropdown to filter by user. FIG. 286 shows an example dropdown to filter by version.

You can also download the changes made as a CSV (seeFIG.287) in order to plot the Project activity or aggregate it for purposes of visualization or archiving.

A9: Writing Stories in Production

A9(i): API

Quill supports on-demand story generation by connecting to an API. The documentation can be accessed from Admin.

API request samples are available in the API Documentation tab of the Admin section of Authoring (seeFIG.288). These samples are based on the project Outline configuration and available data source connections. Parameters and output formatting can be set here so that stories can be requested to meet specific data requirements from an outside application.

The Request Builder allows the user to select the dataset, set the format (Plain Text, HTML, JSON, or Word) of the output, and choose the syntax of the request sample (seeFIG.289).

An external application can use the sample to post requests to the API to generate stories from Quill once the text in red has been replaced with its specific variables (seeFIG.290).

Each Quill user will be able to request a certificate and key from their system administrator.

A9(ii): Scheduling

Stories can also be run on a schedule (seeFIG.291).

Once Scheduling is enabled (seeFIG.292), stories can be run at scheduled intervals (seeFIG.293) beginning at a specific date and time. The run can be ended at a specific time or continue indefinitely. Additionally, you can set the format of the story to Plain Text, HTML, or JSON (seeFIG.294), which can then be retrieved for viewing from the Monitoring page. Published Project schedules are un-editable at this time. To edit the schedule, create a new draft and update as needed.

A10: Sharing and Reuse

Projects can be shared with other users. The Draft dropdown menu includes an option to Save as Blueprint (seeFIG.295).

Here, you can give the shared version of the Project a name and description (seeFIG.296).

You can also specify how much of the Project you make available for sharing. You can include the Outline, Ontology (Entities), and Data Sources, the Outline and Ontology, or just the Outline (seeFIG.297).

Projects that have been saved as blueprints can be accessed when choosing a blueprint. Quill defaults to including all shared projects, but you can filter blueprints based on what elements they include (Outline, Ontology, Data Sources) (seeFIG.298).

A11: Terminology

The following provides a glossary for various terms used in connection with describing the example embodiment of Appendix A.

An Organization is a collection of Projects managed by an Administrator. Members of an Organization have access to those Projects within it that they have permissions for.

Outlines are collections of building blocks that define an overall Story.

Communication Goals provide a bridge between analysis of data and the production of concepts expressed as text.

Narrative Analytics generate the information needed by Communication Goals to generate stories.

Projects are where stories are configured. A Project includes Authoring, the Data Manager, and Admin.

Project Blueprints are templates comprised of an Outline, specific story sections, and collections of Communication Goals.

An Ontology is a collection of Entity Types and Attributes, along with their expressions, that powers how Quill expresses your story.

An Entity Type is any primary “object” which has particular Attributes. For example, a Sales Person (entity) has Sales (attribute).

Relationships provide context for entities within a story.

Every Entity Type has a Base Entity Type that identifies to Quill whether it is a Person, Place, Thing, or Event.

Computed Values are a way of reducing a list of values into a representative value. The currently available aggregations are count, maximum, mean, median, minimum, and total, and the currently available function is contribution.
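
The listed aggregations map naturally onto standard library functions, as the sketch below shows. The names `AGGREGATIONS`, `computed_value`, and `contribution` are hypothetical. The source does not define how contribution is calculated; this sketch assumes it is the entity's share of the parent entity's total, expressed as a fraction.

```python
import statistics

# One reducer per aggregation named in the glossary entry.
AGGREGATIONS = {
    "count": len,
    "maximum": max,
    "mean": statistics.mean,
    "median": statistics.median,
    "minimum": min,
    "total": sum,
}


def computed_value(name: str, values: list[float]) -> float:
    """Reduce a list of values to one representative value."""
    return AGGREGATIONS[name](values)


def contribution(entity_value: float, parent_total: float) -> float:
    """Assumed meaning: the entity's share of the parent's total."""
    return entity_value / parent_total
```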

Characterizations are editorial judgments based on thresholds that determine the language used in communication goals when certain conditions are met.

Expressions are the various words Quill uses to express a particular concept generated by the combination of executing Narrative Analytics and Story Elements.

A Timeframe is a unit of time used as a parameter to constrain the values included in the expression of a Communication Goal or story.

Variability is variation in the language of a story. Variability is provided by having multiple Entity and Characterization expressions as well as by opting into additional sentence expressions through Language Guidance.

Authoring includes the Outline, Data Requirements, and Live Story. This is where you configure Communication Goals, map Entity Types and Attributes to values in the data, and review generated stories.

Data Requirements are how a user tells Quill how a Communication Goal's data requirements will be satisfied. They are what a Narrative Analytic and Communication Goal need in order to express a concept. They are satisfied either directly by configuration of the data requirements or through the execution of Narrative Analytics.

A Story Variable is the focus of a story supplied at runtime as a value from a data source (as opposed to a static value).

A Draft is an editable version of the story in a Project. Project Administrators and Editors have the ability to make changes to Drafts. Project Administrators can publish Drafts and create new ones.

The Data Manager is the part of the Project where Data Views and Data Sources backing the story are managed. This is where files are uploaded and database connections are added.

A Data View is used by Quill to map the Outline's information needs against Data Sources. A Project can be backed by multiple Data Views that are mapped using Identifiers in the schemas.

A Data Source is a file or table in a database used to support the Narrative Analytics and generation of a story.

Admin allows you to manage all aspects of story generation other than language and data. This is where Monitoring, the Change Log, API Documentation, Project Settings, and Scheduling are located.

A12: Communication Goal Families

The example embodiment of Appendix A supports three communication goal families: Present, Callout, and Compare.

Present

The Present goal family is used to express an attribute of a particular entity or group of entities.

Most Present goal statements have the form “Present the attribute (or computed value) of the specified entity/group.” For example:

- Present the price of the car.
- Present the price of the highest ranked by reviews item.
- Present the average value of the deals made by the salesperson.

The two exceptions to this form are when the Count or Contribution computed values are used, in which case the statements look like this:

- Present the count of the group.
  - E.g. Present the count of the franchises in the region.
- Present the attribute contribution of the entity to the parent entity.
  - E.g. Present the point contribution of the player to the team.

Callout

The Callout goal family is used to identify the entity or group of entities that has some editorially-interesting position, role, or characteristics. E.g. the highest ranked salesperson, franchises with more than $1k in daily sales, players on the winning team, etc. Every Callout goal statement has the same structure: “Callout the specified entity/group.” For example:

- Callout the highest ranked by sales salesperson.
- Callout the franchises with more than $1,000 in daily sales.
- Callout the players on the winning team.

Compare

The Compare goal is used to compare the values of two attributes on the same entity. Every Compare goal has the same structure: Compare the first attribute of the specified entity to the second attribute. For example:

- Compare the sales of the salesperson to the benchmark.
- Compare the final value of the deal to the expected value.
- Compare the revenue of the business to the expenses.
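
The fixed sentence structures of the three goal families lend themselves to simple templates. The sketch below composes goal statements as plain strings; the function names are hypothetical and the code says nothing about how Quill represents goals internally.

```python
def present(attribute: str, entity: str) -> str:
    """Present family, general form."""
    return f"Present the {attribute} of the {entity}."


def present_count(group: str) -> str:
    """Present family, Count exception."""
    return f"Present the count of the {group}."


def present_contribution(attribute: str, entity: str, parent: str) -> str:
    """Present family, Contribution exception."""
    return f"Present the {attribute} contribution of the {entity} to the {parent}."


def callout(entity: str) -> str:
    """Callout family."""
    return f"Callout the {entity}."


def compare(first: str, entity: str, second: str) -> str:
    """Compare family: two attributes on the same entity."""
    return f"Compare the {first} of the {entity} to the {second}."
```

For example, `compare("sales", "salesperson", "benchmark")` reproduces the first Compare statement listed above.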

A13: Miscellaneous

A13(i): Charts

Quill is able to express certain configured goals as Charts, such as Bar and Line. These have default styling and colors and are guided by the Communication Goal's Narrative Analytics. Charts are supported in each available output format.

A13(ii): Supported Document Structures

Generally, Quill supports documents that are homogenous (uniformly structured) with stable keys. Example permutations of supported structures are described below.

A13(ii)(a): Single Document

In this example, as long as all documents contain the same keys (in this case, “a”, “b”, and “c”) Quill can use this data structure.

A13(ii)(b): Nested Documents

Documents with other documents nested within them are supported, though the nested documents must be homogenous with stable keys across documents.

A first example is:

{
  "a": {
    "aa": 1,
    "ab": 2
  },
  "b": {
    "ba": 3,
    "bb": 4
  }
}

[...]

    {
      "ba": 11,
      "bb": 12
    },
    {
      "ba": 20,
      "bb": 44
    }
  ]
}
]

A13(ii)(c): Unsupported Structures

The example embodiment of Appendix A does not support heterogeneous documents (non-uniform) or documents where values are used as keys.
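
A check for “homogenous with stable keys” can be sketched by reducing each document to its key structure and comparing the results. This is an illustrative approach only; the function names `shape` and `is_homogeneous` are hypothetical, and the treatment of lists (every item must share one structure) is an assumption of the sketch.

```python
from typing import Any


def shape(value: Any) -> Any:
    """Reduce a document to its key structure, ignoring the leaf values."""
    if isinstance(value, dict):
        return {key: shape(child) for key, child in value.items()}
    if isinstance(value, list):
        shapes = [shape(item) for item in value]
        # Assumption: every item in a list must share one structure.
        if any(s != shapes[0] for s in shapes):
            raise ValueError("list items are not uniformly structured")
        return [shapes[0]] if shapes else []
    return None  # leaf value: only its presence matters


def is_homogeneous(documents: list[dict]) -> bool:
    """True when every document has the same keys at every nesting level."""
    try:
        shapes = [shape(doc) for doc in documents]
    except ValueError:
        return False
    return all(s == shapes[0] for s in shapes)
```

Documents that use values as keys (for example, one document keyed by one city name and the next keyed by another) fail this check because their key sets differ.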

A13(iii): Styling Rules

Oxford Commas

Quill does not use Oxford commas. So it writes like “Mary spoke with Tom, Dick and Harry” and not like “Mary spoke with Tom, Dick, and Harry.”
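
A list joiner that follows this rule can be sketched as below; the function name `join_list` is hypothetical.

```python
def join_list(items: list[str]) -> str:
    """Join items with commas and a final 'and', with no comma before 'and'."""
    if len(items) <= 1:
        return "".join(items)
    return ", ".join(items[:-1]) + " and " + items[-1]
```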

Spaces Between Sentences

Quill puts one space between sentences.

Dates

Year: Datetimes that are just years are expressed numerically.

2016→“2016”

1900→“1900”

Month and Year: Datetimes that are just months and years have written out months and numeric years.

2016-03→“March 2016”

2015-11→“November 2015”

Day, Month, and Year: Datetimes that are full dates are written with abbreviated months and numeric days and years.

2016-03-25→“Mar. 25, 2016”

2015-11-05→“Nov. 5, 2015”
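
The three date rules can be sketched as follows. The function names are hypothetical. The source shows only “Mar.” and “Nov.” as abbreviations, so the rule used here (first three letters plus a period, with short names such as May left whole) is an assumption for the remaining months.

```python
MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]


def abbreviate(month: str) -> str:
    # Assumption: three letters plus a period, matching "Mar." and "Nov.".
    return month if len(month) <= 3 else month[:3] + "."


def style_date(value: str) -> str:
    """Style 'YYYY', 'YYYY-MM', or 'YYYY-MM-DD' per the rules above."""
    parts = value.split("-")
    if len(parts) == 1:
        return parts[0]  # year only: numeric
    month = MONTHS[int(parts[1]) - 1]
    if len(parts) == 2:
        return f"{month} {parts[0]}"  # written-out month, numeric year
    if len(parts) == 3:
        # int() drops the leading zero from the day ("05" -> 5).
        return f"{abbreviate(month)} {int(parts[2])}, {parts[0]}"
    raise ValueError(f"unrecognized date: {value!r}")
```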

Percents

Percents are rounded to two places, trailing zeros are removed, and a “%” is appended.

53.2593→“53.26%”

53.003→“53%”
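
The percent rule (round to two places, drop trailing zeros, append a percent sign) can be sketched in a few lines; `style_percent` is a hypothetical name.

```python
def style_percent(value: float) -> str:
    """Round to two places, remove trailing zeros, and append '%'."""
    text = f"{value:.2f}"
    # "53.00" -> "53." -> "53"; "53.26" is left unchanged.
    text = text.rstrip("0").rstrip(".")
    return text + "%"
```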

Ordinals

Ordinals are written with numerical contractions.

Decimals are written out with decimal parts and commas inserted.

1.1→“1.1”

1.9→“1.9”

123456789→“123,456,789”

Currencies

Currencies are currently assumed to be USD. In the future, they can be locale-specific (e.g. Euros). They're styled differently based on how big they are.

Less than One Thousand

Rounds to two decimal places. There are always two decimal places.

3→“$3.00”

399.9999→“$400.00”

Less than Ten Thousand

Rounds to an integer.

5000.123→“$5,000”

4171→“$4,171”

Less than One Million

Rounds to thousands with zero decimal places, appends a “K”

500,000→“500K”

123,456.789→“123K”

Less than One Billion

Rounds to millions with one decimal place if necessary, appends an “M”

500,000,000→“500M”

500,100,000.12→“500.1M”

Less than One Trillion

Rounds to billions with two decimal places if necessary, appends a “B”

500,000,000,000→“500B”

500,100,000,000.12→“500.1B”

500,130,000,000.12→“500.13B”
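
The magnitude tiers can be sketched as a chain of comparisons. The names `style_currency` and `_trim` are hypothetical. The examples for the K, M, and B tiers are printed above without a dollar sign, and the sketch mirrors them as printed; whether a dollar sign belongs there is not something the source settles.

```python
def _trim(text: str) -> str:
    """Drop trailing zeros and a dangling decimal point ('500.10' -> '500.1')."""
    return text.rstrip("0").rstrip(".") if "." in text else text


def style_currency(value: float) -> str:
    """Style a non-negative USD amount by magnitude, following the tiers above."""
    if value < 1_000:
        return f"${value:,.2f}"          # always two decimal places
    if value < 10_000:
        return f"${value:,.0f}"          # rounded to an integer
    if value < 1_000_000:
        return f"{value / 1_000:.0f}K"   # thousands, zero decimal places
    if value < 1_000_000_000:
        return _trim(f"{value / 1_000_000:.1f}") + "M"
    if value < 1_000_000_000_000:
        return _trim(f"{value / 1_000_000_000:.2f}") + "B"
    raise ValueError("amounts of one trillion or more are not covered by these rules")
```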

Supported Datetime Formats

The following datetime formats are supported in Quill.

Tuesday, Jan. 31, 2015

Tuesday, Jan. 31, 2015, 01:30 AM

2015-01-31T01:30:00-0600

01/31/2015 01:30:45 AM

31/01/2015 01:30:45

2015/01/31 01:30:45
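
One way to accept the formats listed above is to try a matching `strptime` pattern for each in turn. This is a sketch, not Quill's parser; the names `FORMATS` and `parse_datetime` are hypothetical, and the patterns are this sketch's reading of the six examples.

```python
from datetime import datetime

# One strptime pattern per listed format, tried in order.
# %A and %b match English day and month names only under an English or C locale.
FORMATS = [
    "%A, %b. %d, %Y",
    "%A, %b. %d, %Y, %I:%M %p",
    "%Y-%m-%dT%H:%M:%S%z",
    "%m/%d/%Y %I:%M:%S %p",
    "%d/%m/%Y %H:%M:%S",
    "%Y/%m/%d %H:%M:%S",
]


def parse_datetime(text: str) -> datetime:
    """Return the first successful parse, or raise if no pattern matches."""
    for pattern in FORMATS:
        try:
            return datetime.strptime(text, pattern)
        except ValueError:
            continue
    raise ValueError(f"unsupported datetime format: {text!r}")
```

In this sketch the month-first and day-first slash formats are told apart only by the AM/PM marker, so a slash date without that marker is read as day first.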

A13(iv): Using Multiple Data Views

Users can satisfy their outline's data requirements using multiple data views. While it may often be more straightforward to create a de-normalized view in the source database, the following use cases are supported. These apply to both tabular and document-based data sources.

Single Entity Type, Attribute Lookup by Entity ID

Quill can return the Gender from Data View 2 associated with the Sales Person's ID in Data View 1 using the Sales Person ID.

Data View 1

Sales Person ID	Sales Person Name

123	Aaron Young
456	Daisy Bailey

Data View 2

Sales Person ID	Gender

123	Male
456	Female

Two Entity Types

Quill can match the Transactions in Data View 2 to the Sales People in Data View 1 by Sales Person ID.

Data View 1

Sales Person ID	Sales Person Name

123	Aaron Young
456	Daisy Bailey

Data View 2

Transaction ID	Amount	Sales Person ID

777	$100.00	123
888	$70.00	456
999	$20.00	123
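
Both use cases amount to joining rows on a shared identifier. The sketch below reproduces them with plain lists of dictionaries built from the tables above; the function names `lookup_attribute` and `match_rows` are hypothetical and do not describe how Quill performs the join.

```python
KEY = "Sales Person ID"

sales_people = [
    {KEY: "123", "Sales Person Name": "Aaron Young"},
    {KEY: "456", "Sales Person Name": "Daisy Bailey"},
]
genders = [{KEY: "123", "Gender": "Male"}, {KEY: "456", "Gender": "Female"}]
transactions = [
    {"Transaction ID": "777", "Amount": 100.00, KEY: "123"},
    {"Transaction ID": "888", "Amount": 70.00, KEY: "456"},
    {"Transaction ID": "999", "Amount": 20.00, KEY: "123"},
]


def lookup_attribute(view: list[dict], key: str, key_value: str, attribute: str):
    """Single entity type: fetch an attribute from another view by identifier."""
    for row in view:
        if row[key] == key_value:
            return row[attribute]
    return None


def match_rows(parents: list[dict], children: list[dict], key: str) -> dict:
    """Two entity types: group child rows under each parent identifier."""
    grouped = {row[key]: [] for row in parents}
    for child in children:
        if child[key] in grouped:
            grouped[child[key]].append(child)
    return grouped
```

Here `lookup_attribute(genders, KEY, "123", "Gender")` returns the Gender for Sales Person 123, and `match_rows(sales_people, transactions, KEY)` groups transactions 777 and 999 under that same identifier.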

A13(v): Permission Structure

Quill Access

Role	Create Organizations	Create Users	API Token	Create Projects

Site Administrator	X	X	X	X
Organization Administrator		X	X	X
Organization Member			X	X

Project Access

Role	Add Users	Edit Story	Live Story: Edit Mode	Create and Publish Drafts	Live Story: Review Mode

Administrator	X	X	X	X	X
Editor		X	X		X
Reviewer					X
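
The Project Access table can be read as a mapping from role to permitted actions. The sketch below encodes the table's X marks as sets; the names `PROJECT_ACCESS` and `can` and the lowercase action strings are hypothetical labels for the table's columns.

```python
# Each role maps to the columns marked with an X in the Project Access table.
PROJECT_ACCESS = {
    "Administrator": {
        "add users", "edit story", "live story: edit mode",
        "create and publish drafts", "live story: review mode",
    },
    "Editor": {"edit story", "live story: edit mode", "live story: review mode"},
    "Reviewer": {"live story: review mode"},
}


def can(role: str, action: str) -> bool:
    """True when the table grants the action to the role."""
    return action in PROJECT_ACCESS.get(role, set())
```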