US20170116179A1


Info

Publication number: US20170116179A1
Application number: US15/316,822
Authority: US
Inventors: Maud GAGNÉ-LANGEVIN; Valeriy FEDOROV; Maksim GRITSAY; Alexander POTAPOV; Mikhail PELYANSKIY; Svetlana KRIVOSHEY; Anna SHABALINA; Vitaliy BUNCHUK
Original assignee: Foulnes Services Corp
Current assignee: Foulnes Services Corp
Priority date: 2014-06-06
Filing date: 2015-06-08
Publication date: 2017-04-27
Also published as: US20170185569A1; CA2901703C; CA2996314A1; CA2901703A1; WO2015184554A1; US20190251142A1; US20200257848A1; GB201700115D0; GB2542525A8; WO2015184534A1; US10140263B2; GB2542525A

Abstract

A method and system are provided for processing a document comprising a plurality of content portions. The document includes code identifying tasks corresponding to at least one content portion of the document, and code defining an associated user interface element. When the user interface element is activated to invoke the task, tasks to be executed by a remote system are executed by the remote system using a remotely stored copy of the associated content portion, while tasks to be executed by a local system are executed by the local system using a locally stored copy of the content portion. Changes to at least certain portions of the document are synchronized between the local and remote servers. The tasks can include a consistency-checking task for verifying consistency of certain content within the document, and display of results with optional suggested corrections to permit manual or automatic correction of detected discrepancies.

Description

CROSS-REFERENCE TO RELATED APPLICATIONS

This application references and claims priority to U.S. Provisional Application No. 62/008,743 filed on Jun. 6, 2014 and to International Application No. PCT/CA2015/050427 filed on May 12, 2015, the entireties of which are incorporated herein by reference.

BACKGROUND

1. Technical Field

The present disclosure relates to document processing, and in particular parsing and handling of document content for the purposes of document editing, validation, and analysis.

2. Description of the Related Art

Numerous solutions have been proposed for automated document creation and review to reduce the workload on those personnel tasked with scrutinizing and validating documents. For instance, some tools automatically generate documents from brief answers entered in a questionnaire; the input information is used to populate a standard form document. This type of tool is suitable for documents that only require input of discrete, atomic items of information (such as names, addresses, asset or debt listings and the like), where the remaining document content is pre-written, and the interpretation of the document is less likely to be nuanced by the input information. Examples of such documents include loan applications and purchase orders.

Such automated document creation and review tools, however, are less suitable for “bespoke” documents in which much of the content is customized to reflect unique situations or relationships between parties. Examples of such bespoke documents can include prospectuses and other disclosure documents of different types in various commercial and industrial sectors, non-standard contracts, court pleadings, and even patent applications. Other computerized solutions have been proposed to automatically validate legal instruments and complex business documents through text analytics and other techniques to compare document content against predetermined text passages.

These solutions, generally, are intended to improve efficiency in the review and validation process by eliminating or reducing the need for human clerical or professional skill and judgment. While improved efficiency may be a desirable goal, these types of automated solutions are heavily reliant on proper advance preparation of a standard form document, or of a library of standard texts and a lexical analysis engine.

BRIEF DESCRIPTION OF THE DRAWINGS

In drawings which illustrate by way of example only embodiments of the present application,

FIG. 1 is a schematic illustrating possible physical layouts of documents containing similar content.

FIG. 2 is a schematic of select components of a client computing system optionally in communication with a network and a scanning device.

FIG. 3 is a schematic of select components of a server computing system.

FIG. 4 is a further schematic of select modules of the server of FIG. 3.

FIG. 5 is an illustration of a data processing environment including client and server systems.

FIG. 6 is a flowchart illustrating an overview of initial processing of a document by the server system.

FIG. 7 is a flowchart illustrating processing and alteration of a document for delivery to the client system.

FIG. 8 is a flowchart providing further details of select aspects of the process of FIG. 7.

FIG. 9 is a flowchart providing further details of select aspects of the process of FIG. 8 pertaining to the insertion of code in the document.

FIG. 10 is a flowchart illustrating an overview process for rendering and displaying the altered document at a client system.

FIG. 11 is a schematic illustrating an initial physical layout of a document prior to alteration.

FIG. 12 is a schematic illustrating a further physical layout of the document after alteration including rendered components resulting from inserted code.

FIG. 13 is an illustration of a graphical user interface at the client system presenting the altered document.

FIGS. 14 and 15 are illustrations of a graphical user interface during selection and presentation of elements of the altered document.

FIG. 16 is an illustration of a graphical user interface displaying related citations or references for a selected element of the altered document.

FIGS. 17 and 18 are illustrations of a graphical user interface during an operation on a selected element of the altered document.

FIG. 19 is an illustration of a graphical user interface during a further operation on a selected element of the altered document.

FIGS. 20 to 23 are illustrations of a graphical user interface during operations to insert a further element into the altered document.

FIG. 24 is a flowchart illustrating a process for operating on elements of the altered document.

FIG. 25 is an interaction diagram illustrating data flow between various client and server components in response to changes to the altered document.

FIG. 26 is a flowchart illustrating one possible process for server handling of changed and validated elements of the altered document.

FIG. 27 is a schematic illustrating possible states of a memory stack at the server.

FIG. 28 is an interaction diagram illustrating data flow between various client and server components in response to validation and download instructions.

FIG. 29 is an illustration of example tabular content and accompanying footnote or free text in an example document.

FIG. 30 is a schematic illustrating possible author-applied formatting in a table cell in the example of FIG. 29.

FIGS. 31A and 31B are schematics illustrating a possible method of handling of table cell values in memory.

FIG. 32 is a flowchart illustrating an overview of possible handling of tabular data in response to invocation of a task.

FIGS. 33, 34, and 35 are flowcharts illustrating overview processes of tabular data handling during task execution.

FIGS. 36A, 36B, and 36C are schematics illustrating an example of handling of table cell values in memory and transfer to the editing copy or altered version of a document.

FIG. 37 is a schematic illustrating a possible association between footnote or free text elements and reference elements in a portion of a document.

FIG. 38 is an illustration of an example graphical user interface presenting a report of results from a consistency-checking task.

FIG. 39 is a flowchart of a process for footnote/endnote consistency checking.

DETAILED DESCRIPTION

Many documents generated or received in the course of the operation of an enterprise or other organization are subject to approval or review mechanisms that can involve review or validation against pre-set rules or requirements, best practices, and/or internal consistency requirements.

Some types of documents lend themselves more readily to automated processing for validation purposes than others; for instance, documents that predominantly consist of line items with relatively short descriptions, such as invoices, requisitions, bills of lading, etc. can be automatically rendered in computer-understandable format if they are not already (e.g., by optical character recognition (OCR)), and their content compared to predefined templates with relative ease. Indeed, a number of standards have been defined for electronic document creation and exchange for business, transport, engineering, and medical purposes. Such standards are most easily implemented where language and forms of expression are normalized and there is strict or nearly-strict adherence to normalized expressions when the documents are generated.

Other types of documents are subject to greater variation between one document and the next, not only in substantive content, but also in expression. This can occur when the author(s) or publisher of a document are able to exercise creative or professional control over the document content, even when the substantive content is prescribed by a rule-maker or guideline. This situation arises, for example, in the context of financial or corporate disclosure documents: while governing regulations may identify required components of a disclosure document, there may be different manners of expressing these required components in text. The actual text content of a document will necessarily vary according to the subject of the disclosure document, the disclosing entity, and/or the practices and writing style of the individual (or individuals) preparing the document.

Furthermore, the creator of the document, or the party responsible for preparing the document for publication, may apply their own layouts and page designs to the document. Such layouts and designs may be intended to improve human comprehension of the document, for instance by presenting data in tabular format, or by applying different formatting to different parts of the text, such as titles or headings, subheadings, paragraphs, and the like. Formatting can include text alignment (left-aligned, right-aligned, justified, or centered), font face and size, text decoration (e.g. bold, italics, underline) or variations in tabular layouts (e.g., merging or splitting individual cells). Some content may be presented as footnotes or endnotes rather than contained in the main body text of the document, and are denoted by numbers, letters, or other symbols that are referenced in the main body text. Other layout and design features can include ornamental features that are not primarily intended to affect human comprehension of the document, such as shading, color, and graphic elements. The combination of layout and design features can be considered to be the presentation template or presentation format of the document. The presentation format of the document may be defined in a word processing or desktop publishing template that is applied to an electronic form of the document, or it may be created on the fly by the document creator or preparer.

FIG. 1 illustrates, in schematic form, different presentation formats for similar document content intended to be presented on standard-sized sheets of paper (e.g., letter size (8.5×11″) or A4). Two documents 100a and 100b with similar content are illustrated. As can be seen in the drawing, the content of document 100a breaks over three sheets of paper, or pages 101a, 102a, and 103a, whereas the content of document 100b breaks over only two pages 101b and 102b, due primarily to design variations between the presentation formats of documents 100a, 100b. In this example, both documents include a main title 102; headings 103 preceding major sections or portions of content; optional subheadings; paragraphs of text; and images. While the main title 102 and the headings 103 may contain identical content and are laid out in a similar manner (here, the main title 102 is centred in both documents, while headings 103 are left-aligned), other content is presented differently. For example, text portions 104a in document 100a are laid out substantially across the printable width of the document in a single column, whereas the same content in text portions 104b in document 100b is laid out in a two-column arrangement. The second page of either document 100a, 100b may include the same or very similar biographical content, laid out differently. In document 100a, the names of the subjects are presented in subheadings 105, but they are not presented as subheadings in document 100b at all. In document 100b, they are instead included in the main biographical text 107b. Both documents 100a, 100b can include images of the subject 106a, 106b; but in document 100b, the text in biographical text 107b wraps around the images 106b, whereas in document 100a the biographical text 107a is presented to the right of the images 106a.

These layout differences may result in similar content occupying more or less of a single page; the effect is illustrated in FIG. 1, as document 100a spans three pages while document 100b requires only two. In addition to such layout differences, different choices may be made regarding font size and face, line spacing, margin widths, header and footer depths, and so on, also resulting in different white space and/or pagination. While FIG. 1 illustrates document content intended for presentation on paper sheets, it will be appreciated by those skilled in the art that differences in presentation format can also affect pagination of content in electronic documents. It is common, for instance, for documents formatted for print to be rendered in PDF or other electronic document formats, so the differences resulting from the layout and formatting choices in documents 100a and 100b would apply to electronic versions of the documents as well. Differences in layout and formatting will similarly affect the appearance and pagination of documents primarily intended to be presented onscreen, such as slides (e.g., Microsoft PowerPoint™ format) or webpages.

Moreover, the documents that are subject to approval or review may not have been generated by the party conducting the approval or review process. Instead, the approving or reviewing party may be attempting to review a third party document. In those cases, the reviewing party may not have access to a source electronic document that would permit easy access to electronically searchable text content for automated review purposes. Rather, the reviewing party may have been provided only with a printed version of the document, which must be scanned and converted to a format that can be electronically processed (e.g., by OCR). In that case, the presentation format may interfere with the OCR process.

Aside from these impediments to automated processing for the purpose of approval or review, it will also be appreciated by those skilled in the art that the approval or review process itself is subject to change. Best practices and internal requirements may evolve over time, and rules and requirements may similarly evolve and change. Often times, this guidance originates outside the organization, and the information sources for this guidance may be decentralized. For instance, laws, regulations, and guidelines governing the content of documents may originate from a number of jurisdictions. Best practices and other guidance may be published in secondary sources that are either print or electronic, such as textbooks, reference books, online databases, and the like.

Still other sources of reference information that may be used in reviewing or approving documents are exemplars or precedents. In the legal field, for example, precedent documents are used as models to assist in composition of new legal documents. The precedent document itself may be modified to add new content and delete irrelevant content, or else the new document is composed and then compared to the precedent. Similarly, in other fields, existing documents may be used as models to assist in the composition of new documents. These exemplars or precedents may have originated from third parties, and may have been received in either print or electronic format.

All of the foregoing reference information sources may be available electronically, but even so, they are typically not integrated into an automated document review process. For instance, the reviewing party may be editing or reviewing the document by computer using a word processor, but reference materials may be stored in an online resource accessed using a web browser or dedicated application.

Further, as noted above, many automated solutions that have been proposed for improving efficiency in document generation, review, and validation are generally intended to reduce the need for clerical or professional input. While automation in this manner may reduce the human resource cost in generating and reviewing documents, there still remains a need for the exercise of professional skill and judgment in the preparation and review of many types of documents, particularly “bespoke” documents. Human judgment cannot be completely delegated to computers.

Accordingly, the examples and embodiments described herein provide an improved system, method, and data processing device-readable medium for implementing and managing automated document analysis and review in combination with task management and execution so as to improve efficiency in document generation, review, editing, and validation. Electronic versions of documents, which may have been digitized from a printed source and/or may have been originally generated in a non-standard layout or format, are validated against prescribed elements either defined in advance for the document type or identified within the document itself. Based on the result of the validation, specific tasks are identified for discrete portions of content within the document and, in some embodiments, the document is altered to include identification of the tasks. While the validation, identification of tasks, and alteration of the document may be carried out remotely from the user at a server system accessible over a network, the tasks can be invoked at a client device when the document is displayed. In some examples, the identification of the task added to the document includes code for rendering a graphical user interface element for display together with the relevant document portion on a display screen, such that the graphical user interface element can be actuated or activated to invoke the related task. The system thus facilitates on-point identification and execution of tasks for discrete portions of a single document, and, in some embodiments, merges the user's supplied document with a task-based framework to produce a portable, combined editable document and task list that the user can edit and execute on a variety of platforms and locations. The tasks may include validation tasks, checks for data consistency, data lookups (e.g., to query a source for relevant reference material), and automated generation of reports based on the document content. When the user views the combined document on a client platform, certain tasks may be executed by a server. Changes made to the document resulting from invocation of tasks may be stored locally or selectively transmitted to the server for remote storage.
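The arrangement described above, in which a content portion carries an identification of its task plus code for a user interface element, and the task then runs against either the local or the remote copy of that portion, can be illustrated with a minimal sketch. The sketch is not taken from the disclosure: the `Task` fields, the `data-task` and `data-target` attribute names, and the use of two in-process dictionaries to stand in for the client and server copies are all assumptions made for illustration only.

```python
from dataclasses import dataclass
from html import escape
from typing import Callable, Dict


@dataclass
class Task:
    task_id: str                   # hypothetical identifier embedded in the document
    label: str                     # text shown on the user interface element
    runs_on: str                   # "local" or "remote"
    handler: Callable[[str], str]  # stand-in for the real task logic


def annotate_portion(portion_id: str, html_content: str, task: Task) -> str:
    """Wrap a content portion with a button that identifies its task."""
    button = (
        f'<button data-task="{escape(task.task_id)}" '
        f'data-target="{escape(portion_id)}">{escape(task.label)}</button>'
    )
    return f'<div id="{escape(portion_id)}">{html_content}{button}</div>'


def invoke(task: Task, portion_id: str,
           local_copy: Dict[str, str], remote_copy: Dict[str, str]) -> str:
    """Run the task against the copy held by the system that executes it."""
    if task.runs_on not in ("local", "remote"):
        raise ValueError(f"unknown execution site: {task.runs_on!r}")
    # Assumption: both copies live in one process here; in practice the
    # remote copy would sit on a server reached over a network.
    source = local_copy if task.runs_on == "local" else remote_copy
    return task.handler(source[portion_id])
```

In this sketch a task marked "local" never touches the remote dictionary, which mirrors the split between locally and remotely executed tasks; how changes are synchronized between the two copies is left out entirely.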

These embodiments and examples are described and illustrated primarily in the context of a data processing environment comprising one or more data processing systems, which may operate over a local or wide-area network. FIGS. 2-4 illustrate select components of data processing devices or systems that are suitable for use in the contemplated data processing environment.

FIG. 2 is a block diagram of select components of an example client data processing system 110, which may be embodied in a single device, such as a desktop computer, workstation or terminal, or mobile computer (e.g., laptop computer, tablet computer, or smartphone). While the example system 110 is illustrated herein as a desktop computer or workstation, it will be appreciated by those skilled in the art that this is not intended to be limiting, and the solutions described herein may be implemented on any suitable data processing device that is configurable to operate as described, whether or not this device is primarily intended for productivity uses or other types of uses.

Operation of the system 110 is generally controlled by a main processor or processors 112. The system 110 may be operated under mains power or may be a battery-powered device; these features are not illustrated in FIG. 2 for ease of exposition. Data, programs, and other instructions or information can be stored in one of several possible memory components of the system 110, such as internal memory 114 (which can include standard volatile and non-volatile memory components, which can be integrated with other components such as the processor 112 or provided as distinct components). Information can also be stored in the system 110 on other storage devices, either internal or external, such as hard drives, flash drives, memory cards, and peripheral devices, not shown in FIG. 2. Typically, software and data components such as the operating system (OS) 130, programs (applications) 140, application data 150, and user data 160 are stored in resident persistent memory. In some systems 110, some components of the OS 130 may be embedded as firmware in integrated memory in the processor 112. However, portions of such components may be temporarily loaded into volatile memory. In this example, the programs 140 can include, among various applications that may be installed during initial configuration by the manufacturer or distributor of the system 110, or after receipt by the user or an administrator, a general purpose user agent such as a web browser application 142 and/or a dedicated document editing and verification tool 144. Either the browser 142 or the dedicated tool 144 may be used to implement the examples described here.

Implementation using a browser 142 provides, among other advantages, improved mobility and portability on the part of users, who may be able to access the server system providing various services such as validation, mentioned above, from any suitable client data processing system 110 without requiring installation of specialized software aside from scripts and other code downloaded by the browser. On the other hand, a dedicated tool 144 provides developers with greater control over the operation of the tool on the client system 110 without requiring compatibility with current web standards. The benefits of either type of implementation will be understood by those skilled in the art. While the examples described here are described in the context of implementation in a browser, it will also be understood that this context is not intended to be limiting. In any event, it is contemplated that in browser implementations, these examples may conform to known standards for the structure and presentation of content, in particular HTML5, published by the World Wide Web Consortium (W3C) at w3.org. In addition, these examples may comply with companion and predecessor standards and specifications, including without limitation HTML 4.01, XHTML 1.0 and 2.0, DOM Levels 1 through 3, and CSS Levels 1 through 3 and Level 4 modules, also published by the World Wide Web Consortium (W3C) at w3.org. Many standards are under revision or may be replaced in future, and it is expected that the examples described herein will be implementable under successor or replacement standards. Resources used in these examples may include or be associated with elements such as scripts written in JavaScript™ published by the Mozilla Foundation, Mountain View, Calif., www.mozilla.org (trademark owned by Oracle Corporation, Redwood Shores, Calif.) or in other scripting languages designed to enable programmatic access to computational objects within a host environment; Adobe Flash and Flex technologies from Adobe Systems Incorporated, San Jose, Calif.; video files in any one of various compatible formats, including Flash, Quicktime, MPEG and in particular MPEG-4; dynamic HTML technology, widgets, modules, code snippets, and the like, which may be delivered together with documents and webpages to the client system 110, or which alternatively may be downloadable separately by the client system 110, progressively downloaded, or streamed from a server.

The examples described herein may be implemented using one or more of the foregoing technologies and other combinations of technologies. Further, the resources may be executed in browser, microbrowser and browser widget environments implemented using various known layout engines including, without limitation, WebKit (available at webkit.org), Gecko (Mozilla Foundation), Trident (Microsoft Corporation, Redmond, Wash.), Presto (Opera Software ASA, Oslo, Norway) and the like designed for various runtime environments including Java™ (Oracle Corporation, Redwood Shores Calif.), OSX™ and iOS™ (Apple Inc., Cupertino Calif.), and Windows™ (Microsoft Corporation), among others. Accordingly, the browser may be provided with one or more plug-in modules adapted for processing and rendering ancillary items, such as plug-ins for rendering Flash content. Suitable browsers that are currently in widespread usage include Google Chrome™, available from Google Inc., Mountain View, Calif.; Mozilla Firefox™, from Mozilla Foundation and Mozilla Corporation, Mountain View, Calif.; Internet Explorer™, from Microsoft Corporation; and Safari™, from Apple Inc.

The relevant environment need not be restricted to a browser environment; for example, other runtime environments designed for implementation of rich media and Internet applications may be used, such as Adobe Integrated Runtime (AIR)™, also from Adobe Systems Incorporated. The selection and implementation of suitable existing and future structural or presentation standards, various elements, scripting or programming languages and their extensions, browser and runtime environments and the like, will be known to those of skill in the art.

Application data 150, including data stored by the browser 142 or dedicated tool 144, may be stored in persistent memory of the data processing system 110, as mentioned above. The data may be stored on a storage device 116, or may be stored in volatile memory instead. Allocation of local storage to applications may be managed by the OS 130. In the case where the examples herein are implemented using a browser 142, the application data may be stored as an HTML local storage object, as defined in HTML5. User data 160, which can include information intended for longer term storage (i.e., longer than an individual application or browser session), such as contacts, message stores, word processing files, and the like, may be stored in resident persistent memory or on a storage device 116. Permission to access local application storage or user data may be limited to the application owning or creating the data, although permissions may be configured differently so that other applications or functions executing on the device have access to data objects created by other applications.
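The access rule described above, where stored data is by default readable only by the application that created it unless permissions are configured otherwise, can be modelled with a small sketch. This is only an illustration under stated assumptions: the `AppStorage` class, its method names, and the grant mechanism are hypothetical and do not correspond to the HTML5 local storage interface or to any operating system API.

```python
from typing import Dict, Optional, Set


class AppStorage:
    """Hypothetical per-application key-value store with owner-only access by default."""

    def __init__(self) -> None:
        self._data: Dict[str, Dict[str, str]] = {}   # owner -> {key: value}
        self._grants: Dict[str, Set[str]] = {}       # owner -> apps allowed to read

    def set_item(self, owner: str, key: str, value: str) -> None:
        """Store a value in the area belonging to the owning application."""
        self._data.setdefault(owner, {})[key] = value

    def grant(self, owner: str, other_app: str) -> None:
        """Configure permissions so another application may read the owner's data."""
        self._grants.setdefault(owner, set()).add(other_app)

    def get_item(self, requester: str, owner: str, key: str) -> Optional[str]:
        """Return the stored value, or None if the key is absent."""
        allowed = requester == owner or requester in self._grants.get(owner, set())
        if not allowed:
            raise PermissionError(f"{requester} may not read data owned by {owner}")
        return self._data.get(owner, {}).get(key)
```

A browser holding an editing copy of a document would, in this model, be the only reader of that copy until access is explicitly granted to, say, a dedicated tool.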

The data processing system 110 is provided with user or sensor input devices 118. User input devices can include a touch and/or pointing device, such as a touchscreen, touchpad, mouse, or trackball; a keyboard; security peripherals such as a biometric scanner; and multimedia input devices, such as cameras or microphones. The system 110 may also have environmental or contextual input devices such as an orientation or inertial navigation sensor (particularly in the case of a touchscreen device), ambient light sensor, or a global positioning system (GPS) or other location detection module. The system 110 can also include one or more output devices 120, including in particular a display screen, which may be integrated in the chassis of the data processing system 110, or else provided as a peripheral device. The system 110 may be configured to output data to an external monitor or panel, tablet, television screen, projector, or virtual retinal display, via a data port or transmitter, such as a Bluetooth® transceiver, USB port, HDMI port, DVI port, and the like. The data port or transmitter may be one of the communication subsystems 122 illustrated in FIG. 2. Graphics data to be delivered to the display screen is either processed by the processor 112, or else by a dedicated graphics processing unit, not included in FIG. 2. Other output devices include speakers and haptics modules.

Not all of these suggested input or output devices are required, and many may be omitted. For instance, where the primary user interface of the system 110 is a touchscreen, a physical keyboard may be omitted altogether.

Communication functions, including data and optionally voice communications, are performed through one or more communication subsystems 122 in communication with the processor 112. Other functional components used to accomplish communication functions, such as antennae, decoders, oscillators, digital signal processors, and the like, may be considered to be part of these subsystems. Wireless communication subsystems are used to exchange data with wireless networks or other wireless devices in accordance with one or more wireless communications standards. New wireless standards are still being defined, but it is believed that they will have similarities to any network or communication behavior described herein, and the examples described here are intended to be used with any suitable standards that are developed in the future. The wireless link connecting the communication subsystems may operate over one or more different radiofrequency (RF) channels according to defined protocols, such as wireless LAN (e.g., one or more of the 802.11™ family of standards), near-field communication, Bluetooth® and the like. The particular design of a communication subsystem is dependent on the communication network 410 with which it is intended to operate. The communication subsystems 122 may include adaptors for use with wired connections as well.

It will be understood by those skilled in the art that the components illustrated in FIG. 2 are merely representative of particular aspects of the data processing system 110, and that other components that are typically included in such a device have been excluded in the drawings and this description only for succinctness. Furthermore, those skilled in the art will understand that the system 110 may be successfully used with the various examples described herein even when some components described in relation to FIG. 2 are omitted. FIG. 2 illustrates in particular one additional peripheral for use with the data processing system 110, a scanner 165. This equipment is optional, but is noted as a particular optional peripheral for the system 110 since the example documents discussed herein may be initially obtained from printed documents, then digitized and converted either at the client system 110 or server system.

Turning to FIGS. 3 and 4, select components of a server data processing system 200 are illustrated. Again, it will be appreciated by those skilled in the art that these components are merely representative, and that some of these components may be omitted or substituted while still achieving successful operation of the embodiments and examples described herein. In FIG. 3, components similar to those of the client data processing system 110 are illustrated, including one or more processors 210, memory 220, storage devices 230, input and output devices 240, 250 respectively, and communication subsystems 260. The appropriate selection of components for a server system 200 will be known to those skilled in the art. While the server system 200 may include local storage devices 230, data processed or managed by the server may be stored remotely from the server system 200, for example on a file server, not illustrated.

FIG. 4 illustrates components of the server system 200 from a functional perspective. The system 200 may be implemented on multiple data processing devices, and not merely one. The system 200 may include a communications interface module 310, which brokers communication with other systems or services, as well as the client system 110. The communications interface may include an HTTP server, where the client system 110 accesses the server system 200 using a web browser. The system 200 can also include an authentication service 320 for authenticating users and granting access to the functions provided by the server system 200, and a conversion or parsing service 330 which converts received documents to a standardized structured document format, such as HTML. The conversion service 330 may be optional in the data processing system 200, since not every document may require conversion. The conversion service 330 may also be operated outside the domain of the data processing system 200, and by a third party; for example, a third party conversion service may be used for those documents that will require conversion.

The server system 200 also includes a formatting module 340, which is used to normalize the formatting of converted or uploaded documents. A validation module 350 operates to carry out validation tasks, such as data conformity and consistency checks, on document content. Both the formatting module 340 and validation module 350 retrieve template data, validation criteria, and/or rule sets from a data store 380 to carry out their functions, and store updated data that they create (e.g., formatted documents, updated state information) in a document and state data store 390. The system 200 also includes an editing module 360 and a rollback or backup module 370, which access copies of the document or portions thereof stored in the data store 390. The editing module 360 implements editing instructions received from the client system 110 on the document, and the rollback module 370 permits the user to revert the state and content of the document to an earlier stage in the editing process.
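By way of illustration only, the reverting behavior attributed to the rollback module 370 could be sketched as a store of snapshots taken at successive editing stages. The class name `RollbackStore` and its methods are hypothetical and do not appear in the description; the sketch assumes that reverting to a stage discards the stages recorded after it.

```python
import copy


class RollbackStore:
    """Hypothetical snapshot store; not the described module's actual design."""

    def __init__(self):
        self._snapshots = []

    def save(self, document, state):
        # Deep-copy so that later edits do not alter the stored stage.
        self._snapshots.append(copy.deepcopy((document, state)))
        return len(self._snapshots) - 1  # index of the stage just saved

    def revert(self, index):
        # Return the document and state as they were at the given stage,
        # and drop every stage recorded after it (an assumption).
        document, state = copy.deepcopy(self._snapshots[index])
        del self._snapshots[index + 1:]
        return document, state


store = RollbackStore()
first = store.save("<p>Draft one</p>", {"compensation": "missing"})
store.save("<p>Draft two</p>", {"compensation": "complete"})
document, state = store.revert(first)
```

After the call to `revert`, `document` and `state` hold the content and state information of the first saved stage.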

The client and server data processing systems 110, 200 may be employed in a data processing environment 400 such as that illustrated in FIG. 5. This figure illustrates one possible network topology for use in the environment 400, and is by no means limiting. In this example, the client data processing system 110 communicates with the server data processing system 450 over a wide area network 410, such as the Internet. The network 410 need not be the Internet, or a wide area network; the network 410 may be public or private, wide or local, fixed or wireless. It is expected that a common implementation will, however, be over the Internet or a wide area network, in view of the current popularity of cloud-based services. However, this is by no means the only implementation possible, or the only implementation contemplated herein. In many examples, the client system 110 and the server system 200 or 450 may be physically and geographically removed from one another. In other examples, however, the two systems may be provided at the same physical location, for instance in communication over a local area network. Either way, the two systems may be considered either physically or logically “remote” from one another.

In another example, the client system 110 and the functions of the server system 200 are integrated at a single site, for instance within the client system 110. In other words, the formatting 340, validation 350, editing 360, and rollback 370 modules illustrated in FIG. 4 may be implemented by the client system 110. In that case, the communications interface 310 and/or authentication service 320 may not be required. As will be apparent from discussion below, however, even when a client-server implementation is used, certain validation and editing functions may be carried out at the client system 110, even though other validation and editing functions are carried out at the server system 450. And, as noted above, the conversion service 330 may be provided by a third party.

The components of the server system 450 and/or the client data processing system 110 may be implemented on separate data processing devices, and thus each of these components may be considered to be logically and/or geographically “remote” from one another. In the environment 400 illustrated in FIG. 5, the authentication service 452, conversion service 456, main processing server 454, and data repository 460 are illustrated as discrete server implementations; they may be located remote from one another, rather than integrated into a single server computer. However, two or more of these functions may be integrated into a single server. Also, as mentioned above, the conversion service 456 may be implemented by a third party, in which case it may not be considered part of the server system 450. The authentication service may also be optional, and excluded from the server system 450. The data repository 460 may comprise one or more file servers, or may be the main processing server 454's storage device. The data repository 460 stores code 462, template content 464, rule sets 466, and validation criteria 468 for use in processing documents. The data repository 460 can also include reference text data 472, which can include information from reference or authoritative texts, and third-party data uploaded to the server 454 for use in comparative analysis or data consistency validation. The data repository 460 can also include backup files 474, for example for use by the rollback module 370. In addition, a copy of the document currently being processed may be stored in the data repository 460, or else in local storage of the main server 454.

The automated document processing carried out by the data processing environment 400 may include a number of stages, such as initial document loading and conversion; processing and alteration of the document to embed tasks, and delivery of the document and optionally accompanying presentation code to the client system 110; rendering and presentation of the altered document at the client system; server-side validation and automated revision; client-side validation and editing; rollback; reference queries; benchmarking; report generation; and finalization and delivery of a final document. Not all stages may be implemented in an analysis/review cycle for a given document.

FIG. 6 provides an overview of the initial loading, conversion, processing, alteration, and delivery of the document and code to a client system 110 by the environment 400 of FIG. 5. At 505, the client system 110 initiates a request for access to the server system 450. The authentication service 452 governs access by the user at the client data processing system 100 to the server system 450. For example, where a browser application executing at the client system 110 is used to access the server system 450, the browser sends an initial authentication request, and authentication may be carried out by the authentication service 452 at 510 using an appropriate authentication method. The authentication method may involve single- or multiple-factor authentication; for instance, the user may be provisioned with a username and password to use as credentials for authentication, and in addition to this, is optionally provided with a physical or digital token bearing additional authentication data (e.g., a digital certificate) for use in authentication. The user may be provided with an account at the server system 450 which, in some embodiments, is allocated persistent storage in a data store of the server system 450 for storing data such as the documents 20 and revised versions of the document, as well as further reference data as discussed below.

Once granted access, the user at the client system 110 may upload one or more documents 20 for processing to the server system 450 at 515. The documents are uploaded in a digital form. In some cases, the digital version of the document 20 may be generated from a non-digital (e.g., paper) originating version 10 of the document, as indicated in FIG. 5. A printed version of the document may be digitized locally at the client system 110 site, for instance using the scanner 165 illustrated in FIG. 2. Thus, the document 20 that is initially transmitted to the server system 450 at 515 may be an electronic file comprising document content (text, images, tables, etc.) in an open or proprietary document format, such as a word processing or text file format (e.g., Microsoft Word™ format; OpenDocument™ text format; Portable Document Format; Rich Text Format; plain text), or a webpage or text file in markup format (e.g., HTML or other markup format). In some cases the document content may be contained in image files as a result of digitization, and will require optical character recognition (OCR), which may be implemented at either the client system 110 or the server system 450, or as part of the conversion process. In other cases, the document is not uploaded at 515, but rather loaded from a data store at the server system 450 or obtained from another remote data store, not illustrated, over the network 410. For example, rather than selecting a document for uploading to the server system 450, the user may instead identify a document location by uniform resource identifier (URI). In some implementations, however, users may prefer that no permanent or non-transient copies of the user's documents are stored at the server for security and confidentiality reasons. In that case, the document would not be retrieved from a data store at the server system 450; instead, the user may be required to upload the document or provide a document location at the beginning of each working session, and download the edited or validated version of the document for local storage at the end of each session.

The document 20 is received by the server system 450 at 520. A determination is made whether the document requires conversion to a different format. In these examples, processing carried out by the processing server 454 is carried out on an HTML version of the document 20, and once processed, the document is provided in HTML format to the client system 110. Thus, when the document 20 is received, at 525 a determination is made whether the document requires conversion to HTML. Where HTML format is not used by the application executing at the client system 110 (for instance, when a dedicated tool 184 uses a proprietary or other type of document format), then conversion to another type of format may be required. It should be noted that while the examples described here are described using HTML notation and format, the embodiments described herein need not be so limited; other document formats may be used in place of HTML. When conversion is required, the conversion is carried out at 530 by the conversion service 456, which as noted above may be included as part of the server system 450. The conversion service may carry out any required OCR in order to present textual document content in text form. Suitable conversion services or modules will be known to those skilled in the art. An example of a Word document to HTML converter is the built-in function of Microsoft Word, and an example of PDF to HTML conversion is the BCL easyConverter SDK 4 Word/HTML converter, from BCL Technologies, San Jose, Calif.

The HTML document, either provided by the client system 110 in this format, or converted from another format by the conversion service 456, is processed at 535 by the processing server 454 to normalize the formatting of the document and to identify certain prescribed elements in the document in accordance with a corresponding framework identified for the document. A framework includes, in these examples, optional templates 464, rule sets 466, and validation criteria 468 defined in advance for the document. A “prescribed element” is contained within one or more content portions of a document. As will be appreciated from the discussion below, a “content portion” of a document is an atomic element or unit of content within the document. Each content portion may be identified by pattern or structural feature. Examples of identification by pattern include defining a content portion as the content filling a single page of the original document, if converted from a paginated document; and defining a content portion as each portion of the document consisting of a title or heading-like content followed by one or multiple contiguous content elements sharing common attributes, such as a heading and its following paragraphs up to the next heading. Examples of identification by structural feature include defining each content portion as the content of a single <div> element in an HTML document, or those <div> elements that have a particular parent-child relationship with other <div> elements; and defining each content portion as a single atomic HTML element or other atomic structural or programmatic element of the document, such as a heading, paragraph, image, and the like.
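As a minimal sketch of identification by pattern, the following groups a heading with the block elements that follow it, up to the next heading. It uses only the Python standard library; the names `BlockCollector` and `split_into_portions`, the set of tags treated as blocks, and the sample markup are assumptions made for illustration, and nested block elements of the same type (such as a table within a table) are not handled.

```python
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}
BLOCK_TAGS = HEADING_TAGS | {"p", "table"}  # assumed set of block elements


class BlockCollector(HTMLParser):
    """Collects (tag, text) pairs for outermost block elements."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._tag = None   # block element currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if self._tag is None and tag in BLOCK_TAGS:
            self._tag, self._text = tag, []

    def handle_data(self, data):
        if self._tag is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == self._tag:
            self.blocks.append((tag, "".join(self._text).strip()))
            self._tag = None


def split_into_portions(html):
    """Start a new portion at each heading; attach following blocks to it."""
    collector = BlockCollector()
    collector.feed(html)
    portions = []
    for tag, text in collector.blocks:
        if tag in HEADING_TAGS or not portions:
            portions.append([])
        portions[-1].append((tag, text))
    return portions


sample = (
    "<h1>Report</h1>"
    "<h2>Emissions</h2><p>First.</p><p>Second.</p>"
    "<h2>Costs</h2><table><tr><td>1</td></tr></table>"
)
portions = split_into_portions(sample)
```

With the sample markup, the title forms one portion, the "Emissions" heading and its two paragraphs form a second, and the "Costs" heading and its table form a third.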

A “prescribed” element or other element of the document, in this context, is not necessarily a structural element (like an HTML element), but rather comprises a unit of substantive content within the document or that is intended for inclusion in the document. Such units of substantive content may be defined by subject or theme; for instance, a unit may include one or more headings, paragraphs, tables, images, and/or footnotes or other references pertaining to a particular category or subcategory of information. Substantive content need not be literary in nature; it may include one or more sets of data, charts, images, and graphs (for example, as may be presented in a technical, scientific, or environmental report). A prescribed element comprises a unit predefined for a document type. For example, a particular type of document may be expected to include information about a particular subject, or a table may be expected to contain certain data, and thus that information would form part or all of a prescribed element. A prescribed element may contain sub-elements; for instance, a complete prescribed element may include a particular title or heading, together with a table or paragraph of content.

In addition, the state of each of a set of prescribed elements predefined for the document is determined. This processing is used to identify tasks associated with the document. Identifying code associated with at least some of the identified tasks is inserted into the document, and the document thus altered, together with state information and additional presentation code, is provided to the client system at 540 as indicated in FIG. 5 by data 45. At 545, the client system 110 receives and renders the altered document for presentation. The rendering can include execution of other processing to identify additional tasks at the client side. After receipt at the client system 110, as discussed further below, various tasks identified in the document and/or editing are carried out based on instructions and other data 25 sent from the client system 110 to the server system 450, resulting in changes to the document, which are reflected in an updated version of the document rendered and displayed at the client system 110. Ultimately, a final version of the document 50 is produced and transmitted back to the client system 110. The final document 50 may be provided in HTML format, or converted to the original format of the document 20 received from the client system 110 with any presentation code inserted earlier by the server removed.

FIG. 7 further breaks down the document processing functions carried out on the document once converted to HTML. At 605, the processing server 454 loads the document (converted or originally provided in HTML format). At 610, a determination of document type or kind is made. The document type may be identified by the user at the client system 110 at the time the document is initially uploaded, or else automatically determined by the server system 450 based on a comparison of keywords or document structure to keywords or structure information in various stored templates or frameworks. For instance, in the case of corporate disclosure documents, the type may be identified as a “proxy circular”, “annual information form”, and so forth. Based on the identification of document type, the server 454 loads information from a corresponding framework at 615. A framework comprises an identification of predefined prescribed elements for the document type, various rules and validation criteria for determining conformity of document content to prescribed elements, and an identification of tasks associated with the document type and/or prescribed elements. Table 1 illustrates example content of a framework for a specific document type. The tabular form presented below does not necessarily represent the data structure in which the framework information is stored:

TABLE 1

Example framework information for a document type.
Framework 1
Document Type <type id/name>

Prescribed Element	Attribute/Type	Validator(s)	Rule(s)	Task(s)

<identifier 1>	<attribute 1>	<v_set 1>	<r_set 1>	<t_set 1>
<identifier 2>	<attribute 2>	<v_set 2>	<r_set 2>	<t_set 2>
<identifier 3>	<attribute 3>	<v_set 3>	<r_set 3>	<t_set 3>
<identifier 4>	<attribute 4>	<v_set 4>	<r_set 4>	<t_set 4>

Thus, a framework is defined for a given document type or kind (“type id/name”), and defines a set of prescribed elements and any sub-elements of the prescribed elements (all named in this example as “identifier 1” through “identifier 4”) for the document. Prescribed elements may be predefined for the document according to any authoritative text or guideline applicable to the document. For example, guidelines for the document may require or recommend inclusion of certain kinds of substantive content (e.g., compensation data, biographical information). Each prescribed element and any sub-element thereof is defined according to an attribute or element type, one or more validation criteria (“Validator(s)”) and one or more rules, and is associated with one or more tasks. In Table 1, the first prescribed element or sub-element (“identifier 1”) is defined as having an attribute or element type of “attribute 1”, and is associated with a set of validation criteria “v_set 1” and a set of rules “r_set 1”, and is further associated with a set of tasks “t_set 1”. The attribute or element type may be an HTML element or attribute; for instance, a given prescribed element may be defined as an HTML heading or table, or a particular level of heading. The designation of an attribute or element type is used to facilitate validation and correlation of tasks to document content, as will be seen below.

Validation criteria can include keywords or structural requirements used to determine whether a given prescribed element is present, missing, or incomplete in the document content. For instance, a prescribed element may comprise a particular title or heading in the document, in which case the validation criteria can include specific keywords in the particular title, or acceptable synonyms. Rules can include requirements for presence of exact keywords or synonyms, and in some cases a requirement that a particular keyword or synonym not be present in the vicinity of another keyword or synonym in a given content portion (e.g., for a determination that a particular portion is an “indoor air emissions” prescribed element, a rule may require that the word “emissions” be present and the word “outdoor” or a synonym like “outside” or “external” not be within a specific range of words, lines, or sentences of “emissions”). As another example, a prescribed element may comprise multiple sub-elements, so the validation criteria may include requirements for location or adjacency in the document; for example, a prescribed element that is defined as comprising a title and tabulated data may be considered present and complete in the document if a particular type of HTML element that contains specified keywords (such as a title with a specific phrase) is found (the first sub-element) and is present in the document adjacent or substantially adjacent to another HTML element, such as a table structure (the second sub-element) that also meets its validation criteria. On the other hand, that prescribed element may be determined to be present but incomplete if the first sub-element is found but not the second, or vice versa. Thus, the framework may contain multiple validation criteria and rules for a given prescribed element. Validation criteria may be established by subject matter experts for the given document type, or by automated analysis of exemplar documents. Keyword synonyms may be detected by monitoring user word choices.
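The “indoor air emissions” rule described above can be sketched as a word-window check. This is one possible reading only: the function name `matches_proximity_rule`, the measurement of range in words rather than lines or sentences, and the choice to require that every occurrence of the required word be clear of excluded words are all assumptions.

```python
import re


def tokenize(text):
    # Lower-case alphabetic words only; punctuation and digits are dropped.
    return re.findall(r"[a-z]+", text.lower())


def matches_proximity_rule(text, required, excluded, window):
    """True if `required` occurs and no word in `excluded` lies within
    `window` words of any of its occurrences."""
    words = tokenize(text)
    hits = [i for i, word in enumerate(words) if word == required]
    return bool(hits) and all(
        excluded.isdisjoint(words[max(0, i - window): i + window + 1])
        for i in hits
    )


EXCLUDED = {"outdoor", "outside", "external"}  # example synonyms from the text

indoor = matches_proximity_rule(
    "Indoor air emissions were measured quarterly.", "emissions", EXCLUDED, 5)
outdoor = matches_proximity_rule(
    "Outdoor air emissions rose.", "emissions", EXCLUDED, 5)
absent = matches_proximity_rule(
    "No data reported.", "emissions", EXCLUDED, 5)
```

Here `indoor` is true, while `outdoor` and `absent` are false: the second text places an excluded word near “emissions”, and the third does not contain the required word at all.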

A prescribed element may also be associated with multiple types of tasks. In the framework, the validation criteria, rules, and tasks may be represented as pointers to another data structure that contains the actual criteria, rules, and task definitions. In some cases, different prescribed elements may have common validators, rules, or tasks, so the relationship among prescribed elements and these characteristics may be a many-to-many relationship.
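One way to picture the framework of Table 1, with validators, rules, and tasks held in separate structures and referenced by identifier, is sketched below. The class and field names, the identifiers, and the sample definitions are hypothetical; as noted above, the actual data structure in which framework information is stored is not specified.

```python
from dataclasses import dataclass, field


@dataclass
class PrescribedElement:
    identifier: str
    element_type: str  # e.g., an HTML element such as "h2" or "table"
    validator_ids: list = field(default_factory=list)
    rule_ids: list = field(default_factory=list)
    task_ids: list = field(default_factory=list)


@dataclass
class Framework:
    document_type: str
    elements: list
    validators: dict  # identifier -> validator definition
    rules: dict       # identifier -> rule definition
    tasks: dict       # identifier -> task definition

    def tasks_for(self, identifier):
        # Resolve an element's task references to the task definitions.
        element = next(e for e in self.elements if e.identifier == identifier)
        return [self.tasks[task_id] for task_id in element.task_ids]

    def elements_using_task(self, task_id):
        # The reverse direction of the many-to-many relationship.
        return [e.identifier for e in self.elements if task_id in e.task_ids]


framework = Framework(
    document_type="proxy circular",
    elements=[
        PrescribedElement("compensation_heading", "h2",
                          ["v_comp"], ["r_exact"], ["t_lookup"]),
        PrescribedElement("compensation_table", "table",
                          ["v_numeric"], ["r_exact"],
                          ["t_lookup", "t_consistency"]),
    ],
    validators={"v_comp": {"keywords": ["compensation"]},
                "v_numeric": {"keywords": []}},
    rules={"r_exact": {"match": "exact_or_synonym"}},
    tasks={"t_lookup": "reference lookup",
           "t_consistency": "consistency check"},
)
```

In this sketch both elements share the rule `r_exact` and the task `t_lookup`, so a single definition serves several prescribed elements.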

Returning to FIG. 7, at 620 the existing document formatting is “normalized” according to predefined rules. As will be discussed further below, conversion of the document to HTML format (or whatever other standardized format is used) may, due to design choices made by the original document author, result in inconsistencies or anomalies when the HTML version of the document is generated. The processing server 454 implements formatting rules to reduce the incidence of inconsistencies or anomalies, and thereby the amount of manual editing that might otherwise have to be undertaken by the user.

At 625, the processing server 454 identifies prescribed elements present in the document according to the selected framework, and inserts identifying code in the document for each located prescribed element. The state of each prescribed element in the framework (including those not present) is determined at 630, and as a result of the identification of prescribed elements and state determination, appropriate code is selected for the document at 635 in order to embed references to corresponding tasks in the document itself, in appropriate presentation locations when the document is rendered for presentation at the client system 110. The code, state information, and the altered document 45 are then sent to the client system at 640. The code and/or state information may be embedded in the document to be sent to the client system 110, or may be delivered separately. From the foregoing description, it will be appreciated by those skilled in the art that the identification of prescribed elements, and their state, does not require prior semantic tagging or document preparation by the user; the document supplied by the user may be substantially unstructured (e.g., plain text or a text-based document) without parts of the document or parts of speech specially identified. Moreover, there is no need for document preparation by the user to identify the locations for embedding the selected code to identify the types of tasks to be included in the document.

It will be appreciated by those skilled in the art that certain stages or steps described herein may be implemented in different orders than represented in the accompanying figures, or in parallel where issues of dependency or inheritance do not impact the outcome of the steps. For instance, in some cases the normalization of the document format at 620 may occur prior to loading the document type-specific framework at 615, where normalization involves rules and criteria that apply to multiple document types.

FIG. 8 illustrates further detail of the initial processing of the document generally represented by blocks 625-635 of FIG. 7. Once the document and framework are loaded at the processing server 454, the server sets initial values for the state of each prescribed element in the framework at 705. The initial value may represent a presence state in the document, such as missing, complete, or incomplete. An “incomplete” state may reflect the case where a prescribed element meets sufficient validation criteria to be identified as present in the document, but not complete. At the outset, the initial values are generally set to reflect that each prescribed element is not present, or missing.

At 710, a first content portion of the document is selected. The content portions may be selected in turn according to their order of occurrence in the document; for example, in an HTML document, in order of occurrence as the DOM is traversed. However, other orders of operation can be implemented; for instance, all document structural elements or content portions having a particular element type or attribute may be selected and queued for processing, and separate threads may execute to process portions of a corresponding type or attribute. In this example, once the first content portion of the document is selected, at 715 its HTML tag is inspected to determine its attribute or element type. Candidate prescribed elements or sub-elements having matching attributes or element types are then identified from the framework. If a determination is made at 720 that the content portion matches a prescribed element type or attribute in the framework, then at 725 the content of the portion is inspected and compared to the validator(s) for the prescribed element or sub-element, in accordance with the defined rules. If the content portion is determined to match a sub-element, then additional content portions (e.g., the immediately following content portions within the document) can then be inspected to locate other sub-elements of the prescribed element.

If at 730 it is determined that there is sufficient correspondence to the validator(s) defined for the prescribed element to update the state of the prescribed element to a presence indicator, then at 735 the prescribed element's state is updated. The state can include an indicator of the presence of sub-elements of the prescribed element rather than, or in addition to, an indicator of the prescribed element's overall state. As noted earlier, some prescribed elements may include validation criteria pertaining to adjacency of one sub-element to another sub-element; thus, in some cases, a prescribed element may be identified as “incomplete” or an analogous state to indicate that not all required sub-elements were located according to the validators defined for the prescribed element, while the states of the individual sub-elements of the prescribed element are set to “complete” or “missing” (or analogous states), as the case may be. In some implementations, where a content portion appears to match validators for a plurality of prescribed elements, the user may be queried for a selection of a corresponding prescribed element, or else one of the prescribed elements is automatically selected according to weightings assigned to each validator.

Note that multiple prescribed elements in a given framework may share a common element type or attribute (for example, a document may require multiple tables containing numeric data, each table fulfilling a different prescribed element); thus, the determination whether the content portion matches an element type or attribute and sufficiently corresponds to certain validators may be carried out for multiple prescribed elements in the framework, and the server will determine that the content portion corresponds to one particular prescribed element based on a comparison of the outcomes of these determinations.
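A minimal sketch of automatic selection among candidate prescribed elements that share an element type is given below. It assumes that each validator is a keyword carrying a numeric weighting, that a candidate's score is the sum of the weightings of its matched keywords, and that a minimum score is required for a match; the function names, weights, and threshold are illustrative and are not drawn from the description.

```python
def score(text, validators):
    """Sum the weights of the validator keywords found in the text."""
    lowered = text.lower()
    return sum(weight for keyword, weight in validators if keyword in lowered)


def select_prescribed_element(text, candidates, threshold=1.0):
    """Return the best-scoring candidate, or None if none reaches threshold."""
    scores = {name: score(text, validators)
              for name, validators in candidates.items()}
    if not scores:
        return None
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else None


# Two hypothetical table-type prescribed elements with weighted keywords.
candidates = {
    "compensation_table": [("salary", 1.0), ("bonus", 0.5)],
    "share_ownership_table": [("shares", 1.0), ("ownership", 0.5)],
}

chosen = select_prescribed_element(
    "Salary and bonus paid; shares granted", candidates)
unmatched = select_prescribed_element("Weather summary", candidates)
```

The first text scores 1.5 for the compensation table and 1.0 for the share ownership table, so the compensation table is chosen; the second text matches no keywords and no prescribed element is selected.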

Once correspondence between a content portion and a prescribed element is determined, at 740 identifying code for the prescribed element and its associated task(s) is inserted in the document, and appropriate presentation code for execution by the client system 110 (in particular, when the client system 110 employs a browser) is selected at 745. The prescribed element, its identifying code, and presentation code may be wrapped in a container or other delimiter within the document; for instance, all content determined to correspond to a prescribed element (and its sub-elements, as the case may be) may be wrapped in a <div> tag if the document is in HTML format; the identifying code can be included as an attribute within the tag. Presentation code can be provided within another structural element within the container, e.g., as a unit of HTML button code, identifying the prescribed element by its identifying code.
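The wrapping described above might look as follows. The attribute names `data-prescribed-element`, `data-task`, and `data-element` are hypothetical choices for carrying the identifying code, as are the function name and the sample identifiers; the description does not specify the attribute or button markup used.

```python
from html import escape


def wrap_prescribed_element(inner_html, element_id, task_ids):
    """Wrap content in a <div> carrying the identifying code, preceded by
    one button per associated task (hypothetical attribute names)."""
    buttons = "".join(
        '<button type="button" data-task="{0}" data-element="{1}">{0}</button>'
        .format(escape(task_id), escape(element_id))
        for task_id in task_ids
    )
    return '<div data-prescribed-element="{0}">{1}{2}</div>'.format(
        escape(element_id), buttons, inner_html)


wrapped = wrap_prescribed_element(
    "<h2>Executive Compensation</h2><table></table>",
    "compensation",
    ["lookup", "best-example"],
)
```

The resulting container holds the original heading and table unchanged, with two buttons that identify the prescribed element by its identifying code.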

If, however, no correspondence between the content portion and any prescribed element in the framework is identified, then optionally at 750 identifying code for the content portion is inserted in the document (for example, an identifier of the content portion as free text, rather than a prescribed element). The process then moves on to the next content portion in the document at 755, if there is one available. If there is a next document component, it is selected at 760 and the processing resumes at 715 for this next component.
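The traversal of FIG. 8 can be summarized in a simplified sketch. It assumes that content portions are (tag, text) pairs, that each prescribed element is an ordered list of sub-element specifications matched by element type and keywords, and that sub-elements must be immediately adjacent. Detection begins from the first sub-element only, so the "vice versa" case (second sub-element found without the first) is not covered, and per-sub-element states and code insertion are omitted. All names and sample data are hypothetical.

```python
MISSING, INCOMPLETE, COMPLETE = "missing", "incomplete", "complete"


def portion_matches(portion, spec):
    """Compare a portion's element type and content against one sub-element."""
    tag, text = portion
    if tag != spec["type"]:
        return False
    lowered = text.lower()
    return all(keyword in lowered for keyword in spec["keywords"])


def determine_states(portions, framework):
    # Every prescribed element starts out as missing (705).
    states = {name: MISSING for name in framework}
    # Visit content portions in document order (710, 755, 760).
    for i, portion in enumerate(portions):
        for name, subs in framework.items():
            if states[name] == COMPLETE:
                continue
            # Type and validator checks on the first sub-element (715-730).
            if not portion_matches(portion, subs[0]):
                continue
            # Inspect the immediately following portions for the rest.
            following = portions[i + 1: i + len(subs)]
            found_all = len(following) == len(subs) - 1 and all(
                portion_matches(p, s) for p, s in zip(following, subs[1:]))
            states[name] = COMPLETE if found_all else INCOMPLETE  # 735
    return states


framework = {
    "compensation": [{"type": "h2", "keywords": ["compensation"]},
                     {"type": "table", "keywords": []}],
    "biography": [{"type": "h2", "keywords": ["biograph"]},
                  {"type": "p", "keywords": []}],
    "risk": [{"type": "h2", "keywords": ["risk"]}],
}
portions = [
    ("h1", "Proxy Circular"),
    ("h2", "Executive Compensation"),
    ("table", "Salary 100"),
    ("h2", "Director Biographies"),
    ("h2", "Other"),
]
states = determine_states(portions, framework)
```

For this sample, the compensation element is complete (heading followed by a table), the biography element is incomplete (heading found, but no paragraph follows it), and the risk element remains missing.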

It is contemplated that specific tasks will have been defined for prescribed elements of the document, as illustrated in FIG. 9. However, tasks may also be generally associated with elements of a document other than prescribed elements. These may be tasks that generally apply to any element of the document, whether determined to be a prescribed element or otherwise. Association of tasks and insertion of presentation code or referrers for presentation code may be implemented for such other elements in a similar manner as that described in FIG. 9.

Rendering and presentation at the client system 110 is illustrated in the flowchart of FIG. 10. At 905, the altered document and other data 45 are received at the client system 110. As noted earlier, the client system 110 may use a general purpose user agent such as a web browser 142, or a dedicated application 144, and it may be this component of the system 110 that implements the rendering and display steps. The client system 110 then renders the altered document and the state information and code at 910-935 for presentation, for instance using the layout engine of the browser 142 or application 144. At 910 the client system 110 may render navigation user interface elements that are based on the state information, as discussed in further detail below. Altered document rendering is then initiated at 915. Presentation code provided to the client system 110 with the altered document is executed in order to place task user interface elements in designated locations when the rendered altered document is presented at 920. Subsequently, as discussed below, the user of the client system 110 may execute the tasks associated with the various elements of the altered document, and make changes to the content that are stored locally in the client system 110 and/or remotely at the server system 200. The altered document delivered to the client system 110 thus also constitutes an editing copy of the document, which may be intermittently updated at both the client 110 and server 200 in response to executed tasks and other changes.

Optionally, the client system 110 also executes further processing at 925 to insert further task user interface elements associated with various elements or sub-elements (either prescribed or not) in the altered document. Based on the identifying codes that were inserted into the altered document (e.g., at steps 740 and 750 discussed above during server processing), further tasks are identified and presentation code or references to presentation code relating to those further tasks is injected into the altered document. Presentation code and executable scripts for executing these tasks may be stored remotely at the server system 200, or locally at the client system 110, but at this stage, the client system 110 determines whether to associate further tasks with altered document elements, and implements the association through insertion of presentation code in a manner similar to that described in FIG. 9. While this stage is illustrated as following other rendering 910, 915, 920, this client-side processing 925 may precede one or more of these other rendering steps, or be carried out in parallel with them. Finally, at 935, the complete altered document is rendered and displayed, together with task user interface elements.

FIG. 11 illustrates a schematic of a document 1000 having prescribed elements identified, without insertion of code for task user interface elements. In this example document, there are multiple prescribed or non-prescribed elements 1010, 1020, 1030, 1040, 1050, comprising one or more content portions; elements 1020, 1030, and 1040 contain sub-elements. Element 1010 comprises content 1012 that may be a top-level title or heading for the document 1000, and in this example is a non-prescribed element; its presence is not required by the framework for the document type. Element 1020 comprises three content portions 1022, 1024, 1026, where content portion 1022 may be a heading and portions 1024, 1026 are paragraphs. Element 1030 comprises three content portions as well, 1032, 1034, 1036, where 1034 comprises a table or other data presented in tabular format (whether formatted in an HTML table or other tabular arrangement), and 1036 contains footnotes referencing the content of the table 1034. Element 1040 comprises a heading content portion 1042 and a paragraph content portion 1044. Finally, element 1050 comprises only a table 1052.

FIG. 12 illustrates a possible appearance of the document once altered to include presentation code, and rendered to display the user interface elements defined by the presentation code. Here, elements 1020, 1030, 1040, and 1050 have been identified as corresponding to prescribed elements although not necessarily complete, while element 1010 is not associated with any prescribed element. Certain tasks have been associated with the prescribed elements 1020, 1030, 1040, 1050. As can be seen in FIG. 12, additional user interface elements 1201 and in some cases 1202 have been associated with all of the prescribed elements, and in some cases with individual content portions (e.g., 1036) within a prescribed element. In this example, the user interface element 1201 is associated with a query or lookup task, which when invoked presents on-point, or relevant, reference materials pertaining to the prescribed element or sub-element. User interface element 1202 is associated with a “best example” task, which when invoked presents reference materials illustrating a best example of the content pertaining to the prescribed element. The on-point reference materials may be automatically retrieved from the server system 200 in response to invocation of the task at the client system 110. In this example, these two user interface elements 1201, 1202, being associated with informational or look-up tasks, are positioned proximate to the left edge of the corresponding prescribed element or sub-element, immediately above the content portions comprising the element.

The prescribed elements 1030 and 1050 are also associated with specific tasks pertaining to their specific content. In this non-limiting example, user interface elements 1203, 1204, 1205, and 1206 identify four different types of tasks associated with the tables 1034 and 1052. The first user interface element 1203 is associated with a first “data consistency check” task, in which columns of data in the table 1034 or 1052 are compared against other columns within the same document for consistency. Thus, for example, data in a selected column of table 1034 may be compared against a corresponding column of table 1052. This type of task may be used to confirm that data in one table column or row is replicated correctly in another table column or row within the same document. The second user interface element 1204 is associated with a second “data consistency check” task, in which data in columns of the associated table are compared to data in columns of other tables retrieved from other documents. These other tables may be stored at the client system 110 or remotely at the server system 200. If not stored at the server system 200, then the server system 200 retrieves the tables from one or more other documents uploaded from the client system 110 or retrieved from another computer system. The tables can be retrieved from these documents using processing techniques similar to those used for the document altered for editing, as described earlier, to identify the tables in the document and read them into arrays in memory at the server system 200. When this second data consistency check task is invoked and the other tables are read into memory, a list of these tables can be presented to the user for selection of the appropriate table(s) and/or row(s) or column(s) for comparison to the subject associated table.
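The column comparison described above can be sketched as follows. This is only an illustrative sketch, not the disclosed implementation: it assumes each table has already been read into memory as a mapping from column header to a list of cell strings, and the function name `compare_columns` is hypothetical.

```python
def compare_columns(table_a, table_b, header):
    """Compare one column of two tables cell by cell.

    Assumption: each table is a dict mapping a column header to a list
    of cell strings. Returns (row_index, value_a, value_b) for every
    row where the two columns differ; a missing cell is reported as None.
    """
    column_a = table_a[header]
    column_b = table_b[header]
    mismatches = []
    for index in range(max(len(column_a), len(column_b))):
        value_a = column_a[index] if index < len(column_a) else None
        value_b = column_b[index] if index < len(column_b) else None
        if value_a != value_b:
            mismatches.append((index, value_a, value_b))
    return mismatches
```

The same function could serve both the within-document check and the cross-document check, since only the source of the second table differs.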

The third user interface element 1205 is associated with another form of consistency-checking task, in which the columns or headings of the associated table are compared against a reference version of the table to confirm that the types of data expected in the table are included. Finally, the fourth user interface element 1206 is associated with a “check accuracy” task, which determines which columns or rows of the associated table are intended to represent a sum of other columns in the tables and confirms accuracy in the reported totals. This task can also identify incongruent numbers or apparent errors in the table such as empty cells, non-numeric or currency characters, and incorrect or inconsistent decimal placement.
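A minimal sketch of the “check accuracy” idea follows, under stated assumptions: the table is a list of equal-length rows of cell strings, the row holding the totals is already known (the disclosure identifies it from header information, which is not modelled here), and thousands separators are commas. The name `check_column_totals` is hypothetical.

```python
from decimal import Decimal, InvalidOperation


def check_column_totals(rows, total_row):
    """Report empty cells, non-numeric cells, and totals that do not add up.

    Assumption: rows[total_row] is meant to hold the sum of every other
    row, column by column. Returns a list of (row, column, message).
    """
    problems = []
    for col in range(len(rows[total_row])):
        values = {}
        for r, row in enumerate(rows):
            cell = row[col].strip()
            if not cell:
                problems.append((r, col, "empty cell"))
                continue
            try:
                values[r] = Decimal(cell.replace(",", ""))
            except InvalidOperation:
                problems.append((r, col, "non-numeric value"))
        if len(values) != len(rows):
            continue  # a total cannot be verified when a cell is unusable
        expected = sum(v for r, v in values.items() if r != total_row)
        if expected != values[total_row]:
            problems.append(
                (total_row, col, f"stated {values[total_row]}, computed {expected}")
            )
    return problems
```

`Decimal` is used rather than floating point so that currency amounts compare exactly.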

The user interface elements for these tasks, which are used to review consistency or accuracy of the data contained in the prescribed element, are visually distinguished from the user interface elements for the informational tasks with a horizontal separation; as can be seen in FIG. 12, this second set of user interface elements are located proximate to the right of the prescribed element, immediately above the content portions comprising the element. As can be seen in prescribed element 1030, the second set of user interface elements 1203-1206 is located immediately above the table content portion, rather than above the entire prescribed element 1030, since the tasks pertain specifically to the table rather than the entire prescribed element; however, the user interface element 1201 represents a task that relates to the entire prescribed element, so it is located above all content portions associated with the prescribed element 1020, 1030, 1040, 1050.

Still further tasks may be associated with validation or consistency checks for non-tabulated data, such as the content of content portion 1036. In this example, the content portion 1036 was identified as containing footnotes or explanatory text for the preceding content portion 1034, and in this case may include reference numerals or symbols corresponding to reference numerals or symbols in the content portion 1034. An additional consistency task to confirm that the reference numerals or symbols included in the content portion 1036 match reference numerals or symbols in the immediately preceding content portion is invoked by actuating graphical user interface element 1207; user interface element 1208 invokes another consistency-checking task in which the content of a given footnote is matched against the content of a row of data or statement in the immediately preceding content portion that contains the corresponding footnote number. Again, since these are consistency checks, they are physically located proximate to the right edge of the prescribed element.
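The reference-numeral check can be sketched as below. The marker syntax is an assumption made purely for illustration: footnote references are taken to be digits in parentheses, such as (1), which would also match parenthesised negative amounts in a real financial table. The name `check_footnote_markers` is hypothetical.

```python
import re

# Assumed marker syntax: digits in parentheses, e.g. "(1)".
MARKER = re.compile(r"\((\d+)\)")


def check_footnote_markers(table_text, footnote_text):
    """Compare the markers used in a table with those defined in its footnotes."""
    used = set(MARKER.findall(table_text))
    defined = set(MARKER.findall(footnote_text))
    return {
        # referenced in the table but never explained
        "undefined": sorted(used - defined, key=int),
        # explained in the footnotes but never referenced
        "unused": sorted(defined - used, key=int),
    }
```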

Another example of a task represented by a user interface element in the document is a “missing parts” task, indicated by user interface element 1209. This user interface element may be located in a position where a particular prescribed element was expected to appear (based on framework information for the document type), or in some other position that will be apparent to the user when the document is rendered and displayed on the client system 110. In this example, the “missing parts” user interface element 1209 is positioned in a selected location in the middle of the document. The “missing parts” task may be associated with a specific prescribed element in the case where the prior processing of the document indicated that a prescribed element was present, but not complete.

As mentioned above, the rendering and positioning of the user interface elements 1201-1209 can be accomplished by the insertion of presentation code within the document itself. Table 2 illustrates example pseudocode representing the altered document structure with inserted presentation code:

TABLE 2

Example of presentation code insertion in an altered document.

	<document>
	  <prescribed_element id="012345">
	    <button id="task_001" class="task_001_class"
	      data-content="dialog content" target="012345" />
	    <button id="task_002" class="task_002_class"
	      data-content="dialog content" target="012345" />
	    <content_portion>
	    </content_portion>
	    <content_portion>
	    </content_portion>
	  </prescribed_element>
	  <prescribed_element id="012346">
	    <button id="task_003" class="task_003_class"
	      data-content="dialog content" target="012346" />
	    <content_portion>
	    </content_portion>
	  </prescribed_element>
	</document>

Here, each prescribed element is defined with an identifier (e.g., id=“012345”). Each prescribed element can contain one or more content portions, and display code (e.g., <button id . . . >) for any associated tasks determined to be relevant to the prescribed element. The presentation code includes a reference to the prescribed element identifier or, in the case where the task associated with the prescribed element is designed to act on a target sub-element, the individual content portions containing sub-elements may also be tagged with identifiers and the presentation code will include a reference to the corresponding sub-element identifier. The presentation code is thus associated with a graphic element (e.g., the user interface elements 1201-1209), and with a script (stored either at the client system 110 or server system 200) executable to implement the task on the identified prescribed element or sub-element. In this way, the altered document 1000 contains the presentation code and references necessary to invoke the tasks deemed relevant to the document content, and is thus portable to other client systems implementing the client-side functions of the system 400.
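One way the insertion of presentation code might be carried out is sketched below, using the element and attribute names of Table 2. The function name `insert_task_buttons` and the shape of the `tasks_by_element` mapping are assumptions for illustration; the disclosure does not specify how the insertion is implemented.

```python
import xml.etree.ElementTree as ET


def insert_task_buttons(document_xml, tasks_by_element):
    """Place one <button> per task at the top of each prescribed element.

    Assumption: tasks_by_element maps a prescribed element id to a list
    of (task_id, task_class) pairs deemed relevant to that element.
    """
    root = ET.fromstring(document_xml)
    # Copy the matches into a list first so the tree is not modified
    # while it is being traversed.
    for element in list(root.iter("prescribed_element")):
        element_id = element.get("id")
        tasks = tasks_by_element.get(element_id, [])
        for position, (task_id, task_class) in enumerate(tasks):
            button = ET.Element(
                "button",
                {"id": task_id, "class": task_class, "target": element_id},
            )
            element.insert(position, button)  # keep buttons ahead of the content
    return ET.tostring(root, encoding="unicode")
```

Because each button carries the identifier of its target, the altered document remains self-describing when it is moved to another client.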

FIG. 13 illustrates a possible graphical user interface for presenting the altered document for execution of tasks using a browser or other user agent 142 or dedicated application 144. The graphical user interface 1300 includes a menu or control region 1310 and a document display region 1320. The menu or control region 1310, in this example, includes a set of menu options 1312 for carrying out global application functions, uploading and downloading copies of documents, adjusting settings of the application, and invoking various tools or functions of the application. The region 1310 includes user interface elements 1314 for frequently-accessed actions, including a “tasks” action element 1316, and a “next/previous” control element 1318, for jumping to immediately previous or next document elements or previous/next tasks in sequence. The document display region 1320 displays all or part of the rendered altered document 1322 and permits the user to manually edit any of the document elements in the document. Additionally, a further user interface element 1324 is included to invoke an expanding (i.e., selectively displayable) menu or other user interface feature that permits the user to show or hide various features in the document, such as the various task user interface elements 1201-1209. This additional user interface feature need not be an expandable feature; it may be persistently displayed onscreen. Whether persistent or not, this user interface feature can include options selectable by the user to show and hide tasks on the display according to predetermined “viewpoints”. A viewpoint, in this context, is a set of one or more tasks pertaining to a particular objective. For example, one viewpoint may be data consistency; thus all tasks directed to confirming the accuracy or consistency of data in the document would be part of that viewpoint.
In the code example in Table 2 above, tasks pertaining to a particular viewpoint could be identified by the assigned “class” value (i.e., all tasks belonging to a particular viewpoint would have the same “class” value). Thus, task user interface elements can be shown or hidden in groups according to viewpoint or class, while other task user interface elements remain hidden or visible, as the case may be.
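Grouping by viewpoint can be illustrated with a short sketch. It assumes each task user interface element is represented as a dictionary with `id` and `class` keys mirroring the attributes in Table 2; the function name `apply_viewpoints` is hypothetical.

```python
def apply_viewpoints(task_elements, visible_classes):
    """Decide, per task element, whether it should be shown.

    Assumption: a viewpoint is identified by the shared "class" value,
    and visible_classes holds the viewpoints the user has switched on.
    Returns a dict mapping task id to True (shown) or False (hidden).
    """
    return {
        task["id"]: task["class"] in visible_classes
        for task in task_elements
    }
```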

An option for navigation within the document is illustrated in FIG. 14. The “tasks” action element 1316 is actuatable (e.g., by clicking, tapping, or otherwise invoking the corresponding user interface action using a user input mechanism such as pointing device, touchscreen, or voice command) to invoke a selection user interface element 1410, which lists a set of prescribed elements 1414 for the document's type, and corresponding indicators 1412 identifying the presence information and state of each prescribed element. In this example, the indicators 1412 indicate whether the element is present and considered “complete” (i.e., all sub-elements of the prescribed element, if any, are present), “incomplete” (at least one sub-element of the prescribed element missing, and at least one sub-element present), or absent from the document currently displayed (missing entire prescribed element). In the illustration of FIG. 14, the “complete” indicator is a solid circle; the “incomplete” indicator is a partially filled circle; and the “absent” or “missing” indicator is an empty circle. Other graphical indicators may be used. Selection of a particular prescribed element such as 1416 from the set 1414 results in the document display region 1320 being updated to display the portion of the document 1322 containing the selected element, if not already displayed. FIG. 15 illustrates a possible resultant view of the graphical user interface 1300 as a result of selection of the prescribed element indicated at 1416.
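The three states and their indicators can be sketched as follows. The `element_present` flag and the set-based comparison are assumptions for illustration; how presence is actually determined during server processing is described elsewhere in the disclosure and is not modelled here.

```python
# Indicator glyphs standing in for the solid, partially filled, and empty circles.
INDICATORS = {"complete": "●", "incomplete": "◐", "absent": "○"}


def element_state(element_present, expected_sub_elements, found_sub_elements):
    """Classify a prescribed element as complete, incomplete, or absent."""
    if not element_present:
        return "absent"
    missing = set(expected_sub_elements) - set(found_sub_elements)
    return "incomplete" if missing else "complete"
```

A prescribed element with no expected sub-elements is reported as complete whenever it is present, matching the “if any” qualification above.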

While in FIG. 14 the prescribed elements in the list 1414 in the selection user interface element 1410 are arranged in order of expected or actual appearance in the document 1322, the prescribed elements may be arranged in other orders, such as alphabetically or in order of completeness. The ordering of the prescribed element list 1414 may depend on the requirements for the document set out in the framework for the document; for instance, in some cases it may be a requirement in the framework that the prescribed elements follow a prescribed order, in which case it may be preferred to have the ordering of the prescribed elements in the list 1414 correspond with the prescribed order. Thus, the document type or framework will determine the appearance of the selection user interface element 1410.

The selection user interface element 1410 may be a drop-down list, populated using the state information determined by the server during preparation of the altered document. Data for the drop-down list can be delivered together with the altered document to the client system 110, or separately from the altered document.

Actuation of the various user interface elements 1201-1209 results in execution of code to implement the associated task with the identified prescribed element or sub-element as a target of the task. FIG. 16 illustrates an example of the graphical user interface 1300 resulting from actuation of an informational or reference task, such as those associated with user interface elements 1201 and 1202. In this example, the task results in display of an overlay pane 1610 over the document display region 1320. The overlay pane 1610 includes, in this example, a reference information display region 1612, which comprises on-point reference material relating to the target prescribed element or sub-element, and optionally element display region 1044′, which reproduces some or all of the content of the prescribed element, such as the content of content portion 1044. In some implementations, only the reference information display region 1612 is included; however, where the reference information display region 1612 displays “best example” content, it is preferable to include the element display region 1044′ so that the user can make comparisons between the best example and the actual document content. The reference information display region 1612 can include navigation user interface elements, such as a drop-down list, to permit the user to select and display other sections in the on-point reference material by subject or keyword. The user is thus not limited to the on-point reference material relating to the specific target prescribed element or sub-element. The region 1612 can also include a search interface to permit the user to locate specific reference sections. The content of the region 1612 may be automatically retrieved as a result of a look-up query sent to the server 200 for content tagged as relevant to the type or category of the target prescribed element in response to invocation of the task, without requiring the user to input a particular query keyword or instruction.
In this informational or reference task, the comparison need not be automated; however, automatic identification of on-point reference material for the prescribed element facilitates and potentially speeds review of the document, since there is no need to separately query reference materials (for example, using a separate application not integrated into the graphical user interface, or looking up relevant points in printed material).

Optionally, the element display region 1044′ is configured to permit edits to the displayed content. Additional application chrome, such as user interface elements to close (dismiss) the overlay pane 1610, to locate or search for additional reference content, or to scroll through either the reference content or document content in regions 1612, 1044′, as well as editing tools for the content of region 1044′, may be included in the overlay pane 1610, but is not illustrated in FIG. 16.

FIG. 17 illustrates a possible appearance of the graphical user interface 1300 in response to invocation of the user interface element 1206 corresponding to a “check totals” consistency-checking validation task, in which values in columns or rows of tabular data identified as totals are compared to other values in the table to confirm that the other values sum to the stated totals. While spreadsheet tools are available for carrying out such procedures, it is not unusual for tabular data in reports to be cut and pasted from the original source, and values updated to reflect changed information; this may occur, for instance, when reporting and updating salaries and total compensation levels for officers in a corporate disclosure document. The updating of such information, however, may result in inaccuracies within the table. In FIG. 17, in this example, an initial dialog box 1710 is displayed in response to actuation of the user interface element 1206 to confirm that the validation task should proceed. On confirmation, a new overlay pane 1810 is displayed, as illustrated in FIG. 18. This overlay pane 1810 includes a display of the content of the prescribed element 1034′ that is the identified target of the task. The displayed content includes, in this example, markup or highlighting 1815 to illustrate detected errors or discrepancies in the table content, and optionally recommended corrections to rectify the detected errors or discrepancies. The identification of totals and other values may be based on column header information within the table (for instance, by a comparison of the header information against standardized text or validators). The overlay pane 1810 may include further user interface elements 1812 for user editing of the content shown in the displayed prescribed element 1034′, undoing changes, and dismissing the overlay pane 1810, printing the displayed content, moving to the next or previous prescribed element, etc.
In some implementations, when errors or discrepancies with regard to reference or comparative content (such as other tables in the document or from other sources) are detected and indicated in the displayed document content, rather than manually editing the document to address any errors or discrepancies, the user can instead invoke an instruction to have any recommended corrections automatically applied. These recommended corrections may be formatted within the displayed document in a “markup” format so that the user can review the changes; or alternatively, a list of the corrections may be generated and presented in an accompanying report.

FIG. 19 illustrates an example view of the graphical user interface 1300 in response to actuation of the user interface element 1207 or 1208 to compare columns of tabular content to reference tabular content sourced from another table within the document itself, or from extrinsic material such as another document or reference material, and to validate the content of the tabular content for consistency with these other sources. In response to actuation of the user interface element 1207 or 1208, an initial dialog box may be displayed to permit the user to select the source for the tabular content to be compared (not shown). The source may be retrieved from the server 200, or uploaded by the user at the client system 110.

Once selected, the overlay pane 1910 may be displayed, including various editing and other user interface elements 1912 (similar to user interface elements 1812); a reference or comparator display region 1914, containing at least a portion of reference tabular content to be compared to the target prescribed element; and a prescribed element display region 1034″, displaying the content of the prescribed element associated with the actuated user interface element 1207, 1208. Again, the task may automatically identify discrepancies between the reference tabular content and the actual document content, and indicate them by markup or highlighting 1915, optionally together with recommended corrections to rectify the detected discrepancies.

In those circumstances where consistency between actual document content and reference content is being evaluated, the user may be permitted to set different levels of tolerance. For instance, a strict tolerance level may require an exact match between the prescribed content in the document and the reference content (e.g., exact title or header match for each column or row, exact value match for remaining cells), while a more relaxed tolerance level may permit synonyms, grammatical variations, etc.
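The difference between tolerance levels can be illustrated with a small header comparison. This sketch rests on assumptions not found in the disclosure: the relaxed level is modelled as case- and whitespace-insensitive matching plus a caller-supplied list of synonym groups (each a set of lower-case strings), and the name `headers_match` is hypothetical. Grammatical variations are not handled.

```python
def headers_match(actual, reference, tolerance="strict", synonyms=None):
    """Compare a document header with a reference header.

    tolerance="strict" requires an exact string match; any other value
    is treated as relaxed matching (an assumption of this sketch).
    """
    if tolerance == "strict":
        return actual == reference

    def normalise(text):
        # Lower-case and collapse runs of whitespace.
        return " ".join(text.lower().split())

    a, r = normalise(actual), normalise(reference)
    if a == r:
        return True
    # Accept the pair if both headers fall in the same synonym group.
    return any(a in group and r in group for group in (synonyms or []))
```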

As mentioned earlier, some prescribed content may be determined to be missing from the document when the altered version of the document is originally prepared. Location and insertion of missing content may be implemented through execution of a “missing parts” task, which in the illustrated examples can be invoked from within the document through an embedded task user interface element 1209 if included in the altered document, or else via the selection user interface element 1410 listing all prescribed elements for the document type. FIG. 20 illustrates both in the graphical user interface 1300. Thus, tasks may be invoked through the embedded code within the document, or using accompanying menu or selection user interface features.

FIG. 21 illustrates a possible response to actuation of the user interface element 1209. In this example, a dialog box 2110 is displayed indicating to the user the general status of missing prescribed elements in the document, and providing the option to view the missing components. The content of the dialog box, as with other dialog boxes implemented in response to actuation of one of the task user interface elements 1201-1209, may also be embedded in the altered document with the display code.

FIG. 22 illustrates a further view of missing components in the document. The graphical user interface 1300 now includes a further overlay 2210 listing prescribed elements identified as missing, partially complete, and complete. Not all prescribed elements need be presented in the listing 2212; for example, the completed elements may be omitted. Selection of one of the prescribed elements in the overlay 2210, in this example, may result in the document display region 1320 being updated to show the relevant part of the document, such as the location of a partially complete prescribed element, or the expected location of a missing prescribed element. In this example, however, a further overlay 2310 is displayed, displaying either the content of the prescribed element as it currently exists, if it is incomplete but present; or else a preview of content to be inserted into the document 2312, 2314, as illustrated in FIG. 23. In the example of FIG. 23, the overlay 2310 provides options 2316 to insert the content determined to be missing from the document; in this case, either a title, a table, or both. The preview content 2312, 2314 may be stored at the server 200 as template content 464 in the framework for the document. In response to the selection of one of the prescribed elements in the overlay 2210, a request for the preview content including an identifier of the prescribed element is transmitted to the server 200. At the server 200, the preview content corresponding to the identified prescribed element is retrieved from the template content. If the content is inserted into the document, it may be inserted into an automatically determined location, inserted in a current location of a cursor or insertion point in the document 1322, or else appended to the end of the document. The content, once inserted, is formatted in a similar manner to surrounding content.

If the prescribed content is only partially complete, and not altogether missing, the overlay 2310 may display the current content of the document together with a preview of the missing content for insertion.

In all of these overlay examples, the user may be permitted to edit the prescribed element displayed in the overlay. When the overlay is not displayed, editing functions may be made available in the document display region 1320 to permit, preferably, WYSIWYG editing of the various content portions of the document. It should be noted that it is not necessary for task results or other information to be displayed in an overlay pane as illustrated in the accompanying drawings. Content relating to a task may be presented in other forms. For example, proposed changes to the document may be displayed inline in the document content, or elsewhere in the graphical user interface without interfering with the visibility of the document, such as in an adjacent pane of the graphical user interface.

FIG. 24 illustrates an overview process for handling document editing and validation at the client system 110, starting for example at the graphical user interface 1300 of FIG. 14. At 2405, selection of a particular prescribed element from the selection user interface 1410 is detected. In response to the detected selection, the current state of the element is determined at 2410. If the prescribed element is not present, then a dialog or overlay to permit insertion of the missing prescribed element (e.g., as shown in FIG. 23) may be displayed, and in response to a user instruction to insert the missing prescribed element, the element is inserted at 2420. In order to ensure that significant changes to the document such as insertion of a prescribed element can be rolled back using, for example, the rollback module 370 at the server system 200, the previous state of the prescribed element is stored at 2425 in server memory. In this case, the previous state is “missing”. On the other hand, if the prescribed element is present, or at least partially present, the display at the client system 110 is updated as necessary to display the relevant part of the document containing the prescribed element at 2430.

Subsequently, at 2435, a command to conduct automated review or validation of the prescribed content is received. This may be one of the validation or consistency checking tasks represented by user interface elements 1203-1208; thus, the command may be invoked by executing embedded display code in the document. At 2440, in response to invocation of the task, the type of task or review type is determined based on the identifiers or other code embedded in the document; then any appropriate rule sets are loaded at 2445. If validation tasks are handled at the server 200, then the determination of the type of task or review 2440 and loading of rule sets and templates 2445 are carried out at the server system 200. Next, the server 200 carries out validation of the prescribed element content against the framework at 2450. The result, at this stage, may be a determination that content is missing 2455 (e.g., a title is missing); a discrepancy 2460 (such as a total that does not match other data in the table, or a mismatch between the wording of the document and predefined prescribed element wording); or in some cases, where the task includes such identification, an identification of superfluous content 2465 in the document (e.g., extra language that is not specifically required for the prescribed element).

FIG. 25 illustrates interactions between the client system components and server components during the course of editing the altered document at the client system. In a client-server implementation, it may be desirable for not only the client system 110 to maintain backup copies of the document during editing, but also to have changes to the document mirrored or tracked at the server system 200 to permit restoration of the document to a prior state. For efficiency, certain changes may be stored only locally, while other changes are transmitted to the server. The displayed version of the document, however, contains all current changes until the system receives an instruction to roll back the document to an earlier revision. Changes may be handled differently depending on whether the changes are made to a prescribed content portion of the document, or to a non-prescribed content portion.

FIG. 25 illustrates that when an edit is made to a content portion of the document containing non-prescribed elements at 2505, an updated copy 2510 of the document at the client system is stored locally in client storage 150 or 160. However, when a prescribed element or sub-element is selected 2515 and, for example, an instruction 2520 is received to insert the element into the document, a request 2525 is sent to the server 200 identifying the prescribed content type, and if required the document type. This request is triggered by execution of the task associated with inserting a missing part, invoked at the client. The server 200 receives the request, and queries 2530 the repository 380 for the relevant rules for the identified element. The repository 380 responds 2535 with the relevant rules and associated information, which includes data for the element to be inserted. As described in connection with FIGS. 20-23, the user may be given the option to preview the content to be inserted, and to instruct its insertion. If this occurs, additional communications between the server and client, not illustrated in FIG. 25, will occur, where the preview content is sent by the server 200 to the client for display, and in response to an instruction received at the client to insert the content, a further instruction is sent back to the server to complete the insertion. Once this instruction is received, both the server and the client must insert content in their respective copies or backups of the document. The server 200 stores a copy of the element as inserted 2540 in server storage 390, and transmits the element 2545 to the client, if it has not been sent already. The client system then updates its copy of the document with the inserted element and stores an updated copy of the document 2550 in its local storage.
As the user may customize the insertion point for the newly added content, the location of the added content within the document may also be transmitted to the server 200, either in a separate transmission or together with the request 2525.

When a change is made to a content portion containing a prescribed element 2555 at the client system 110, a change instruction 2560 is sent to the server 200. The change instruction may contain only the relevant content portion, or alternatively the entire prescribed element content that contains the edit. This changed data is then stored in the server's storage 390. The client system also updates the copy of the document 2570 stored in its own memory. Thus, changes to the document at the client system 110 are selectively stored at the server, but are retained at the client in client memory.
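The selective storage just described can be sketched as follows. This is a minimal illustration only, not the claimed implementation: the class names `Server` and `Client`, the method names, and the element identifiers are all hypothetical, and network transmission of the change instruction is reduced to a direct method call.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    """Stands in for server storage 390: holds prescribed element content only."""
    storage: dict = field(default_factory=dict)

    def receive_change(self, element_id: str, content: str) -> None:
        # Store the changed prescribed element content.
        self.storage[element_id] = content


@dataclass
class Client:
    """Stands in for client storage: holds every content portion of the document."""
    server: Server
    document: dict = field(default_factory=dict)

    def apply_edit(self, element_id: str, content: str, prescribed: bool) -> None:
        # The client always updates its own copy of the document.
        self.document[element_id] = content
        # Only edits to prescribed elements produce a change instruction.
        if prescribed:
            self.server.receive_change(element_id, content)


server = Server()
client = Client(server)
client.apply_edit("compensation_table", "Salary: 1,000,000", prescribed=True)
client.apply_edit("cover_note", "Draft text", prescribed=False)
```

After these two edits the client holds both content portions, while the server holds only the prescribed one.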

FIG. 27 is a schematic of the possible stacks 2701-2709 in an instance of server memory 2700 for a set of nine prescribed elements in a given document. This schematic illustrates that some prescribed elements may have undergone more edits and state changes than other prescribed elements. Because this backup information is stored at the server, it may be possible for the user to request that a given prescribed element in the document be rolled back to a prior version; in response to such a request, the server may retrieve the appropriate version and transmit the data to the client, and optionally discard any subsequent versions of the prescribed element. Because data is stored for each prescribed element in distinct memory stacks, different prescribed elements may be rolled back to different versions. The various user interfaces depicted herein, for example, can include an option invocable by the user to select a prior version of the element (for instance, an “undo” command which permits the user to revert to the immediately preceding version of the element, or another rollback command permitting the user to select an earlier version of the element stored in the memory stacks, arranged for example in reverse chronological order based on the timestamp or index). It will be appreciated that the foregoing description of tracking and storing elements at the server system 200 can apply to both prescribed elements and sub-elements thereof.
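One way to picture the per-element stacks and the rollback behaviour is the sketch below. It is an assumption-laden illustration: the class `ElementHistory`, its method names, and the use of plain strings for element versions are hypothetical and are not taken from the disclosure.

```python
from collections import defaultdict


class ElementHistory:
    """One version stack per prescribed element, newest version last."""

    def __init__(self):
        self._stacks = defaultdict(list)

    def push(self, element_id, content):
        # Each edit or state change adds a new stack entry for that element.
        self._stacks[element_id].append(content)

    def current(self, element_id):
        return self._stacks[element_id][-1]

    def rollback(self, element_id, steps=1, discard=True):
        """Return the version `steps` entries back; optionally drop later ones."""
        stack = self._stacks[element_id]
        if not 0 < steps < len(stack):
            raise ValueError("no such prior version")
        target = stack[-1 - steps]
        if discard:
            # Discard the versions that came after the restored one.
            del stack[len(stack) - steps:]
        return target


history = ElementHistory()
for version in ("v1", "v2", "v3"):
    history.push("table_1", version)
history.push("table_2", "a")
restored = history.rollback("table_1", steps=2)
```

Because each element has its own stack, rolling `table_1` back two versions leaves `table_2` untouched.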

FIG. 28 illustrates interaction between the client system 110 and the server system 200 when a validation task, such as checking table totals, is invoked. In this implementation, the server executes the validation task and sends the result to the client. The client system receives a selection of a particular prescribed element or content portion at 2805, and an instruction to invoke a validation task at 2810. The request to execute the task 2815, including an identifier for the target prescribed element for the task, is sent to the server 200, which requests any relevant rules from the repository 380 at 2820. When the relevant information is received 2825 from the repository 380, the server 200 executes the validation task against a copy of the prescribed element content from the document (either received with the request 2815, or else retrieved from server memory 390). The validation result 2830, which can include marked up or highlighted content identifying discrepancies or other issues, is then sent to the client system 110 for display. The user may then choose to edit the content of the document, as discussed above, to address any discrepancies or other validation results; as mentioned above, recommended corrections to rectify discrepancies may be automatically applied on user instruction.

When the user wishes to download a final version of the document, with all changes integrated into the document, the server may be instructed to collate the prescribed content with other document content and to send the final version to the client system 110. However, since some non-prescribed content may be edited and stored in the client's local memory, the process illustrated in FIG. 28 may be used. When an instruction to download a final version of the document 2835 is received at the client system 110, the client sends a request 2840 as well as the locally-stored version of the document 2845 to the server system 200. The client version of the document includes all changes currently applied to the document.

It is contemplated that the final version of the document will usually be delivered to the user in the format in which the document was originally received, such as PDF or a word processing format. Thus, at 2850, the server system 200 sends the document to the conversion service 330 to have the document converted back to the original format. The conversion process may also include removal of any display code or identifiers that were previously embedded by the server during initial processing. This removal may be carried out by the server system 200 rather than the conversion service 330. The conversion service 330 then returns the converted document 2855, which in turn is sent by the server 200 to the client at 2860.

In some implementations, the user may not wish to have the document returned in its original format, but may request a different format. Either the server 200 may generate the document in this different format, or else the conversion service 330 may be used.

The downloaded final version of the document can be subsequently edited by the user without using the server system 200, the web browser application 142 adapted to carry out the above-described functions, and/or the dedicated document editing and verification tool, for example using any appropriate editing application compatible with the downloaded document format. For instance, if the final version of the document is returned to the client system 110 in a word processing format, the user can subsequently open the document in a suitable word processing program, and make any desired edits. The edited document can then be uploaded to the server system 200 and processed as described above in a subsequent session. It will thus be appreciated by those skilled in the art that the user could create or edit the originating document 10 using the user's preferred document editing program and save it as an electronic file at the client system 110, and upload this saved file as the document 20 for processing by the server system; make use of the various validation and other features as described above, then download a final copy of the document 50; make further edits to the document 50 using the same preferred document editing program or a different program, or send the document 50 to another user who makes changes to the document using their own selected document editing program; and then the user, or the other user, may again upload this edited version of the document 50 to the server system, for further validation and other tasks as described above.

In the foregoing examples, the document 20 initially uploaded by the user to the server system 200 was a document that was at least partially complete, as determined by the server system during processing. However, it will be appreciated by those skilled in the art that the document 20 that is initially uploaded could be substantially empty (e.g., devoid or nearly devoid of any substantive content at all, such as a blank word processing file containing only formatting instructions and/or metadata). When a substantially empty document 20 is processed, it would be determined during processing (e.g. during step 630 illustrated in FIG. 7) that the state of all prescribed elements defined in the framework for the document type is “missing” or “not present”. These missing elements could be inserted in a manner similar to that described with reference to FIGS. 21 and 22. It will also be appreciated that the system contemplated here may permit creation of a “new” document, optionally with template content 464 for that document type, as defined by the relevant framework and/or manually selected or created by the user, already inserted into the new document either according to a predetermined order or a user-defined order.

When either the original document 20 or the editing copy or altered document 45 is optionally converted from its initial form and then processed by the server, either during initial processing or in response to invocation of a task, the formatting or design choices applied by the original creator of the document may result in anomalies or inconsistencies that impede proper processing or editing of the document. FIGS. 29 and 30 illustrate a formatting issue that can arise in the presentation of data in a tabular format. The table 2900 and free text block 2950 of FIG. 29 may be considered to be examples of the tabular content 1034 and footnote content 1036 in element 1030 depicted in FIG. 11. The table 2900, in this example, consists of several rows 2901 to 2906 and columns 2911 to 2920 defining an array of table cells. The table itself may be constructed using any suitable document markup or formatting directives in the original document. Due to formatting or layout choices made at the time the document was originally created, certain cells (e.g., 2930, 2931) were merged from cells in multiple rows and/or columns; other cells (e.g., 2941) may contain what appear to be several independent lines of data. It is possible, however, that some cells were not originally created as merged cells, but appear as such due to the design of cell and table borders applied in the original document 20. Similarly, while cells with multiple lines of data such as 2941 may appear as a single cell, they might have been originally created using multiple rows of cells, with the borders within the table designed so as to give the appearance of a single table cell.

Some possible design and formatting choices for a cell such as 2941 are illustrated in FIG. 30. In example (a), a single table cell 3010 is used, with carriage returns used to define separate lines of data and to align the lines of data with the content of adjoining cells. In example (b), the apparent single table cell 3020 is actually composed of a subcolumn of three cells 3021, 3022, and 3023, with the first cell being blank (e.g., not containing any visible ASCII characters) and the remaining cells each containing a line of data. In example (c), a single cell is used, but the individual data are aligned using ASCII characters (e.g., a space, indicated by “.”). Visually, when rendered in the finished original document 20 or when printed, each of cells (a), (b), and (c) may appear identical to the reader. If the document 20 is retained in its original electronic form or a similar electronic form for provision to the server system 200, the actual formatting of the tabular data can be retained during processing by the server system 200; in the case of example (b), the electronic representation of the tabular data in the altered document 45 used for editing and other tasks will retain a single data value per cell. In the case of examples (a) or (c), multiple values will be associated with each table cell. In the case where the originating document 20 is generated from a document containing only human-visible information (e.g., a printed or PDF document), the OCR process may automatically generate a table structure in the resultant electronic document that associates multiple values with a single table cell. Indeed, in cases where the original document 20 is obtained from a scanned copy of a printed document, it is possible that the scanning may fail to detect and reproduce cell or table borders or other visual cues that would assist in optically distinguishing between different cells in the tabular data, due either to imperfectly operating equipment or colour or shading choices in the printed document. In that case, the resultant scanned table may erroneously appear to contain multiple values per cell. The association of multiple values in single table cells may impact the result of certain tasks, such as the consistency-checking validation tasks described above. 
These tasks may involve the comparison of tabular data to reference or other data on a cell-by-cell basis, in which case the combination of multiple values in a single cell may result in mismatches without further processing.

Accordingly, when a task pertaining to tabular data is invoked, in some implementations the server system 200 pre-processes the tabular data in preparation for task execution. FIGS. 31A and 31B illustrate the handling of problematic tabular data in memory. The tabular data may be literal or numerical content, or a combination of the two; as in the example of FIG. 29, the data can include currency or other numeric information that is formatted in a particular manner using other ASCII characters. In such embodiments, optionally the system 200 will detect and strip formatting from the content either during this further processing or when data comparison steps are executed. FIG. 31A illustrates an example subset of cells in a table 3100, such as the table 2900 in FIG. 29. In this example, the tabular data consists of a number of rows 3111, 3112, 3113; it can be seen that while each of the cells in rows 3111 and 3113 contains a single value (“Value11”, “Value12”, etc.), the cells of row 3112 contain two values, here presented on distinct lines (“Value21” and “Value25” in the first cell of row 3112, etc.). In response to invocation of a task involving consistency checks or other operations on the tabular data, the content of the table is read temporarily into memory into a pseudotable structure. For each row of tabular data comprising only single lines of data per cell, each cell is read into an array entry (or other suitable object) in memory; each row comprising at least one cell having multiple lines of data is parsed to separate the values into multiple subrows of the pseudotable, which are reflected as additional row sets of data in the array or object in memory. FIG. 31B represents a possible arrangement of the data in an array format 3150: rows 3151 and 3154 of the array comprise entries, one for each cell value, while rows 3152 and 3153 contain the first and second values extracted from row 3112 of subtable 3100. Thus, row 3152 contains values “Value21”, “Value22”, “Value23”, and “Value24” from the four cells of row 3112, while row 3153 contains values “Value24”, “Value25”, “Value26”, and “Value27” from the same four cells of 3112. When comparisons or other computations are carried out on cell values of the subtable 3100 during a task, the pseudotable values are used. Again, it will be appreciated by those skilled in the art that the tabular format used to depict the storage of the values in memory need not follow the format shown in FIGS. 31A and 31B; this format is used for ease of exposition. The pseudotable data may be stored instead in one or more objects or other data structures suitable for storage of one or more data values.
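The optional step of stripping formatting from currency or other numeric content, mentioned above, might look like the following sketch. The function name `strip_formatting`, the set of currency symbols, and the accounting convention that parentheses denote a negative amount are assumptions made for illustration; the disclosure does not specify them.

```python
import re
from decimal import Decimal

# Assumed numeric shape: optional "(", optional currency sign, digits with
# optional thousands separators, optional decimal part, optional ")".
_NUMERIC = re.compile(r"^\(?\s*[$€£]?\s*(\d[\d,]*(?:\.\d+)?)\s*\)?$")


def strip_formatting(cell_text):
    """Return a Decimal for numeric-looking content, else the trimmed text."""
    text = cell_text.strip()
    match = _NUMERIC.match(text)
    if match is None:
        return text  # literal content is left as it is
    value = Decimal(match.group(1).replace(",", ""))
    # Assumption: a parenthesised amount is negative (accounting style).
    return -value if text.startswith("(") else value
```

Under these assumptions, `strip_formatting("$1,000,000")` yields a `Decimal` equal to 1000000, while literal content such as `"Value21"` is returned unchanged. A real system would need to tell parenthesised amounts apart from footnote reference indicators such as "(1)".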

FIG. 32 illustrates an overview process including the pre-processing of tabular data in response to invocation of a task. At 3210, invocation of a task is detected; this may be a similar step to 2435 described with reference to FIG. 24, where an automated review command is received, but need not be limited to those specific tasks. However, it will be understood that the process of FIG. 32 generally follows the initial processing of the original document 45 described with reference to FIGS. 7-10, since it is carried out in response to task invocation. In some implementations, however, the pre-processing may be carried out in advance of any task invocation so that the pre-processed tabular data is already available in memory.

When the task invoked pertains to tabular data, pre-processing begins at 3215, where the first row of the tabular data is retrieved. It is then determined at 3220 whether the cells of the row contain multiple values per cell. This determination may be carried out by any suitable heuristics. In one embodiment, the content of each cell may be parsed into individual strings or values according to any spaces, line breaks, tabs, or other formatting characters typically used to distinguish among values. Individual strings or values may in fact comprise multiple literary words or numbers; multiword or multinumber values can be identified by specific characteristics (e.g., they are separated by only one non-breaking space character, or consist of all content between line-breaking characters). If it is determined that at least one cell comprises multiple values, then at 3225 the values from that row of the tabular data are stored in multiple subrows of the pseudotable. The number of pseudotable subrows designated for a corresponding row of tabular data is the maximum number of values found in a single cell in the row of tabular data. Thus, in the example of FIG. 31A, two subrows 3152, 3153 are generated in the pseudotable for the single row 3112. Even if one of the cells in row 3112 contained a single value, two subrows would be used in the pseudotable. If, on the other hand, one of the cells in row 3112 was determined to contain three lines of data, three subrows would be used in the pseudotable even if the remainder of the cells in the row contained only one or two. Then, for each cell in turn, each value found in the cell is correlated to a subrow of the pseudotable, and the value assigned to a corresponding cell in that subrow. The correlation to a particular subrow is carried out so as to maintain the relative alignment of the values in the original table row. 
Since the relative alignment of the values may have been implemented using line-breaking characters (as in the example of FIG. 30(a)), the location of the line-breaking characters with respect to the values in the original table cell may be used to select the appropriate subrow in the pseudotable. Thus, in the example (a), if three subrows are generated to contain the content of cell 3010, the corresponding cell in the first subrow would contain a null value; the corresponding cell in the second subrow would contain “1,000,000”; and the corresponding cell in the third subrow would contain “3,000,000”.
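The pre-processing described above can be sketched as follows. This is one possible reading, under the simplifying assumption that values within a cell are separated only by newline characters (as in example (a) of FIG. 30); the function names `expand_row` and `build_pseudotable` are hypothetical, and `None` stands in for the null value.

```python
def expand_row(cells):
    """Split one table row into pseudotable subrows, one value per entry."""
    # Split each cell on line breaks; a blank line becomes None so that the
    # position of each value relative to the line breaks is preserved.
    split_cells = [
        [line.strip() or None for line in cell.split("\n")] for cell in cells
    ]
    # The number of subrows is the maximum number of lines in any one cell.
    depth = max((len(lines) for lines in split_cells), default=0)
    return [
        [lines[i] if i < len(lines) else None for lines in split_cells]
        for i in range(depth)
    ]


def build_pseudotable(rows):
    """Expand every row of the tabular data into the pseudotable."""
    pseudotable = []
    for row in rows:
        pseudotable.extend(expand_row(row))
    return pseudotable


table = [
    ["Value11", "Value12"],
    ["Value21\nValue25", "Value22\nValue26"],
]
pseudotable = build_pseudotable(table)
```

With this input the second table row becomes two subrows, and a cell such as `"\n1,000,000\n3,000,000"` contributes a null entry to the first of three subrows, matching the treatment of cell 3010 described above. Handling of space-aligned content, as in example (c), would need a different splitting heuristic.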

If, on the other hand, no cell in the row contains multiple values that require separation into distinct pseudotable subrows, at 3230 the values of the cells are written to an object or array subset corresponding to that row, e.g. with one value per array entry.

Once the values in the cells of the selected row have been assigned to corresponding cells of the pseudotable, it is determined whether there is a next row in the tabular data at 3235. If so, the next row is retrieved at 3215 and the process repeats. It should be noted that the foregoing process may be implemented for an entire table or set of tabular data, or only for a subset. Generally, this pre-processing is conveniently carried out by the system that also executes the requested task. Thus, in many of the examples contemplated herein, the server system 200 carries out the pre-processing of the tabular content since it also executes the requested task.

If the pre-processing is complete at 3235, the task can be implemented at 3240. The task may be a comparison task, where the tabular data or a subset thereof is compared to reference data at 3245, and the results of that comparison (e.g., discrepancies, and optionally proposed changes to the content) displayed to the user at 3250. Optionally, possible corrections to the data may be displayed to the user for selective application to the document. The task may be a validation task permitting the user to optionally edit the tabular content in a “freehand” manner (i.e., not in response to an automated comparison or consistency check), in which case the tabular data is presented to the user for editing at 3255; edits may be applied to the editing copy of the document presented onscreen, or may be applied to the pseudotable in anticipation of other tasks to be executed on the data. The task may involve a consistency check, for instance to determine whether the tabular data matches the data presented elsewhere, or to determine whether terms contained in a table sum to a “total” value also contained in the table. With reference to table 2900 in FIG. 29, an example of the former consistency check is a comparison of reported salaries in column 2912 for the named individuals in column 2911. In some types of documents, for example, historical salary data may also be presented in another table column in the same document. An example graphical user interface for this task was illustrated in FIG. 19. The user may be presented with options to select columns or rows of reference data from another set of tabular data within the document, and to compare the selected reference data with the current table that is the subject of the consistency-checking task. The reference data may be retrieved from a source external to the document. If necessary, similar pre-processing of the selected reference data may be carried out prior to the comparison being carried out. 
The cell values in the pseudotable(s) corresponding to the same named individuals are compared, and any discrepancies presented to the user as discussed below. Optionally, where the values being compared are numerical values, discrepancies may also be presented as gains or losses, for instance as a percentage. An example of the latter consistency check, also with reference to FIG. 29, is a determination whether each value in the “Total” column 2920 is accurate, based on a summation of numerical values from selected other columns (e.g., 2915 through 2915) in the pseudotable. The selection of these columns for a consistency check may be done by the user, or may be automated based on an analysis of corresponding column headings. As explained above, these tasks may be carried out at the server system 200, in which case the results of the comparison are sent to the client system 110 and displayed to the user.
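The two numeric checks just described, verifying a “Total” column and expressing a discrepancy as a percentage gain or loss, can be sketched as below. The function names, the use of zero-based column indices, and the assumption that pseudotable values have already been converted to `Decimal` are illustrative choices, not details from the disclosure.

```python
from decimal import Decimal


def check_totals(rows, term_columns, total_column):
    """Return (row_index, reported_total, computed_total) for each mismatch."""
    discrepancies = []
    for index, row in enumerate(rows):
        # Sum the selected term columns of this pseudotable row.
        computed = sum((row[c] for c in term_columns), Decimal(0))
        if computed != row[total_column]:
            discrepancies.append((index, row[total_column], computed))
    return discrepancies


def percent_change(reference, value):
    """Express a discrepancy as a gain (+) or loss (-) against the reference."""
    return (value - reference) / reference * 100


rows = [
    [Decimal(100), Decimal(50), Decimal(150)],
    [Decimal(200), Decimal(25), Decimal(230)],
]
mismatches = check_totals(rows, term_columns=[0, 1], total_column=2)
```

Here only the second row is flagged, because 200 + 25 does not equal the reported total of 230; the computed total could then be offered to the user as a proposed correction.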

FIGS. 33 and 34 illustrate example processes for the various tasks mentioned in FIG. 32. In FIG. 33, a consistency or compare task may involve evaluating the content of the table in the document for compliance with specific rules or content requirements; for instance, specific types of prescribed tables may be required to include columns or rows containing specific information, such as the salary information illustrated in FIG. 29. When the task is implemented for the table, as set out in FIG. 33, the target tabular data set is identified at 3310; this may be a subset of the tabular data or the entire table of data, and this identification may be carried out either before or after any necessary pre-processing. At 3315, based on the task to be executed, a rule set comprising one or more rules is generated using information from the framework associated with the document type. A rule may include requirements that a particular cell (e.g. a header cell) in the document contain a particular label or a synonym, and/or must not contain other values. The rule set may already be stored in the data store 380, or may be generated from a set of different rules or criteria stored in the data store 380. The rule set is then executed against the target contents of the pseudotable to determine compliance at 3320. In some implementations, as a given cell value is determined to comply with a specific rule of the rule set, that cell and the rule are designated as complete, so that neither the cell nor the rule is reused in further compliance processing 3320. For example, the system 200 may store a list of pointers corresponding to each of the target cell values and the rules, and starting with a first cell value, apply each rule in turn; as each cell or rule is determined to have a match, the pointer is removed from the list and the next cell value is processed against the remainder of the rule set.
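A simplified sketch of this compliance pass follows, using header-label rules as the example. The representation of a rule as a required label mapped to a set of accepted synonyms, the case-insensitive comparison, and the function name `check_headers` are all assumptions for illustration; a dictionary of pending rules plays the role of the pointer list.

```python
def check_headers(header_cells, rules):
    """Match header cells against label rules, using each rule at most once.

    `rules` maps a required label to its set of accepted synonyms.
    Returns (unmatched_cells, unsatisfied_rule_labels).
    """
    pending_rules = dict(rules)  # rules not yet designated as complete
    unmatched_cells = []
    for cell in header_cells:
        text = cell.strip().lower()
        # Iterate over a snapshot so a matched rule can be removed safely.
        for label, synonyms in list(pending_rules.items()):
            accepted = {label.lower()} | {s.lower() for s in synonyms}
            if text in accepted:
                del pending_rules[label]  # rule is complete; do not reuse it
                break
        else:
            unmatched_cells.append(cell)  # no remaining rule matched this cell
    return unmatched_cells, sorted(pending_rules)


rules = {
    "Salary": {"Base salary"},
    "Total": {"Total compensation"},
    "Bonus": set(),
}
unmatched, unsatisfied = check_headers(["Name", "Base Salary", "Total"], rules)
```

In this run "Name" matches no rule and the "Bonus" rule is left unsatisfied, which could be reported as a column to be inserted for compliance.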

Any detected non-compliance may be reflected by a change to the value in the pseudotable, for instance to apply highlighting or other formatting to the pseudotable value or cell, or to insert a proposed correct value in the pseudotable, at 3325. An example of non-compliance or proposed corrections is illustrated schematically in FIG. 36A, which depicts the pseudotable 3600 after initial proposed corrections or discrepancies have been identified. The content of certain cells of the pseudotable entries in rows 3612 and 3613, in this example, have been altered to include proposed replacement values (“EditValue21”, “EditValue24”, “EditValue27”). In this particular example, it is desirable to show proposed changes to the user in a markup form, so the content of the pseudotable is further altered to apply text decoration or symbols (e.g., underlining proposed changes and/or strikethrough of incorrect or inconsistent values, and/or the application of different text colours or highlighting) to the original values of the pseudotable. Different text decoration or symbols, such as different highlighting or text colours, may be used to identify different levels of discrepancy. For example, where numeric values are compared, a discrepancy consisting of a value lower than the reference value may be indicated by red, while a discrepancy consisting of a value higher than the reference value may be indicated by green. To maintain alignment of values to be displayed in adjacent cells of the table, null values (“EmptyValue”) are added to the data for the cells that were not altered to include proposed corrections or discrepancies in the same pseudotable rows. Proposed changes can include insertion or deletion of columns or rows of data as determined to be required for compliance. Columns or rows for insertion may be retrieved from the template information stored at the server system 200.

The pseudotable values from the pseudotable 3600 are then applied to the resultant table of data 3630 to be displayed to the user as a result of the task, as shown in FIG. 36B. Here, the empty values are rendered as line breaks only with no data in the same line in row 3632, and the markup is preserved when the resultant table is displayed.
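The markup and alignment behaviour can be illustrated with the sketch below. The `~~…~~` and `__…__` tokens are placeholder markup chosen for this example (strikethrough and underline respectively), `None` stands in for the “EmptyValue” null entry, and the function names are hypothetical.

```python
EMPTY = None  # stands in for the "EmptyValue" placeholder


def propose(original, replacement=None):
    """Return the display lines for one pseudotable entry."""
    if replacement is None or replacement == original:
        # Unaltered entries are padded so adjacent cells stay aligned.
        return [original, EMPTY]
    # Struck-out old value followed by the underlined proposed value.
    return [f"~~{original}~~", f"__{replacement}__"]


def render_row(entries):
    """Join each entry's lines with line breaks; EMPTY renders as a blank line."""
    return ["\n".join(line or "" for line in lines) for lines in entries]


row = [propose("Value21", "EditValue21"), propose("Value22")]
rendered = render_row(row)
```

The unaltered cell renders as its value followed by a bare line break, so that it occupies the same number of lines as the neighbouring cell carrying a proposed correction.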

Again, a pointer list of pseudotable cell values is generated at 3420, and is used to track the comparison of the target pseudotable values to the reference data at 3425. When totals are being checked, the comparison may involve summing the reference data, and comparing the sum to the target pseudotable data. When the comparison is between columns or rows of data in the same or different document, the pointer list of values can be used to track which values have been compared. As each value in the pseudotable data is compared, the pseudotable values may be altered to reflect any discrepancies or suggested changes at 3430; again this may be in a markup form, as described above. When the task is complete, the values in the edited pseudotable are applied to the tabular data for presentation to the user at 3435, and the results presented to the user at 3440, as generally described above.

The user interface presented to the user may give the user the option to manually or automatically select proposed corrections or changes to be applied to the tabular data in the document 45. On an instruction to apply a selected correction, the content of the tabular data in the document 45 can be altered to remove the older value while retaining the newer value, and removing any formatting that had been added by the executing task. FIG. 36C illustrates the resultant table 3650 after all proposed changes have been applied in row 3652. The user may be given the option to select only specific changes to be applied and/or dismiss specific proposed changes so that they are not applied; in the latter case, the proposed correction would be removed from the table, while the old value is retained. Application of any changes to the content does not require use of the pseudotable in memory. FIG. 35 illustrates a process for applying changes to tabular data in the document 45. As described earlier, the previous version of the element containing the tabular data (i.e., the version that was pre-processed as described in FIG. 32) and its state may already be stored in the stacks at the server system 200 at 3510. At 3515, an editing instruction is received in response to a user command. The editing change is applied to the element at 3520. In implementations where the task is being executed at the server system 200, this change is applied at the server copy of the element as well. The changed, now current, version of the element is then stored in a new stack entry in memory at 3525, and a copy of the changed version of the element is sent to the client system 110 for presentation to the user as part of the editing copy of the document 45 at 3530. If the task is being executed at the client system 110, the stack entry may or may not be created at the server system 200. 
In some implementations, rather than storing the entire table element in the memory stack, a stack entry may be maintained for each subelement (e.g., cell) of the table so that changes to individual cells can be rolled back.

The editing instruction can include application of a proposed change, as described above. Other edits can include the insertion or deletion of columns or rows of data either in response to the results of a consistency-checking task, or other “freehand” changes to table content by the user (e.g., a change by the user that is not specifically in response to a detected discrepancy or proposed change, which can also include insertion or deletion of columns or rows of data). These changes may not require use of the pseudotable, since no comparison of actual cell content is required.

As described above with reference to FIG. 25, the handling of edits to prescribed elements in the document 45 can differ from the handling of non-prescribed elements, such that when edits are made to content in non-prescribed elements, the changes are stored locally in client storage 150 or 160, while changes to prescribed elements are echoed at the server system 200 as well as locally. The selective storage of certain changes to content locally versus remotely (or locally only, versus remotely and locally) may be based on which device or system actually executes certain tasks relating to the element, as discussed earlier; if changes are made to content that may be subject to a particular consistency-checking task that is carried out by the server system 200, then preferably a copy of the current version of the element is maintained at the server system 200 so that there is no need for the client system 110 to also send a copy of the element to the server in addition to the instruction to execute the task. Alternatively or additionally, the selection of system 110 or 200 to carry out a given task may be determined by the data resources required for the task. A consistency-checking task that only compares content of the document 45 with other content within the same document may not require external resources (e.g., template content, reference text information, data from other documents 20), and would then be executed at the client system 110, provided that the client system 110 was provided with programming code required to implement the task. This code could be provided when the document 45 is initially delivered to the client system 110, or in response to a request from the client system 110, for instance when the user invokes the task via the appropriate user interface element. A task that requires external resources (which may be stored at the server), however, may be executed on a copy of the document element stored at the server system 200 or received from the client system 110 with the instruction to execute the task. This may reduce the network resources consumed by the system overall.

In still other implementations, the selection of the client system 110 or server system 200 to execute a task may depend on security settings. For instance, when the entire document is sensitive or confidential, and transmission to an external server is discouraged, most processing and tasks can be executed at the client system 110, and the server 200 may provide any external resources or code required by the client system 110 in response to requests sent from the client. If portions of the document are marked sensitive or confidential (for instance, all tabular data may be marked confidential, or specific text passages may be marked confidential), any tasks being executed in relation to these portions are carried out by the client system 110, again with any necessary code or external resources being received from the server in response to client requests. The confidentiality or sensitivity indicator may comprise a tag or other markup within the document elements, or a setting in the client application that designates certain content (or the entire document) confidential. Further, any previous versions of the elements stored for rollback purposes would then be stored at the client system 110 rather than the server system. As well as supplying a measure of security, this reduces the number of synchronization events that may be required between the client and server systems, since not all changes to the document 45 need be sent to the server system 200.
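The selection criteria discussed in the preceding paragraphs (confidentiality first, then the need for external resources) could be combined as in the sketch below. The ordering of the checks, the `Task` fields, and the function name `choose_executor` are assumptions made for illustration; the disclosure presents these criteria as alternatives that an implementation may weigh differently.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    needs_external_resources: bool  # e.g. template content or reference data


def choose_executor(task, element_confidential, document_confidential=False):
    """Return "client" or "server" for the system that should execute the task."""
    # Confidential content stays at the client; the server would only supply
    # code or external resources in response to client requests.
    if document_confidential or element_confidential:
        return "client"
    # Tasks that need resources held at the server run on the server copy.
    if task.needs_external_resources:
        return "server"
    # Purely intra-document checks can run at the client.
    return "client"


totals_check = Task("check_totals", needs_external_resources=False)
template_compare = Task("compare_to_template", needs_external_resources=True)
```

For example, `choose_executor(template_compare, element_confidential=False)` selects the server, whereas the same task applied to an element marked confidential is kept at the client.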

Thus, in some implementations, changes to some portions of the document 45 may be stored only at the client system. An example of document content that may be handled in this manner includes footnote or endnote text and other “free text” passages that are not subject to external compliance requirements such as rules or guidelines. FIG. 37 is a schematic representation of the non-limiting example document content depicted in FIG. 29. In this schematic, the tabular content 2900 consists of a number of cells, as discussed above, and the footnote or free text element 2950 comprises a number of individual footnote entries 3710, 3720, 3730, and 3740. As is conventional in literary works, these footnote entries are set off or identified by a respective reference indicator 3711, 3721, 3731, 3741, included in the footnote entries 3710, 3720, 3730, 3740. As can be seen in the literal example of FIG. 29, these reference indicators are “(1)”, “(2)”, “(2)”, and “(4)”, respectively. During initial processing of the document 20, such footnote or endnote blocks 2950 may be processed according to various framework rules in order to identify such content, and their reference indicators, and to determine whether the block is associated with specific preceding content (such as the table 2900 in this example), or with the entire document. The association may be determined based on the identification of strings within other document elements matching the footnote or endnote reference indicators, optionally in combination with identification of common strings or words in the block and the other document elements.

In this example, there are reference indicators 3712, 3722, 3732, 3742 in the table 2900 which may or may not directly correlate to the footnotes or endnotes in the block 2950. Tasks that may be invoked in the system can include a consistency-checking task in which the content of a footnote or endnote block is compared against the entire document or an associated document element to determine whether each footnote or endnote in the block 2950 has a corresponding reference in the document or associated element. In this example, the system would determine if each reference indicator 3711, 3721, 3731, and 3741 has at least one corresponding reference indicator 3712, 3722, 3732, 3742 in the associated element 2900, and vice versa. The task may also check for duplicate reference indicator values; this may be permissible in the document or associated element, but not in the endnote or footnote block itself. In this particular example, with reference to FIG. 29, it can be seen that reference numeral (5) in the table element 2900 (indicator 3715 in FIG. 37) does not have a match in the footnote or endnote block 2950, and further that reference indicator “(2)” is repeated in the block 2950 (reference indicators 3721, 3731). As generally discussed above, in response to this consistency-checking task, the user may be presented with suggested corrections and/or identification of the inconsistencies for automated correction or for manual correction by the user. These suggested corrections or identification of inconsistencies can be presented in a user interface element analogous to those illustrated at FIG. 19 or 23, for example, where the discrepancy or suggested correction is presented in context together with an excerpt from the document, namely, the document element or sub-element containing the detected discrepancy. Alternatively, they can be presented in a summary or report view, which itemizes the discrepancies found in a given element or sub-element. One example of a summary or report view is one like that shown in FIG. 22. While the example of FIG. 22 lists prescribed elements that are determined to be possibly missing, partially complete, and complete, in the footnote or endnote example, the summary or report view can list those footnote or endnote references that were determined to be consistent and not consistent, or alternatively can list only the references with associated detected discrepancies. A further example is illustrated in FIG. 38, in which an overlay 3810 presents at least some of the discrepancies detected as a result of a consistency-checking task. In this example, a plurality of discrepancies are displayed, and editing fields 3812, 3814 are presented, including any suggested corrections that may have been automatically determined. In this overlay 3810, user-input edits can be made to multiple portions of the document as displayed in each field 3812, 3814, and committed to the editing copy of the document at once. In this example, only two editing fields are presented for clarity; however, it will be appreciated by those skilled in the art that the overlay 3810 or other graphical user interface can include more than two editing fields, depending on the number of discrepancies located; the overlay 3810 can be scrollable so that overflow content (i.e., those discrepancies and editing fields that cannot be initially displayed in the overlay) can be displayed in the overlay 3810. Thus, the features of FIG. 38 are not limited to display of two editing fields, but can include three or more.

Further, the application 144 can also provide an analogous graphical user interface for searching for specific terms in the document, or in a subset of document elements or sub-elements (such as all elements or sub-elements containing footnotes and/or endnotes, or all elements and sub-elements excluding footnotes and/or endnotes), and display a list of all occurrences of the term with context from surrounding document content, in a plurality of editing fields to permit the user to make edits to the document in multiple places in the document, similar to the example of FIG. 38. Changes made in these multiple editing fields (either in the FIG. 38 example or in this further feature) can be applied to the editing copy of the document stored at the client system 110 and/or sent to the memory at the server system 200, immediately on making the change; alternatively, changes made using this feature are only applied and sent to the server, as necessary, once the user indicates that the changes are to be committed. In the latter alternative, the changes made in the editing fields are stored in temporary memory at the client system 110 separate from the editing copy of the document until they are committed. Once committed, the editing copy of the document stored at the client is updated, and any changes to be stored in the stacks at the server are sent to the server.
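By way of illustration only, the two alternatives described above (immediate application versus deferred commit) might be sketched as follows. The class `EditSession`, its parameters, and the use of a dictionary as the editing copy are assumptions made for this sketch and do not appear in the disclosure; `send_to_server` stands in for whatever transport the client uses, and `client_only` models content whose changes are kept at the client system only.

```python
class EditSession:
    """Collects edits from multiple editing fields (hypothetical sketch)."""

    def __init__(self, editing_copy, send_to_server, immediate=False, client_only=()):
        self.editing_copy = editing_copy        # dict: element id -> text (client editing copy)
        self.send_to_server = send_to_server    # callable taking a dict of changes
        self.immediate = immediate              # True: apply each change as it is made
        self.client_only = set(client_only)     # element ids never sent to the server
        self.pending = {}                       # temporary store, separate from the editing copy

    def edit(self, element_id, new_text):
        if self.immediate:
            self._apply({element_id: new_text})
        else:
            self.pending[element_id] = new_text

    def commit(self):
        """Apply all staged changes to the editing copy at once."""
        if self.pending:
            self._apply(self.pending)
            self.pending = {}

    def _apply(self, changes):
        self.editing_copy.update(changes)
        # Only changes that are to be stored at the server are transmitted.
        outgoing = {k: v for k, v in changes.items() if k not in self.client_only}
        if outgoing:
            self.send_to_server(outgoing)
```

In the deferred mode, the editing copy is untouched until `commit()` is called, which corresponds to the temporary memory described above.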

Where the required edits are in the block 2950, the changes to the content of the block may be stored only at the client system 110, as discussed above. When changes are to be made to tabular content, it may not be necessary to carry out the pseudotable pre-processing described above, since no comparisons between cells are being carried out. However, if changes to the tabular content are being stored in the server memory, then the changes may be transmitted to the server system 200 if the security settings in the system permit.

FIG. 39 illustrates a general process for implementing a footnote or endnote consistency check. At 3910, elements of the document 45 containing footnote or endnote references are identified. These elements and their sub-elements, if any, are parsed at 3915 to identify the reference indicators and the content associated with each indicator. Identification at 3915 can be carried out by scanning the document elements for characters formatted as reference indicators, or by detecting patterns in the content. For example, the document can be scanned for known formats of reference indicators (such as the number in parentheses, “(1)”, used in the example of FIG. 29). It will be understood by those skilled in the art that reference indicators need not be formatted specifically as illustrated in the drawings. Reference numerals, letters, and other symbols used as indicators in this manner are frequently formatted in documents such as the originating documents 20 contemplated herein as superscript characters, and may or may not be set off with parentheses (as in the example of FIG. 29), brackets, dots, or other characters. Identification of the reference indicators in the document can include scanning for all strings of characters matching expected patterns.
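As a non-limiting sketch of the pattern-based identification at 3915 combined with the comparison described earlier, the following assumes that reference indicators take the "number in parentheses" form and that each footnote entry begins with its indicator. The function names, the regular expression, and the result keys are hypothetical; a practical implementation would need additional patterns (superscripts, brackets, letters) and would need to distinguish indicators from other parenthesized numbers, such as negative amounts in financial tables.

```python
import re
from collections import Counter

# Assumed indicator format: digits in parentheses, e.g. "(1)".
INDICATOR = re.compile(r"\((\d+)\)")


def find_indicators(text):
    """Return every indicator value found in the text, in order of appearance."""
    return INDICATOR.findall(text)


def check_footnotes(body_text, footnote_entries):
    """Compare indicators in the associated element against the footnote block."""
    body_refs = set(find_indicators(body_text))
    entry_refs = []
    for entry in footnote_entries:
        match = INDICATOR.match(entry.strip())  # indicator expected at start of entry
        if match:
            entry_refs.append(match.group(1))
    counts = Counter(entry_refs)
    return {
        # Footnotes whose indicator never appears in the associated element.
        "unreferenced_footnotes": sorted(set(entry_refs) - body_refs, key=int),
        # Indicators in the associated element with no footnote entry.
        "missing_footnotes": sorted(body_refs - set(entry_refs), key=int),
        # Indicator values repeated within the footnote block itself.
        "duplicate_footnotes": sorted((k for k, n in counts.items() if n > 1), key=int),
    }
```

With invented sample text mirroring the indicator values of FIG. 29, a body containing "(1)", "(2)", "(4)", and "(5)" and footnote entries beginning "(1)", "(2)", "(2)", and "(4)" would yield `"5"` under `missing_footnotes` and `"2"` under `duplicate_footnotes`. Duplicates in the body are not reported, since repeated references there may be permissible.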

While the foregoing examples illustrate specific text and tabular content, and a footnote or endnote-checking consistency task, it will be understood by those skilled in the art that consistency-checking tasks need not be limited to literary passages or tabular data as illustrated, or specifically to footnote or endnote type content.

The examples and embodiments are presented only by way of example and are not meant to limit the scope of the subject matter described herein. Variations of these examples and embodiments will be apparent to those in the art, and are considered to be within the scope of the subject matter described herein. For example, some steps or acts in a process or method may be reordered or omitted, and features and aspects described in respect of one embodiment may be incorporated into other described embodiments. Further, while the foregoing examples were described and illustrated with reference to a handheld mobile device with a touchscreen interface, they may be implemented with suitable modification on a computing device with a larger display screen or without a touchscreen interface. Where a touchscreen interface is not employed, user input via the graphical user interface may be received from a pointing device and/or a keyboard. Further, while these examples have been illustrated in the context of a full-screen application, where the unified event listing view fills an entirety of the available screen space allocated to application views, these examples may be modified for use in an environment in which applications are displayed only in a window or portion of the screen (i.e., not occupying the entire display screen).

The data employed by the systems, devices, and methods described herein may be stored in one or more data stores. The data stores can be of many different types of storage devices and programming constructs, such as RAM, ROM, flash memory, programming data structures, programming variables, and so forth. Code adapted to provide the systems and methods described above may be provided on many different types of computer-readable media including computer storage mechanisms (e.g., CD-ROM, diskette, RAM, flash memory, computer's hard drive, etc.) that contain instructions for use in execution by one or more processors to perform the operations described herein. The media on which the code may be provided is generally considered to be non-transitory or physical.

Computer components, software modules, engines, functions, and data structures may be connected directly or indirectly to each other in order to allow the flow of data needed for their operations. Various functional units have been expressly or implicitly described as modules, engines, or similar terminology, in order to more particularly emphasize their independent implementation and operation. Such units may be implemented in a unit of code, a subroutine unit, object (as in an object-oriented paradigm), applet, script or other form of code. Such functional units may also be implemented in hardware circuits comprising custom VLSI circuits or gate arrays; field-programmable gate arrays; programmable array logic; programmable logic devices; commercially available logic chips, transistors, and other such components. Functional units need not be physically located together, but may reside in different locations, such as over several electronic devices or memory devices, capable of being logically joined for execution. Functional units may also be implemented as combinations of software and hardware, such as a processor operating on a set of operational data or instructions.

It should also be understood that steps and the order of the steps in the processes and methods described herein may be altered, modified and/or augmented and still achieve the desired outcome. Throughout the specification, terms such as “may” and “can” are used interchangeably. Use of any particular term should not be construed as limiting the scope or requiring experimentation to implement the claimed subject matter or embodiments described herein. Any suggestion of substitutability of the data processing systems or environments for other implementation means should not be construed as an admission that the invention(s) described herein are abstract, or that the data processing systems or their components are non-essential to the invention(s) described herein. Further, while this disclosure may have articulated specific technical problems that are addressed by the invention(s), the disclosure is not intended to be limiting in this regard; the person of ordinary skill in the art will readily recognize other technical problems addressed by the invention(s).

A portion of the disclosure of this patent document contains material which is or may be subject to one or more of copyright, design, or trade dress protection, whether registered or unregistered. The rightsholder has no objection to the reproduction of any such material as portrayed herein through facsimile reproduction of this disclosure as it appears in the Patent and Trademark Office records, but otherwise reserves all rights whatsoever.