CLAIM OF PRIORITY
[0001] This U.S. patent application claims priority to U.S. Provisional Patent Application No. 60/242,266, entitled “Method and apparatus for dynamic generation of structured documents and corresponding database representation using matrix mathematical definitions and programmatic mapping,” filed Oct. 20, 2000, which is hereby incorporated by reference.
BACKGROUND OF THE INVENTION
[0002] 1. Field of the Invention
[0003] The invention relates to the field of data representation and, more particularly, to a system and methods for generating data representations in a standard markup language using matrix definitions and programmatic mapping.
[0004] 2. Description of the Related Art
[0005] The design and use of structured documents has become an important aspect of the development of mechanisms for distributing data and information in a rapid and reliable manner. Structured documents are commonly used for the storage and transmission of information over the Internet and the World Wide Web (WWW). Most documents on the Web utilize a form of a generalized markup language that is universally recognized and is well-suited for numerous data formats including text, hypertext, multimedia, and the like.
[0006] Recently, the design specifications for markup languages have developed to contain numerous sophisticated features that make it possible to define custom formats for documents that represent complex information structures that may be used in the management of large information repositories. The Extensible Markup Language (XML) specification is one such markup language that is commonly used in the formation of structured documents for both simple and complex data representations. Originally designed to accommodate the needs of web development, this language specification has become widely used in numerous other areas as well. Among the many reasons that XML has become so widely accepted are its mechanisms for controlling the structure and content of documents, as well as for standardizing document linking and display functions.
[0007] XML is a language derived from Standard Generalized Markup Language (SGML) and permits the definition of custom data representations, similar to database representations, within each document developed using the language. These document representations or structures are called Document Type Definitions (DTDs). DTDs are commonly associated with one or more structured documents known as stylesheets, which define visual representations of the DTD and are used in organizing and presenting the information contained in the DTD. Stylesheets may be adapted to display information using numerous approaches including web browsers, printers, handheld computers, or other electronic devices.
[0008] Unlike less sophisticated markup languages such as Hypertext Markup Language (HTML), where it is possible to create documents with many embedded errors, XML data structures and documents are desirably validated to ensure consistency. Type-validation of the contents of an XML document and the associated DTD can be a complex and time-consuming task. DTD validation defines the legal building blocks of an XML document and document structure using a list of legal elements. Type-validation ensures that the structured document conforms to the open standards set by the World Wide Web Consortium (W3C). This means all data definitions conform to a specific syntax outlined by the W3C standard.
[0009] Conventional approaches to type validation map DTDs and the associated XML information into standard hierarchical data structures (or tree structures). These approaches create a problem in that the use of hierarchical data structures for XML mapping limits the data schema to what the hierarchical representation of the data can express. As a result, hierarchical data representation limits flexibility in the definition of the DTDs and inhibits the efficient formation of DTDs with significant complexity. One particular problem associated with conventional parsing and mapping techniques which use hierarchical data structures is that they fail to provide sufficient flexibility to permit the incorporation of recursive and repetitive data structures within the data schema of the DTD. As a result, conventional DTD definition is limited with respect to these characteristics, which further limits the ability to generate structured documents.
[0010] Conventional methods used to construct relational data structures for elements of a DTD typically use numerous tables containing fields to store information (attributes) about each element in a data set. Relationships between elements are defined by key references (primary and foreign) which are further stored in fields within the tables for each element. A problem with this method of data organization is that it leads to highly complex data structures that contain many tables and references between tables. As the size of the DTD to be represented in the relational structure increases, a difficulty arises in maintaining a coherent data schema. Furthermore, as DTD complexity increases, a problem arises in validating the data schema and ensuring that all of the relationships defined in the data schema are appropriately defined in each table for all required elements. Invalid or missing relationships within the data schema can lead to improper DTD representation and subsequent corruption of the data stored in the data structure representing the DTD. Furthermore, certain relationships such as recursion and replication are not efficiently supported using conventional data representations, which lack the ability to easily define these relationships without invalidating the data schema or adding undue complexity to the data representation.
[0011] Another limitation of conventional approaches is the focus on allowing only a hierarchical structure for XML and mapping this structure directly into a relational database. This hierarchical structure approach to mapping is insufficient to achieve complex DTD representations in XML of the type needed to provide functionality in many business settings. As a result, mapping a DTD structure into a relational database using a hierarchical table structure imposes limitations in the ability to create the DTD using W3C standards, which do not impose hierarchical limitations.
[0012] Accordingly, it is desirable to develop XML DTD representations that have complex relationships between elements of the DTD without the limitations imposed by conventional approaches. Furthermore, it is desirable to have a system and method for generating structured documents that permits the use of repeating and recursive data structures within the DTD representation. Use of repeating and recursive data structures is important as it permits the formation of data representations that are not otherwise possible using hierarchical structures with standard markup language elements and allows these elements to be transformed into standard relational database tables.
SUMMARY OF THE INVENTION
[0013] The system and methods for dynamic generation of structured documents presented herein overcome the limitations of conventional mapping techniques used to represent elements contained in a Document Type Definition (DTD) and map or parse these elements into a corresponding database structure. Typically these elements are defined using a standard markup language such as Extensible Markup Language (XML) or wireless application protocol. Using a matrix representation method for defining and associating elements, DTD representations can be mapped into a corresponding database structure with a reduced database table configuration requirement. One of the distinguishing characteristics of the dynamic generation system is that it accounts for both the element itself and the path taken to reach the element as the matrix representation is traversed. This manner of organization stores a single structure definition for each element in the matrix representation, thereby reducing the complexity of type-validated DTDs and associated stylesheets. The resulting matrix representation conveniently maps elements from even highly complex DTD representations to dynamically generate structured documents from the database representation.
[0014] Another feature of the system and methods presented herein is the ability to support unconventional definitions or relationships between elements. For example, a specific element can be designated to have more than one parent element without violating design rules for transformation into XML. Additionally, repeating and recursive structures can be conveniently defined and these structures can be readily resolved without compromising the logical or relational integrity between elements of the DTD. Furthermore, this system can be adapted for use in thin client driven applications to reduce dependence on locally installed (fat client) software otherwise required to obtain functionality of the system.
[0015] In one aspect, the invention comprises a system for structured document generation having a data structure input module, a transformation module, a data element input module, and a document generation module. The data structure input module receives a data structure having a defined arrangement comprising one or more data elements having identifying relationships that associate the data elements. The transformation module transforms the data structure into a matrix representation to thereby preserve the defined arrangement of the data structure, wherein the matrix representation comprises an internally recognized organization of the data structure. The data element input module stores user-specified information in the matrix representation of the data structure to thereby populate the data structure with information. Finally, the document generation module accesses the matrix representation to generate a structured document comprising a representation of the information stored in the data elements in a markup language.
[0016] In another aspect, the invention comprises a method for generating markup language data representations of a data schema containing a plurality of elements interrelated by one or more relationships. This method defines a matrix representation for the data schema, wherein the matrix representation further defines the relationships interrelating the elements in such a manner so as to permit the elements to be deterministically interrelated. Subsequently, the method maps the matrix representation of elements into a database structure, stores information in the elements of the matrix representation, and accesses the information stored in the matrix representation to output at least a portion of the information using a markup language, wherein the format of the information is determined by the elements and relationships of the data schema and is represented by an output markup language.
[0017] In still another aspect, the invention comprises a method for representing relationships between elements in a data schema, wherein the method identifies the elements and the relationships between the data elements of the data schema, applies a plurality of matrix transformation operations to encode the data schema, and further stores the encoded data schema in a database having a fixed number of tables so as to confer independence from data schema complexity.
[0018] In yet another aspect, the invention comprises a method for coding a document type definition into a structured document by receiving the document type definition comprising information defined by a plurality of elements and relationships coded in a pre-arranged structure and mapping the pre-arranged structure of the document type definition into a coded representation comprising a singular mapping of each of the plurality of elements that preserves the relationships coded in the pre-arranged structure. Subsequently, the coded representation is stored in a database construct having a fixed table number that maintains the singular mapping of the plurality of elements and the associated relationships, and the elements contained in the coded representation are populated with information. Finally, the structured document is generated by extracting the information contained in the coded representation stored in the database construct and outputting the information in a markup language.
[0019] In another embodiment, the invention comprises a method for structured document generation that receives a data structure comprising one or more data elements having identifying relationships that associate the data elements and transforms the data structure into a matrix representation comprising an internally recognized organization of the data structure. The data elements of the data structure are transformed into the matrix representation and then populated with information. The information in the matrix representation is subsequently accessed to generate a structured document comprising a representation of the information in a markup language.
[0020] In still another embodiment, the invention comprises a system for structured document generation comprising an input module which receives a data structure comprising one or more data elements having identifying relationships that associate the data elements, a transformation module which transforms the data structure into a matrix representation wherein the matrix representation comprises an internally recognized organization of the data structure, a data element input module which populates the matrix representation with information, and a document generation module which accesses the information of the matrix representation to generate a structured document comprising a representation of the information in a markup language.
BRIEF DESCRIPTION OF THE DRAWINGS
[0021] These and other aspects, advantages, and novel features of the invention will become apparent upon reading the following detailed description and upon reference to the accompanying drawings. In the drawings, the same elements have the same reference numerals, in which:
[0022] FIG. 1 illustrates one embodiment of a document type definition.
[0023] FIG. 2 illustrates one embodiment of a system for dynamically generating structured documents using matrix representations.
[0024] FIG. 3 illustrates one embodiment of a matrix table set.
[0025] FIG. 4 illustrates one embodiment of modules that provide structured document generation functionality.
[0026] FIG. 5 illustrates one embodiment of a method for validity determination in matrix representation.
[0027] FIG. 6 illustrates one embodiment of an update database module used in conjunction with the matrix table set.
[0028] FIG. 7 illustrates one embodiment of a process used for creating and updating the matrix table.
[0029] FIG. 8 illustrates one embodiment of an add entry method used in conjunction with the matrix table set.
[0030] FIG. 9 illustrates one embodiment of an update entry method used in conjunction with the matrix table set.
[0031] FIG. 10 illustrates one embodiment of a delete entry function used in conjunction with the matrix table set.
[0032] FIG. 11 illustrates one embodiment of a re-arrange function used in conjunction with the matrix table set.
[0033] FIG. 12 illustrates one embodiment of a determine next action method used in conjunction with the matrix table set.
[0034] FIG. 13 illustrates one embodiment of the functionality of the populate DTD module used in conjunction with the matrix table set.
[0035] FIG. 14 illustrates another embodiment of the functionality of the populate DTD module used in conjunction with the matrix table set.
[0036] FIG. 15 illustrates one embodiment of the functionality of the generate document module using a document generation process.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT
[0037] FIG. 1 illustrates an exemplary Document Type Definition (DTD) 100 comprising a plurality of data relationships between data elements 110 and an associated data schema depicting the relationships using a data matrix or matrix representation 115. The DTD 100 identifies, organizes, and associates the one or more data elements 110 in a meaningful manner. The format of the DTD 100 may follow standard conventions for element identification, such as the use of tags and/or identifying characters that define each element and its relationships. In the illustrated embodiment, the element identification scheme follows conventional style guidelines set forth in the basic XML specification. It will be appreciated, however, that other element identification schemes and style guidelines can be adapted to operate with the matrix mapping system, such as those from any conventional programming technique. Style guidelines may additionally be formed that do not formally adhere to any conventionally accepted standard.
[0038] In the illustrated embodiment shown in FIG. 1, the elements 110 are represented by the alphanumeric characters “A”, “B”, “C”, “D”, “E”, “F”, “G”, and “X” with a tag “!ELEMENT” defining an instance of the element 110. Each element 110 may be further relationally associated with other elements, indicating that a given element may contain information derived from the other elements or be related to those elements in other manners. In one aspect, relationships between elements in the DTD 100 define hierarchical orderings or dependencies between the elements 110. As an example, the elemental definition and association:
[0039] <!ELEMENT A B, C>
[0040] defines an element “A” and associates this element with two other elements “B” and “C”.
[0041] It will be appreciated that each element 110 may desirably represent any of numerous types of data representations including, for example, textual information, numerical information, variables, identifiers, filenames, formulas, character data, and pointers, among other possible representations. Furthermore, the DTD 100 shown represents a simplified example of a typical DTD and does not depict other tags and formatting characters which may be present in a DTD defined by conventional stylistic guidelines.
[0042] In one aspect, the DTD 100 organizes the data elements 110 by identifying relationships using nested tags to define a tree-like hierarchy having a root element 120. The root element 120 forms the basis for subsequent relationships between data elements 110 that are linked to the root element 120 through the data schema. In the illustrated embodiment, the root element “A” is associated with two child elements “B” and “C”. The child element “B” is likewise associated with still other child elements “D” and “E”. The child element “C” is further associated with other child elements “F” and “G”. The DTD 100 and associated data relationships defined therein are translated 125 into the matrix representation 115, where nodes 130 representing individual elements 110 are linked by edges 135 defining relationships between the nodes 130. It will be appreciated by one of skill in the art that the DTD 100 shown in FIG. 1 is but one example of a DTD representation. It is conceived that the system and methods described herein will operate with other DTD constructions. For example, these DTD constructions may be simple or complex and arranged in a hierarchical or non-hierarchical manner. Additionally, the DTD construction may be dynamically generated “on the fly” without the requirement of submitting a complete DTD to the system for processing. In one embodiment, the system for structured document creation incorporates a user interface to allow DTD definition, editing, and updating. Aspects of the DTD definition process will be described in greater detail hereinbelow.
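The translation of declarations into nodes and edges can be sketched as follows. This is a minimal illustration only: it assumes the simplified declaration style used in FIG. 1 (not full W3C DTD syntax), it assumes the full set of declarations implied by the relationships described in this section, and it represents the matrix as a plain adjacency matrix, which is one possible reading of the matrix representation 115. The names `parse_declarations` and `build_matrix` are hypothetical.

```python
import re

# Simplified declarations in the style of FIG. 1 (assumed, not full DTD syntax).
DTD_TEXT = """
<!ELEMENT A B, C>
<!ELEMENT B D, E>
<!ELEMENT C F, G>
<!ELEMENT F X>
<!ELEMENT G X>
<!ELEMENT X A*>
"""

DECLARATION = re.compile(r"<!ELEMENT\s+(\w+)\s+([^>]*)>")


def parse_declarations(text):
    """Return {parent: [child, ...]}, dropping occurrence marks such as '*'."""
    children = {}
    for parent, body in DECLARATION.findall(text):
        names = [part.strip().rstrip("*+?") for part in body.split(",")]
        children[parent] = [name for name in names if name]
    return children


def build_matrix(children):
    """Return (node names, 0/1 matrix) where matrix[i][j] == 1 means edge i -> j."""
    nodes = sorted(set(children) | {c for kids in children.values() for c in kids})
    index = {name: i for i, name in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for parent, kids in children.items():
        for kid in kids:
            matrix[index[parent]][index[kid]] = 1
    return nodes, matrix


children = parse_declarations(DTD_TEXT)
nodes, matrix = build_matrix(children)
for name, row in zip(nodes, matrix):
    print(name, row)
```

In this sketch the column for “X” contains two entries (from “F” and “G”), and the row for “X” contains an entry pointing back to “A”; neither relationship fits in a strict tree.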
[0043] Using conventional approaches to resolving XML relationships, a problem is encountered when a non-deterministic relationship is created for a particular data element. In one aspect, the non-determinism arises from the presence of alternative paths that can be used to reach a particular element in the data schema. For example, the associations relating to the element “X” in the matrix representation 115 indicate that this element is a child element to both elements “F” and “G”. In conventional systems, relationships of this type are unacceptable and present the problem of being unable to uniquely assign data to the element “X” because it can be arrived at from more than one path. The methods for defining the structured document presented herein overcome this problem and enable different values to be assigned to the element “X” depending on the path taken to reach the element.
[0044] Additionally, conventional methods may attempt to solve this problem by giving “X” a unique name for each path (such as “X1”, “X2”, etc.), which returns the structure to a true hierarchical tree. This solution, however, is inefficient because many more definitions of the same data item “X” may be required. The present invention need only define “X” once. Using the methods presented herein, an internal representation of W3C standard XPATH data descriptions may be performed. In one aspect, this manner of data representation improves the efficiency for storing and manipulating data in XML format.
[0045] As will be described in greater detail hereinbelow, a matrix traversal method enables different values to be assigned to the element “X” depending upon the path that is traversed to reach the element. For example, to reach the element “X” from the root node “A”, two paths can be identified. In a first path, the elements in the data matrix 115 are traversed in the order of A→C→F→X. In a second path, the elements in the data matrix 115 are traversed in the order of A→C→G→X. Differences in traversal 140 of the matrix representation 115 can be maintained using specialized notation that indicates the order of traversal. In one aspect, dot notation is desirably used to indicate the traversal order such that the first path is indicated by A.C.F.X and the second path is indicated by A.C.G.X.
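As an illustration of path-dependent values, the sketch below enumerates the dot-notation paths from the root to a target element and stores a separate value under each path. It is a simplified assumption of how such a lookup might work, not the traversal method of the system itself; the `CHILDREN` mapping restates the relationships of FIG. 1, the function `paths_to` is hypothetical, and recursion back through “A” is deliberately ignored here by never revisiting a node already on the current path.

```python
# Parent -> children relationships as described for FIG. 1 (assumed complete).
CHILDREN = {
    "A": ["B", "C"],
    "B": ["D", "E"],
    "C": ["F", "G"],
    "F": ["X"],
    "G": ["X"],
    "X": ["A"],
}


def paths_to(target, root, children):
    """List dot-notation paths from root to target without revisiting a node."""
    found = []

    def walk(node, trail):
        trail = trail + [node]
        if node == target:
            found.append(".".join(trail))
            return
        for child in children.get(node, []):
            if child not in trail:  # skip loops such as X -> A
                walk(child, trail)

    walk(root, [])
    return found


paths = paths_to("X", "A", CHILDREN)

# "X" is defined once, but each path to it can carry its own value.
values = {paths[0]: "value reached through F", paths[1]: "value reached through G"}
for path in paths:
    print(path, "=", values[path])
```

Keying the stored data by the full path rather than by the element name is what lets a single definition of “X” hold different values, without renaming it “X1” and “X2”.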
[0046] In addition to providing the aforementioned distinguishing matrix traversal paths, the matrix representation 115 also accommodates recursion within the data schema. The recursive aspects of matrix traversal are shown in the DTD 100 by the association:
[0047] <!ELEMENT X A*>
[0048] This association defines a reference from the element “X” back to the original root element “A”. The “*” is DTD notation meaning “A” can occur 0 or more times under the element “X”. Such a relationship may be valid in a DTD, but it is prohibitive when using conventional methods because it introduces a loop or recursive relationship into the data schema. Loop or recursive relationships in the data schemas used by conventional methods present a problem in that they cannot be resolved in a deterministic manner. The matrix representation 115 accommodates the presence of these relationships, which can be notated as before to indicate the order and degree of traversal. In the present invention, since “A” can be defined to occur 0 times under “X”, the recursion may be ended when “A” does not occur in that position. For example, the sequence A.C.F.X.A.C.G defines a traversal path that proceeds once through the matrix representation 115 and then, on a second traversal entered through the first occurrence of “A” under “X”, arrives at element “G”. Thus, complex traversal paths can be defined that incorporate both recursive relationships and non-deterministic (repetitive) associations for each data element 110.
[0049] Additional flexibility in the matrix traversal operations 140 is accomplished by supporting repeating (as described above for “A”) as well as recursive matrix traversals through the use of subscripting in the traversal definition 150. Subscripting in the traversal definition 150 defines values that represent a desired repetitive iteration of the element “A”. For example, the matrix definition A.C.G.X.A(3).C.F defines a recursive traversal 150 of the data schema through the third iteration of “A” under “X”, on through “C”, and subsequently proceeding to reach element “F”. The equivalent W3C XPath notation would be “A/C/G/X/A[3]/C/F”. It will be appreciated that the use of the repeating subscript notation increases the flexibility in document definition and permits element repetition as well as recursive traversal without endless loop formation. It will also be appreciated that XPath conformance is maintained at all levels of the structure.
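The subscripted notation lends itself to a small parser. The sketch below is an assumption about how a traversal definition might be tokenized, checked against the declared relationships, and rendered as an XPath-style location path (XPath separates steps with forward slashes and writes positions in square brackets). The function names and the `CHILDREN` mapping are hypothetical, and the sketch does not check occurrence limits such as “*”.

```python
import re

# Parent -> children relationships as described for FIG. 1 (assumed complete).
CHILDREN = {
    "A": ["B", "C"],
    "B": ["D", "E"],
    "C": ["F", "G"],
    "F": ["X"],
    "G": ["X"],
    "X": ["A"],
}

STEP = re.compile(r"^(\w+)(?:\((\d+)\))?$")


def parse_traversal(definition):
    """Split 'A.C.G.X.A(3).C.F' into (name, iteration or None) steps."""
    steps = []
    for token in definition.split("."):
        match = STEP.match(token)
        if match is None:
            raise ValueError(f"malformed step: {token!r}")
        name, iteration = match.group(1), match.group(2)
        steps.append((name, int(iteration) if iteration else None))
    return steps


def is_valid(steps, children):
    """True when every consecutive pair of steps is a declared parent/child edge."""
    return all(
        child[0] in children.get(parent[0], [])
        for parent, child in zip(steps, steps[1:])
    )


def to_xpath(steps):
    """Render steps as an XPath-style relative location path."""
    return "/".join(
        name if iteration is None else f"{name}[{iteration}]"
        for name, iteration in steps
    )


steps = parse_traversal("A.C.G.X.A(3).C.F")
print(is_valid(steps, CHILDREN))
print(to_xpath(steps))
```

Because the definition is a finite list of steps, following it always terminates even though the underlying schema contains a loop from “X” back to “A”.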
[0050] The aforementioned methodology for defining matrix definitions 150 results in the ability to represent virtually any DTD 100 and transform the DTD representation into a relational model that may be subsequently used to form structured documents 170 using a standard markup language such as XML. In one aspect, this method also facilitates the mapping of the structured document into a relational database by means of recursive keys. As will be described in greater detail hereinbelow, this feature allows for efficient storage and retrieval of information in a secure manner, while reducing disk space requirements needed to store the data schema and reducing system overhead in storing, retrieving, and maintaining information in the data schema.
[0051] FIG. 2 illustrates a system 172 for dynamically generating structured documents 170 using matrix representations 115. The system 172 comprises a plurality of modules that interact with one another to receive and create DTDs, create documents against those DTDs, insert, delete, and update data within the documents, and retrieve documents with their associated DTDs. In one aspect, the output documents created by the system 172 are presented in W3C standard format. The modules of the system 172 include a DTD input and create module 175, a DTD transformation module 180, a data element input module 182, and a structured document generation and presentation module 185. In one aspect, these modules 175, 180, 182, 185 represent software components that may be integrated into a wide variety of applications and hardware configurations designed to receive and process structured documents. In one exemplary application, the modules 175, 180, 182, 185 are integrated into a thin client architecture that generates structured documents 170 via DTD input received through a web browser interface, which is processed by the DTD input and create module 175. Using this thin client package, the software components necessary for producing the structured document 170 from a DTD representation reside on a server computer that is desirably accessible to one or more client computers through a networking connection. One advantage of the abovementioned thin client architecture is that the client computers need not contain any specialized software for generating the structured document 170. Instead, conventional web browsers may be used to interact with the structured document generation system. Furthermore, the DTD may be manually input or defined in a “live” manner without the use of a pre-existing input file. It will be appreciated by those of skill in the art that the thin client implementation of the system provides a number of advantageous features.
Some of the beneficial features of the thin client system include facilitated user interaction through the use of a common and familiar interface, reduced maintenance and upgrade requirements, and increased accessibility and portability compared to conventional fat client architectures. In a fat client architecture, the structured document generation system is designed as a standalone application that is installed locally on each computer that will be used to produce structured documents. While the structured document generation system can be readily integrated into such an architecture, the thin client approach is typically more appropriate in instances where the application will be in use by large numbers of users.
[0052] The structured document generation system 172 is configured to receive user-defined DTDs 100 using the DTD input and create module 175. This module 175 can be desirably configured to accept a wide variety of input DTD formats, and the format of the DTD need not adhere to any conventional standard. In one aspect, the format of the DTD maintains compliance with standards set forth by W3C XML specifications. In one embodiment, the DTD 100 is represented in a manner similar to that presented in FIG. 1, where a plurality of elements are defined using keywords, identifiers, and tags. The structure and format of the DTD 100 define various relationships between the elements 110 of the DTD 100, and the system 172 resolves these relationships to create the matrix representation 115. As previously indicated, the DTD may be predefined in a file format which is received by the DTD input and create module 175, or alternatively the DTD 100 may be input in a “live” manner via a user interface into the system 172. The user interface desirably provides functionality for allowing the user to define a DTD 100 in an environment where the user can edit and visualize the DTD 100 as it is being built.
[0053] In one aspect, the DTD input module 175 receives the DTD 100 and verifies the DTD structural validity. During this time, the DTD input and create module 175 recognizes and identifies the appropriate use of keywords, identifiers, and tags to verify that the structure of the DTD 100 meets syntactic and stylistic requirements imposed by the system 172. Should the DTD 100 fail to meet these requirements, the input module 175 may attempt to convert the DTD 100 into a structurally valid representation or, if unable to process the input DTD 100, output an error or notification signal indicating that the DTD 100 does not meet structural constraints imposed by the system.
[0054] When the DTD structural validity has been verified, the transformation module 180 performs a series of operations that transform the input DTD 100 into the matrix representation 115. As previously indicated, the matrix representation 115 is an internally recognized data schema that defines the elements 110 and relationships provided by the DTD 100. As will be described in greater detail hereinbelow, the transformation 180 represents the elements 110 and relationships of the DTD 100 in a matrix structure defined by a consolidated table set.
[0055] In one aspect, the DTD transformation 180 overcomes the limitations of conventional systems by creating a data schema that is represented by a consolidated table set wherein all elements and relationships in any DTD can be represented in a fixed table number. Use of a fixed table number aids in maintaining consistency in the data relationships, avoids increased data schema complexity resulting from the use of many tables, and provides a mechanism to define and resolve recursive and replicated relationships in a convenient and reliable manner. The use of a fixed table number additionally simplifies administration requirements for maintaining the matrix representation and data contained therein.
[0056] Upon conversion of the DTD 100 into a series of elements represented by the matrix representation 115 and defined by the consolidated table set, the elements of the DTD representation are populated with data using the data element input module 182. This module 182 receives user input, for example in the form of a defined file of data or via direct user input, to store information to be desirably represented within the DTD 100. The structured document generation and presentation module 185 subsequently accesses the stored information to generate documents in a markup language such as W3C standard XML format. The XML document can then be translated, as the user sees fit, via standard transformation languages such as XSL Transformations (XSLT) to produce output in any number of different formats. The generation and presentation module 185 recognizes the relationships defined in the consolidated table set and furthermore utilizes specialized matrix traversal operations, such as those defined by example in FIG. 1A, to access and present stored data contained in the matrix representation. In one aspect, the matrix traversal operations facilitate the resolution of recursive and repetitive data structures and preserve the syntactic and logical integrity of structured documents created by the system 172. In one aspect, the structured document generation and presentation module 185 comprises functionality for producing output structured documents that are stored and retrieved in the form of a file. Alternatively, the module 185 may generate structured documents that are directly output to a display screen or printer for use by the user.
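The generation step can be sketched with Python's standard `xml.etree.ElementTree` module. The sketch assumes, purely for illustration, that the populated data is available as a mapping from dot-notation paths to text values; the function `build_document` is hypothetical, and repetition subscripts such as A(3) are not handled.

```python
import xml.etree.ElementTree as ET


def build_document(values):
    """Build an XML tree from {dot path: text}; all paths must share one root."""
    root_name = next(iter(values)).split(".")[0]
    root = ET.Element(root_name)
    created = {root_name: root}  # path prefix -> element already created
    for path in sorted(values):
        parts = path.split(".")
        if parts[0] != root_name:
            raise ValueError(f"path {path!r} does not start at {root_name!r}")
        for depth in range(2, len(parts) + 1):
            prefix = ".".join(parts[:depth])
            if prefix not in created:
                parent = created[".".join(parts[: depth - 1])]
                created[prefix] = ET.SubElement(parent, parts[depth - 1])
        created[path].text = values[path]
    return root


document = build_document(
    {"A.C.F.X": "via F", "A.C.G.X": "via G", "A.B.D": "d"}
)
xml_text = ET.tostring(document, encoding="unicode")
print(xml_text)
```

The resulting text could then be passed to an XSLT processor or written to a file, as the paragraph above describes.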
[0057] FIG. 3 illustrates a consolidated table set 200 that may be used to define DTDs 100 of substantial complexity in a simplified manner. In one aspect, the consolidated table set 200 defines a plurality of relational tables that store information about the elements and relationships defined by the DTD 100. The relational tables comprise a DTD structure table 210, a DTD key table 230, a DTD attribute table 250, and a node naming table or XML table 270.
[0058] The DTD structure table 210 comprises a plurality of fields 201 that define the characteristics of the matrix representation 115 of the DTD 100. The characteristics of the matrix representation 115 include a node name 212, a child name 214, an iteration code 216, a sequence code 218, a next sequence code 220, and a first flag identifier 222. The structured document generation system 172 identifies the matrix representation 115 of the DTD 100 and, for each element, populates the appropriate fields of the DTD structure table 210. Likewise, the DTD key table 230 comprises a plurality of fields 201 that further define characteristics of the matrix representation 115 and include a node name 232, a node ID 234, and a times used identifier 236. The DTD attribute table 250 comprises a node ID field 252 and an attribute field 254. Finally, the XML table 270 comprises a key identifier 272, a data identifier 274, a sequence code 275, and a next sequence 276.
[0059] Each of the above-mentioned tables is utilized in conjunction with one another to fully describe each of the elements and relationships designated in the matrix representation 115 of the DTD 100. In one aspect, the structure table 210, the key table 230, and the attribute table 250 are interrelated by fields whose definitions correspond between two or more tables. For example, the structure table 210 and the key table 230 can be interrelated by the use of the identical field identifier, node name 212, 232. Similarly, the key table 230 and the attribute table 250 can be interrelated by the use of the identical field identifier, node ID 234, 252. Also, the child name 214 is related to the node name 212 as a foreign key from the structure table 210 back to itself, allowing for recursive traversal of the DTD structure. The XML table 270 contains the traversal definitions 150 and the associated data required for the path definition. Thus, information contained in each table 210, 230, 250 can be readily associated and related to specific nodes of the matrix representation 115 through the use of interrelated field identifiers. Furthermore, each node in the matrix representation 115 is uniquely defined by one or more of the identifiers, such as node name 212 and node ID 234, to associate information contained in the tables 210, 230, 250 with the various nodes of the matrix representation 115.
[0060] In one aspect, each table 210, 230, 250 is variably sized and grows and contracts as needed to accommodate DTDs of different sizes and complexities. Thus, highly complex DTDs can be represented with the same table complexity as simpler DTDs. This method of organizing data and relationships is beneficial in that it facilitates maintaining data schema integrity and reduces the likelihood that the data schema will be invalidated by improper or inappropriate data associations, which often result from attempts to transform highly complex DTDs into data schemas with many tables.
[0061] In conventional methods, the data relationships depicted in FIG. 1, such as node “X” being accessible (owned) by more than one other node, “F” and “G”, may only be accomplished by giving the owned node a unique name for each path, such as “X1”, “X2”, etc. This limitation of conventional systems returns the structure to a true hierarchical tree and is inefficient because many more definitions of the same data item “X” may be required, thus increasing the complexity of the tree structure. In the present invention, a mutually owned node such as shown for “X” in FIG. 1 need only be defined once. This results in a simplified data structure and increases the flexibility of developing complex DTD representations.
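By way of illustration only, the structure table and key table described above might be modeled as follows. This is a minimal sketch and not the disclosed implementation: the class names, the `owners_of` and `definitions_of` helpers, the use of `0` as an end-of-list marker, and the set of iteration code strings are all assumptions made for the example, and the attribute table and XML table are omitted for brevity. The sketch shows node “X” owned by both “F” and “G” while being defined only once in the key table.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class StructureRow:
    """One row of the DTD structure table 210 (field names are illustrative)."""
    node_name: str       # node name 212
    child_name: str      # child name 214
    iteration_code: str  # iteration code 216; "", "?", "*", "+" assumed here
    sequence: int        # sequence code 218
    next_sequence: int   # next sequence code 220; 0 assumed to mean "last"
    first_flag: bool     # first flag identifier 222


@dataclass
class KeyRow:
    """One row of the DTD key table 230 (field names are illustrative)."""
    node_name: str   # node name 232
    node_id: int     # node ID 234
    times_used: int  # times used identifier 236


@dataclass
class TableSet:
    """A fixed set of tables whose row counts grow with the DTD."""
    structure: List[StructureRow] = field(default_factory=list)
    keys: List[KeyRow] = field(default_factory=list)

    def owners_of(self, child_name: str) -> List[str]:
        # Every structure row naming this child identifies one owning node.
        return [r.node_name for r in self.structure if r.child_name == child_name]

    def definitions_of(self, node_name: str) -> List[KeyRow]:
        # A node is defined once in the key table however many owners it has.
        return [k for k in self.keys if k.node_name == node_name]


def example_tables() -> TableSet:
    """Hypothetical data: F and G both own X; X holds character data."""
    tables = TableSet()
    tables.structure += [
        StructureRow("F", "X", "", 1, 0, True),
        StructureRow("G", "X", "", 2, 0, True),
        StructureRow("X", "#PCDATA", "", 3, 0, True),
    ]
    tables.keys += [KeyRow("F", 1, 1), KeyRow("G", 2, 1), KeyRow("X", 3, 2)]
    return tables
```

In this sketch, adding a second owner for “X” adds one structure row and increments one counter; no “X1” or “X2” copies are needed.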
[0062] FIG. 4 is a schematic illustration of system modules that provide functionality for structured document generation. These modules include: (1) a create DTD module 282 for the data schema recognition, (2) a populate DTD module 284 which processes information representative of data to be desirably represented by the DTD, and (3) a generate structured document module 286 that presents the information represented by the DTD in a desired format. These modules 282, 284, 286 interact using various matrix operation methods, as described below, to accomplish necessary processing and manipulation of the matrix representation 115 and associated data.
[0063] The Create DTD module 282 further comprises a plurality of submodules including an edit input module 290, an update database module 292, and a determine next action module 294. The Create DTD module 282 implements the matrix representation 115 using the edit input module 290, where entries in the consolidated matrix tables 200 are populated based on input information. Information relating to each node of the matrix representation 115 is desirably input into the system 172 in an organized manner wherein the consolidated matrix tables 200 store the information for each node of the matrix representation 115 and further associate one or more relations with other nodes of the matrix representation 115. The nodes are input into the matrix tables 200 as a series of submissions or entries where the edit input module 290 extracts relevant information from the DTD 100 and the update database module 292 stores the extracted information in a database which represents the matrix tables 200. In one aspect, the first submission or entry is considered the name and/or the “root” of the “matrix” (corresponding to element “A” in the matrix representation 115 in FIG. 1). This designation of the root node as the first submission used to populate the matrix tables 200 serves as a reference point for subsequent document population and retrieval by the system 172. During rendering of the DTD, the edit input module 290 identifies information and attributes, including those to be associated with fields comprising the node names 212, child names 214, iteration codes 216, and node ID codes 234, that are identified within the DTD 100; their values, passed to the appropriate fields 201 of the matrix tables 200, define the schema of the matrix representation 115. This information is subsequently used by the update database module 292 to populate the matrix tables 200 and to store the information of the matrix representation 115 in the appropriate fields 201 of the matrix tables 200.
In one aspect, each value that is stored in the table 200 is checked for validity using a plurality of pre-defined DTD rules. Validity may be established by identifying a desirable syntax or structure of the input DTD 100. The syntactic or structural requirements of the DTD 100 may further be validated using definitions, guidelines, and rules specified by the World Wide Web Consortium (W3C).
[0064] FIG. 5 illustrates a process 300 by which each node in the matrix representation 115 is checked for validity. Each node corresponds to a separate element in the DTD 100 whose attributes and relationships are stored in the matrix tables 200 by the create DTD module 282 in the manner described in conjunction with FIG. 4. In one aspect, the DTD 100 is defined and encoded by validating the contents and relationships for each node in the matrix representation 115 as they are stored in the matrix tables 200.
[0065] The edit input module 290 verifies the logical construction of the DTD 100 using the validation process 300 shown in FIG. 5. The validation process 300 is executed for each node in the matrix representation 115, commencing with the root node. The validation process 300 identifies DTD structures that violate rules of logical construction and that would prevent the DTD 100 from being converted into the matrix representation 115. Using the validation process 300 in this manner ensures that the resulting matrix representation 115 that is defined in the matrix tables 200 will accurately reflect the elements and logical associations described by the DTD 100.
[0066] The validation process 300 commences by receiving information comprising the node, child nodes, associated iteration codes, and attributes of the matrix representation 115 to be validated. The process 300 then proceeds to make a determination 305 as to whether or not a first child exists in the matrix representation 115 for the root node. If the root node is determined to not contain any children, the process 300 proceeds to a terminal state 325 where control is passed to the update database module 292. Additionally, the process 300 may proceed to the terminal state 325 if the first child of the root node is blank or contains only character data lacking any further relationships. In either instance, the DTD 100 is determined to be a valid construction by the validation process 300, and therefore subsequent processing of the matrix representation 115 can occur without concern for logical discrepancies in the representation of the data matrix by the matrix tables 200. In one aspect, in the current node under analysis, the XML identifier “#PCDATA” is recognized by the edit input module 290 as comprising only character data contained within the node with no relationships to further children (thus resulting in termination of the validation process for the root node). This process 300 is then repeated for each unique node name in the DTD in a manner similar to that described above for the root node.
[0067] If the child determination state 305 identifies the presence of a child node, the validation process 300 proceeds to a name validation state 310. The name validation state 310 checks for proper construction of a name attribute that will be associated with the node name field 212, 232 of the matrix table 200. This process is repeated for each child node to ensure that all nodes in the matrix representation have a corresponding name that is properly formatted and distinguished. If name validation fails for any node in the matrix representation 115, the process proceeds to a terminal state 315 where an error condition is identified and returned. In instances where an error condition has occurred, the validation process 300 is halted and the DTD 100 is identified as failing to provide a valid construction usable in representing the matrix representation 115 and populating the matrix tables 200.
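The name check described above can be illustrated with a short sketch. The rule used here is an assumption: it accepts only an ASCII subset of the names permitted by the W3C XML 1.0 recommendation (a letter, underscore, or colon followed by letters, digits, periods, hyphens, underscores, or colons), which is narrower than the full standard. The function name `is_valid_node_name` and the treatment of “#PCDATA” and blank children as exempt from the check are hypothetical choices for the example.

```python
import re

# ASCII-only approximation of an XML name; the full W3C rule also admits
# many non-ASCII characters, which this sketch deliberately leaves out.
_NAME_PATTERN = re.compile(r"[A-Za-z_:][A-Za-z0-9_.:\-]*")


def is_valid_node_name(name: str) -> bool:
    """Return True when the name fits the simplified pattern above."""
    return _NAME_PATTERN.fullmatch(name) is not None


def first_invalid_name(node: str, children: list) -> str:
    """Return the first badly formed name, or "" when every name passes.

    Blank children and "#PCDATA" are skipped because they mark character
    data rather than a named child node (an assumption of this sketch).
    """
    for name in [node] + list(children):
        if name in ("", "#PCDATA"):
            continue
        if not is_valid_node_name(name):
            return name
    return ""
```

A caller would treat a non-empty return value as the error condition and halt validation for the DTD.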
[0068] If the name validation state 310 is passed for all nodes in the matrix representation 115, the process 300 proceeds to a loop identification state 315 where each node is tested against the currently building data structure to determine, based on its iteration code, whether an infinite loop will result. Although various legal iteration codes and associated structures can be described by the matrix representation 115 and corresponding matrix tables 200, an infinite loop renders the data structure indefinite. Infinite loops occur when there is a circular relationship within the matrix representation 115 and the iteration code for the nodes along the path, specified for example by the traversal definition, is designated such that an irresolvable or unending path would be defined. Because a DTD containing an infinite loop cannot properly be resolved, identification of such a condition is made during the loop identification state 315 to prevent the system from generating an unresolvable matrix representation. In the case of infinite loop identification, the process 300 proceeds to the terminal state 315 where an error condition is identified and returned.
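One possible way to detect such a condition is sketched below. The text does not enumerate the iteration codes, so this example assumes they follow the DTD occurrence indicators: a blank code or “+” means the child must appear at least once, while “?” and “*” allow the child to be absent. Under that assumption, a circular relationship is unresolvable only when every link in the circle is mandatory. The function name `find_required_cycle` and the tuple layout are hypothetical.

```python
def find_required_cycle(rows):
    """Return one cycle of mandatory links as a list of names, or None.

    rows: iterable of (node_name, child_name, iteration_code) tuples.
    Links whose code is "?" or "*" can be skipped by a document, so they
    are ignored; only "" and "+" links are followed (assumed semantics).
    """
    required = {}
    for node, child, code in rows:
        if child and child != "#PCDATA" and code in ("", "+"):
            required.setdefault(node, []).append(child)

    in_progress, finished = 1, 2
    state = {}

    def visit(name, path):
        state[name] = in_progress
        path.append(name)
        for child in required.get(name, []):
            if state.get(child) == in_progress:
                # The child is an ancestor on the current path: a cycle.
                return path[path.index(child):] + [child]
            if child not in state:
                found = visit(child, path)
                if found:
                    return found
        path.pop()
        state[name] = finished
        return None

    for name in list(required):
        if name not in state:
            found = visit(name, [])
            if found:
                return found
    return None
```

With this rule, a node that optionally contains itself (for example, a section that may hold zero or more sub-sections) is accepted, whereas two nodes that each require the other are rejected.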
[0069] If the input values for the DTD pass the aforementioned edit checks of name validation 310 and infinite loop identification 315, the process 300 proceeds to the terminal state 325 where the update database module 292 receives the validated information for further processing. As previously indicated, this process 300 is useful in establishing that the input DTD 100 can be properly represented by the matrix representation 115 without introducing nondeterministic or unresolvable logical relationships. For every unique node name desired within the DTD, the process 300 must be executed until a complete structure has been reached as depicted by the matrix representation 115.
[0070] When control is passed to the update database module 292, a request for rearrangement or database updating can then be made. As shown in FIG. 6, the update database module 292 comprises three modules including: an add new module 330, a delete module 332, and an update module 334. These modules 330, 332, 334 perform operations necessary to populate the matrix tables 200 with valid information that has been passed to the update database module 292 by the edit input module 290. Briefly described, the add new module 330 incorporates new entries or elements into the matrix tables 200 representative of nodes within the matrix representation 115. The delete module 332 removes elements from the matrix tables 200 representative of nodes within the matrix representation 115 that may be desirably removed from a current matrix. The update module 334 changes information, relationships, or entries within the existing matrix representation 115 to reflect desirable alterations in the DTD 100. As will be described in greater detail hereinbelow, the functionalities of these modules 330, 332, 334 are used in conjunction with various methods to generate the matrix representation 115 using the matrix tables 200 and to further update or modify the contents of the matrix tables 200 to reflect desired alterations in the DTD 100.
[0071] FIG. 7 illustrates a process 350 used for creating and updating the matrix tables 200 to reflect a desired matrix representation 115. This process 350 may be called by the update database module 292 after the edit input module 290 has validated the input data using the validation process 300.
[0072] The process 350 commences with the determination 355 of whether a matrix table set 200 exists for the current data elements. If the current call is the first call to the create DTD module 282, an initial entry is created in both the DTD structure table 210 and the DTD key table 230 using the node data as input to an add new entry function in state 357. As will be discussed in greater detail hereinbelow, the create DTD module 282 utilizes the add new module 330 to enter the data from the input node into the matrix tables 200 using the add new entry function. Entry of this data therefore creates a new matrix representation within the matrix tables 200 that may be subsequently populated with additional data and relations. Furthermore, the data newly entered by the add new module 330 (representative of the first entry for the DTD 100) represents the root node in the matrix representation 115.
[0073] If the input node data is not the first entry for the matrix representation 115 of the DTD 100 (i.e., a root node already exists), then the node data is considered to be child node information, and the associated information, attributes, and relationships are entered into the existing matrix tables 200. In processing the child node information, the process 350 identifies ID code information in state 360 that may be associated with the current child node being processed. If ID code information is identified, the process 350 proceeds to determine whether a DTD key match is present in state 365. In this state 365, the DTD key table 230 is checked for a previous description of the child node by attempting to match the ID code with one of those codes present in the DTD key table 230. If a matching ID code and DTDKEY are found in state 365, no change is made and the process 350 proceeds to the next child node, where identification of the ID code for the node is made in state 360.
[0074] When no ID code is found for the child node in state 360 and no DTDKEY match is identified in state 362, the child node information is entered into the existing matrix tables 200 in state 357 by the add entry function. If no ID code is found for the child node in state 360 but a DTDKEY match is identified in state 357, the process 350 proceeds to a state 370 where an update entry function is called to update the matrix tables 200 with the current node information. Alternatively, when an ID code for the child node has been identified in state 360 but no match is found between the ID code and the DTDKEY in state 365, the current node information is determined to not be present in the matrix tables 200 and the process proceeds to the update entry function in state 370.
[0075] Using the abovementioned process 350, traversal of all nodes within the matrix representation 115 is accomplished and the information contained in the nodes is entered into the matrix tables 200. In this process 350, the checkpointing operations of ID code identification and DTDKEY matching are desirably implemented to ensure that the matrix representation is accurately reflected in the matrix tables 200. Furthermore, these operations prevent existing node data from being overwritten or updated in an inappropriate manner.
[0076] FIG. 8 illustrates one embodiment of the add entry process 357 used during the matrix table creation and update process 350. This method 357 adds node entries comprising data and information into the DTD structure table 210 and key table 230 when a unique ID is encountered by the creation and update process 350. In this method 357, the information corresponding to the node entry is inserted into the DTD structure table 210, wherein information associated with the node name 212, the current child 214, the unique sequence code 218, and the iteration code 216 is updated for the particular node entry. In one aspect, a linked list data structure is created for each child node using the sequence 218 and next sequence 220 fields. This linked list structure is but one example of a suitable data structure that may be used to maintain siblings in the predefined order entered by the user. The use of the linked list structure also allows for rapid updates to the matrix representation since only the “links” or references of the linked list need to be moved to effect a desired change in order, rather than all of the records. These fields 218, 220 contain pointers or references to associated nodes by pointing to appropriate field information in the matrix tables 200. Furthermore, the node ordering can be accomplished using alternative data structures such as stacks, queues, hash tables, and the like. It will be appreciated that other data structures can also be implemented to perform the sibling maintenance functionality, and as such these additional implementations are conceived to be additional embodiments of the present invention.
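The sibling ordering described above can be sketched as a linked list stored in table rows. This is an illustration under stated assumptions rather than the disclosed implementation: rows are plain dictionaries with hypothetical keys, sequence codes are assigned as increasing integers starting at 1, and a next sequence value of 0 marks the last sibling (consistent with the end-of-list convention described later for the move down function).

```python
def append_child(rows, parent, child, iteration=""):
    """Add a child row at the end of the parent's sibling list.

    rows is a list of dicts standing in for the DTD structure table 210.
    Returns the sequence code given to the new row.
    """
    new_sequence = max((r["sequence"] for r in rows), default=0) + 1
    # The current tail is the sibling whose next sequence is 0.
    tail = next((r for r in rows if r["node"] == parent and r["next"] == 0), None)
    rows.append({
        "node": parent,            # node name 212
        "child": child,            # child name 214
        "iteration": iteration,    # iteration code 216
        "sequence": new_sequence,  # sequence code 218
        "next": 0,                 # next sequence code 220
        "first": tail is None,     # first flag 222: set for the first sibling
    })
    if tail is not None:
        tail["next"] = new_sequence  # only the link changes, not the record
    return new_sequence


def ordered_children(rows, parent):
    """Follow the links from the flagged first sibling to the last one."""
    by_sequence = {r["sequence"]: r for r in rows if r["node"] == parent}
    current = next((r for r in by_sequence.values() if r["first"]), None)
    names = []
    while current is not None:
        names.append(current["child"])
        current = by_sequence.get(current["next"])  # 0 is never a key
    return names
```

Because order is carried by the links, the physical position of a row in the table has no bearing on the order in which siblings are returned.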
[0077] The add entry method 357 commences in a state 400 where node information is received from the update database function 350. A determination is then made in state 405 as to whether the child is either blank or has the value of “#PCDATA” (indicating text-only information with no subsequent references or associations). If the child node does not contain a reference to further nodes, the fields of the DTD structure and attribute tables 210, 250 are updated with information from the current child node in state 415 and no entry is made in the DTD key table, indicating that no other child entries will be associated with the current child. Otherwise, when the child node information is found to contain valid reference information to other child nodes, the method 357 proceeds to a state 410 where the information from the current child node is updated and the subsequent child nodes are processed recursively by the method 300. When the update process for all child nodes has been completed, the method 357 proceeds to a state 420 that returns control to the caller (or the update database function in this embodiment).
[0078] FIG. 9 illustrates one embodiment of the update entry process 370 used to enter information in the matrix tables 200 of the matrix representation 115. Like other methods of the system 172, this process 370 incorporates a number of conditional data checks to determine the action that should be taken when transforming the DTD 100 into the matrix representation 115. The update entry process 370 commences in a state 440 where node information is received from the update database function 350. Conditional data checks used in this process 370 commence with a DTDKEY check 445 to determine if the current node is represented in the DTD key table 230. Should the DTDKEY check 445 positively identify the name in the DTD key table 230, the process 370 proceeds to an ID code check state 450 that determines if the current node possesses an ID code that matches an entry in the DTD key table 230. If the ID code is determined not to match in state 450, the process 370 proceeds to a new state 452 where the current child value is updated to reflect the new child value.
[0079] Returning to state 445, if the entry is not found or the entry is blank, but an ID code exists, as determined by state 455, then the times used field 236 in the DTD key table 230 corresponding to the current ID is checked in state 460. In this state 460, if the times used value is determined to be greater than “1”, the value is decremented in state 462 and the process 370 proceeds to a state 465 where the add new entry function is called on the current child if it is not blank. The results of this call create a new entry in both the DTD structure and the DTD key tables 210, 230.
[0080] Returning to state 460, if the times used field 236 is equal to “1”, then the process 370 proceeds to a state 467 where a delete entry function is called with the current ID value.
[0081] Returning to state 455, if an entry is found but there is no matching ID code present in the DTD key table 230, then the process 370 proceeds to state 469 where the times used value is incremented in the DTD key table 230 and the information contained in the child node is stored as a new record in the matrix tables.
[0082] FIG. 10 illustrates one embodiment of the delete entry process 332 used to remove node information from the matrix representation stored in the matrix tables 200. In one aspect, the delete entry function 332 receives node information in state 485 and proceeds to state 490 where the delete and update operations are performed. During the delete operation, the process 332 proceeds through a series of states to find all instances in the DTD structure where the node to be deleted exists for a given ID value and deletes all records for that node and all of the node's children recursively. The recursive removal of the node's children is performed in state 492 by identifying a record in the DTD key table 230 having the ID value to be deleted. For each ID match that is made in state 492, the function proceeds to state 494 where the times used value is identified. If the times used field 236 is equal to “1”, then the entry is deleted in state 496. Otherwise, the times used value is decremented in state 498.
[0083] If a record exists in the DTD structure table 210 that has a child name field 214 equal to the current node being deleted, then the sequence 218 and next sequence 220 values are updated accordingly. In the above-described instance where a linked list data structure is used in conjunction with the node definitions, the values contained in the sequence 218 and next sequence 220 fields are updated according to linked list rules for organization and the record is deleted.
[0084] The above-mentioned delete entry process 332 removes nodes using the value of each child until there are no more nodes to be deleted in the current recursive deletion sequence.
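The interplay between the times used value and recursive deletion can be illustrated as follows. This sketch makes several simplifying assumptions that are not stated in the text: the times used value is taken to count the number of owning nodes, the key table is reduced to a dictionary from node name to that count, the structure table is reduced to a list of (node, child) pairs, and the sequence and next sequence bookkeeping is left out. The function names are hypothetical.

```python
def release(structure, keys, name):
    """Drop one use of a node; delete it and its subtree on the last use.

    structure: list of (node_name, child_name) pairs, edited in place.
    keys: dict mapping node_name -> times used, edited in place.
    """
    if name not in keys:
        return  # character data, blank child, or already deleted
    if keys[name] > 1:
        keys[name] -= 1  # still owned elsewhere, so keep the definition
        return
    del keys[name]  # times used was 1: remove the key entry first
    children = [child for node, child in structure if node == name]
    structure[:] = [(n, c) for n, c in structure if n != name]
    for child in children:
        release(structure, keys, child)  # recurse into each child


def remove_link(structure, keys, parent, child):
    """Delete the parent-to-child record, then release the child."""
    if (parent, child) not in structure:
        return False
    structure.remove((parent, child))
    release(structure, keys, child)
    return True
```

Removing the key entry before recursing means that a circular relationship cannot cause the deletion to revisit a node it has already removed.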
[0085] As previously described, the update database module 292 may be called with a re-arrange request. In one embodiment, the re-arrange request serves to alter the relationships in the matrix representation. A re-arrange process 500 shown in FIG. 11 performs the operations necessary to modify the contents of the matrix tables 200 to accommodate changes to the matrix representation 115. The re-arrange function 500 performs a number of operations related to manipulation of the matrix tables 200 and may include a move up function 504, a move down function 506, and an insert between function 508. Each function 504, 506, 508 may further be called by the update database module 292 upon receiving the node information and a request in the form of a code or data sequence in state 502.
[0086] If a request is received to perform the move up function 504, the first flag 222 and the next sequence value 220 of the DTD structure table 210 corresponding to the values passed are updated. In one aspect, the passed values result in an updating of the linked list structure such that the node referenced by the passed values is moved up in relation to other nodes within the linked list represented by the DTD structure table 210. In a similar manner, other child nodes with dependencies or references to the current node referenced by the passed values are similarly moved up in the structure. Thus, when the move up function is called and the node operation is performed, the relationships between the current node and its corresponding child nodes are preserved to maintain consistency in the data schema.
[0087] If the move up command designates a node that is already the first node in the structure (i.e., the root node with a first sequence flag value of “1”), there will be no action taken to modify the linked list structure as the node cannot be moved up any further in the list. Otherwise, the normal operations associated with linked list programming techniques can be applied to update the next sequence value 220 to contain the appropriate values representative of the updated position of the node in the list or table.
[0088] If a request is received to perform the move down function 506, the first flag 222 and the next sequence value 220 of the DTD structure table 210 corresponding to the values passed are updated in a similar manner to that described above for the move up function 504. Using the move down function 506, the passed values result in the updating of the linked list structure such that the node referenced by the passed values is moved down in relation to other nodes within the linked list represented by the DTD structure table 210. In a similar manner, other child nodes with dependencies or references to the current node referenced by the passed values are similarly moved down in the list. If the move down command designates a node that is already the last node in the list (designated by a next sequence value 220 equal to “0”), there will be no action taken to modify the linked list structure as the node cannot be moved down any further in the structure. Otherwise, the normal operations associated with linked list programming techniques can be applied to update the next sequence value 220 to give the effect of moving the current node down in the structure.
[0089] If a request is received to perform the insert between process 508, the first flag 222 and the next sequence value 220 of the DTD structure table 210 corresponding to the values passed are updated to reflect the desired position where the node will be inserted within the matrix representation 115. If the next sequence value 220 is “0”, then the node is identified as the last node in the structure and no action is taken. Otherwise, the insert between function 508 inserts a new record into the DTD structure table 210 with the current node name 212 and the child name 214 set to blank. Additionally, the next sequence value 220 is appropriately set on this record and associated records to give the effect of inserting a new child into the structure. In one aspect, the newly inserted record becomes a placeholder whose values can be updated by later operations.
[0090] Taken together, the above-described operations provide necessary functionality to manipulate and organize the contents of the matrix tables 200. Although these operations have been described in the context of a recursive mode of organization, using linked lists to maintain order at any given level, it will be appreciated by one skilled in the art that other data structures may be used to perform similar functions to represent the matrix representation and re-arrange its contents as needed or desired.
[0091] FIG. 12 illustrates a process 525 used by the determine next action module 294 of the create DTD module 282. This process 525 functions to determine the next action taken by the system 172 after a function call, give control back to the appropriate caller, and ready the next request or operation to be performed. The method 525 commences in a scan DTD key table state 530 where the DTD key table is read through and the DTD structure table 210 is checked for records containing nodes that match each key (state 535). If no records are found, the process 525 terminates in state 540 and the node is returned to the caller with all children, associated iteration codes, IDs, and attributes. At this point, the caller is free to perform other operations with new nodes and input information, at which point process 300 will begin again. In the instance where all DTDKEY entries have corresponding nodes in the DTD structure table 210, the storage of the matrix representation 115 in the matrix tables 200 is completed in state 545, the original root node and its associated children, iteration codes, IDs, and attributes are returned to the caller for update processing, and the structure is considered complete.
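The scan performed by this process can be illustrated with a short sketch. It assumes, for the purpose of the example only, that the key table is available as an ordered list of node names and the structure table as a list of (node, child) pairs; the function name `next_undefined_node` is hypothetical. A returned name identifies a key that has no structure records yet and therefore still needs to be submitted; `None` indicates that every key has matching records and the structure is complete.

```python
def next_undefined_node(key_names, structure_rows):
    """Return the first key with no structure records, or None if complete.

    key_names: node names read from the key table, in scan order.
    structure_rows: iterable of (node_name, child_name) pairs.
    """
    defined = {node for node, _child in structure_rows}
    for name in key_names:
        if name not in defined:
            return name  # the caller would collect this node's children next
    return None  # every key is described: the matrix is fully stored
```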
[0092] The aforementioned functions, processes, and modules operate in a coordinated manner to generate a logical construction of the matrix representation 115 within the matrix tables. In one aspect, the logical construction represents only the nodes and relationships between the nodes defined in the matrix representation 115. Completion of these operations therefore provides the skeletal framework of the DTD and its logical constraints. As will be described in greater detail hereinbelow, other functions, processes, and modules of the system 172 are then desirably employed to populate the skeletal framework of the DTD 100 with data and information representative of specific information that is to be stored in the DTD 100. Furthermore, the populated DTD structure can then be used to generate structured documents which can be requested or returned to a user as needed or desired.
[0093] FIG. 13 illustrates exemplary functionality of the populate DTD module 284, which processes information representative of data represented by the DTD 100. In one aspect, this module 284 utilizes a dedicated process 550 to populate the DTD structure with markup language information such as XML data. The process 550 operates by constructing a path to the desired node with the child attribute 214 set to “#PCDATA”, and the attribute structure in the matrix table set 200 is reviewed to determine the allowable types of data. This process 550 desirably returns all paths to the caller and allows the caller to select a desired path to utilize when storing data within a node. To construct each path, the process 550 commences by receiving a node and path in state 555 (beginning with the root node and current path from the DTD structure table 210). The process 550 then proceeds to a state 560 where the ID code for the associated node is retrieved using the DTD key table 230. The ID code information for the node is then concatenated in a defined format in state 565 to build the path data or instructions used to traverse the matrix representation 115. In one aspect, the aforementioned dot notation may be used to symbolize the traversal order where the ID code is added to the current path value. The process 550 then calls each child recursively to determine if “#PCDATA” exists in the path to the node (state 575). If “#PCDATA” or another desired text identifier is encountered, the process 550 proceeds to a state 580 where the process terminates and the current path is returned to the caller. Otherwise, the process 550 determines in state 585 whether the node has been processed earlier. In one aspect, the identification of the node having been processed earlier may result from the structure being formed as a “matrix” and not as a typical hierarchical structure.
If it is determined that the node had been encountered previously, the process 550 proceeds to state 590 where the process 550 terminates and control is returned to the caller.
In state 585, when it is determined that the node has not been encountered before, the process 550 returns to state 555 where a new node is recursively retrieved giving the child as the node and passing the current path. Based upon the iteration code, the process 550 may return a given path to the caller and query if another iteration is desired. In the case of further iteration, the module continues processing the nodes in the manner described above.[0094]
In one aspect, the recursive calls and caller requests for selected information during iterations of the abovementioned process 550 may result in the return of a plurality of paths to the desired node. This allows the caller to select a particular path from the available paths and use it to store appropriate data, based upon the attributes of the node, into the XML structure 270 using the selected path as the key.[0095]
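By way of illustration only, the path construction of process 550 may be sketched in Python as follows. The dictionaries `DTD_STRUCTURE` and `DTD_KEYS` are hypothetical stand-ins for the DTD structure table 210 and the DTD key table 230, the sample element names and ID codes are invented for the example, and the "processed earlier" test is assumed here to apply to nodes already on the current path. This is a minimal sketch under those assumptions and not the disclosed implementation.

```python
from typing import Dict, FrozenSet, List

# Hypothetical stand-in for the DTD structure table 210: parent -> ordered children.
DTD_STRUCTURE: Dict[str, List[str]] = {
    "book": ["title", "chapter"],
    "title": ["#PCDATA"],
    "chapter": ["title", "section"],
    "section": ["para", "section"],  # a recursive element
    "para": ["#PCDATA"],
}

# Hypothetical stand-in for the DTD key table 230: node name -> ID code.
DTD_KEYS: Dict[str, int] = {"book": 1, "title": 2, "chapter": 3, "section": 4, "para": 5}


def collect_paths(node: str, path: str = "",
                  seen: FrozenSet[str] = frozenset()) -> List[str]:
    """Return every dot-notation path from `node` to a node holding #PCDATA."""
    if node in seen:
        # Node already processed on this path (states 585/590): stop here.
        return []
    # Retrieve the ID code and append it to the current path (states 560/565).
    code = str(DTD_KEYS[node])
    path = f"{path}.{code}" if path else code
    paths: List[str] = []
    for child in DTD_STRUCTURE.get(node, []):
        if child == "#PCDATA":
            # Text content reached: hand the current path back (state 580).
            paths.append(path)
        else:
            # Recurse with the child as the node and the current path (state 555).
            paths.extend(collect_paths(child, path, seen | {node}))
    return paths


paths = collect_paths("book")

# The caller picks one of the returned paths and uses it as the storage key.
xml_structure: Dict[str, str] = {}
xml_structure[paths[0]] = "An Example Title"
```

With the sample tables above, the call returns one path per reachable text node, and the recursive `section` element terminates instead of looping because it is already on the current path.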
FIG. 14 illustrates an alternative functionality of the populate DTD module 284. In this embodiment, node information can be updated by supplying a node definition in W3C standard XPATH notation. The populate DTD module 284 uses this information to select a matrix key 270 where appropriate and permits the selecting and updating of single objects within the matrix representation. As shown in FIG. 14, the process 591 begins in a state 592 where XPATH notational sequences or data is received and parsed to identify individual elements within the sequence. For each element in the XPATH data, the process retrieves the ID code in state 592 and concatenates the ID code in a defined format in state 593. If an iteration number exists, as determined in state 595, the iteration number is parsed and traversed in state 596. The resulting traversal follows the linked list of nodes until the node containing an ID code equivalent to the iteration is reached. This node is retrieved in state 597 and the correct sequence number is then concatenated to update the node definition. If the iteration number is determined not to exist in state 595, an error is returned in state 598 indicating that at least a portion of the structure is missing.[0096]
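By way of illustration only, one reading of process 591 may be sketched in Python as follows. The table `DTD_KEYS`, the `Iteration` linked list, the `ITERATIONS` lookup, and the `#` separator used to append a sequence number are all hypothetical and chosen for the example; only simple XPATH steps of the form `name` or `name[n]` are handled. The sketch treats the error of state 598 as arising when a requested iteration is not found in the stored structure, which is an assumption.

```python
import re
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical stand-in for the DTD key table 230: node name -> ID code.
DTD_KEYS: Dict[str, int] = {"book": 1, "title": 2, "chapter": 3}


@dataclass
class Iteration:
    """One stored iteration of a repeating node, linked to the next one."""
    number: int
    next: Optional["Iteration"] = None


# Hypothetical stored iterations: path key -> head of its linked list.
ITERATIONS: Dict[str, Iteration] = {
    "1.3": Iteration(1, Iteration(2)),  # two chapter iterations are stored
}

# A simple XPATH step: an element name with an optional [n] iteration number.
STEP = re.compile(r"^([A-Za-z_][\w.-]*)(?:\[(\d+)\])?$")


def xpath_to_key(xpath: str) -> str:
    """Translate a simple XPATH expression into a dot-notation matrix key."""
    key = ""
    for step in xpath.strip("/").split("/"):
        match = STEP.match(step)
        if match is None:
            raise ValueError(f"unsupported XPATH step: {step!r}")
        name, index = match.group(1), match.group(2)
        # Retrieve the ID code and concatenate it (states 592/593).
        code = str(DTD_KEYS[name])
        key = f"{key}.{code}" if key else code
        if index is not None:
            # Walk the linked list until the requested iteration is reached.
            wanted = int(index)
            node = ITERATIONS.get(key)
            while node is not None and node.number != wanted:
                node = node.next
            if node is None:
                # Part of the stored structure is missing (state 598).
                raise LookupError(f"iteration {wanted} missing at {key}")
            # Append the sequence number to the key (state 597).
            key = f"{key}#{node.number}"
    return key


key = xpath_to_key("/book/chapter[2]/title")
```

In this sketch a request for `/book/chapter[3]/title` raises `LookupError`, because only two chapter iterations exist in the sample linked list.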
FIG. 15 illustrates exemplary functionality of the generate document module 286 using a document generation process 600 which creates structured documents using the matrix representation stored by the matrix tables 200 and contains data represented by the DTD 100. In order to generate a document from the DTD/XML structure, one or more paths are constructed to the desired child node set to “#PCDATA” (or other text identifier) and the attribute structure is identified to determine the types of data that are to be generated. The process 600 desirably returns data associated with the paths to the caller of the generation process 600 to provide a means to construct the entire document using the data access paths.[0097]
The process 600 commences by inputting a selected node from the DTD structure and current path in state 605. Typically, the first input node comprises the root node and the process 600 proceeds to a state 610 where the ID code for the Node is retrieved from the DTD key table 230. The ID code information for the node is then concatenated in a defined format in state 615 to build the path data or instructions used to traverse the matrix representation 115. In one aspect, the aforementioned dot notation may be used to symbolize the traversal order where the ID code is added to the current path value. In state 620, each child is retrieved by fetching all records from the DTD table that match the selected Node. The linked list structure 500 is traversed to maintain order among siblings. For each child, the linked list structure in the XML structure 270 is also traversed for any repeating iterations of a node to maintain proper data ordering. The process 600 subsequently performs a recursive retrieval in state 625 giving the child as the node and passing the current path. Based upon the iteration code in state 635, the process 600 may retrieve multiple iterations of the path in state 640. When “#PCDATA” is encountered in state 645, the process 600 retrieves the data from the XML structure 270 and returns to the caller in state 650. In the case of further iteration, the module continues processing the Nodes in the manner described above. Based on the recursive calls and defined iterations, a complete document may be generated by the method.[0098]
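By way of illustration only, the recursive generation of process 600 may be sketched in Python as follows. The dictionaries `DTD_STRUCTURE`, `DTD_KEYS`, and `XML_DATA` are hypothetical stand-ins for the DTD structure table 210, the DTD key table 230, and the XML structure 270, and the sample element names and values are invented for the example. Ordinary Python lists are assumed in place of the linked lists that preserve sibling and iteration order, repeating iterations are only modelled for text-bearing nodes, and the sample structure is assumed to be acyclic.

```python
from typing import Dict, List
from xml.sax.saxutils import escape

# Hypothetical stand-in for the DTD structure table 210: parent -> ordered children.
DTD_STRUCTURE: Dict[str, List[str]] = {
    "note": ["to", "body"],
    "to": ["#PCDATA"],
    "body": ["#PCDATA"],
}

# Hypothetical stand-in for the DTD key table 230: node name -> ID code.
DTD_KEYS: Dict[str, int] = {"note": 1, "to": 2, "body": 3}

# Hypothetical stand-in for the XML structure 270: path key -> one text value
# per stored iteration, kept in document order.
XML_DATA: Dict[str, List[str]] = {
    "1.2": ["Alice", "Bob"],
    "1.3": ["Meeting at noon"],
}


def generate(node: str, path: str = "") -> str:
    """Emit the markup for `node` and everything beneath it."""
    # Retrieve the ID code and extend the current path (states 610/615).
    code = str(DTD_KEYS[node])
    path = f"{path}.{code}" if path else code
    children = DTD_STRUCTURE.get(node, [])  # fetch the children (state 620)
    if "#PCDATA" in children:
        # Text node reached: emit one element per stored iteration (states 640/645).
        return "".join(
            f"<{node}>{escape(text)}</{node}>" for text in XML_DATA.get(path, [])
        )
    # Recurse with each child as the node, passing the current path (state 625).
    inner = "".join(generate(child, path) for child in children)
    return f"<{node}>{inner}</{node}>"


document = generate("note")
```

Starting from the root node, the sketch emits the `to` element twice, once per stored iteration, followed by a single `body` element, all wrapped in the root `note` element.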
The invention as described herein may be used in conjunction with most data or document preparation and presentation systems. As such, this invention may be adapted to applications which include: connection and integration of existing business database systems to permit complex data representations to be created and represented; document and knowledge management through the use of XML as a document representation language; and development of XML applications using dedicated platform applications for creating the data representations in the database. Furthermore, the system 172 is adaptable for use with and on intranets, the Internet, and extranets through standard browser-based applications.[0099]
Although the invention has been described and pictured in a preferred form with a certain degree of particularity, it is understood that the present disclosure of the preferred form has been made only by way of example, and that numerous changes in the details of construction and combination and arrangement of parts may be made without departing from the spirit and scope of the invention as hereinafter claimed.[0100]
For example, the technology disclosed herein may be developed and deployed on Internet Standard and Open Source code and can be implemented on a variety of hardware and software platforms such as LINUX or Microsoft NT operating systems running on Intel CISC-based servers. The software may also be implemented on UNIX operating systems running on a variety of RISC-based computers from vendors such as Hewlett Packard (HP) and Sun Microsystems (SUN), for example, without significant modification to existing source code. In addition, the software is adaptable to Oracle's Oracle Application Server web server software and the Open Source Apache web server. Furthermore, the system 172 can be implemented on other Open Source and proprietary web servers such as Microsoft's IIS with the addition of custom PERL CGI scripts to send and retrieve data between the database and the web server. In a particular embodiment, the software implementing the functions according to the present invention can be programmed using Oracle's PL/SQL programming language. This code is desirably stored as procedures in a physical Oracle database. The use of JAVA, C++ or other programming languages permits other physical databases to be used, such as IBM's DB2, Sybase or Microsoft SQL Server. Although this invention is designed around the relational database model, it can be mapped to any model that does not restrict the user to only hierarchical structures. This allows the table structures described herein to be implemented using various database models, including but not limited to flat files and object-oriented relational database models.[0101]
As previously discussed, one significant benefit of the structural document generation system is the use of the reduced database table configuration. This database configuration incorporates a fixed table number to encode and store the transformed DTD structure represented by the matrix representation as well as the information which is used to populate the encoded DTD. Since the table number is fixed, the database becomes highly scalable and can be used to represent both simple and highly complex DTDs with substantially less administrative overhead as compared to the prior art. Conventional systems for DTD representation are limited by undesirable table complexity (i.e., more tables that must be relationally interconnected) which increases significantly as the complexity of the DTD increases. Therefore, the present invention can be used to accommodate many different types or classes of DTDs with more efficiency compared to the prior art.[0102]
Additionally, the present invention addresses the need for a DTD definition scheme that can be used to resolve complex relationships that may not be supported by conventional systems. For example, recursive, repetitive, and multiple dependent elements may be readily defined using the DTD matrix transformation. Furthermore, the deterministic traversal paths or matrix traversals are used to resolve potential non-determinism in a DTD representation, adding flexibility and convenience to defining templates for structured documents.[0103]
A further feature of the present invention is that the matrix representation methods described herein preserve the structural characteristics of the DTD. For example, if a hierarchical ordering of elements or information is presented in the input DTD, the matrix transformation processes may preserve the underlying hierarchical order which is reflected in the output structured document. This is also true for non-hierarchical orderings of the input DTD that are likewise preserved when output as a structured document.[0104]
Although the foregoing description of the invention has shown, described and pointed out novel features of the invention, it will be understood that various omissions, substitutions, and changes in the form of the detail of the apparatus as illustrated, as well as the uses thereof, may be made by those skilled in the art without departing from the spirit of the present invention. Consequently, the scope of the invention should not be limited to the foregoing discussion but should be defined by the following claims.[0105]