JP2010186412A


Info

Publication number: JP2010186412A
Application number: JP2009031450A
Authority: JP
Inventors: Hideo Munechika; 秀生宗近; Kiyoaki Tamura; 清朗田村
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2009-02-13
Filing date: 2009-02-13
Publication date: 2010-08-26

Abstract

[Problem] To provide a structured document management method and apparatus that parse documents with reduced memory usage even when the processing logic of the application changes.
[Solution] A state transition table 119 is added to the XML analysis processing unit 103 for structured documents. The table stores the pre-analysis result that a pre-analysis unit 116 produces by analyzing the parent-child, sibling, and attribute relationship patterns of the structural units of a structured document. Also added are a state transition table search unit 117, which moves through the state transition table 119 according to the pre-analysis result, and a memory-saving element node generation unit 118, which generates element nodes with reduced memory usage at each state transition. Because the structured documents input to an application often share the same parent-child, sibling, and attribute relationships, state transitions along the state transition table 119 succeed with high probability. As a result, memory usage can be reduced compared with conventional structured document management apparatuses.
[Selected drawing] Figure 1

Description

Translated from Japanese

The present invention relates to a document management method and a document management apparatus, and in particular to a method and apparatus that can reduce the memory used when parsing structured documents.

As prior art for a structured document management apparatus that can reduce memory usage, the technique described in Patent Document 1, for example, is known. That technique applies a data structure conversion to the structured document being parsed, merging the elements that the application does not use as keys into a single element, and thereby reduces the memory consumed by those non-key elements.

Patent Document 1: JP 2008-217809 A

The prior-art approach reduces memory usage for the elements that the application does not use as keys, so it is tied to the processing logic of the application. If that logic changes, for example so that a formerly non-key element is now treated as a key element, the degree of memory reduction changes as well. Whenever the application logic has to be modified, memory usage is therefore affected even if the structure of the structured documents is left completely unchanged.

An object of the present invention is to solve this problem of the prior art and to provide a document management method and management apparatus for structured documents that reduce memory usage during parsing while making the degree of reduction less sensitive to changes in the processing logic of the application.

To achieve this object, the present invention adds to the structured document management apparatus a function that keeps the degree of memory reduction about the same, even when the processing logic of the application changes, as long as the structure of the structured documents does not change.

Specifically, the present invention is a document management method that parses input structured documents. The structured document analysis processing unit of the document management apparatus has pre-analysis processing means and state transition table search means for searching the state transition table that the pre-analysis processing means generates. Before parsing begins, the pre-analysis processing means analyzes the parent-child and sibling relationship patterns of the structural units of an input structured document and generates a state transition table that stores those relationships. While parsing an input document, the analysis processing unit consults the state transition table through the state transition table search means and, when asked to process a character string that is already stored in the table, returns the analysis result stored in the table.

According to the present invention, a state transition table is created once, before parsing, from a representative structured document whose parent-child, sibling, and attribute relationships among elements are the same as those of the structured documents handled by the business system. Consulting this table during parsing reduces the memory used by the parsing process. In addition, the degree of memory reduction becomes less sensitive to changes in the processing logic of the application.

FIG. 1 is a diagram outlining the functional configuration of an XML document management apparatus according to one embodiment of the present invention and the flow of its input and output data.
FIG. 2 is a diagram explaining the document structure of an XML document and the in-memory data structure after parsing.
FIG. 3 is a diagram showing the node tree of FIG. 2 in detail.
FIG. 4 is a diagram explaining examples of nodes whose memory usage is reduced by the embodiment of the present invention.
FIG. 5 is a diagram showing the structure of the state transition table that the pre-analysis unit of FIG. 1 creates from a representative XML document before XML parsing.
FIG. 6 is a flowchart explaining how the pre-analysis unit creates the state transition table.
FIG. 7 is a flowchart explaining the XML document analysis processing in the embodiment of the present invention.

Embodiments of the document management method and management apparatus for structured documents according to the present invention are described in detail below with reference to the drawings. The embodiments handle XML documents, XML being one of the languages for expressing structured documents.

FIG. 1 is a diagram outlining the functional configuration of an XML document management apparatus according to one embodiment of the present invention and the flow of its input and output data.

The XML document management apparatus according to the embodiment is built on a computer system in which the following are connected to a bus 105: a main storage device 102; an auxiliary storage device 106 such as an HDD; a processor 104, which is a CPU that executes the processing of the program-implemented functional units provided for the present invention; and input devices such as a keyboard and mouse and output devices such as a display and printer (not shown). The auxiliary storage device 106 stores the XML document 107 to be analyzed in the embodiment. The main storage device 102 holds the XML analysis processing unit 103, which analyzes documents as part of document management, the application 115, and the state transition table 119 generated by the XML analysis processing unit 103.

The XML analysis processing unit 103 comprises the following functional units: an XML parsing main unit 108, a state transition table search unit 117, an element start tag analysis unit 109, an element character check unit 110, an attribute analysis and node generation unit 111, an element node generation unit 112, a content analysis and text node generation unit 113, an element end tag analysis unit 114, a pre-analysis unit 116, and a memory-saving element node generation unit 118. Each of these functional units is implemented as a program, and its function is realized when the processor 104 executes it.

The XML document 107 stored in the auxiliary storage device 106 is input to the XML analysis processing unit 103, which parses it. An XML document is a kind of structured document with a hierarchical structure whose basic unit is the "element". Elements can have parent-child relationships, and information called "attributes" can be attached to an element. An element can also have other elements and text as its children. The elements and text that are children of a given element are usually referred to collectively as the "content" of that element.

When a general XML analysis processing unit 103 parses an element, the basic unit of an XML document, it starts from the XML parsing main unit 108 and calls, in order, the element start tag analysis unit 109, the element character check unit 110, the attribute analysis and node generation unit 111, the element node generation unit 112, the content analysis and text node generation unit 113, and the element end tag analysis unit 114. A node is the unit in which an element, attribute, or piece of text recognized by parsing the XML document is held in memory. Through parsing, the elements, attributes, and text of an XML document are held in memory as element nodes, attribute nodes, and text nodes, respectively. The XML analysis processing unit 103 passes the generated nodes to the application 115.

The element start tag analysis unit 109 and the element end tag analysis unit 114 perform lexical analysis of the start tag and end tag of an element. Lexical analysis is the process of breaking the character strings in an XML document into the characters that represent structure (called markup), such as "<" and ">", and everything else. The element character check unit 110 checks whether the characters contained in an element conform to the characters defined in the XML specification. These processes consume a relatively large amount of time.
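As a rough illustration of the lexical analysis described above, the following Python sketch splits a string into markup and non-markup pieces. The pattern `TOKEN_RE` and the function `tokenize` are hypothetical names introduced here, not part of the patent, and the pattern deliberately ignores comments, CDATA sections, and ">" inside attribute values.

```python
import re

# Hypothetical, simplified pattern: a tag runs from "<" to the next ">",
# and everything up to the next "<" counts as text.
TOKEN_RE = re.compile(r"<[^<>]*>|[^<]+")


def tokenize(xml_text):
    """Split XML text into ("markup", s) and ("text", s) pieces."""
    tokens = []
    for match in TOKEN_RE.finditer(xml_text):
        piece = match.group(0)
        kind = "markup" if piece.startswith("<") else "text"
        tokens.append((kind, piece))
    return tokens


if __name__ == "__main__":
    for kind, piece in tokenize("<コード>001</コード>"):
        print(kind, piece)
```

A real parser would also validate every character against the XML specification, which is the separate job attributed to the element character check unit 110.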

The embodiment adds the pre-analysis unit 116, the state transition table search unit 117, and the memory-saving element node generation unit 118 to the general XML analysis processing unit configuration described above, and adds the state transition table 119 in the main storage device 102.

FIG. 2 is a diagram explaining the document structure of an XML document and the in-memory data structure after parsing; it is described next.

FIG. 2(a) illustrates elements in an XML document and the nodes generated from them. An element begins with a start tag 201 and ends with an end tag 202. Content may appear between the start tag and the end tag. Content may be text only, like content 203 ("001"), or, like content 204, may include an element 205 (the element whose start tag 201 is <コード> ("code"), whose content 203 is "001", and whose end tag is </コード>). In this example, element 205 contains only text, while element 206 (start tag <仕入品> ("purchased item"), end tag </仕入品>) contains both elements and text. An element can also carry an attribute 207.

In general, an application that processes XML documents in a business system receives hundreds to tens of thousands of XML documents, each containing multiple elements like those shown in FIG. 2(a). The XML analysis processing unit 103 described with FIG. 1 parses these XML documents. The parsing result is built in memory as a node tree 208 like the one shown in FIG. 2(b). The node tree 208 in FIG. 2(b) is an example of the node tree corresponding to element 206 of the XML document in FIG. 2(a).

FIG. 3 is a diagram showing the node tree of FIG. 2(b) in detail.

As explained with FIG. 2, elements have parent-child, sibling, and element-attribute relationships, and the corresponding relationships between nodes must be represented in memory. "References" are generally used for this purpose. Examples of reference implementations include references in the Java (registered trademark) language and pointers in the C language. A reference often occupies 32 or 64 bits of memory, depending on the bus width of the processor.

The element node 301 for "仕入品" (purchased item), the top-level element node of the node group in FIG. 3, has no attributes and no sibling nodes, so its references 302, 303, and 304, which represent the attribute and sibling relationships, point to null. The "商品名" (product name) element node 305 has a child element node 306 and an attribute node 307, and references them. The attribute-node reference of the "コード" (code) element node 308 is null, and the child-element-node and attribute-node references of the "備考" (remarks) element node 309 are null. Normally, a memory area (32 or 64 bits, for example) must be reserved for a reference even when it refers to null. The embodiment reduces memory usage by not reserving references to null.
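To give a sense of scale, the short Python sketch below adds up only the null references that the description of FIG. 3 names explicitly and converts them to bytes for 32-bit and 64-bit references. The dictionary `NULL_REFERENCES` and the function `wasted_bytes` are illustrative names; the figure may contain further null references that the text does not mention.

```python
# Null references named in the description of FIG. 3 (node -> count).
NULL_REFERENCES = {
    "仕入品 (301)": 3,  # references 302, 303, 304
    "コード (308)": 1,  # attribute reference
    "備考 (309)": 2,    # child element and attribute references
}


def wasted_bytes(null_counts, reference_bits):
    """Bytes reserved for references that only ever point to null."""
    return sum(null_counts.values()) * reference_bits // 8


if __name__ == "__main__":
    for bits in (32, 64):
        print(bits, "bit references:", wasted_bytes(NULL_REFERENCES, bits), "bytes")
```

For a single small document the amount is tiny; it matters because the same overhead recurs in every node of every document that the application parses.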

FIG. 4 is a diagram explaining examples of nodes whose memory usage is reduced by the embodiment of the present invention.

In the embodiment, as shown in table 401 of FIG. 4(b), element nodes are classified into eight types, type1 to type8, according to the combination of whether the node has sibling element nodes, whether it has child element nodes, and whether it has attribute nodes. With this classification, a node for an element that lacks sibling element nodes, child element nodes, or attribute nodes no longer needs to hold the corresponding null references, so element nodes with reduced memory usage (called memory-saving element nodes) can be generated.
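One way to picture the eight node types is as eight classes that each declare only the reference fields they actually need. The Python sketch below does this with `__slots__`. It is an illustration under assumptions, not the patent's implementation: the class names, the helper `new_node`, and the keying by a (sibling, child, attribute) flag tuple are invented here, and the mapping of each combination to the labels type1 to type8 in table 401 is not reproduced because the text does not spell it out.

```python
from itertools import product

REFERENCE_FIELDS = ("sibling", "child", "attribute")


def make_node_class(has_sibling, has_child, has_attribute):
    """Build a node class whose slots contain only the references it needs."""
    flags = (has_sibling, has_child, has_attribute)
    slots = ("name",) + tuple(
        field for field, present in zip(REFERENCE_FIELDS, flags) if present
    )
    suffix = "".join("1" if present else "0" for present in flags)
    return type("ElementNode_" + suffix, (object,), {"__slots__": slots})


# Eight classes, one per combination of the three yes/no properties.
NODE_CLASSES = {
    flags: make_node_class(*flags) for flags in product((False, True), repeat=3)
}


def new_node(name, sibling=None, child=None, attribute=None):
    """Create a node of the class that has no slot for any missing reference."""
    values = (sibling, child, attribute)
    node = NODE_CLASSES[tuple(value is not None for value in values)]()
    node.name = name
    for field, value in zip(REFERENCE_FIELDS, values):
        if value is not None:
            setattr(node, field, value)
    return node


if __name__ == "__main__":
    leaf = new_node("備考")
    print(type(leaf).__name__, leaf.__slots__)
```

Note that `new_node` is told up front which references exist. A parser reading a start tag does not yet know whether children or siblings will follow, so it needs some other source for that information.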

FIG. 4(a) shows the node tree of FIG. 3 after each node has been classified in this way. For example, the "備考" (remarks) node 402 is represented by a type5 node, which has no child elements and no attributes, so it saves the memory of two null references compared with the "備考" node 309 described with FIG. 3. To generate memory-saving element nodes, the element node generation unit must create the appropriate one of the eight node types according to whether the element has siblings, children, and attributes. However, simply inserting this selection into a conventional analysis processing unit adds processing steps and degrades performance relative to the conventional XML analysis processing unit. The embodiment therefore creates the state transition table 119 in advance and uses it to generate element nodes quickly.

FIG. 5 shows the structure of the state transition table that the pre-analysis unit 116 of FIG. 1 creates from a representative XML document before XML parsing. A representative XML document here means an XML document whose parent-child, sibling, and attribute relationships among elements are the same as those of the XML documents the application handles; the XML document described with FIG. 2 is used as the example. How the table of FIG. 5 is created is described later; its meaning is explained first. The table is created in advance by the pre-analysis unit 116 from a document of the kind to be parsed, which is input to that unit. Actual parsing starts after the pre-analysis, when a document to be parsed is input to the XML parsing main unit 108. During parsing, the state transition table of FIG. 5 is used as follows.

In the initial state S (501) of parsing, when the element start tag 502 "<仕入品" (purchased item) is input, the parser transitions to state 1 (503) and generates the type3 element node 404 described with FIG. 4 as a memory-saving element node. In state 1 (503), when the tag-terminating character ">" 504 is input, it transitions to state 2 (505) and generates the type7 element node 403 described with FIG. 4 as a memory-saving element node. Next, when the element start tag 506 "<コード" (code) is input, it transitions to state 3 (507). In state 3 (507), when the tag-terminating character ">" followed by arbitrary text 508 is input, it transitions to state 4 (509). In state 4 (509), when the element end tag 510 "</コード>" is input, it transitions to state 5 (511). In state 5 (511), when the element start tag 512 "<商品名" (product name) is input, it transitions to state 6 (513) and generates the type8 element node 405 described with FIG. 4 as a memory-saving element node. In state 6 (513), when an arbitrary sequence of attributes and arbitrary text 514 is input, it transitions to state 7 (515). In state 7 (515), when the element end tag 516 "</商品名>" is input, it transitions to state 8 (517). In state 8 (517), when the empty-element tag 518 "<備考/>" (remarks) is input, it transitions to state 9 (519) and generates the type5 element node 402 described with FIG. 4 as a memory-saving element node. In state 9 (519), when the element end tag 520 "</仕入品>" is input, it transitions to state 10 (521).

In the state transition diagram of FIG. 5 described above, whether a transition is drawn horizontally or vertically has no particular significance.

As described above, if the character string cut out of the XML document during analysis matches the transition string written on an arrow of the state transition table, the state advances; if it does not match, use of the state transition table ends. When the strings match, the processing of the element start tag analysis unit 109 and the element character check unit 110 in FIG. 1 becomes unnecessary, because that processing was already done when the state transition table was created and the necessary data is held in the table as transition strings. Therefore, as long as transitions keep succeeding, that is, as long as input strings keep matching transition strings, parsing proceeds quickly by skipping specific steps of conventional parsing while generating memory-saving element nodes that use little memory.
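The walk through FIG. 5 and the matching rule can be sketched as a dictionary lookup. In the Python sketch below, the state names and node types follow the description above, while the wildcard markers `ANY_TEXT` and `ANY_ATTRS`, the dictionary layout, and the function `walk` are assumptions made for illustration. The remarks tag is written as the empty-element tag `<備考/>`, and the caller is assumed to have already reduced concrete text and attributes to the wildcard markers.

```python
# Hypothetical markers for the wildcard labels on the arrows of FIG. 5.
ANY_TEXT = "{text}"
ANY_ATTRS = "{attrs}"

# (current state, transition string) -> (next state, node type generated or None)
TRANSITIONS = {
    ("S", "<仕入品"): ("1", "type3"),
    ("1", ">"): ("2", "type7"),
    ("2", "<コード"): ("3", None),
    ("3", ">" + ANY_TEXT): ("4", None),
    ("4", "</コード>"): ("5", None),
    ("5", "<商品名"): ("6", "type8"),
    ("6", ANY_ATTRS + ANY_TEXT): ("7", None),
    ("7", "</商品名>"): ("8", None),
    ("8", "<備考/>"): ("9", "type5"),
    ("9", "</仕入品>"): ("10", None),
}

FIG5_INPUT = [
    "<仕入品", ">", "<コード", ">" + ANY_TEXT, "</コード>",
    "<商品名", ANY_ATTRS + ANY_TEXT, "</商品名>", "<備考/>", "</仕入品>",
]


def walk(labels, start="S"):
    """Follow the table; stop using it at the first label that does not match."""
    state = start
    node_types = []
    for label in labels:
        hit = TRANSITIONS.get((state, label))
        if hit is None:
            return state, node_types, False
        state, node_type = hit
        if node_type is not None:
            node_types.append(node_type)
    return state, node_types, True


if __name__ == "__main__":
    print(walk(FIG5_INPUT))
```

Each successful lookup stands in for the tag analysis and character checks that were already performed when the table was built, which is where the speed-up described in the text would come from.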

When a transition in the state transition table fails, that is, when the input string does not match a transition string, the element start tag analysis unit 109, the element character check unit 110, and the element node generation unit 112 of FIG. 1 must be executed, just as in ordinary conventional parsing. Frequent transition failures therefore carry a penalty: parsing performance degrades and memory usage is the same as in the prior art. However, as explained above, the present invention assumes that the structured documents input to an application resemble one another, so the penalty occurs very rarely.

FIG. 6 is a flowchart explaining the processing by which the pre-analysis unit 116 creates the state transition table; it is described next.

(1) First, an initial state is created and registered in the state transition table. That is, the initial state S (501) shown in FIG. 5 is created and registered (step 601).

(2) Next, an input character string is read from the beginning of the representative XML document, whose parent-child, sibling, and attribute relationships among elements are the same as those of the XML documents the application handles, and the type of the input string is determined. The type is one of the following: 1. element start tag (begins with "<"), 2. element end tag (begins with "</"), 3. text (begins with anything other than "<") (step 602).

(3) If the type determined in step 602 is an element start tag or an element end tag, it is determined whether the destination state that the input string points to from the current state (at first, the initial state) already exists in the transition table (step 603).

(4) If step 603 finds that the destination state already exists in the transition table, the process transitions to the existing state without creating a new one (604). If it does not exist, a new destination state is created with the input start or end tag as its transition string (steps 604 to 606).

(5) If the type determined in step 602 is an element start tag, the next character string may be an attribute. Information about attributes is not held in the state transition table, so attributes are simply skipped. The tag-terminating character ">" then follows, so a destination state is created and ">" is set as its transition string (steps 607 and 608).

(6) If the type determined in step 602 is text, it is determined whether the destination state that "arbitrary text" points to from the current state already exists. If it does not, a new state is created as the destination, with a special symbol representing "arbitrary text" set as its transition string (steps 609 to 611).

(7) After steps 607, 608, and 611, or when step 609 finds that the destination state for "arbitrary text" already exists, it is determined whether the processing so far has been carried out until all elements have been read. If not, the process returns to step 602 and repeats; otherwise, the processing ends (step 612).
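The flow of FIG. 6 can be sketched in Python as follows. This is a simplified reading of the steps, not the patent's implementation: integer state ids, the `ANY_TEXT` marker, the regular expression `TAG_RE`, the handling of empty-element tags with a "/>" label, and the skipping of whitespace-only text are all assumptions, and the sketch does not record which node type each state generates. Its state numbering therefore will not match FIG. 5 exactly.

```python
import re

ANY_TEXT = "{text}"  # hypothetical symbol standing for "arbitrary text"

# Groups: closing slash, tag name, attributes, empty-element slash, or text.
TAG_RE = re.compile(r"<(/?)([^\s<>/]+)([^<>]*?)(/?)>|([^<]+)")


def build_table(xml_text):
    """Build {(state, transition string): next state} from one sample document."""
    table = {}
    state = 0      # step 601: the initial state
    next_id = 1

    def advance(current, label):
        nonlocal next_id
        key = (current, label)
        if key not in table:       # steps 603 and 609: destination missing?
            table[key] = next_id   # create a new destination state
            next_id += 1
        return table[key]

    for match in TAG_RE.finditer(xml_text):       # step 602: read and classify
        closing, name, _attrs, empty, text = match.groups()
        if text is not None:
            if text.strip():                      # ignore whitespace-only text
                state = advance(state, ANY_TEXT)
        elif closing:
            state = advance(state, "</" + name + ">")
        else:
            state = advance(state, "<" + name)    # attributes are skipped
            state = advance(state, "/>" if empty else ">")
    return table


SAMPLE = '<仕入品><コード>001</コード><商品名 id="a1">りんご</商品名><備考/></仕入品>'

if __name__ == "__main__":
    for key, target in build_table(SAMPLE).items():
        print(key, "->", target)
```

Because text is replaced by a wildcard and attributes are skipped, two documents with the same element structure but different values produce the same table, which is the property the approach relies on.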

The processing in the pre-analysis unit 116 described above generates a transition table like the one described with FIG. 5.

FIG. 7 is a flowchart explaining the XML document analysis processing in the embodiment of the present invention; it is described next. When this processing starts, a transition table like the one described with FIG. 5 has already been generated by the pre-analysis unit 116.

(1) First, the XML analysis processing unit 103 resets the current state to the initial state; that is, it sets the current state to "S" (step 701).

(2) Next, the XML analysis processing unit 103 reads the XML document 107, and the XML syntax analysis main unit 108 starts reading elements (step 702).

(3) The state transition table search unit 117 determines whether the element start tag being read matches a transition character string leaving the current state in the state transition table. If it does not match, the current state is reset to the initial state. In this case, processing moves to the element start tag analysis unit 109 and the element character check unit 110 shown in FIG. 1 and proceeds as in the prior art (steps 703 and 704).

(4) If the determination in step 703 finds that the element start tag being read matches a transition character string leaving the current state in the state transition table, the state transitions according to that character string, and the memory-saving element node generation unit 118 generates a memory-saving element node according to the element type of the transition destination state (steps 705 and 706).

(5) Next, the attribute analysis / node generation unit 111 analyzes any attributes that are present and generates attribute nodes (step 707).

(6) Next, the character ">" marking the end of the tag always appears, and text may follow it. If states whose transition character strings are ">" and the special character representing arbitrary text are adjacent to the current state, the state transitions to those adjacent states; otherwise, the current state is reset to the initial state. This processing is performed by the content analysis / text node generation unit 113 (step 708).

(7) Next, it is determined whether a memory-saving element node has been generated in the preceding steps. If not, a normal element node is generated (steps 709 and 710).

(8) After the processing of step 710, or if the determination in step 709 finds that a memory-saving element node has been generated, the element end tag analysis unit 114 determines whether the character string representing the element end tag being read matches a transition character string for the current state in the state transition diagram (step 711).

(9) If the determination in step 711 finds that the character string representing the element end tag being read matches a transition character string for the current state in the state transition diagram, the state transitions according to that character string; if not, the current state is reset to the initial state (steps 712 and 713).

(10) After the processing of steps 712 and 713, it is determined whether all elements have been read and processed as described above. If not, the processing returns to step 702 and is repeated; if so, the processing ends here (step 714).
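The flow of FIG. 7 described above can be approximated by the sketch below. It is an illustration under assumptions rather than the embodiment itself: the input is a pre-tokenized list of `(kind, value)` pairs, the table is a plain dictionary with state 0 as the initial state, attribute handling (step 707) is omitted, and `CompactNode`, `FullNode`, `ANY_TEXT`, and `parse` are hypothetical names.

```python
ANY_TEXT = "#text"  # hypothetical marker standing in for the "arbitrary text" symbol

class CompactNode:
    """Hypothetical memory-saving node: fixed slots, no attribute or child storage."""
    __slots__ = ("name", "text")

    def __init__(self, name):
        self.name = name
        self.text = None

class FullNode:
    """Hypothetical ordinary node that reserves room for attributes and children."""
    def __init__(self, name):
        self.name = name
        self.text = None
        self.attributes = {}
        self.children = []

def parse(tokens, table, start=0):
    """Walk the table; fall back to FullNode and reset whenever a transition is missing."""
    state = start
    nodes, open_nodes = [], []
    for kind, value in tokens:
        if kind == "start":
            nxt = table[state].get("<" + value)
            if nxt is not None and ">" in table[nxt]:
                state = table[nxt][">"]       # follow the start tag, then ">"
                node = CompactNode(value)     # table hit: memory-saving node
            else:
                state = start                 # miss: reset to the initial state
                node = FullNode(value)        # conventional path
            nodes.append(node)
            open_nodes.append(node)
        elif kind == "text":
            state = table[state].get(ANY_TEXT, start)
            if open_nodes:
                open_nodes[-1].text = value
        else:  # end tag
            state = table[state].get("</" + value + ">", start)
            if open_nodes:
                open_nodes.pop()
    return nodes
```

In this sketch a document that follows the table yields only `CompactNode` instances, while an unfamiliar element name causes a reset and a `FullNode`, mirroring the fallback to the conventional analysis path.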

Each process in the embodiment described above can be implemented as a program and executed by a CPU provided in the apparatus of the present invention. Such programs can be provided stored on a recording medium such as an FD, CD-ROM, or DVD, or as digital information over a network.

In the processing described above, step 703, which determines whether the element start tag being read matches a transition character string leaving the current state in the state transition table, is a simple string comparison between a character string registered in the state transition table and the input character string currently being analyzed, and is therefore a low-load operation. By contrast, XML syntax analysis according to the prior art required comparatively high-load processing: element tag analysis by the element start tag analysis unit 109 (confirming conformance to the lexical rules of XML) and element character checking by the element character check unit 110 (confirming that only characters permitted in XML element names are used).
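The contrast between the two paths can be illustrated with a small sketch. The names `slow_check` and `resolve_start_tag` are hypothetical, and `SIMPLE_NAME` is a deliberately simplified, ASCII-only stand-in for the XML name rule, which in reality permits many more characters.

```python
import re

# Simplified ASCII-only approximation of an XML element name (assumption).
SIMPLE_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_.\-]*\Z")

def slow_check(name):
    """Conventional path: validate the characters of the element name."""
    return SIMPLE_NAME.match(name) is not None

def resolve_start_tag(name, transitions):
    """Try the table first; validate lexically only when there is no match."""
    key = "<" + name
    if key in transitions:                 # plain string comparison via dict lookup
        return ("table", transitions[key])
    if not slow_check(name):               # fallback: character-level validation
        raise ValueError("invalid element name: " + name)
    return ("validated", None)
```

When the name is already registered, the lookup returns immediately and the character-level validation is never run for that tag.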

According to the embodiment described above, if the input character string follows the state transition diagram generated by the pre-analysis processing described with reference to FIG. 5, a large number of memory-saving element nodes can be generated with low-load processing. In other words, the condition is that the input XML document has the same structure as the document used as input for generating the state transition table, namely a "representative XML document whose element parent-child, sibling, and attribute relationships are the same as those of the XML documents handled by the application". The embodiment can thereby realize an XML document management apparatus that performs syntax analysis at high speed and with reduced memory usage.

In general, each individual structured document input to an application tends to repeat the same structure internally, and its parent-child, sibling, and attribute relationships are often identical. For example, a structure such as "the child element of a 'purchased item' element is a 'product name' element" appears repeatedly. Moreover, the multiple structured documents input to an application often all share the same structure. Returning to the example above, when an application receives 1000 structured documents, it is often the case that every one of them has the structure "the child element of a 'purchased item' element is a 'product name' element".

The embodiment of the present invention focuses on this property of the structured documents input to an application, and can therefore speed up the syntax analysis processing.
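The structural regularity this argument relies on can be checked with a short sketch. It uses Python's standard `xml.etree.ElementTree` parser purely for illustration, not the state-transition approach of the embodiment. The element names `purchase` and `name`, the generated sample documents, and the function `parent_child_pairs` are hypothetical.

```python
import xml.etree.ElementTree as ET

def parent_child_pairs(xml_text):
    """Return the set of (parent tag, child tag) pairs found in one document."""
    root = ET.fromstring(xml_text)
    return {(parent.tag, child.tag) for parent in root.iter() for child in parent}

# 1000 hypothetical documents that differ only in their text content.
docs = [f"<purchase><name>product {i}</name></purchase>" for i in range(1000)]

# Collect the distinct structural signatures across all documents.
signatures = {frozenset(parent_child_pairs(d)) for d in docs}
```

For input like this, `signatures` contains a single entry, which is the situation in which a table built from one representative document would apply to every document received.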

DESCRIPTION OF SYMBOLS
101 Computer system
102 Main storage device
103 XML analysis processing unit
104 Processor
105 Bus
106 Auxiliary storage device
107 XML document
108 XML syntax analysis main unit
109 Element start tag analysis unit
110 Element character check unit
111 Attribute analysis / node generation unit
112 Element node generation unit
113 Content analysis / text node generation unit
114 Element end tag analysis unit
115 Application
116 Pre-analysis unit
117 State transition table search unit
118 Memory-saving element node generation unit
119 State transition table

Claims

A document management method for parsing an input structured document, wherein
a structured document analysis processing unit provided in a document management apparatus has pre-analysis processing means and state transition table search means for searching a state transition table generated by the pre-analysis processing means,
the pre-analysis processing means pre-analyzes, before syntax analysis processing, a parent-child and sibling relationship pattern of the structural units of the input structured document and generates a state transition table storing the parent-child and sibling relationships, and
the structured document analysis processing unit refers to the state transition table through the state transition table search means during syntax analysis processing of an input document and, when processing of a character string already stored in the state transition table is requested, returns the analysis result already stored in the state transition table.

The document management method according to claim 1, wherein
the pre-analysis processing means stores, in the state transition table, a memory structure from which sibling structures and attribute structures that cannot appear have been omitted.

The document management method according to claim 1 or 2, wherein
the analysis processing unit performs syntax analysis of the document by comparing, during syntax analysis processing of the input document, the character string being parsed with the element character strings stored in the state transition table.

A document management apparatus for parsing an input structured document, wherein
a structured document analysis processing unit provided in the document management apparatus has pre-analysis processing means and state transition table search means for searching a state transition table generated by the pre-analysis processing means,
the pre-analysis processing means pre-analyzes, before syntax analysis processing, a parent-child and sibling relationship pattern of the structural units of the input structured document and generates a state transition table storing the parent-child and sibling relationships, and
the structured document analysis processing unit refers to the state transition table through the state transition table search means during syntax analysis processing of an input document and, when processing of a character string already stored in the state transition table is requested, returns the analysis result already stored in the state transition table.