JP2008287412A


Info

Publication number: JP2008287412A
Application number: JP2007130465A
Authority: JP
Inventors: Arei Kobayashi; 亜令小林; Shigeki Muramatsu; 茂樹村松; Daisuke Iwami; 大介岩見
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2007-05-16
Filing date: 2007-05-16
Publication date: 2008-11-27

Abstract

[Problem] Efficient code compression cannot be achieved when the data type that yields the optimal encoding differs among the elements and attributes of the document to be encoded.
[Solution] In the encoding table for the elements and attributes of the document to be encoded, entries are defined that have the same element name but different data types. At encoding time, each element and attribute in the document is encoded while dynamically selecting the data type that can be encoded most efficiently.
[Selected drawing] Figure 4

Description

Translated from Japanese

The present invention relates to a document data encoding method, an encoding system, and a program therefor, for encoding document data written in an extensible, text-based structured description language.

Conventionally, methods exist for encoding and decoding document data in order to reduce the amount of data to be transmitted. To implement such a method, the transmitting device and the receiving device must each hold a conversion table. The conversion table maps the structured description language to code data on a one-to-one basis. The transmitting device encodes document data into code data based on the conversion table, and the receiving device decodes the code data back into document data based on the same table. Such a method is also effective from the viewpoint of Internet security, because a client that does not have the conversion table cannot decode the code data.
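The shared-table idea can be illustrated with a minimal sketch. The table contents, the 8-bit code width, and the function names below are assumptions made for illustration only; they are not taken from the cited patent documents.

```python
# Hypothetical one-to-one conversion table (names and codes are illustrative).
CONVERSION_TABLE = {
    "sample": "00000000",
    "svg": "00000001",
    "polyline": "00000010",
}
# The receiver uses the inverse mapping; one-to-one makes it well defined.
DECODE_TABLE = {code: name for name, code in CONVERSION_TABLE.items()}


def encode_names(names):
    """Sender side: replace each name with its fixed-length code."""
    return "".join(CONVERSION_TABLE[name] for name in names)


def decode_names(bits, width=8):
    """Receiver side: cut the bit string into fixed-width codes and look each up."""
    return [DECODE_TABLE[bits[i:i + width]] for i in range(0, len(bits), width)]
```

A receiver without `DECODE_TABLE` has no way to map the bit string back to names, which is the security property mentioned above.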

Specifically, there is a method of encoding XML/SGML document data using an encoding table compliant with XML (eXtensible Markup Language) or SGML (Standard Generalized Markup Language) (for example, Patent Document 1). In this method, a conversion table written in an XML/SGML structured description language defines a code length and a code for each element name, element value, attribute name, and attribute value, and also defines a code length and a code indicating the parent-child relationship of a second element name to a first element name. Encoding document data with this conversion table reduces the amount of data transmitted. In addition, because the decoding-side device does not restore the original document data from the code data, it does not need a parser.

Patent Document 2 discloses an encoding method that, in order to improve the compression rate, focuses on the similarity of document structures and uses the document structure and the frequency of occurrence of attribute values and element values.

FIG. 1 is a system configuration diagram including an encoding server.

The server 4 transmits document data in XML format to the encoding server 6. The encoding server 6 encodes the document data using the conversion table received from the conversion table server 7. The resulting code data is transmitted to the client 5. The client 5 performs document processing using the conversion table received from the conversion table server 7. According to FIG. 1, the encoding server can be used as a proxy server without modifying the existing server that transmits XML-format document data.

Patent Document 1: JP 2002-259194 A
Patent Document 2: JP 2005-215951 A

However, in the prior art described above, the data type of each attribute value and element value in a document must be set in advance when the encoding table (conversion table) is designed. Encoding according to the frequency of occurrence of values of the preset data type is therefore possible, but the data type cannot be changed dynamically according to the values that actually appear. Consequently, when the data type that yields the optimal encoding differs among the elements and attributes in the document to be encoded, the data type cannot be selected dynamically, and as a result efficient code compression cannot be achieved.

Accordingly, an object of the present invention is to provide a document data encoding method, an encoding system, and a program therefor that dynamically select, for each element value and attribute value in the document to be encoded, the data type that can be encoded most efficiently, and perform the encoding with it.

To achieve the above object, a document data encoding method according to the present invention is a method for encoding document data written in an extensible, text-based structured description language, and includes: a step of parsing the document data, analyzing the document structure, and obtaining the element values and attribute values of the document data; a step of selecting an optimal data type for the element values and attribute values using an encoding table that defines a plurality of different data types for the same element value and attribute value; and a step of encoding the element values and attribute values with the selected data type using the encoding table.

It is also preferable that the step of selecting the optimal data type selects the optimal data type according to a priority defined explicitly or implicitly in the encoding table.

It is also preferable that the step of selecting the optimal data type encodes the element values and attribute values with a plurality of different data types and selects the data type that gives the highest compression rate.
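The "encode with every data type and keep the smallest result" variant can be sketched as follows. The three encoders are assumptions chosen for illustration (signed 16-bit integers, signed 32-bit counts of hundredths, and raw UTF-8 text); the source does not specify the binary layout of any data type.

```python
import struct
from decimal import Decimal, InvalidOperation


def encode_int(text, sep=","):
    """Pack each item as a signed 16-bit integer; None if the value does not fit."""
    try:
        return b"".join(struct.pack(">h", int(item)) for item in text.split(sep))
    except (ValueError, struct.error):
        return None


def encode_fixedpoint(text, sep=","):
    """Pack each item as a signed 32-bit count of hundredths; None if not applicable."""
    try:
        out = []
        for item in text.split(sep):
            scaled = Decimal(item.strip()) * 100
            if scaled != scaled.to_integral_value():
                return None  # more than two decimal places
            out.append(struct.pack(">i", int(scaled)))
        return b"".join(out)
    except (InvalidOperation, ValueError, OverflowError, struct.error):
        return None


def encode_char(text, sep=","):
    """Fallback: keep the value as UTF-8 text (always applicable)."""
    return text.encode("utf-8")


ENCODERS = {"int": encode_int, "fixedpoint": encode_fixedpoint, "char": encode_char}


def select_smallest(text):
    """Try every data type and return (type name, bytes) of the shortest encoding."""
    results = {name: encoder(text) for name, encoder in ENCODERS.items()}
    valid = {name: data for name, data in results.items() if data is not None}
    best = min(valid, key=lambda name: len(valid[name]))
    return best, valid[best]
```

With these particular encoders a very short value such as "1,2" is smaller as raw text (3 bytes) than as two 16-bit integers (4 bytes), so exhaustive comparison and a fixed priority order can disagree on the same value.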

It is also preferable that the encoding step appends, to the code of the element values and attribute values, a code specifying the selected data type.

To achieve the above object, a document data encoding processing system according to the present invention is a system for encoding document data written in an extensible, text-based structured description language, and includes: an encoding target document parsing function that parses the document data, analyzes the document structure, and obtains the element values and attribute values of the document data; an optimal data type selection function that selects an optimal data type for the element values and attribute values using an encoding table that defines a plurality of different data types for the same element value and attribute value; and an encoding function that encodes the element values and attribute values with the selected data type using the encoding table.

To achieve the above object, a document data encoding program according to the present invention is a program for encoding document data written in an extensible, text-based structured description language, and causes a computer to execute: a step in which the encoding target document parsing function parses the document data, analyzes the document structure, and obtains the element values and attribute values of the document data; a step in which the optimal data type selection function selects an optimal data type for the element values and attribute values using an encoding table that defines a plurality of different data types for the same element value and attribute value; and a step in which the encoding function encodes the element values and attribute values with the selected data type using the encoding table.

According to the present invention, even when values corresponding to the same element name or attribute name in the document to be encoded have different data types, the optimal data type can be selected dynamically for each element name and attribute name, so encoding can be performed efficiently and the code compression rate can be further improved. This encoding method is applicable to all XML- and SGML-compliant document data, and is particularly efficient for document data containing many values such as numeric sequences written in several notations or mixtures of numbers and character strings.

The best mode for carrying out the present invention is described in detail below with reference to the drawings.

FIG. 2 shows a document data code processing method according to the present invention. According to FIG. 2, the document data 12 is extended by a plurality of document data 120 and 121. The conversion table 11 likewise defines link information to a plurality of conversion tables 110 and 111 corresponding to the extended document data. The XML-format document data 12 is thereby encoded (10) using the conversion table 11.

Also according to FIG. 2, the code data is subjected directly to document processing 30 using the conversion table 21 and displayed in the browser 24. According to the present invention, the code data also contains the logical structure of the elements. There is therefore no need to decode it into document data, nor to analyze the logical structure with the parser 23.

FIG. 3 is a functional configuration diagram of the present invention. The configuration is broadly divided into four parts: the encoding target document 31, the encoding table 32, the encoding processing software 33, and the encoding result 34. The main parts of the present invention are the encoding table 32 and the encoding processing software 33. Each is described below.

The encoding target document 31 is a structured document written in XML or SGML format, and is the document to be encoded. Specific examples are document data with the extensions .html and .svg.

The encoding table 32 is written in an extensible, text-based structured description language and is a set of encoding rules defined for each document structure (schema) to be encoded. It is prepared not for each document to be encoded but for each XML or SGML format. It defines the element names that may appear, the parent-child relationships of elements, the accompanying groups of attribute names, the codes corresponding to them, and the data types of each element value and attribute value.

The encoding processing software 33 reads the encoding target document 31 and the encoding table 32 and performs the encoding process. Its functional configuration is divided into six functions: the encoding target document reading function 331, the encoding target document parsing function 332, the encoding table reading function 333, the optimal data type selection function 334, the encoding function 335, and the encoding result output function 336. Each is described below.

The encoding target document reading function 331 reads an encoding target document in XML or SGML format.

The encoding target document parsing function 332 parses the encoding target document that has been read. That is, it analyzes the document structure, converts the text string into a tree structure, and obtains the element names, attribute names, element values, and attribute values of the encoding target document.
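As a rough illustration of what the parsing function produces, the following sketch uses Python's standard `xml.etree.ElementTree` module to turn text into a tree and collect names and values. The sample document and the tuple layout are assumptions for illustration, and the sketch handles XML only, not SGML.

```python
import xml.etree.ElementTree as ET

# Hypothetical input; the document of FIG. 5 is not reproduced in this text.
SAMPLE = '<root><sample points="10,20,30"/><sample>1,2,3</sample></root>'


def parse_document(xml_text):
    """Build the element tree and list every attribute value and element value."""
    root = ET.fromstring(xml_text)
    items = []
    for elem in root.iter():  # document-order walk over all elements
        for attr_name, attr_value in elem.attrib.items():
            items.append(("attribute", elem.tag, attr_name, attr_value))
        text = (elem.text or "").strip()
        if text:
            items.append(("element", elem.tag, None, text))
    return items
```

Each tuple carries the kind of value, the owning element name, the attribute name (or `None` for an element value), and the value itself, which is the information the later selection and encoding steps need.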

The encoding table reading function 333 reads the corresponding encoding table for each element and attribute to be encoded.

The optimal data type selection function 334 selects, for each element value and attribute value to be encoded, the optimal encoding type from the corresponding plurality of data types.

The encoding function 335 encodes the element values and attribute values to be encoded based on the selected data type. It also encodes the element names and attribute names to be encoded.

The encoding result output function 336 outputs the encoding result.

The encoding result 34 is the data produced by the encoding.

FIG. 4 is a flowchart of the encoding process according to the present invention.
(S41) The encoding target document reading function 331 reads the encoding target document in XML or SGML format.
(S42) The encoding target document parsing function 332 parses the encoding target document.
(S43) The encoding table reading function 333 reads the corresponding encoding table for each element and attribute to be encoded.
(S441) The encoding function 335 encodes the information on document structure, namely the element names and attribute names. This encoding uses a conventional technique, for example the method of Patent Document 2.
(S442) The optimal data type selection function 334 selects the optimal encoding type. When selecting the optimal data type from the plurality of data types defined in the encoding table for each element value and attribute value, it refers to the element value or attribute value to be encoded and selects the data type that gives the highest compression rate. For example, the value may be encoded with every data type and the data type with the highest compression rate selected. Alternatively, the data type may be selected according to an implicit or explicit priority defined in the encoding table. The encoding function 335 then encodes the element value or attribute value based on the selected data type.
(S45) The encoding result output function 336 outputs the result of encoding all elements and attributes in the encoding target document.
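The flow S41 to S45 can be sketched end to end as below, using the priority-order variant of S442. Everything concrete here is an assumption for illustration: the dictionary layout of the table, the regular expressions that define what counts as an integer or fixed-point value, and the fact that the value itself is left as text rather than packed into bits.

```python
import re
import xml.etree.ElementTree as ET

# S43: hypothetical encoding table; the list order is the implicit priority.
TABLE = {
    "separator": ",",
    "name_codes": {"sample": "00000000"},
    "types": [
        ("int", re.compile(r"-?\d+")),
        ("fixedpoint", re.compile(r"-?\d+\.\d+")),
        ("char", None),  # no pattern: accepts any value
    ],
}


def select_type(value, table):
    """S442: first data type, in priority order, that fits every separated item."""
    items = [item.strip() for item in value.split(table["separator"])]
    for name, pattern in table["types"]:
        if pattern is None or all(pattern.fullmatch(item) for item in items):
            return name
    raise ValueError("no applicable data type")


def encode_document(xml_text, table=TABLE):
    root = ET.fromstring(xml_text)  # S41 and S42: read and parse
    result = []
    for elem in root.iter():
        name_code = table["name_codes"].get(elem.tag)  # S441: structure code
        if name_code is None:
            continue  # element not covered by this table
        values = list(elem.attrib.values())
        if elem.text and elem.text.strip():
            values.append(elem.text.strip())
        for value in values:
            # S442: choose the type; the binary value encoding is omitted here.
            result.append((name_code, select_type(value, table), value))
    return result  # S45: output
```

Running it on four `sample` elements written in different notations yields a different data type per value even though the element and attribute names are identical, which is the behavior the flowchart is meant to achieve.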

FIG. 5 is a sample encoding target document. The processing of the present invention is described below using this document as an example. FIG. 6 is an example of an encoding table corresponding to the encoding target document of FIG. 5, and is written in XML.

In the first line of this encoding table, "," is defined as the separator between one element value and the next, or between one attribute value and the next. This allows an element value or attribute value to be split into parts and encoded. In this example, the encoding table shows a pattern in which one attribute (the points attribute) and one element (the sample element) each have three different data types (int, fixedpoint, char): integer, fixed-point, and character string. Also in this example, a data type written higher in the table is implicitly interpreted as having higher priority. That is, the integer type has the highest priority, followed by the fixed-point type, and the character string type has the lowest priority. This is only one example; an encoding table that states the priority explicitly is also conceivable.

Here, points_int, points_fixedpoint, and points_char are examples of a notation that combines the attribute name points with the data type designations for the integer, fixed-point, and character string types, respectively.
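Since FIG. 6 is not reproduced in this text, the following sketch uses an invented XML layout to show how a separator and an implicit, order-based priority could be read from such a table. The tag names, the `separator`, `code`, and `type` attributes, and the function name are all hypothetical; only the `points_int` style of naming and the "," separator come from the description above.

```python
import xml.etree.ElementTree as ET

# Hypothetical table layout, not the actual table of FIG. 6.
TABLE_XML = """
<table separator=",">
  <element name="sample" code="00000000">
    <attribute name="points_int" type="int"/>
    <attribute name="points_fixedpoint" type="fixedpoint"/>
    <attribute name="points_char" type="char"/>
  </element>
</table>
"""


def load_priorities(table_xml):
    """Return the separator and, per (element, attribute), the types in priority order."""
    root = ET.fromstring(table_xml)
    separator = root.get("separator")
    priorities = {}
    for element in root.findall("element"):
        for attr in element.findall("attribute"):
            # "points_int" names the attribute "points" plus a data type suffix.
            base_name = attr.get("name").rsplit("_", 1)[0]
            key = (element.get("name"), base_name)
            # Document order is preserved, so earlier entries rank higher.
            priorities.setdefault(key, []).append(attr.get("type"))
    return separator, priorities
```

An explicit-priority table could instead carry a numeric field per entry and sort on it; the order-based form shown here corresponds to the implicit interpretation described above.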

In the encoding target document of FIG. 5, four sample elements are written as elements to be encoded, and the first three sample elements have points attributes written in different notations. After parsing these elements, the encoding processing software 33 selects an encoding table suited to encoding the element value of each element and the attribute value of each attribute. In this example, the element and attribute to be encoded are sample and points respectively, so the encoding table shown in FIG. 6 is selected and read as the corresponding encoding table.

The value of the first points attribute in FIG. 5 can be split by the separator "," defined in the encoding table of FIG. 6, and every resulting value is expressed as an integer. Encoding as the integer type is therefore judged to give the highest compression rate for this attribute, and it is encoded as the integer type.

The value of the second points attribute in FIG. 5 can be split by the separator "," defined in the encoding table of FIG. 6, and every resulting value is expressed as a fixed-point number. Encoding as the fixed-point type is therefore judged to give the highest compression rate for this attribute, and it is encoded as the fixed-point type.

The value of the third points attribute in FIG. 5 cannot be split by the separator defined in the encoding table of FIG. 6 and can be interpreted as a single character string. Since it is neither an integer nor a fixed-point value, this attribute is encoded as the character string type.

The value of the fourth sample element in FIG. 5 can be split by the separator "," defined in the encoding table of FIG. 6, and every resulting value is expressed as an integer. Encoding as the integer type is therefore judged to give the highest compression rate for this element, and it is encoded as the integer type.

Thus, even attribute values corresponding to the same attribute name may have different characteristics from document to document. In the present invention, defining a plurality of data types in the encoding table makes it possible to identify the characteristics of each value and to select the data type dynamically when encoding, which was not possible before, thereby improving the code compression rate.

FIG. 7 shows the result of encoding the encoding target document of FIG. 5 with the encoding table of FIG. 6. Only the parts relevant to the present invention are shown; the actual encoding result also contains the data needed for decoding. FIG. 7(a) shows the encoding of the first points attribute of FIG. 5, FIG. 7(b) that of the second points attribute, FIG. 7(c) that of the third points attribute, and FIG. 7(d) that of the fourth sample element.

[1] in FIG. 7 is a code indicating the start of each element. In this example each line of FIG. 5 is encoded individually, so the value is the same in every case. [2] in FIG. 7 is a code indicating the element name; as stated in the first line of FIG. 6, the element sample is encoded as "00000000".

[3] in FIG. 7 is the attribute value / element value presence flag, and indicates the data type of the points attribute value or sample element value selected by the optimal data type selection function 334. "100000" in FIG. 7(a)[3] indicates that an integer-type points attribute value is present, "010000" in FIG. 7(b)[3] indicates that a fixed-point-type points attribute value is present, "001000" in FIG. 7(c)[3] indicates that a character-string-type points attribute value is present, and "000100" in FIG. 7(d)[3] indicates that an integer-type sample element value is present. In this example, the upper 3 bits indicate the data type of the attribute value and the lower 3 bits indicate the data type of the element value. Within each group, one bit is assigned to each data type in the order in which the data types are defined in the encoding table, and the bit string indicates which data type is present.
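The 6-bit presence flag can be reproduced with a short sketch. The function names are hypothetical; the bit layout (upper 3 bits for the attribute value, lower 3 bits for the element value, one bit per data type in table order) follows the description of FIG. 7 [3] above.

```python
# Data types in the order they are defined in the encoding table.
TYPE_ORDER = ["int", "fixedpoint", "char"]


def presence_flag(kind, data_type):
    """Build the flag for one value; kind is "attribute" or "element"."""
    if kind not in ("attribute", "element"):
        raise ValueError("kind must be 'attribute' or 'element'")
    one_hot = ["0"] * len(TYPE_ORDER)
    one_hot[TYPE_ORDER.index(data_type)] = "1"
    bits = "".join(one_hot)
    zeros = "0" * len(TYPE_ORDER)
    # Attribute types occupy the upper 3 bits, element types the lower 3 bits.
    return bits + zeros if kind == "attribute" else zeros + bits


def read_flag(flag):
    """Decoder side: list the (kind, data type) pairs whose bit is set."""
    found = []
    for offset, kind in ((0, "attribute"), (len(TYPE_ORDER), "element")):
        for i, name in enumerate(TYPE_ORDER):
            if flag[offset + i] == "1":
                found.append((kind, name))
    return found
```

For the four cases of FIG. 7, `presence_flag` gives "100000", "010000", "001000", and "000100", and `read_flag` lets a decoder recover which data type to use before reading the value in [4].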

Following this, [4] in FIG. 7 stores the result of encoding each attribute value or element value with the corresponding data type.

The program according to the present invention can be executed on an encoding server. In this form, a server holding XML-format document data transmits that data to the encoding server, and the encoding server encodes it and transmits the encoded result to the client. In this case, the encoding server can be used as a proxy server without modifying the existing server.

The program can also be executed on the existing server. In this form, the XML-format document data is encoded on the existing server and transmitted to the client. In this case, it can be used without modifying the client.

The program according to the present invention is not limited to running on a server. By running it on a client, XML- or SGML-format document data stored in the client can be compressed.

All of the embodiments described above illustrate the present invention by way of example and are not limiting; the present invention can be carried out in various other modified and altered forms. The scope of the present invention is therefore defined only by the claims and their equivalents.

FIG. 1 is a system configuration diagram including an encoding server.
FIG. 2 shows a document data code processing method according to the present invention.
FIG. 3 is a functional configuration diagram of the present invention.
FIG. 4 is a flowchart of the encoding process according to the present invention.
FIG. 5 is a sample encoding target document.
FIG. 6 is an example of an encoding table corresponding to the encoding target document of FIG. 5.
FIG. 7 shows the result of encoding the encoding target document of FIG. 5 with the encoding table of FIG. 6.

Explanation of symbols

10 Encoding
11, 21, 110, 111, 210, 211 Conversion table
12, 22, 120, 121 Document data in text format
23 Parser
24 Browser display screen
30 Document processing
4 Existing server
5 Client
6 Code data
7 Conversion table server
8 Internet
31 Encoding target document
32 Encoding table
33 Encoding processing software
34 Encoding result
331 Encoding target document reading function
332 Encoding target document parsing function
333 Encoding table reading function
334 Optimal data type selection function
335 Encoding function
336 Encoding result output function

Claims

Translated from Japanese

A document data encoding method for encoding document data written in an extensible, text-based structured description language, the method comprising:
a step of parsing the document data, analyzing the document structure, and obtaining element values and attribute values of the document data;
a step of selecting an optimal data type for the element values and attribute values using an encoding table that defines a plurality of different data types for the same element value and attribute value; and
a step of encoding the element values and attribute values with the selected data type using the encoding table.

The document data encoding method according to claim 1, wherein the step of selecting the optimal data type is a step of selecting the optimal data type according to a priority defined explicitly or implicitly in the encoding table.

The document data encoding method according to claim 1, wherein the step of selecting the optimal data type is a step of encoding the element values and attribute values with a plurality of different data types and selecting the data type that gives the highest compression rate.

The document data encoding method according to any one of claims 1 to 3, wherein the encoding step appends, to the code of the element values and attribute values, a code specifying the selected data type.

A document data encoding processing system for encoding document data written in an extensible, text-based structured description language, the system comprising:
an encoding target document parsing function that parses the document data, analyzes the document structure, and obtains element values and attribute values of the document data;
an optimal data type selection function that selects an optimal data type for the element values and attribute values using an encoding table that defines a plurality of different data types for the same element value and attribute value; and
an encoding function that encodes the element values and attribute values with the selected data type using the encoding table.

A document data encoding program for encoding document data written in an extensible, text-based structured description language, the program causing a computer to execute:
a step in which the encoding target document parsing function parses the document data, analyzes the document structure, and obtains element values and attribute values of the document data;
a step in which the optimal data type selection function selects an optimal data type for the element values and attribute values using an encoding table that defines a plurality of different data types for the same element value and attribute value; and
a step in which the encoding function encodes the element values and attribute values with the selected data type using the encoding table.