JP2003152548A

Movatterモバイル変換

Info

Publication number: JP2003152548A
Application number: JP2001348826A
Authority: JP
Inventors: Yasuyuki Tsukui; 保幸津久井
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2001-11-14
Filing date: 2001-11-14
Publication date: 2003-05-23

Abstract

(57)【要約】【課題】データ圧縮に使用される文字列検索方式にお
いて、従来方法のようにハッシュテーブルを使用せず
に、ツリー構造を応用したもの。これにより、ある種の
偏りのある入力データにおいてもよりよい検索結果を得
られることを目的とする。【解決手段】入力バイトの固定長ウィンドウ内におい
て発現する同一文字列の検索において、その各文字の下
位ビットをキーとして構成される多段のツリー構造に文
字列の発現位置を記憶するステップと、すでに同一のノ
ードに以前の文字列発現位置が格納されていたかどうか
判断するステップと、以前の発現位置から始まる文字列
と、現在注目している文字列を比較し、特定の文字数以
上の一致があるか確認するステップをもつ文字列検索方
法。(57) [Summary] [PROBLEMS] To apply a tree structure to a character string search method used for data compression without using a hash table as in the conventional method. Accordingly, it is an object to obtain a better search result even for a certain kind of biased input data. SOLUTION: In a search for the same character string that appears in a fixed-length window of input bytes, a step of storing a character string expression position in a multi-stage tree structure configured by using lower bits of each character as a key; A step of determining whether or not the previous character string expression position is stored in the same node, and comparing the character string starting from the previous expression position with the character string currently focused on, and there is a match with a specific number of characters or more String search method with the step of checking whether

Description

Translated fromJapanese

【発明の詳細な説明】Detailed Description of the Invention

【０００１】[0001]

【発明の属する技術分野】本発明は、バイト列からなる
一般データの圧縮に関するものである。BACKGROUND OF THE INVENTION 1. Field of the Invention The present invention relates to compression of general data composed of byte strings.

【０００２】[0002]

【従来の技術】可逆的（ロスレス）圧縮方式として広く
使用されているものとして、ＬＺ７７アルゴリズム（Ｌ
ｅｍｐｅｌ−Ｚｉｖ１９７７，”ＡＵｎｉｖｅｒｓ
ａｌＡｌｇｏｒｉｔｈｍｆｏｒＳｅｑｕｅｎｔｉａ
ｌＤａｔａＣｏｍｐｒｅｓｓｉｏｎ”，ＩＥＥＥ
ＴｒａｎｓａｃｔｉｏｎｓｏｎＩｎｆｏｒｍａｔｉ
ｏｎＴｈｅｏｒｙ，Ｖｏｌ．２３，Ｎｏ．３，ｐｐ．
３３７−３４３．）および、ハフマン・エンコーディン
グ（Ｈｕｆｆｍａｎ，Ｄ．Ａ．，”ＡＭｅｔｈｏｄ
ｆｏｒｔｈｅＣｏｎｓｔｒｕｃｔｉｏｎｏｆＭ
ｉｎｉｍｕｍＲｅｄｕｎｄａｎｃｙＣｏｄｅｓ”，
ＰｒｏｃｅｅｄｉｎｇｓｏｆｔｈｅＩｎｓｔｉｔ
ｕｔｅｏｆＲａｄｉｏＥｎｇｉｎｅｅｒｓ，Ｓ
ｅｐｔｅｍｂｅｒ１９５２，Ｖｏｌｕｍｅ４０，Ｎｕ
ｍｂｅｒ９，ｐｐ．１０９８−１１０１）を応用したも
のがあげられる。2. Description of the Related Art As a widely used lossless compression method, the LZ77 algorithm (L
empel-Ziv 1977, "A Univers
alAlgorithm for Sequentia
l Data Compression ”, IEEE
Transactions on Information
on Theory, Vol. 23, No. 3, pp.
337-343. ) And Huffman encoding (Huffman, DA, "A Method"
for the Construction of M
minimumRedundancy Codes ”,
Proceedings of the Insit
ute of Radio Engineers, S
eptember 1952, Volume40, Nu
mber9, pp. 1098-1101) is applied.

【０００３】代表的なものは、オープンソースとして広
く利用されているＤＥＦＬＡＴＥ圧縮フォーマット（Ｄ
ｅｕｔｓｃｈ，Ｌ．Ｐ．，ＲＦＣ−１９５１，”ＤＥＦ
ＬＡＴＥＣｏｍｐｒｅｓｓｅｄＤａｔａＦｏｒｍ
ａｔＳｐｅｃｉｆｉｃａｔｉｏｎ”，Ｍａｙ１９９
６，ｆｔｐ：／／ｆｔｐ．ｕｕ．ｎｅｔ／ｐｕｂ／ａｒ
ｃｈｉｖｉｎｇ／ｚｉｐ／ｄｏｃ／）、およびＺＬｉｂ
処理系（Ｇａｉｌｌｙ，Ｊ．−Ｌ．，ａｎｄＡｄｌｅ
ｒ，Ｍ．，ＺＬＩＢｄｏｃｕｍｅｎｔａｔｉｏｎａ
ｎｄｓｏｕｒｃｅｓ，ｆｔｐ：／／ｆｔｐ．ｕｕ．ｎ
ｅｔ／ｐｕｂ／ａｒｃｈｉｖｉｎｇ／ｚｉｐ／ｄｏｃ
／）である。A typical example is the DEFLATE compression format (D) widely used as open source.
eutsch, L .; P. , RFC-1951, "DEF
LATE Compressed Data Form
at Specification ", May 199
6, ftp: // ftp. uu. net / pub / ar
chiving / zip / doc /), and ZLib
Processing system (Gailly, J.-L., and Adle)
r, M. , ZLIB documentation a
nd sources, ftp: // ftp. uu. n
et / pub / archiving / zip / doc
/).

【０００４】これらのロスレス圧縮方式のほとんどは、
入力バイト列を固定長のウィンドウに分割し、このウィ
ンドウ内での現在位置以前の部分に、おなじ文字列で始
まる文字列がないかを検索し、もしこれが存在した場合
には実際の生文字列を圧縮結果に出力する代わりに、そ
の以前の文字列へのオフセットおよび一致する文字列長
を元にしたコードを出力することでより高い圧縮率を達
成する。Most of these lossless compression systems are
Divides the input byte string into a fixed-length window, searches the part before the current position in this window for a character string beginning with the same character string, and if it exists, the actual raw character string. A higher compression rate is achieved by outputting a code based on the offset to the previous character string and the matching character string length instead of outputting to the compressed result.

【０００５】またこれらの圧縮方式では、ウィンドウ内
での以前の文字列の検索には、発現するすべての部分文
字列（通常３文字程度）から生成したハッシュ値による
ハッシュテーブルを管理し、これによって検索の高速化
をはかっている。Further, in these compression methods, in order to search for a previous character string within a window, a hash table of hash values generated from all the substrings (usually about 3 characters) that appear is managed, and this is used. We are trying to speed up the search.

【０００６】[0006]

【発明が解決しようとする課題】部分文字列（通常３文
字程度）から生成したハッシュ値による検索の高速化
は、特定の種類のバイト入力では、効率的に動作しない
場合がある。典型的なものとしては、プリンタコマンド
やページ記述言語におけるような、類似した部分文字列
で始まる文字列を多く含む場合や、十進数での近い値と
なる座標表示を文字列として多数含む場合があげられる
（“１２５００．００”と“１２５０２．１０”な
ど）。The speedup of the search by the hash value generated from the partial character string (usually about 3 characters) may not work efficiently with a particular type of byte input. Typically, there are many character strings that start with similar substrings, such as in printer commands and page description languages, and many character strings that contain coordinate values that are close to each other in decimal. (“12500.00” and “12502.10”).

【０００７】また、上記圧縮処理系の多くでは、ハッシ
ュ値が同じになる部分文字列を救うために、同一のハッ
シュ値でも複数（例えば最大１２８エントリ）の文字列
発現位置を記憶可能としているが、上のようなあまりに
も多くの同一部分文字列の発現がある場合にはこれらの
複数の記憶領域を簡単に使い切ってしまい結果として圧
縮効率が落ちたり、また、これらの複数エントリを順次
かつ多数検索する必要が出てくるため、検索速度が遅く
なる傾向があった。In many compression processing systems described above, in order to save a partial character string having the same hash value, a plurality of character string expression positions (for example, 128 entries at maximum) can be stored even with the same hash value. , If there are too many identical substring occurrences as above, these storage areas can be used up easily, resulting in reduced compression efficiency. The search speed tended to slow down because it became necessary to search.

【０００８】[0008]

【課題を解決するための手段】本発明では、文字列検索
時におけるハッシュテーブルの使用をおこなわず、この
代わりに、各文字コードの下位の数ビット（３ないし５
ビット）を使用して、文字ごとに段の深くなる多段のツ
リー型記憶構造を構成し、これによって文字列検索を高
速化するものである。In the present invention, the hash table is not used when searching for a character string. Instead of this, a few lower bits (3 to 5) of each character code are used.
(Bit) is used to construct a multi-stage tree-type storage structure in which the depth increases for each character, thereby speeding up the character string search.

【０００９】請求項１記載の発明は、入力バイトの固定
長ウィンドウ内において発現する同一文字列の検索にお
いて、すでに現時点での検索位置以前に存在した数文字
からなる部分文字列を、その各文字の下位ビットをキー
として構成される多段のツリー構造によってその部分文
字列の固定長発現ウィンドウ内での発現位置を記憶する
ステップと、現時点での注目する文字列の頭の部分文字
列の各文字の下位ビットをキーとして前記の文字列発現
位置記憶用の多段のツリー構造をたどることにより、こ
の部分文字列と同一のツリー構造の特定のノードにすで
に同一の部分文字列で始まる可能性のある文字列の固定
長ウィンドウ内での発現位置が格納されていたかどうか
判断するステップと、この対応ノードに以前の発現位置
が格納されていた場合に、その以前の発現位置から始ま
る文字列と、現在注目している文字列を比較し、特定の
文字数以上の一致をみた場合には、これをデータ圧縮に
利用することを特徴とする。According to the first aspect of the present invention, in the search for the same character string appearing in the fixed length window of the input byte, a partial character string consisting of several characters already existing before the current search position is converted into a character string. The step of memorizing the expression position in the fixed length expression window of the substring by the multi-stage tree structure configured by using the lower bit of the key and each character of the substring at the head of the current string of interest By tracing the above-mentioned multi-stage tree structure for storing the character string expression position using the lower bit of the key as a key, it is possible that a specific node of the same tree structure as this partial string already starts with the same partial string. The step of determining whether the expression position in the fixed length window of the character string was stored, and the previous expression position was stored in this corresponding node. The case, compared with a character string starting from the previous expression position, the character string currently focused, if a match is found above a certain number of characters, characterized by using this data compression.

【００１０】請求項２記載の発明は、上記、請求項１の
文字列検索方法において、文字列発現位置記憶用の多段
のツリー構造の位置記憶を拡張し、単一の位置だけでな
く、部分文字列の各文字の下位ビットをキーとしてツリ
ー構造をたどった結果、同じノードに割り振られる複数
の部分文字列に対応する固定長ウィンドウ内での複数の
発現位置を記憶するものとし、現在注目している文字列
を比較する時点では、それら複数の発現位置のそれぞれ
から始まる文字列と比較し、とくに一致する文字列長の
長いものの文字列発現位置を利用してデータ圧縮に利用
することを特徴とする。According to a second aspect of the present invention, in the above-mentioned character string search method according to the first aspect, the position storage of a multi-stage tree structure for storing a character string expression position is expanded to include not only a single position but also a partial position. As a result of traversing the tree structure using the lower bit of each character of the string as a key, it is assumed that the multiple expression positions in the fixed-length window corresponding to the multiple partial strings allocated to the same node are stored. When comparing the existing character strings, it is compared with the character string starting from each of these multiple expression positions, and the character string expression position of the matching long character string is used for data compression. And

【００１１】請求項３記載の発明は、上記、請求項１ま
たは請求項２の文字列検索方法において、文字列発現位
置記憶用の多段のツリー構造の最大ノード数に限界を設
け、部分文字列の記憶数が増加したためにツリー構造の
ノード数がこの限界に達した場合には、規定の部分文字
列長に達しない場合には、その部分文字列の以降の部分
をツリー構造のキーとして使用せず、そこまでの部分文
字列によってツリー構造をたどった結果のノードに文字
列の発現位置を格納することを特徴ととする。According to a third aspect of the present invention, in the character string search method according to the first or second aspect, the maximum number of nodes of a multi-stage tree structure for storing a character string expression position is limited, and a partial character string is set. If the number of nodes in the tree structure reaches this limit due to an increase in the number of storages in the tree, and if the specified substring length is not reached, the subsequent part of the substring is used as a key in the tree structure. Instead, the expression position of the character string is stored in the node resulting from tracing the tree structure with the partial character strings up to that point.

【００１２】[0012]

【発明の実施の形態】以下に、本発明の実施の形態の例
について図１を参照しながら説明する。DETAILED DESCRIPTION OF THE INVENTION An example of an embodiment of the present invention will be described below with reference to FIG.

【００１３】入力バイトの固定長ウィンドウ内（１０
２）において発現する同一文字列の検索において、ウィ
ンドウにある文字列を先頭（図１では左）から圧縮して
ゆく場合に、あらかじめ決められた長さＳの部分文字列
のすべて（正確には、ウィンドウの長さがＷの場合は、
Ｗ−Ｓ＋１回）がツリー構造に登録処理される。図１で
は、現在の処理位置は矢印１０１であらわされている。
ここでは、部分文字列の長さは、簡単のためＳ＝３と
し、ツリー構造の各ノードのスロット数Ｎは、１６とす
る。これは、部分文字列の各文字の下位４ビットでツリ
ー構造のノードのスロットをインデックスすることをあ
らわす。Within a fixed length window of input bytes (10
In the search for the same character string that occurs in 2), when compressing the character string in the window from the beginning (left in FIG. 1), all the partial character strings of a predetermined length S (more accurately, , If the window length is W,
W-S + 1 times) is registered in the tree structure. In FIG. 1, the current processing position is represented by an arrow 101.
Here, the length of the partial character string is S = 3 for simplicity, and the number of slots N of each node in the tree structure is 16. This represents that the lower 4 bits of each character of the substring index the slot of the node in the tree structure.

【００１４】ツリー構造のトップノード（１１１）は、
最初の登録処理の始まる前に生成されている。以降のノ
ードは、新しい部分文字列が登録されるにしたがって動
的に追加される。追加されるノードには２種類あり、そ
の下に他のノードをリンクする中間ノード（１１２、１
１３）と、文字位置へのポインタを格納する終端ノード
がある（１１４−１１６）。The top node (111) of the tree structure is
It is generated before the start of the first registration process. Subsequent nodes are dynamically added as new substrings are registered. There are two types of nodes to be added, and there are intermediate nodes (112, 1) that link other nodes below them.
13) and there is a terminal node that stores a pointer to the character position (114-116).

【００１５】新しく追加されたノードが中間ノードと終
端ノードのどちらかになるかは、ノードの深さが部分文
字列の長さＳと等しくなるか、ノードの総数がツリー構
造の最大ノード数に限界に達したかのいずれかの場合で
ある。登録処理は、この条件に達するまで中間ノードが
存在しなければ追加し、この条件に達するか、終端ノー
ドが存在し、その対応するスロットが空であれば、その
終端ノードのスロットに現在の文字位置のポインタを記
録する。これらの場合は、以前に同じノードに対応する
文字列がなかったということであるから、検索は失敗す
る。Whether the newly added node is an intermediate node or an end node depends on whether the depth of the node is equal to the length S of the substring, or the total number of nodes is the maximum number of nodes in the tree structure. Either the limit has been reached. The registration process will add if there is no intermediate node until this condition is reached, and if this condition is reached or if there is an end node and its corresponding slot is empty, the current character is in the slot of that end node. Record the position pointer. In these cases, the search fails because there was no corresponding string previously in the same node.

【００１６】これに対して、図１で表されている状態
は、すでに同じノードに対応する文字列が登録されてい
た場合である。現在の注目文字列（１０１）は、“ＡＢ
Ｃ”であるので、“Ａ”の下位４ビットからインデック
ス“０２”が得られたとすれば、ノード１１１のスロッ
トＦ０２から、第二段のノード１１３が選択される。二
文字目は“Ｂ”でありその下位４ビットからインデック
ス“０１”が得られたとすれば、ノード１１３のスロッ
トＦ１１から、第三段のノード１１５が選択される。同
様に三文字目は“Ｃ”から終端スロットＣ２１が選択さ
れる。この場合は、以前の文字位置を格納している文字
位置記憶の連結リストが存在していたので、後の検索の
ため、文字位置記憶の連結リストの最後に現在の位置を
追加する。図１は、このリストへの追加後の状態を（１
２１）表している。On the other hand, the state shown in FIG. 1 is the case where the character string corresponding to the same node has already been registered. The current attention character string (101) is "AB
Since it is C ”, if the index“ 02 ”is obtained from the lower 4 bits of“ A ”, the node 113 in the second stage is selected from the slot F02 of the node 111. The second character is“ B ”. If the index “01” is obtained from the lower 4 bits, the node 115 in the third stage is selected from the slot F11 of the node 113. Similarly, the third character is “C” to the end slot C21. In this case, because there is a linked list of character position storage that stores the previous character position, the current position is added to the end of the linked list of character position storage for later retrieval. Figure 1 shows the state after adding to this list (1
21) It is shown.

【００１７】次に前の連結リストをたどって、実際の文
字列検索をおこなう。ここでの候補は以前の文字列、
“ＡＢＣＤ”と“ＡＢＣＥ”である。検索はリストの後
ろから（最近登録した近いほうの文字列から）おこな
う。“ＡＢＣＥ”よりはリストの頭にある“ＡＢＣＤ”
のほうが現在の文字列により長くマッチするのでこちら
が検索結果として採用される。Next, an actual character string search is performed by tracing the preceding linked list. The candidates here are the previous strings,
“ABCD” and “ABCE”. The search is performed from the back of the list (from the recently registered nearer string). "ABCD" at the head of the list rather than "ABCE"
Will match the current string longer and will be used as the search result.

【００１８】[0018]

【発明の効果】請求項１記載の方法によれば、部分文字
列（通常３文字程度）から生成したハッシュ値による検
索の高速化と異なり、類似した部分文字列で始まる文字
列を多く含む場合など特定の偏りのあるバイト入力であ
っても、ツリー構造によって細分化された部分文字列の
分類が可能となり、ハッシュ方式ではひとつのエントリ
に集中する可能性の高かった部分文字列もツリー構造の
異なるノードに記憶される場合が多くなる。According to the method of the present invention, unlike the case where the hash value generated from a partial character string (usually about 3 characters) is used to speed up the search, a large number of character strings starting with similar partial character strings are included. Even with byte inputs with a certain bias, it is possible to classify substrings that have been subdivided by the tree structure, and the substrings that were likely to be concentrated in one entry in the hash method also have a tree structure. It is often stored in different nodes.

【００１９】請求項２記載の方法によれば、部分文字列
が同一か、または、各文字の下位のビットのみをツリー
構造の作成や移動に利用しているため偶然に最終的な位
置情報格納のためのノードが同一になる場合でも、それ
ら複数の文字列発現位置を記憶可能としているため、そ
れぞれの発現位置にある文字列を順次検索することで、
もっとも一致文字列長の長い文字列が検索可能となる。According to the method of claim 2, since the partial character strings are the same or only the lower bits of each character are used for creating or moving the tree structure, the final position information is stored by chance. Even if the nodes for are the same, it is possible to store multiple character string expression positions, so by sequentially searching the character strings at each expression position,
The character string with the longest matching character string length can be searched.

【００２０】また、この場合には、請求項１記載の方法
による効果との相乗効果で、ひとつのノードに記憶され
る文字列発現位置の数は、ハッシュテーブルによる方式
より低減し、集中度が低くなる傾向があるため、検索の
高速化が期待できる。また、同一ノードに記憶可能な発
現位置の数に限界を設ける場合は、集中度が低くなる結
果として、集中により捨てられる発現位置が減少するた
め、文字列検索可能性の向上に寄与し、圧縮方式と組み
合わせた場合には圧縮率の向上が期待できる。Further, in this case, the number of character string expression positions stored in one node is smaller than that of the method using the hash table, and the degree of concentration is high, due to the synergistic effect of the method according to the first aspect. Since it tends to be low, high-speed search can be expected. Also, if the number of expression positions that can be stored in the same node is limited, the degree of concentration will decrease, and as a result, the number of expression positions that are discarded due to concentration will decrease, which contributes to the improvement of the character string searchability and compression. When combined with the method, an improvement in compression rate can be expected.

【００２１】請求項３記載の方法によれば、ノード数が
限界設定を超えた時点からは、部分文字列の長さをＮ文
字からＮ−１文字などに動的に変更することで、ノード
数のそれ以上の増加を抑止することで、入力文字列のパ
ターンが多数に渡る場合に、ツリー方式のデータ構造
が、際限なく記憶容量を消費することを抑えることがで
きる。According to the method of claim 3, when the number of nodes exceeds the limit setting, the length of the partial character string is dynamically changed from N characters to N-1 characters and the like. By preventing the number from further increasing, it is possible to prevent the tree-type data structure from endlessly consuming the storage capacity when there are many patterns of the input character string.

【図面の簡単な説明】[Brief description of drawings]

【図１】入力文字列ウィンドウ、文字位置記憶ツリー構
造、複数文字位置記憶の関係図。FIG. 1 is a relationship diagram of an input character string window, a character position storage tree structure, and a plurality of character position storages.

【符号の説明】[Explanation of symbols]

１０１入力文字列ウィンドウ中の現在注目文字位置１０２入力文字列ウィンドウ１１１−１１３文字位置記憶ツリー構造（中間ノー
ド）１１４−１１６文字位置記憶ツリー構造（終端ノー
ド）１２０複数文字位置記憶１２１文字位置記憶の連結リスト101 current character position in the input character string window 102 input character string window 111-113 character position storage tree structure (intermediate node) 114-116 character position storage tree structure (terminal node) 120 multiple character position storage 121 character position storage Linked list

Claims

Translated fromJapanese

【特許請求の範囲】[Claims]

【請求項１】入力バイトの固定長ウィンドウ内におい
て発現する同一文字列の検索において、すでに現時点での検索位置以前に存在した数文字からな
る部分文字列を、その各文字の下位ビットをキーとして
構成される多段のツリー構造によってその部分文字列の
固定長発現ウィンドウ内での発現位置を記憶するステッ
プと、現時点での注目する文字列の頭の部分文字列の各文字の
下位ビットをキーとして前記の文字列発現位置記憶用の
多段のツリー構造をたどることにより、この部分文字列
と同一のツリー構造の特定のノードにすでに同一の部分
文字列で始まる可能性のある文字列の固定長ウィンドウ
内での発現位置が格納されていたかどうか判断するステ
ップと、この対応ノードに以前の発現位置が格納されていた場合
に、その以前の発現位置から始まる文字列と、現在注目
している文字列を比較し、特定の文字数以上の一致をみ
た場合には、これをデータ圧縮に利用するための文字列
検索方法。1. In a search for the same character string that appears in a fixed length window of input bytes, a partial character string consisting of several characters already existing before the current search position is used, with the lower bit of each character as a key. The step of storing the expression position of the substring in the fixed length expression window by the multi-stage tree structure composed, and the lower bit of each character of the substring at the head of the current string of interest as a key By tracing the above-mentioned multi-stage tree structure for storing the character string expression position, a fixed length window of a character string that may already start with the same substring at a specific node of the same tree structure as this substring The step of determining whether the expression position in the node was stored, and if the previous expression position was stored in this corresponding node, the previous expression position was stored. A character string search method for comparing the character string starting from the current position with the character string of interest at present and using it for data compression when a match is found over a specified number of characters.

【請求項２】上記、文字列発現位置記憶用の多段のツ
リー構造の位置記憶を拡張し、単一の位置だけでなく、
部分文字列の各文字の下位ビットをキーとしてツリー構
造をたどった結果、同じノードに割り振られる複数の部
分文字列に対応する固定長ウィンドウ内での複数の発現
位置を記憶するものとし、現在注目している文字列を比
較する時点では、それら複数の発現位置のそれぞれから
始まる文字列と比較し、とくに一致する文字列長の長い
ものの文字列発現位置を利用してデータ圧縮をおこなう
ための上記、請求項１の文字列検索方法。2. The above-mentioned position memory of a multi-stage tree structure for character string expression position memory is expanded to include not only a single position but also a single position.
As a result of tracing the tree structure using the lower bit of each character of the substring as a key, it is assumed that the multiple expression positions in the fixed-length window corresponding to the multiple substrings allocated to the same node are stored. At the time of comparing the character strings that are being compared, it is compared with the character string starting from each of the plurality of expression positions, and the above-mentioned for performing data compression by using the character string expression position of the matching long character string length in particular. The character string search method according to claim 1.

【請求項３】上記、文字列発現位置記憶用の多段のツ
リー構造の最大ノード数に限界を設け、部分文字列の記
憶数が増加したためにツリー構造のノード数がこの限界
に達した場合には、規定の部分文字列長に達しない場合
には、その部分文字列の以降の部分をツリー構造のキー
として使用せず、そこまでの部分文字列によってツリー
構造をたどった結果のノードに文字列の発現位置を格納
することを特徴ととする、上記、請求項１または請求項
２の文字列検索方法。3. When the maximum number of nodes in the multi-stage tree structure for storing character string expression positions is set to a limit, and the number of nodes in the tree structure reaches this limit due to an increase in the number of stored partial character strings, If the specified substring length is not reached, the subsequent part of the substring is not used as a key of the tree structure, and the character string is added to the node resulting from tracing the tree structure by the substrings up to that point. The character string search method according to claim 1 or 2, wherein the expression position of the string is stored.