JP5735539B2


Info

Publication number: JP5735539B2
Application number: JP2012546558A
Authority: JP
Inventors: マツケル、ベン; タル、マーヤン; ラハブ、アビアド
Original assignee: Vaultive Ltd
Current assignee: Vaultive Ltd
Priority date: 2009-12-31
Filing date: 2010-12-30
Publication date: 2015-06-17
Anticipated expiration: 2030-12-30
Also published as: CA2786058A1; CN102782692A; WO2011080745A3; IL220662A; CA2786058C; WO2011080745A2; JP2013516642A; EP2520063A2

Description

Translated from Japanese

The Internet and the World Wide Web allow businesses and organizations to offer services to companies and individuals in the form of digital documents such as web applications, which those companies and individuals can access and use with a personal computer and a web browser. Making such documents, and applications in particular, available over a network is commonly referred to as Software as a Service (SaaS). Examples of applications provided as SaaS include e-mail, instant messaging, productivity tools, customer relationship management (CRM), enterprise resource planning (ERP), human resources applications, blogs, and social networking sites.

This model, however, carries inherent security risks. User data such as messages, customer records, and corporate financial information is stored on a remote server, outside the control of the party that supplied it. Storing personal or corporate information on a remote server exposes the data owner to many risks. It implies that the information owner must trust the entities that own the computer systems hosting the information and the networks connecting the owner to the hosting system.

For example, a well-known accounting software solution requires customers to post their accounting information for storage on the solution provider's servers. In such a system the customer entrusts its accounting information to the solution provider and must therefore give up some control over the privacy and integrity of that information.

Certain software applications use various encryption methods to make data unintelligible to anyone who lacks the proper decryption method or key. For example, the application provider may allow and/or require the information owner to encrypt data exchanged between client and host using Secure Sockets Layer (SSL) encryption or another method. This prevents Internet service providers (ISPs) and other potential eavesdroppers from viewing the data in transit. The data is then decrypted when it arrives at the hosted application, so the vendor of the hosted application can view and manipulate the owner's unencrypted data. This approach therefore exposes sensitive data to the vendor of the hosted application.

Patent Document 1 describes an apparatus and method for selectively encrypting part of the data sent between a client and a server over a network. The apparatus comprises parsing means for separating a first portion of the data from a second portion, encryption means for encrypting only the first portion, and combining means for combining the encrypted first portion with the second portion. The apparatus further comprises decryption means, incorporated in the client, for decrypting the encrypted portion of the data.

Patent Document 2 discloses improvements in computer network encryption using downloaded software objects. That application describes a method and system for protecting highly confidential data, such as financial data, carried in signals exchanged over a public network, such as the World Wide Web, that connects a web server computer and a remote client computer. A desired (typically strong) specific encryption standard to be used for all confidential communications between the web server and the client is determined; the capability to encrypt according to that standard is "pushed" to the client by automatic download from the web server; and a software object executed in the client's web browser performs the encryption and decryption tasks according to the selected standard. Strong encryption is thereby readily assured even if the client did not originally have such a capability.

The problem with applying these techniques to hosted SaaS applications is that such applications require operational information, such as data that can be manipulated over the network, to be left unencrypted so that the application provider can operate on it, which exposes the data to the application provider. Put differently, from the standpoint of security-conscious stakeholders, the data is vulnerable while it is being operated on.

US Pat. No. 7,165,175
International Publication No. 01/047205

FIG. 1 shows a system including an intermediate module and its surroundings, according to one embodiment of the invention.
FIG. 2 shows a data flow from a client terminal to a network node, according to one embodiment of the invention.
FIG. 3 shows a data flow from a network node to a client terminal, according to one embodiment of the invention.
FIG. 4 shows a method of encrypting data and indexing the encrypted data that enables server-side searching, according to one embodiment of the invention.
FIG. 5 shows an example of a normalization process and of input text containing a sentence.
FIG. 6 shows an example of processing a word, according to one embodiment of the invention.
FIG. 7 shows a method of encrypting data that enables server-side sorting of the encrypted data, according to one embodiment of the invention.
FIG. 8 shows a method of generating an order-preserving function, according to one embodiment of the invention.
FIG. 9 shows an example of three order-preserving encryption functions generated using three different keys, according to one embodiment of the invention.
FIG. 10 schematically shows a data flow that enables searching of encrypted user data, according to one embodiment of the invention.

These and other objects, features, and advantages of the invention will become more apparent from the following detailed description considered together with the accompanying drawings. In the drawings, like elements carry the same reference numerals even across different figures.

The following detailed description sets out numerous specific details to provide a thorough understanding of the invention. Those skilled in the art will nevertheless appreciate that the invention may be practiced without these specific details. In other instances, well-known methods, procedures, and components are not described in detail so as not to obscure the invention.

General Data Flow
FIG. 1 shows a system including an intermediate module 200 and its surroundings according to one embodiment of the invention, together with the data flow from a client module on workstation 230 to an application service provider at network node 260.

The intermediate module 200 may include an interception module 210 and a data protection module 220. The intermediate module 200 may be operatively connected, via a network such as public network 250, to a client terminal 230 (e.g., a trusted workstation) and to a network node 260 (e.g., an application service provider). FIG. 1 merely illustrates one embodiment of the invention, and other network configurations are possible. For example, the trusted workstation 230 and the intermediate module 200 may be remote from each other and operatively connected via a trusted network link.

For example, the trusted workstation 230 may be connected to a plurality of intermediate modules corresponding to a plurality of organizations, so that data traffic with one or more application service providers over a public network is mediated.

Note, however, that the intermediate module referred to throughout this application may reside on the client device itself, or may be provided, for example, on a gateway server at the premises associated with the client device, or on one or more independent servers connected to the trusted client device and to the untrusted server.

Accordingly, at least one of the interception module and the data protection module may be built into the trusted workstation as, for example, a browser plug-in, an operating system driver or module, a software library, or another software component.

As another example, the intermediate module may be placed directly in front of the untrusted application so that all access to the untrusted application passes through the intermediate module.

As yet another example, the intermediate module may be an independent server that receives input data from the client module and sends the processed data to the untrusted server.

The trusted workstation 230 may be a client computer that incorporates a client component 240 capable of interacting with the intermediate module. The client component 240 may take the form of the HTML of a web application running in a web browser, while the network node 260 may be the HTTP web server of a SaaS vendor. The client component 240 may include API client software and, additionally or alternatively, any other means of remotely accessing the network node 260.

An end user can use the client component 240 to enter, retrieve, and manipulate data that is to be passed to or read from the network node 260. End users include humans using a software agent (e.g., a web browser) and automated agents using a client API.

The interception module 210 of the intermediate module 200 may intercept, or otherwise receive, (unprocessed) input text from the trusted workstation 230 and supply it to the data protection module 220 for processing. The interception module 210 may intercept data flowing between the client component 240 and the network node 260, and may also modify that data or interrupt the normal data flow. For example, the interception module may initiate an authentication session to determine whether the end user is permitted to access data stored at the network node 260. The interception module 210 may be, or may be implemented by, a web proxy server.

The data protection module 220 may receive the input text and process it selectively. Input text that is not selected for processing may be sent to the network node 260 as unprocessed text, either substantially untouched or with less processing than the selected text, to be manipulated and/or stored in the storage system 270. For text that is selected, the data protection module 220 may process the input text to produce processed text, which may be supplied over the public network 250 to the untrusted application service provider 260 for storage, manipulation, and so on. Thus, according to embodiments of the invention, the application service provider 260 may store and operate on processed text rather than receiving the unprocessed text. As described below, this processing includes producing encrypted text data by applying an encryption method that permits searching and/or sorting. According to embodiments of the invention, the processing may encrypt text selectively, by choosing which input text is sent to the application service provider 260 in processed form and which is sent in unprocessed form.
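The selective processing described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the field names, the `SENSITIVE_FIELDS` policy, and the `protect` helper are hypothetical, and `protect` is a non-secure stand-in (Base64 with a marker prefix) rather than the searchable or sortable encryption the description refers to.

```python
import base64

# Hypothetical policy: which message fields are selected for processing.
SENSITIVE_FIELDS = {"name", "notes"}


def protect(text: str) -> str:
    """Stand-in for a real cipher. NOT secure: it only marks and encodes the text."""
    encoded = base64.urlsafe_b64encode(text.encode("utf-8")).decode("ascii")
    return "ENC:" + encoded


def process_message(fields: dict[str, str]) -> dict[str, str]:
    """Return a copy of the message in which only the selected fields are processed."""
    return {
        name: protect(value) if name in SENSITIVE_FIELDS else value
        for name, value in fields.items()
    }


outgoing = process_message({"name": "Acme Corp.", "page": "2", "notes": "renewal due"})
print(outgoing)
```

In this sketch the `page` field reaches the server unchanged, so the application can still act on it, while `name` and `notes` arrive only in processed form.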

The intermediate module 200 may comprise one or more servers, workstations, personal computers, laptop computers, media players, personal digital assistants, integrated circuits, printed circuit boards, or items of dedicated hardware, or any combination of these.

Data Flow Interruption
The intermediate module 200 may include or provide functionality that is additional to, or unrelated to, encryption and/or decryption. It may also alter the normal message flow between the client (trusted workstation 230) and the server (untrusted application 260). Such additional functionality may serve to compensate for server-side functionality that is lost because of encryption.

According to embodiments of the invention, the intermediate module may receive input data from the client device and block its transmission to the server, for example by preventing or not permitting the transmission. The intermediate module may then apply the relevant function to the input data in place of the server. For example, the intermediate module may generate at least one message to the client device based on the result of that function.

According to some embodiments of the invention, the intermediate module may obtain from the client device a response to the at least one message, process the input text on the basis of that response to obtain processed input text, and send the processed input text to the server.

For example, a server may ordinarily check the spelling of input text and supply the user with feedback messages that flag misspelled words or suggest corrections. When the text the server receives is encrypted, however, the server may be unable to spell-check it without decrypting the processed text. Therefore, according to embodiments of the invention, the intermediate module may apply additional functionality such as spell checking to the input text and supply the user with feedback messages reflecting the result, such as an error message, a suggested spelling correction, or a message that no errors were found.
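A minimal sketch of the spell-check case, assuming a tiny in-memory word list: the intermediate module holds the submission back, checks it, and either returns feedback to the client or lets the text continue to encryption and forwarding. The names `KNOWN_WORDS`, `spell_feedback`, and `handle_submission`, and the shape of the returned dictionaries, are hypothetical.

```python
import re

# Hypothetical dictionary; a real deployment would load a full word list.
KNOWN_WORDS = {"the", "quarterly", "report", "is", "ready"}


def spell_feedback(text: str) -> list[str]:
    """Return words missing from the dictionary, in order of first appearance."""
    unknown: list[str] = []
    for word in re.findall(r"[A-Za-z']+", text):
        if word.lower() not in KNOWN_WORDS and word not in unknown:
            unknown.append(word)
    return unknown


def handle_submission(text: str) -> dict:
    """Decide whether to answer the client directly or to forward the text."""
    misspelled = spell_feedback(text)
    if misspelled:
        # The plaintext never reaches the server; the client is asked to fix it.
        return {"action": "feedback_to_client", "misspelled": misspelled}
    return {"action": "encrypt_and_forward"}


print(handle_submission("The quartely report is ready"))
```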

In one embodiment of the invention, such additional functionality may include, for example, replacing the server-side search function by storing a copy of the user data (or part of it) and performing the search at the intermediate module in response to a search request from the client.

In one embodiment of the invention, such additional functionality may also include initiating an authentication session between the client and the intermediate module before encryption and decryption of user data are enabled.

In one embodiment of the invention, such additional functionality may also include checking the format of the input data and, where necessary, if the input data is in a first format, requesting the client to send the information in a second format different from the first. The received and/or requested formats may include, for example, (a) a delta-encoded format that transmits only the differences from a known version of the input text, (b) the complete input text, (c) input text contained in a particular document format, or a combination of these. For example, the input data may be received in a delta-encoded format, and the intermediate module may request the input data as complete input text. Examples of particular document formats include, but are not limited to, PDF, DOC, and HTML.
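The format check can be illustrated with a short sketch. It assumes, purely for illustration, that each incoming message is a dictionary with a `format` label and a `body`; the labels `delta` and `full_text` and the returned action names are hypothetical.

```python
def check_format(message: dict) -> dict:
    """Decide what the intermediate module does with a message based on its format."""
    fmt = message.get("format")
    if fmt == "delta":
        # A delta only carries differences, so ask the client for the complete text.
        return {"action": "request_resend", "required_format": "full_text"}
    if fmt == "full_text":
        return {"action": "process", "text": message["body"]}
    return {"action": "reject", "reason": f"unsupported format: {fmt}"}


print(check_format({"format": "delta", "body": "+3:abc"}))
print(check_format({"format": "full_text", "body": "Acme Corp."}))
```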

According to embodiments of the invention, the processed text may be stored at the network node 260, for example in the storage system 270, and manipulated remotely over the public network 250. As described below, the processing may be such that searching and/or sorting can be applied to the processed text without the application service provider decrypting the processed data, in a manner that is transparent or invisible to the trusted user and/or the untrusted server application. The storage system 270 is sometimes referred to below as a database, but it may be any suitable digital storage architecture and may reside on any suitable hardware, such as a RAID (Redundant Array of Independent Disks).

Thus, as illustrated by the data flow in FIG. 1, the trusted workstation 230 may supply unprocessed input data, such as "Acme Corp.", for use by the application service provider 260. This input text may be intercepted by the intermediate module 200, for example by the interception module 210, and processed by the data protection module 220. By processing the input text, the data protection module 220 may generate one or more individual text units, called tokens, together with control data that may itself be encrypted, shown schematically as the processed data "DHFOEFRGEJIC", and may send the processed data over the network 250 to the untrusted application service provider 260, where it may be manipulated by the user and/or stored in the database 270. "DHFOEFRGEJIC" is only a schematic representation; any suitable encryption algorithm may be used, and any symbol set may result. As described below, according to one embodiment of the invention, non-Latin characters or symbols, such as Korean or Chinese characters, may be used.

FIG. 2 shows a generalized data flow from the client terminal 230 to the application service provider 260 according to one embodiment of the invention. The end user may supply unencrypted (plaintext) input text. This input data may be sent from the client terminal 230 toward the network node 260 and intercepted by the interception module 210. The interception module 210 may pass the input text to the data protection module 220, which processes the input data and supplies processed data. This processing includes encrypting at least part of the input text. The processed data may then be returned to the interception module 210, sent over the public network 250, received at the network node 260, operated on by an application such as a SaaS application, and stored in the database 270. The input data may be new or updated data to be stored in the storage system 270, or any data passed to the SaaS application for real-time operation, such as one or more parameters of a search command.

FIG. 3 shows a data flow from the network node 260 to the client terminal 230 according to one embodiment of the invention. Such a process may be initiated at the workstation 230 by the user making a retrieval or search request. Request parameters, such as the terms to search for, may be processed as described above with reference to FIG. 2, and the application at the network node 260 may search or sort the processed data on the basis of the processed parameters supplied. The network node 260 may retrieve processed data in response to, for example, a search or retrieval request; this processed data may include encrypted portions. The processed data may be sent over the public network 250 toward the client terminal 230. The interception module 210 may intercept the processed data and pass it to the data protection module 220, which identifies any encrypted data within it. Any encrypted data so identified may be decrypted and returned to the interception module 210, whereupon data communication resumes. The interception module 210 may then forward the unprocessed data (the decrypted plaintext) to the client component 240 for display to the user.
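The return path can be sketched as a scan of the server response for marked tokens, each of which is restored before the response is forwarded to the client. The `ENC:` marker, the `protect` helper, and the Base64 encoding are hypothetical, non-secure stand-ins chosen only so that the example runs end to end; a real system would use an actual cipher and its own token format.

```python
import base64
import re

# Hypothetical token format: "ENC:" followed by URL-safe Base64 text.
TOKEN_PATTERN = re.compile(r"ENC:([A-Za-z0-9_\-]+=*)")


def protect(text: str) -> str:
    """Stand-in for encryption on the outbound path. NOT secure."""
    return "ENC:" + base64.urlsafe_b64encode(text.encode("utf-8")).decode("ascii")


def unprotect(match: re.Match) -> str:
    """Stand-in for decryption of one identified token."""
    return base64.urlsafe_b64decode(match.group(1)).decode("utf-8")


def restore_response(body: str) -> str:
    """Replace every marked token in a server response with its plaintext."""
    return TOKEN_PATTERN.sub(unprotect, body)


response = f"<td>{protect('Acme Corp.')}</td><td>page 2</td>"
print(restore_response(response))
```

The sketch relies on each token being followed by a character outside the Base64 alphabet (here `<`), so that the pattern can tell where a token ends; text that was never processed passes through untouched.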

Tokenization and Normalization in General
An application running at the network node 260 may be asked to search stored data and return results. FIG. 10 schematically shows a data flow that enables searching of encrypted user data according to one embodiment of the invention.

First, the client 240 may enter data and issue multiple storage requests to the untrusted application 260 via the intermediate module 200. The intermediate module encrypts the user input so that every searchable word is mapped to an encrypted searchable word, with each searchable input word having exactly one corresponding encrypted searchable word. Searchable words may be normalized before they are encrypted.

For example, in FIG. 10 the words "BAD", "Bad", and "bad" are all encrypted to the word "cccc", so a search for "bad" returns results that include "BAD" and "Bad".

In FIG. 10, the words "the" and "a" are considered unsearchable and therefore do not become individual encrypted searchable tokens. By contrast, the words "dog" and "cat" are mapped to the encrypted searchable words "eeee" and "bbbb", respectively. Information that includes the case markers of the searchable words and the unsearchable words is contained in the encrypted tokens "ZZZytuv" and "ZZZabcd".
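The one-to-one mapping in the FIG. 10 example can be sketched with a keyed hash. This is a stand-in technique, not the encryption method of the patent: HMAC-SHA256 is deterministic, so equal normalized words yield equal tokens, but it is one-way and cannot be decrypted. The key, the token length, and the helper names are hypothetical.

```python
import hashlib
import hmac

SECRET_KEY = b"demo-key-held-by-the-intermediate-module"  # hypothetical key
UNSEARCHABLE = {"the", "a"}  # words treated as unsearchable in the FIG. 10 example


def searchable_token(word: str) -> str:
    """Map a word to a deterministic token; case is normalized away first."""
    normalized = word.lower()
    digest = hmac.new(SECRET_KEY, normalized.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def index_tokens(text: str) -> list[str]:
    """Return the searchable tokens for a text, skipping unsearchable words."""
    return [
        searchable_token(word)
        for word in text.split()
        if word.lower() not in UNSEARCHABLE
    ]


stored = index_tokens("the BAD dog")
query = searchable_token("bad")
print(query in stored)
```

Because the server only ever compares tokens for equality, it can match a query for "bad" against stored text containing "BAD" without seeing either word.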

FIG. 4 is a schematic diagram of a data processing method 100, according to one embodiment of the invention, designed to enable server-side searching and/or indexing of user text data. As noted above, method 100 may be applied by the intermediate module, for example by its data protection module. The method for receiving processed data and converting it back to unprocessed data may be substantially the reverse of this method.

Method 100 begins at step 110, in which an input message is received, for example by an intermediate module operatively connected between the client terminal and the network node.

At step 111, the individual data units within the input message that are candidates for processing are identified. For example, an input message may contain fields such as first name, last name, and document body.

At step 112, the method iterates over all identified data units: an unprocessed data unit is first obtained (step 113), and a choice is made whether to process it. The data units may be processed individually or together.

At step 114, it is determined whether the input data is to be processed; input data that is not to be modified is retained as is (step 130). At step 115, it is determined whether the text of the data unit should be processed and/or which parts of it should be processed. Parts of the input text that are unsuitable for encryption include, for example, search connectives such as "OR" and "AND", and application-specific meaningful text markup, such as "{important}" or "@location", that indicates special server-side handling to be applied to the data.
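The decision at step 115 can be sketched as a classification of each whitespace-separated part of the text. The pattern below is an assumption built only from the examples in the text (the connectives "AND" and "OR", brace markup such as "{important}", and "@" tags such as "@location"); a real application would supply its own rules, and the name `classify_parts` is hypothetical.

```python
import re

# Hypothetical pass-through rules derived from the examples in the text.
PASS_THROUGH = re.compile(r"AND|OR|\{[^{}]+\}|@\w+")


def classify_parts(text: str) -> list[tuple[str, bool]]:
    """Pair each part of the text with a flag saying whether it should be processed."""
    return [
        (part, PASS_THROUGH.fullmatch(part) is None)
        for part in text.split()
    ]


for part, should_process in classify_parts("invoice OR receipt @location {important}"):
    print(part, "->", "process" if should_process else "leave as is")
```

Matching is case-sensitive in this sketch, so a lowercase "or" would be treated as ordinary text and processed.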

Input text that is to be processed proceeds to step 116, where it is divided into individual text units called tokens (the process of determining tokens from input text is referred to herein as tokenization). Tokenization is optional, and method 100 may include (a) encrypting all of the input data as a single token, (b) individually encrypting the portions of input data determined to be suitable for encryption, yielding a plurality of processed tokens (each processed token representing one piece of input text), or (c) a combination thereof.

Next, in step 117, particular input tokens may be evaluated as unsuitable for searching. For example, the criterion for judging an individual word may be a predetermined list of words, a word frequency threshold applied to a word frequency list such as an English dictionary frequency list, the word length, or a combination thereof.

In step 118, information that is not significant for searching is extracted from the searchable input tokens, for example letter case, diacritics, ligature splitting, and the composition or decomposition of Unicode characters (as defined by the Unicode Standard). The extracted information may be stored in a separate location for later use, or placed in an output token referred to as a control token. The text token may also be converted into a normalized form that does not contain the extracted information. This process is referred to herein as normalization. Normalization is optional and may be performed in any suitable manner.
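
The normalization of step 118 can be sketched in Python as follows. This is a minimal illustration rather than the method of the specification: the function name `normalize_token` and the choice of NFKD decomposition followed by `casefold()` are assumptions, and the extracted information is simply the original spelling, which a control token could carry.

```python
import unicodedata


def normalize_token(token: str) -> tuple[str, str]:
    """Return (normalized form, extracted information) for one token.

    Hypothetical helper: the extracted information is just the original
    spelling, so that the intermediate module could restore it later.
    """
    # NFKD splits ligatures and separates base letters from diacritics.
    decomposed = unicodedata.normalize("NFKD", token)
    # Drop the combining marks (the diacritics) left by the decomposition.
    without_marks = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    # casefold() removes case distinctions for matching purposes.
    normalized = without_marks.casefold()
    return normalized, token
```

With this sketch, tokens that differ only in case, diacritics, or ligatures share one normalized form, so a deterministic transformation of that form would let the server match them.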

In step 119, a bit representation is obtained for every information unit that is to be encrypted by the encryption method, such as the searchable tokens, the information extracted from the searchable tokens, and the remaining parts of the input. Information units may be classified as searchable or non-searchable. Non-searchable information units may be combined or split. The order of the searchable tokens in the input text may also be changed, and an indicator of the original order may be added to the non-searchable information units.

In step 120, the information units are encrypted using an encryption method such as AES (Advanced Encryption Standard) or DES (Data Encryption Standard).

In step 121, the encrypted bit representation is converted into an output text unit consisting of a sequence of characters drawn from a character set, for example one or more predetermined contiguous portions of Unicode, as described in detail below. This character set may be defined in advance to aid decoding.

In step 122, the input data unit in the input message is replaced with the output text obtained in step 121.
The method continues applying steps 112 to 122 to all identified input units, and then transmits the processed message to the network node hosting the server application (step 131).

Tokenization
As described above, the data processing method may include tokenization, and tokenization may include a number of steps. It will be appreciated that some of the steps described below in connection with tokenization are optional. It will further be appreciated that detokenization, that is, the conversion of tokenized processed data back to unprocessed data, may be substantially the reverse of the method described above.

To enable searching over encrypted user data, the input text may be divided into a number of segments in a process referred to as tokenization. Segments containing individually searchable terms are referred to as (unprocessed) input tokens; typically, an input token is a whole word. Input segments that are not tokens are added to an information set referred to as the "non-searchable information set". Such segments include characters such as punctuation marks and whitespace.

In connection with tokenization, several words may be combined into a single token, or a single word may be split into two or more constituent tokens. For example, the compound word "whiteboard" may be decomposed into the individually searchable tokens "white" and "board". As another example, languages such as Chinese and Japanese do not normally use whitespace or any other explicit character to separate words in written text. A single Chinese input text may therefore be divided into a plurality of input tokens. An indicator of such combining or splitting may be added to the non-searchable information set.

Tokenization may include detecting morphological variants of a word, changing the input token to a normalized form, and adding an indicator of the original input token to the non-searchable information set. Examples of morphological variants include the plural and singular forms of nouns ("word", "words") and verb conjugations ("cry", "cried", "crying").

Tokenization may include detecting words that are unlikely to be searched for, removing them from the set of searchable input tokens, and adding them to the non-searchable information set. Such detection may use, for example, (a) a predetermined set of words, (b) a dictionary having a word frequency list and a frequency threshold (words whose frequency exceeds the threshold are considered non-searchable), (c) at least one of a minimum and a maximum length for searchable words, or (d) any combination thereof.
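
The detection criteria listed above can be combined as in the following Python sketch. All names and values here (`STOP_WORDS`, `WORD_FREQUENCY`, `FREQUENCY_THRESHOLD`, the length bounds, and both functions) are hypothetical and chosen only for illustration; the specification does not prescribe them.

```python
# Hypothetical configuration; real lists and thresholds would be tuned.
STOP_WORDS = {"the", "and", "of"}
WORD_FREQUENCY = {"the": 0.05, "report": 0.0001}  # relative frequencies
FREQUENCY_THRESHOLD = 0.01
MIN_LEN, MAX_LEN = 3, 32


def is_searchable(word: str) -> bool:
    """Apply criteria (a) word set, (b) frequency, and (c) length."""
    w = word.lower()
    if w in STOP_WORDS:
        return False
    if WORD_FREQUENCY.get(w, 0.0) > FREQUENCY_THRESHOLD:
        return False
    return MIN_LEN <= len(w) <= MAX_LEN


def split_tokens(words: list[str]) -> tuple[list[str], list[str]]:
    """Return (searchable tokens, words for the non-searchable set)."""
    searchable = [w for w in words if is_searchable(w)]
    non_searchable = [w for w in words if not is_searchable(w)]
    return searchable, non_searchable
```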

Tokenization may support server-side searching and/or indexing that ignores certain character properties such as letter case, diacritics, ligatures, or Unicode composition/decomposition. For example, searches for "ToKeN" and "tOkEn" may return the same results, with every string containing a variant of the word "token" appearing in the search results.

Such property-insensitive searching may be supported by (1) converting every input character to a single canonical form, (2) generating an indicator of the original characters, and (3) adding this indicator to the non-searchable information set. For example, tokenization may support case-insensitive searching on the server side by converting the characters of an input token to a single case (for example, lowercase) and adding an indicator of the original case to the non-searchable information set.
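
As a small illustration of the case example, the following Python sketch lowercases a token and records a per-character case indicator from which the original can be rebuilt. The function names are hypothetical, and the sketch assumes characters whose lowercase form is a single character (plain ASCII, for instance).

```python
def to_canonical(token: str) -> tuple[str, list[bool]]:
    """Lowercase the token and record which characters were uppercase."""
    case_indicator = [ch.isupper() for ch in token]
    return token.lower(), case_indicator


def restore_case(canonical: str, case_indicator: list[bool]) -> str:
    """Rebuild the original token from its canonical form and indicator."""
    return "".join(
        ch.upper() if was_upper else ch
        for ch, was_upper in zip(canonical, case_indicator)
    )
```

The canonical form would go into the searchable token, while the indicator would travel in the non-searchable information set.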

Text markup and extended information
According to one embodiment of the present invention, processing the input text may include detecting application-specific text (at least one processing instruction). These processing instructions may be appended to the non-deterministically transformed text, or the information may be left as plain text in the processed text, so that the untrusted server can apply whatever processing is associated with this text extension information. For example, HTML is a text extension that allows formatting information to be added to user text by embedding HTML tags in the text. The system may handle input HTML tags in at least one of the following ways: (1) adding the HTML tags to the non-searchable information; (2) including the input HTML tags unencrypted in the processed output text, thereby permitting server-side processing; and (3) treating the HTML tags as ordinary text (for example, applying to the HTML tags whatever processing is applied to non-tag input text).

According to some embodiments of the invention, when at least one processing instruction is detected in the input text, the intermediate module may decide not to transform that processing instruction.

According to some embodiments of the invention, when at least one processing instruction is detected in the input text, the intermediate module may decide to transform that processing instruction non-deterministically.

The system may add contextual information, such as the time, the user, or other information known to the system when the processed text is generated, to the non-searchable information set.
For example, according to an embodiment of the present invention, the system may attach a special indicator such as "important" or "confidential" to an encrypted token. On decryption, these indicators are then noticed, an event indicating that the input information was decrypted is generated, and the event is handled, for example by adding a record to a log file.

Token ordering
Processing the input text may include changing the order of the input tokens in the processed text. When the order is changed, a token order indicator describing the order of the input tokens in the original input text may be generated and added to the non-searchable information set.
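
One way to picture the token order indicator is as a permutation stored alongside the shuffled tokens. The Python sketch below is only an assumption about how this might look: `reorder` and `restore_order` are hypothetical names, and a deployment would presumably use a cryptographically strong random source rather than the seeded generator used here for repeatability.

```python
import random


def reorder(tokens: list[str], rng: random.Random) -> tuple[list[str], list[int]]:
    """Shuffle tokens; order[k] is the original position of shuffled[k]."""
    order = list(range(len(tokens)))
    rng.shuffle(order)
    shuffled = [tokens[i] for i in order]
    return shuffled, order


def restore_order(shuffled: list[str], order: list[int]) -> list[str]:
    """Invert the permutation recorded in the token order indicator."""
    original = [""] * len(shuffled)
    for k, position in enumerate(order):
        original[position] = shuffled[k]
    return original
```

Here `order` plays the role of the token order indicator and would be placed, encrypted, in the non-searchable information set.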

Surplus tokens
Processing the input text may include generating at least one fake, or pseudo, surplus token to be included in the output text. Such pseudo tokens can make the encrypted text more robust against statistical analysis. The surplus pseudo tokens may be given a chosen target statistical distribution, which conceals the pseudo tokens and makes decryption by statistical analysis more difficult. The at least one surplus token can be distinguished from the other tokens in the processed text only with access to the secret key. For example, English word frequencies may be used as a model for the target distribution of the pseudo tokens.

Tokenization process
The non-searchable information set may be placed in one or more non-searchable tokens (also referred to herein as control tokens), and these tokens may be included in the processed output text. A control token may be placed before, after, or within the set of normalized input tokens. The non-searchable information set may be encrypted in whole or in part before being included in the processed output text.

Before encryption, a bit representation of the non-searchable information set and of the searchable tokens may be obtained. Obtaining this bit representation may include compressing and encoding the input data with particular encoding and compression methods.

An error detection indicator may be generated and added to the non-searchable information set. For example, a checksum of the input text may be computed and added to the non-searchable information set.
The bit representation obtained for the input tokens may be encrypted in whole or in part, possibly together with the non-searchable information set. Encryption of searchable input tokens yields a single encrypted form for every instance of a given input token. Encryption of non-searchable information, by contrast, yields one or more encrypted forms for instances of the same information set. Multiple encrypted forms may improve security, but may make it difficult or impossible to perform certain server-side operations without decrypting the user data. Multiple encrypted forms may be produced using an encryption salt of at least one bit embedded in the encrypted form.

The encrypted form may then be converted to text form using a suitable encoding method. Such an encoding method may have at least one of the following properties: (a) it may separate the encrypted tokens so that the untrusted server application can determine the searchable units within the processed text; (b) it may use a set of characters that the untrusted server application does not treat as delimiting searchable units (for example, the character "+" may be used by the untrusted server application to separate words and may therefore be unsuitable for encoding encrypted tokens; likewise, if both English and Hebrew characters are used, the application may separate the sequences of the two sets); (c) it may produce a compact representation so that server-side length limits are less likely to be reached; and (d) it may use algorithms that are efficient for encoding and decoding in the intermediate module.

According to some embodiments of the present invention, the processed text may contain a string of characters selected from a predetermined character set, for example a character set that includes at least one contiguous subset of the Unicode character set. In some embodiments, the at least one contiguous subset may contain characters in the letter category, the digit category, or both. In some embodiments, the characters chosen for use in the processed text may be selected from multiple contiguous subsets of the Unicode character set; for example, two, three, four, or five distinct subsets may be selected. In some embodiments, the number of subsets of the Unicode character set may be greater than 1 and at most 10.

In some embodiments of the present invention, the subsets of the Unicode character set may be one or more subsets selected from Hangul, the Chinese, Japanese, and Korean (CJK) Unified Ideographs, and combinations thereof. For a server application that stores user input in UTF-16 encoding, Korean characters may be used, for example. The Korean characters, which consist only of letter characters, occupy a single range of the Unicode character set, which allows efficient encoding and decoding. The Chinese character set may be used for the same reason. Although the Chinese character set spans a larger range than the Korean one, its use may be unsuitable for server applications that search and/or index each Chinese character individually.
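
To make the Hangul example concrete, the following Python sketch packs encrypted bytes into characters of the Unicode Hangul Syllables block (U+AC00 to U+D7A3, 11,172 code points). It is a sketch under stated assumptions, not the encoding of the specification: it uses 13 bits per character (8,192 values, which fit inside the block), and the function names and the leading marker byte are illustrative choices.

```python
HANGUL_BASE = 0xAC00      # first code point of the Hangul Syllables block
BITS_PER_CHAR = 13        # 2**13 = 8192 values, fewer than the 11172 available


def encode_hangul(data: bytes) -> str:
    """Encode bytes as Hangul syllables, 13 bits per character."""
    # A leading 0x01 marker keeps leading zero bytes from being lost.
    n = int.from_bytes(b"\x01" + data, "big")
    chars = []
    while n:
        n, digit = divmod(n, 1 << BITS_PER_CHAR)
        chars.append(chr(HANGUL_BASE + digit))
    return "".join(reversed(chars))


def decode_hangul(text: str) -> bytes:
    """Invert encode_hangul; raise ValueError on foreign characters."""
    n = 0
    for ch in text:
        digit = ord(ch) - HANGUL_BASE
        if not 0 <= digit < (1 << BITS_PER_CHAR):
            raise ValueError(f"unexpected character {ch!r}")
        n = (n << BITS_PER_CHAR) | digit
    raw = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return raw[1:]  # strip the 0x01 marker
```

Every output character is a letter occupying one UTF-16 code unit, so a server that splits text on punctuation or whitespace would be expected to treat the whole string as a single word.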

For a server application that stores user input in UTF-8 encoding, BASE64 encoding may be used, for example, optionally in a modified form. Standard BASE64 includes the characters "+" and "/", which may lead the server application to conclude that a single encrypted token comprises one or more encrypted words, that is, to split the token at those characters.

A whitespace character may be used, for example, to separate encrypted tokens. Where whitespace is not expected, such as in an e-mail address field, another character such as a period "." may be used to separate the encrypted tokens.

When processed output text is sent back from the untrusted server, it may be embedded in unencrypted text when it is received at the intermediate module. To trigger decryption, the system may generate statistically significant features in the processed text. For example, the system may include rare characters, or rare combinations of characters, in the processed text, and search for them in order to detect encrypted text within unencrypted text.

According to some embodiments of the invention, the processed output text may be arranged in two or more output tokens so that no output token exceeds a particular length limit. For example, a length limit of 50 characters may be applied to the first output token and a length limit of 1000 characters to subsequent output tokens.
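
The length-limit example can be sketched in a few lines of Python. The function name `split_output` is hypothetical; the default limits of 50 and 1000 characters are the figures from the example above.

```python
def split_output(text: str, first_limit: int = 50, next_limit: int = 1000) -> list[str]:
    """Split processed text into output tokens that respect length limits."""
    tokens = [text[:first_limit]]
    rest = text[first_limit:]
    for start in range(0, len(rest), next_limit):
        tokens.append(rest[start:start + next_limit])
    return tokens
```

Joining the returned tokens in order reproduces the original processed text, which is what the reverse process would rely on.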

Combining deterministic and non-deterministic encryption
Some embodiments of the invention may use a deterministic transformation of the input text, a non-deterministic transformation, or a combination thereof. In embodiments of the invention, it is first determined whether the input data (or a portion of it) is to be transformed deterministically, non-deterministically, or by a combination of the two; then, based on that determination, the input text is transformed accordingly using at least one secret key to obtain processed text, and the processed text is transmitted to the server.

As used herein, a non-deterministic transformation of input text is a transformation whose result is one of a plurality of possible outputs. A deterministic transformation of input text is a transformation that has exactly one possible output. In general, either kind of transformation may use, or depend on, a secret key to determine the one or more possible outputs.

According to embodiments of the present invention, a deterministic token representation may be obtained, for example, by applying reversible encryption that depends on a secret key, or by performing irreversible encryption using a secret key. A non-deterministic token representation may be obtained, for example, by applying a symmetric encryption algorithm with the secret key, by applying an asymmetric encryption algorithm using the private key of a public/private key pair as the secret key, or by performing another reversible transformation that depends on the secret key.

In some embodiments of the present invention, the server may provide a search function over previously entered input text. In such cases, the intermediate module may choose to transform the individual searchable tokens in the input text deterministically. This deterministic transformation allows future search queries containing the processed searchable terms to be handled correctly by the server. Part of the input text may also be transformed non-deterministically, for example to improve security. According to embodiments of the present invention, deterministically transforming part of the input text may support server-side functions that require an exact match between repeated instances of that part. For example, when a server compares multiple revisions of an input text that differ only slightly from one revision to the next, the server may perform a word-by-word or line-by-line difference analysis. In such a case, deterministically transforming the words or lines of the input text yields correct matching behavior on the server.

For example, processing the input text in one embodiment of the present invention may include (1) non-deterministically encrypting part or all of the input text into one or more processed tokens; (2) generating processed tokens corresponding to suitable input tokens from part or all of the input text (for example, after the input text has been tokenized, normalized, and so on); and (3) including the non-deterministically and deterministically transformed processed data in the processed output text, which is sent to and stored at the network node. According to some embodiments of the invention, the decision whether to transform the input text deterministically, non-deterministically, or by a combination thereof may be based on whether a given word is a member of a word set. In this way, for example, input tokens that are to be available for searching may be transformed deterministically so that the words can be searched. For records found by a search, the processed input text, including the deterministically or non-deterministically transformed processed data, may be returned as the search result. Input tokens that are not to be available for searching, by contrast, need not be transformed deterministically.

In some embodiments of the present invention, the decision whether to transform the input text deterministically, non-deterministically, or by a combination thereof may be based on the length of the word. Thus, for example, it may be decided to transform a word of the input text non-deterministically on the basis of its length. As one example, short words of two characters or fewer may be transformed non-deterministically, and longer words of three characters or more may be transformed deterministically. With this approach, short words below the minimum number of characters may not be searchable.
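
The length-based choice between the two kinds of transformation can be illustrated with the Python sketch below. It uses only the standard library and makes several assumptions that are not in the specification: the deterministic transformation is a truncated HMAC-SHA256 (an irreversible keyed transformation), and the non-deterministic one is a toy stream cipher built from HMAC-SHA256 with a random nonce, standing in for a real cipher such as AES. It has no integrity protection and is not meant for production use. All function names are hypothetical.

```python
import hashlib
import hmac
import secrets


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Derive a keystream from HMAC-SHA256 in counter mode (toy stand-in)."""
    out = b""
    counter = 0
    while len(out) < length:
        block = nonce + counter.to_bytes(4, "big")
        out += hmac.new(key, block, hashlib.sha256).digest()
        counter += 1
    return out[:length]


def transform_deterministic(key: bytes, word: str) -> str:
    """Same key and word always give the same token (irreversible)."""
    return hmac.new(key, word.encode("utf-8"), hashlib.sha256).hexdigest()[:32]


def transform_nondeterministic(key: bytes, word: str) -> str:
    """A fresh random nonce gives a different token on each call."""
    nonce = secrets.token_bytes(8)
    data = word.encode("utf-8")
    stream = _keystream(key, nonce, len(data))
    return (nonce + bytes(a ^ b for a, b in zip(data, stream))).hex()


def recover_nondeterministic(key: bytes, token: str) -> str:
    """Reverse transform_nondeterministic using the embedded nonce."""
    raw = bytes.fromhex(token)
    nonce, body = raw[:8], raw[8:]
    stream = _keystream(key, nonce, len(body))
    return bytes(a ^ b for a, b in zip(body, stream)).decode("utf-8")


def transform_word(det_key: bytes, nondet_key: bytes, word: str,
                   min_searchable_len: int = 3) -> str:
    """Short words are transformed non-deterministically, longer ones deterministically."""
    if len(word) < min_searchable_len:
        return transform_nondeterministic(nondet_key, word)
    return transform_deterministic(det_key, word)
```

Because the token for a word of three or more characters is stable, the server can match it against a later query; a two-character word yields a different token each time and therefore cannot be found by search.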

In one embodiment of the present invention, the non-deterministic transformation may be performed using a first key and the deterministic transformation using a second key.
In some embodiments of the invention, the first key and the second key may be identical. In other embodiments of the present invention, the first key and the second key may differ.

In some embodiments of the invention, if the total length of the output text exceeds a length limit, one or more deterministically generated tokens may be omitted or deleted. In some embodiments of the invention, it may also be decided not to transform at least part of the input text.

The process of reading processed text according to embodiments of the present invention may operate substantially in reverse. That is, the processed text may be received at the intermediate module, and an appropriate inverse process applied to it to obtain the original input text. In some embodiments of the present invention, the original input text may be transmitted or otherwise supplied to the client device, for example to be displayed or presented to a user or application operating the client device.

Processing search queries
The input text received at the intermediate module may be a search query containing at least one search term. The input text of the search query may be processed at the intermediate module so as to (a) facilitate correct search functionality at the network node and (b) allow the intermediate module to decrypt the search query if the network node sends its text back to the client. A search query is generally processed at the network node in the same way as other input text, but additional processing steps may be applied.

In some embodiments of the invention, transforming the input text may include deterministically transforming at least one search term of the search query using a first key, thereby generating at least one deterministically transformed search term. Transmitting the processed input text to the server may then include transmitting a plurality of deterministically transformed search terms to the server. In some embodiments of the present invention, the multiple search terms of a search query may be handled and transformed separately.

In some embodiments of the invention, the processed search query may contain substantially only deterministically transformed search terms, and the deterministic transformation may be reversible. The network node may search for the processed terms and return the result set to the client. The intermediate module may recover the original input text from the processed search terms.

In some embodiments of the invention, transforming the search query may include non-deterministically transforming substantially the entire search query using a second key to generate non-deterministically transformed text, and combining the at least one deterministically transformed search term with the non-deterministically transformed text using a disjunction operator (for example, an "OR" operator) to obtain combined processed text; transmitting the processed input text to the server then includes transmitting the combined processed text to the server. The network node searches separately for the processed search terms and for the non-deterministically processed text; it obtains results (or no hits) for the deterministically transformed search terms, but obtains no results for the non-deterministically transformed text. The search therefore returns the results for the processed search terms. Using this method according to an embodiment of the invention, the intermediate module may receive the non-deterministically transformed text back from the network node and recover from it the original input text of the search query.
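
The combined query described above might be assembled as in the following Python sketch. The function name, the parenthesized query syntax, and the two transformation callables are all assumptions made for illustration; the actual query grammar depends on the server application.

```python
from typing import Callable


def build_processed_query(
    query: str,
    det_transform: Callable[[str], str],
    nondet_transform: Callable[[str], str],
) -> str:
    """Join deterministically transformed terms with the whole query,
    transformed non-deterministically, using an OR operator."""
    det_terms = [det_transform(term) for term in query.split()]
    whole_query = nondet_transform(query)
    return "(" + " ".join(det_terms) + ") OR " + whole_query
```

The right-hand operand matches nothing on the server, so it does not change the result set; it only lets the intermediate module recover the original query if the server echoes the query text back.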

Repository of processed text
Some network node servers may return incomplete results in response to requests such as queries. For example, if the result of a search query is a 100-character field, the server may return only the first 20 characters of the field. If the user then selects the matching record, the server provides the entire field. According to embodiments of the present invention, the intermediate module is able to operate under such constraints. Also according to embodiments of the present invention, when the server truncates units of processed text, these units may be individual tokens within the processed text, the processed text as a whole, or both.

According to embodiments of the present invention, this problem may be solved, for example, by providing a repository of processed text in the intermediate module or in storage that the intermediate module manages, controls, or can access. Before the original input text is obtained in the decryption step, the system may attempt to restore the truncated text as follows. (1) The intermediate module may store complete processed text units in trusted storage in the decryption step, for example without going through the untrusted server or storage attached to it. (2) When truncated processed text is sent from the server and received by the intermediate module, the trusted storage unit is consulted to determine whether one or more complete processed text units match or correspond to the truncated processed text unit. (3) If so, the intermediate module replaces the truncated processed text unit with the corresponding complete processed text unit to obtain restored processed text. (4) The restored processed text is processed by the inverse processing method (for example, decryption using a secret key) to obtain the original input text. The original, unprocessed input text may then be supplied to the client device as required.
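Steps (2) and (3) above can be sketched as a prefix lookup against a trusted store. The class and function names are hypothetical, the in-memory set stands in for whatever trusted storage a deployment uses, and the rule of only substituting when exactly one stored unit matches is an assumption; the final decryption of step (4) is left out.

```python
class ProcessedTextStore:
    """Trusted-side repository of complete processed text units (illustrative)."""

    def __init__(self):
        self._units = set()

    def remember(self, unit: str) -> None:
        # Called whenever the intermediate module holds a complete unit.
        self._units.add(unit)

    def restore(self, fragment: str):
        """Return the unique complete unit starting with `fragment`, else None."""
        matches = [u for u in self._units if u.startswith(fragment)]
        return matches[0] if len(matches) == 1 else None


def restore_text(store: ProcessedTextStore, received: str) -> str:
    """Replace each truncated token by its complete form when one is known."""
    return " ".join(store.restore(tok) or tok for tok in received.split(" "))
```

Tokens with no match, or with an ambiguous match, are passed through unchanged, so the later inverse-processing step can decide how to handle them.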

本発明の一部の実施形態において、保存場所に格納されるものは、処理済みテキストに付随する少なくとも１つの完全な処理済み要素であってもよい。たとえば、処理済み要素としては、上記の処理済みテキスト全体であってもよいし、処理済みテキストに含まれる単語またはその他の部分であってもよい。 In some embodiments of the invention, what is stored in the storage location may be at least one complete processed element associated with the processed text. For example, the processed element may be the entire processed text, or a word or other part included in the processed text.

Systems and methods using such a repository may be applied to any suitable request from a client device, such as a search request, a record request, or a report request.
Detecting untrusted-server transformations using baits
An untrusted server may frequently apply one or more of many possible transformations to instances of processed user data. Such transformations may be expected by the client component on the trusted workstation, but the intermediate module described herein may be unaware of them. According to embodiments of the present invention, the intermediate module may therefore use a method for inferring the kind of transformation applied to the processed user data.

According to one embodiment of the invention, the intermediate module may add extra information, referred to herein as a bait, at a known location in the encrypted user data. When processed user data is received at the intermediate module, the bait may be used to infer the kind of transformation that was applied to it. Examples of transformations for which baits can be used include, but are not limited to, particular character encoding methods and the removal of HTML tags.

For example, an untrusted server may apply various context-dependent encoding methods, or combinations of them, to the encrypted user data it receives. When ciphertext is received at the intermediate module from the untrusted server, it may have been encoded with any one of the many encoding methods the untrusted server application uses to communicate with the client component on the trusted workstation. The encoding method may or may not be indicated in the message generated by the server. The client component is normally aware of the server component and can reliably know which encoding method is used. The intermediate module, however, may not know the particular encoding used for each instance of ciphertext. Even so, when it decrypts user data before supplying it to the client component, the intermediate module according to embodiments of the present invention should use the same encoding method that the server applied and the client expects. If the intermediate module does not know the encoding method used by the untrusted server and the trusted workstation, information may be lost or distorted by the module's processing and reprocessing.

To facilitate detection of the encoding method, the intermediate module may add predetermined characters, known as an encoding bait, to the encrypted text. The encoding bait may be encoded by the server together with the encrypted user data before being supplied to the client component. When the intermediate module detects an encrypted token, it may examine the encoding bait to infer which encoding method was used to encode that instance of ciphertext. The intermediate module may then encode the decrypted text of the processed message using the inferred encoding method. Examples of encoding methods include, but are not limited to: (a) UTF-8 encoding; (b) encoding using HTML escape sequences followed by UTF-8; and (c) encoding using JavaScript escape sequences, followed by a second application of JavaScript escaping and then Latin-1 (ISO-8859-1) encoding. JavaScript escaping usually works by replacing a character with a backslash and another character. For example, the newline character is replaced with a backslash and the character "n", that is, the string "\n".

In some embodiments of the present invention, the bait may be used to detect at least one transformation that includes replacing at least one convertible character of the processed text with a matching replacement character or replacement character string (for example, one or more escape characters).

An example is given here using an encoding bait consisting of an angle bracket "<" and a backslash "\". The user may enter the string "This ' is a quote". This string is encrypted to, for example, "QIFJDJNZOP". During encryption, the bait is added to the encrypted token, so that "QIFJDJNZOP" becomes "<\QIFJDJNZOP" ("<\" is the bait). The server may receive the encrypted string and send it to the client in a JavaScript file. In a JavaScript file the server needs to escape only the backslash, not the angle bracket. The message sent to the client therefore contains "<\\QIFJDJNZOP", in which the bait's original backslash has been escaped by another backslash. When the intermediate module detects an encrypted token in a message that begins with the original angle bracket and an escaped backslash, it may infer that the token has been JavaScript-escaped. The intermediate module then decrypts QIFJDJNZOP to "This ' is a quote". Having inferred that the client expects JavaScript-escaped text, however, the module may encode the decrypted string using JavaScript escaping, for example escaping the quote to produce "This \' is a quote". In this way the decrypted quote follows the encoding rules inferred from the encoding bait. The decrypted and re-encoded string is then forwarded to the client.
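The encoding-bait example above can be sketched as follows. The function names and the small set of recognized encodings are hypothetical, and `js_escape` covers only the backslash, single quote, and newline, which is an assumption about what the server escapes rather than a complete JavaScript escaper.

```python
import html

BAIT = "<\\"  # an angle bracket followed by a single backslash


def add_bait(token: str) -> str:
    """Prefix the encrypted token with the encoding bait before sending it out."""
    return BAIT + token


def js_escape(text: str) -> str:
    """Minimal JavaScript-style escaping: backslash first, then quote and newline."""
    return text.replace("\\", "\\\\").replace("'", "\\'").replace("\n", "\\n")


def detect_encoding(received: str) -> str:
    """Infer what the server did by looking at how the bait came back."""
    if received.startswith("<\\\\"):    # backslash was doubled
        return "javascript"
    if received.startswith("&lt;\\"):   # angle bracket became an HTML entity
        return "html"
    if received.startswith(BAIT):       # bait came back untouched
        return "plain"
    return "unknown"


def reencode(plaintext: str, encoding: str) -> str:
    """Apply the inferred encoding to the decrypted text before forwarding it."""
    if encoding == "javascript":
        return js_escape(plaintext)
    if encoding == "html":
        return html.escape(plaintext, quote=False)
    return plaintext
```

The doubled-backslash check has to come before the plain check, because a JavaScript-escaped bait also starts with the unescaped bait's two characters.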

Another case in which baits can be used is HTML transformation, of which HTML tag removal is a special case. An untrusted server may receive text enriched with HTML markup, generate instances of the received text from which all or some of the HTML tags have been removed, and return these instances to the client component. In this case, the intermediate module may include an HTML tag bait in the processed user data. When the processed user data is received, the intermediate module removes the HTML tag bait and, based on whether the bait is present or absent, infers whether HTML tags should be removed from the decrypted user data. The decrypted HTML tags in the message returned to the client component are then retained or removed accordingly.

一部の実施形態においては、複数のベイトを処理済みテキストに付加することにより、非トラステッドサーバが適用する複数の変換または符号化方法を検出するようにしてもよい。 In some embodiments, multiple transformations or encoding methods applied by the untrusted server may be detected by adding multiple baits to the processed text.

Length restriction
In some embodiments of the present invention, a plurality of separate portions of the input text may be transformed, where at least one of the portions contains no more than a maximum number of characters, for example as a result of truncating the portions. In other embodiments, a plurality of separate portions of the input text may be transformed, where each of the portions contains no more than a maximum number of characters, for example as a result of truncating the portions.

Tokenization example
FIG. 5 illustrates the normalization and tokenization of input text containing the sentence "This sentence has FIVE words!". Input text 510 contains the sentence "This sentence has FIVE words!". The sentence may be tokenized into the input tokens "This", "sentence", "has", "FIVE", "words", and "!". Normalizing these input tokens yields normalized input tokens and metadata. The normalized input tokens take the form "This", "sentence", "has", "five", "words", and "!". The metadata associated with "sentence" is "lowercase". The metadata associated with "FIVE" is "uppercase", and the metadata associated with "words" is "lowercase" and "plural".

Next, the common input tokens are identified: the words "this" and "has" and the non-word "!". These input tokens may be encrypted non-deterministically, for example with a salt (shown as "*").

Then the uncommon input tokens "word", "sentence", and "five" are identified. These words may be encrypted deterministically.
The order of the input tokens may be changed, and order metadata may be generated accordingly. The order metadata, case metadata, and plural metadata may be included in control token 530.
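The tokenization and normalization walk-through can be sketched as follows. The common-token set, the one-entry plural lexicon, the metadata labels, and the function names are hypothetical stand-ins chosen to reproduce the example sentence, and the sketch lowercases every token (including "This"), which is a simplification of what the figure may show.

```python
import re

COMMON = {"this", "has", "!"}      # tokens treated as common in the example
PLURALS = {"words": "word"}        # tiny stand-in for a real plural lexicon


def tokenize(text: str):
    """Split into words and single non-space, non-word characters."""
    return re.findall(r"\w+|[^\w\s]", text)


def normalize(token: str):
    """Return (normalized token, metadata list) for one input token."""
    meta = []
    if token.isupper():
        meta.append("uppercase")
    elif token[:1].isupper():
        meta.append("capitalized")
    elif token.islower():
        meta.append("lowercase")
    base = token.lower()
    if base in PLURALS:
        base = PLURALS[base]
        meta.append("plural")
    return base, meta


def classify(text: str):
    """Split normalized tokens into non-deterministic (common) and deterministic sets."""
    normalized = [normalize(t) for t in tokenize(text)]
    common = [b for b, _ in normalized if b in COMMON]
    uncommon = [b for b, _ in normalized if b not in COMMON]
    return normalized, common, uncommon
```

The common tokens would go to the salted, non-deterministic path and the uncommon ones to the deterministic path, with the metadata carried separately in the control token.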

Sort support
A text-processing function common to many SaaS applications is sorting records in lexicographic order of a particular field or other attribute. It may therefore be useful to produce processed text with an order-preserving encryption process.

Many order-preserving techniques exist, and any of them may be used. For example, order preservation may be obtained by any of the following: (a) keeping a list of all records on the intercepting module and performing site-specific ordering as needed, which in most cases requires replicating the server's functionality for both presentation and data management; (b) providing the server with an API for querying the sort order of particular strings; or (c) generating a lexicographically sortable representation that preserves the actual sort order without any modification at the network node.

In the encryption method according to the present invention, the order of records of input text may be preserved by applying the following steps or a combination of them: (1) converting the input data to a numeric value (if it is not already numeric); (2) applying an order-preserving transformation to the numeric value to obtain an output value; (3) deriving a lexicographically sortable representation from the output value; and (4) using the lexicographically sortable representation in the processed output text, either as a prefix string (for text data) or as the entire output data. The order-preserving transformation may be a monotonically increasing function. The function may be parameterized by a private key, which can be generated from a source of random numbers. A private key may be generated for each set of inputs that are sorted together. According to embodiments of the present invention, and as described below, generating the order information may include applying a secret-key-dependent order-preserving function to the input text.

According to some embodiments of the present invention, the order information may be generated from a truncated form of the input text. According to further embodiments, the order information may be generated from a plurality of truncated words of the input text, taken in their order of appearance.

According to some embodiments of the present invention, the intermediate module may process the input text by applying an order-preserving transformation that includes: generating, from the input text, order information indicating the relative order of the input text within a set of candidate input texts according to a collation rule; transforming the input text to obtain processed text; and sending the processed text to the server. According to some embodiments, the order information may be sent to the server in association with the processed input text by prepending the order information to the processed input data and sending the combined order information and processed input data to the server.

To reduce the security risks associated with order-preserving encryption, the intermediate device may consider only a reduced portion of the input data when generating the order-preserving output. To reduce the input and limit how much of the input data is used, the device may (a) ignore particular words such as "the" and "a"; (b) ignore all characters at or after a particular position in every word (for example, ignore the "ra" in "zebra"); (c) ignore the last words in a record; (d) shrink the input domain of the order-preserving function; (e) ignore particular character properties such as case; or (f) use any combination of these.

FIG. 7 shows the steps of a method 170 that may be used to obtain an order-preserving representation of text data for inclusion in processed text, according to an embodiment of the present invention. In step 171, the input text to be encrypted is received. In step 172, particular words are removed from the input text. In step 173, particular character properties such as case, diacritics, and ligatures are removed. In step 174, the input words are truncated according to predetermined parameters of the encryption method, so that trailing characters are removed from the input text.

In step 175, particular trailing words of the input text are removed. One or more of the optional steps 172, 173, 174, and 175 may thus be performed to reduce the length of the resulting input text. In step 176, the (optionally shortened) input text is converted to a number to obtain an input value. In step 177, an order-preserving function is applied to the input value to obtain an output value. In step 178, an order-preserving representation is derived from the output value. Finally, in step 179, the order-preserving representation is set as a prefix of the processed text or as the entire encrypted data.

The following example illustrates steps 172 to 176 by computing the input value for the input text "The Green Zebra": (a) the input token set "The Green Zebra" is received; (b) the insignificant input token "the" is ignored, leaving the significant input tokens "Green Zebra"; (c) the significant input tokens are normalized to "green zebra"; (d) only the first three characters of each input token are selected, for example based on a user-defined setting, giving the six significant characters "grezeb"; (e) the numeric value of each character is calculated from the weight of its position in its input token, as shown in Table 1; and (f) the character values are summed to give the value 0.296199790068345 for the input token set.

The weight $W$ may be expressed as the alphabet size $A$ raised to the negative of the character position $P$: $W = A^{-P}$. For English text, the alphabet size is 26.
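The worked example can be reproduced with a short sketch. Table 1 is not included in this text, so two details are assumptions inferred from the stated result: letters are valued a=1 through z=26, and positions run 1 to 6 across the concatenated characters "grezeb". With those choices the sum matches the value given above. The stop-word list and function names are hypothetical.

```python
import unicodedata

STOP_WORDS = {"the", "a"}  # insignificant tokens to ignore (step 172)


def reduce_input(text: str, keep: int = 3) -> str:
    """Steps 172-174: drop stop words, strip case and diacritics, truncate each word."""
    pieces = []
    for word in text.split():
        word = unicodedata.normalize("NFKD", word.lower())
        word = "".join(c for c in word if "a" <= c <= "z")
        if word and word not in STOP_WORDS:
            pieces.append(word[:keep])
    return "".join(pieces)


def numeric_value(letters: str, alphabet_size: int = 26) -> float:
    """Step 176: sum of letter value (a=1..z=26) times the weight A**-P."""
    return sum((ord(c) - ord("a") + 1) * alphabet_size ** -p
               for p, c in enumerate(letters, start=1))


value = numeric_value(reduce_input("The Green Zebra"))
```

Because each position's weight is 26 times smaller than the one before it, earlier characters dominate the value, which is what lets numeric order track alphabetical order of the reduced text.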

FIG. 8 shows a method 300 for generating an order-preserving function according to an embodiment of the invention, for use in, for example, step 177 of method 170. In step 180, the domain $(D_1, D_2)$ and range $(R_1, R_2)$ of the function are determined, for example according to settings made by a user or a program. In step 181, a private key $K$ used to compute the output value of the order-preserving function is obtained. In step 182, an input value $V_{in}$ is received (possibly from step 176 of method 170). In steps 183 and 184, the range of the function is modified so that its start and end points are key-dependent positions within the original range. In step 185, a point $D_{mid}$ within the domain of the function, dependent on the key $K$, is selected such that $D_{mid} = f_1(D_1, D_2, K)$. In step 186, points $R_L = f_2(R_1, R_2, K, n)$ and $R_H = f_3(R_1, R_2, K, n)$ are selected such that $R_1 < R_L < R_H < R_2$. Here $R_L$ and $R_H$ depend on at least one of the key $K$ and the iteration count $n$ (initially $n = 1$). In step 187, it is determined whether the input value $V_{in}$ lies in the lower part $(D_1, D_{mid})$ or the upper part $(D_{mid}, D_2)$ of the current domain $(D_1, D_2)$. If $V_{in}$ lies in the lower part, step 188a is performed; otherwise step 188b is performed. Steps 188a and 188b modify the domain $(D_1, D_2)$ and range $(R_1, R_2)$ of the function. In step 188a, $(D_1, D_2)$ is set to $(D_1, D_{mid})$ and $(R_1, R_2)$ is set to $(R_1, R_L)$. In step 188b, $(D_1, D_2)$ is set to $(D_{mid}, D_2)$ and $(R_1, R_2)$ is set to $(R_H, R_2)$. Steps 185 to 188 are repeated until a predetermined stopping criterion is met in step 189. Examples of stopping criteria include a threshold size $D_{threshold}$ exceeding the current domain size $|D| = D_2 - D_1$, a threshold size $R_{threshold}$ exceeding the current range size $|R| = R_2 - R_1$, or a combination of these.
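The recursive narrowing of steps 185 to 189 can be sketched as below. The text does not define $f_1$, $f_2$, or $f_3$, so the HMAC-derived fractions used here are purely an assumption, as are the function names, the omission of steps 183 and 184, the domain-size stopping rule, and returning the midpoint of the final range as the output value. Exact fractions are used so that ordering is not disturbed by floating-point rounding.

```python
import hashlib
import hmac
from fractions import Fraction


def _keyed_fraction(key: bytes, label: str, *parts) -> Fraction:
    """Pseudo-random fraction in [1/4, 3/4) derived from the key and the context."""
    msg = "|".join([label, *map(str, parts)]).encode()
    digest = hmac.new(key, msg, hashlib.sha256).digest()
    return Fraction(256 + int.from_bytes(digest[:4], "big") % 512, 1024)


def order_preserving_map(v_in, key: bytes, domain=(0, 1), value_range=(0, 1),
                         d_threshold=Fraction(1, 10**9)) -> Fraction:
    d1, d2 = map(Fraction, domain)
    r1, r2 = map(Fraction, value_range)
    v = Fraction(v_in)
    if not d1 <= v <= d2:
        raise ValueError("input value lies outside the domain")
    n = 1
    while d2 - d1 >= d_threshold:                                    # step 189
        d_mid = d1 + _keyed_fraction(key, "d", d1, d2) * (d2 - d1)   # step 185
        width = r2 - r1
        cut = r1 + _keyed_fraction(key, "r", r1, r2, n) * width
        r_l, r_h = cut - width / 8, cut + width / 8                  # step 186: r1 < r_l < r_h < r2
        if v < d_mid:                                                # step 187
            d2, r2 = d_mid, r_l                                      # step 188a
        else:
            d1, r1 = d_mid, r_h                                      # step 188b
        n += 1
    return (r1 + r2) / 2   # assumed output: midpoint of the final range
```

Two inputs follow the same path until some split point separates them, where the smaller one is confined below $R_L$ and the larger one above $R_H$, which is what keeps the mapping monotonic for inputs at least `d_threshold` apart.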

The following illustrates an encoding method that may be used in step 178 of method 170. Assume that the transformed value produced by the order-preserving function is 0.344323947 and that the lexicographically sortable representation is 10 characters long and contains only lowercase English letters. Table 2 shows the generation of the 10-character lexicographically sortable representation by 10 iterations of an arithmetic coding method.

As shown in Table 2, the lexicographically sortable representation is "hxsutgeslc".
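One way to derive a lexicographically sortable string from a value in [0, 1) is repeated subdivision of the interval into 26 equal parts, sketched below. Table 2 is not included in this text, so this is not claimed to be its exact procedure. With zero-based letter indices (a=0), the sketch gives a string beginning "iytvuh" for 0.344323947, each letter one later than in the string quoted above, so Table 2 presumably numbers letters differently. The nine-digit value also does not carry enough precision to compare letters beyond the sixth. The function name is hypothetical.

```python
import string
from fractions import Fraction


def sortable_repr(value, length: int = 10,
                  alphabet: str = string.ascii_lowercase) -> str:
    """Base-len(alphabet) expansion of a value in [0, 1), one letter per digit."""
    x = Fraction(value)  # exact arithmetic; accepts strings such as "0.25"
    if not 0 <= x < 1:
        raise ValueError("value must lie in [0, 1)")
    letters = []
    for _ in range(length):
        x *= len(alphabet)
        index = int(x)              # integer part selects the sub-interval
        letters.append(alphabet[index])
        x -= index                  # keep the fractional part for the next letter
    return "".join(letters)
```

Because a larger value can never produce a smaller leading digit, plain string comparison of the outputs agrees with numeric comparison of the inputs.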

ここで、物理的なコンピュータ可読媒体を設けることができる。この媒体には、プロセッサによる実行に際して、プロセッサに方法１００またはその一部を実行させることができる命令を格納する。このような物理的なコンピュータ可読媒体としては、ディスク、ディスケット、テープ、カセット、メモリースティック、フラッシュメモリーユニット、揮発性メモリーユニット等が考えられる。 Here, a physical computer readable medium may be provided. The medium stores instructions that, when executed by the processor, cause the processor to execute the method 100 or portions thereof. As such a physical computer-readable medium, a disk, a diskette, a tape, a cassette, a memory stick, a flash memory unit, a volatile memory unit, and the like are conceivable.

本明細書では、本発明の特定の特徴を例示・説明したが、当業者であれば多くの改良、置換、変形、および均等物を想到し得るであろう。したがって、当然のことながら、添付の請求の範囲は、このような改良や変形がすべて本発明の精神に含まれるように網羅するものである。 Although specific features of the invention have been illustrated and described herein, many modifications, substitutions, variations, and equivalents will occur to those skilled in the art. Accordingly, it is to be understood that the appended claims are intended to cover all such modifications and changes as fall within the spirit of the invention.

Claims


The method of claim 1, further comprising:
applying, by the intermediate module, an inverse process to the processed text to obtain unprocessed input text; and
modifying, by the intermediate module, the unprocessed input text based on the determined at least one transformation.

The method of claim 2, further comprising sending, by the intermediate module, the modified unprocessed input text to the client device.

The method of claim 1, wherein at least one transformation of the plurality of transformations includes replacing at least one convertible character of the processed text with a corresponding replacement character or replacement character string, and
wherein including the bait in the processed text includes including the at least one convertible character in the processed text.

The method of claim 4, further comprising:
applying, by the intermediate module, an inverse process to the processed text to obtain unprocessed input text; and
modifying, by the intermediate module, the unprocessed input text by replacing at least one convertible character of the unprocessed input text with the corresponding replacement character or replacement character string.

The method of claim 5, further comprising sending, by the intermediate module, the modified unprocessed input text to the client device.

The method of claim 1, wherein at least one transformation of the plurality of transformations includes omitting HTML tags in the processed text, and
wherein including the bait in the processed text includes including an HTML tag in the processed text.

The method of claim 7, further comprising:
applying, by the intermediate module, an inverse process to the processed text to obtain unprocessed input text;
modifying, by the intermediate module, the unprocessed input text by omitting HTML tags contained in the unprocessed input text; and
sending, by the intermediate module, the modified unprocessed input text to the client device.

The system of claim 9, wherein the control unit is further configured to:
apply an inverse process to the processed text to obtain unprocessed input text; and
modify the unprocessed input text based on the determined at least one transformation.

The system of claim 10, wherein the control unit is further configured to send the modified unprocessed input text to the client device.

The system of claim 9, wherein at least one transformation of the plurality of transformations includes replacing at least one convertible character of the processed text with a corresponding replacement character or replacement character string, and
wherein the control unit is configured to process the input text to obtain the processed text by including the at least one convertible character in the processed text.

The system of claim 12, wherein the control unit is further configured to:
apply an inverse process to the processed text to obtain unprocessed input text; and
modify the unprocessed input text by replacing at least one convertible character of the unprocessed input text with the corresponding replacement character or replacement character string.

The system of claim 13, wherein the control unit is further configured to send the modified unprocessed input text to the client device.

The system of claim 9, wherein at least one transformation of the plurality of transformations includes omitting HTML tags in the processed text, and
wherein the control unit is configured to process the input text to obtain the processed text by including an HTML tag in the processed text.

The system of claim 15, wherein the control unit is further configured to:
apply an inverse process to the processed text to obtain unprocessed input text;
modify the unprocessed input text by omitting HTML tags contained in the unprocessed input text; and
send the modified unprocessed input text to the client device.