RELATED APPLICATIONThis application is a continuation of U.S. patent application Ser. No. 13/210,398, filed on Aug. 16, 2011, and entitled “DYNAMIC SYMMETRIC SEARCHABLE ENCRYPTION”, the entirety of which is incorporated herein by reference.
BACKGROUNDMany users store various types of documents in a remote repository (commonly known as “cloud storage”), administered by an external entity. As the term is generally used herein, a document can correspond to any unit of information, such as a text-bearing document (such as email), a music file, a picture, a financial record, and so on. A user may opt to store documents in the remote repository for various reasons, e.g., based on factors pertaining to convenience, accessibility, storage capacity, reliability, etc.
Contractual obligations may require the entity which administers the remote repository to minimize the risk of unauthorized access to a user's documents. However, from a technical perspective, there may be little which prevents the entity itself from accessing and examining personal documents of a user. This may understandably unsettle a user. For instance, the user's documents may contain sensitive information that the user does not wish to divulge to any person, including the entity which administers the remote repository.
A user may address this concern by encrypting the documents and storing the documents in encrypted form at the remote repository. This approach effectively prevents the entity which administers the remote repository (or anyone else) from examining the documents. However, this approach also prevents the user from performing any meaningful operations on the documents that are stored in the remote repository. For instance, the encryption of the documents precludes the user from performing an on-line search of the documents. The user may address this situation by downloading all the documents and decrypting them. But this solution runs counter to the user's initial motivation for storing the documents in the remote repository.
The cryptographic community has developed technology that is referred to herein as Symmetric Searchable Encryption (SSE), which utilizes symmetric key encryption to generate an encrypted index that can be employed in connection with keyword searches. That is, a user can set forth a keyword, and the encrypted index can be analyzed to locate documents that include such keyword, wherein the entity that administers the remote repository that retains the encrypted files and index remains unaware of which files include which keywords. At least some SSE schemes require linear time search, where each indexed document is analyzed to ascertain whether the respective document includes a keyword. This approach may be prohibitively inefficient, particularly for relatively large document collections. Other existing SSE schemes allow for sublinear search; however, these schemes are inefficient with respect to updating an index that is employed when a document collection is searched. This can be problematic for data collections where documents frequently change, such as emails.
SUMMARYThe following is a brief summary of subject matter that is described in greater detail herein. This summary is not intended to be limiting as to the scope of the claims.
Various technologies pertaining to an efficient and dynamic Symmetric Searchable Encryption (SSE) scheme are described herein. This scheme may be particularly advantageous with respect to relative large document (file) collections that are subject to frequent updates. Exemplary file collections include email collections, image collections (where metadata is applied to the images), word processing document collections, amongst other collections of files that include at least some text. The scheme is employed to generate an encrypted index that is retained on a remote repository together with encrypted files. The encrypted index can be searched without the administrator of the remote repository learning which files include which keywords, and also without the administrator of the remote repository learning the identity of the keyword set forth by a searcher.
The encrypted index can be generated by first generating an unencrypted index at the client computing device, wherein the unencrypted index identifies which files include a predefined set of keywords (a dictionary). The unencrypted index is in the form of a tree structure, where leaf nodes of the trees are pointers to the files in the file collection. In an example, the tree can be a red-black tree, where each node has two immediate child nodes. Vectors are assigned to nodes, where a vector assigned to a node includes a series of values that correspond to keywords in the dictionary. A value in the vector indicates whether a file that is pointed to by a leaf node hierarchically at or beneath the node includes a keyword that corresponds to the vector.
This index is then encrypted to generate an encrypted index. The files themselves can be encrypted through utilization of any suitable symmetric encryption scheme, as the file encryption is not dependent upon the scheme employed to encrypt the index. Encryption of the index is undertaken by utilizing a probabilistic algorithm to generate a secret key, which is based upon a security parameter (which may define a length of the secret key in bits). Another probabilistic algorithm can be employed to generate an encrypted index and a sequence of ciphertexts, wherein the probabilistic algorithm outputs the encrypted index and the sequence of ciphertexts as a function of the secret key, the unencrypted index, and the files (in a sequence). The encrypted index and the sequence of ciphertexts can be retained at the remote repository.
When the user wishes to perform a keyword search over files in the remote repository, the user sets forth a keyword, and a (possibly probabilistic) algorithm can output a search token based upon the keyword and the secret key described above. The search token is transmitted to the remote repository, which can locate one or more ciphertexts. The located ciphertexts correspond to files that include the keyword. Located ciphertexts are returned to the client computing device employed by the user, and the ciphertexts are decrypted to generate the files that include the keyword.
The encrypted index can be updated relatively efficiently to reflect changes in the file collection (additions or removals of files). That is, the unencrypted index need not be regenerated from scratch, and the ciphertexts need not be re-transmitted to the remote repository. To update the encrypted index to include a representation of a file that has been added to the file collection, a (possibly probabilistic) algorithm can generate an add token, wherein the add token is output based upon the secret key and the file that is added to the document collection. At the remote repository, the add token, the encrypted index, and the ciphertext is provided to a deterministic algorithm, which outputs an updated encrypted index and an updated sequence of ciphertexts. The ciphertext and encrypted index can be updated in a similar fashion when a file is deleted from the file collections. In any event, the encrypted index and ciphertext is dynamically updated without requiring the unencrypted index to be entirely regenerated and without requiring the sequence of ciphertexts to be regenerated and retransmitted from the client computing device to the remote repository.
Other aspects will be appreciated upon reading and understanding the attached figures and description.
BRIEF DESCRIPTION OF THE DRAWINGSFIG. 1 is a functional block diagram of an exemplary system that facilitates generating an encrypted index that can be employed at a remote repository to search for content in encrypted files.
FIG. 2 is a functional block diagram of an exemplary system that facilitates searching over encrypted files and updating an encrypted index.
FIG. 3 is a functional block diagram of an exemplary system that facilitates building a dictionary of keywords.
FIG. 4 is an exemplary depiction of an index.
FIG. 5 is a flow diagram that illustrates an exemplary methodology for generating an encrypted index that can be employed in connection with searching over encrypted files.
FIG. 6 is a flow diagram that illustrates an exemplary methodology for updating an encrypted index that can be employed in connection with searching over encrypted files.
FIG. 7 is a flow diagram that illustrates an exemplary methodology for updating an encrypted index that can be employed in connection with searching over encrypted files.
FIG. 8 is an exemplary computing system.
DETAILED DESCRIPTIONVarious technologies pertaining to a dynamic efficient Symmetric Searchable Encryption (SSE) scheme will now be described with reference to the drawings, where like reference numerals represent like elements throughout. In addition, several functional block diagrams of exemplary systems are illustrated and described herein for purposes of explanation; however, it is to be understood that functionality that is described as being carried out by certain system components may be performed by multiple components. Similarly, for instance, a component may be configured to perform functionality that is described as being carried out by multiple components. Additionally, as used herein, the term “exemplary” is intended to mean serving as an illustration or example of something, and is not intended to indicate a preference.
As used herein, the terms “component” and “system” are intended to encompass computer-readable data storage that is configured with computer-executable instructions that cause certain functionality to be performed when executed by a processor. The computer-executable instructions may include a routine, a function, or the like. It is also to be understood that a component or system may be localized on a single device or distributed across several devices.
With reference toFIG. 1, anexemplary system100 that facilitates generating and updating an encrypted index that can be employed to locate files that include one or more queried keywords is illustrated. Thesystem100 comprises aclient computing device102, which can be any suitable computing device, including but not limited to a desktop computing device, a laptop computing device, a tablet computing device, a mobile telephone, or any other suitable computing device. Thesystem100 further comprises aremote repository104, which is accessible to theclient computing device102 by way of a suitable network connection, including but not limited to an Internet connection, a cellular telephone connection, an intranet connection, or other suitable connection. A user of the client computing device, for instance, may desirably store a file collection on the remote repository104 (such that storage space is not consumed on the client computing device102). As mentioned above, however, the user of theclient computing device102 may wish to prohibit an administrator of theremote repository104 from accessing the files “in the clear.” Accordingly, a user of theclient computing device102 desirably encrypts files that are to be retained in theremote repository104, but further wishes to have the ability to perform operation on the encrypted files, such as search for files that include specified keywords.
Theclient computing device102 comprises adata store106, wherein thedata store106 may be any suitable computer-readable medium, including but not limited to memory, a removable disk, a flash drive, a hard drive, etc. Thedata store106 comprises afile collection108, wherein files in thefile collection108 are desirably encrypted and retained on theremote repository104. For instance, thefile collection108 can be a collection of emails, images (with metadata assigned thereto that describes content of the images), videos (with metadata assigned thereto), web pages, word processing documents, spreadsheet documents, slideshow presentation documents, or any other suitable type of document that includes text.
Thedata store106 further comprises adictionary110, which includes a plurality of predefined keywords that may be searched for by a user of theclient computing device102. Thedictionary110, for instance, may include an entirety of the English dictionary, a subset of words that are most commonly searched, a subset of words that are probabilistically determined to be most frequently the subject of a search, words that are most common in the English language, etc. The user, when performing a search, may be restricted to retrieving documents that include at least one of the keywords in thedictionary110.
Thedata store106 additionally comprises anindex112 that indicates which files in thefile collection108 include which keywords in thedictionary110. In an exemplary embodiment, theindex112 can be a data structure that is in the form of a tree, wherein the tree comprises a plurality of layers of nodes, and wherein leaf nodes of the tree are pointers to files in thefile collection108. Vectors are assigned to nodes, wherein a number of entries of a vector corresponds to (is equal to) a number of keywords in thedictionary110. An entry at a certain location in a vector that is assigned to a particular node is indicative of whether or not a file in thefile collection108 with a pointer represented by a leaf node beneath the particular node includes a keyword represented by the entry of the vector at the certain location. Pursuant to an example, the tree can be a red-black (binary) tree, wherein each node in the tree has two immediate children. In another exemplary embodiment, the tree can be a tertiary tree. In still yet another exemplary embodiment, the tree can be a “fat” tree, wherein the tree has two levels where the degree of internal nodes is O (√{square root over (n)}). In such an approach, encrypted Bloom filters of distinct keywords can be stored at leaves of the tree. Other tree structures are also contemplated, and are intended to fall under the scope of the hereto-appended claims.
Theclient computing device102 further comprises areceiver component114 that receives thefile collection108, thedictionary110, and theindex112. Anencrypter component116 is in communication with thereceiver component112, and outputs an encrypted index and a sequence of ciphertexts. The encrypted index corresponds to the index112 (the encrypted index will also be a tree structure with an equivalent number of nodes and hierarchy), and the sequence of ciphertexts corresponds to a sequence of files in thefile collection108. Theencrypter component116 generates the encrypted index and the sequence of ciphertexts based at least in part upon a sequence of files in thefile collection108, theindex112, and a secret key. The secret key can be generated at random and based at least in part upon a security parameter, which can define a length of the secret key (e.g., 128 bits, 192 bits, 256 bits, . . . ). Theclient computing device102 then transmits the encrypted index and the sequence of ciphertexts to theremote repository104, and can optionally delete thefile collection108 from thedata store106.
Subsequently, theuser118 may wish to perform a search for files from thefile collection108 that comprise at least one keyword included in thedictionary110, wherein such files are encrypted as ciphertext on theremote repository104. Additionally, theuser118 may wish to add a file to thefile collection108 and/or delete a file from thefile collection108. Accordingly, theuser118 can issue a query that includes the keyword, generate a new file to be included in thefile collection108, or delete an existing file from the file collection.
In a first example, the user may wish to perform a search over files in the file collection108 (retained as ciphertexts on the remote repository104) using a particular keyword. Atoken generator component120 receives the keyword, and additionally receives the secret key generated by the encrypter component116 (which can be based on a password set forth by the user118). Thetoken generator component120 then generates a search token based upon the keyword and the secret key, and transmits the search token to theremote repository104. The search token can include data that indicates to theremote repository104 how to analyze the encrypted index to locate appropriate ciphertexts. Theclient computing device102 further comprises adecrypter component122 that receives at least one ciphertext that is retrieved by theremote repository104 responsive to receipt of the search token. Thedecrypter component122 can decrypt the received ciphertext using the secret key that was employed by theencrypter component116 to generate the ciphertext. This results in output of an unencrypted file (a file in the file collection108) to theuser118, wherein the unencrypted file includes the keyword set forth by theuser118. From the perspective of theuser118, theuser118 has submitted at least one keyword and has been provided with files that include such keyword. Theremote repository104 has performed the search, but is unaware of any content of files that were retrieved during the search and is further unaware of the keyword utilized to perform the search.
In a second example, the user may wish to add a file to thefile collection108. Thetoken generator component120 can be in communication with thereceiver component114, and receives the secret key and the file, as well as theindex112 from thereceiver component114, and outputs an “add token” to theremote repository104 that is generated based at least in part upon the secret key, the file desirably added, and theindex112. The add token can include an update to the encrypted index, which may comprise a subtree that is a portion of the encrypted index. Theremote repository104 can then update the sequence of ciphertexts and the encrypted index based upon the add token, and can allow the added file to be retrieved when theuser118 performs a search using a keyword that is included in the added file without requiring theclient computing device102 to regenerate (and re-encrypt) the entirety of theindex112, without requiring downloading of all of the ciphertext from theremote repository104, etc.
In a third example, the user may wish to delete a file from thefile collection108. For instance, theuser118 can retrieve ciphertext corresponding to the desirably deleted file from theremote repository104, and can decrypt such file using the secret key. Thetoken generator component120 receives the secret key, the file, and theindex112, and outputs a “delete token” to theremote repository104. The delete token comprises an update to the encrypted index, which is a subtree of the tree of the encrypted index. Theremote repository104 can then update the sequence of ciphertexts and the encrypted index based upon the delete token. If theuser118 subsequently performs a search using a keyword in thedictionary110 that was included in the deleted file, such file will not be returned to the user. Accordingly, again, the encrypted index can be updated at theremote repository104 without requiring theclient computing device102 to re-encrypt each of the files in thefile collection108, without requiring theclient computing device102 to entirely re-generate theindex112, and without requiring theclient computing device102 to entirely re-encrypt theindex112.
Now referring toFIG. 2, anexemplary system200 that facilitates efficiently searching over encrypted documents and updating an encrypted index is illustrated. Thesystem200 comprises theclient computing device102 in communication with theremote repository104. Theremote repository104 comprises adata store202 that includes anencrypted index204 and ciphertexts206 (generated by the encrypter component116). Theencrypted index204 is an encryption of theindex112 generated at theclient computing device102.
Theremote repository104 further comprises asearch component208 that receives a search token from theclient computing device102. Thesearch component208 accesses theencrypted index204 responsive to receipt of the search token, and retrieves at least one ciphertext based at least in part upon the search token. As described above, the search token corresponds to a keyword, wherein files that include the keyword are desirably located. The at least one ciphertext retrieved by thesearch component208 is an encryption of a file that includes the keyword, and such ciphertext is transmitted to theclient computing device102 by way of a suitable network connection
Theremote repository104 further comprises anadd component210 that receives an add token from theclient computing device102, and updates theencrypted index204 and theciphertexts206 responsive to receipt of the add token. Updating of theencrypted index204 and theciphertexts206 responsive to receipt of the add token allows for a ciphertext that is an encryption of a particular file to be retrieved by thesearch component208 responsive to thesearch component208 receiving a search token that corresponds to a keyword in thedictionary110 from theclient computing device102. As described above, the add token is generated at the client computing device responsive to theuser118 adding a file to thefile collection108.
Theremote repository104 also comprises adelete component212 that receives a delete token from theclient computing device102 and updates theencrypted index204 and theciphertexts206 responsive to receipt of the delete token. Updating of theencrypted index204 and theciphertexts206 responsive to receipt of the delete token allows for a search performed by thesearch component208 to fail to retrieve the ciphertext corresponding to the deleted file even if the deleted file includes a keyword corresponding to a search token. It can thus be ascertained that theencrypted index204 and theciphertexts206 can be dynamically updated at theremote repository104 without requiring theclient computing device102 to regenerate theindex112 and/or re-encrypt theindex112.
Referring now toFIG. 3, anexemplary system300 that facilitates building thedictionary110 and theindex112 is illustrated. Thesystem300 resides on theclient computing device102 and comprises thedata store106, which includes thefile collection108. Adictionary builder component302 accesses thefile collection108 and outputs thedictionary110 based at least in part upon content of thefile collection108. For instance, thedictionary builder component302 can select some threshold number of most frequently used words in the file collection108 (excluding common words such as “the”, “a”, “an”, “and”, etc.), and such keywords can be collectively stored as thedictionary110. In another exemplary embodiment, thedictionary builder component302 can build thedictionary110 based upon most frequently queried keywords across multiple users. In yet another exemplary embodiment, thedictionary builder component302 can build the dictionary based upon content of search engine query logs. In still yet another exemplary embodiment, thedictionary110 can be manually constructed by a user or group of users.
Thesystem300 further comprises anindex builder component304 that builds the index112 (δ). Theindex builder component304 receives the file collection108 (a sequence of n files f=(f1, . . . fn)), wherein an order is assumed on such files. The order can be imposed by an order of unique identifiers assigned to the files: id(f1), . . . , id(fn). Thedictionary110 can be built, and can be implemented with a red-black tree Ton top of the identifiers. The leaves of the tree are pointers to the respective files, which may be stored on disk. Theindex builder component304, at each internal node u of the tree, can store an m-bit vector datau, where m is a number of keywords in thedictionary110. The ith bit of datauaccounts for keyword wi, for i=1, . . . m. Specifically, if datau[i]=1, then there is at least one path from u to some leaf storing id(f), such that f includes the keyword wi. Theindex builder component304 can guarantee the aforementioned property by computing datauas follows: for every leaf l storing id(f), datal[i]=1 if an only if the file f includes the keyword wi. For an internal node u of the tree T, (a node other than a leaf), with left child v and right child z., theindex builder component304 can compute datauof the as follows:
datau=datav+dataz,   (1)
where + denotes the bitwise Boolean OR operation. It can be ascertained that theindex112 generated by theindex builder component304 can allow for both keyword-based operations (by following paths from the root to the leaves) and file-based operations (by following paths from the leaves to the root).
Further, the space complexity of theindex112 can be O (mn), and constructing theindex112 can take time O (mn). The search time for a keyword w using theindex112 can be O (|fw| log n), where |fw| is the number of files containing w. The time to update theindex112 can be O (m log n).
Turning now toFIG. 4, anexemplary index400 that can be built by theindex builder component404 is illustrated. In this example, thedictionary110 comprises five keywords; it is to be understood, however, that thedictionary110 is not limited to a particular number of keywords. Additionally, theindex400 is shown as being a red-black tree; however, as will be understood by one skilled in the art, the tree can be any suitable computer-implemented tree structure.
The tree comprises eight leaf nodes402-416, which represent pointers to files in thefile collection108. Therefore, thefile collection108, in this example, comprises eight different files. Each of the leaf nodes402-416 is assigned a vector418-432, and each vector includes five entries that correspond to the five keywords in thedictionary110. Thus, in an example, thevector418 assigned to the leaf node includes a first entry that corresponds to a first keyword in thedictionary110, a second entry that corresponds to a second keyword in thedictionary110, and so forth. Values of the entries of thevector418 indicate whether the file pointed to by the file pointer represented by theleaf node402 includes the corresponding keyword from thedictionary110. In an example, if thedictionary110 includes the keywords “aardvark”, “bear”, “cat”, “dove”, and “elephant”, then values of entries of thevector418 indicate whether or not the file pointed to by theleaf node402 includes such keywords. In the example shown inFIG. 4, the file pointed to by theleaf node402 includes the keyword “elephant”, but does not include the other keywords in the dictionary. Similarly, the file pointed to by theleaf node402 includes the keywords “dove” and “elephant”, but does not include “aardvark”, “bear”, or “cat” from thedictionary110.
The tree further comprises a plurality of internal nodes434-440 and a plurality of vectors442-448 respectively assigned thereto. Each of the internal nodes is a parent to two leaf nodes. Thus, for instance, thenode434 is a parent node toleaf nodes402 and404. Each of the vectors442-448 has a plurality of entries that correspond to the keywords in thedictionary110. Values of entries in a vector in the plurality of vectors442-448 are determined by computing an OR over corresponding entries in the vectors assigned to the leaf nodes402-418. For instance, theleaf nodes418 and420 are children of theintermediate node434. Thevector442 has values for entries that indicate that at least one of theleaf nodes418 or420 points to a file comprises the keywords “dove” and/or “elephant”, for example. In another example, the values of thevector448 indicate that at least one of theleaf nodes414 or416 points to a file that includes the keywords “bear” and/or “dove”.
The tree can further comprise another set of internal nodes450-452, wherein the nodes450-452 are assigned respective vectors454-456. As before, entries of the vectors454-456 correspond to keywords in thedictionary110, and values of such entries indicate whether any leaf nodes beneath such intermediate nodes450-452 point to files that include keywords represented by the vectors. Therefore, values of the vectors454-456 are determined by computing an OR over values of the respective vectors442-448. Accordingly, values of entries in thevector454 indicate that at least one of the leaf nodes418-424 points to a file that includes the keyword “cat”, at least one of the leaf nodes418-424 points to a file that includes the keyword “dove”, and at least one of the leaf nodes418-424 points to a file that includes the keyword “elephant”. Similarly, values of thevector456 indicate that at least one of the leaf nodes426-432 points to a file that includes the keyword “bear”, at least one of the leaf nodes426-432 points to a file that includes the keyword “dove”, and at least one of the leaf nodes426-432 points to a file that includes the keyword “elephant”.
The tree further comprises aroot node458 that has avector460 corresponding thereto, wherein entries of the vector correspond to keywords in thedictionary110. Values of thevector460 of theroot node458 indicate whether any of the leaf nodes402-416 point to files that include respective keywords of thedictionary110. The entries of the vector have respective values of 0, 1, 1, 1, and 1. Accordingly, none of leaf nodes402-416 point to a file that includes the keyword “aardvark”, while at least one of the leaf nodes402-416 points to a file that includes the keyword “bear”, at least one of the leaf nodes402-416 points to a leaf node that points to a file that includes the keyword “cat”, at least one of the leaf nodes402-416 points to a leaf node that points to a file that includes the keyword “dove”, and at least one of the leaf nodes points to a file that includes the keyword “elephant”. Therefore, it can be understood that each leaf node need not be analyzed to return a set of files that includes the keyword “aardvark”; rather, if the user employed the tree to locate files that include the keyword “aardvark”, review of thevector460 assigned to theroot node458 would indicate that none of the files in thefile collection108 include such keyword. Similarly, if the user issued a query to locate all files that include the keyword “cat”,nodes434,436, and402-408 need not be analyzed, as the entry corresponding to the keyword “cat” in thevector454 assigned to thenode450 indicates that none of the leaf nodes402-408 points to a file that includes such keyword. As mentioned previously, while the tree is shown as having a binary structure, any suitable tree structure can be employed to represent files in thefile collection108.
Returning again toFIGS. 1 and 2, additional detail pertaining to theencrypter component116, thetoken generator component120, thedecrypter component122, thesearch component208, theadd component210, and thedelete component212 will now be provided. As mentioned, thesystems100 and200 facilitate searchable encryption, which allows theclient computing device102 to encrypt data in such a way that it can later generate search tokens for theremote repository104. Further, thesystems100 and200 facilitate dynamic updating of an encrypted index, thereby allowing files to be added or removed from thefile collection108. A dynamic SSE scheme is distributed across theclient computing device102 and theremote repository104.
For purposes of explanation, a private-key encryption scheme is a set of three polynomial-time algorithms ϵ=(Gen; Enc; Dec), such that Gen(1k; r) is a probabilistic algorithm that takes a security parameter k and randomness r and returns a secret key K; Enc(K; p) is a probabilistic algorithm takes a key K and a message p and returns a ciphertext c; and Dec(K; c) is a deterministic algorithm that takes a key K and a ciphertext c and returns p if K was the key under which c was produced.
For purposes of explanation, a private key encryption scheme is a set of the following cryptographic tools: 1) a pseudo-random function (PSR) P: {0,1}k×[p]→{0,1}k; 2) a random oracle H: {0,1}*×[p]→{0,1} to which an adversary (the remote repository104) may have access; and3) a pseudo-random function G:{0,1}k×[p]→{0,1}k.
Theencrypter component116 comprises a probabilistic algorithm Gen that receives a security parameter k as input and outputs a secret key K. This can be represented as follows: K←Gen(1k). Pursuant to an example, the algorithm Gen can be configured to generate four random k-bit strings K1, K2, K3, and r. Theencrypter component116 can instantiate one private-key semantically secure encryption scheme for encrypting files by calling K4←ϵ.Gen(1k; r). Theencrypter component116 may then set K :=(K1; K2; K3; K4).
Theencrypter component116 can further comprise a probabilistic algorithm Enc that receives K, the index112 (δ), and the file collection108 (f). Such algorithm Enc outputs the encrypted index204 (γ) and the ciphertexts206 (c) for retention at theremote repository104. This can be represented as follows: (γ; c)←Enc(K; δ; f). Such an algorithm can act as follows: a private-key can be generated per keyword wiby calling ϵ.Gen(1k; GK2(wi)) for i=1, . . . , m. This generates a secret key per keyword, e.g., the keys ϵ.SKifor i=1, . . . , m, where SK is a symmetric key. Thereafter, for 1≤j≤n, cj←ϵ.Enc(K4, fj) is employed to output a vector of ciphertexts c. Theindex builder component304 builds the index δ as described above, and theencrypter component116 can delete the files f while causing the ciphertexts c to be retained in the data store106 (the identifiers for the files can remain unchanged). For every node v on the tree T that has identifier id(v), the following can be undertaken by the encrypter component116: 1) two look-up tables of m random entries can be instantiated, λ0vand λ1v. A domain of the lookup tables can be {0,1}k, whereas the range of the look-up tables is [m]. Theencrypter component116 can then cause λ0vand λ1vto be stored at node v; 2) for every i=1, . . . , m, theencrypter component116 can set λbiv[PK1(wi)]←ϵi.Enc(ϵi.SK, datav[i]), where biis computed as the output of a random oracle, e.g., bi=H (PK3(wi),id(v)), and where datavis the vector output by theindex builder component304; 3) for every i=1, . . . , m, theencrypter component116 can set λblv[PK1(wi)]←ϵi.Enc(ϵi.SK,datav[l]), wherebl is the complement of bianddatav[l] is the complement of datav[i]; 4) vector datavis deleted; and 5) γ:=T and c :=(c1, . . . , cn) can be output by theencrypter component116 and transmitted to theremote repository104.
Subsequently, theuser118 may wish to perform a search for files encrypted at theremote repository104. Thetoken generator component120 can include a (possibly probabilistic) algorithm that takes as input the secret key K and a keyword w (that resides in the dictionary110) and outputs a search token τs. This algorithm can be represented as follows: τs←SrchToken(K, w). The algorithm can call ϵiGen (1k; GK2(w)), and can output τs:=(PK1(wi),PK3(wi), ϵi.SK).
Thesearch component208 comprises a deterministic algorithm that receives the search token96s, the ciphertexts c, and the encrypted index γ, and outputs a sequence of ciphertexts cw(ciphertexts that correspond to files that include the keyword w). Such an algorithm can be represented as follows: cw:=Search(γ, c, τs). The algorithm can execute by first parsing τsas (τ1, τ2, τ3). An algorithm referred to herein as search(root(T)) can be called, wherein T is the red-black tree included in the encrypted index γ. v and z can be left and right children of a node u, respectively. The algorithm search(u) can be recursively defined as follows: 1) a bit b=H(τ2, id(u)) can be output, and a=ϵi. Dec(τ3, λbu[τ1]) can be computed; 2) if a=0, a null is returned; 3) if u is a leaf, then cw:=cw∪cu, where cvis the ciphertext corresponding to id(u). Otherwise, search(v) and search (z) can be called.
Thedecrypter component122 comprises a deterministic algorithm Dec(K, c) that takes as input the secret key K and a ciphertext c (output by the search component208) and outputs an unencrypted file f that includes the keyword w. This can be expressed formally as follows: fi:=ϵ. Dec(K4, ci).
As mentioned above, thetoken generator component120 can generate an add token responsive to an indication that a file f has been added to thefile collection108. Theclient computing device102 transmits a message to theremote repository104 to indicate that an add token is desirably generated. Theremote repository104 can respond to the message with a receipt r, which can be a subtree (described below). Thetoken generator component120 can comprise a (possibly probabilistic) algorithm τa←AddToken(K, f, r), which receives the secret key K, the file f that is added, and the receipt r as input and outputs an add token τa. This algorithm can operate as follows: an encryption of the added file can be computed: C←ϵ.Enc(K4, f). The receipt r can be a certain subtree T* that is related to the portion of a red-black tree T that is accessed during an (structural) update. If T′ is the updated version of T, and T* is a subtree of T′, then T* can be defined as follows: the subtree T* comprises all nodes v of T′ that have at least one descendent (e.g., at least one node in their subtree) that has changed its structure after the update (e.g., by obtaining a new node as a new child or whole substree as a new child). For every node v of the subtree T* that has an identifier that indicates it belongs to subtree T*, the algorithm AddToken can perform the following actions: 1) instantiate two vectors of m random entries, λ0vand λ1v, which are stored at node v; 2) for every i=1, . . . , m, encrypted vectors kv[i] are updated by setting λbv[PK1(wi)]←ϵi. Enc(ϵi.SK,kv[i]), where b is a bit computed as the output of a random oracle, e.g., b=H (PK3(wi), id(v)) and kvis the updated vector due to the addition of file f; 3) output τa:=({(id(v),λ0v,λ1v):v ∈T′(q)}, c, id(f))=(τ1, τ2, τ3).
Theadd component210 on theremote repository104 receives the add token τa. Theadd component210 can include the deterministic algorithm (γ′, c′) :=Add (γ, c, τa) which takes as input the encrypted index γ, the sequence of ciphertexts c, and the add token τagenerated by thetoken generator component120, and outputs a new encrypted index γ′ and a new sequence of ciphertexts c′. This algorithm can parse τaas (τ1, τ2, τ3). Using τ3, a structural update can be undertaken on the tree T (on the encrypted index γ), and a new tree T′ is output. Using τ1, updated encryption information is copied on the new tree T′. γ′ is then output to be the new encrypted index containing T′, and c′ to be the new set of ciphertexts that include c.
Thetoken generator component120 can also be configured to generate a delete token by way of τd←DelToken(K, f), which is a (possibly probabilistic) algorithm that takes as input the secret key K and a file f and outputs a delete token τd. This algorithm can operate the same as AddToken, with the difference that in this the delete token can be defined as τd:=({(id(v), λ0v, λ1v): v∈T′(q)}, id(f))=(τ1, τ2). This delete token can be transmitted to theremote repository104.
Thedelete component212 on theremote repository104 receives the delete token τd. Theadd component210 can include the deterministic algorithm (γ′, c′) :=Del(γ, c, τd), which takes as input the encrypted index γ, the sequence of ciphertexts c, and the delete token γdgenerated by thetoken generator component120, and outputs a new encrypted index γ′ and a new sequence of ciphertexts c′. This algorithm can parse τdas (τ1, τ2). Using τ2, a structural update can be undertaken on the tree T, and a new tree T′ is output. Using τ1, updated encryption information is copied on the new tree T′. γ′ is then output to be the new encrypted index containing T′, and c′ to be the new set of ciphertexts that comprises c−{c}, where c is the ciphertext of a file f with id(f)=τ2.
With reference now toFIGS. 5-7, various exemplary methodologies are illustrated and described. While the methodologies are described as being a series of acts that are performed in a sequence, it is to be understood that the methodologies are not limited by the order of the sequence. For instance, some acts may occur in a different order than what is described herein. In addition, an act may occur concurrently with another act. Furthermore, in some instances, not all acts may be required to implement a methodology described herein.
Moreover, the acts described herein may be computer-executable instructions that can be implemented by one or more processors and/or stored on a computer-readable medium or media. The computer-executable instructions may include a routine, a sub-routine, programs, a thread of execution, and/or the like. Still further, results of acts of the methodologies may be stored in a computer-readable medium, displayed on a display device, and/or the like. The computer-readable medium may be any suitable computer-readable storage device, such as memory, hard drive, CD, DVD, flash drive, or the like. As used herein, the term “computer-readable medium” is not intended to encompass a propagated signal.
Referring now toFIG. 5, anexemplary methodology500 that facilitates generating an index that indicates which keywords are included in files in a file collection is illustrated. Themethodology500 starts at502, and at504 an index is received that identifies, for each file in a file collection, which keywords in a dictionary are included in a respective file. For instance, the index can be a red-black tree, and can have a structure that has been described in detail above.
At506, the index is encrypted to generate an encrypted index. The unencrypted index may have, for instance, a structure that is similar to the tree structure shown inFIG. 4, such that each node in the tree is assigned a vector with numerous entries that correspond to keywords in a dictionary. The index can be encrypted by first computing a compliment for each of the vectors. Thereafter, for each corresponding entry in the vectors, a random oracle can be employed to selectively ascertain which vector includes a “correct” entry by randomly or pseudorandomly outputting a “1” or “0”. Additionally, a start position of the vectors can be randomly selected, such that for a first node, the first entry in a vector corresponds to an Nth keyword in the dictionary, while for a second node, the first entry in a vector corresponds to an Mth keyword in the dictionary. Accordingly, an adversary reviewing vectors assigned to any given node in the encrypted index would be unable to ascertain which value between two vectors is correct, and further would be unable to ascertain which keyword corresponded to a given entry of the two vectors. At508, the encrypted index is transmitted to a remote repository, wherein the encrypted index points to encrypted files on such remote repository. Themethodology500 completes at510.
Now referring toFIG. 6, anexemplary methodology600 for transmitting one of an add token or a delete token to a remote repository to update an encrypted index is illustrated. Themethodology600 starts at602, and at604 an indication is received that a file is desirably added or removed from a file collection. For instance, a user can receive an email that is desirably included in an email collection or delete an email from an email collection. At606, one of an add token or a delete token is generated responsive to receive of the aforementioned indication. The add token can include an encrypted file c that is desirably added to a file collection, a plurality of vectors that can be indicative of which keywords in a dictionary are included in the file (and not included in the file). The delete token can include data that identifies a position in an encrypted index that points to an encrypted file c that corresponds to a file that is desirably deleted from a file collection. At608, the one of the add token or the delete token is transmitted to a remote repository that is tasked with retaining encrypted files of the user and an encrypted index corresponding to such files. Themethodology600 completes at610.
With reference now toFIG. 7, anexemplary methodology700 that facilitates dynamically updating an encrypted index that can be utilized in connection with searching over encrypted files is illustrated. Themethodology700 starts at702, and at704 one of an add token or a delete token is received from a client computing device. At704, an encrypted index and a collection of encrypted files are dynamically updated at a remote repository based at least in part upon the add token or the delete token. As discussed above, the add token can include an encrypted file and vectors that are indicative of which keywords are (or are not) included in the file. The add token can also include a subtree of an index, wherein nodes of the subtree are assigned vectors with values for entries that are based upon values for the vectors corresponding to the encrypted file. The encrypted index can have a corresponding tree structure, and a subtree on the encrypted tree can be dynamically updated. The delete token can include a subtree with vectors assigned to nodes therein, wherein entries in the vectors are updated to reflect the deletion of the file from the file collection. The corresponding subtree in the encrypted tree can be updated based upon the delete token. Themethodology700 completes at708.
Now referring toFIG. 8, a high-level illustration of anexemplary computing device800 that can be used in accordance with the systems and methodologies disclosed herein is illustrated. For instance, thecomputing device800 may be used in a system that supports dynamically updating an encrypted index to facilitate searching over encrypted files. Thecomputing device800 includes at least oneprocessor802 that executes instructions that are stored in amemory804. Thememory804 may be or include RAM, ROM, EEPROM, Flash memory, or other suitable memory. The instructions may be, for instance, instructions for implementing functionality described as being carried out by one or more components discussed above or instructions for implementing one or more of the methods described above. Theprocessor802 may access thememory804 by way of asystem bus806. In addition to storing executable instructions, thememory804 may also store a file collection, a secret key, an index, encrypted files, an encrypted index, or the like.
Thecomputing device800 additionally includes adata store808 that is accessible by theprocessor802 by way of thesystem bus806. The data store may be or include any suitable computer-readable storage, including a hard disk, memory, etc. Thedata store808 may include executable instructions, a file collection, an index, encrypted files, an encrypted index, etc. Thecomputing device800 also includes aninput interface810 that allows external devices to communicate with thecomputing device800. For instance, theinput interface810 may be used to receive instructions from an external computer device, a user, etc. Thecomputing device800 also includes anoutput interface812 that interfaces thecomputing device800 with one or more external devices. For example, thecomputing device800 may display text, images, etc. by way of theoutput interface812.
Additionally, while illustrated as a single system, it is to be understood that thecomputing device800 may be a distributed system. Thus, for instance, several devices may be in communication by way of a network connection and may collectively perform tasks described as being performed by thecomputing device800.
It is noted that several examples have been provided for purposes of explanation. These examples are not to be construed as limiting the hereto-appended claims. Additionally, it may be recognized that the examples provided herein may be permutated while still falling under the scope of the claims.