Unformatted text preview:

Encrypted Keyword Search in a Distributed Storage System Shay Artzi Adam Kiez un Calvin Newport David Schultz artzi akiezun cnewport das csail mit edu Abstract tion to keep the servers oblivious to the content of the database while allowing clients who have the decryption key to generate trapdoors which the service can use to perform keyword searches of the database on the clients behalf Our system provides strong privacy guarantees Short of breaking the encryption the service has no way to learn anything about the content of keyword queries or of the documents being stored except possibly the lengths of the documents the list of documents in each search result and the limited statistics used to rank results The traditional technique for performing searches efficiently is to maintain an index that maps keywords to document identifiers However as we discuss in Section 2 3 direct application of this approach is insecure Our solution instead relies upon per document metadata that servers use in conjunction with trapdoors to determine which documents match the keyword that generated the trapdoor This technique introduces two main difficulties First the cost of performing a search is linear in the total number of documents whereas inverted indices have performance that is linear in the number of search results Second the need for privacy complicates the implementation of search result ranking mechanisms Without access to the contents of the documents it is unclear how to decide which results best match a given query Our main contribution in this work is a system that addresses these two problems Our architecture makes searches fast by taking advantage of multiple servers and caching We also introduce several novel relevance schemes that rely on carefully selected extra values stored in our encrypted metadata to allow for accurate relevance decisions to be made by the untrusted server We show that the system is usable at a scale of 100 000 documents both in terms of efficiency and accuracy Encrypted keyword search allows a server to perform a search over a set of encrypted documents on behalf of a client without learning the contents of the documents or the words being searched for Designing a practical system is challenging because the privacy constraint thwarts standard indexing and ranking techniques We present Mafdet an encrypted keyword search system we have implemented Our system makes the search practical even for large data sets We evaluated Mafdet s performance on a set of queries and a large collection of documents In these queries Mafdet s accuracy is within 6 of Google Desktop and the search time is on the order of seconds for document sets as large as 2 6 GB 1 Introduction As people and organizations increasingly rely on distributed services to store large volumes of data reliably and make it globally available the problem of searching these data becomes more important For instance consider a laptop user who encrypts his files and stores them on such a service to avoid data theft or loss or a small organization that uses distributed storage for archival Both of these clients may need to perform searches over the documents they have stored however this is a challenging problem because the storage service does not and should not know the encryption key and the clients cannot reasonably download all of the potentially relevant documents over the WAN Several prior schemes e g 5 10 18 address the encrypted search problem on a small scale usually involving a PDA as the client and a personal computer as the storage service We propose a widely distributed storage service that uses encryp1 2 Encrypted Keyword Search sets should take on the order of seconds not minutes or hours In this section we describe the properties required of a secure search service and the threat model that we use We then exhibit a proposal that is simpler than our design and show why it fails to achieve these properties In Section 3 and following we describe our architecture and show how it achieves our goals Section 4 describes the prototype of the system that we have implemented High server throughput The system should be capable of processing many queries concurrently without significantly increasing the number of search servers Support for boolean queries The query language must be expressive enough to allow users to issue precise queries As with most search engines we dismiss regular expression searches and focus on simpler boolean queries In practice we focus on the and operator or and not are relatively trivial 2 1 Properties Since seemingly small information leaks can often be exploited by a malicious adversary to reveal significant information we view privacy as a constraint and build on established definitions of security Subject to these requirements we have built a system that can scale well to large data sets and provide accurate search results Below we give an informal description of the desired properties deferring a principled discussion of security to 10 18 2 1 1 Competitive search accuracy Information retrieval systems employ a wide variety of heuristics based on query keyword proximity and frequency within documents and overall keyword frequency and other metrics to identify the most relevant matches For a large data set it is crucial that our system be capable of supporting enough of these heuristics to produce accurate search results Privacy Constraints Controlled searching The server cannot learn anything about the contents of documents ex 2 2 Threat Model cept when the client performs a search We model the adversary as an eavesdropper who knows everything that the storage service knows Hidden queries The client can search for a However the adversary is passive she cannot comset of documents containing a keyword without promise the integrity of the results produced by the revealing the keyword to the server service This assumption may seem limiting but it Query isolation From the query result the fits well with real world systems such as BFT 4 server learns nothing about the plain text other which preserves integrity but not secrecy in the face than the set of documents that match the query of active attacks involving less than a threshold num and possibly the limited statistical information ber of replicas We assume that the adversary has no knowledge used to perform ranking of the contents of the documents stored by the client Update isolation The server learns nothing Ideally one


View Full Document

MIT 6 824 - Encrypted Keyword Search in a Distributed Storage System

Documents in this Course
Logging

Logging

4 pages

Load more
Loading Unlocking...
Login

Join to view Encrypted Keyword Search in a Distributed Storage System and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Encrypted Keyword Search in a Distributed Storage System and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?