Unformatted text preview:

Database SecurityPrivacy and statisticsRecords privacy• Database access control is an important aspect of practical security• One of the crucial assets of many organizations is the data recorded inits databases• Protection of data records is a complex issue, often involving differentsecurity levels assigned to parts of records (e.g., database table rows).Role-based access control is also more common in this setting thanwith regard to regular file systems• In addition to the many important theoretical and practical aspects ofdatabase access control, databases often have another privacydimension: Statistical access to data can reveal information aboutindividual records. We explore this issue in this class.Statistical databases• Valuable data records have been compiled to support statistics-based research– Medical databases: genetic component of diseases, effectivenessof drugs and treatments, health policy issues– Scientific databases– Marketing research databases• Legal privacy protections or intellectual property interestsindicate that in many cases it is important to allow for queries toreveal generic patterns on the data (statistical information) whileprotecting information contained in individual records. Weconsider here the inherent risks to disclosure of individual recordinformation to an attacker that access the database throughstatistical queries.Statistical databases(simplified model)• n records, each belonging to an individual• Confidential category and data fields• Category fields are used to identify and selectrecords• Data fields contain other information• Identifier fields or indexes, not used by statisticalqueries• Attacks against static databases (not updated duringthe time of the attack) or against dynamic onesStatistical queries• Assumption: Only perfect match is considered (norelevance-type queries) and the expressions arecombinations of operators AND, OR, NOT.• The collection of records that satisfy a query C iscalled the query set XC• Consider those statistical functions of the query setthan can be expressed as:– q(C; j, m) = ∑ vijm, where i is in XC– vij is the value in data field j of record i, and m is an integerSpecial cases• m = 0:– q(C; j, 0) = ∑ vij0 = ∑ 1, for i in XC = |XC|• m = 1:– q(C; j, 1) = ∑ vij1 , for i in XC = SUM(C; j)• m = 2:– q(C; j, 2) = ∑ vij2 , for i in XC• Statistical significance:– Mean(C;j) = q(C;j,1)/q(C;j,0) = µ– Variance(C;j) =[1/q(C;j,0)] * [ q(C;j,2) - 2µq(C;j,1) + µ2 ]Examples (Denning et al.)03KStuCSMLord1210025KProfMATHFKnapp111520KAdmSTATMJones10103KStuCSFIrons950018KProfMATHMHayes82010KAdmCSMGrady715022KProfSTATFFlynn6018KProfSTATMEngel55015KProfCSFDodd420025KProfMATHFCook310015KProfMATHMBaker25020KProfCSMAdams1Pol. contrSalaryPositionDeptSexIdentifierNo.CategoriesDataPrivacy compromising queries• Simplified notation:– AND: • OR: + NOT x: ¬x• The following two queries reveal thesalary of Prof. Dodd.– Count(F • CS • Prof) = 1– Count(F • CS • Prof • 15KSal) = 1Complementary approach• By querying on the complement, it is possible toviolate the privacy of users while generating querieswith corresponding large query sets– Count(Prof + ¬Prof) = 12 // Total database– Count(¬ (F • CS • Prof)) = 11 //– From this, it can be derived thatCount(F • CS • Prof) = 12 - 11 = 1• After this, the following queries reveal Prof. Dodd’ssalary:– SUM(Prof + ¬Prof; Sal) = 194K– SUM (¬ (F • CS • Prof)) = 179KFirst observations• Here we discuss considerations about privacy ofindividual data records:– Statistical queries that reveal statistical information aboutsmall query sets may leak information about specific records– Statistical queries that reveal statistical information aboutquery sets with small complements may leak informationabout specific records• A necessary restriction for protection of record-specific data is to only answer queries that satisfy:– k ≤ Count( C) ≤ n-k, where n = total # of records, and k is aparameter (k-privacy)• However, is the above restriction sufficient?Privacy threats• Since k and n - k play similar roles, it issufficient to study the dependency of privacyon k for k in the range [2, n/2]• Intuitively, the close k is to n/2, the moreprivacy is afforded by the system, at the costof denying responses to more queriesIndividual tracker• Schlörer showed the following:– Assume that an attacker to privacy knows that asingle record in the database satisfies a query C.– To check if this record has characteristic b, it issufficient to test if Count(C•b) is 0 or 1.– However, such small query sets are assumeddisallowed by the k-privacy requirement (k > 1).– How could an attacker overcome this?Using intersections• Assume that the query formula C withsingleton query set XC = { I } can bedecomposed as C = A • B, where both A•(¬B) and A are answerable:– k ≤ |XA| ≤ n - k and k ≤ |XA·(¬B) | ≤ n - k– XC = XA ∩ XB• Formula T = A•(¬B) is called a tracker for Ibecause it can be used to find othercharacteristics of I.Compromise1. C = A • B identifies I2. T = A•(¬B) is the tracker for I3. To test for any characteristic b, first note thata. XT ⊆ XT + A·b = X A·(¬B) + A· b ⊆ XAb. Since both T and A are answerable, so is T + A• b4. Note thata. XC·b = XA·B·b = (XA.b - XA·(¬B) ) = XA.b +A·(¬B) - XA·(¬B) = X T + A· b - XT5. Finally, note thata. Count(C• b) = Count(T+A• b) - Count(T)Example• Suppose we wish to find out Dodd’s salary when thedatabase implements 2-privacy, so no query sets ofcardinality 1 or fewer are allowed.• C = F • CS • Prof = F • (CS • Prof)• T = F • ¬(CS • Prof)• Suppose b = 25KSal is the characteristic to test for,then:– Count(F • CS • Prof • 25KSal) =Count(F•¬(CS • Prof) + F • 25KSal)- Count(F • ¬(CS • Prof)) = 4 - 4 = 0– So, the salary of Dodd is NOT 25KGeneral trackers• Denning, Denning, and Schwartz havegeneralized the trackers in several ways– Single tracker suffices for the wholedatabase– No external knowledge of characteristicsthat identify specific records is requiredGeneral idea1. Assume that k ≤ n/42. Attacker chooses a formula T such thata. 2k ≤ |XT| = Count(T) ≤ n - 2kb. Let Q = Count(T) + Count(¬T)3. If C is unanswerable query with |XC| < k:a. Count(C ) = Count(C+T) + Count(C +¬T) - Q4. If C is unanswerable query with |XC| > n -


View Full Document

FSU CIS 5930 - Database Security

Download Database Security
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Database Security and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Database Security 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?