Equality (
10) allows us to recover the outcome of each VARIANCE query from the mean value of the squares of the values of the confidential attribute. Therefore, in order to store an MVQ query in computer memory, it is enough to keep a record of the coefficients that occur in the MEAN query, the outcome of the MEAN query, and the mean value of the squares of the values of the confidential attribute. Therefore, to store a set of MVQ queries, we can use the following pair of matrix equalities,
where
M is the matrix storing the coefficients of the MEAN queries,
is the column of the confidential values
,
,
is the column of the squares of the confidential values,
V is the column vector with the values returned by the MEAN queries, and where
W is the column vector of the mean values of the squares of the confidential values. In concise matrix notation, the Equation (
11) can be stored as the following matrix:
The following example illustrates our matrix notation.
Next, we define the first type of a nonlinear inference attack, the QEA attack, which can be used by an adversary to compromise MVQ queries. Steps of the QEA attack are explained in Algorithm 1.
To protect sensitive data from QEA attacks, we design a quadratic audit system (QAS). It is described in Algorithm 2 by using the following matrix notation.
A formal proof establishing that the QAS system guarantees protection of sensitive data from QEA attacks is given in Theorem 3. It relies on Theorem 2, which gives matrix conditions necessary and sufficient for QEA attack to reveal confidential data.
Proof of Theorem 2. As in the proof of the main theorem of [
14] and in other previous publications, it has been customary to assume that the attackers can gain knowledge of the COUNT query corresponding to each their query. It is important to ensure rigorous protection of privacy under this assumption, in view of the following three easy ways enabling the attackers to gain access to the outcomes of the COUNT queries.
(a) The COUNT query is a legitimate query. It can be submitted to the database and may be answered as a separate query.
(b) The COUNT query can be included as an integral part of every SUM query or linear query.
(c) It may be easy for the attackers to gain access to the values of some COUNT queries by using additional information, legal knowledge, or insider knowledge.
Theorem 1 and its proof also assume that the audit system must provide protection against database compromise even if the attackers can gain access to the COUNT queries. Without this assumption, Theorem 1 is invalid. Indeed, even if the attackers can manage to obtain an outcome of the query corresponding to the value of a confidential attribute in just one record, they will be unable to notice that they have achieved this, since without the knowledge of a COUNT query they won’t know whether the outcome corresponds to just one record or many records. This is why it is a common practice to assume that the attackers can also gain access to the outcomes of the corresponding COUNT queries, and that audit system must provide protection in these circumstances.
(i)⇒(ii): Suppose that condition (i) holds, i.e., the QEA could be used to achieve a compromise ofD. Let us refer to the definition of the QEA attack in Algorithm 1.
First, we consider the case where the attackers managed to achieve a compromise in Step 1 of the Quadratic Equation Attack. In this case, Step 1 results in a compromise achieved by using only the set of MEAN queries. Every classical compromise is an example of a 2-compromise required for condition (ii). Therefore in this case condition (ii) follows immediately.
Now, we assume that the attackers had to proceed to the remaining steps of the QEA. This means that they found an element
and a subset
with properties (A1) and (A2). Let us take the equality
, which is the first equality of the system (
22). It implies that
. Therefore, the attackers have managed to derive the value
of the statistic
, which depends on at most two variables. This means that the attackers have achieved a 2-compromise by using only the set of MEAN queries, and so condition (ii) holds again.
(ii)⇒(iii): Suppose that condition (ii) holds, i.e., the attackers have managed to achieve a 2-compromize of
D by using only MEAN queries. This means that they derived the value
of a statistic
, for some
, where
. Denote the rows of the matrix
M by
. For
, let us denote by
the linear combination of the variables
corresponding to the
i-th row of the matrix
M. This means that
where
. Then, as in (
35) above, again it follows that there exist
such that
First, we consider the case where. Then the value provides a 1-compromise. Hence, Theorem 1 implies that the normalized basis matrix of the audit system has a row with only one nonzero entry. Therefore condition (iii) is satisfied.
Second, if, then it follows in the same way that condition (iii) holds true, as well.
Third, it remains to treat the case where
. Note that
as in (
7). Let us keep in mind that because
is an identity matrix, it follows that every nonzero linear combination of the rows of
M has at least one nonzero component in the first
k columns. Applying this to the linear combination (
25), we see that
. Furthermore, the following two subcases are possible and we consider them separately.
Subcase 1.. This means that
belongs to the columns of the matrix
, which is the right block of the matrix
in (
7). Clearly, the sum
has only one nonzero component in the first
k columns. More specifically, the only nonzero component of this sum in the first
k columns is the
-th component. Because
is an identity matrix, it follows from (
25) that
and
Subcase 2.. This means that
,
belong to the columns of the matrix
in
. Hence, we get
and all the other coefficients
are equal to 0, i.e.,
Therefore, all entries in the last columns of are equal to zero. Denote by and the projections of the rows and on the matrix, respectively. It follows that. This implies that the projections and are collinear, and so condition (iii) is satisfied.
(iii)⇒(i): Suppose that condition (iii) holds. The following two cases are possible.
Case 1. The matrix
in (
16) has a row with at most two nonzero entries. Denote by
ℓ the index of this row, where
. By using the same notation
for this row and the same linear combination
of the variable as in (
24), we get
Let
,
be the indices of the two nonzero entries in
, where
. Denote these two nonzero entries of
by
and
. Then it follows from (
29) that
The
ℓ-th linear equation of the system (
16) shows that
where
is the
ℓ-th component of the column vector
in (
16). Therefore the value of the statistic
is equal to
. This establishes a 2-compromise derived by using only the set of MEAN queries. Thus, condition (ii) holds.
Case 2. The matrix
in (
16) has two rows with collinear projections on the matrix
in (
17). Denote by
the indices of these rows, where
. Denote by
and
the projections of the rows
and
on the matrix
, respectively. Given that
and
are collinear, we can multiply one of these vectors by an appropriate coefficient and obtain the second vector. Without loss of generality, we may assume that there exists a coefficient
such that
. Because
is an identity matrix and the projection of the vector
on the matrix
is equal to
, it follows that
This establishes a 2-compromise again, because equalities (
32) show that the value of the statistic
is known and is equal to the constant
. This establishes that condition (ii) is satisfied in each of the cases, i.e., the attackers can achieve a 2-compromise by using only the set of MEAN queries.
Let us introduce notation for the set of MVQ queries answered so far. Suppose that a set of
k queries consisting of the corresponding pairs of mean and variance for the set of the corresponding
k samples
have been submitted to the audit system. Applying (
8), we can record the set of MEAN queries as a system of linear equations
where
, where
is the outcome of the MEAN query, and where
for
. Denote the left-hand-side of equality (
33) by
.
Given that the attackers have achieved a 2-compromise by using only the queries of the system (
33), they have derived the value
of a statistic
, for some
, where
. It follows that there exist coefficients
such that
and the value of the statistic
is equal to
.
For each MEAN query of the system (
33), the corresponding VARIANCE query of the form (
9) can be rewritten in the form (
10). It follows that all VARIANCE queries can be recorded as the following system of equations expressed in terms of the quadratic variables
,
,
where
, where
, where
is the outcome of the
i-th VARIANCE query and
is the outcome from (
33), and where
Denote the left-hand-side of equality (
36) by
. Equalities (
34) and (
37) show that the coefficients
in the system (
33) coincide with the corresponding coefficients
in the system (
36). Therefore, it follows from (
35) that
Because at least one of the coefficients
is nonzero, without loss of generality we may assume that
. Hence, (
35) implies that
Substituting (
39) for
in (
38), we get
This is a quadratic equation in one variable. It can be solved to determine the value of, which achieves a compromise ofD. Thus, condition (i) is satisfied.
This completes the proof of Theorem 2. □