Informatics, Volume 9, Issue 3. DOI: 10.3390/informatics9030066
Article

Protecting Private Information for Two Classes of Aggregated Database Queries

1 School of Computing Technologies, RMIT University, GPO Box 2476, Melbourne, VIC 3001, Australia
2 Centre for Research in Mathematics and Data Science, Western Sydney University, Locked Bag 1797, Penrith, NSW 2751, Australia
3 School of Information and Physical Sciences, College of Engineering, Science and Environment, The University of Newcastle, Callaghan, NSW 2308, Australia
* Author to whom correspondence should be addressed.
Submission received: 28 July 2022 / Revised: 28 August 2022 / Accepted: 2 September 2022 / Published: 5 September 2022

Abstract: An important direction of informatics is devoted to the protection of privacy of confidential information while providing answers to aggregated queries that can be used for analysis of data. Protecting privacy is especially important when aggregated queries are used to combine personal information stored in several databases that belong to different owners or come from different sources. Malicious attackers may be able to infer confidential information even from aggregated numerical values returned as answers to queries over large collections of data. Formal proofs of security guarantees are important, because they can be used for implementing practical systems that protect privacy while answering aggregated queries. The investigation of formal conditions guaranteeing the protection of private information against inference attacks originates from a fundamental result obtained by Chin and Ozsoyoglu in 1982 for linear queries. The present paper solves similar problems for two new classes of aggregated nonlinear queries. We obtain complete descriptions of conditions that guarantee the protection of privacy of confidential information against certain possible inference attacks when a collection of queries of these types is answered. Rigorous formal security proofs are given, which guarantee that the conditions obtained ensure the preservation of privacy of confidential data. In addition, we give necessary and sufficient conditions for the protection of confidential information from special inference attacks aimed at achieving a group compromise.

    1. Introduction

    A large and rapidly developing area of modern informatics deals with security and privacy of data (see, for example, [1,2,3,4,5,6]). In particular, the preservation of privacy is crucial for broad adoption of digital payments [7], healthcare applications [8], location-based services [9], telemedicine [10], monitoring industrial infrastructure [11] and the Internet of things [12,13].
    The investigation of formal conditions which guarantee the protection of private information against inference attacks using aggregated database queries originates from a fundamental result obtained by Chin and Ozsoyoglu [14] in the case of linear queries and linear inference attacks. It belongs to an important research direction devoted to the protection of privacy of confidential information while providing answers to aggregated queries that can be used for analysis of data [9,15]. Protecting privacy is especially important when aggregated queries are used to combine personal information stored in several databases that belong to different owners or come from different sources [16]. Malicious attackers may be able to infer confidential information even from aggregated numerical values returned as answers to queries over large collections of data [17]. Formal proofs of security guarantees are important, because they can be used for implementing practical systems that protect privacy while answering aggregated queries.
    The present paper obtains novel rigorous formal conditions which guarantee the protection of privacy of confidential information against certain possible inference attacks for two new classes of aggregated nonlinear queries motivated by the main result of [14]. Section 2 of our paper gives a review of related previous work. Section 3 contains technical details on the materials and methods used in this paper. Section 4 presents the main results of our article. Section 4.1 defines MEAN and VARIANCE queries (MVQ) and introduces a new class of inference attacks, quadratic equation attacks (QEA). In order to protect confidential information from QEA attacks, we design a quadratic audit system (QAS). Theorems 2 and 3 establish that QAS systems guarantee the protection of confidential data from QEA attacks. Rigorous formal security proofs are given to ensure the preservation of privacy of confidential data. Section 4.2 introduces interval inference attacks (IIA). To protect sensitive data from IIA attacks, we design an interval audit system (IAS). Theorems 4 and 5 prove that the IAS ensures protection against IIA attacks. Finally, Theorem 6 in Section 4.3 gives rigorous matrix conditions for the protection of confidential information from a group compromise. The results obtained are discussed in Section 5, where directions for future research are also proposed. A conclusion is given in Section 6.
    The present paper contributes to the advancement of knowledge on the preservation of privacy of confidential information by developing formal theory, designing new formal systems for the protection against inference attacks and obtaining novel rigorous conditions that guarantee that the confidential information remains protected. In summary, a point-by-point list of the main contributions of this paper can be presented as follows:
    • Formal definitions of the MVQ queries and a new class of inference attacks, the QEA attacks.
    • The design of a QAS system for the protection of confidential information against the QEA attacks.
    • Rigorous formal proofs of Theorems 2 and 3, which establish that QAS systems guarantee the protection of confidential data from the QEA attacks.
    • Formal definition of a new class of inference attacks, the IIA attacks.
    • The design of an IAS system for the protection of sensitive data from the IIA attacks.
    • Rigorous formal proofs of Theorems 4 and 5, which demonstrate that IAS systems ensure protection against IIA attacks.
    • Rigorous formal proof of Theorem 6, which provides stringent matrix conditions for the protection of confidential information from a group compromise.

    2. Previous Work

    This section is devoted to the existing literature related to the results of [14] and a brief review of other relevant research. The paper [14] investigated linear queries and designed the concept of an audit expert, which maintains a dynamic matrix for processing such queries. The paper [18] suggested using a static audit expert for arbitrary linear queries, where the query basis matrix is prepared and fixed by the system beforehand. The paper [19] proposed to apply a hybrid audit expert, which combined the advantages of the dynamic and static expert systems. The effectiveness of hybrid audit experts was further investigated in [20].
    The majority of previous papers devoted to linear queries concentrated on studying the more special case of so-called SUM queries (see Section 3 for a mathematical definition). The databases where the clients are allowed to submit SUM queries were investigated in [21,22,23]. The readers are referred to our survey article [24] for more details.
    Wu et al. [25] used the concept of differential privacy and designed a differentially private mechanism for answering linear queries, which achieves a near-optimal data utility subject to a fixed privacy protection constraint. Mckenna et al. [26] applied advanced optimisation methods to develop a mechanism for accurate answers to a user-provided set of linear queries under local differential privacy. Khalili et al. [27] proposed an incentive mechanism and a randomized response algorithm for generating differentially private answers to linear queries. Xiao et al. [28] devised a fine-grained strategy of adding Gaussian noise to query answers in the special case of answering linear queries under differential privacy subject to per-query constraints on accuracy.
    Differential privacy has also been applied for privacy protection in various more advanced scenarios recently. For example, the paper by Qu et al. [29] proposed a customizable reliable differential privacy (CRDP) model and developed a modified Laplacian mechanism that enables CRDP to simultaneously minimize background knowledge attacks and eliminate collusion attacks in cyber-physical social networks. An application of the differential privacy for the development of personalised privacy protection in cyber-physical social systems was investigated in [30].
    Another important relevant direction of research deals with federated learning, which occurs when a query needs to be answered by using a large database that is a union of several separate databases that belong to different data owners who are not willing to share data with others due to privacy concerns. For example, Wan et al. [31] proposed to integrate differential privacy and the Wasserstein Generative Adversarial Network (WGAN) for preserving the privacy of sensitive parameters in federated learning. Cui et al. [32] introduced a blockchain-empowered decentralized and asynchronous federated learning framework and designed an improved, differentially private federated learning scheme based on generative adversarial nets. Qu et al. [33] proposed a blockchain-enabled adaptive asynchronous federated learning paradigm (FedTwin) and designed a tailor-made consensus algorithm that uses generative adversarial network-enhanced differential privacy and an improved Markov decision process. A trade-off optimization procedure and a hybrid model were developed by Qu et al. [34] for simultaneous protection of the identity and location privacy of smart mobile devices against dynamic adversaries. Blockchain-enabled federated learning and WGAN-enabled differential privacy were applied by Wan et al. [35] in order to protect confidential model parameters in fifth-generation broadband cellular networks and beyond-fifth-generation networks.
    Thus, a lot of research has been conducted that investigates related directions. However, the protection of private information for the classes of nonlinear queries examined in the present paper has never been considered in the literature before.

    3. Materials and Methods

    If a data repository processes aggregated numerical queries for subsets of the records and provides the outcomes of these queries without giving access to individual records, then such a repository is often called a statistical database (cf. [36,37]). We use standard concepts and terminology, following [36,38,39,40,41,42]. Our proofs also apply the main theorem of [43].
    The set of all real numbers is denoted by $\mathbb{R}$. The cardinality of a set $S$ is denoted by $|S|$. For positive integers $a \le b$, the symbol $[a:b]$ stands for the set
    $[a:b] = \{a, a+1, a+2, \ldots, b\}. \qquad (1)$
    A summary of the main notation used in this paper is given in Table 1.
    Let $m$ be the number of attributes in every record of the database, and let
    $r = (r_1, r_2, \ldots, r_m) \qquad (2)$
    be an arbitrary record. The attributes in the database are denoted by $A_1, \ldots, A_m$. For $1 \le i \le m$, the attribute $A_i$ is a function such that $A_i(r) = r_i$.
    Let $n$ be the number of records stored in the database. Denote the records by $r_1, \ldots, r_n$. We assume that the users can submit aggregated queries regarding the confidential attribute $A_1$, and the attributes $A_2, \ldots, A_m$ are used to select subsets of records for these queries. Then $A_1$ is called a quantitative attribute and $A_2, \ldots, A_m$ are called characteristic attributes for such queries. Let $x_1, x_2, \ldots, x_n$ be the (confidential) values of the quantitative attribute $A_1$ in the records.
    The set of records chosen for a query by specifying conditions for the characteristic attributes is called the query sample or query set. To select a sample set for a query, the users can use inequalities and Boolean expressions. Denote by $\mathcal{B}$ the set of all Boolean expressions of inequalities involving the characteristic variables. This set can be defined inductively by the following rules:
    (B1) For any $r \in \mathbb{R}$ and $j \in [2:m]$, the set $\mathcal{B}$ contains the inequalities $r_j \le r$, $r_j \ge r$, $r_j < r$, $r_j > r$ and the equality $r_j = r$.
    (B2) If $B_1, B_2 \in \mathcal{B}$, then $B_1 \wedge B_2 \in \mathcal{B}$, $B_1 \vee B_2 \in \mathcal{B}$, $\neg B_1 \in \mathcal{B}$, where $\wedge$, $\vee$, $\neg$ denote the logical AND, OR and NOT operators, respectively.
    Throughout, we consider a query using a Boolean expression $B \in \mathcal{B}$ to select the query sample. It specifies the records $r$ stored in the database such that the Boolean expression holds true for these records. The query sample, i.e., the set of all records in $D$ satisfying condition $B$, is denoted by $S = B(D)$.
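    To make the role of the characteristic attributes concrete, the following minimal Python sketch selects a query sample using a Boolean condition of the kind described by rules (B1) and (B2). The record layout and attribute names are illustrative assumptions, not part of the formal model above.

```python
# Hypothetical illustration: records are tuples (x, a2, a3), where x is the
# confidential quantitative attribute A1 and a2, a3 are characteristic attributes.
D = [
    (52000.0, 34, 1),   # (salary, age, department)
    (61000.0, 45, 2),
    (47000.0, 29, 1),
    (58000.0, 51, 2),
]

def query_sample(database, condition):
    """Return the query sample S = B(D): all records satisfying the Boolean expression."""
    return [rec for rec in database if condition(rec)]

# Example Boolean expression built from rules (B1) and (B2): (age > 30) AND (department = 1).
B = lambda rec: (rec[1] > 30) and (rec[2] == 1)
S = query_sample(D, B)
print(S)  # the sample over which aggregated queries such as MEAN are evaluated
```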
    Thorough investigation in the literature has been devoted to linear queries [14,18,19,44]. A linear query can be recorded as a linear combination
    $\alpha_1 x_1 + \alpha_2 x_2 + \cdots + \alpha_n x_n = \beta, \qquad (3)$
    where $\beta$ is the outcome of the query and $\alpha_1, \ldots, \alpha_n \in \mathbb{R}$. Linear queries are also called weighted sum queries. The COUNT query corresponding to the linear query (3) is defined as the number of nonzero coefficients $\alpha_i$, for $i \in [1:n]$.
    A SUM query is defined as a linear Equation (3), where $\beta$ is the outcome of the query, and where
    $\alpha_i = \begin{cases} 1 & \text{if the } i\text{-th record is included in the sum,} \\ 0 & \text{otherwise.} \end{cases} \qquad (4)$
    When there is a set of linear queries indexed by $j = 1, \ldots, k$ with equations
    $\alpha_{j,1} x_1 + \alpha_{j,2} x_2 + \cdots + \alpha_{j,n} x_n = \beta_j, \qquad (5)$
    then we can collect them into the matrix $M = [\alpha_{j,i}]$ and the column vector $V = [\beta_j]$, and represent them as the matrix equation $MX = V$. Thus, every set of SUM queries (or linear queries) can be recorded as a system of linear equations of the form
    $MX = V, \qquad (6)$
    where $M = [\alpha_{j,i}]$, and where $V = [\beta_j]$ is the column vector with the values returned by the queries corresponding to the rows of the matrix $M$. Each query corresponds to a row of the matrix $M$. To derive the confidential values $x_1, \ldots, x_n$, the user can try to solve this system of linear equations.
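    As a small illustration of how a set of SUM queries is recorded as the matrix Equation (6), the following sketch (using NumPy; the concrete values are invented for the example) builds $M$ and $V$ for three SUM queries over five confidential values.

```python
import numpy as np

# Hypothetical confidential values x_1, ..., x_5 (known only to the database).
x = np.array([3.0, 7.0, 2.0, 5.0, 4.0])

# Each SUM query is a 0/1 coefficient row as in (4); three queries give a 3 x 5 matrix M.
M = np.array([
    [1, 1, 0, 0, 0],   # SUM over records 1, 2
    [0, 1, 1, 1, 0],   # SUM over records 2, 3, 4
    [1, 0, 0, 1, 1],   # SUM over records 1, 4, 5
], dtype=float)

# The answers returned to the user form the column vector V of system (6): M X = V.
V = M @ x
print(V)                              # [10. 14. 12.]

# An attacker holding only (M, V) would try to solve M X = V for X; here M has
# rank 3 < 5, so the confidential values are not determined uniquely.
print(np.linalg.matrix_rank(M))       # 3
```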
    For linear queries, it is enough to consider one-dimensional databases, i.e., databases with only one quantitative attribute. An arbitrary set of linear queries in a multi-dimensional database can be represented as a disjoint union of subsets of linear queries corresponding to different quantitative attributes, and each of these subsets can be viewed as a set of linear queries of the corresponding one-dimensional database.
    Every linear combination of linear queries is also a linear query. If the outcomes of several linear queries are known, then the outcomes of all their linear combinations are also known. Therefore, row and column operations can be used to simplify (6). Applying row interchange, row scaling, row addition, and column interchange, the system (6) can be reduced to a normalized basis matrix form. Therefore, without loss of generality we may assume that (6) has been simplified and is represented by a normalized query basis matrix $M_k$, where
    $M_k = (I_k \mid M_k') \qquad (7)$
    and $I_k$ is the $(k \times k)$ identity matrix. Then the matrix $M_k$ is said to be in normalized form. The row vectors of $M_k$ form a basis of the space of all queries whose outcomes are known, because all such queries can be derived as linear combinations of these query vectors.
    Inference attacks can be used to derive private information from legitimately available data. It may be possible to deduce confidential information by comparing the results of several different queries. Let $x_1, x_2, \ldots, x_n$ be the values of a protected or confidential attribute in the records. If the value $x_i$ of the confidential attribute in one record is revealed to the user, for some $i \in [1:n]$, then this event is called a compromise of the database. When it is essential to emphasize that the value in precisely one record has been revealed, the terms 1-compromise or classical compromise can also be used. Linear inference attacks occur when malicious adversaries try to solve the system of linear equations (6) to determine confidential values.
    To provide protection against linear inference attacks, Chin and Ozsoyoglu [14] proposed a system called Audit Expert. It uses a normalized basis matrix to store all queries answered previously. When a new query is added, the Audit Expert adds it to the matrix and then reduces the matrix to normalized basis form again.
    Theorem 1 ([14]). A statistical database with linear queries is compromised if and only if the normalized query basis matrix $M_k$ of the Audit Expert has a row with exactly one nonzero entry. The time complexity of the algorithm dynamically processing the query matrix of the Audit Expert and maintaining it in normalized form for a set of $k$ consecutive linear queries is $O(k^2)$.
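    The compromise criterion of Theorem 1 is easy to check once the answered queries are kept in reduced form. The sketch below is not the dynamic Audit Expert of [14] itself, only a simplified static check of the stated criterion: it row-reduces the query matrix and reports whether some basis row has exactly one nonzero entry.

```python
import numpy as np

def normalized_basis(M, tol=1e-9):
    """Row-reduce the query matrix M by Gaussian elimination; the result is a basis
    of the row space of the answered linear queries (a simplified stand-in for the
    normalized query basis matrix maintained by the Audit Expert)."""
    A = np.array(M, dtype=float)
    rows, cols = A.shape
    pivot_row = 0
    for c in range(cols):
        if pivot_row == rows:
            break
        pivot = pivot_row + int(np.argmax(np.abs(A[pivot_row:, c])))
        if abs(A[pivot, c]) < tol:
            continue
        A[[pivot_row, pivot]] = A[[pivot, pivot_row]]   # bring the pivot row up
        A[pivot_row] /= A[pivot_row, c]                 # scale the pivot to 1
        for r in range(rows):
            if r != pivot_row:
                A[r] -= A[r, c] * A[pivot_row]          # eliminate the column elsewhere
        pivot_row += 1
    return A[:pivot_row]

def is_compromised(M, tol=1e-9):
    """Theorem 1 criterion: some basis row has exactly one nonzero entry."""
    B = normalized_basis(M, tol)
    return any(np.sum(np.abs(row) > tol) == 1 for row in B)

# The difference of the two answered SUM queries below isolates x_3, so a compromise occurs.
print(is_compromised([[1, 1, 1, 0], [1, 1, 0, 0]]))   # True
print(is_compromised([[1, 1, 0, 0], [0, 0, 1, 1]]))   # False
```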

    4. Results

    This section presents the new results obtained in this paper on the protection of confidential information against quadratic equation attacks (Section 4.1), interval inference attacks (Section 4.2), and group compromise (Section 4.3).

    4.1. Quadratic Equation Attacks

    In this subsection, we consider a new class of nonlinear queries based on the mean and variance. These notions play crucial roles in hypothesis testing, significance analysis, and other studies; see [39].
    Let $S = B(D)$ be a query sample, i.e., the set of records chosen by the Boolean expression $B$. Denote by $V$ the set $\{\, r_1 \mid (r_1, \ldots, r_m) \in S \,\}$ of values of the confidential quantitative attribute $A_1$ in the records of the sample $S$, with the corresponding probability distribution. The mean of the values of the quantitative attribute is also called the expected value of the quantitative attribute. It is denoted by $\bar{V} = E(r_1)$ and is defined by the formula
    $\bar{V} = E(r_1) = \frac{1}{|S|} \sum_{(r_1, \ldots, r_m) \in S} r_1. \qquad (8)$
    The variance of $V$ is the expected value $E[(r_1 - E(r_1))^2]$ of the squared differences $r_1 - E(r_1)$ of the values of the quantitative attribute $r_1$ from the mean $E(r_1)$ (see [40]). The variance of $V$ is denoted by $\sigma_V^2$ and is defined by the following formula:
    $\sigma_V^2 = E[(r_1 - E(r_1))^2] = \frac{1}{|S|} \sum_{(r_1, \ldots, r_m) \in S} (r_1 - \bar{V})^2, \qquad (9)$
    where $\bar{V}$ is the mean given by (8) (see [40,41]). The variance measures the variability of the values of the quantitative attribute around the mean. It is explained in [40] with a complete proof (see also [41]) that formula (9) can be rewritten in the following equivalent form:
    $\sigma_V^2 = E[(r_1 - E(r_1))^2] = E(r_1^2) - (E(r_1))^2 = \frac{1}{|S|} \sum_{(r_1, \ldots, r_m) \in S} r_1^2 - \left( \frac{1}{|S|} \sum_{(r_1, \ldots, r_m) \in S} r_1 \right)^2. \qquad (10)$
    For more explanations and worked examples, the readers are referred to [40,41].
    A MEAN and VARIANCE query, or an MVQ query, can be defined as a pair $(f, B)$, where $B$ is a Boolean expression and $f = (f_1, f_2)$ is a pair of functions, where $f_1$ is defined by (8) and $f_2$ is defined by (9). This means that an MVQ query submits a Boolean expression $B$ and asks the system to return the values of the sample mean and variance for the sample $S = B(D)$.
    Equality (10) allows us to recover the outcome of each VARIANCE query from the mean value of the squares of the values of the confidential attribute. Therefore, in order to store an MVQ query in computer memory, it is enough to keep a record of the coefficients that occur in the MEAN query, the outcome of the MEAN query, and the mean value of the squares of the values of the confidential attribute. Therefore, to store a set of MVQ queries, we can use the following pair of matrix equalities,
    $MX = V, \qquad MY = W, \qquad (11)$
    where $M$ is the matrix storing the coefficients of the MEAN queries, $X = [x_1, \ldots, x_n]^T$ is the column of the confidential values $x_1, \ldots, x_n$, $Y = [x_1^2, \ldots, x_n^2]^T$ is the column of the squares of the confidential values, $V$ is the column vector with the values returned by the MEAN queries, and where $W$ is the column vector of the mean values of the squares of the confidential values. In concise matrix notation, the Equations (11) can be stored as the following matrix:
    $(M \mid V \mid W). \qquad (12)$
    The following example illustrates our matrix notation.
    Example 1.
    Suppose that in a dataset with two records $r_1, r_2$ the values of the confidential attribute are $x_1 = 0$, $x_2 = 2$. Suppose that the MVQ queries have been answered for the following three samples: $\{r_1, r_2\}$, $\{r_1\}$, $\{r_2\}$. Then we get the following matrix equalities
    $\begin{bmatrix} 1/2 & 1/2 \\ 1 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \\ 2 \end{bmatrix}, \qquad (13)$
    $\begin{bmatrix} 1/2 & 1/2 \\ 1 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} x_1^2 \\ x_2^2 \end{bmatrix} = \begin{bmatrix} 2 \\ 0 \\ 4 \end{bmatrix}. \qquad (14)$
    Here (13) keeps a record of the mean values of the samples, and (14) stores the corresponding mean values of the squares of the confidential attribute. We do not have to store long records of all coefficients of the VARIANCE queries, because equality (10) makes it easy to obtain the values of all VARIANCE queries from (14). The concise matrix notation we are going to use to keep a record of all MVQ queries is the matrix
    $\begin{bmatrix} 1/2 & 1/2 & 1 & 2 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 2 & 4 \end{bmatrix}. \qquad (15)$
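    The numbers in Example 1 can be checked directly. A brief NumPy sketch, using only the values stated in the example:

```python
import numpy as np

x = np.array([0.0, 2.0])            # confidential values x_1 = 0, x_2 = 2 from Example 1
M = np.array([[0.5, 0.5],           # MEAN query over {r_1, r_2}
              [1.0, 0.0],           # MEAN query over {r_1}
              [0.0, 1.0]])          # MEAN query over {r_2}

V = M @ x                           # outcomes of the MEAN queries, as in (13)
W = M @ (x ** 2)                    # mean values of the squares, as in (14)
print(V, W)                         # [1. 0. 2.] [2. 0. 4.]

# Concise storage (M | V | W) of all MVQ queries, matching the matrix in (15).
print(np.column_stack([M, V, W]))

# By (10), each VARIANCE outcome is the mean of squares minus the squared mean.
print(W - V ** 2)                   # [1. 0. 0.] -- sample variances of the three samples
```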
    Applying the row and column operations, we can reduce $M$ to a normalized form. Then the system (11) simplifies and reduces to the normalized form
    $M_k X = V, \qquad M_k Y = W, \qquad (16)$
    where the normalized query basis matrix $M_k$ has the form
    $M_k = (I_k \mid M_k'), \qquad (17)$
    where $I_k$ is the $(k \times k)$ identity matrix. In concise matrix notation, Equations (16) can be stored as the matrix
    $(M_k \mid V \mid W). \qquad (18)$
    Next, we define the first type of a nonlinear inference attack, the QEA attack, which can be used by an adversary to compromise MVQ queries. Steps of the QEA attack are explained in Algorithm 1.
    To protect sensitive data from QEA attacks, we design a quadratic audit system (QAS). It is described in Algorithm 2 by using the following matrix notation.
    Let $v$ be a vector with $n$ components, and let $T$ be a $(k \times n)$-matrix. Denote by $|v|$ the number of nonzero components in $v$. For $1 \le i \le k$, the $i$-th row of $T$ is denoted by $T(i,:)$. For $1 \le j \le n$, the $j$-th column of $T$ is denoted by $T(:,j)$. The deletion of the $j$-th column from $T$ is denoted by $T(:,j) \leftarrow [\,]$. For $1 \le j < \ell \le n$, interchanging the columns $j$ and $\ell$ in $T$ is denoted by $T(:,[j\,\ell]) \leftarrow T(:,[\ell\,j])$. The vector $[v(j), v(j+1), \ldots, v(\ell)]$ is denoted by $v(j:\ell)$. The $((k+1) \times n)$-matrix obtained by adding $v$ as the last row to $T$ is denoted by $[T; v]$. Two vectors $u$ and $v$ are said to be parallel or collinear if and only if either at least one of them is a zero vector, or there exists a nonzero real number $\alpha$ such that $u = \alpha v$. If two vectors $u, v$ are collinear, then we write $u \parallel v$.
    A formal proof establishing that the QAS system guarantees protection of sensitive data from QEA attacks is given in Theorem 3. It relies on Theorem 2, which gives matrix conditions that are necessary and sufficient for a QEA attack to reveal confidential data.
    Theorem 2 uses the concept of a $c$-compromise, where $c$ is a positive integer. This concept includes as a special case the notion of a classical compromise or 1-compromise treated in Theorem 1. Namely, the disclosure of a statistic based on $c$ or fewer records in the database is called a $c$-compromise. The notion of a $c$-compromise has already been studied in the literature (see the survey paper [24] for more references).
    For any row $r$ of the matrix $M_k$ in (17), denote by
    $r^{(k,\,\cdot)} = (r_1, \ldots, r_k) \qquad (19)$
    the vector of the first $k$ components of $r$. Denote by
    $r^{(\cdot,\, n-k)} = (r_{k+1}, \ldots, r_n) \qquad (20)$
    the vector of the last $n-k$ components of $r$. Then the row has the form
    $r = (r^{(k,\,\cdot)}, r^{(\cdot,\, n-k)}). \qquad (21)$
    The vector $r^{(\cdot,\, n-k)}$ will be called the projection of the row $r$ on the matrix $M_k'$ in (17).
    Algorithm 1 Quadratic Equation Attack.
    Input: A set of MVQ queries.
    Output: A compromise of the set of queries.
    1: First, verify whether a compromise can be achieved by using only the set of MEAN queries as in Theorem 1. If not, then proceed to the next step.
    2: Test all combinations of $t \in [1:n]$ and $T \subseteq [1:n]$ to find a pair $(t, T)$ with the two properties (A1), (A2):
    (A1) The set of linear equations corresponding to the MEAN queries can be used to derive equalities
    $x_i = \gamma_i x_t + \delta_i, \qquad (22)$
    where $\gamma_i, \delta_i \in \mathbb{R}$, for all $i \in T$.
    (A2) The attackers may be able to use the outcomes of the VARIANCE queries to derive a quadratic equation of the form
    $q(x_i, i \in T) = w \qquad (23)$
    depending only on $x_i$, $i \in T$, where $w \in \mathbb{R}$.
    3: Substitute all expressions (22) into (23) so that it becomes a quadratic equation in one variable $x_t$.
    4: Solve the resulting quadratic equation in one variable $x_t$ to achieve a compromise.
    5: Output $t, x_t$.
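    To illustrate the flavour of the attack on a toy instance (not taken from the paper), suppose a single MVQ query over two records returns mean $m$ and variance $\sigma^2$. The MEAN answer gives $x_1 = 2m - x_2$ as in (22), and (10) turns the VARIANCE answer into the quadratic relation $(x_1^2 + x_2^2)/2 = \sigma^2 + m^2$ as in (23); substituting and solving the quadratic in $x_2$ yields the candidate values of the confidential attribute. A minimal sketch, with invented numbers:

```python
import math

def qea_two_records(m, var):
    """Toy QEA-style derivation for one MVQ query over two records.
    MEAN answer: x1 = 2*m - x2 (property (A1), as in (22)).
    VARIANCE answer via (10): (x1**2 + x2**2) / 2 = var + m**2 (property (A2)).
    Substituting gives the quadratic x2**2 - 2*m*x2 + (m**2 - var) = 0."""
    disc = (2.0 * m) ** 2 - 4.0 * (m ** 2 - var)        # discriminant, equals 4 * var
    return [(2.0 * m - math.sqrt(disc)) / 2.0,
            (2.0 * m + math.sqrt(disc)) / 2.0]          # candidate values for x2

# Suppose the answered MVQ query over {r1, r2} returned mean 5 and variance 4.
print(qea_two_records(m=5.0, var=4.0))
# [3.0, 7.0] -- the attacker learns the two confidential values up to their order
```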
    Algorithm 2 Quadratic Audit System.
    Input: Normalized matrix $M_k = (I_k \mid M_k')$ of the answered MVQ queries and the vector $v$ of the new MVQ query.
    Output: New normalized matrix and answer to the query, or a response that the query has been rejected.
    1: $u \leftarrow v - \sum_{i=1}^{k} v_i \cdot M_k(i,:)$; $j \leftarrow k+1$
    2: if $u = 0$ then
    3:   Answer the query, keep the matrix unchanged.
    4: else if $|u| \le 2$ then
    5:   Reject the query, keep the matrix unchanged.
    6: else
    7:   Let $u_j$ be the first nonzero component of $u$. Set $u \leftarrow \frac{1}{u_j}\, u$; $T \leftarrow [M_k; u(k+1:n)]$.
    8:   if $j > k+1$ then
    9:     $M_{k+1}(:,[(k+1)\ j]) \leftarrow M_{k+1}(:,[j\ (k+1)])$
    10:    for all $i \in [1:k]$ do
    11:      $T(i,:) \leftarrow T(i,:) - T(i,k+1)\, T(k+1,:)$
    12:      if $T(i,:)(k+1:n) \parallel T(k+1,:)(k+1:n)$ then
    13:        Reject the query, keep the matrix unchanged.
    14:      end if
    15:    end for
    16:   end if
    17:   Answer the query and set $M_{k+1} = [I_{k+1}; T]$.
    18: end if
    Theorem 2.
    Let $D$ be a database with the set of MVQ queries answered so far stored in matrix form (11), with the normalized form (16). Then the following conditions are equivalent.
    (i) The QEA attack can be used to achieve a compromise of $D$.
    (ii) The attackers can use the set consisting of only the MEAN queries answered so far to achieve a 2-compromise of $D$.
    (iii) Either $M_k$ in (16) has a row with at most two nonzero entries, or $M_k$ has two rows with collinear projections on $M_k'$ in (17).
    Proof of Theorem 2. 
    As in the proof of the main theorem of [14] and in other previous publications, it has been customary to assume that the attackers can gain knowledge of the COUNT query corresponding to each of their queries. It is important to ensure rigorous protection of privacy under this assumption, in view of the following three easy ways enabling the attackers to gain access to the outcomes of the COUNT queries.
    (a) The COUNT query is a legitimate query. It can be submitted to the database and may be answered as a separate query.
    (b) The COUNT query can be included as an integral part of every SUM query or linear query.
    (c) It may be easy for the attackers to gain access to the values of some COUNT queries by using additional information, legal knowledge, or insider knowledge.
    Theorem 1 and its proof also assume that the audit system must provide protection against database compromise even if the attackers can gain access to the COUNT queries. Without this assumption, Theorem 1 is invalid. Indeed, even if the attackers manage to obtain an outcome of a query corresponding to the value of a confidential attribute in just one record, they will be unable to notice that they have achieved this, since without the knowledge of a COUNT query they will not know whether the outcome corresponds to just one record or to many records. This is why it is a common practice to assume that the attackers can also gain access to the outcomes of the corresponding COUNT queries, and that the audit system must provide protection in these circumstances.
    (i)⇒(ii): Suppose that condition (i) holds, i.e., the QEA could be used to achieve a compromise ofD. Let us refer to the definition of the QEA attack in Algorithm 1.
    First, we consider the case where the attackers managed to achieve a compromise in Step 1 of the Quadratic Equation Attack. In this case, Step 1 results in a compromise achieved by using only the set of MEAN queries. Every classical compromise is an example of a 2-compromise required for condition (ii). Therefore in this case condition (ii) follows immediately.
    Now, we assume that the attackers had to proceed to the remaining steps of the QEA. This means that they found an element $t \in [1:n]$ and a subset $T \subseteq [1:n]$ with properties (A1) and (A2). Let us take the first equality of the system (22), say $x_i = \gamma_i x_t + \delta_i$ for some $i \in T$. It implies that $x_i - \gamma_i x_t = \delta_i$. Therefore, the attackers have managed to derive the value $\delta_i$ of the statistic $x_i - \gamma_i x_t$, which depends on at most two variables. This means that the attackers have achieved a 2-compromise by using only the set of MEAN queries, and so condition (ii) holds again.
    (ii)⇒(iii): Suppose that condition (ii) holds, i.e., the attackers have managed to achieve a 2-compromise of $D$ by using only MEAN queries. This means that they have derived the value $\eta$ of a statistic $\nu_1 x_{\ell_1} + \nu_2 x_{\ell_2}$, for some $1 \le \ell_1 < \ell_2 \le n$, where $\nu_1^2 + \nu_2^2 \ne 0$. Denote the rows of the matrix $M_k$ by $m_1, \ldots, m_k$. For $i \in [1:k]$, let us denote by $\lambda_i$ the linear combination of the variables $x_1, \ldots, x_n$ corresponding to the $i$-th row of the matrix $M_k$. This means that
    $\lambda_i = m_i X, \qquad (24)$
    where $X = [x_1, \ldots, x_n]^T$. Then, similarly to (35) below, it follows that there exist $\xi_1, \ldots, \xi_k$ such that
    $\eta = \nu_1 x_{\ell_1} + \nu_2 x_{\ell_2} = \xi_1 \lambda_1 + \cdots + \xi_k \lambda_k. \qquad (25)$
    First, we consider the case where $\nu_1 = 0$. Then the value $\eta = \nu_2 x_{\ell_2}$ provides a 1-compromise. Hence, Theorem 1 implies that the normalized basis matrix $M_k$ of the audit system has a row with only one nonzero entry. Therefore, condition (iii) is satisfied.
    Second, if $\nu_2 = 0$, then it follows in the same way that condition (iii) holds true as well.
    Third, it remains to treat the case where $\nu_1, \nu_2 \ne 0$. Note that $M_k = (I_k \mid M_k')$ as in (7). Let us keep in mind that, because $I_k$ is an identity matrix, every nonzero linear combination of the rows of $M_k$ has at least one nonzero component in the first $k$ columns. Applying this to the linear combination (25), we see that $\ell_1 \le k$. Furthermore, the following two subcases are possible, and we consider them separately.
    Subcase 1. $\ell_2 > k$. This means that $x_{\ell_2}$ corresponds to a column of the matrix $M_k'$, which is the right block of the matrix $M_k = (I_k \mid M_k')$ in (7). Clearly, the sum $\nu_1 x_{\ell_1} + \nu_2 x_{\ell_2}$ has only one nonzero component in the first $k$ columns. More specifically, the only nonzero component of this sum in the first $k$ columns is the $\ell_1$-th component. Because $I_k$ is an identity matrix, it follows from (25) that $\xi_{\ell_1} \ne 0$ and
    $\xi_1 = \cdots = \xi_{\ell_1 - 1} = \xi_{\ell_1 + 1} = \cdots = \xi_k = 0. \qquad (26)$
    Hence, $\eta = \xi_{\ell_1} \lambda_{\ell_1}$. It follows that the $\ell_1$-th row of $M_k$ has precisely two nonzero entries, and so condition (iii) holds.
    Subcase 2. $\ell_2 \le k$. This means that $x_{\ell_1}$ and $x_{\ell_2}$ correspond to columns of the matrix $I_k$ in $M_k$. Hence, we get $\xi_{\ell_1}, \xi_{\ell_2} \ne 0$, and all the other coefficients $\xi_i$ are equal to 0, i.e.,
    $\xi_1 = \xi_2 = \cdots = \xi_{\ell_1 - 1} = \xi_{\ell_1 + 1} = \xi_{\ell_1 + 2} = \cdots = \qquad (27)$
    $\xi_{\ell_2 - 1} = \xi_{\ell_2 + 1} = \xi_{\ell_2 + 2} = \cdots = \xi_k = 0. \qquad (28)$
    Therefore, in the row vector $\xi_{\ell_1} m_{\ell_1} + \xi_{\ell_2} m_{\ell_2}$ representing the statistic $\nu_1 x_{\ell_1} + \nu_2 x_{\ell_2}$, all entries in the last $(n-k)$ columns are equal to zero. Denote by $p_1$ and $p_2$ the projections of the rows $m_{\ell_1}$ and $m_{\ell_2}$ on the matrix $M_k'$, respectively. It follows that $\xi_{\ell_1} p_1 + \xi_{\ell_2} p_2 = 0$. This implies that the projections $p_1$ and $p_2$ are collinear, and so condition (iii) is satisfied.
    (iii)⇒(i): Suppose that condition (iii) holds. The following two cases are possible.
    Case 1. The matrix $M_k$ in (16) has a row with at most two nonzero entries. Denote by $\ell$ the index of this row, where $1 \le \ell \le k$. By using the same notation $m_\ell$ for this row and the same linear combination $\lambda_\ell$ of the variables as in (24), we get
    $\lambda_\ell = m_\ell X. \qquad (29)$
    Let $\ell_1, \ell_2$ be the indices of the two nonzero entries in $m_\ell$, where $1 \le \ell_1 < \ell_2 \le n$. Denote these two nonzero entries of $m_\ell$ by $\nu_1$ and $\nu_2$. Then it follows from (29) that
    $m_\ell X = \nu_1 x_{\ell_1} + \nu_2 x_{\ell_2}. \qquad (30)$
    The $\ell$-th linear equation of the system (16) shows that
    $m_\ell X = v_\ell, \qquad (31)$
    where $v_\ell$ is the $\ell$-th component of the column vector $V$ in (16). Therefore, the value of the statistic $\nu_1 x_{\ell_1} + \nu_2 x_{\ell_2}$ is equal to $v_\ell$. This establishes a 2-compromise derived by using only the set of MEAN queries. Thus, condition (ii) holds.
    Case 2. The matrix $M_k$ in (16) has two rows with collinear projections on the matrix $M_k'$ in (17). Denote by $\ell_1, \ell_2$ the indices of these rows, where $1 \le \ell_1 < \ell_2 \le k$. Denote by $p_1$ and $p_2$ the projections of the rows $m_{\ell_1}$ and $m_{\ell_2}$ on the matrix $M_k'$, respectively. Given that $p_1$ and $p_2$ are collinear, we can multiply one of these vectors by an appropriate coefficient and obtain the other vector. Without loss of generality, we may assume that there exists a coefficient $\varrho$ such that $p_1 = \varrho p_2$. Because $I_k$ is an identity matrix and the projection of the vector $m_{\ell_1} - \varrho m_{\ell_2}$ on the matrix $M_k'$ is equal to $p_1 - \varrho p_2 = 0$, it follows that
    $\lambda_{\ell_1} - \varrho \lambda_{\ell_2} = x_{\ell_1} - \varrho x_{\ell_2} = v_{\ell_1} - \varrho v_{\ell_2}. \qquad (32)$
    This establishes a 2-compromise again, because the equalities (32) show that the value of the statistic $x_{\ell_1} - \varrho x_{\ell_2}$ is known and is equal to the constant $v_{\ell_1} - \varrho v_{\ell_2}$. Thus, condition (ii) is satisfied in each of the two cases, i.e., the attackers can achieve a 2-compromise by using only the set of MEAN queries.
    Let us now introduce notation for the set of MVQ queries answered so far. Suppose that a set of $k$ queries consisting of the corresponding pairs of mean and variance for the corresponding $k$ samples $S_1, \ldots, S_k$ has been submitted to the audit system. Applying (8), we can record the set of MEAN queries as a system of linear equations
    $\alpha_{i1} x_1 + \alpha_{i2} x_2 + \cdots + \alpha_{in} x_n = \beta_i, \qquad (33)$
    where $i \in [1:k]$, where $\beta_i$ is the outcome of the $i$-th MEAN query, and where
    $\alpha_{ij} = \begin{cases} 0 & \text{if the } j\text{-th record is not included in the } i\text{-th sample } S_i, \\ \frac{1}{|S_i|} & \text{otherwise,} \end{cases} \qquad (34)$
    for $j \in [1:n]$. Denote the left-hand side of equality (33) by $q_i$.
    Given that the attackers have achieved a 2-compromise by using only the queries of the system (33), they have derived the value $\eta$ of a statistic $\nu_1 x_{\ell_1} + \nu_2 x_{\ell_2}$, for some $1 \le \ell_1 < \ell_2 \le n$, where $\nu_1^2 + \nu_2^2 \ne 0$. It follows that there exist coefficients $\xi_1, \ldots, \xi_k$ such that
    $\xi_1 q_1 + \cdots + \xi_k q_k = \nu_1 x_{\ell_1} + \nu_2 x_{\ell_2}, \qquad (35)$
    and the value of the statistic $\nu_1 x_{\ell_1} + \nu_2 x_{\ell_2}$ is equal to $\eta = \xi_1 \beta_1 + \cdots + \xi_k \beta_k$.
    For each MEAN query of the system (33), the corresponding VARIANCE query of the form (9) can be rewritten in the form (10). It follows that all VARIANCE queries can be recorded as the following system of equations expressed in terms of the squared variables $x_1^2, x_2^2, \ldots, x_n^2$:
    $\gamma_{i1} x_1^2 + \cdots + \gamma_{in} x_n^2 = \delta_i, \qquad (36)$
    where $i \in [1:k]$, where $\delta_i = \sigma_i^2 + \beta_i^2$, where $\sigma_i^2$ is the outcome of the $i$-th VARIANCE query and $\beta_i$ is the outcome from (33), and where
    $\gamma_{ij} = \begin{cases} 0 & \text{if the } j\text{-th record is not included in the } i\text{-th sample } S_i, \\ \frac{1}{|S_i|} & \text{otherwise.} \end{cases} \qquad (37)$
    Denote the left-hand side of equality (36) by $\varrho_i$. Equalities (34) and (37) show that the coefficients $\alpha_{i1}, \ldots, \alpha_{in}$ in the system (33) coincide with the corresponding coefficients $\gamma_{i1}, \ldots, \gamma_{in}$ in the system (36). Therefore, it follows from (35) that
    $\xi_1 \varrho_1 + \cdots + \xi_k \varrho_k = \nu_1 x_{\ell_1}^2 + \nu_2 x_{\ell_2}^2 = \eta', \qquad (38)$
    where $\eta' = \xi_1 \delta_1 + \cdots + \xi_k \delta_k$ is a known value.
    Because at least one of the coefficients $\nu_1, \nu_2$ is nonzero, without loss of generality we may assume that $\nu_1 \ne 0$. Hence, (35) implies that
    $x_{\ell_1} = \frac{\eta}{\nu_1} - \frac{\nu_2}{\nu_1} x_{\ell_2}. \qquad (39)$
    Substituting (39) for $x_{\ell_1}$ in (38), we get
    $\nu_1 \left( \frac{\eta}{\nu_1} - \frac{\nu_2}{\nu_1} x_{\ell_2} \right)^2 + \nu_2 x_{\ell_2}^2 = \eta'. \qquad (40)$
    This is a quadratic equation in the one variable $x_{\ell_2}$. It can be solved to determine the value of $x_{\ell_2}$, which achieves a compromise of $D$. Thus, condition (i) is satisfied.
    This completes the proof of Theorem 2.    □
    Theorem 3.
    Let $M_k = (I_k \mid M_k')$ be the normalized matrix of the answered MVQ queries, and let $v$ be the vector of the coefficients of the mean in the next MVQ query. Then Algorithm 2 answers the next query only if it is safe to do so and the QEA attack cannot be used to disclose sensitive data. Algorithm 2 ensures that the next query is rejected if the QEA attack could reveal sensitive data after this query is answered.
    Proof. 
    The proof establishing that the QAS system guarantees protection of sensitive data from QEA attacks follows from Theorem 2. It follows immediately, because Algorithm 2 verifies condition (iii) of Theorem 2 and answers the next query only if Theorem 2 guarantees that sensitive data cannot be revealed by using the QEA attack after the query is answered.    □
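    Condition (iii) of Theorem 2 is also convenient as a standalone safety check. The sketch below is an illustrative implementation of that condition only, not the incremental bookkeeping of Algorithm 2: it takes a normalized matrix $M_k = (I_k \mid M_k')$ and reports whether some row has at most two nonzero entries or two rows have collinear projections on $M_k'$.

```python
import numpy as np

def violates_condition_iii(Mk, k, tol=1e-9):
    """Check condition (iii) of Theorem 2 for a normalized matrix Mk = (I_k | Mk'),
    given as a k x n array. Returns True if the stored MVQ answers already permit
    a QEA compromise according to the theorem."""
    Mk = np.asarray(Mk, dtype=float)
    # A row with at most two nonzero entries.
    if any(np.sum(np.abs(row) > tol) <= 2 for row in Mk):
        return True
    # Two rows whose projections on Mk' (the last n - k columns) are collinear.
    proj = Mk[:, k:]
    for i in range(k):
        for j in range(i + 1, k):
            u, v = proj[i], proj[j]
            if np.linalg.norm(u) < tol or np.linalg.norm(v) < tol:
                return True                      # a zero projection is collinear with anything
            if np.linalg.matrix_rank(np.vstack([u, v]), tol=tol) < 2:
                return True
    return False

# Two answered MVQ queries whose MEAN parts differ only on columns 1 and 2:
# their projections on Mk' are collinear, so the check reports a violation.
Mk = np.array([[1.0, 0.0, 0.5, 0.5],
               [0.0, 1.0, 1.0, 1.0]])
print(violates_condition_iii(Mk, k=2))   # True
```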

    4.2. Interval Inference Attacks

    The class of IIA inference attacks is defined in Algorithm 3. It uses the following concepts. For a positive real number $\varepsilon$, we say that an $\varepsilon$-approximate compromise, or an approximate compromise with precision $\varepsilon$, has been achieved if the attackers can determine $x \in \mathbb{R}$ such that they can deduce that the value of the confidential attribute in some record belongs to the interval $[x, x + \varepsilon]$. We say that an approximate compromise occurs if there exists $\varepsilon$ such that an $\varepsilon$-approximate compromise has been achieved.
    To protect sensitive data from IIA attacks, we design an interval audit system (IAS). It is described in Algorithm 4.
    A formal proof that the IAS system protects sensitive data from IIA attacks is presented in Theorem 4. It relies on Theorem 5, which gives necessary and sufficient conditions for an approximate compromise to occur.
    Algorithm 3 Interval Inference Attack.
    Input: A set of MVQ queries with query samples $S_j$, means $m_j$, variances $\sigma_j^2$, for $j \in [1:\ell]$.
    Output: Index $s$ of a record and the upper and lower bounds $U, L$ for the sensitive attribute in that record.
    1: $S \leftarrow \bigcup_{j=1}^{\ell} S_j$.
    2: for all $r \in S$ do
    3:   $L_r \leftarrow -\infty$; $U_r \leftarrow +\infty$.
    4: end for
    5: for all $j \in [1:\ell]$ do
    6:   for all $r \in S_j$ do
    7:     $L_r \leftarrow \max\{L_r,\; m_j - \sigma_j \sqrt{|S_j| - 1}\}$;
    8:     $U_r \leftarrow \min\{U_r,\; m_j + \sigma_j \sqrt{|S_j| - 1}\}$.
    9:   end for
    10: end for
    11: $L \leftarrow -\infty$; $U \leftarrow +\infty$; $s \leftarrow \emptyset$.
    12: for all $r \in S$ do
    13:   if $|U_r - L_r| < |U - L|$ then
    14:     $L \leftarrow L_r$; $U \leftarrow U_r$; $s \leftarrow$ the index of $r$ in $D$.
    15:   end if
    16: end for
    17: Output $s, L, U$.
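    A compact sketch of the interval computation in Algorithm 3, using the Samuelson-type bounds $m_j \pm \sigma_j \sqrt{|S_j| - 1}$ from [43]; the sample data below are invented for illustration.

```python
import math

def interval_inference(queries):
    """queries: list of (sample, mean, variance), where sample is a set of record ids.
    Returns (s, L, U): the record with the tightest interval [L_r, U_r] guaranteed
    to contain its confidential value, as in Algorithm 3."""
    bounds = {}
    for sample, m, var in queries:
        radius = math.sqrt(var) * math.sqrt(len(sample) - 1)   # Samuelson bound
        for r in sample:
            lo, hi = bounds.get(r, (-math.inf, math.inf))
            bounds[r] = (max(lo, m - radius), min(hi, m + radius))
    # Pick the record whose interval is shortest.
    s, (L, U) = min(bounds.items(), key=lambda item: item[1][1] - item[1][0])
    return s, L, U

# Two answered MVQ queries (hypothetical answers): the second has a small variance,
# so every record it contains is pinned into a short interval.
answered = [
    ({1, 2, 3, 4}, 10.0, 9.0),
    ({3, 4}, 12.0, 0.25),
]
print(interval_inference(answered))   # (3, 11.5, 12.5)
```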
    Theorem 4.
    Let $\varepsilon$ be a positive real number, and let the set of already answered MVQ queries consist of $\ell$ queries with means $m_j$ and variances $\sigma_j^2$, for $j \in [1:\ell]$. Let $S$ be the set of all records occurring in any of these already answered queries, and let $L_r, U_r$ be the values defined for $r \in S$ in Algorithm 3. Let $T$ be the sample of records of the next submitted MVQ query. Then Algorithm 4 answers the next query only if it is safe to do so and the IIA attack cannot result in an $\varepsilon$-approximate compromise of sensitive data. Algorithm 4 ensures that the next query is rejected if the IIA attack can result in an $\varepsilon$-approximate compromise after this query is answered.
    Proof. 
    The proof establishing that the IAS system guarantees protection of sensitive data from IIA attacks follows from Theorem 5. It follows immediately, because Algorithm 4 verifies the interval bounds established in Theorem 5 and answers the next query only if Theorem 5 guarantees that an $\varepsilon$-approximate compromise does not occur after the query is answered.    □
    Algorithm 4 Interval Audit System.
    Input: $\varepsilon > 0$ such that the system must protect from an $\varepsilon$-approximate compromise. The set of already answered MVQ queries with $m_j, \sigma_j^2$, $j \in [1:\ell]$, the set $S$, and the bounds $L_r, U_r$ defined for $r \in S$ in Algorithm 3. The new MVQ query with sample $T$.
    Output: Reject the query if it leads to an $\varepsilon$-compromise. Otherwise, return $m$ and $\sigma^2$ for the new query.
    1: Compute the mean $m$ and variance $\sigma^2$ for $T$.
    2: for all $r \in S \cap T$ do
    3:   $L_r \leftarrow \max\{L_r,\; m - \sigma \sqrt{|T| - 1}\}$;
    4:   $U_r \leftarrow \min\{U_r,\; m + \sigma \sqrt{|T| - 1}\}$.
    5: end for
    6: for all $r \in T \setminus S$ do
    7:   $L_r \leftarrow m - \sigma \sqrt{|T| - 1}$;
    8:   $U_r \leftarrow m + \sigma \sqrt{|T| - 1}$.
    9: end for
    10: if $\min\{|U_r - L_r| : r \in S \cup T\} \le \varepsilon$ then
    11:   Reject the query.
    12: else
    13:   Output $m, \sigma^2$.
    14: end if
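    The audit side mirrors the interval computation above. A minimal sketch of the accept/reject rule of Algorithm 4 under the same assumptions as the previous sketch (the helper tentatively updates the Samuelson intervals and rejects once the shortest interval would drop to $\varepsilon$ or below):

```python
import math

def ias_decision(bounds, new_sample, m, var, eps):
    """bounds: dict record_id -> (L_r, U_r) maintained as in Algorithm 3.
    Tentatively update the bounds with the new MVQ query (sample, mean m, variance var)
    and reject if some record would be confined to an interval of length <= eps."""
    radius = math.sqrt(var) * math.sqrt(len(new_sample) - 1)
    updated = dict(bounds)
    for r in new_sample:
        lo, hi = updated.get(r, (-math.inf, math.inf))
        updated[r] = (max(lo, m - radius), min(hi, m + radius))
    if min(hi - lo for lo, hi in updated.values()) <= eps:
        return "reject", bounds          # keep the old bounds, refuse to answer
    return "answer", updated             # safe: release m and var, commit new bounds

# Continuing the toy data used for Algorithm 3: bounds after the two answered queries.
current = {1: (4.8, 15.2), 2: (4.8, 15.2), 3: (11.5, 12.5), 4: (11.5, 12.5)}

# A new MVQ query over {3, 4} with a tiny variance would pin both records too tightly.
print(ias_decision(current, {3, 4}, m=12.0, var=0.0004, eps=0.5)[0])   # reject
# A broader query over {1, 2, 3} is still considered safe for eps = 0.5.
print(ias_decision(current, {1, 2, 3}, m=11.0, var=4.0, eps=0.5)[0])   # answer
```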
    Theorem 5.
    Algorithm 3 returns the index $s$ of a record $r = (r_1, \ldots, r_m) \in D$ and an interval $[L, U] = [L_r, U_r]$ such that it is guaranteed that $r_1 \in [L, U]$ and the length $|U_r - L_r|$ of the interval achieves the minimum value. There exist two databases $D_L$ and $D_U$ such that the record $r_L$ with index $s_L$ found by Algorithm 3 in $D_L$ has the confidential attribute $r_1$ equal to L, and the record $r_U$ with index $s_U$ in $D_U$ has the confidential attribute equal to U.
    Proof. 
    Suppose that Algorithm 3 is applied to a set of samples of MVQ queries indexed by $j \in [1:\ell]$, with query samples $S_j$ consisting of records $r = (r_1, \ldots, r_m) \in S_j$ such that the mean and variance of the confidential components $r_1$, for $r \in S_j$, are equal to $m_j$ and $\sigma_j^2$, respectively.
    For each $j \in [1:\ell]$ and each record $r \in S = \bigcup_{j=1}^{\ell} S_j$, it is easily seen that lines 2 to 9 of Algorithm 3 compute the following values
    $L_r = \max_{j : r \in S_j} \left\{ m_j - \sigma_j \sqrt{|S_j| - 1} \right\}, \qquad (41)$
    $U_r = \min_{j : r \in S_j} \left\{ m_j + \sigma_j \sqrt{|S_j| - 1} \right\}. \qquad (42)$
    For any sample $S_j$, where $j \in [1:\ell]$, and any record $r = (r_1, \ldots, r_m) \in S_j$, the following Samuelson inequalities were proven in [43]:
    $m_j - \sigma_j \sqrt{|S_j| - 1} \;\le\; r_1 \;\le\; m_j + \sigma_j \sqrt{|S_j| - 1}. \qquad (43)$
    Combining equalities (41) and (42) with all inequalities (43) for one fixed record $r \in S$ and all samples $S_j$, for $j \in [1:\ell]$, containing $r$, we get
    $L_r \le r_1 \le U_r. \qquad (44)$
    It is clear that lines 11 to 16 of Algorithm 3 find the index $s$ of the record $r$ such that the length $|U_r - L_r|$ of the interval $[L_r, U_r]$ achieves the minimum value.
    Let $D_L$ be a database with $n$ records $r^{[1]}, \ldots, r^{[n]}$. Suppose that there is just one sample $S$ containing all records of $D_L$ and that the mean $\mu$ and variance $\sigma^2$ are given and fixed. Let us define
    $r^{[1]}_1 = L, \qquad (45)$
    $r^{[2]}_1 = \cdots = r^{[n]}_1 = \mu + (\mu - L)/(n-1), \qquad (46)$
    where $L$ and $U$ are defined by (41) and (42), respectively. It is routine to verify that the mean of the confidential attributes of all records in $D_L$ is equal to $\mu$ and the variance is equal to $\sigma^2$. Then Algorithm 3 computes
    $L_{r^{[1]}} = \cdots = L_{r^{[n]}} = L, \qquad (47)$
    $U_{r^{[1]}} = \cdots = U_{r^{[n]}} = U. \qquad (48)$
    Therefore, Algorithm 3 returns $s_L = 1$, $L$, $U$. Because $r^{[1]}_1 = L$, this example shows that, in full generality, the value $L$ cannot be improved.
    A dual example of a database $D_U$ with
    $r^{[1]}_1 = U, \qquad (49)$
    $r^{[2]}_1 = \cdots = r^{[n]}_1 = \mu - (U - \mu)/(n-1), \qquad (50)$
    shows that in general the value $U$ cannot be improved either. □
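    The extremal database $D_L$ from the proof is easy to verify numerically: setting the first confidential value to the Samuelson lower bound $L = \mu - \sigma\sqrt{n-1}$ and the remaining $n-1$ values to $\mu + (\mu - L)/(n-1)$ reproduces the prescribed mean $\mu$ and (population) variance $\sigma^2$, so the bound is attained. A quick check, with illustrative parameters:

```python
import numpy as np

mu, sigma, n = 10.0, 2.0, 5
L = mu - sigma * np.sqrt(n - 1)                                  # Samuelson lower bound (43)
values = np.array([L] + [mu + (mu - L) / (n - 1)] * (n - 1))     # construction (45)-(46)

print(values.mean())        # 10.0 -- the prescribed mean mu
print(values.var())         # 4.0  -- the prescribed (population) variance sigma**2
print(values.min(), L)      # the bound L is attained by the first record
```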

    4.3. Group Compromise

    Let $c, k$ be positive integers such that $c \le k$, and let $M_k = (I_k \mid M_k')$ be the normalized basis matrix of a set of linear queries as in (6) and (7). We use the following well-known definitions and facts of matrix theory (see [38]). The rank of a matrix is equal to the dimension of the vector space spanned by the rows of the matrix. It is also equal to the maximum number of linearly independent rows of the matrix. The rank of a matrix with $k$ rows is less than $k$ if and only if the rows of the matrix are linearly dependent, i.e., there exists a nontrivial linear combination of the rows equal to zero. The rank of the matrix $M_k$ is equal to $k$.
    Theorem 6.
    Let $c, k$ be positive integers such that $c \le k$, and let $M_k = (I_k \mid M_k')$ be the normalized basis matrix (7) of a set of linear queries for the database $D$. Then the following conditions are equivalent.
    (i) The database $D$ is $c$-compromised by the set of linear queries with the normalized basis matrix $M_k$.
    (ii) There exist $c$ columns in $M_k$ such that after deletion of these columns the rank of the remaining matrix becomes less than $k$.
    (iii) There exist $s$ and $t$ with $s + t = c$ such that it is possible to remove $s$ columns of $M_k'$ and in this new matrix find $t$ rows that span a space of dimension less than $t$.
    Proof. 
    Let $n$ be the number of columns in the matrix $M_k$ in the hypothesis of this theorem. Denote the rows of $M_k$ by $m_1, \ldots, m_k$, and the rows of the matrix $M_k'$ by $m_1', \ldots, m_k'$. For $j \in [1:n]$, let
    $e_j = (e_{j1}, e_{j2}, \ldots, e_{jn}) \qquad (51)$
    be the vector with components $e_{j\ell}$, for $\ell \in [1:n]$, defined by
    $e_{j\ell} = \begin{cases} 1 & \text{if } j = \ell, \\ 0 & \text{otherwise.} \end{cases} \qquad (52)$
    Let $X = [x_1, \ldots, x_n]^T$ be the column of the confidential variables.
    (i)⇒(ii) Suppose that condition (i) holds. Then there exist coefficients $\nu_1, \ldots, \nu_k$ such that the linear combination $\sum_{i=1}^{k} \nu_i m_i$ has at most $c$ nonzero components. Therefore, it can be represented in the form
    $\sum_{i=1}^{k} \nu_i m_i = \sum_{\ell=1}^{c} \xi_\ell e_{i_\ell} \qquad (53)$
    for some positive integers $1 \le i_1 < \cdots < i_c \le n$ and some $\xi_1, \ldots, \xi_c \in \mathbb{R}$. Let $\bar{M}_k$ be the matrix obtained from the matrix $M_k$ by deleting all columns with indices $i_1, \ldots, i_c$. Denote by $\bar{m}_1, \ldots, \bar{m}_k$ the rows obtained from the rows $m_1, \ldots, m_k$ by deleting all columns $i_1, \ldots, i_c$. It follows from (53) that $\sum_{i=1}^{k} \nu_i \bar{m}_i = 0$. Therefore, the rows of the matrix $\bar{M}_k$ are linearly dependent. It follows that the rank of $\bar{M}_k$ is less than $k$. Thus, condition (ii) is satisfied.
    (ii)⇒(i) Suppose that condition (ii) holds. Then there exist $c$ columns in the matrix $M_k$ such that the rank of the matrix $\bar{M}_k$ obtained by deleting these columns is less than $k$. Denote the indices of these columns by $i_1, \ldots, i_c$, where $1 \le i_1 < \cdots < i_c \le n$. Let $\bar{m}_1, \ldots, \bar{m}_k$ be the rows obtained from the rows $m_1, \ldots, m_k$ by deleting all columns $i_1, \ldots, i_c$. It follows that the rows $\bar{m}_1, \ldots, \bar{m}_k$ are linearly dependent, i.e., there exist coefficients $\nu_1, \ldots, \nu_k$, not all zero, such that $\sum_{i=1}^{k} \nu_i \bar{m}_i = 0$. Hence, equality (53) holds true for some $\xi_1, \ldots, \xi_c$. Therefore, the statistic (53) produces a $c$-compromise of the database $D$. Thus, condition (i) is satisfied.
    (i)⇒(iii) Suppose that there is a $c$-compromise. As above, there exist coefficients $\nu_1, \ldots, \nu_k$ such that the sum $\sum_{i=1}^{k} \nu_i m_i$ can be represented in the form (53), for some $1 \le i_1 < \cdots < i_c \le n$ and $\xi_1, \ldots, \xi_c$. Let $s$ be the number of the indices $i_1 < \cdots < i_c$ that are greater than $k$. Put $t = c - s$. Then
    $i_1 < \cdots < i_t \le k < i_{t+1} < \cdots < i_c. \qquad (54)$
    Denote by $\tilde{N}$ the matrix obtained from $M_k'$ by deleting the columns with indices
    $i_{t+1} - k,\; i_{t+2} - k,\; \ldots,\; i_c - k. \qquad (55)$
    Let $\tilde{M}$ be the matrix obtained from $M_k$ by deleting the columns with indices
    $i_{t+1},\; i_{t+2},\; \ldots,\; i_c. \qquad (56)$
    This means that $\tilde{M}$ is obtained from $M_k$ by replacing $M_k'$ with $\tilde{N}$. Then (7) implies that
    $\tilde{M} = (I_k \mid \tilde{N}). \qquad (57)$
    Denote the rows of the matrix $\tilde{M}$ by $\tilde{m}_1, \ldots, \tilde{m}_k$. Let $\varepsilon_1, \ldots, \varepsilon_k$ be the rows of the identity matrix $I_k$, and let $\tilde{p}_1, \ldots, \tilde{p}_k$ be the rows of the matrix $\tilde{N}$. Then we have
    $\tilde{m}_i = (\varepsilon_i \mid \tilde{p}_i), \qquad (58)$
    for $i \in [1:k]$. Denote by $\tilde{e}_1, \ldots, \tilde{e}_n$ the vectors obtained from $e_1, \ldots, e_n$ by deleting the columns with indices (56). Clearly,
    $\tilde{e}_{i_{t+1}} = \cdots = \tilde{e}_{i_c} = 0. \qquad (59)$
    Therefore, equality (53) implies that
    $\sum_{i=1}^{k} \nu_i \tilde{m}_i = \sum_{\ell=1}^{t} \xi_\ell \tilde{e}_{i_\ell}. \qquad (60)$
    It follows that the sum $\sum_{i=1}^{k} \nu_i \tilde{m}_i$ has at most $t$ nonzero components, corresponding to the $t$ vectors $\tilde{e}_{i_\ell}$ on the right-hand side of (60). Therefore, (58), (60) and the definition of $\varepsilon_i$ show that
    $\nu_i = 0 \quad \text{whenever } i \notin \{i_1, \ldots, i_t\}. \qquad (61)$
    Hence, (58), (60) and (61) imply that
    $\sum_{\ell=1}^{t} \nu_{i_\ell} \tilde{m}_{i_\ell} = \sum_{\ell=1}^{t} \nu_{i_\ell} (\varepsilon_{i_\ell} \mid \tilde{p}_{i_\ell}) = \sum_{\ell=1}^{t} \xi_\ell \tilde{e}_{i_\ell}. \qquad (62)$
    It follows that $\sum_{\ell=1}^{t} \nu_{i_\ell} \tilde{p}_{i_\ell} = 0$. This means that the vectors $\tilde{p}_{i_1}, \ldots, \tilde{p}_{i_t}$ are linearly dependent. Because these vectors are rows of the matrix $\tilde{N}$, we see that these $t$ rows of the matrix $\tilde{N}$ span a space of dimension less than $t$. Thus, condition (iii) is satisfied.
    (iii)⇒(i) Suppose that condition (iii) holds. Then there exist $s$ and $t$ with $s + t = c$ such that it is possible to remove $s$ columns with indices $i_{t+1} - k, \ldots, i_c - k$ from the matrix $M_k'$ and in this new matrix $\tilde{N}$ find $t$ rows $\tilde{p}_{i_1}, \ldots, \tilde{p}_{i_t}$ that span a space of dimension less than $t$. (For consistency, here we introduce and use the same notation as in the proof of the preceding implication, so that the numbers $i_{t+1}, \ldots, i_c$ refer to the indices of the corresponding columns in the matrix $M_k$.) Then these rows are linearly dependent, and so there exists a linear combination equal to zero,
    $\sum_{\ell=1}^{t} \nu_{i_\ell} \tilde{p}_{i_\ell} = 0 \qquad (63)$
    for some $\nu_{i_1}, \ldots, \nu_{i_t}$, not all zero. Consider the following linear combination
    $\varphi = \sum_{\ell=1}^{t} \nu_{i_\ell} m_{i_\ell}. \qquad (64)$
    Because $I_k$ is the identity matrix, it follows from (58) that, if we look at the last $n-k$ components of the vector $\varphi$, then all nonzero values among these components can occur only in the $s$ columns $i_{t+1}, \ldots, i_c$ of $M_k$, i.e., the columns of the submatrix $M_k'$ that were deleted in the discussion above. All the other values among the last $n-k$ components of $\varphi$ are equal to zero by (63). Therefore, there are at most $s$ nonzero values among the last $n-k$ components of the vector $\varphi$.
    On the other hand, because $I_k$ is an identity matrix and $\varphi$ is a sum of $t$ rows of $M_k$, it follows that there are at most $t$ nonzero coordinates among the first $k$ components of the vector $\varphi$. In total, we see that $\varphi$ has at most $s + t = c$ nonzero components. It follows that the linear combination (64) of the rows of the matrix $M_k$ produces a $c$-compromise of the database $D$. Thus, condition (i) is satisfied. This completes the proof of Theorem 6. □
    Note that the running times of the algorithms for the detection of a $c$-compromise using conditions (ii) and (iii) are $O(k^2 n^c)$ and $O(2^c c^2 n^c)$, respectively.
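    A direct (exponential in $c$) implementation of the criterion in condition (ii) of Theorem 6, offered only as an illustrative brute-force check consistent with the stated $O(k^2 n^c)$ behaviour:

```python
import itertools
import numpy as np

def c_compromised(Mk, c, tol=1e-9):
    """Condition (ii) of Theorem 6: the database is c-compromised iff deleting some
    c columns of the normalized basis matrix Mk drops its rank below k."""
    Mk = np.asarray(Mk, dtype=float)
    k, n = Mk.shape
    for cols in itertools.combinations(range(n), c):
        keep = [j for j in range(n) if j not in cols]
        if np.linalg.matrix_rank(Mk[:, keep], tol=tol) < k:
            return True
    return False

# Normalized basis of two answered linear queries on four records.
Mk = np.array([[1.0, 0.0, 1.0, 1.0],
               [0.0, 1.0, 1.0, 1.0]])
print(c_compromised(Mk, 1))   # False: no single value is revealed
print(c_compromised(Mk, 2))   # True: e.g., the difference of the rows is x_1 - x_2
```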

    5. Discussion

    The results obtained in this paper advance theoretical knowledge devoted to the protection of private and confidential information and prepare a foundation for the development of future comprehensive privacy protection systems.
    At the same time, the results obtained have certain limitations, which motivate future work. Next, we formulate and discuss examples of directions for future research, which are motivated by our results and will need to be addressed in separate subsequent publications.
    The first limitation of our results is explained by the general approach adopted in the previous papers [14,18,19,20,21,22,23,24]. This approach gives only exact and correct answers to the queries submitted by the clients. However, if the system detects that a query can compromise confidential information, then it only replies that the query cannot be answered. The present paper also uses this approach.
    The advantage of this approach is that, if it is determined that a new query submitted by a client does not lead to a disclosure of confidential information, the client receives an exact answer to the query. However, if it is discovered that a query leads to disclosure of confidential information, then no answer is given. Therefore, the client does not receive any helpful response in the latter case.
    To tackle this issue, it may be a good idea to investigate how to supply the client with some additional information expressed, for example, in terms of evaluation of probabilities. We suggest the following direction for future research.
    Direction 1.
    Investigate and develop hybrid systems, which provide an exact answer to a query if it does not lead to disclosure of confidential information, and which use differential privacy techniques to provide a randomised probabilistic response to a query if it leads to disclosure of confidential information.
    The second limitation of our proposed systems is their focus on the particular novel classes of attacks that have not been considered previously. However, if a system provides protection against these attacks, then it can remain vulnerable to various other types of attacks. Therefore, for practical applications it is essential to consider systems providing simultaneous protection against various types of attacks without incurring a prohibitive computational overhead.
    Direction 2.
    Design and investigate combined comprehensive systems providing answers to aggregated queries with simultaneous protection of confidential data against various different types of attacks without incurring a prohibitive computational overhead. Consider novel approaches to the optimisation of the performance of these systems.
    The third limitation of [14,18,19,20,21,22,23,24] and our systems is explained by the fact that this research still remains at the theoretical stage of development, when it is paramount to develop a comprehensive theory. Clearly, useful systems can be implemented as practical software only when there is sufficient rigorous theoretical foundation and only after significant advances on Direction 2 are achieved. After that, it will become important to design software implementations and conduct experimental studies comparing their performance for various categories of practical datasets. This motivates the following direction.
    Direction 3.
    Design software implementations of new systems proposed during future developments of Direction 2. Conduct comprehensive experimental studies comparing their performance for various categories of practical datasets.
    The fourth limitation of our systems is in the assumption that the whole collection of data is known to the system answering queries. Therefore, the systems cannot operate in the federated learning scenario. Because federated learning is a rapidly growing area of research where aggregation techniques play significant roles (see, for example, the surveys [45,46]), we propose the following direction for future research.
    Direction 4.
    Develop systems for protecting the privacy of confidential information in the federated learning scenario.
    Directions 1 to 4 are recorded here in general form for arbitrary queries, even though the present article motivates the investigation of these directions with a focus on the MVQ queries as the very first option for consideration.

    6. Conclusions

    This paper investigated nonlinear queries, which had not been considered in the literature before. It contributed to the development of formal theory by designing new systems for protection against inference attacks and obtaining novel rigorous conditions that guarantee that confidential information remains protected. The paper presented the following contributions to the advancement of knowledge on the preservation of privacy of confidential information:
    • Definitions of the MVQ queries (Section 4.1) and the QEA attacks (Algorithm 1).
    • The design of a QAS system for the protection of confidential information against the QEA attacks (Algorithm 2).
    • Theorems 2 and 3 prove that QAS systems guarantee protection against the QEA attacks.
    • Definition of the IIA attacks (Algorithm 3).
    • The design of an IAS system for the protection of sensitive data from the IIA attacks (Algorithm 4).
    • Theorems 4 and 5 prove that IAS systems ensure protection against IIA attacks.
    • Theorem 6 provides stringent matrix conditions for the protection of confidential information from a group compromise.
    Four directions for future research were discussed and presented in Section 5.

    Author Contributions

    Conceptualization, X.Y. (Xuechao Yang), X.Y. (Xun Yi), and A.K.; methodology, X.Y. (Xun Yi) and L.R.; formal analysis, A.K., L.R., Y.L., and J.R.; investigation, X.Y. (Xuechao Yang), A.K., Y.L., and J.R.; writing—original draft preparation, A.K. and L.R.; writing—review and editing, X.Y. (Xuechao Yang), X.Y. (Xun Yi), Y.L., and J.R.; supervision, X.Y. (Xun Yi); project administration, X.Y. (Xun Yi) and L.R.; funding acquisition, X.Y. (Xun Yi) and L.R. All authors have read and agreed to the published version of the manuscript.

    Funding

    This research was funded by the Australian Research Council, Discovery grant DP160100913.

    Institutional Review Board Statement

    Not Applicable.

    Informed Consent Statement

    Not Applicable.

    Data Availability Statement

    Not Applicable.

    Acknowledgments

    The authors are grateful to three anonymous reviewers for thorough reports and comments that have helped to improve this paper.

    Conflicts of Interest

    The authors declare no conflict of interest.

    Abbreviations

    The following abbreviations are used in this paper, together with the subsections where they are explained:
    Abbreviation   Meaning                      Subsection
    IAS            Interval Audit System        Section 4.2
    IIA            Interval Inference Attack    Section 4.2
    MVQ            Mean and Variance Query      Section 4.1
    QAS            Quadratic Audit System       Section 4.1
    QEA            Quadratic Equation Attack    Section 4.1

    References

    1. Bartol, J.; Vehovar, V.; Petrovčič, A. Should We Be Concerned about How Information Privacy Concerns Are Measured in Online Contexts? A Systematic Review of Survey Scale Development Studies. Informatics 2021, 8, 31.
    2. Downer, K.; Bhattacharya, M. BYOD Security: A Study of Human Dimensions. Informatics 2022, 9, 16.
    3. Hirschprung, R.S.; Klein, M.; Maimon, O. Harnessing Soft Logic to Represent the Privacy Paradox. Informatics 2022, 9, 54.
    4. Antunes, M.; Oliveira, L.; Seguro, A.; Verissimo, J.; Salgado, R.; Murteira, T. Benchmarking Deep Learning Methods for Behaviour-Based Network Intrusion Detection. Informatics 2022, 9, 29.
    5. Azeez, N.A.; Odufuwa, O.E.; Misra, S.; Oluranti, J.; Damaševičius, R. Windows PE Malware Detection Using Ensemble Learning. Informatics 2021, 8, 10.
    6. Perera, S.; Jin, X.; Maurushat, A.; Opoku, D.J. Factors Affecting Reputational Damage to Organisations Due to Cyberattacks. Informatics 2022, 9, 28.
    7. Sahi, A.M.; Khalid, H.; Abbas, A.F.; Zedan, K.; Khatib, S.F.A.; Al Amosh, H. The Research Trend of Security and Privacy in Digital Payment. Informatics 2022, 9, 32.
    8. Bile Hassan, I.; Murad, M.A.A.; El-Shekeil, I.; Liu, J. Extending the UTAUT2 Model with a Privacy Calculus Model to Enhance the Adoption of a Health Information Application in Malaysia. Informatics 2022, 9, 31.
    9. Feng, D.; Zhou, F.; Wang, Q.; Wu, Q.; Li, B. Efficient Aggregate Queries on Location Data with Confidentiality. Sensors 2022, 22, 4908.
    10. Iqbal, Y.; Tahir, S.; Tahir, H.; Khan, F.; Saeed, S.; Almuhaideb, A.M.; Syed, A.M. A Novel Homomorphic Approach for Preserving Privacy of Patient Data in Telemedicine. Sensors 2022, 22, 4432.
    11. Sobecki, A.; Barański, S.; Szymański, J. Privacy-Preserving, Scalable Blockchain-Based Solution for Monitoring Industrial Infrastructure in the Near Real-Time. Appl. Sci. 2022, 12, 7143.
    12. Liu, B.; Zhang, X.; Shi, R.; Zhang, M.; Zhang, G. SEPSI: A Secure and Efficient Privacy-Preserving Set Intersection with Identity Authentication in IoT. Mathematics 2022, 10, 2120.
    13. Xie, Y.; Li, Y.; Ma, Y. Data Privacy Security Mechanism of Industrial Internet of Things Based on Block Chain. Appl. Sci. 2022, 12, 6859.
    14. Chin, F.Y.; Ozsoyoglu, G. Auditing and Inference Control in Statistical Databases. IEEE Trans. Softw. Eng. 1982, SE-8, 574–582.
    15. Cellamare, M.; van Gestel, A.J.; Alradhi, H.; Martin, F.; Moncada-Torres, A. A Federated Generalized Linear Model for Privacy-Preserving Analysis. Algorithms 2022, 15, 243.
    16. Kelarev, A.; Yi, X.; Badsha, S.; Yang, X.; Rylands, L.; Seberry, J. A Multistage Protocol for Aggregated Queries in Distributed Cloud Databases with Privacy Protection. Future Gener. Comput. Syst. 2019, 90, 368–380.
    17. Ziegler, J.; Pfitzner, B.; Schulz, H.; Saalbach, A.; Arnrich, B. Defending against Reconstruction Attacks through Differentially Private Federated Learning for Classification of Heterogeneous Chest X-ray Data. Sensors 2022, 22, 5195.
    18. Miller, M.; Seberry, J. Audit Expert and Statistical Database Security. In Proceedings of the Australian Database Research Conference, Melbourne, Australia, 6 February 1990; pp. 149–174.
    19. Brankovic, L.; Miller, M.; Širáň, J. Towards a Practical Auditing Method for the Prevention of Statistical Database Compromise. In Proceedings of the 7th Australasian Database Conference, Melbourne, VIC, Australia, 29–30 January 1996; pp. 177–184.
    20. Brankovic, L.; Miller, M.; Širáň, J. Graphs, 0-1 matrices, and usability of statistical databases. Congr. Numer. 1996, 120, 169–182.
    21. Miller, M.; Roberts, I.; Simpson, J. Application of symmetric chains to an optimization problem in the security of statistical databases. Bull. Inst. Combin. Appl. 1991, 2, 47–58.
    22. Brankovic, L.; Miller, M. An application of combinatorics to the security of statistical databases. Austral. Math. Soc. Gaz. 1995, 22, 173–177.
    23. Griggs, J.R. Concentrating Subset Sums at k Points. Bull. Inst. Combin. Appl. 1997, 20, 65–74.
    24. Kelarev, A.; Ryan, J.; Rylands, L.; Seberry, J.; Yi, X. Discrete Algorithms and Methods for Security of Statistical Databases Related to the Work of Mirka Miller. J. Discret. Algorithms 2018, 52–53, 112–121.
    25. Wu, G.Q.; He, Y.P.; Xia, X.Y. Near-Optimal Differentially Private Mechanism for Linear Queries. Ruan Jian Xue Bao/J. Softw. 2017, 28, 2309–2322.
    26. McKenna, R.; Maity, R.K.; Mazumdar, A.; Miklau, G. A Workload-Adaptive Mechanism for Linear Queries under Local Differential Privacy. In Proceedings of the PKAW2010, Online, 31 August–4 September 2020; Volume 13, pp. 1905–1918.
    27. Khalili, M.M.; Vakilinia, I. Trading Privacy through Randomized Response. In Proceedings of the IEEE Conference on Computer Communications Workshops, Vancouver, BC, Canada, 10–13 May 2021.
    28. Xiao, Y.; Ding, Z.; Wang, Y.; Zhang, D.; Kifer, D. Optimizing Fitness-for-Use of Differentially Private Linear Queries. In Proceedings of the 47th International Conference on Very Large Data Bases, Copenhagen, Denmark, 16–20 August 2021; Volume 14, pp. 1730–1742.
    29. Qu, Y.; Yu, S.; Zhou, W.; Chen, S.; Wu, J. Customizable Reliable Privacy-Preserving Data Sharing in Cyber-Physical Social Networks. IEEE Trans. Netw. Sci. Eng. 2021, 8, 269–281.
    30. Qu, Y.; Gao, L.; Yu, S.; Xiang, Y. Personalized Privacy Protection of IoTs Using GAN-Enhanced Differential Privacy. In Privacy Preservation in IoT: Machine Learning Approaches; SpringerBriefs in Computer Science; Springer: Singapore, 2022; pp. 49–76.
    31. Wan, Y.; Qu, Y.; Gao, L.; Xiang, Y. Differentially Privacy-Preserving Federated Learning Using Wasserstein Generative Adversarial Network. In Proceedings of the IEEE Symposium on Computers and Communications, Athens, Greece, 5–8 September 2021.
    32. Cui, L.; Qu, Y.; Xie, G.; Zeng, D.; Li, R.; Shen, S.; Yu, S. Security and Privacy-Enhanced Federated Learning for Anomaly Detection in IoT Infrastructures. IEEE Trans. Ind. Inform. 2022, 18, 3492–3500.
    33. Qu, Y.; Gao, L.; Xiang, Y.; Shen, S.; Yu, S. FedTwin: Blockchain-Enabled Adaptive Asynchronous Federated Learning for Digital Twin Networks. IEEE Netw. 2022, 1–8.
    34. Qu, Y.; Gao, L.; Yu, S.; Xiang, Y. Hybrid Privacy Protection of IoT Using Reinforcement Learning. In Privacy Preservation in IoT: Machine Learning Approaches; SpringerBriefs in Computer Science; Springer: Singapore, 2022; pp. 77–109.
    35. Wan, Y.; Qu, Y.; Gao, L.; Xiang, Y. Privacy-Preserving Blockchain-Enabled Federated Learning for B5G-Driven Edge Computing. Comput. Netw. 2022, 204, 108671.
    36. Domingo-Ferrer, J.; Muralidhar, K. Privacy in Statistical Databases, UNESCO Chair in Data Privacy; Springer: Cham, Switzerland, 2020.
    37. Brankovic, L.; Giggins, H. Statistical Database Security. In Security, Privacy, and Trust in Modern Data Management; Data-Centric Systems and Applications; Springer: Berlin/Heidelberg, Germany, 2007; pp. 167–181.
    38. Banerjee, S.; Roy, A. Linear Algebra and Matrix Analysis for Statistics, Texts in Statistical Science; Chapman and Hall/CRC: New York, NY, USA; London, UK, 2014.
    39. NIST/SEMATECH. E-Handbook of Statistical Methods. 2022. Available online: http://www.itl.nist.gov/div898/handbook/ (accessed on 15 August 2022).
    40. Wikipedia. Variance. 2022. Available online: https://en.wikipedia.org/wiki/Variance#Discrete_random_variable (accessed on 22 August 2022).
    41. Science Buddies. Variance and Standard Deviation. 2022. Available online: https://www.sciencebuddies.org/science-fair-projects/science-fair/variance-and-standard-deviation (accessed on 22 August 2022).
    42. Yi, X.; Paulet, R.; Bertino, E. Homomorphic Encryption and Applications; Springer: New York, NY, USA, 2014.
    43. Samuelson, P. How Deviant Can You Be? J. Am. Stat. Assoc. 1968, 63, 1522–1525.
    44. Miller, M.; Seberry, J. Relative Compromise of Statistical Databases. Aust. Comput. J. 1989, 21, 56–61.
    45. Yin, X.; Zhu, Y.; Hu, J. A Comprehensive Survey of Privacy-preserving Federated Learning: A Taxonomy, Review, and Future Directions. ACM Comput. Surv. 2021, 54, 1–36.
    46. Liu, Z.; Guo, J.; Yang, W.; Fan, J.; Lam, K.; Zhao, J. Privacy-Preserving Aggregation in Federated Learning: A Survey. IEEE Trans. Big Data 2022, 1–20.
    Table 1. Main terminology and notation used in the present paper.
    Term | Notation
    Database with confidential data | D
    Number of records in D | n
    All records in D | r1, r2, …, rn
    Number of attributes in each record | m
    An arbitrary record in D | r = (r1, r2, …, rm)
    Quantitative attribute | A1
    Characteristic attributes | A2, …, Am
    Values of attribute A1 in r1, r2, …, rn | x1, x2, …, xn
    Boolean expression | B
    Query | (f, B)
    Query sample | S = B(D)
    Query outcome | f(S) = f(B(D))
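
    To make the notation in Table 1 concrete, the following minimal Python sketch evaluates a query (f, B) over a small database D and returns the outcome f(S) for the sample S = B(D). The database, the attribute names and the Boolean expression B are hypothetical illustrations introduced here only for exposition; they are not the authors' data or implementation.

    # Minimal sketch of the notation in Table 1. The database D, its attribute
    # names and the Boolean expression B below are hypothetical.

    # Records r1, ..., rn; A1 ("salary") is the quantitative attribute, and the
    # remaining attributes ("dept", "role") are characteristic attributes.
    D = [
        {"salary": 82000, "dept": "IT", "role": "analyst"},
        {"salary": 67000, "dept": "HR", "role": "manager"},
        {"salary": 91000, "dept": "IT", "role": "manager"},
        {"salary": 73000, "dept": "Finance", "role": "analyst"},
    ]

    def B(record):
        """Boolean expression selecting the query sample (here: the IT department)."""
        return record["dept"] == "IT"

    def mean_and_variance(xs):
        """Aggregate f of an MVQ: population mean and variance of the sample values."""
        mu = sum(xs) / len(xs)
        return mu, sum((x - mu) ** 2 for x in xs) / len(xs)

    def answer(f, B, D):
        """Answer the query (f, B): apply f to the values of A1 in the sample S = B(D)."""
        S = [r["salary"] for r in D if B(r)]
        return f(S)

    print(answer(mean_and_variance, B, D))  # query outcome f(S) = f(B(D)) = (86500.0, 20250000.0)

    In this toy example the query sample contains only two records, so the returned pair (86500.0, 20250000.0) reveals both underlying salaries as 86500 ± 4500, as in the worked equation after the abbreviations list; it is this kind of inference that the conditions obtained in the paper are designed to exclude.
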
    Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

    © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

