The transmission of one's own, partly confidential data to another agent, e.g., for cloud computing, carries the risk of enabling the receiver to infer information he is not entitled to learn. We consider a specific countermeasure against unwanted inferences about associations between data values whose combination of attributes is declared to be sensitive. This countermeasure fragments a relation instance into attribute-disjoint and duplicate-preserving projections such that no sensitive attribute combination is contained in any projection. Though attribute-disjointness is intended to make a reconstruction of the original data impossible for the receiver, the goal of inference-proofness is not always accomplished. In particular, inferences might be based on combinatorial effects, since duplicate-preservation implies that the frequencies of value associations in visible projections equal those in the original relation instance. Moreover, the receiver might exploit functional dependencies, numerical dependencies and tuple-generating dependencies, as presumably known from the underlying database schema. We identify several conditions under which a fragmentation violates inference-proofness. Besides complementing classical results about lossless decompositions, our results could be employed for designing better countermeasures.
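The following Python sketch makes the combinatorial effect concrete. It is our own illustration and is not taken from the fragmentation proposal. It assumes two single-attribute, duplicate-preserving fragments of the same n tuples, published in a way that prevents linking rows by position. Suppose a value a occurs count_a times in one fragment and a value b occurs count_b times in the other. By the pigeonhole principle, at least count_a + count_b - n original tuples must then contain both values. The function names and the ward/illness data are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, List, Tuple


def min_joint_occurrences(count_a: int, count_b: int, n: int) -> int:
    """Lower bound on the number of original tuples containing both a and b.

    Value a occurs in count_a of the n tuples and b in count_b of them,
    so by the pigeonhole principle at least count_a + count_b - n tuples
    must contain both values.
    """
    return max(0, count_a + count_b - n)


def certain_associations(
    frag1: List[Hashable], frag2: List[Hashable]
) -> Dict[Tuple[Hashable, Hashable], int]:
    """Map each value pair (a, b) that certainly co-occurs to its lower bound.

    frag1 and frag2 are duplicate-preserving projections of the same
    relation instance; only their value frequencies are used.
    """
    if len(frag1) != len(frag2):
        raise ValueError("duplicate-preserving fragments must have equal size")
    n = len(frag1)
    counts1, counts2 = Counter(frag1), Counter(frag2)
    result = {}
    for a, count_a in counts1.items():
        for b, count_b in counts2.items():
            bound = min_joint_occurrences(count_a, count_b, n)
            if bound > 0:
                result[(a, b)] = bound
    return result


# Hypothetical wards (fragment 1) and illnesses (fragment 2) of 4 patients.
wards = ["A", "A", "A", "B"]
illnesses = ["flu", "flu", "flu", "cold"]
print(certain_associations(wards, illnesses))  # {('A', 'flu'): 2}
```

As an extreme case, a value that occurs in every tuple (count n) is certainly associated with every value of the other fragment.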
A data owner might consider fragmenting his relational data and making only the resulting fragments accessible to another agent, which, as a prominent example, might offer some cloud services to the owner. Such a fragmentation aims at hiding some information about sensitive associations contained in the original data from the service agent. Thus, though in principle seen as cooperating, the service agent is also perceived as potentially attacking the confidentiality interests of the owner by attempting to infer hidden original information from the accessible data and, if applicable, from additional background knowledge. Accordingly, the data owner should carefully choose a fragmentation technique and thoroughly investigate whether the resulting fragmentation of his specific data sufficiently satisfies his confidentiality interests.
Our considerations are motivated by the particular proposal of “combining fragmentation and encryption to protect privacy in data storage”, a technique that converts a given relation instance and some confidentiality requirements on the schema level into a set of vertical relational fragments, all of which might be accessible to an attacker. We focus on three aspects of this proposal:
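The core of such a fragmentation can be sketched as follows, assuming (as we do below) that no attribute is encrypted. Each fragment is represented as a Counter, that is, the multiset of projected tuples an attacker would see. This representation keeps duplicates but drops any row order that could link fragments. A fragmentation is accepted only if its fragments are pairwise attribute-disjoint and no fragment covers all attributes of a confidentiality constraint. The names fragment and is_admissible and the sample relation are hypothetical illustrations, not the interface of the original proposal.

```python
from collections import Counter
from typing import Dict, Iterable, List, Sequence, Set

Row = Dict[str, object]


def fragment(relation: List[Row], fragments: Sequence[Sequence[str]]) -> List[Counter]:
    """Duplicate-preserving projection of the relation onto each fragment.

    Each result is a Counter (multiset) of projected tuples: duplicates are
    kept, but row order, which could link fragments, is not.
    """
    return [
        Counter(tuple(row[attr] for attr in attrs) for row in relation)
        for attrs in fragments
    ]


def is_admissible(
    fragments: Sequence[Sequence[str]], constraints: Iterable[Set[str]]
) -> bool:
    """Check attribute-disjointness and that no fragment covers a constraint."""
    seen: Set[str] = set()
    for attrs in fragments:
        if seen & set(attrs):
            return False  # an attribute is shared between two fragments
        seen |= set(attrs)
    constraints = list(constraints)  # allow repeated iteration below
    return not any(c <= set(attrs) for attrs in fragments for c in constraints)


# Hypothetical relation with the sensitive combination {Name, Illness}.
relation = [
    {"Name": "Alice", "Zip": "10115", "Illness": "flu"},
    {"Name": "Bob", "Zip": "10115", "Illness": "flu"},
    {"Name": "Carol", "Zip": "20095", "Illness": "cold"},
]
constraints = [{"Name", "Illness"}]
frags = [["Name", "Zip"], ["Illness"]]
print(is_admissible(frags, constraints))  # True
print(fragment(relation, frags)[1])
```

The second fragment still reveals that flu occurs twice and cold once. Those frequencies are exactly what the inference attacks discussed below exploit.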
Focusing on the enforcement of confidentiality requirements by means of fragmentation, we purposely ignore all cryptographic aspects and neglect the details of how the data owner can reconstruct the original data. To further simplify our investigations, we also assume that none of the attributes gets encrypted values:
For this setting, we will discuss various kinds of successful inference attacks based on observable frequencies of visible data items and on additional background knowledge in the form of data dependencies and actual content data, even though attribute-disjointness at first glance appears to generate unrelated fragments. In doing so, we will present some fundamental assertions about such inferences, together with some complexity considerations. Our main contribution will be the identification of both the crucial role of frequencies and the challenge for future research of how to block their exploitation.
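The following sketch illustrates how frequencies can be combined with a data dependency. It assumes a functional dependency lhs → rhs whose two sides lie in different duplicate-preserving fragments. All tuples with a given LHS value a then share a single RHS value, which must therefore occur at least count(a) times. If only one RHS value is that frequent, the association of a with it is certain. The rule is deliberately simple and incomplete, and the Dept/Manager attributes and data are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, List


def fd_forced_matches(
    lhs_values: List[Hashable], rhs_values: List[Hashable]
) -> Dict[Hashable, Hashable]:
    """Infer certain associations a -> b under a known FD lhs -> rhs.

    lhs_values and rhs_values are the visible duplicate-preserving
    projections of the same relation instance onto the FD's two sides.
    Because all count(a) tuples with LHS value a share one RHS value b,
    that b must occur at least count(a) times; a unique such b is forced.
    """
    lhs_counts, rhs_counts = Counter(lhs_values), Counter(rhs_values)
    forced = {}
    for a, count_a in lhs_counts.items():
        candidates = [b for b, count_b in rhs_counts.items() if count_b >= count_a]
        if len(candidates) == 1:
            forced[a] = candidates[0]
    return forced


# Hypothetical fragments: Dept in one fragment, Manager in another,
# with the FD Dept -> Manager assumed to be known to the attacker.
depts = ["sales", "sales", "sales", "hr", "it"]
managers = ["m1", "m1", "m1", "m2", "m2"]
print(fd_forced_matches(depts, managers))  # {'sales': 'm1'}
```

In this example, sales is forced to m1. A more thorough analysis would also note that sales consumes all three occurrences of m1. It would then conclude that hr and it must both be associated with m2, an inference the simple rule does not detect.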