From Privacy-Preserving Machine Learning by J. Morris Chang, Di Zhuang, and G. Dumindu Samaraweera

This article delves into how Machine Learning algorithms interact with data and the importance of preserving data privacy.

Read it if you’re a machine learning engineer, or a developer building around machine learning.

Machine Learning (ML) can be seen as the capability of an algorithm to mimic intelligent human behavior: performing complex tasks the way humans solve problems, by looking at data from different angles and analyzing it across different domains. This process of learning is utilized by various applications in our day-to-day lives, from product recommendation systems in online web portals to sophisticated intrusion detection mechanisms in internet security applications.


To produce high-confidence results, machine learning applications require vast amounts of data collected from various sources. Web search queries, browsing history, transaction histories of online purchases, movie preferences, and individual location check-ins are just some of the information collected and stored on a daily basis, most of the time without the users even knowing. Some of this information is private to the individual, yet it is uploaded to high-end centralized servers, mostly in cleartext format, so that machine learning algorithms can extract patterns and build ML models from it.
However, the problem is not limited to the collection of this private data by different ML applications. First, the data is exposed to insider attacks, where the information is available to workers inside these companies; for example, database administrators or application developers might have access to it without many restrictions. Second, the data might be exposed to external hacking attacks, which could reveal the private information to the outside world. Most importantly, it is possible to extract additional information from private data even if it is anonymized, or even if the datasets and ML models are inaccessible and only the testing results are revealed.


To understand the relationship between data privacy and ML algorithms, it is crucial to know how ML systems work with the data they process. Typically, the input data of an ML algorithm (captured from various sources) is represented as a set of sample values, and each sample is comprised of a set of features. Take the example of a facial recognition algorithm that recognizes people when you upload an image to Facebook. Consider an image of 100×100 pixels, where each pixel is represented by a single number from 0 to 255. These pixels can be concatenated to form a feature vector, so each image can be presented to the ML algorithm as a vector of data along with an associated label. During the training phase, the ML algorithm uses many feature vectors and their associated labels to produce an ML model. This model is then used with fresh data (testing samples) to predict the result; in this case, to recognize a person.
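To make this concrete, here is a minimal sketch (using NumPy, with made-up pixel values) of how an image's pixel grid is flattened into the kind of feature vector described above:

```python
import numpy as np

# A tiny 2x2 "image"; each pixel is a number from 0 to 255.
image = np.array([[0, 128], [255, 64]], dtype=np.uint8)

# Concatenate the pixel grid into a single feature vector.
feature_vector = image.flatten()
print(feature_vector)        # [  0 128 255  64]
print(feature_vector.shape)  # (4,)

# A 100x100 image yields a 10,000-dimensional feature vector.
full_image = np.zeros((100, 100), dtype=np.uint8)
print(full_image.flatten().shape)  # (10000,)
```

During training, each such vector would be paired with a label (here, the name of the person in the image).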

So, how can you measure the performance of ML models?

The ability of an ML model to accurately predict the final result is a measure of how well the model generalizes to unseen data, that is, data introduced for the first time. This accuracy is usually measured empirically, and it depends on factors such as the number of training samples, the algorithm used to build the model, the quality of the training samples, the selection of the ML algorithm's hyperparameters, and so on. In some applications it is equally important to pre-process the raw data before feeding it to the ML model. In such cases, different mechanisms are used to extract the most important features from the raw data, which may involve techniques such as Principal Component Analysis (PCA) to project the data to a lower dimension.
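As a hedged illustration of that dimensionality-reduction step, the following sketch projects synthetic data onto its top principal components using plain NumPy (the data and the choice of three components are arbitrary assumptions for the example):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))   # 200 samples, 10 raw features

# PCA via SVD: center the data, then project onto the top-k
# right singular vectors (the principal components).
X_centered = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)

k = 3                            # keep the top-3 components
X_reduced = X_centered @ Vt[:k].T

print(X_reduced.shape)           # (200, 3)
```

The reduced vectors would then serve as the feature vectors fed to the ML algorithm in place of the raw data.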

However, whatever the ML task, whether the data is labeled or not, and whether the raw data is pre-processed, modern state-of-the-art ML algorithms are capable of learning beyond the intended task; hence, it is of utmost importance to explore these threats when designing privacy-preserving ML algorithms.


Apart from the privacy leakages we discussed in the previous section, many other threats and attacks on machine learning systems have been proposed and discussed in the literature, and they have the potential to be deployed in real-world application scenarios. For instance, figure 1 illustrates a timeline of threats and attacks on machine learning systems, including De-Anonymization (Re-Identification) attacks, Reconstruction attacks, Parameter inference attacks, Model inversion attacks, and Membership inference attacks. Each threat or attack is briefly introduced in the sections that follow.

Figure 1 A timeline of threats and attacks identified for machine learning systems.

While some leading companies, such as Google and Apple, have started designing and using their own privacy-preserving methodologies for different machine learning tasks, it is still a challenge to raise public awareness of these privacy technologies, mainly due to the lack of well-organized tutorials and books that explain the concepts methodically and systematically.

The Problem of Private Data in the Clear

Let us first look at figure 2, which illustrates a typical client-server application scenario. Whenever an application collects private information, that information is often transferred to cloud-based servers, possibly through encrypted channels, and the learning happens there. Consider the scenario of a mobile application connecting to a cloud server to perform an inference task; for example, a parking app sends the user's location data to find a nearby available garage. Even though the communication channel is secured, the collected data most likely resides in the cloud either in its original, unencrypted form or as features extracted from that original form. This is one of the biggest challenges to privacy, as the data is susceptible to various insider and outsider attacks.

Figure 2 The problem of storing private data in the cleartext format.

Reconstruction Attacks

By now we understand that private data should either be stored on the server in encrypted form or not be sent to the server in its raw, original form at all. However, there is another possible threat posed by reconstruction attacks, where the attacker can reconstruct data even without access to the full set of raw data on the server. In this case, the adversary has the advantage of external knowledge of the feature vectors, that is, the data used to build the ML model. This usually requires direct access to the ML models deployed on the server, which we call white-box access. The adversary then tries to reconstruct the raw private data by using its knowledge of the feature vectors in the model. These attacks are possible when the feature vectors used during the training phase were not flushed from the server after the model was built.

Table 1. Differences between white-box, black-box, and grey-box access.

  • White-box access: full access to the internal details of the ML model (e.g., its parameters and loss functions).
  • Black-box access: no access to the internal details of the ML model.
  • Grey-box access: partial access to the internal details of the ML model.


With that high-level overview of how a reconstruction attack works, let's look in more detail at how it is possible. The approach taken to reconstruct the data depends on what information (background knowledge) is available to the attacker. Let's consider the following two examples of biometric-based authentication systems.

  • Reconstruction of fingerprint images from a minutiae template: Fingerprint-based authentication is now popular in many organizations, where users are authenticated by comparing a newly acquired fingerprint image with a fingerprint image already saved in the user authentication system. In general, fingerprint matching systems use four types of representation schemes: grayscale image, skeleton image, phase image, and minutiae, among which the minutiae-based representation is the most widely adopted due to its compactness. Because of this compactness, people generally have the impression that a minutiae template does not contain enough information to reconstruct the original fingerprint image. However, in 2011 a group of researchers [1] successfully demonstrated an attack that reconstructs fingerprint images directly from minutiae templates. In a nutshell, the idea was to reconstruct the phase image from the minutiae, convert it to the original (grayscale) image, and then launch an attack against fingerprint recognition systems to infer private data.
  • Reconstruction attacks against mobile-based continuous authentication systems: In another scenario, Al-Rubaie et al. [2] investigated the possibility of reconstruction attacks that use raw gesture data from users' profiles in mobile-based continuous authentication systems. The idea was to examine how the private information leaked to an adversary could be used. At a high level, they utilized the actual feature vectors stored in user profiles to reconstruct the raw data, and then used that information to break into other systems.

In most of these cases, the privacy threat resulted from a security threat to the authentication system: the reconstructed raw data misguided the ML system into believing that the raw data belonged to a certain user. In the case of mobile-based continuous authentication systems, for example, the attacker thereby gains access to the mobile device and its personal records, so the system fails to protect the user's privacy. Moreover, there is another class of reconstruction attack that can reveal private information directly, which we discuss in the following scenario.


In 2019, Simson Garfinkel and his team at the US Census Bureau presented a detailed example of how a reconstruction attack [3] can be mounted by an attacker using only data available to the public. They explained that publishing the frequency count, mean, and median age of a population, broken down by a few demographics, allows anyone with access to the statistics and a personal computer to accurately reconstruct the personal data of almost the entire survey population. This finding raised the alarm about the privacy of Census data. Based on it, the US Census Bureau conducted a series of experiments on 2010 Census data and found that the eight billion published statistics, about 25 data points per person, allowed successful reconstruction of confidential records for more than 40% of the US population.
We can now understand how real the risk is. The vast amount of sensitive data published by statistical agencies each year may give a determined attacker more than enough information to reconstruct some or all of a target database and breach the privacy of millions of people. The US Census Bureau identified this risk and implemented countermeasures to protect the 2020 US Census. But it is important to note that reconstruction is no longer a theoretical danger. It is real.
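The core idea can be illustrated with a toy sketch (the block size and published statistics below are invented, and this is not the Census Bureau's actual methodology): given a published count, mean, and median for a tiny population, simple enumeration recovers the small set of age combinations consistent with those statistics, and every additional published statistic shrinks that set further.

```python
from itertools import combinations_with_replacement

# Suppose a block publishes: count = 3, mean age = 40, median age = 30.
COUNT, MEAN, MEDIAN = 3, 40, 30

# Enumerate every non-decreasing triple of ages 0..115 consistent
# with the published statistics.
candidates = [
    ages for ages in combinations_with_replacement(range(116), COUNT)
    if sum(ages) == MEAN * COUNT and sorted(ages)[1] == MEDIAN
]

print(len(candidates))             # only 31 consistent reconstructions remain
print((10, 30, 80) in candidates)  # True: the residents' true ages are among them
```

With thousands of interlocking statistics per block, the candidate set often collapses to a single solution, which is exactly the risk the Census Bureau demonstrated.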
So now the question is, how can we mitigate such attacks? For reconstruction attacks tailored to machine learning models, the best approach is to avoid storing explicit feature vectors inside the ML model. If feature vectors must be stored (e.g., SVM requires feature vectors/metadata to be stored alongside the model), they should be made inaccessible to the requesting party so that reconstruction is hard. As for reconstruction attacks targeting database or data mining operations (as in the US Census example), there are different, well-established data sanitization and disclosure-avoidance techniques that can be used in practice.

Model inversion attacks

While some ML models store explicit feature vectors alongside the model, many ML algorithms produce models that do not store feature vectors at all (e.g., neural networks, ridge regression). In such circumstances, the adversary's knowledge is limited. Still, the adversary might have access to the ML model itself even without stored feature vectors, as in the white-box access scenario discussed earlier. There is also another scenario, called black-box access, where the adversary has no direct access to the ML model but can listen to the incoming requests when a user submits new testing samples, as well as the responses the model generates for those samples. In a model inversion attack, the adversary's goal is to use these responses to produce feature vectors that resemble the ones used to create the ML model [4]. A summary of how a model inversion attack works is depicted in figure 3.
Typically, such attacks utilize the confidence values received from the ML model (e.g., the probability decision score) to generate feature vectors. For example, consider a facial recognition algorithm. Once you submit a face image, the algorithm produces, based on the features it identifies in that image, a result vector with the class and the corresponding confidence score. For now, assume the result vector generated by the algorithm is as follows:

[John: 0.99, Simon: 0.87, Willey: 0.65]

What does this result mean? It says that the algorithm is 99% confident (the decision score) that this image belongs to John (the class), 87% confident that it belongs to Simon, and so on. Now, what if the adversary can listen to all these communications? Even without knowing what the input image is or whose image it is, the adversary can deduce that a certain kind of input image yields a confidence score in a certain range. By accumulating the results obtained over a period of time, the attacker can produce an average score that represents each class in the machine learning model. Once a class is identified, this becomes one of the most threatening privacy attacks if the class represents a single individual, as in the case of a facial recognition algorithm where each class represents one person.
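A minimal sketch of that accumulation step might look like the following, where the observed result vectors are fabricated for illustration:

```python
# Result vectors the adversary has passively recorded over time
# (class -> confidence score), using the classes from the example above.
observed = [
    {"John": 0.99, "Simon": 0.87, "Willey": 0.65},
    {"John": 0.97, "Simon": 0.85, "Willey": 0.60},
    {"John": 0.95, "Simon": 0.89, "Willey": 0.70},
]

# Average the confidence score per class to build a per-class profile.
profile = {
    name: sum(vec[name] for vec in observed) / len(observed)
    for name in observed[0]
}
print(profile)  # average score per class, e.g. John is around 0.97
```

From such per-class profiles, the attacker can then search for inputs that drive one class's score toward its profile, which is the essence of inverting the model.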
Therefore, the model inversion attack is a serious threat to machine learning-based systems. It is also noteworthy that, in some cases, a model inversion attack can be classified as a subclass of reconstruction attack, depending on how well the features are arranged in the raw data.

Figure 3 The difference between white-box access and black-box access. White-box access requires direct access/permission to the ML model to infer data, whereas black-box access usually works just by listening to the communication channel.

To mitigate model inversion attacks, it is important to restrict the adversary to black-box access, thereby limiting the adversary's knowledge. For instance, in the face recognition-based authentication example above, instead of providing the exact confidence value of a certain ML class, the value can be rounded, or only the final predicted class label can be returned, making it hard for an adversary to learn anything beyond the prediction itself.
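A hedged sketch of these two countermeasures (the function name and rounding precision are illustrative choices, not a prescribed API):

```python
def harden_response(scores, mode="label_only", ndigits=1):
    """Reduce what a model's response reveals about its internals."""
    if mode == "label_only":
        # Reveal only the winning class, no confidence values at all.
        return max(scores, key=scores.get)
    # Otherwise, coarsen the scores so little can be learned from them.
    return {label: round(s, ndigits) for label, s in scores.items()}

raw = {"John": 0.9912, "Simon": 0.8731, "Willey": 0.6548}
print(harden_response(raw))                # 'John'
print(harden_response(raw, mode="round"))  # {'John': 1.0, 'Simon': 0.9, 'Willey': 0.7}
```

Returning only the label is the stronger defense; rounding is a middle ground when callers legitimately need some notion of confidence.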

Membership inference attacks

While a model inversion attack does not try to reproduce an actual sample from the training dataset, a membership inference attack tries to determine, given an ML model, a sample, and the adversary's domain knowledge, whether that sample was a member of the original training dataset used to build the ML model, as shown in figure 4.
Consider a machine learning-based system that diagnoses a disease by analyzing input medical information. For instance, suppose a patient participates in a study to determine the right difficulty level of a complex game intended for people suffering from Alzheimer's disease. If an attacker succeeds in carrying out membership inference, he can simply deduce that this patient suffers from Alzheimer's disease. This is a serious leak of sensitive information that could later be used for a targeted action against the patient.


In this kind of attack, the adversary's intention is to learn whether a certain personal record of an individual was used to train the original ML model. To achieve this, the attacker first builds a secondary attack model using their domain knowledge. Typically, these attack models are trained using shadow models generated from noisy versions of real data, data extracted by model inversion attacks, or statistics-based synthesis. Training these shadow models requires black-box or white-box access to the original ML model and sample datasets.
With that, the attacker has access to both the ML service and the attack model, and observes the differences between the outputs the ML model produces for data used during the training phase and for data not included in the training set [5], as depicted in figure 4. It is noteworthy that the idea of membership inference is to learn whether a particular record is in the training dataset, not to learn the dataset itself.
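One simplified intuition behind such attacks, namely that models tend to be more confident on samples they were trained on, can be sketched as a thresholding rule (the threshold is an illustrative assumption; real attacks such as [5] learn this decision with shadow models instead):

```python
def infer_membership(confidence_vector, threshold=0.95):
    """Guess 'member of training set' when the model is unusually sure."""
    return max(confidence_vector) >= threshold

# An overconfident response often betrays a training-set member...
print(infer_membership([0.99, 0.005, 0.005]))  # True  -> likely a member
# ...while a hedged response suggests the sample was unseen.
print(infer_membership([0.40, 0.35, 0.25]))    # False -> likely unseen
```

This also makes clear why the mitigations below (coarse scores, label-only output) work: they starve the attacker of the confidence signal the rule depends on.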
To mitigate membership inference attacks, there are a couple of strategies, such as regularization and coarsening the precision of prediction values. We discuss these regularization strategies in more detail in chapter 8 of the book. However, it has been found that limiting the output of the ML model to the class labels alone is the most effective way of reducing the threat. In addition, Differential Privacy (DP) is known to be one of the most effective mechanisms for resisting membership inference attacks.

Figure 4 How membership inference attack works.

De-Anonymization or Re-Identification Attacks

Anonymizing datasets before releasing them to third-party users is one of the typical approaches to protecting user privacy. To that end, some organizations have released only anonymized versions of their datasets for the use of the general public (e.g., public voter databases, the Netflix Prize dataset, the AOL search data). In the case of Netflix, the company published a large dataset of anonymized movie ratings from 500,000 subscribers, inviting contestants to perform data mining and propose new algorithms to build better movie recommender systems. However, in 2008 Narayanan et al. [6] demonstrated that even with data anonymization, it is possible to infer the private information of individuals. In their attack scenario, they used the Internet Movie Database (IMDb) as background knowledge to identify the Netflix records of known users, in the process uncovering the users' apparent political preferences. It is therefore well understood that anonymization cannot reliably protect data privacy against strong adversaries.

Challenges of Privacy Protection in Big Data Analytics

Apart from the threats and attacks specifically tailored to machine learning models and frameworks, another set of privacy challenges arises at the opposite end of the machine learning and privacy spectrum: how can we protect data at rest (for example, data stored in a database system before it is fed to an ML task) and data in transit (data flowing through the various elements of the underlying machine learning framework)? The ongoing move toward larger and more connected data reservoirs makes it more challenging for database systems and data analytics tools to protect data against privacy threats.
One of the significant privacy threats in database systems is linking different database instances together to uncover a unique fingerprint of an individual. This can be categorized as a subclass of re-identification attack, and in database applications these are most often insider attacks. Based on how the data is identified and formed, these attacks can be further classified into two types: correlation attacks and identification attacks.


Correlation Attacks

The ultimate purpose of a correlation attack is to find a correlation between two or more data fields in a database, or across a set of database instances, to create unique and informative data tuples. In some cases, additional domain knowledge from external data sources can also be brought into the identification process. As an example, take a medical records database that lists user information with a history of medication prescriptions, and another data source with user information along with the pharmacies they visited. Once these two sources are linked, the correlated database holds additional knowledge, such as which patient bought his or her medication from which pharmacy. Moreover, by exploring the frequently visited pharmacies, an adversary might obtain a rough estimate of where a patient resides. Thus, the final correlated dataset can hold more private information per user than either original dataset.
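The linking step can be sketched in a few lines (the records and identifiers below are fabricated): joining two sources on a shared patient identifier yields a tuple that neither source contained on its own.

```python
# Two separately released data sources, keyed by a shared identifier.
medical = {"patient-17": {"prescription": "donepezil"}}
pharmacy = {"patient-17": {"pharmacy": "Main St Pharmacy"}}

# Join on the identifiers present in both sources.
linked = {
    pid: {**medical[pid], **pharmacy[pid]}
    for pid in medical.keys() & pharmacy.keys()
}
print(linked)
# {'patient-17': {'prescription': 'donepezil', 'pharmacy': 'Main St Pharmacy'}}
```

The correlated record now reveals which patient filled which prescription at which pharmacy, information neither release exposed alone.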


Identification Attacks

While a correlation attack tries to extract more private information in general, an identification attack tries to identify a targeted individual by linking entries in a database instance, exploring more private information about that particular individual. This can be considered one of the most threatening data privacy attacks on a dataset, as it has a greater impact on an individual's privacy. For example, if an employer looked up all occurrences of its employees in a medical record or pharmacy customer database, it could reveal a great deal about their medication records, medical treatments, and illnesses. Thus, it is a growing threat to the privacy of individuals.
At this point, it is clear that we need sophisticated mechanisms in data analytics and machine learning applications to protect individuals' privacy from such targeted attacks. By using multiple layers of data anonymization and data pseudonymization techniques, it is possible to protect privacy in such a way that linking different datasets remains possible, while identifying an individual by analyzing the data records remains challenging.
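As a hedged sketch of pseudonymization, a keyed hash (HMAC) can replace a direct identifier: records remain linkable across datasets that use the same secret key, while the raw identity is not exposed (the key and identifiers below are illustrative):

```python
import hashlib
import hmac

# The key must be kept out of every released dataset; anyone holding it
# can re-link records, so in practice it needs strict access control.
SECRET_KEY = b"keep-this-out-of-the-released-data"

def pseudonymize(identifier: str) -> str:
    """Replace a direct identifier with a stable keyed hash."""
    return hmac.new(SECRET_KEY, identifier.encode(), hashlib.sha256).hexdigest()

p1 = pseudonymize("alice@example.com")
p2 = pseudonymize("alice@example.com")
print(p1 == p2)                                # True: linkage still works
print(p1 != pseudonymize("bob@example.com"))   # True: users stay distinct
```

Note that pseudonymization alone does not stop the re-identification attacks described above; combined with anonymization of the remaining fields, it is one layer of the defense this section describes.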



[1] J. Feng and A. K. Jain, “Fingerprint reconstruction: From minutiae to phase,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 33, no. 2, pp. 209–223, 2011, doi: 10.1109/TPAMI.2010.77.
[2] M. Al-Rubaie and J. M. Chang, “Reconstruction Attacks Against Mobile-Based Continuous Authentication Systems in the Cloud,” IEEE Trans. Inf. Forensics Secur., vol. 11, no. 12, pp. 2648–2663, 2016, doi: 10.1109/TIFS.2016.2594132.
[3] S. Garfinkel, J. M. Abowd, and C. Martindale, “Understanding Database Reconstruction Attacks on Public Data,” Commun. ACM, vol. 62, no. 3, pp. 46–53, 2019.
[4] M. Fredrikson, S. Jha, and T. Ristenpart, “Model inversion attacks that exploit confidence information and basic countermeasures,” Proc. ACM Conf. Comput. Commun. Secur., pp. 1322–1333, 2015, doi: 10.1145/2810103.2813677.
[5] R. Shokri, M. Stronati, C. Song, and V. Shmatikov, “Membership Inference Attacks Against Machine Learning Models,” Proc. – IEEE Symp. Secur. Priv., pp. 3–18, 2017, doi: 10.1109/SP.2017.41.
[6] A. Narayanan and V. Shmatikov, “Robust de-anonymization of large sparse datasets,” Proc. – IEEE Symp. Secur. Priv., pp. 111–125, 2008, doi: 10.1109/SP.2008.33.