From Privacy Engineering by Nishant Bhajaria

This article series explores incorporating privacy into your design from the beginning using automation.




You can check out part 1 and part 2 before starting this article if you missed them.

Let’s assume you manage a company that analyzes medicine purchases to advise a pharmacy, so that it can plan new orders accordingly.

As such, you have access to the prescriptions that have been filled, along with the patients’ names, birth dates, genders, addresses, etc. Compiled over a period of time, these prescriptions give you a sense of what the demand looks like. Based on that demand forecast, you can plan future orders from drug manufacturers so that prescriptions can be filled on time without making patients wait.

In your database, the information that personally identifies the patients and the medicines they take would fall into the restricted category. Under most laws, this information is extremely sensitive. Even beyond the regulatory angle, people are extremely protective of information that deals with their healthcare.

It stands to reason that data in this bucket will be tied to strict access controls and constrained retention periods.

However, in our use case, you have no reason to focus on individual users and their health/medical situation. You are more interested in the aggregate prescription information over time so as to plan for the future.

As such, you could modify your storage patterns. The two tables below explain how.

Table 1. Individual Prescription Listing

| Name | Medicine | Date |
| --- | --- | --- |
| Josh Smith | Ritalin | 12/1/2019 |
| Karen Jones | Ritalin | 12/7/2019 |
| Oona Blair | Losartan | 12/8/2019 |
| Vikram Khanna | Ritalin | 12/15/2019 |
| Tony Brown | Losartan | 12/18/2019 |
| Theresa Johnson | Losartan | 12/22/2019 |

TABLE 1 represents a database that lists individuals by name alongside the prescriptions they filled at a pharmacy. This information could uniquely identify individuals and hence would be classified as “Restricted.” However, you may want to retain this data longer or allow more people to access it. TABLE 2 shows how you may do that.

Table 2. Aggregated Prescription Listing

| Medicine Name | Number of Prescriptions | Date Range |
| --- | --- | --- |
| Ritalin | 3 | 12/1/2019 – 12/31/2019 |
| Losartan | 3 | 12/1/2019 – 12/31/2019 |

In TABLE 2, you can see a database that has dropped the names of the patients and the individual fill dates, yet retained the data you really care about, i.e., how many times specific medicines were purchased at the pharmacy in a given period. You could make an informed argument that the absence of personally identifiable information in this table means that it could be classified as “Confidential,” which means you could retain it for longer, for example to compare December 2019 to December 2018.
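The step from TABLE 1 to TABLE 2 can be sketched in a few lines of Python. This is a minimal illustration rather than a production pipeline: the rows are copied from TABLE 1, while the function name `aggregate_by_medicine` and the output field names are assumptions made for this example.

```python
from collections import Counter
from datetime import date

# Rows from TABLE 1: (patient name, medicine, fill date).
prescriptions = [
    ("Josh Smith", "Ritalin", date(2019, 12, 1)),
    ("Karen Jones", "Ritalin", date(2019, 12, 7)),
    ("Oona Blair", "Losartan", date(2019, 12, 8)),
    ("Vikram Khanna", "Ritalin", date(2019, 12, 15)),
    ("Tony Brown", "Losartan", date(2019, 12, 18)),
    ("Theresa Johnson", "Losartan", date(2019, 12, 22)),
]


def aggregate_by_medicine(rows, start, end):
    """Count prescriptions per medicine within [start, end].

    Patient names and individual fill dates are deliberately left out
    of the result; only the medicine, the count, and the date range
    are kept. (Hypothetical helper, named for this example only.)
    """
    counts = Counter(
        medicine for _name, medicine, filled in rows if start <= filled <= end
    )
    return [
        {"medicine": medicine, "prescriptions": count, "date_range": (start, end)}
        for medicine, count in sorted(counts.items())
    ]


december_2019 = aggregate_by_medicine(
    prescriptions, date(2019, 12, 1), date(2019, 12, 31)
)
for row in december_2019:
    print(row["medicine"], row["prescriptions"])
```

Only the aggregated rows would then need to move into the longer-lived “Confidential” store, while the named rows stay under the “Restricted” controls.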

FIGURE 3 below shows how transitioning from TABLE 1 to TABLE 2 is a win for privacy, cost saving and security as well.


Figure 3. Changing data for better privacy and lower costs


This is an oversimplified example, but the takeaway is that Data Classification allows you to understand your use case better, and manage the data protection techniques more prudently. The thinking and collaboration required here will ensure you are more thoughtful and proactive about what you collect and how long you keep it. The potential savings in storage costs and reduction in risk are benefits that will accrue over time, and you will be able to build a credible narrative that you are collecting data for legitimate business reasons without being careless about user privacy.
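To make the link between classification and data handling concrete, here is a rough sketch of how a classification label might drive retention and access checks. The two labels come from the example above; the retention periods, role names, and helper functions are hypothetical values chosen purely for illustration, not recommendations or legal guidance.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Policy:
    retention_days: int
    allowed_roles: frozenset


# Hypothetical policies: the numbers and role names are assumptions
# for this sketch. Real values would come from your legal and
# security teams.
POLICIES = {
    "Restricted": Policy(
        retention_days=90,
        allowed_roles=frozenset({"pharmacist"}),
    ),
    "Confidential": Policy(
        retention_days=730,
        allowed_roles=frozenset({"pharmacist", "analyst"}),
    ),
}


def is_expired(classification, created_on, today):
    """Return True if data with this classification is past its retention period."""
    policy = POLICIES[classification]
    return today > created_on + timedelta(days=policy.retention_days)


def can_access(classification, role):
    """Return True if the given role may read data with this classification."""
    return role in POLICIES[classification].allowed_roles
```

With these assumed values, a year-over-year comparison would only be possible against the aggregated “Confidential” table, because the “Restricted” rows would have been deleted long before then.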

Part 4 introduces the concept of a “Data Inventory” as a collection of tags derived from your Data Classification.

If you want to learn more about the book, you can check it out on our browser-based liveBook platform here.