From Privacy Engineering by Nishant Bhajaria

This article series explores incorporating privacy into your design from the beginning using automation.




There is no official or scientific definition of “Data Inventory,” so this book aims to create one that is intuitive and actionable.

This process of adding tags derived from your Data Classification to your data systems is called “Data Inventory.” As you start building your Data Inventory, you are indexing the contents of your data stores and making individual components expeditiously searchable. Data Inventory is like building the backend of a search engine for your data, much like a team of smart engineers built the backend of tools like Google.
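The search-engine analogy can be sketched as an inverted index from tags to data locations. This is a minimal illustration, not a prescribed design: the store, table, and column names and the tag values below are hypothetical, and a real inventory would live in a catalog service rather than an in-memory dictionary.

```python
from collections import defaultdict

# Hypothetical inventory: (data store, table, column) -> classification tags.
# All names and tags here are illustrative assumptions.
COLUMN_TAGS = {
    ("orders_db", "customers", "email"): {"PII", "CONTACT"},
    ("orders_db", "customers", "signup_date"): {"INTERNAL"},
    ("analytics", "events", "ip_address"): {"PII", "NETWORK"},
}


def build_tag_index(column_tags):
    """Invert column -> tags into tag -> sorted list of column locations."""
    index = defaultdict(list)
    for location, tags in column_tags.items():
        for tag in tags:
            index[tag].append(location)
    return {tag: sorted(locations) for tag, locations in index.items()}


TAG_INDEX = build_tag_index(COLUMN_TAGS)


def find_columns(tag):
    """Return every known location carrying the given tag (empty if none)."""
    return TAG_INDEX.get(tag, [])
```

Calling `find_columns("PII")` returns both the `email` and `ip_address` locations without scanning the underlying data stores, which is the property that makes the inventory "searchable."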

The definition covers a very intuitive need for Data Inventory but it is key that executives and aspiring executives understand specifically the risk mitigation and business enablement Data Inventory makes possible.

In the lead-up to the GDPR, the International Association of Privacy Professionals (IAPP) provided an enumerated plan for companies to get a head start on compliance. This was to be a checklist so that companies would know where to start and what structures and processes to create as they prepared for a post-GDPR world, one where privacy was to become front and center like never before. This list remains fairly applicable, even as its individual components have become more complex to implement and vary more with a company’s use of data.

I have listed the plan below with my insights added in.

  1. Conduct data inventory and mapping. This assumes that the starting point of a sound data protection program is the ability to classify, catalog and discover data such that privacy risk is comprehensible at the time of data collection and access. This book provides a deep dive into Data Governance based on this time-tested guidance from industry experts.
  2. Establish a lawful basis for data processing and cross-border transfers. This is something your legal team would advise on, but how you can process data and where you can transfer it may take on additional complexities when geographic boundaries are involved. Making that assessment requires exactly the sort of insight and discoverability that Data Classification and Data Inventory make possible.
  3. Build and maintain a system to govern the data protection process, including establishing leadership (where appropriate, a data protection officer), setting policies, and training personnel.
  4. Perform data protection impact assessments, along with data protection by design and by default. This typically refers to the privacy risk assessments and privacy reviews that your teams conduct on products and features. We will be looking at privacy risk assessments in a later chapter.
  5. Prepare and implement data retention and record keeping policies and systems so that you can meet information transparency and communications obligations. These obligations could form a part of your audits, for which prudent book-keeping is a prerequisite. Otherwise, your audit processes could become cumbersome and expensive.
  6. Configure systems and put in place processes to accommodate data subjects’ rights, including access, rectification, erasure, portability, objection to automated processing and revocation of consent. As mentioned before, honoring data subjects’ rights, typically exercised through data subject access requests (DSARs), is a key commitment for many companies thanks to laws like the GDPR and the CCPA. Having a Data Inventory is key to meeting these commitments at scale and with accuracy.
  7. Prepare for security breach response and notification. You will want your legal team and/or outside counsel to weigh in, but several jurisdictions in the United States and elsewhere have breach notification laws. These laws create expectations that companies that suffer from a data breach need to notify the impacted entities with specific pieces of information and within specific timeframes.
  8. Have a sound vendor management protocol. This step is critical since vendors who may get access to your systems and your data could make decisions with attendant privacy implications. Assessing the ability of your vendors to follow your data protection guidelines and their past record is critical. As we saw previously, companies may claim that data privacy issues occurred at third parties, but your stakeholders in the privacy community may hold you responsible nonetheless.
  9. Establish systems and channels for communicating with your data protection authority. It is possible that you will need to provide to regulatory authorities granular details around data, your decisions around handling it and time-stamped records. Data inventory will enable and accelerate such a disclosure process, and that could help build a strong trust relationship as well.
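As one illustration of item 6, an inventory can be turned into a work list for an erasure request. The sketch below is an assumption-laden example rather than a reference implementation: the `InventoryEntry` fields, the `personal` tag, and every store and column name are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InventoryEntry:
    """One tagged column in a hypothetical Data Inventory."""
    store: str
    table: str
    column: str
    tags: frozenset
    subject_key: str  # column identifying the data subject in this table


# Illustrative entries only; a real inventory would be generated, not hand-written.
INVENTORY = [
    InventoryEntry("orders_db", "customers", "email",
                   frozenset({"personal"}), "customer_id"),
    InventoryEntry("orders_db", "orders", "total",
                   frozenset({"financial"}), "customer_id"),
    InventoryEntry("analytics", "events", "ip_address",
                   frozenset({"personal"}), "user_id"),
]


def erasure_plan(inventory, tag="personal"):
    """Group columns carrying `tag` by (store, table, subject key).

    Each key in the result is a table to visit once; the value lists the
    columns in that table that hold data covered by the request.
    """
    plan = {}
    for entry in inventory:
        if tag in entry.tags:
            key = (entry.store, entry.table, entry.subject_key)
            plan.setdefault(key, []).append(entry.column)
    return plan
```

The plan only says where to look. Executing the deletion, and deciding which data must be retained for legal reasons, remains a separate step that your legal and engineering teams would need to define.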

To those executives who seek comfort from the fact that the only companies in the news for privacy are the big tech giants, I have this to say: These high-visibility companies faced a moment of truth AFTER rapid growth; at least they had the money to build privacy teams and lawyers to represent them in court. What if regulators or activist citizens come after a startup pre-IPO and VCs fail to even get a basic return on their investment?

Additionally, the smaller your size and the more limited your resources, the harder it will be to adapt to a sudden regulatory change. I know of several small companies that found their roadmaps severely impacted, so if you think privacy is expensive, the opportunity cost of not having privacy controls will almost certainly be higher. As a somewhat imperfect analog, consider this: Bill Gates recently said that the antitrust investigation around Microsoft in the late 1990s affected the company’s ability to effectively comprehend the threat posed by Google’s SaaS model and Apple’s mobile computing model, resulting in a lost decade for Microsoft. Why would you knowingly subject your company to such uncertainty, especially when doing the right thing with privacy will help your business build trust with your customers and help it grow?

Data Inventory is a key part of your data protection program. Having established what Data Inventory is and the forcing functions that render it key, we will now look at the foundational building blocks of a Data Inventory.

“Tagging” or “Labeling” is something we all do routinely in our lives to help locate important materials like our tax returns or medical records. The same concept and process are key when it comes to Data Governance.

Data Inventory is the process of applying the Data Classification to your physical data stores. As we have already seen, the classification process is fairly cross-functional, and forces teams to come up with labels that describe the nature of the data and the privacy risk attached to it. However, additional steps are required to ensure that your Data Inventory is functional and serves its purpose, i.e., indexing data and making it searchable and easier to protect.

The first step in this process, and one that many companies tend to overlook to their eventual detriment, is to come up with tags or labels. These tags are the machine-readable incarnation of the Data Classification.

Data Inventory may well be the first time a company has a common definition around the data previously collected by several teams across the company. The task of finalizing these tags can often be confusing for many teams that may have gotten inured to their own naming conventions.

To simplify this process, here are some criteria for usable data tags that will help your Data Inventory process and outcomes:

  1. These data labels and data value tags should be such that they are easily consumed by enforcement points like data loss prevention gateways or information rights management for actionable intelligence.
  2. The tags should be compatible with and supportive of external regulatory requirements (e.g. GDPR, CCPA). There will often be occasions where you need to apply controls germane to specific legislation, so being able to tag your data appropriately will be helpful. As an analog, in Gmail, you can tag a specific email with the labels “family vacation December 2019” and “Mom.” In this case, a search for either term will surface that email.
  3. They should be applicable to all data in these states: data at rest, data in transit, and data in use. When it comes to data, you will need to protect it regardless of its state, so the tags that enable you to locate it should yield similar outcomes regardless of whether the data is being transported between data centers or lives in a data warehouse.
  4. Tag definitions should be canonical, unambiguous, and machine-readable. They can be used either individually (e.g. for individual database column or API parameters), or as a group, represented as comma separated values, where applicable (e.g. for an entire dataset, or API).
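Criterion 4 can be made concrete with a small, closed tag vocabulary and a parser for the comma-separated form. This is only a sketch: the `DataTag` members and their string values are hypothetical, and the choice to tolerate stray whitespace and casing on input is an assumption you may not want in a stricter pipeline.

```python
from enum import Enum


class DataTag(Enum):
    """Hypothetical canonical tag vocabulary (values are illustrative)."""
    PERSONAL_EMAIL = "personal.email"
    PERSONAL_LOCATION = "personal.location"
    FINANCIAL_CARD = "financial.card"
    PUBLIC = "public"


def parse_tags(raw):
    """Parse a comma-separated tag string into a frozenset of DataTag.

    Whitespace and letter case are normalized on input; any value outside
    the vocabulary raises ValueError, so ambiguous tags never enter the
    inventory.
    """
    tags = set()
    for part in raw.split(","):
        name = part.strip().lower()
        if not name:
            continue  # skip empty segments such as a trailing comma
        tags.add(DataTag(name))  # Enum lookup by value; ValueError if unknown
    return frozenset(tags)


def format_tags(tags):
    """Serialize tags back to a sorted, comma-separated canonical string."""
    return ",".join(sorted(tag.value for tag in tags))
```

A single column might carry `parse_tags("personal.email")`, while a whole dataset or API could carry `parse_tags("personal.email,financial.card")`. Because `format_tags` always sorts, the same set of tags serializes to the same string wherever it is written, which keeps downstream enforcement points from having to guess at equivalent spellings.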

The above enumeration is not exhaustive, but should offer you a great place to start. It is vital that your team take the exercise of coming up with tag names seriously. The process of applying these tags, as we will find out in part 5, can be extremely expensive. This is one area where months and years of re-tagging will save you weeks of planning.
