From Privacy Engineering by Nishant Bhajaria

This article series explores incorporating privacy into your design from the beginning using automation.


Take 40% off Privacy Engineering by entering fccbhajaria into the discount code box at checkout at manning.com.


Lest you think that the privacy risk in data sharing is merely theoretical, let’s first look at an example of how data in motion can help track, and therefore violate the privacy of, the most protected individual in the world.

The New York Times has done some amazing work as part of its Privacy Project. In keeping with the sharing aspect of this book, let’s focus on the implications of the apps on your device sharing your location data.

The research conducted by the Times found that based on the apps on your phone and their sharing of real-time location data, it is possible to track anyone. Yes, even the President of the United States.

The Times Privacy Project obtained a dataset with more than 50 billion location pings from the phones of more than 12 million people in the U.S.

According to the Times, this was a random sample from 2016 and 2017.

But it took only minutes — with assistance from publicly available information — for the Times to deanonymize location data and track the whereabouts of President Trump.

(The map that enabled the Times to track the President is available here: https://www.nytimes.com/interactive/2019/12/20/opinion/location-data-national-security.html)

This map and the movements therein show how location sharing can help track someone. The map was populated by location pings from the cellphone of someone in the President’s entourage. More on that person in a few moments.

As you can see, a single dot appeared on the screen, representing the precise location of someone in President Trump’s entourage at 7:10 a.m. It lingered around the grounds of the president’s Mar-a-Lago Club in Palm Beach, Florida, where the president was staying, for about an hour.

Then, the dot was on the move.

The dot traveled to the Trump National Golf Club in Jupiter, about 30 minutes north of the hotel, pinging again at 9:24 a.m. just outside the compound. The president was there to play golf with Prime Minister Shinzo Abe of Japan.

There the dot stayed until at least 1:12 p.m., when it moved to the Trump International Golf Club in West Palm Beach, where the world leaders enjoyed a private lunch.

By 5:08 p.m., the phone was back at Mar-a-Lago.

The president had what he called a working dinner with Mr. Abe that night.

As I have mentioned before, data sharing is effective not just because of what you or a company shares about you.

Shared data, when combined with data already available, is what makes data sharing effective and problematic.
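To make this concrete, here is a minimal sketch of such a linkage attack. Every name and value below — the device IDs, pings, and itinerary — is invented for illustration, but the joining logic mirrors what the Times did: match “anonymous” pings against publicly known movements until only one device fits.

```python
# Hypothetical "anonymized" pings: the device IDs are random strings,
# but the timestamps and places remain precise.
pings = [
    {"device": "d-4821", "time": "07:10", "place": "Mar-a-Lago"},
    {"device": "d-4821", "time": "09:24", "place": "Trump National Golf Club"},
    {"device": "d-9033", "time": "09:24", "place": "Palm Beach Mall"},
]

# Publicly reported itinerary for the same day.
itinerary = [
    ("07:10", "Mar-a-Lago"),
    ("09:24", "Trump National Golf Club"),
]

def candidate_devices(pings, itinerary):
    """Return device IDs whose pings match every itinerary stop."""
    stops = set(itinerary)
    by_device = {}
    for p in pings:
        by_device.setdefault(p["device"], set()).add((p["time"], p["place"]))
    # A device is a candidate only if it was seen at all known stops.
    return [d for d, seen in by_device.items() if stops <= seen]

print(candidate_devices(pings, itinerary))  # ['d-4821']
```

With only two public stops, a single device survives the filter — and every other ping from that device (home, office, travel) is now attributable to a real person.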

For now, let’s track the President of the United States and his entourage.

The same phone represented by the dot we saw previously pinged a dozen times at the nearby Secret Service field office.

From computer screens more than 1,000 miles away, journalists at the Times could watch this person travel from exclusive areas at Palm Beach International Airport to Mar-a-Lago.

These movements were accurate to within a few feet of the president’s entourage.

The device owner was easy to trace, revealing the outline of the person’s work and life. The New York Times believes that the device owner was a Secret Service agent, whose home was also clearly identifiable in the data.

Connecting the home to public deeds revealed the person’s name and their spouse’s name, exposing even more details about both families.

Now, this Secret Service agent has company. Even a prominent senator’s national security adviser — someone for whom privacy and security are core to their every working day — was identified and tracked in the data.

Who shared this data?

We can be fairly certain that a Secret Service agent protecting the president was not actively sharing THEIR location in real time.

But this person probably had apps on their phone that shared the data. And this is why you need to use security controls to protect data from privacy harms.

And here is another example.

Strava, the fitness tracking app, uses GPS satellites to record its users’ runs, bike rides, and other workouts. It also makes many of these routes available for public view on its Global Heatmap, which shows where people around the world go running and cycling.

This cool feature ended up creating headaches for Strava and the US military. US service members had been recording their runs around the compounds of their military bases. That information made it onto the Strava heatmap and unwittingly revealed their locations.

Twitter users figured out they could identify outlines and activity patterns on US military bases in places like Syria, Afghanistan, and Somalia. The biggest potential threat was not the base locations themselves, which are public, but what went on in and around the bases.

The map showed activity patterns within and around the base, giving away supply and patrol routes, as well as the precise location of facilities like mess halls and living quarters. Further, users could get location-specific data, allowing them to link map activity to specific profiles.

The result: You could find out which service members were in which locations at a given point in time.

Strava responded that all users have the ability to set activities to private so they’re not included in the Heatmap. While that explanation is technically correct, when it comes to security and privacy, the companies building the products will own the outcomes, not the users.

As a former product manager, I understand what Strava was thinking when it built the heatmap. A heatmap provides visibility into adoption and gives users a sense of belonging to a fitness-centric community.

In doing so, it creates a positive motivation to run and then log your data. This was especially true in the early days of social media, when sharing was empowerment.

However, your feature is only as privacy-safe as the most creative invader of data privacy.

If a privacy expert with an eye on the risks around data sharing had reviewed this app design, they might have raised questions like these:

  1. Who were these heatmaps visible to?
  2. What additional information can be inferred from them about Strava users?
  3. Could we alter the data to make it less identifiable when it comes to sensitive locations like military bases, refugee housing, etc.?
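The third question, in particular, is one an engineer can act on. As one hedged sketch — the coordinates, radius, and function names here are invented for illustration, not Strava’s actual pipeline — a data pipeline could geofence known sensitive sites and drop any ping that falls inside them before the heatmap is built:

```python
import math

# Hypothetical sensitive sites (lat, lon) and an exclusion radius in km.
SENSITIVE_SITES = [(34.5553, 69.2075)]
RADIUS_KM = 2.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometers."""
    r = 6371.0  # mean Earth radius
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def filter_pings(pings):
    """Drop any ping that falls inside a sensitive geofence."""
    return [
        (lat, lon) for lat, lon in pings
        if all(haversine_km(lat, lon, s_lat, s_lon) > RADIUS_KM
               for s_lat, s_lon in SENSITIVE_SITES)
    ]

pings = [(34.5560, 69.2080),   # inside the geofence: dropped
         (34.7000, 69.5000)]   # well outside: kept
print(filter_pings(pings))  # [(34.7, 69.5)]
```

The hard part in practice is maintaining the list of sensitive sites — which is exactly why such a review question needs to be asked at design time, not after a Twitter user finds the base.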

Here are some other lessons from this incident:

  1. Data sharing is not just about sharing data between one company and another.
  2. Any time data you have collected from someone else leaves your company, you are, as a practical matter, sharing that information with outside entities.
  3. In the age of social media, publicly available information, breach data on the dark web, and ML-based tools that combine datasets, identifying people has become easier than ever.

So, for privacy, you need to think of “data sharing” anytime data leaves your domain. This is true when you are a company collecting user data. This is ALSO true when you are just an individual broadcasting your data via your cell phone.

This is not just about privacy, but also about safety.

It is therefore vital that anyone creating an app that shares data also builds in privacy techniques to anonymize this data and/or reduce access to it. Using the same security techniques that are used to protect your data from external hackers and bad actors, you can prevent privacy harms from internal bad actors and maladroit employees. Having set this context, the book looks at some techniques to anonymize data for privacy before sharing it.
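What that anonymization can look like in practice: the sketch below (the field names and bucket sizes are illustrative assumptions, not a prescribed standard) drops the stable device identifier, buckets the timestamp to the hour, and rounds coordinates to two decimal places — roughly a one-kilometer grid — before a record leaves your domain.

```python
def generalize_ping(device_id, ts, lat, lon):
    """Generalize a raw location ping before sharing it.

    Drops the stable device ID, buckets the timestamp to the hour,
    and rounds coordinates to two decimals (~1.1 km of latitude).
    """
    return {
        "time_bucket": ts[:13] + ":00",  # "2017-02-11 07:10" -> "2017-02-11 07:00"
        "lat": round(lat, 2),
        "lon": round(lon, 2),
    }

# Hypothetical raw ping; note the device ID never appears in the output.
raw = ("d-4821", "2017-02-11 07:10", 26.6773, -80.0370)
print(generalize_ping(*raw))
# {'time_bucket': '2017-02-11 07:00', 'lat': 26.68, 'lon': -80.04}
```

Coarsening like this trades analytic precision for privacy: a heatmap built from hourly, kilometer-scale buckets still shows adoption trends, but it can no longer reconstruct one agent’s commute.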

If you want to learn more about the book, you can check it out on our browser-based liveBook platform here.