27 September 2023

Dis-covered! How ‘anonymous’ data might not be so anonymous

Nick Wells and Leslie Picker* say that de-identified data is the bedrock of modern marketing and research, but a study suggests it’s possible to re-identify people from anonymous data.


We’ve all done it: When signing up for an account online, we’ve clicked “I agree” to have our data sold to third parties.

It will be anonymised, we’re assured, and only a small percentage of data will be made available to others.

But how sure can we be that our personal data can’t be traced back to us?

That’s the central question that a team of researchers at Université catholique de Louvain in Belgium and Imperial College London sought to answer.

Their conclusion: “not very.”

Using machine learning, the researchers developed a system to estimate the likelihood that a specific person could be re-identified from an anonymised dataset containing demographic characteristics.

The researchers’ model suggests that over 99 per cent of Americans could be correctly re-identified from any dataset using 15 demographic attributes, including age, gender and marital status.

“While there might be a lot of people who are in their thirties, male and living in New York City, far fewer of them were also born on 5 January, are driving a red sports car and live with two kids (both girls) and one dog,” said Luc Rocher, a PhD candidate at Université catholique de Louvain and the study’s lead author.
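The effect Rocher describes can be illustrated with a minimal sketch. It is not the researchers’ generative model; it simply counts, in a made-up list of records, how many people become unique as more attributes are combined. The function name, field names and records are all hypothetical and invented for illustration.

```python
from collections import Counter

def unique_fraction(records, attributes):
    """Share of records whose combination of the given attributes appears exactly once."""
    keys = [tuple(r[a] for a in attributes) for r in records]
    counts = Counter(keys)
    return sum(1 for k in keys if counts[k] == 1) / len(keys)

# Hypothetical toy records, invented for illustration only (not real data).
people = [
    {"age_band": "30s", "gender": "M", "city": "New York", "birthday": "01-05", "kids": 2},
    {"age_band": "30s", "gender": "M", "city": "New York", "birthday": "03-17", "kids": 2},
    {"age_band": "30s", "gender": "M", "city": "New York", "birthday": "01-05", "kids": 0},
    {"age_band": "30s", "gender": "M", "city": "New York", "birthday": "11-30", "kids": 2},
]

# Each step adds one more attribute to the combination being matched.
coarse = unique_fraction(people, ["age_band", "gender", "city"])
finer = unique_fraction(people, ["age_band", "gender", "city", "birthday"])
finest = unique_fraction(people, ["age_band", "gender", "city", "birthday", "kids"])

print(coarse, finer, finest)  # 0.0 0.5 1.0
```

In this toy list, nobody is unique on age band, gender and city alone, half are unique once the birthday is added, and everyone is unique once the number of children is included.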

Personal data can be used for research, illicit activities and even investing, as CNBC has previously reported.

Their paper, “Estimating the Success of Re-identifications in Incomplete Datasets Using Generative Models,” was published in the journal Nature Communications.

Their findings suggest that commonly used anonymisation techniques, such as adding noise to data or releasing only a sample of it, may not be enough to meet the requirements of data privacy laws like the European Union’s General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA).

The results “question whether current de-identification practices satisfy the anonymisation standards of modern data protection laws such as GDPR and CCPA,” the researchers wrote.

As part of their research, the trio published an online tool to help people understand how likely they are to be re-identified, based on just three common demographic characteristics: gender, birth date and postcode.

On average, people have an 83 per cent chance of being re-identified based on those three data points, the researchers said.

“The goal of anonymisation is so we can use data to benefit society,” said Yves-Alexandre de Montjoye, one of the researchers.

“This is extremely important but should not and does not have to happen at the expense of people’s privacy.”

* Leslie Picker writes for CNBC. She tweets at @LesliePicker.

Nick Wells is a data and analytics producer for CNBC.

This article first appeared at www.cnbc.com/.
