It’s easy to point at the Facebooks and Googles of the world and blame them for violating our privacy by tracking our personal information all over the web, but it’s not just them. It’s also all the organizations that promised us the data they were collecting was “safe” because it’s all aggregated and anonymous. Maybe not:
Researchers from Imperial College London and the University of Louvain have created a machine-learning model that estimates exactly how easy individuals are to reidentify from an anonymized data set. You can check your own score here, by entering your zip code, gender, and date of birth.
On average, in the US, using those three records, you could be correctly located in an “anonymized” database 81% of the time.
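To get a feel for why three fields can be enough, here is a minimal sketch in Python. It is not the researchers' model, which estimates reidentification risk statistically. It simply counts how many rows in a small, made-up "anonymized" table are the only ones with their combination of zip code, gender, and date of birth. The records, the field names, and the helper functions are all hypothetical and exist only for illustration.

```python
from collections import Counter

# Hypothetical "anonymized" records: names removed, but zip code,
# gender, and date of birth are still attached to each row.
records = [
    {"zip": "02139", "gender": "F", "dob": "1985-03-14", "diagnosis": "asthma"},
    {"zip": "02139", "gender": "F", "dob": "1985-03-14", "diagnosis": "flu"},
    {"zip": "02139", "gender": "M", "dob": "1990-07-02", "diagnosis": "diabetes"},
    {"zip": "10001", "gender": "F", "dob": "1972-11-30", "diagnosis": "migraine"},
    {"zip": "10001", "gender": "M", "dob": "1972-11-30", "diagnosis": "flu"},
]

# The fields an outsider might already know about a person.
QUASI_IDENTIFIERS = ("zip", "gender", "dob")


def key(record):
    """Return the combination of quasi-identifier values for one record."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def unique_fraction(rows):
    """Fraction of rows whose quasi-identifier combination appears only once."""
    counts = Counter(key(row) for row in rows)
    unique = sum(1 for row in rows if counts[key(row)] == 1)
    return unique / len(rows)


def lookup(rows, zip_code, gender, dob):
    """Return every row matching the three known facts about a person."""
    return [row for row in rows if key(row) == (zip_code, gender, dob)]


if __name__ == "__main__":
    print(f"Uniquely identifiable rows: {unique_fraction(records):.0%}")
    print(lookup(records, "02139", "M", "1990-07-02"))
```

In this toy table, three of the five rows are one of a kind, so anyone who knows those three facts about a person can pull that person's row out directly. Real data sets are far larger, but the principle is the same: every extra field makes more rows unique.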
It only takes a couple of bits of information about you to start locating your records in anonymous sets of data. Obviously, the more information the better, but how many people know your zip code, gender, and date of birth? It’s almost impossible to keep all information about ourselves hidden. That’s a problem, and it brings me back once again to the question: if this stuff is so easy to misuse or breach, why do organizations keep it around in the first place? Right now, the data is worth more to them than the risk of it being breached or misused. We need to change that equation. There should be some sort of penalty for selling data about customers that isn’t truly and completely anonymous. Until then, there’s simply no incentive for companies to stop doing it.