NPR has a story this morning about Facebook’s facial recognition tools and the potential privacy risks that come along with them.
One of the things I like about this story is the explanation of modeling, and how without a model, facial scanning doesn’t really work. It’s comparable to the conversations I’ve had about using eDiscovery software to do Optical Character Recognition (OCR). Typically, when I explain that running OCR on handwriting is not really going to be useful, people want to know why their tablet device can translate their “writing” into text when OCR software cannot. It’s because the tablet has already modeled what your individual handwriting looks like; it is not trying to identify some random person’s handwriting. If I can, I’ll drive the point home by picking up their tablet and showing them how terribly it recognizes my handwriting, because mine is not the model it is comparing against.
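The idea above can be sketched in code. This is a toy illustration, not how any real tablet works: I'm pretending handwriting strokes boil down to small feature vectors, and every name and number here is invented. The point is only that recognition means comparing input against an enrolled model, so a stranger's strokes simply don't land close to anything.

```python
import math

# Hypothetical feature vectors summarizing stroke shapes for a couple of
# characters, as if the tablet had "learned" its owner's handwriting.
# All values are made up for illustration.
owner_model = {
    "a": [0.82, 0.34, 0.11],
    "b": [0.15, 0.90, 0.42],
}

def distance(u, v):
    """Plain Euclidean distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def recognize(stroke, model, threshold=0.3):
    """Return the closest enrolled character, or None if nothing is close."""
    best_char, best_dist = None, float("inf")
    for char, template in model.items():
        d = distance(stroke, template)
        if d < best_dist:
            best_char, best_dist = char, d
    return best_char if best_dist <= threshold else None

# The owner's "a" lands very near the enrolled template...
print(recognize([0.80, 0.36, 0.10], owner_model))  # → a
# ...but a stranger's "a", drawn differently, falls outside the threshold.
print(recognize([0.30, 0.70, 0.60], owner_model))  # → None
```

Same input symbol, two writers, two different outcomes — because the model only describes one of them.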
Facial recognition, as of right now, works very similarly. It’s great when you know who you are looking for, but terrible at identifying a random person, because we don’t have a full model of photos of that person for the software to compare against. But along comes Facebook with its photo tagging feature, and suddenly there is the possibility of building a model from the large number of different photos on your FB profile, and of that model being shared with law enforcement. If I were in law enforcement, though, while I might be interested in having access to FB’s photo modeling, I’d also have to be somewhat wary of using it. It relies on Facebook users to tag the people who are actually in the photo, or on someone going through that multitude of photos to correct all the cases where people have posted a picture of a baby and tagged the parents, or of a pet and tagged the owner, and so on. Of course, we know those things happen, so the risk here is not so much that law enforcement would use FB photos and compare them to surveillance video in order to capture wrongdoers. The real risk is that inaccurate models will cause mistaken identifications, leading to harassment and investigation of completely innocent people.
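To see how one bad tag poisons the result, here's a minimal sketch. I'm assuming (purely for illustration) that faces are reduced to small embedding vectors and identification picks the nearest gallery entry; the names, photos, and numbers are all invented. One mistagged baby photo is enough to hang the wrong name on a match.

```python
import math

def distance(u, v):
    """Euclidean distance between two hypothetical face embeddings."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# A gallery built from user tags. One user tagged a photo of their baby
# with the parent's name ("Pat"), so the gallery now holds a mislabeled
# template. All embeddings here are invented for illustration.
gallery = {
    ("Pat", "profile photo"): [0.9, 0.1, 0.2],
    ("Pat", "baby photo, mistagged"): [0.1, 0.8, 0.7],  # actually the baby
    ("Sam", "profile photo"): [0.5, 0.5, 0.5],
}

def identify(face):
    """Return the name attached to the nearest gallery entry."""
    (name, _photo), _emb = min(
        gallery.items(), key=lambda kv: distance(face, kv[1])
    )
    return name

# A surveillance still that happens to resemble the mistagged baby photo:
print(identify([0.12, 0.78, 0.72]))  # → Pat — a confident, wrong match
```

Nothing in the matching step failed; the arithmetic did exactly what it should. The error came in with the tag, which is why cleaning up the model matters more than the matching itself.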
That’s also the risk of all government surveillance programs. When the NSA gathers as much data as Edward Snowden claims they have, the risk is not that they are reading your emails. It’s almost impossible to imagine someone sitting and reading the billions of messages and phone records they are collecting. No, the risk is the collection and storage of that data, because if and when some algorithm run against that “big data” collection flags you as a suspect, they will already have all of that information, and they will start drawing conclusions from things you’ve said in emails and from who you’ve talked to on the phone. They’ll start investigating the people you communicate with, talking to the people you work with, and so on.
When you have that much data, it’s useless until you know what you’re looking for (if you work with eDiscovery, you know this fact well). But once you know what you’re looking for, and you have enough data, it’s easy to find data that conforms to your theory, even if your theory is completely wrong.
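You can demonstrate that last point with completely meaningless data. In this sketch I simulate a pile of random call records — the fields, names, and scale are all invented — then go hunting for "evidence" of an arbitrary theory. The records mean nothing, yet the search still turns up hits, simply because the pile is big enough.

```python
import random

random.seed(1)  # fixed seed so the illustration is repeatable

# Simulated "big data": a million random call records between 100 people.
# Every record is pure noise — nobody here is actually conspiring.
people = [f"person{i}" for i in range(100)]
calls = [
    (random.choice(people), random.choice(people))
    for _ in range(1_000_000)
]

# An arbitrary "theory": person7 has been coordinating with person13
# and person42. These names were picked out of thin air.
suspicious = {("person7", "person13"), ("person7", "person42")}
hits = [c for c in calls if c in suspicious]

# With enough random data, even a meaningless theory finds "evidence".
print(f"{len(hits)} calls 'supporting' the theory")
```

Swap in any other pair of names and you get roughly the same number of hits — which is exactly the problem. The data confirms every theory equally, so it confirms none of them.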
When you are actually innocent, that kind of investigation doesn’t simply go away; it lingers in how people think about you. False accusations ruin lives. With that much data about you living in one place, the potential for this to happen to you rises.
In the end, I’m not worried about Facebook recognizing my face, because if it gets it wrong, it’s mostly just funny and correctable. But I am definitely concerned about the government using that same technology, because when they get it wrong, I can’t correct it, and it is most definitely not funny.