
Not All Machine Learning is Good Machine Learning

I know that you’ve seen all the hype around AI and Machine Learning. It’s probably warranted. AI is in the process of making huge changes in how we work and live.

Recently, though, I saw a good reminder of how much machine learning still depends on the proper inputs, which, of course, means someone giving the machine the proper data.

You may have seen a recent Washington Post story about Family Tree Now, a website that appears to crawl various public databases and collect all sorts of information about people, such as when they were born and where they lived. Yes, it’s creepy to think about all of that information being gathered for the world to look at, and the Post article focuses on telling us how to “opt out” of the site.

Naturally, my wife and I decided to look ourselves up and see what the site knew about us. Clearly it had access to a lot of public records, including address records going back years. Creepy? Sure. The site also had a list of possible relatives and associates, and that’s where someone seems to have made some poor choices when it came to inputs.

The first thing she noticed about her own listing was that the first possible relative shown was my first wife. As you might imagine, she was neither thrilled nor impressed with the AI. Clearly, Family Tree Now missed some public records, like my divorce! In my listing, yes, my first wife appeared as a possible relative, as did her parents and siblings. Again, the site missed a record, but fair enough. I also noticed a long list of potential associates, people I had no connection to at all. On closer inspection, I realized that much of that list seemed to be made up of people who had lived at one of my former addresses well after I left. I’m not sure who decided that made someone a potential associate.

In short, the technology for crawling public records seems pretty decent, if perhaps incomplete. The part that decides what counts as a connection, however, seems pretty illogical. But that all goes back to the programmers. The AI, I assume, was programmed to crawl, but someone didn’t include the records that would have made it clear that some family relationships had ended. It also used overly simple logic to match up dates without looking at when residences ended. The machine knew that I lived at one address in 1996-1997, and it knew (it even said so) that I had a different address after that. Yet it was still looking at people from that same address 10 years later and assuming a connection. That’s a logical fallacy. The machine didn’t do that. 😉
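To make the date-matching mistake concrete, here is a minimal sketch in Python. It is not how Family Tree Now actually works (I have no idea what their code looks like); the `Residence` record, the `naive_associates` and `overlapping_associates` functions, the street address, and the names and dates are all hypothetical, chosen only to mirror the story above. The naive rule treats anyone who *ever* lived at one of my addresses as an associate. The better rule also checks that the two residence periods actually overlap, using each record's end date.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Residence:
    # Hypothetical record shape: who lived where, and when.
    person: str
    address: str
    start: date
    end: Optional[date]  # None means "still lives there"


def overlaps(a: Residence, b: Residence) -> bool:
    """True if the two residence periods share at least one day."""
    a_end = a.end if a.end is not None else date.max
    b_end = b.end if b.end is not None else date.max
    return a.start <= b_end and b.start <= a_end


def naive_associates(person: str, records: list[Residence]) -> set[str]:
    """Flawed rule: anyone who ever lived at any of this person's addresses."""
    my_addresses = {r.address for r in records if r.person == person}
    return {r.person for r in records
            if r.person != person and r.address in my_addresses}


def overlapping_associates(person: str, records: list[Residence]) -> set[str]:
    """Better rule: same address AND the dates actually overlap."""
    mine = [r for r in records if r.person == person]
    return {r.person for r in records
            if r.person != person
            and any(m.address == r.address and overlaps(m, r) for m in mine)}


# Illustrative data only (made-up names and address).
records = [
    Residence("me", "12 Elm St", date(1996, 1, 1), date(1997, 12, 31)),
    Residence("roommate", "12 Elm St", date(1996, 6, 1), date(1997, 6, 30)),
    Residence("later_tenant", "12 Elm St", date(2007, 3, 1), None),
]

print(sorted(naive_associates("me", records)))        # ['later_tenant', 'roommate']
print(sorted(overlapping_associates("me", records)))  # ['roommate']
```

The only difference between the two rules is the end-date check in `overlaps`, which is exactly the piece of logic that seems to have been left out.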

Why does this matter? Because whether you’re talking about Big Data analytics for business and marketing, or technology-assisted review (TAR) in the eDiscovery industry, if the inputs and algorithms aren’t correct, you may end up with the wrong results. Don’t just assume the machine knows; make sure it’s measuring what you think it should be measuring.

Similar Posts

  • The Scourge of Amateurs

    So now it’s Instagram ruining photography, eh? I can remember, like Matthew, when the same complaints were leveled against blogs and twitter. Heck I can remember when the professional photographer world was up in arms about how DSLR technology led to any MWAC (Mom With A Camera) thinking they could make money taking portraits, and…

  • Lunch break from Class

    Utilizing the Wi-Fi in the training room to catch up on email, and uploading a handful of photos from the last couple of days. Class has been good in terms of learning, but it’s a mental drain. I’m looking forward to Friday and Saturday and being able to go back to just being a tourist….

  • What I’m Sharing (weekly) Sept. 27, 2020

    Everyone Agrees – We Need a Comprehensive U.S. Privacy Law

    The Best of Relativity Fest 2020: Our Favorite Commentary

    The Re:Set Guide to Recognizing and Tackling Work From Home Burnout

    Microsoft Teams is getting virtual commutes and Headspace meditation

    Staying In Touch

    Our guidance on staying in touch with your network

    Algorithms control your online life. Here’s how to reduce their influence.

    Inbox Zero: Merlin Mann’s Tips for Managing Your Life Online

    If “Angels Fear to Tread” into Search Terms, Why Are Lawyers So Confident About Them?

    The Cognitive Biases that Make Us All Terrible People

    Relativity Fest Day One Report

    How to Network Professionally During the Coronavirus Pandemic

  • G.ho.st Update

    You may recall a while back I talked about G.ho.st the online Virtual Machine project. I haven’t gone back to look at it again in some time, but I got an email from them today, and given the new features, I may have to take a look very soon: As part of our continuous efforts…

  • Linked: The Foreign Language of E-Discovery

    If this is you, you really should take their advice, and go learn something about eDiscovery technology. Have you ever been involved in a meet and confer regarding electronically stored information and felt your adversary was speaking a foreign language? Is active machine learning an unfamiliar concept to you? Is BYOD an acronym for who-knows-what?…
