Not All Machine Learning is Good Machine Learning

I know that you’ve seen all the hype around AI and Machine Learning. It’s probably warranted. AI is in the process of making huge changes in how we work and live.

Recently, though, I saw a good reminder of how much machine learning still depends on the proper inputs, and of course, that means someone giving the machine the proper data.

You may have seen a recent Washington Post story about Family Tree Now, a website that seems to crawl various public databases and grab all sorts of information about people: when they were born, where they lived, etc. Yeah, it’s creepy to think about all of that information being collected for the world to look at, and the Post article focuses on letting us know how to “opt out” of that site.

Naturally, my wife and I decided to look ourselves up and see what the site knew about us. Clearly it had access to a lot of public records; it had a variety of address records going back years. Creepy? Sure. The site also had a list of possible relatives and associates, and that’s where someone seems to have made some poor choices when it came to inputs.

The first thing she noticed about her information was that the first possible relative listed was my first wife. As you might imagine, she was not thrilled, or impressed with the AI. Clearly, Family Tree Now missed some public records, like my divorce! As for me, yeah, my first wife was listed as a possible relative, as were her parents and siblings. Again, they missed a record, but fair enough. I also noticed a long list of potential associates, people I had no connection to at all. Upon further inspection, I realized that much of that list seemed to be made up of people who lived at one of my former addresses, well after I had left. I’m not sure who decided that made for a potential associate.

In short, the technology to crawl through public records seems pretty decent, but maybe incomplete. The learning about what makes for a connection seems pretty illogical. But that all goes back to the programmers. The AI, I assume, was programmed to crawl, but someone didn’t include some records that would have made it clear that some family relationships had ended. It also used overly simple logic that matched people on address alone, without looking at the end dates of residences. The machine knew that I lived somewhere in 1996-1997, and it knew I had a different address after that (it said so), but it was still looking at people from the same address 10 years later and assuming a connection. That’s a logical fallacy, and the machine didn’t come up with it on its own. 😉
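To make the address problem concrete, here is a minimal sketch in Python. I have no idea how Family Tree Now actually builds its lists, so everything here is an assumption for illustration: the `Residence` record, the two function names, and the made-up sample data. The first function matches on address alone, which is the behavior I seemed to be seeing; the second also requires the dates of residence to overlap.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Residence:
    """One hypothetical public record: who lived where, and when."""
    person: str
    address: str
    start_year: int
    end_year: int  # inclusive


def naive_associates(target, records):
    """Match on address alone, ignoring dates (the flawed logic)."""
    return sorted({
        r.person for r in records
        if r.address == target.address and r.person != target.person
    })


def overlapping_associates(target, records):
    """Match on address only when the residence periods overlap."""
    return sorted({
        r.person for r in records
        if r.address == target.address
        and r.person != target.person
        # Two inclusive ranges overlap when each starts before the other ends.
        and r.start_year <= target.end_year
        and target.start_year <= r.end_year
    })


# Made-up sample data, loosely modeled on the dates in this post.
me = Residence("author", "12 Example St", 1996, 1997)
records = [
    me,
    Residence("roommate", "12 Example St", 1996, 1998),
    Residence("later tenant", "12 Example St", 2007, 2009),
    Residence("stranger", "99 Other Rd", 1996, 1997),
]

print(naive_associates(me, records))        # ['later tenant', 'roommate']
print(overlapping_associates(me, records))  # ['roommate']
```

Even the date check is only a starting point. Sharing an address during the same years could just mean living in the same apartment building, so a real system would probably want more evidence than this before calling someone an associate.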

Why is this important? Because whether you’re talking about Big Data analytics for business and marketing, or technology-assisted review (TAR) in the eDiscovery industry, if the inputs and algorithms aren’t correct, you may end up with the wrong results. Don’t just assume the machine knows. Make sure it’s measuring what you think it should be.
