
If it’s public data – someone is using it to train an LLM

This morning, I saw this article on the latest outrage over user data being used to train AI:

Bluesky may not train AI on your posts, but others can, and users are furious

Read the whole thing, but to recap where we are:

  • Users are upset that Elon is training an LLM on X data, and they are leaving X for Bluesky.
  • Bluesky is not building an AI tool of its own.
  • Bluesky is open and transparent. Anyone can build an app to interact with Bluesky using the AT Protocol.
  • Because that data is public, others are free to use it to train AI.
  • Users are upset that their data is being used.

Here is the part I don’t understand. If you’re using any open protocol, your data is subject to being used in ways you disagree with. This blog, for example, uses open web protocols to make it available to readers. That also makes it available for scraping, regardless of how many tools I use to try to prevent it. I can try to block known AI spiders, limit the public RSS feed so it doesn’t include entire posts, etc. There are still plenty of ways someone can get the data.
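To make the "block known AI spiders" point concrete, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content, the `example.com` URL, the `allowed` helper, and the `SomeUnknownScraper` name are all made up for illustration; `GPTBot` and `CCBot` are user-agent names that real crawlers announce. The sketch only shows what a well-behaved crawler would conclude from the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt asking two known AI-related crawlers to stay away.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""


def allowed(user_agent: str, url: str = "https://example.com/post/") -> bool:
    """Return what a crawler that honors robots.txt would decide for this URL."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


results = {ua: allowed(ua) for ua in ("GPTBot", "CCBot", "SomeUnknownScraper")}
print(results)
# {'GPTBot': False, 'CCBot': False, 'SomeUnknownScraper': True}
```

The named bots are turned away, but only because they choose to ask. A scraper with a name that is not on the list, or one that never reads robots.txt at all, gets the post anyway.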

Bluesky is developing an open protocol, and Mastodon uses an open protocol (ActivityPub). The idea seems to be that we can create a social media platform that is not a walled garden where users don’t own their data, yet is also completely protected from someone grabbing that public data to build an AI model.

That’s not going to happen. We are all going to have to make a choice.

Once again, I’m left with this question: Why are so many Bluesky users pro-AI yet so opposed to their public posts being used to train it? Where do they think the data has been coming from?

If you post something to an open platform, someone will grab that post to train an AI model. Get used to it.
