You probably don’t know about Automattic, but they know you.
As the parent company of WordPress, its content management systems host around 43 percent of the internet’s 10 million most popular websites. Meanwhile, it also owns a vast suite of mega-platforms including Tumblr, where a massive amount of embarrassing personal posts live. All this is to say that, through all those countless Terms & Conditions and third-party consent forms, Automattic potentially has access to a huge chunk of the internet’s content and data.
[Related: OpenAI’s Sora pushes us one mammoth step closer towards the AI abyss.]
According to 404 Media earlier this week, Automattic is finalizing deals with OpenAI and Midjourney to provide a ton of that information for their ongoing artificial intelligence training pursuits. Most people see the results in chatbots, since tech companies need the text within millions of websites to train large language model conversational abilities. But this can also take the form of training facial recognition algorithms using your selfies, or improving image and video generation capabilities by analyzing original artwork you uploaded online. It’s hard to know exactly what and how much data is used, however, since companies like Midjourney and OpenAI maintain black box tech products—such is the case in this imminent business deal.
So, what if you wanna opt-out of ChatGPT devouring your confessional microblog entries or daily workflows? Good luck with that.
When asked to comment, a spokesperson for Automattic directed PopSci to its “Protecting User Choice” page, published Tuesday afternoon after 404 Media’s report. The page attempts to offer you a number of assurances. There’s now a privacy setting to “discourage” search engine indexing sites on WordPress.com and Tumblr, and Automattic promises to “share only public content” hosted on those platforms. Additional opt-out settings will also “discourage” AI…
Read the full article here