Facebook parent company Meta has admitted to scraping posts and photos from Australian users since as far back as 2007 to train the company’s AI models. The admission came at an inquiry held in Australia on September 10, when Meta’s global privacy director, Melinda Claybaugh, was asked about the company’s efforts in the AI space – particularly on matters relating to the privacy of users.
Per the ABC, Claybaugh confirmed that posts and photos were taken from Aussie users who were registered as over 18 years old to feed the company’s large language model (LLM) development, although anyone registered under that age limit was exempt. However, photos of children uploaded on other accounts, such as by their parents or other adult family members, were fair game and could have been scraped.
The most glaring admission made during the inquiry was that there is no opt-out option for Australian users unless they set all their posts to private. Such an option exists for users in the European Union where strong privacy and data protection laws are enforced.
Explaining Meta’s AI content scraping
Meta using Facebook and Instagram posts to train its AI algorithms is not a revelation. It’s an international practice for the tech giant and was revealed in a blog post last September. At the time, the tech giant made it clear that “We didn’t train these models using people’s private posts” and that “We also do not use the content of your private messages with friends and family to train our AIs”.
Even if posts were all set to private by an Australian user after the fact – for example, publicly available posts from 2010 being made private on hearing this news – there remains the question of whether that information would stay within the LLM’s database. Unfortunately, reporting from the New York Times earlier this year indicated that data would stay in the AI model even if made private after being added in, but making future posts private will prevent their addition to the database.
It’s not out of the realm of possibility that an opt-out option could be introduced on the Australian version of Meta’s platforms, along with other regions. The Australian government is set for a review of the Privacy Act soon, and has this week been pushing for a sweeping ban of children from social media (with an age limit yet to be determined).
Meta’s rival X (previously Twitter) performs the same AI training with posts and photos uploaded to its platform; however, all users internationally have had the ability to opt out since July 2024. Reddit also allows for data scraping to develop Google’s Gemini AI.
So what does Meta’s AI content scraping mean for Australians?
If you’ve had your content on Facebook and Instagram set to private from when you started using these social media platforms, then you should be less worried about this development. Meta has said that it is not taking such privately marked content and putting it into its AI development.
As pointed out by one of my colleagues, users posting innocuous updates may not be too worried about their content on Facebook and Instagram being used to train an AI model, but this does expose a glaring need for refreshed online privacy laws in Australia and other parts of the world. Facebook’s product has always been its users, but those users need protections.
As an Australian writing this story, I think it’s a shame that there’s no readily available opt-out option, but my own government not enforcing such a requirement (as in the EU) is even more of a disappointment.
From a consumer perspective, the fact that this data scraping goes as far back as 2007 – long before Meta got involved with AI development – is worrying. A longtime Facebook user might not know about the company’s use of posts and photos to train its AI, and may not be comfortable with it – but there remains no ability for them to remove their content that’s already gone into the LLM.
To be fair, I suppose, Meta isn’t alone in doing this. As I mentioned earlier, user data from Reddit and X also feeds into the development of AI – into Google’s Gemini and X’s Grok respectively – and the entire topic of AI has been flooded with content theft and taking information from websites without permission. It’s a topic of much heated debate.
My take on this news is that Australia’s online privacy laws are well overdue for a refresh, and that this practice is an inversion of Facebook’s longtime business model of selling ads. Rather than the user being the product, they are producing the product. Facebook may lay claim to content uploaded to its platforms, but it’s disempowering for users to go uncredited, unpaid and unalerted about their photos and posts contributing to an AI product.
TechRadar has reached out to Meta on content removal avenues for its AI models, along with information on how Meta is informing users about its use of content to train AI and to ask what products the company is using the data for. We’ll update this article if we get a response.