The AI is eating itself

robot face on a platter, digital art / DALL-E

Today we take a look at some early notes on the effect of generative AI on the wider web and reflect on what it means for platforms.

At The Verge, James Vincent surveys the landscape and finds a dizzying number of changes to the consumer internet over just the past few months. He writes:

Google is trying to kill the 10 blue links. Twitter is being abandoned to bots and blue checkmarks. There’s the junkification of Amazon and the enshittification of TikTok. Layoffs are gutting online media. A job posting looking for an AI editor expects an output of 200 to 250 articles per week. ChatGPT is being used to generate whole spam sites. Etsy is flooded with AI-generated junk. Chatbots cite one another in a misinformation ouroboros. LinkedIn is using AI to stimulate tired users. Snapchat and Instagram hope bots will talk to you when your friends don’t. Redditors are staging blackouts. Stack Overflow mods are on strike. The Internet Archive is fighting off data scrapers, and AI is tearing Wikipedia apart. The old web is dying, and the new web struggles to be born.

The rapid spread of text generated by large language models across the web can hardly be called a surprise. Back in December, when I first covered the promise and peril of ChatGPT, I led with the story of Stack Overflow getting overwhelmed by the AI’s confident bullshit. From there, it was only a matter of time before platforms of every variety began experiencing their own version of the problem.

To date, these issues have mostly been treated as nuisances. Moderators of various sites and forums are seeing their workloads rise, sometimes precipitously. Social feeds are filling up with bot-generated product announcements. Lawyers are getting into trouble for unwittingly citing case law that does not exist.

For every paragraph that ChatGPT instantly generates, it seems, it also creates a to-do list: facts to check, plagiarism to consider, and policy questions for tech executives and site administrators.

When GPT-4 came out in March, OpenAI CEO Sam Altman tweeted: “It is still flawed, still limited, and it still seems more impressive on first use than it does after you spend more time with it.” The more we use chatbots like his, the truer that statement rings. For all the impressive things it can do (if nothing else, ChatGPT is a champion writer of first drafts), there also seems to be little doubt that it is corroding the web.

On this point, two new studies offer some cause for alarm. (I found both in the latest edition of Import AI, the indispensable weekly newsletter from Anthropic co-founder and former journalist Jack Clark.)

The first study, which had an admittedly small sample size, found that crowdsourced workers on Amazon’s Mechanical Turk platform are increasingly relying on LLMs to perform text-based tasks. By studying the output of 44 workers, using a combination of keystroke tracking and synthetic-text classification, researchers at EPFL write, they estimate that 33 to 46 percent of the crowd workers used LLMs while completing the task. (The task here was to summarize abstracts of medical research papers, one of the things today’s LLMs are supposed to be relatively good at.)

Academic researchers often use platforms like Mechanical Turk to conduct research in the social sciences and other fields. The promise of the service is that it gives researchers access to a large, willing, and affordable pool of potential research participants.

Until now, the assumption was that participants answered truthfully, based on their own experiences. In a post-ChatGPT world, however, academics can no longer make that assumption. Given the largely anonymous and transactional nature of the work, it’s easy to imagine a worker signing up to participate in a large number of studies and outsourcing all of their responses to a bot. “This raises serious concerns about the gradual dilution of the human factor in crowdsourced text data,” the researchers write.

“This, if true, has big implications,” Clark writes. He suggests that the proverbial mines from which companies harvest the supposed raw material of human insight are instead filling up with counterfeit human intelligence.

He adds that one solution would be to build new authenticated layers of trust to ensure that work is predominantly human-generated rather than machine-generated. Surely such systems will arrive sooner or later.

A second, more worrying study comes from researchers at the University of Oxford, the University of Cambridge, the University of Toronto, and Imperial College London. It found that training AI systems on data generated by other AI systems (synthetic data, to use the industry term) causes the models to degrade and eventually collapse.

While the decay can be managed by using synthetic data sparingly, the researchers write, the idea that models can be poisoned by feeding them their own outputs raises real risks for the web.

And that’s a problem because, to bring together the threads of today’s newsletter so far, AI output is spreading to cover more of the web every day.

The obvious larger question, Clark writes, is what this does to competition among AI developers as the Internet fills up with a higher percentage of generated content than real content.

When tech companies were building the first chatbots, they could be confident that the vast majority of the data they were collecting was human-generated. Going forward, however, they will be less and less sure of that, and until they find reliable ways to identify chatbot-generated text, they risk breaking their own models.

What we’ve learned so far about chatbots, then, is that they make writing easier while also generating text that is annoying and potentially disruptive for humans to read. Meanwhile, AI output can be dangerous for other AIs to consume, and, the second group of researchers predicts, this will eventually create a robust market for datasets that were created before chatbots came along and started polluting the models.

At The Verge, Vincent argues that the current wave of disruption will eventually bring some benefits, even if it only serves to unsettle the monoliths that have dominated the web for so long. “Even if the web is flooded with AI junk, it could prove to be beneficial, spurring the development of better-funded platforms,” he writes. If Google consistently gives you junk search results, for example, you might be more inclined to pay for sources you trust and visit them directly.

Perhaps. But I also worry that a glut of AI text will leave us with a web where signal is increasingly hard to find amid the noise. Early findings suggest that these fears are justified, and that everyone on the internet, regardless of their job, may soon find themselves having to exert ever greater effort in searching for signs of intelligent life.


OpenAI plans to build a ChatGPT-based AI assistant for the workplace, putting it at odds with partners and customers like Microsoft and Salesforce, which are looking to do the same. This has always been the obvious risk of white-labeling OpenAI technology. (Aaron Holmes / The Information)
Medical professionals are cautiously optimistic about the benefits of generative AI, especially the reduction in burnout caused by paperwork and other documentation duties. However, there are concerns that AI software could introduce errors or fabrications into medical records. (Steve Lohr / The New York Times)
Amazon’s warehouse robots are increasingly taking over human-level work, thanks in part to a new robot called Proteus that works alongside humans and a picking robot called Sparrow that can sort products. (Will Knight / Wired)
TikTok is discontinuing its BeReal clone, TikTok Now, less than a year after announcing it, in another sign of BeReal’s waning relevance. (Jon Porter / The Verge)
TikTok has introduced a new monetization feature that lets creators submit video ads as part of a brand challenge, which may require using a particular prompt or sound. Making spec ads for big brands and hoping to get enough views to make it worthwhile seems like a bad turn for the creator economy! (Aisha Malik / TechCrunch)
Google often violates its own standards when placing video ads on third-party sites, an independent analysis found, leading some advertisers to request refunds. Google disputed the claims. (Patience Haggin / The Wall Street Journal)
Google is abandoning its Iris augmented reality glasses project and will instead focus on building AR software. (Hugh Langley / Insider)
Amazon-owned Goodreads has become a popular avenue for review bombing campaigns by outraged readers, many of whom seek to derail new books before they’re even published. (Alexandra Alter and Elizabeth A. Harris / The New York Times)
The longstanding conflict between Elon Musk and Mark Zuckerberg is rooted in mutual envy, according to this report, with Musk jealous of Zuckerberg’s wealth and the Meta CEO wishing he had Musk’s (former!) reputation as an innovator. (Tim Higgins and Deepa Seetharaman / The Wall Street Journal)
Damus, a decentralized social media app backed by Jack Dorsey, will be removed from the App Store over a cryptocurrency tipping feature that Apple says should be subject to its 30% cut. Damus disagrees and intends to appeal the removal. (Aisha Malik / TechCrunch)
Telegram will launch an ephemeral Stories feature next month, which the company says responds to years of user requests. Finally, a way to ensure your crypto scams disappear from the public record within 24 hours. (Aisha Malik / TechCrunch)
WhatsApp said its small-business-focused app has quadrupled its monthly active users over the past three years, to more than 200 million. (Ivan Mehta / TechCrunch)