Are LLMs the Best That They Will Ever Be?
Without careful disclosure architectures, AI products may only get worse, even as the technology gets better.
This is a guest post led by Rufus Rock. Rufus holds a degree in History & Philosophy of Science from University College London (UCL), and is completing an M.S. at Carnegie Mellon in Logic, Computation & Methodology. At UCL’s Institute for Innovation and Public Purpose (IIPP), he worked on a research project investigating “algorithmic attention rents”.
He is the lead author of “Behind the clicks: Can Amazon allocate user attention as it pleases?” (December 2024) published in Information Economics and Policy with Tim O’Reilly, Mariana Mazzucato, and Ilan Strauss.
One often hears the refrain from AI optimists that “today’s LLMs are the worst that they’ll ever be.” Certainly, they have some compelling reasons for positivity: The technology is still very young, R&D spending is through the roof, and over the last couple of years we have seen rapid advances. Things can only get better, right?
Well, no. In fact, I think there is a reasonable case to be made that today’s LLMs might be the best that they will ever be. The technology might continue to get better, but that doesn’t mean that the user experience will.
Why? Primarily because no one knows how to make LLMs profitable.
A Case Study
Take AI shopping assistants, for example. The other week, I wanted to buy a candle. Admittedly, I have somewhat pretentious taste in candles, so I didn’t just want any candle. I wanted one that smelled somewhere between pine forest and old bookshop. This, I hope you agree, is a task at which you would have no chance of success in a million years using Amazon search. With ChatGPT, though, life was a breeze. I simply described the smell (as I just did), and ChatGPT spat out five good-looking recommendations. I bought one, and it’s now the highlight of my Sunday evenings.
As embarrassingly ostentatious as this example might be, it demonstrates the power of LLMs in the product discovery space. Others have had similarly positive experiences. Searching for products online at the moment is hard. Amazon will show you more than 60 options on a search page; Google, likewise, is overflowing with visual clutter and ads. For products that aren’t regular purchases or known brands, I’ve found myself trusting advice from Reddit and YouTube reviews most of the time. LLM search isn’t perfect (models make stuff up, give you bad links, and so on), but having been trained on the collective judgement of the internet, it can quickly distill all the quasi-wisdom from Reddit, trusted blogs, and YouTube into a few recommendations – even for pretty complex searches. That’s incredibly valuable.
Why this Might be the Peak
OpenAI reportedly loses money on even its most premium ChatGPT subscriptions. Jeff Bezos has openly called AI spending a “bubble.” And Peter Thiel is taking money out of the AI market. You don’t need to be an oracle to see that AI spending is unsustainable and risks morphing into a financial crisis. There is, therefore, an enormous and growing amount of investor pressure on AI companies like OpenAI and Anthropic to figure out how to monetize their chatbots. (Anthropic’s emphasis on business revenue via its API may see it go into the black sooner than OpenAI, but for now these are all just projections.)
Yet the lack of profits at these AI companies has an upside for the user — at least for now. Searching for things like candles on ChatGPT is so unusually pleasant at the moment — particularly compared to Amazon and Google — precisely because its search function has not yet been warped by the pressure to generate the kind of profits needed to recoup past losses. However, if these AI tools are to survive financially, the economic reality is that they will have to change.
AI meets Rents
Consider the business models and associated economic rents that are being disrupted by AI. Platform giants like Amazon and Google extract billions in rents through advertising. Amazon has recently noted that users take, on average, 14 clicks before purchasing something on its site. One might think that is an unforced error, given that part of Amazon’s job description is surely to make finding products as quick and easy for its users as possible. But once you realize that every one of those 14 clicks spent on product exploration means more potential clicks on Amazon’s high-margin advertising results, it starts to make sense.
Users suffer too. Since these platforms can only show you so many products at a time, every advertising result means one fewer organic (i.e., maximally relevant) result to choose from. Our own research with Mariana Mazzucato, published in Information Economics and Policy (December 2024), showed that the most clicked-on advertising (“sponsored”) results on Amazon are often 17% more expensive than their organic counterparts and one-third less relevant. In fact, one-quarter of product search results on the first page are adverts – half of which are duplicated as an organic result on the same page! Talk about the death of consumer choice.
Now imagine instead a chatbot that cuts those 14 clicks down to, say, just two before the user finds the product they want. That might be great for consumers, who can stop paying these attention rents as product search becomes less time- and energy-consuming. But it is also a direct threat to Amazon’s $60+ billion revenue stream from third-party advertising on its website.
So incumbents face a choice: they can try to prevent chatbots from cannibalizing their ad-driven platforms, or ensure that chatbots generate equivalent revenue through other means – or they can use AI to create genuinely better products. More ‘enshittification’ is, therefore, one option.
Enshittification, as coined by Cory Doctorow, is when a platform that once delivered value to its users and third-party producers makes its hosted and recommended content progressively worse in order to allocate more value to itself. In a multi-sided context, this enshittification ultimately “comes out of the barrel of an algorithm”. We’ve seen this happen to Google Search: the 10 blue links in the search results were replaced by a gazillion ads.
We’ve also seen it happen to social media, where algorithms built for users – for sharing pictures, text, and content – were replaced by algorithms built for the platform’s own interests – for engagement, mindless video scrolling, and consumption. Evidence from Meta’s recent court win shows that Meta transformed itself into a “discovery engine” in order to compete with TikTok for user engagement. Meta-owned Instagram, nominally a ‘social’ network, now shows only 5% of posts from friends; the rest, the judge noted, are “nothing but unconnected videos recommended by an algorithm”.
Moreover, the ‘social’ feed is full of ads. Meta’s own internal documents from the court case show that ads decreased user time on the platform by around 7% – they are a “tax on engagement”, Meta found. Yet Judge James E. Boasberg still decided that Meta’s ad load reflected rational consumers (no behavioral economics allowed here), competitive market forces, higher-quality ads, and an overall higher-quality user experience.1 The evidence seemed decidedly more mixed. Meta has been able to repeatedly mislead about its advertising efficacy and reach, according to reporting in the Financial Times. That’s because advertising markets are a black box.
So don’t be surprised if LLM-based search starts returning less relevant products and more ads. But why stop there? If you are OpenAI, why not push a ‘curated handful of products’ to the user, designed to get them querying and engaging again? Of course, this can be marketed under the noble banner of “respecting users’ decision-making autonomy”, or something along those lines; functionally, though, it is pernicious advertising at scale, just with a new, chattier interface.
There’s one big catch, though. Current LLM performance in product discovery depends on the internet continuing to exist as a repository of current, useful information. My candle search on ChatGPT was so good because the model could synthesize insights from people who had actually tried the candle and then discussed it online in a good-faith attempt to help other consumers. But marketers and merchants are cottoning on. More and more subreddits, review forums, and enthusiast blogs are being injected with synthetic, Gen-AI-produced chatter. The goal of these third parties isn’t to persuade you directly – it’s to slowly poison the informational substrate that LLM product search relies on, so that the LLM persuades you through the commercially laced information it ingests.
So it’s worth savoring what we have now: chatbots whose behavior has not yet been warped by profit incentives, training data largely untainted by adversarial manipulation, and monthly fees (a subscription model) rather than attention-driven economics. This might well be AI’s 10-blue-links moment. But let’s not assume that it’ll last, let alone that it will get even better.
Does it have to be this way?
If LLMs do start to drift towards enshittification, is there anything we can do to stop it? Is it possible, under the current system of economic incentives, for AI services to create value without extracting it?
Jimmy Wales, co-founder of Wikipedia, thinks so. His new book sketches a picture of how we can start to rebuild trust on the internet. Against the background of an ad and click-bait soaked internet, Wikipedia is a shining example of what is possible. To my mind it is almost certainly the single greatest feat of knowledge organization in human history. The numbers are mind-boggling. Since you started reading this article, more than 4,000 edits have been made on Wikipedia by an army of volunteers whose only incentive is the pleasure they take in trying to be helpful and factual.
Clearly, not every company can be a crowdfunded nonprofit. Nevertheless, we can learn some important lessons from Wales and Wikipedia.
For one, transparency – and in turn traceability and accountability – can be made part of the product itself. Wikipedia’s edits, discussions, and history are all publicly accessible. This openness makes the information on Wikipedia more reliable and trustworthy, and so consumed by more end-users. That Wikipedia is publicly modifiable gives it a sense of collective ownership. It also shapes what economists call the market’s “structure”: how many firms compete and how they compete. Although it was Microsoft’s Encarta that helped kill Encyclopaedia Britannica’s expensive book set,2 it was Wikipedia that showed that dominating a market on the internet need not mean exploiting it.
As Wikipedia shows, transparency doesn’t have to mean “explainable AI” or “sparse autoencoders”, or anything especially technical, really. It can be a structural and economically encoded transparency – one built into the market’s proper functioning, like on a stock exchange where trades are settled with complete counterparty information. For AI, transparency should cover things like: Where does the model’s data come from? How is it processed? Who benefits from this processing? How exactly is it monetized?
The impetus for AI’s product disclosure should ideally come from the incentives of the market itself. In this way, transparency is not optional. And it reflects and supports the market’s dynamic functioning, rather than the ideas of someone in government (which may be right or may be wrong). In other words, the disclosure is directly tied to the information in the market itself – such as the logs (records) of an online advertising exchange indicating who won the bid, the price, and the auction type – rather than a regulator’s notional understanding of the market. (We explore the drivers of a market’s structure in a companion piece.)
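To make this concrete, here is a minimal sketch of what one such market-generated disclosure record might look like. All field names and values are illustrative assumptions, not any real exchange’s schema; the point is simply that disclosure can be tied to records the market already produces.

```python
# Hypothetical sketch of an ad-auction disclosure record.
# Field names are assumptions for illustration, not a real exchange's schema.
import json
from dataclasses import dataclass, asdict

@dataclass
class AuctionDisclosure:
    auction_id: str            # unique id of the ad auction
    auction_type: str          # e.g. "first-price" or "second-price"
    winning_bidder: str        # who won the placement
    clearing_price_usd: float  # what the winner actually paid
    slot: int                  # position the ad occupied on the page
    displaced_organic: bool    # did this slot replace an organic result?

    def to_audit_log(self) -> str:
        """Serialize to one line of JSON for an append-only audit log."""
        return json.dumps(asdict(self), sort_keys=True)

# Example record (all values invented)
record = AuctionDisclosure(
    auction_id="a-001",
    auction_type="second-price",
    winning_bidder="acme-candles",
    clearing_price_usd=1.37,
    slot=2,
    displaced_organic=True,
)
print(record.to_audit_log())
```

Because each record already names the winner, the price, and the auction type, a regulator (or researcher) auditing such logs would be reading the market’s own bookkeeping rather than a notional reconstruction of it.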
But not everything can be a non-profit Wikipedia.
We think it is possible for companies to use AI to make more money and make their products better – including by improving the user experience. This is a win-win business model that avoids enshittification. Initial evidence from Amazon’s Rufus AI chatbot, for example, suggests it led to a $10 billion increase in sales that would not otherwise have occurred. Users could, once again, find what they wanted! Amazon estimated that customers who engage with the assistant during their shopping journey are 60% more likely to complete a purchase than those who do not. The trick is ensuring that incentives are aligned towards win-win business models, no matter what market structure emerges (competition, or consolidation and monopoly).
Incentives may sometimes need a push from government. Government rules can help constrain corporate behavior, in effect deciding the boundaries of fair competition and the pay-offs facing firms from different business models and strategies. Minimum wages, for example, mean that firms cannot compete by lowering wages below a basic level. Minimum wages incentivize firms to compete by investing in fixed capital (like factories and new technologies) to upgrade their productivity so that effective unit labor costs fall.
Similarly, “interoperability” requirements for digital markets – including for AI chatbots and user data – are one way to help ensure that firms compete by using technology to improve the user experience, rather than enshittify it. Interoperability requirements were central to the success of the famous break-up of AT&T, which forced the new “Baby Bells” to allow third parties to interconnect with their networks and offer rival telecommunication services.
Elsewhere, we have argued for more direct government disclosure requirements for Big Tech. In particular, governments ought to mandate corporate disclosure of Big Tech companies’ internal operating metrics used to measure and optimize their products’ performance – in line with segment reporting and GAAP accounting principles. These non-price metrics – such as ‘monthly active users’ or ‘time spent on platform’ – increasingly shape a product’s monetization, especially in multi-sided markets. This makes them vital to disclose alongside financial metrics. Moreover, we argued that these metrics should be differentiated by product line, to give a sense of the firm’s market power and economic strength not only as an integrated whole but within each relevant “operating segment”. That way regulators can keep track of a company’s changing behavior in individual markets and step in where necessary – you can’t regulate what you don’t understand.
And finally, there is us – the users. We have to call out these systems when they fail, when they start feeling extractive rather than helpful. The idea that “today’s LLMs are the worst they’ll ever be” isn’t just wrong, it’s dangerous. It breeds complacency at the very moment when norms, business models, and standards are up for grabs in an immense great-powers contest between warring companies.
Whether LLMs get better or worse depends less on technical progress than on political choices, the economic incentives we align and foster, and whether we can demand better of the companies that shape society’s welfare. The technology may get better, but only careful attention to disclosure and market design can ensure that society doesn’t get worse.
1. “When Meta has stuffed its apps with more ads and upgraded them with new features, it is not clear that the company has reduced quality overall. More importantly, even considering just this one input into app quality, the effect of ads on users’ experience depends on not only their number but also on their quality and relevance”
2. Which still exists today as a digital product: https://hbr.org/2013/03/encyclopaedia-britannicas-president-on-killing-off-a-244-year-old-product
Thanks for sharing this
Yes, an LLM as a product must evolve because it's not financially viable without external funding in its current form – essentially a public service. But commercialising it doesn't necessarily translate to bad UX. As valuable to society as it is, honest consumer advice in the candle-shopping department is hardly the pinnacle of value that LLMs might provide to users. To cite the Amazon recommender example above: when aptly embedded into an existing service, an LLM can provide a lot of valuable user experience ($10 billion of it, apparently). The thing is that LLMs are not e-commerce or social media platforms, or even a search engine – it's actually hard to tell what economic role they are supposed to fill. Perhaps too much money is chasing the development of their capabilities in the hope of harvesting AGI spoils. Or maybe, because the technology is still relatively young, the sensible applications have not yet made it into the public consciousness. I strongly agree with the transparency and accountability points, though. As a consumer of its advice/recommendations, I'd like to be able to tell what rules an LLM followed and which guardrails it obeyed to generate them. Finally, on the users themselves: I think complacency is a default setting – the best you can do is to actively try to remove it from their decision making. I'd like to see some reasonable proposals in that area.