The Attribution Crisis in LLM Search Results
Search-enabled LLMs are exploiting online ecosystems. We estimate how bad it is and what we can do about it.
When we built the web, we built a commons where links were the currency of value. Today, AI search engines roam that commons collecting value, but too often they forget to pay the toll. In our new working paper, The Attribution Crisis in LLM Search Results: Estimating Ecosystem Exploitation, we track exactly how big that unpaid bill is and what must happen next.
Using a range of statistical techniques, we analyze ~14,000 real-world user conversations with LLMs in search mode. TL;DR: Google Gemini cites nothing in 92% of its answers; Perplexity’s Sonar hoovers up almost a dozen pages on average yet credits only a few; and OpenAI’s GPT-4o shows “perfect” attribution largely because it (likely) withholds relevant results from its own logs.
Ecosystems run on attribution
Links are more than polite acknowledgements online; they keep in motion the virtuous cycle between content consumption and content production. When an AI answer omits a link to a relevant source it consumed, the publisher loses traffic, the creator loses revenue, and the model itself erodes the very knowledge base on which it depends. The BBC’s threat of legal action against Perplexity shows how quickly that erosion can escalate into a courtroom fight when licensing and permission are ignored. Perplexity’s new revenue-sharing scheme gestures in the right direction, but it is still optional, opaque, and limited to a small circle of publishers.
For two decades, Google Search helped ecosystems grow on top of an agreement with publishers: grant our crawler access (through robots.txt, canonical tags, or paywall signals) and we’ll send you traffic in return. AI assistants have now vaulted the fence, repackaging the web’s content into full answers without so much as a handshake. In response, publishers oscillate between blanket bot bans and bespoke licensing deals. Yet without transparent, verifiable telemetry (retrieval logs, trace IDs, relevance scores), publishers cannot tell whether those agreements are actually being honored, or whether they even made a good deal in the first place.
From access violations to in-market exploitation
Earlier this year we documented in a research paper (“Beyond Public Access in LLM Pre-Training Data: Non-public book content in OpenAI’s Models”) how frontier models were trained on non-public books without the right to do so. That study made clear that training data governance was broken. Next, we detailed in a separate research paper (“Real-World Gaps in AI Governance Research: AI Safety and Reliability in Everyday Deployments”) that most AI governance research ignores what happens after a model is deployed: monitoring of real-world harms oddly takes a back seat.
Our most recent attribution study tries to close this loop, moving from violations in the training set to the ways models exploit the web in real time, siphoning value without passing credit downstream. We don’t consider model hallucinations or citation accuracy, which further complicate proper attribution (defined here as external, clickable citation links to URL sources). We only consider whether the number of relevant websites an LLM consumes during search matches the number of in-text citations it provides.
What the new audit finds

Four patterns of exploitation emerge:
No-search answers – Gemini skips live search retrieval in 34% of conversations, replying from its pre-training data instead.
Citation black holes – Gemini withholds citation links 90% of the time, even when searching.
High-volume, low-credit – Sonar opens ≈ 10 pages per query but leaves three relevant web pages uncited on average.
Improper disclosure – GPT-4o appears to heavily pre-filter its relevant search logs, making it difficult to verify whether it properly attributes its sources.
All is not lost, though. Companies with better RAG (retrieval-augmented generation) pipelines already have better attribution practices. We estimate “citation efficiency” across 11 models: the extra citations a model provides per additional relevant web page it consumes. This ranges widely, from 0.19 to 0.45, showing that proper attribution is not a technical impossibility but a design choice, governed by a model’s retrieval settings, context size, and geolocation.
Figure 1. How “efficiently” does a model convert relevant web page visits into in-text URL citations?
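For intuition, here is a minimal sketch of how such a number can be read: as the slope of a simple regression of citations provided on relevant pages consumed. The conversation counts below are invented for illustration; the paper’s actual estimation pipeline lives in the linked repository.

```python
# Minimal sketch: read "citation efficiency" as the least-squares slope of
# citations provided on relevant web pages consumed, i.e. the extra citations
# per additional relevant page. The data below are invented for illustration.
import numpy as np

# One (relevant_pages_consumed, citations_provided) pair per conversation.
conversations = [(12, 4), (8, 3), (15, 5), (6, 2), (10, 3)]

pages = np.array([c[0] for c in conversations], dtype=float)
cites = np.array([c[1] for c in conversations], dtype=float)

slope, intercept = np.polyfit(pages, cites, 1)
print(f"citation efficiency ≈ {slope:.2f} extra citations per relevant page")

# Companion gap metric: relevant pages consumed but never credited.
print(f"uncited relevant pages per query ≈ {(pages - cites).mean():.1f}")
```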

Fix it? Trace it
Technical fixes are already on the shelf — and, crucially, they build on standards that industry knows and can adopt. Today’s observability stacks can emit a complete search trace: query, retrieval step, re-ranking scores, relevance ratings, and the final citations. OpenTelemetry’s new GenAI semantic conventions even specify how to label each element. Add a tool like LangSmith and, with a single line of code, every retrieved document gains a permanent hash, making it simple to compare what a model consumed with what it ultimately credited. Couple those traces with standard relevance metrics and we can reward high-integrity systems while exposing free-riders.
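To make that concrete, here is a minimal sketch, assuming a generic RAG pipeline rather than any vendor’s actual stack, of what such a trace could look like with the OpenTelemetry Python API. The gen_ai.operation.name attribute follows the GenAI semantic conventions; the search.* attribute names, the tracer name, and the generate_answer callable are our own placeholders.

```python
# Rough sketch (not the paper's tooling): emit a search trace with the
# OpenTelemetry Python API by hashing each retrieved document, recording what
# was consumed, and then recording what the answer actually cited.
# Requires: pip install opentelemetry-api (plus an SDK/exporter to ship spans).
import hashlib
from opentelemetry import trace

tracer = trace.get_tracer("search.attribution.audit")  # tracer name is illustrative


def doc_fingerprint(text: str) -> str:
    """Stable content hash so consumed and cited documents can be compared later."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:16]


def answer_with_trace(query, retrieved_docs, generate_answer):
    # retrieved_docs: list of {"url": ..., "text": ...}; generate_answer is the
    # application's own RAG call returning (answer_text, cited_urls).
    with tracer.start_as_current_span("llm.search") as span:
        span.set_attribute("gen_ai.operation.name", "chat")  # GenAI semantic-convention attribute
        span.set_attribute("search.query", query)             # search.* names are placeholders, not a standard
        span.set_attribute("search.retrieved.urls", [d["url"] for d in retrieved_docs])
        span.set_attribute("search.retrieved.hashes", [doc_fingerprint(d["text"]) for d in retrieved_docs])
        answer, cited_urls = generate_answer(query, retrieved_docs)
        span.set_attribute("search.cited.urls", cited_urls)   # consumed vs. credited now sit on one span
        return answer
```

With consumed and credited sources recorded side by side on one span, an auditor can compute uncited-but-relevant pages per query directly from the trace.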
Everyone wants outputs they can trust: model developers, third-party vendors, and end users alike. But trust doesn’t spring from goodwill; it is built on hard data, the metrics that let transparent, competitive, and fair online markets emerge.
Our call to action is simple. Search model developers: publish full retrieval logs and citation traces through your APIs. The GenAI telemetry conventions already let you tag each document, score, and ranking decision in a machine-readable way. Once those traces are public, independent third parties can benchmark every model on an equal footing, and innovators can build new marketplaces, ranking engines, and accountability layers on top of a common programmatic substrate. For competitive and transparent digital search markets in the era of AI, we need telemetry shared openly across the content ecosystem.
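To make the ask concrete, here is one hypothetical shape a published per-document trace record could take. The field names are ours, not part of the GenAI conventions or any existing standard; a real schema would be negotiated openly across the ecosystem.

```python
# Illustrative only (not a proposed standard): one machine-readable record per
# document a search model consulted, publishable through an API.
from dataclasses import dataclass, asdict
import json


@dataclass
class RetrievalRecord:
    trace_id: str           # ties the record to the conversation's search trace
    url: str                # source the model retrieved
    content_sha256: str     # hash of the retrieved content
    relevance_score: float  # retriever / re-ranker relevance rating
    rank: int               # position after re-ranking
    cited_in_answer: bool   # did the final answer link to this source?


record = RetrievalRecord(
    trace_id="conv-0421",              # hypothetical values throughout
    url="https://example.com/article",
    content_sha256="9f2b42c1...",
    relevance_score=0.87,
    rank=1,
    cited_in_answer=False,
)
print(json.dumps(asdict(record), indent=2))
```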
So, what could come next?
Developers – Adopt GenAI telemetry by default. Your compliance-minded customers will thank you.
Publishers – Demand verifiable search traces before signing licensing deals.
Policymakers – Make full retrieval logs a condition of AI transparency regimes. The technical path is ready.
This paper highlights the cost of not demanding better disclosures on attribution. It’s not rocket science. It’s an engineering choice.¹ If we want an AI ecosystem that “feeds, rather than bleeds the web”, it’s time to standardize those choices now.
Ilan Strauss, Tim O’Reilly, Sruly Rosenblat, & Isobel Moure
We would love to hear from you. Please share your thoughts in the comments or reply directly to @AIDisclosures on X.
Read the paper & join the discussion
Working Paper: https://www.ssrc.org/publications/the-attribution-crisis-in-llm-search-results/
Data and Code: https://github.com/AI-Disclosures-Project/Ecosystem_Exploitation_In_Search_Results/tree/main/analysis
¹ We do recognize the importance of addressing model hallucinations if attributions are to become reliable quantities.