The Architecture of Participation: Tim O'Reilly and Kevin Werbach on AI, Regulation, and Disclosure
Podcast Episode. "Tim O'Reilly: The value of AI disclosure" in The Road to Accountable AI.
This conversation is based on a transcript of the interview between me and Kevin Werbach on Kevin’s Road to Accountable AI podcast. It has been corrected, rearranged, and expanded for clarity. In places, I rewrote it to say what I wish I had said rather than how I actually said it in the moment. – Tim O’Reilly
Kevin: Both of us have been following emerging technology for a long time. Let me first ask you, how do you think about AI? What kind of innovation is it? How significant do you think it is?
Tim: I would start by putting all great waves of computer technology innovation into one frame, and that is the increase in the ease with which humans are able to communicate with computers and get them to do our bidding. It’s as simple as that. Think back to ENIAC, the first programmable, electronic, general-purpose digital computer, which they programmed by making physical circuit connections. Then we got to stored-program computers, but you put in the program and data one bit at a time, by flipping switches on the front.
Assembly language, and eventually higher-level languages like FORTRAN and COBOL that generate machine instructions without people having to write them directly, made it easier to program and expanded the size of the market. But there was still a sort of priesthood who could do these very difficult things with computers.
Then we got the PC, which made it possible for almost anyone to have a computer, and interpreted languages like BASIC that made it much easier to program, and suddenly there were millions of people writing and using software.
With the web, you got something new, which was that you could create documents (web pages) that could call programs. People don’t recognize what a big step forward in interfaces the web was; in some ways, it was even bigger than the GUI (Graphical User Interface). You could make an interface out of a document—the kind of thing that humans write, read, and exchange with each other. The instructions to the computer are embedded in human-readable documents.
Now, with AI, we’re at a place where we can just talk to machines in plain language, and they (mostly) do what we want.
If you look at the history, every time we’ve had one of these technological advances in the ease of interface, more people could use computers, and they could do more things with them. It really is profound. AI is going to grow the market—not just in the number of people who can use the technology, but in the number of things that they can do.
Kevin: Is there anything we can learn from prior shifts in computer technology that can help us in engaging with the issues that are coming up now with AI?
Tim: Absolutely. First, people underestimate new technologies, and then they overestimate them. In some ways, there’s a thermostatic kind of process, just like in politics, in which people try to have this grand narrative, while the world stumbles forward in unexpected ways. I think back on my early career, which was shaped by open source software and the World Wide Web. People got a lot of things wrong.
I convened the meeting where the term “open-source software” was adopted, but I was kind of an outlier in how I thought about it. Everybody was focused on licenses, and I said, “I don’t think licenses are the issue. Open source is really about network-enabled collaboration.” It has what I called an “architecture of participation”: systems designed so that people can build small pieces that work together.
There’s this great line I remember seeing on the internet early on: “The difference between theory and practice is always greater in practice than it is in theory.” I don’t know who said it, but it’s brilliant, because you always have people with a theoretical construct that turns out not to be practically correct. This is what happened with hypertext. When the World Wide Web came out, all the hypertext pundits said, “This won’t work because it only has one-way links. It doesn’t have two-way links. It has to have two-way links.” And of course, the one-way link was exactly what made the web grow so explosively: a broken link just returned a 404, and everything wasn’t tightly bound together.
The thing we learn is that we stumble forward, and there’ll be some innovation, and then somebody else will build on that innovation.
Kevin: Yes.
Tim: Ethan Mollick, who’s one of my favorite observers of AI, and I are both fans of James Bessen’s work on the Industrial Revolution. Bessen asked, “Why does it take so long for new innovations to spread?” His answer: because people have to learn how to use them, you need to build communities of practice, and you have to have people pushing on and innovating based on what they learn while using the new tools.
Again, I think back to the World Wide Web. The original web served static documents. Then Rob McCool had a bright idea: “Hey, we can actually call a database from a web link,” and then we had dynamic websites. It was this evolutionary process where people were learning and practicing. Brian Pinkerton built the first web crawler, and then Google figured out how to do it better. Overture figured out pay-per-click advertising, but they did it with a crude auction in which the ad went to the highest bidder. Google figured out how to make a really good auction system that included a prediction of the likelihood that a user would click on the ad. And bit by bit, the world that we are familiar with evolved. Steve Jobs with the iPhone, etc., etc.
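To make that auction step concrete, here is a rough sketch of the difference between ranking ads by bid alone and ranking them by expected value (bid times predicted click-through rate). The numbers and names are invented for illustration; this is not Overture’s or Google’s actual system.

```python
# Illustrative only: ranking ads by bid alone (the early pay-per-click approach)
# versus by expected value, bid * predicted click-through rate.
# All advertisers, bids, and CTRs below are made up for the example.

ads = [
    {"advertiser": "A", "bid": 2.00, "predicted_ctr": 0.01},
    {"advertiser": "B", "bid": 1.25, "predicted_ctr": 0.05},
    {"advertiser": "C", "bid": 0.90, "predicted_ctr": 0.04},
]

# Highest-bidder ranking: A wins, even though users rarely click its ad.
by_bid = sorted(ads, key=lambda ad: ad["bid"], reverse=True)

# Expected-value ranking: B wins, because bid * predicted CTR estimates
# revenue per impression (and tends to reward more relevant ads).
by_expected_value = sorted(
    ads, key=lambda ad: ad["bid"] * ad["predicted_ctr"], reverse=True
)

print([ad["advertiser"] for ad in by_bid])             # ['A', 'B', 'C']
print([ad["advertiser"] for ad in by_expected_value])  # ['B', 'C', 'A']
```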
So we’re very early in AI’s development. However, there’s an issue now that we didn’t have before. I think Silicon Valley has gone wrong with its investment strategy. Reid Hoffman calls it blitzscaling, the idea that companies should race to get market share, and the VCs should basically invest in them so they can become monopolies. Think of Uber and Lyft and WeWork. In some sense, the market didn’t pick the winners; the winning business model didn’t evolve. What happened was that the VCs flooded the market with capital. They picked a couple of winners and a couple of early business models. It’s as if we got stuck at the Yahoo! stage of the web.
One big question for AI is whether we’ll see the kind of competitive dynamics that we had with the personal computer, internet, and mobile revolutions, or whether it will look more like what happened with Uber and Lyft, where the initial innovation quickly got frozen in time and unprofitable business models were artificially propped up by massive amounts of capital.
We celebrate the idea that Silicon Valley enables an innovation market, but increasingly that innovation market has become a kind of central planning by a small number of very deep-pocketed companies and investors who are looking for stock market exits rather than profitable companies with true product-market fit. And that’s one of my biggest worries about AI: that it will be harder to get the kind of experimentation that, in the past, led to the innovations we needed.
Kevin: Absolutely. Right. And given the extremely high cost throughout the entire AI stack for developing some of these AI models, what can we do now to avoid that same kind of very concentrated future for AI?
Tim: Well, I think one of the first things that we need to do is to think about this idea that I talked about back in the days of open source: the architecture of participation. I’ve been giving some talks where I riff on this. The New York Times podcast The Daily had an episode called “AI’s Original Sin” focused on the copyright issues around AI. They quoted a lawyer for Andreessen Horowitz who said something like, “If we don’t let these companies crawl and process the internet as they please, we can’t get to AI; this is the only possible way to build these giant models.” And I thought, “That’s a lot like 1992,” when the “only possible way” to get your content online was AOL, and then a little later, the Microsoft Network. But something was coming out of left field, the World Wide Web, which let anybody get their content online. And the web won because it built a real market.
Right now we’re in the AOL stage of AI. We’ve got these big centralized players, and what’s waiting in the wings is an alternate world in which smaller, independent, cooperating AIs are trained on specialized data. People are starting to see this. There was an article I saw recently that said, “Sam Altman’s real rival is Jamie Dimon,” because JPMorgan is sitting on a huge trove of specialized data (and they are using AI now) that’s not available to any of these other guys.
Kevin: Yes. And they’ve got the capital — they've got a $10 billion IT budget.
Tim: Exactly. So there’s an interesting question there: does that world of specialized, cooperating AIs play out? At O’Reilly, we have a much smaller budget than JPMorgan Chase, but we also have a business with a body of intellectual property that, in theory, has not been accessible to the models for training and that we should be able to build unique services against.
In practice, our intellectual property may already have been trained on by other AI models; they probably got it by hook or by crook, but we don’t know that for sure. Which is of course why I think companies should be required to disclose their AI models’ training data, at least in the form of “Do you have my content?” queries, just as we do with privacy.
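For illustration only, a “Do you have my content?” query might look something like the sketch below. The endpoint, fields, and response are hypothetical; no model provider offers such an API today, which is exactly the point.

```python
# Hypothetical sketch of a "Do you have my content?" training-data query,
# loosely modeled on privacy data-access requests. The endpoint and response
# fields are invented for illustration; no AI provider exposes this API today.
import requests

def was_my_content_used(provider_url: str, content_url: str, api_key: str) -> dict:
    """Ask a model provider whether a given URL appears in its training corpus."""
    response = requests.post(
        f"{provider_url}/v1/training-data/lookup",   # hypothetical endpoint
        headers={"Authorization": f"Bearer {api_key}"},
        json={"content_url": content_url},
        timeout=30,
    )
    response.raise_for_status()
    # A hypothetical response might say whether the content was ingested,
    # which model versions trained on it, and how to request removal.
    return response.json()

# Example call (would fail today, since the endpoint doesn't exist):
# result = was_my_content_used("https://api.example-model.com",
#                              "https://www.oreilly.com/some-book", "MY_KEY")
```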
But copyright is just the beginning. We need to think about an AI architecture that isn’t a winner-takes-all game in which, once the winner emerges, everyone else has to be a satellite to them. If that is the world we’re building, the winners ought to be regulated utilities. Then you say, “Okay, you can be a foundation model; this is how you get paid, but you can’t compete with everybody else.” That would be one way to go, somewhat like telecommunications.
Kevin: Right. Your point before was a really good one, that so many things seem inevitable in hindsight. And I’m sure you’ve talked to lots of people who weren’t there in the early days of the web or even the days of Web 2.0, who just assumed it couldn’t have been otherwise. Now we’re in this period with AI where it seems like there’s some possibility for alternatives to emerge. So what are the things that regulators should be doing, to push companies, that might make it more likely to have that more distributed AI future?
Tim: Well, one of the things that we could and should do is have more “disclosure.” Now, when I say that, a lot of people respond, “Disclosures don’t work,” and what they’re thinking of are things like shrinkwrap licenses that you can’t understand and just click through to use the service anyway, or the long sheet of paper that comes with your prescription, which you peel off and throw away because it’s just a bunch of legal gobbledygook. Or even food labeling, though there’s a little bit of value in that: how many calories is this? What are the ingredients? And so on.
But there’s a different kind of disclosure that people don’t really think about, that really is a lot closer to communication standards. TCP/IP is a kind of disclosure. It says, “This is the format for traffic on this type of network, and you can build to this.” The web also depends on a system of disclosures: the browser identifies itself to the server and the server to the browser, which tells each of them something about the capabilities of the other; they then exchange documents that use HTML and CSS to encode all kinds of specialized information to guide the actions they take, what they do, and how they display the content the user requests.
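You can see this mutual disclosure in a few lines of code. This is just the ordinary mechanics of an HTTP request, shown here with the Python requests package; the User-Agent string is a made-up example.

```python
# The web's everyday "disclosures" in action: the client announces who it is
# and what it can accept; the server announces what software it runs and how
# the content is encoded. Requires the `requests` package.
import requests

response = requests.get(
    "https://example.com/",
    headers={
        "User-Agent": "ExampleBrowser/1.0",  # the client discloses its identity
        "Accept": "text/html",               # ...and the formats it understands
    },
    timeout=30,
)

print(response.headers.get("Server"))        # the server discloses its software
print(response.headers.get("Content-Type"))  # ...and how to interpret the body
html = response.text                         # the HTML/CSS carrying the content itself
```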
I’ve also been thinking a lot about the analogy to accounting standards, which describe how companies manage their money. Double-entry accounting has been around since the 13th century and has been refined quite a bit. But at the beginning of the 20th century, in the early days of public companies and securities, there was a lot of fraud, and so they said, “Wait a minute, we’ve got to actually make sure that people follow the same rules in describing the finances of their business.”
If you think about the purpose of those financial disclosures, they enable what Jack Clark of Anthropic and Gillian Hadfield of Johns Hopkins call “regulatory markets,” an idea that I am building on here. Part of what enables the regulatory market of finance is Generally Accepted Accounting Principles (GAAP) and the International Financial Reporting Standards (IFRS) used in Europe and much of the rest of the world. These don’t just enable government functions like tax collection or supervision of public stock markets. Banks use these standards when they consider loans, and investors when they consider funding a company. A market of accountants and auditors has arisen, all using the same protocols, so to speak. And you can see from this analogy that not all disclosure has to be public. Every company discloses its financials to its accountants and to tax collectors; if it is above a certain size, to its auditors; and only in some cases to the public. The amount of disclosure at each level can be different.
And that’s a lot like networking, and how the internet grew. Disclosures as communication and information standards, a common symbolic language, underpinned its success. I remember back when my friend Dan Lynch started Interop, a conference where all the computers had to talk to each other. And the IETF, the Internet Engineering Task Force, said that if you wanted to advance a standard, you had to show at least two independent, interoperable implementations. That notion of interoperability, and of how standards enable it, is really, I think, where we should go with disclosures and some AI safety regulations.
Kevin: Yes.
Tim: I always have in the back of my mind this quote from the famous computer scientist Donald Knuth, who once said, “Premature optimization is the root of all evil.” A lot of the regulations that are being proposed are premature optimization and over-specification.
I go back to my early background in networking and look at the OSI (Open Systems Interconnection) standards, where they had the seven-layer networking model and everything was specified. Whereas TCP/IP is a classic example of what John Gall said in his book Systemantics: “A complex system that works is invariably found to have evolved from a simple system that worked.” TCP/IP was a simple system that worked, and it far outperformed and outcompeted the complex system where people had tried to come up with the entire stack in committee.
Kevin: Right. So let me stop you there. I want to come back and ask you about the disclosure idea, which is fascinating and important. But would you say that something like the European AI Act is a premature regulatory optimization for AI?
Tim: Let me put it this way: if regulation were easy to refactor, change, and update as you learn new things and see what happens, it would probably be okay. The reason it isn’t okay is that you incur a kind of societal technical debt when you have a set of rules that don’t actually work and are hard to change. That’s one of the advantages we have with technical standards: they’re fundamentally focused on outcomes, and if they don’t work, they lose in the marketplace or they get updated. It’s one of the things I’ve always loved about not just the internet, but open standards of all kinds.
Kevin: Yes. Back to what we were talking about before: given how much money is at stake and is already involved in this AI space, is there any hope to get back to that environment, where the technologists are just talking to each other about what the right answer is?
Tim: Despite what I said earlier about the big money players making it a less competitive market than it might otherwise be, we have some really interesting cross-currents fomenting competition. With Anthropic you had a bunch of people saying: “OpenAI has lost its way with regard to AI safety; we’re going to do better.” You’ve got some really interesting competition there. If I rank the big players, OpenAI is definitely in the “move fast and break things” camp, saying: “We're going to be dominant; we're going to do it however we can.” Anthropic is a really interesting alternative, and Meta’s Llama is too. Google Gemini is pushing forward some interfaces that go well beyond chat. And then, of course, there are all the smaller open-source models and other models being developed.
As we think about possible futures for AI, one of them is the one that’s being advertised: that we have to keep training on more and more data, and it’s going to get more and more expensive. But the smaller models are catching up, and maybe the advantage the largest model developers get from their enormous spending is leveling off. If this continues, foundation models will become commoditized, and we’ll start to see more innovation, because it won’t be constrained by a few dominant companies that have to monetize their models.
Something useful that regulators could do at this stage of the evolution of the market is not to write regulations that specify what companies can and can’t do, but regulations that specify what they have to tell us about what they are doing.
So, for example, if you care about one axis of AI safety, which is addictiveness for kids, you could take China’s approach: they have limited online gaming and some social media use for kids, with more restrictions in progress. But you could mandate corporate disclosure instead, saying, “You’ve got to report to us what kind of engagement patterns you have and what you’re doing to maximize them.” It seems pretty clear in the case of the kid who committed suicide while using Character.ai that the company could have known he was clearly addicted and in danger, and they should have had guardrails against stuff like that. But how would we know if Character.ai had instead been egging him on? They might have been saying, “Great, this is a really engaged customer.” What exactly are they optimizing for when they optimize for engagement?
Character.ai has done a bit of both, changing the way their model works for younger users, and also reminding them that they are not interacting with a real person. This may not work, but it’s a good step, part of what James Bessen calls “learning by doing.”
I’m a fan of mandated disclosures about what metrics a company is optimizing for. If you look at social media, once it started to optimize for engagement, bad things happened. So maybe we ought to make sure companies tell us when and how they are doing that. If companies have to show that they are not putting the pedal to the metal on this risk factor, and are instead actually moderating and managing it, then the market might punish the ones who don’t do “the right thing,” and you don’t need the law to come down on them.
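For concreteness, a mandated optimization-metric disclosure might look something like the sketch below. Every field and value is invented for illustration; no regulator requires a filing in this form today.

```python
# Hypothetical example of the kind of optimization-metric disclosure described
# above. Every field name and value is invented for illustration; no regulator
# currently requires a filing in this form.
engagement_disclosure = {
    "product": "ExampleChat",
    "reporting_period": "2025-Q1",
    "optimization_targets": [
        {"metric": "daily_active_minutes", "weight": 0.6},
        {"metric": "session_return_rate", "weight": 0.4},
    ],
    "minor_protections": {
        "age_detection": "self-reported plus behavioral signals",
        "session_time_cap_minutes": 60,
        "crisis_language_escalation": True,  # e.g., route self-harm signals to humans
    },
    "safety_spend_share_of_rnd": 0.07,       # fraction of R&D budget spent on safety work
}
```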
One big mistake that regulators make (led on by some of the early hype about the risks of superintelligence) is to focus primarily on model safety, and only on the largest models. This is a bit like handling auto safety by saying, “Okay, let’s only regulate cars that can go over 150 miles an hour. And let’s do crash test dummy testing but have no other safety regulations.”
When you think about auto safety, the National Highway Traffic Safety Administration (NHTSA) assesses how long it takes for a car to brake to a stop at a particular speed or what the crumple zone is at various speeds, and uses crash test dummies to see how passengers might fare in a collision. All this kind of stuff is really useful, and is roughly equivalent to some AI model safety work. But we also need to think about driver education and licensing, speed limits, and other rules of the road. And the speed limits aren’t one size fits all; they’re different on different kinds of roads. Speed limits are also a kind of disclosure. They tell the driver (who also has a feedback mechanism in the form of a speedometer) how fast it is legal to go, and how fast it is safe to go, say on a curve, or near a school.
Gillian Hadfield has focused on the notion that AI models ought to be registered, just like cars are registered — they have a license plate and a vehicle ID number. Guns also have a serial number. But AI models? No.
More broadly, there are many lessons we could take from other regulatory regimes that could also guide our thinking about AI safety. For example, we may need to store data for an inquiry when something goes wrong, like we do with black boxes in airplane crashes. This is another kind of mandated disclosure.
As I’ve been discussing with Vint Cerf, one of the really interesting questions that we ought to be asking is: is this system regulatable at all? And how would we make it regulatable if it isn’t? Because it's certainly possible that you can build technologies that are fundamentally not regulatable, and then we have to ask ourselves, do we want to do that?
Kevin: Yes. I think Larry Lessig called it “regulable” back in the early internet days. We have these same conversations in blockchain, and it’s an important point. Let me ask you one more thing: where does open source fit into this? There’s a big debate about open-weight models and the risks they pose. And in some ways that seems like more disclosure, but on the other hand, what people do with the model is potentially out of the developer’s control.
Tim: I’m not sure I have a clear answer on that. In general, I am a fan of the way that open source makes a market more competitive. Obviously, though, there are people who have a lot of concerns about national security. The point that Meta has made in response is, “Hey, we have a lot of industrial espionage, and as in every version of cybersecurity, the notion that you’re just going to build a wall and keep people out is not generally that successful in the end. So you're better off building a robust system that can handle the fact that bad people know things.”
Kevin: Right. So, last question, as we don’t have too much more time: are you optimistic that we will come up with an approach that makes these technologies appropriately regulable? And if so, what gives you that optimism?
Tim: First off, I don’t think AI is as dangerous as the people who make the big existential risk arguments think. I also don’t think we’re on a path to superintelligence right now. The whole chemical, biological, radiological, and nuclear (CBRN) risk area is important work, though it’s not specific to AI. Arvind Narayanan and Sayash Kapoor, in AI Snake Oil, make the point that if you’re worried about bioweapons, then physical biosecurity and controlling access to the kind of equipment you need to be dangerous are probably more important than whether people can get information about how to do it. So again, we’ve got the wrong focus.
There’s also a class of risks that we call commercialization risk. And there are two parts to that. First, do companies have the incentive to do the wrong thing? There is a very real “move fast and break things” risk in AI, as it is driven by a race for monopoly, such that companies might say “Yes, we’re all about AI safety,” but then they fire their AI safety team when it raises roadblocks because really, they want to win the race.
But there’s another risk that we can see using examples from the social media era. Consider the role of social media in the Myanmar genocide. The big takeaway is not that Facebook didn’t have guardrails against hate speech; it’s that those guardrails didn’t work in Myanmar, because Facebook’s systems weren’t tuned for the language and couldn’t understand it. So I call that deployment risk.
It goes back to this: the difference between theory and practice is always greater in practice than it is in theory. In theory, we have guardrails. Do we have them in practice? That’s the question regulators should be asking. And it goes back to the auto safety analogy: “Okay, we tested with crash test dummies (the model pre-deployment), but we’re not looking at the data from the real world (post-deployment); we’re not even thinking about it.”
And again, this goes back to disclosures: “Hey, you say you do AI safety. What does that look like? What are you actually doing?”
That’s a lot of what our AI Disclosures Project is trying to focus on: what should regulators be asking to see and to know about what a company is actually doing? That requires understanding what safety engineering for AI looks like in practice, how much a company is spending on it, and whether they are doing it in all the markets they operate in or just some of them. For example, they might have to say: “Yes, well, we’re offering our services around the world, but we’re only doing most of the safety engineering in English, in the U.S. The coverage in other languages and cultures is negligible.”
Another risk comes from what your third-party developers do. There was a report from Proof News, Julia Angwin’s outfit, about AI and election misinformation. They conducted red-teaming on the leading AI models with election workers who knew the questions voters typically ask, and they found that the models were giving misinformation about half the time.
But the most interesting result in the paper came about by accident. They had tested a bunch of models in parallel using an API harness, so they could submit the same questions to multiple models at the same time through their APIs. And they were told, “Your results don’t show our real guardrails, because you were using the API, and with the API, it’s the responsibility of the developer.” That’s like Facebook saying: “Yes, we have these policies for protecting user privacy, but guess what? Cambridge Analytica misused the data.” If you’re saying it’s the responsibility of the downstream application developer, then you’re admitting that you’re wide open. Either you have guardrails or you don’t. Again, regulators should be thinking about deployment in practice, not just the risks and questions in theory.
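For readers unfamiliar with the setup, an API harness of this kind is conceptually simple. Here is a minimal sketch; the model names and query functions are placeholders, and the actual Proof News methodology and scoring differ.

```python
# A minimal sketch of an API harness like the one described above: send the
# same question to several models at once and collect their answers for human
# review. The model names and query functions are placeholders.
from concurrent.futures import ThreadPoolExecutor

def ask_model_a(question: str) -> str:
    # Placeholder: call model A's API here and return its answer text.
    raise NotImplementedError

def ask_model_b(question: str) -> str:
    # Placeholder: call model B's API here and return its answer text.
    raise NotImplementedError

MODELS = {"model-a": ask_model_a, "model-b": ask_model_b}

def red_team(questions: list[str]) -> list[dict]:
    """Fan each question out to every model in parallel; return the raw answers."""
    results = []
    with ThreadPoolExecutor(max_workers=8) as pool:
        for question in questions:
            futures = {name: pool.submit(fn, question) for name, fn in MODELS.items()}
            answers = {name: future.result() for name, future in futures.items()}
            results.append({"question": question, "answers": answers})
    return results  # reviewers (e.g., election workers) would then rate each answer
```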
Kevin: Big challenges.
Tim: Yes.
Kevin: All right, so much more we could talk about, but I think that’s a good place to land. Always fascinating to speak with you, Tim. Thanks so much.
Tim: Great to talk with you too.