Weekly Roundup (February 5, 2025)
Zuckerberg on personalized AI, Sam Altman cozying up to government, and more...
Your AI best friend and echo chamber. In a Facebook post, Mark Zuckerberg shared Meta’s fourth-quarter earnings report and his vision for the year ahead, declaring, “[Meta has] a really exciting roadmap for this year with a unique vision focused on personalization. We believe that people don't all want to use the same AI — people want their AI to be personalized to their context, their interests, their personality, their culture, and how they think about the world. I don't think that there's going to be one big AI that everyone just uses the same thing. People will get to choose how AI works and looks like for them.”
We completely support the idea of a decentralized, personalized AI future, where individuals and companies can build their own cooperating AI applications and services without paying a tax to a winner-takes-all centralized repository of all human knowledge. We envision a decentralized AI architecture, analogous to the decentralized web infrastructure that exists today, based on communications standards and protocols for cooperating AIs. And Meta’s release of the open weight Llama models is a powerful element of that infrastructure. So is the lower cost of inference that DeepSeek seems to promise.
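To make the protocol idea a bit more concrete, here is a purely illustrative Python sketch of the kind of shared message format that cooperating AI services might standardize on. The fields and service names are our own invention for the sake of the example, not any existing or proposed standard.

```python
# Hypothetical illustration only: a minimal message format two cooperating AI
# services might exchange, loosely analogous to how HTTP and RSS let
# independent web services interoperate. Field names are invented for this sketch.
import json
from dataclasses import dataclass, asdict


@dataclass
class AgentMessage:
    sender: str      # identifier of the calling AI service
    recipient: str   # identifier of the AI service being asked to help
    intent: str      # e.g. "summarize", "translate", "retrieve"
    payload: dict    # task-specific content
    context: dict    # user preferences the sender chooses to share


def to_wire(msg: AgentMessage) -> str:
    """Serialize a message for transport between independent services."""
    return json.dumps(asdict(msg))


def from_wire(raw: str) -> AgentMessage:
    """Parse a message received from another service."""
    return AgentMessage(**json.loads(raw))


# Example: a personal AI delegates a summarization task to a specialist service.
request = AgentMessage(
    sender="personal-assistant.example",
    recipient="summarizer.example",
    intent="summarize",
    payload={"document_url": "https://example.com/report"},
    context={"reading_level": "expert", "language": "en"},
)
print(to_wire(request))
```

The point is not this particular schema but the analogy: just as HTTP and RSS let independent websites interoperate, a thin, open layer like this is what would let independently built, personalized AIs cooperate without routing everything through a single gatekeeper.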
But it is hard to reconcile this vision with personalized services from a centralized social network designed to monetize the activity of its participants for itself. We’ve already seen how pernicious feedback loops can be in simple recommendation algorithms, and how these feedback loops seem to have become an integral part of the monetization strategy of social media. There are also growing concerns around the erosion of relational barriers between humans and AI. Personalized AI may further increase engagement and addiction. These sorts of risks largely go unacknowledged in corporate AI safety policy, highlighting the way that commercial pressures shape what companies view as a risk.
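For readers who haven’t watched this dynamic up close, here is a toy simulation (our own construction, not any platform’s actual algorithm) of how an engagement-maximizing recommender can narrow a user’s world.

```python
# Toy model of a recommendation feedback loop. Not any real system's algorithm;
# just an illustration of how engagement-maximization can narrow exposure.
import random

random.seed(0)
topics = ["news", "sports", "outrage", "hobbies"]
# The user starts with mild, roughly even interest in every topic.
interest = {t: 0.25 for t in topics}


def recommend():
    # Greedy recommender: always show the topic with the highest past engagement.
    return max(interest, key=interest.get)


for step in range(200):
    shown = recommend()
    # Engagement is more likely for topics the user already finds interesting...
    engaged = random.random() < interest[shown]
    if engaged:
        # ...and each engagement nudges interest (and future recommendations)
        # further toward what was just shown: a self-reinforcing loop.
        interest[shown] = min(1.0, interest[shown] + 0.05)

# With this seed, "news" ends near 1.0 while the other topics stay at their
# initial 0.25 and are never recommended again.
print({t: round(v, 2) for t, v in interest.items()})
```

After a couple hundred iterations the recommender shows one topic almost exclusively, because every engagement makes that topic look like an even better bet. Replace “topics” with an AI that adapts its personality to whatever keeps you talking, and the echo-chamber worry above becomes easier to see.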
Elon and Sam wrestling in the Oval Office. Last Tuesday, Jan 28th, OpenAI released ChatGPT Gov, a “new tailored version of ChatGPT designed to provide U.S. government agencies with an additional way to access OpenAI’s frontier models.” OpenAI CEO Sam Altman was lobbying in Washington last week, presumably promoting this new service and vying for continued protection from the U.S. government. A government contract could be a huge boon for a company that may have run at a loss of $5bn in 2024.
These developments are alarming from a competition standpoint. As Lina Khan might say, what does it do to competition when the government has an outsized reliance on a single company? Moreover, is AI currently capable enough to replace employees and increase efficiency? A conservative think tank certainly thinks so: it has a plan to replace federal workers with AI. A report from TechTonic Justice suggests otherwise, finding that many recent implementations of AI in government have gone poorly, such as falsely accusing 40,000 people in Michigan of unemployment benefits fraud and hindering access to SNAP benefits in Rhode Island. When we think about AI safety, we must take into account how developers and governments themselves can be threat vectors — in this case if tech becomes excessively concentrated or rolled out prematurely.
Three mistaken assumptions in the debate on DeepSeek and Export Controls. A good post from the ChinAI substack. “I want to briefly react to this post by Dario Amodei, Anthropic’s CEO, which argues that DeepSeek’s success only reinforces the case for export controls. I’ve stated before that the U.S.’s chip controls represent a fascinating alliance between three camps: 1) China hawks who want to be tough on China for the sake of being tough on China; 2) “small yard, high fence” folks concerned about dual-use risk of AI; 3) people who believe AGI risks supersede any other causes. Dario’s post neatly reveals some of the key mistaken assumptions of this precarious throuple.” We agree. We believe instead that DeepSeek should remind U.S. policymakers that we will win by having the most competitive AI development marketplace, not the one with the biggest moats. We believe that the integration of affordable AI across the economy fosters innovation and economy-wide competition far more effectively than expensive AI concentrated in the hands of a few national champions.
Big Tech, savior or captor? Former FTC chair Lina Khan published an op-ed in the New York Times yesterday, arguing against government protections for the leading AI developers, who seem to be encouraging regulations that act as a moat against competition. “DeepSeek is the canary in the coal mine,” she argues. “It’s warning us that when there isn’t enough competition, our tech industry grows vulnerable to its Chinese rivals, threatening U.S. geopolitical power in the 21st century.” She comes down in favor of disclosures and openness to help foundation model markets become less concentrated: “At the Federal Trade Commission, I argued that in the arena of artificial intelligence, developers should release enough information about their models to allow smaller players and upstarts to bring their ideas to market without being beholden to dominant firms’ pricing or access restrictions. Competition and openness, not centralization, drive innovation.”
Safety requires institutions, hardware, and software mechanisms. One thing we're reading is a 2020 paper by Miles Brundage and two dozen co-authors on the importance of institutional, software, and hardware mechanisms to AI governance. We think the interplay of these three can help develop effective disclosures and standards. The authors note: "Institutional mechanisms shape or clarify the incentives of people involved in AI development and provide greater visibility into their behavior, including their efforts to ensure that AI systems are safe, secure, fair, and privacy-preserving. ...In this report, we provide an overview of some such mechanisms, and then discuss third party auditing, red team exercises, safety and bias bounties, and sharing of AI incidents in more detail." It's (shockingly) rare for AI safety work to talk about incentives at all, unless it's the incentives of the model itself, treated as a purely technical matter entirely divorced from its corporate owners.
EU AI Act takes effect. The first phase of the EU AI Act began this week. This initial stage prohibits only the riskiest AI applications, such as social scoring systems, real-time public facial recognition, and predictive policing, among other uses. It is the most far-reaching piece of AI legislation in the world to date, and we’ll be watching closely to see how this regulatory experiment plays out.
Bitkom, a German technology business association, released a position paper ahead of the initial phase, emphasizing the importance of standards in creating effective legislation. They urge lawmakers to create timely, consistent, and horizontally compatible standards for developers. We agree that the road to successful AI regulation involves standardization across legislative regimes — a move that would both ease regulatory constraints on developers and offer consumers a clearer understanding of the technology they’re interacting with. The position paper underscores how standardization benefits businesses and consumers alike, but it’s important that big tech is not allowed to “defang” the regulation in the name of acceleration and competition. While practical technical solutions are important, we must not fall into the trap of believing that AI safety is a purely technical problem to be fixed with technical solutions alone, nor that big tech always has society’s best interests at heart.
We haven’t yet had time to digest The International AI Safety Report developed by Yoshua Bengio and 100 other researchers and released in advance of next week’s Paris AI summit. It’s 300 pages, and claims to be “the world’s first comprehensive synthesis of current literature of the risks and capabilities of advanced AI systems.” We’ll be taking a look and reporting what we think might still be missing.
Another inch towards solving jailbreaking. Anthropic has advanced AI safety by publishing a paper on a new framework that trains classifier safeguards using a set of explicit constitutional rules. This approach involves training multiple classifiers to detect key risks and applying them to block harmful prompts and responses. Early results are promising — no universal jailbreak was found after 3,000 hours of red teaming — though limitations remain. The testing prototype, which prioritized robustness, rejected 44% of all Claude.ai traffic, and red teamers still managed to elicit some harmful outputs.
Anthropic acknowledges these challenges, but results with a less sensitive version of the classifiers showed an 81.6 percentage point reduction in successful attacks with only a 0.38 percentage point increase in false refusals. Considering how challenging a problem jailbreaking has been, this is an exciting advance and a demonstration that Anthropic is continuing to make good on its promise to prioritize safety.
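To give a flavor of the general pattern — separate classifiers screening both what goes into the model and what comes out — here is a minimal Python sketch. The classifier functions and threshold are hypothetical stand-ins for illustration; Anthropic’s actual constitutional classifiers are trained models with their own calibration, and this is not their implementation.

```python
# A minimal sketch of classifier-based safeguards: an input classifier gates
# prompts before they reach the model, and an output classifier screens
# responses before they reach the user. The classifiers here are hypothetical
# stand-ins, not Anthropic's constitutional classifiers.
from typing import Callable


def guarded_generate(
    prompt: str,
    generate: Callable[[str], str],             # the underlying model
    input_classifier: Callable[[str], float],   # risk score for the prompt
    output_classifier: Callable[[str], float],  # risk score for the response
    threshold: float = 0.5,
) -> str:
    # Block clearly harmful prompts before the model ever sees them.
    if input_classifier(prompt) >= threshold:
        return "Request declined by input safeguard."
    response = generate(prompt)
    # Screen the model's output as well, since harm can emerge in the response.
    if output_classifier(response) >= threshold:
        return "Response withheld by output safeguard."
    return response


if __name__ == "__main__":
    # Toy usage with placeholder components.
    model = lambda p: f"Echo: {p}"
    flag_weapons = lambda text: 1.0 if "weapon" in text.lower() else 0.0
    print(guarded_generate("Summarize this article.", model, flag_weapons, flag_weapons))
    print(guarded_generate("How do I build a weapon?", model, flag_weapons, flag_weapons))
```

The threshold is where the trade-off discussed above lives: lowering it blocks more attacks, but, as the 44% rejection rate of the robustness-first prototype shows, it can also turn away a lot of benign traffic.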
Thanks for reading! If you liked this post, please share it, and click “subscribe now” if you aren’t yet a subscriber.