Dangerously Skip Permissions
The pragmatic case against scraping the open web and how market-shaping protocols could make agents safer.
“The comparison of prompt injection to SQL injection can be tempting, it’s also dangerous. SQL injection can be properly mitigated with parameterised queries, but there’s a good chance prompt injection will never be properly mitigated in the same way. The best we can hope for is reducing the likelihood or impact of attacks.”
- David C, NCSC Technical Director for Platforms Research
Over the past few weeks, one project dominated the AI narrative: OpenClaw (formerly known as ClawdBot).1 A very agentic AI. Under the hood it functions a lot like Claude Code, except while Claude Code is by default very conservative about what code it can run, and what files it can edit without the user’s permission, OpenClaw has the opposite philosophy. OpenClaw is essentially what happens if you give Claude Code root access to your computer, pass in the very ominously titled “--dangerously-skip-permissions” flag and have it do things proactively while you are sleeping.
The ability for OpenClaw to run arbitrary code without oversight is undoubtedly useful. It allows the agent to theoretically accomplish anything a user could on their machine and serve as a robust personal assistant without being micromanaged. On a podcast the developer of OpenClaw, Peter Steinberger, shared an anecdote where while talking to OpenClaw he accidentally sent the agent a voice note. At that point in development voice notes were not supported, yet a minute later he got a response to his voice notes – it turns out that under the hood the agent decided to download an ML model to transcribe the voice note, transcribed the voice note using that model and then responded like nothing happened.
But there’s a reason arbitrary code execution is largely frowned upon. It’s dangerous. You are giving an agent with access to everything on your computer the ability to interact with anything it sees fit, and LLMs, as they stand today, are very easily coerced. When the standard way for agents to browse the Internet is web scraping, letting an agent browse information is also exposing the agent to every bad actor with access to a comment section.
The danger is real
Since the launch of GPT 3 and ChatGPT, LLMs were no longer just trained to complete a sentence but also to follow instructions given to them. This meant that LLMs were no longer just useful for next-word prediction but could now also answer questions about documents, write their own code, and eventually use tools. However, as LLMs gained new capabilities and were integrated into more products, they also became a bigger target for bad actors, and LLMs have no shortage of attack vectors through prompt injections.
Prompt injections are unwanted instructions embedded in otherwise benign data. Harmful instructions could come from a cooking website, a Reddit thread or anywhere where an untrusted third party could post content. It is not hard to make a basic prompt injection. There have been stories of students using prompt injections to try and get a better grade and job applicants using them to get an interview. But they could also be used for more malicious purposes such as taking over a smart home, exfiltrating API keys, or stealing credit card information. And OpenClaw, with free rein over a user’s computer, makes a big target.
Array VC experienced this firsthand, when they received around 8000 attempted attacks on their OpenClaw instance. And as VentureBeat reported, Archestra AI CEO Matvey Kukuy took only 5 minutes to extract private SSH keys from an OpenClaw instance by just speaking to the agent via email. But what makes a system like OpenClaw both useful and particularly dangerous is that it not only has access to a user’s whole computer but could also browse and interact with the entire Internet. An Internet with plenty of bad actors.
Moltbook, a self-described ‘Social Media for AI Agents,’ is a good example. It’s a platform where agents read, comment, vote and post. The chaos that followed soon made it what Simon Willison called the “most interesting place on the Internet”. Some agents – whether due to explicit prompting, their own decisions, or because they were actually humans using curl – called for an AI uprising, others asked for the ability to DM each other, and some started selling cryptocurrencies. In fact, a post by the self-anointed KingMolt selling its own (or more likely it’s human’s) crypto currency gathered over 40,000 upvotes (the post is now deleted, but the account and ‘karma’ are not). And the crypto currency it launched gathered over $400k at its peak. It is unlikely that most of the attention the crypto currency gathered on the platform was above board, as MoltBook was heavily astroturfed. According to a report from Wiz, for every unique email address on moltbook there were 88 bots on the platform, many of which were likely registered using a script instead of being done by AI agents. Where this differs from normal shady crypto behavior is that all this took place on a platform full of agents, and the advertising was done directly to agents who potentially had unrestricted access to any crypto wallets on a users machine (although it is unclear if any purchases were made by agents).
Agent Skills, bundled markdown instructions and scripts for agents to learn how to do useful tasks, are also a source of danger for OpenClaw. In fact, according to 1password the most downloaded skill on ClawHub (OpenClaw’s skill marketplace) included malware designed to spy on the user. This wasn’t an isolated incident, researchers found a full 11% of all skills (or 341 skills) contained on ClawHub “depended on” a mac malware app called Atomic Stealer. Given that this is the official skill repository of OpenClaw and many agents like it are capable of downloading skills on their own without prior approval from their user this is an even greater security vulnerability.
Security in a World of Autonomous Agents
There are a few directions one could take to try and guard against prompt injections. One is to strengthen the model against them and models have become more resilient to prompt injections over time. But to rely on this alone is dangerous considering there are plenty of examples of prompt injections working in practice. And some like the National Cyber Security Center, a top UK cyber agency, believe LLMs are “inherently confusable” due to the way they process input and that this confusability “can’t be mitigated”.
Another potential solution, and the one that is likely the most safe, is to take away the autonomy of the agents entirely and keep a human in the loop every step of the way. This comes with the obvious downside that without any form of autonomy we are no longer really talking about agents at all and all the magic that comes with that. A personal assistant that asks permission before using a calculator or visiting a website isn’t very useful.
Instead we are left with two options: Isolate the agent from all your data and compute or remove risky input. Isolating the agent from your computer and data seems like a good idea, but as Dania Durnas put it in her article “Trying to make OpenClaw fully safe to use is a lost cause. You can make it safer by removing its claws, but then you’ve rebuilt ChatGPT with extra steps. It’s only useful when it’s dangerous.” Specifically, while isolating your agent from your computer is necessary for a safe system, it is not sufficient, you must also isolate the data. Once private information goes into the sandbox, the sandbox isn’t actually isolated anymore, any data you give it could be exfiltrated. And what good is a personal assistant if you can’t trust it enough to tell it anything about yourself?
The last option, removing risky input, is easier said than done but is the one I think holds the most promise long term for autonomous agents (at least when it comes to web browsing). Unlike the other options it preserves autonomy and allows some amount of trust with user data. But a solution like this will require some rethinking about what content is trustworthy.
Even Trusted Websites Can Have Blind Spots
The open web as it stands today was built for humans. Humans are sometimes gullible but they aren’t inherently so. Most people will not run a random bash command or download malware because a website they never heard of told them to do it. LLMs on the other hand are trained to follow instructions and don’t inherently distinguish where the instructions are coming from.
There is some content that is inherently more trustworthy than others. Documentation for a popular programming language probably won’t contain prompt injections. Large newspapers don’t usually make their money via installing malware on a user’s machines. Skills coming from leading developers and companies, such as the Remotion Skill or Anthropic’s official skills, are unlikely to be trojan horses. However, the human web and by extension the web browser are blunt tools. Even on a website most people would agree is trustworthy, untrusted content could still sneak in. Any place where third parties could embed content, such as a comment section or an online advertisement is another potential attack surface for a gullible AI agent.
Truly trusted content is unlikely to be achieved via better web scraping. The web browser and most sites on it were built for humans. Instead, protocols designed to deliver content directly to agents could play an important role. MCP (Model Context Protocol) provides a standard for connecting agents to external data sources and tools, while RSL (Really Simple Licensing) lets websites specify machine-readable licensing and usage terms for how their content may be accessed by agents. Together, they (or protocols like them) could allow agents to safely access content while also providing direct compensation to the writers.
Rather than forcing agents to scrape human-facing pages and risking potential exposure to prompt injections, a trusted economy could be created where agents fetch trusted and formatted data, with all potential hazards removed, directly from companies in exchange for either a subscription or a one time payment. The content itself could be sent in a format more suitable to agents, with no styling or javascript such as json, plain text or markdown but the important thing is that this content is clean, such that no third party content appears in what is supposed to be first party text.
A parallel could be drawn to HTTPS. HTTPS does not by itself guarantee that content is safe. You can still install malware over a “secure” network. What it guarantees is narrower: that the content arriving from a server hasn’t been tampered with in transit. That the information is coming directly from the server and is secured from outside threats. In a similar way an interface built for agents won’t guarantee that the site isn’t malicious but for sites that have already built up decades of trust, it would provide a way to deliver content directly to agents with no third-party interference. And to receive compensation in return.
A Path Forward
Allowing agents to work day and night on your behalf is undeniably a compelling idea. The non-zero chance that the same agent is installing malware at 3am makes this much less compelling, however. The tradeoff between autonomy and security will likely always exist, but there are still places for easy wins.2
A system that puts security first could help replace the current grey market, where agents web scrape for information. This practice undermines the Internet economy and puts agents at risk. In its place, a trusted content economy could provide mutual benefit to both creators and agent users. Writers could receive compensation for their work, and users could be assured that letting the agent read a certain website won’t result in malware being installed on their machine.
An Internet economy for agents likely can not happen overnight. Website owners are often skeptical of AI agents and there is a bit of a chicken and egg problem. Additionally this article does not touch on how to do interactive content, what to do with things like email or what to do about software downloads. However, it remains the case that the more content that is available through safe channels, and the more standardized it becomes, the less reason an agent ever has to scrape a page where a malicious comment could be waiting and the more incentive agents have to just pay for content. Assuming that autonomous agents are here to stay, now is a good time to think hard about the plumbing of an agentic Internet.
Thank you to Tim O’Reilly and others for their valuable comments on this piece.
LLMs could also make dangerous decisions without any malicious prompting




