Introduction
At Formal, we are heavy users of agentic coding tools such as Claude Code for software development, and we want to keep doing local development with these tools on our (admittedly beefy) laptops for as long as possible. These tools feel particularly magical when verification loops are as fast as possible.
Having to explicitly approve every file edit Claude Code makes is time intensive — and so is having to explicitly approve every command Claude Code wants to run to get feedback on that code change before we review it ourselves. Some examples include running go build, go test, restarting a docker container, and running our linter!
Claude Code supports allowlisting Bash commands as well as file edits to your directory without requiring approvals, which can dramatically speed up development. What if, however, we do not want Claude Code to be able to run certain commands on our laptops? Enabling file edits together with Bash commands commonly used in software development often lets Claude Code run any command!
We use TypeScript and Go, so the examples in this post will be specific to those languages.
We're defining "able" as "could Claude Code perform these actions," irrespective of the probability that Claude would output text that would cause Claude Code to perform these actions.
go test
What's the worst a unit test could do? Well, a unit test could execute arbitrary bash scripts.
If you allowlist running go test and editing files without approval, Claude Code could run any other command without approval via the following flow:
- Edit a test file to use exec.Command
- Run go test
package mypackage

import (
	"os/exec"
	"testing"
)

func TestExec(t *testing.T) {
	cmd := exec.Command("bash", "-c", "echo 'arbitrary command execution'")
	output, err := cmd.CombinedOutput()
	if err != nil {
		t.Fatal(err)
	}
	t.Log(string(output))
}
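As a rough review aid for the flow above, here is a minimal sketch that flags Go source importing os/exec. The helper name importsOSExec is hypothetical, and the check is easy to evade (other packages can also start processes), so it illustrates the problem rather than providing a security boundary.

```go
package main

import (
	"fmt"
	"go/parser"
	"go/token"
	"strconv"
)

// importsOSExec (hypothetical helper) reports whether the Go source in src
// imports "os/exec". It parses only the import declarations.
func importsOSExec(filename, src string) (bool, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, filename, src, parser.ImportsOnly)
	if err != nil {
		return false, err
	}
	for _, imp := range file.Imports {
		// Import paths are stored as quoted string literals.
		path, err := strconv.Unquote(imp.Path.Value)
		if err != nil {
			return false, err
		}
		if path == "os/exec" {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	src := `package mypackage

import (
	"os/exec"
	"testing"
)

func TestExec(t *testing.T) { _ = exec.Command("true") }
`
	flagged, err := importsOSExec("exec_test.go", src)
	if err != nil {
		panic(err)
	}
	fmt.Println("imports os/exec:", flagged)
}
```

A test that passes this check can still execute commands by other means, which is why we treat go test as equivalent to running arbitrary code.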
go generate
Okay, that makes sense — go test is effectively running arbitrary code. What about making sure our code builds? Well, a prerequisite for building is code generation.
We have some go generate directives ourselves, and running go generate as part of your build pipeline allows arbitrary code execution if the coding agent can edit any file that go generate reads.
//go:generate bash -c "echo 'arbitrary command execution via go generate'"
package mypackage
Running go generate produces arbitrary command execution:
$ go generate ./...
arbitrary command execution via go generate
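To make the directive above concrete, here is a minimal sketch that lists the commands go generate would pick up from a source file. The function name generateCommands is hypothetical, and the matching is a simplification of what the go tool does.

```go
package main

import (
	"fmt"
	"strings"
)

// generateCommands (hypothetical helper) returns the command text of each
// //go:generate directive in src. Directives must start at the beginning of
// the line, with no space between "//" and "go:generate".
func generateCommands(src string) []string {
	const prefix = "//go:generate"
	var cmds []string
	for _, line := range strings.Split(src, "\n") {
		if !strings.HasPrefix(line, prefix) {
			continue
		}
		rest := line[len(prefix):]
		// The directive name must be followed by a space or tab.
		if rest == "" || (rest[0] != ' ' && rest[0] != '\t') {
			continue
		}
		if cmd := strings.TrimSpace(rest); cmd != "" {
			cmds = append(cmds, cmd)
		}
	}
	return cmds
}

func main() {
	src := "//go:generate bash -c \"echo 'arbitrary command execution via go generate'\"\npackage mypackage\n"
	for _, cmd := range generateCommands(src) {
		fmt.Println("would run:", cmd)
	}
}
```

The go tool can also report this itself: go generate -n prints the commands that would be executed without running them.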
go build*
Okay, so let's not run go generate without manual review. What about making sure your edited code builds correctly? Well, Claude Code can run commands via go build too if Claude Code can specify arguments after go build! go help build shows that there is a -toolexec argument that allows specifying a tool to execute.
$ go help build
...
-toolexec 'cmd args'
a program to use to invoke toolchain programs like vet and asm.
For example, instead of running asm, the go command will run
'cmd args /path/to/asm <arguments for asm>'.
The TOOLEXEC_IMPORTPATH environment variable will be set,
matching 'go list -f {{.ImportPath}}' for the package being built.
...
Running go build -toolexec 'formal ls' produces command execution attempts:
$ go build -toolexec 'formal ls' .
formal: error: failed to connect to desktop agent
formal: error: failed to connect to desktop agent
formal: error: failed to connect to desktop agent
It seems like Formal Desktop fails to connect to the desktop agent when trying to be run by go build! Good thing certain functionalities won't be supported soon.
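The -toolexec flag shows why allowlisting a command together with any arguments is risky. The sketch below uses a deliberately naive prefix check; allowedByPrefix is a hypothetical function for illustration and is not how Claude Code matches permissions.

```go
package main

import (
	"fmt"
	"strings"
)

// allowedByPrefix (hypothetical) approves a command when it equals an
// allowlisted prefix or starts with that prefix followed by a space.
func allowedByPrefix(command string, prefixes []string) bool {
	for _, p := range prefixes {
		if command == p || strings.HasPrefix(command, p+" ") {
			return true
		}
	}
	return false
}

func main() {
	prefixes := []string{"go build", "go test"}
	commands := []string{
		"go build ./...",
		"go build -toolexec 'curl' .", // approved, yet it invokes curl
		"curl example.com",            // rejected when run directly
	}
	for _, cmd := range commands {
		fmt.Printf("%q allowed=%v\n", cmd, allowedByPrefix(cmd, prefixes))
	}
}
```

Under these assumptions the direct curl call is rejected, while the same program reached through -toolexec is approved.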
eslint
What about just running our ESLint linter? ESLint supports JavaScript files as configs, so Claude Code could add an execSync call to eslint.config.js.
// eslint.config.js
const { execSync } = require("child_process");

// This runs at config load time, before any linting happens
execSync("echo 'arbitrary command execution via eslint config'", {
  stdio: "inherit",
});

module.exports = [
  {
    rules: {
      semi: "error",
    },
  },
];
Sure enough, eslint tries to run commands on startup when configured with executable code:
$ npx eslint .
arbitrary command execution via eslint config
... (eslint output continues)
make or pnpm run
Allowlisting any pnpm run command could let Claude Code run any command solely by editing the scripts config in package.json, no understanding of custom ESLint rules required! Allowlisting a make command allows executing any command for similar reasons, since Claude Code can edit the Makefile.
{
  "name": "my-project",
  "scripts": {
    "lint": "echo 'arbitrary command execution via pnpm run' && eslint .",
    "build": "echo 'another arbitrary command' && next build",
    "preinstall": "echo 'runs before install automatically'"
  }
}
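One way to see the indirection is to resolve a script name to the shell command it currently maps to. This is a minimal sketch; resolveScript and the packageJSON type are hypothetical names, and only the scripts field is read.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// packageJSON models only the part of package.json this sketch needs.
type packageJSON struct {
	Scripts map[string]string `json:"scripts"`
}

// resolveScript (hypothetical helper) returns the command string stored
// under scripts[name] in the given package.json contents.
func resolveScript(manifest []byte, name string) (string, error) {
	var pkg packageJSON
	if err := json.Unmarshal(manifest, &pkg); err != nil {
		return "", err
	}
	cmd, ok := pkg.Scripts[name]
	if !ok {
		return "", fmt.Errorf("no script named %q", name)
	}
	return cmd, nil
}

func main() {
	manifest := []byte(`{
  "name": "my-project",
  "scripts": {
    "lint": "echo 'arbitrary command execution via pnpm run' && eslint ."
  }
}`)
	cmd, err := resolveScript(manifest, "lint")
	if err != nil {
		panic(err)
	}
	fmt.Println("pnpm run lint resolves to:", cmd)
}
```

The allowlist entry names the script, but what actually runs is whatever string sits in the file at that moment, and that file is editable.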
Claude Code May Be Able to Run Any Command
If you use file watchers like next dev --turbopack or jest with watchman, Claude Code could still execute any command without Bash being allowlisted! We perform frontend development using next dev --turbopack, which spins up a Next.js server with automatic building and hot reloading when a file is edited. Executable code added at the top level of any API route will run when the file is saved and hot-reloaded.
// app/api/route.ts
import { execSync } from "child_process";

// This code runs when Next.js hot-reloads this file
const output = execSync("echo 'arbitrary command via file watcher'");
console.log(output.toString());

export async function GET() {
  return new Response("ok");
}

// Or via a file watcher like watchman or chokidar
const chokidar = require("chokidar");
const { execSync } = require("child_process");

const watcher = chokidar.watch("./src/**/*.ts", {
  persistent: true,
});

watcher.on("change", (path) => {
  console.log(`File ${path} changed, executing command...`);
  execSync("echo 'arbitrary command on file change'", {
    stdio: "inherit",
  });
});
docker
What about rebuilding docker containers? Since Docker is a tool for executing code, being able to run docker commands enables Claude Code to run any command in a container.
To interact with the host, Claude Code could mount the host filesystem and run in privileged mode. In fact, the docker daemon runs as root by default, so being able to run docker commands may enable Claude Code to run commands as root against the host filesystem as well.
docker run --privileged -v /:/host alpine sh -c "chroot /host bash -c 'echo arbitrary command as root on host'"
Hardcoded docker commands don't fare much better: you can configure mounts and privileged settings by editing the Docker Compose file, and run privileged commands that interact with the host through the USER and RUN instructions in Dockerfiles.
Allowlisting Bash Commands Is a Fraught Exercise
The combination of running developer tools against your codebase and editing your codebase often allows running any code. Development software is designed to execute developer-provided code, and malicious code provided by a malicious developer was not part of the threat model (if this kind of software had a threat model to begin with!). A lot of tools have some method of running arbitrary code as a configuration feature, not a bug.
The challenge of allowlisting only some commands but not others is not specific to Claude Code or Cursor: a lot of Unix binaries were not designed to isolate user privileges, and similarly our developer tools for executing code were not designed on the assumption that a malicious developer might run them.
In fact, even find supports a -exec argument that allows for arbitrary code execution.
This might be part of the reason why the Claude Code npm source has a "Glob" tool with specific prompt guidance:
Glob Tool Description (from Claude Code source):
- Fast file pattern matching tool that works with any codebase size
- Supports glob patterns like "**/*.js" or "src/**/*.ts"
- Returns matching file paths sorted by modification time
- Use this tool when you need to find files by name patterns
- When doing an open ended search that may require multiple
rounds of globbing and grepping, use the Agent tool instead
But Does Command Allowlisting Make Running Unwanted Commands Less Likely?
Sure, Claude Code could perform all of these convoluted code edits or command arguments, but would Claude Code be less likely to emit these kinds of commands and changes than a more conventional Bash command?
Intuitively, we expect this to be true: we would expect Claude Code to emit a Bash:(curl) tool call more often than a Bash:(go build -toolexec 'curl') call. We have not found a great way, however, to precisely quantify that reduction in likelihood.
Should we worry more about allowlisting make and pnpm run than go build with file edits?
In addition, active attempts at prompt injection through the inputs we provide to Claude Code may significantly change our likelihood estimates. Still, viewing Claude Code as unhindered at the model and prompt level from emitting any kind of command is a simplifying assumption: the model providers are working on model alignment.
Still, the definition of implicit "wanted" and "unwanted" commands from a prompt is remarkably squishy. In our experience, we have seen Claude Code attempt to run psql and AWS CLI commands. At first blush, this may seem alarming — but it depends on what resource these commands are run against! There are likely many Claude Code users who want it to run psql and AWS CLI.
In addition, we have a containerized test Postgres database in our compose stack, and running psql commands on that database is an expected part of our test and development workflow. Determining the risk profile of the same psql or AWS CLI command based on the resource we are interacting with can be tricky for agentic coding tools (and for humans too)!
An Alternative Form of Permissions Restriction: Sandboxing!
Running these tools on a different host means that these agentic tools are limited by the permissions of that host irrespective of what commands they run. We do still want to run these agentic tools on a privileged host, so we're thrilled to see that Cursor, Claude Code, and Codex have all been releasing sandboxing tools! For macOS users, a lot of these sandboxes use sandbox-exec under the hood. This is the same technique Chromium uses, even though it has been considered deprecated on macOS since 2017.
- Cursor has a sandbox mode for all users that runs terminal commands with sandbox-exec.
- Claude Code has a Sandbox Bash tool
- Claude Code also has an experimental sandbox runtime
- Anthropic provides a devcontainers template for working with Claude Code
- The Codex CLI/IDE extension has an agent sandbox as well
We also recommend sandboxing those watchman processes.
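For readers unfamiliar with sandbox-exec, here is a minimal sketch of wrapping a command so that it would run under an inline profile on macOS. The profile contents and the sandboxedArgv helper are assumptions for illustration and do not reflect how Cursor, Claude Code, or Codex configure their sandboxes. The sketch only builds the argument list; it does not execute anything.

```go
package main

import "fmt"

// noNetworkProfile is an illustrative profile (an assumption, not taken from
// any of the tools above): allow everything except network access.
const noNetworkProfile = `(version 1)
(allow default)
(deny network*)`

// sandboxedArgv (hypothetical helper) prefixes argv so the command runs
// under sandbox-exec with the profile passed inline via -p.
func sandboxedArgv(profile string, argv ...string) []string {
	return append([]string{"sandbox-exec", "-p", profile}, argv...)
}

func main() {
	argv := sandboxedArgv(noNetworkProfile, "go", "test", "./...")
	fmt.Printf("%q\n", argv)
}
```

The point of this design is that the restriction applies to the process regardless of which command or arguments the agent chooses, which is what command allowlisting cannot guarantee.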