Cloud Native: Cloudflare’s Code Mode MCP server is a blueprint for ‘two tools, infinite API’ agents

The fastest way to make an agent “useful” is to give it tools. The fastest way to make an agent “dumb” again is to give it too many tools. That tension is baked into the Model Context Protocol (MCP): tools need descriptions and schemas, and those cost tokens — the same scarce resource you want to spend on reasoning, planning, and actual user work.

Cloudflare’s latest twist on this problem is straightforward and slightly heretical: don’t export 2,500+ API endpoints as 2,500 tools. Export two. Then let the model write code that searches a typed OpenAPI spec and executes the right API calls inside a sandbox. Cloudflare calls the pattern “Code Mode,” and they’ve released an MCP server for the entire Cloudflare API that relies on it.

Under the hood, this is a cloud native story: it’s an API, a spec, a sandbox, and a deployment model that assumes untrusted inputs (including prompt injection) will happen. The agent UX sits on top of a security boundary — exactly how we’ve built distributed systems for the last decade.

The core idea: move tool selection into code, not prose

Traditional MCP servers are “tool-per-operation.” That’s fine for small surfaces. But for large APIs, the context-window math gets absurd. Cloudflare argues that a naive “every endpoint is a tool” approach would require over a million tokens of tool description just to load the API surface area into the model.
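The arithmetic behind that claim is easy to sanity-check. Assuming a rough average of a few hundred tokens per tool description (an illustrative figure, not Cloudflare's measurement), 2,500 endpoints clear a million tokens before the user has said a word:

```javascript
// Back-of-the-envelope cost of "one tool per endpoint".
// The per-tool token figure is illustrative, not measured.
const endpoints = 2500;
const tokensPerToolSchema = 400; // name + prose description + JSON schema

const catalogCost = endpoints * tokensPerToolSchema;
console.log(catalogCost); // 1000000 — paid on every request, before any reasoning
```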

Code Mode takes a different stance: keep a typed representation of the OpenAPI spec outside the model's context, and let the model query that spec via code. The model doesn’t need to see the whole spec; it needs a way to ask questions like “what endpoints mention rulesets under /zones/?”

In Cloudflare’s MCP server, the tool surface area becomes:

  • search(): run a JavaScript async function that inspects the spec object (with $refs pre-resolved).
  • execute(): run a JavaScript async function that calls the Cloudflare API via an authenticated client.
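A search() payload might look like the following — a sketch against a hypothetical pre-resolved spec shape (`spec.paths` keyed by route, each operation carrying a `summary`/`description`), not Cloudflare’s actual object layout:

```javascript
// Sketch: the kind of async function an agent might submit to search().
// `spec` is a hypothetical pre-resolved OpenAPI object; the real shape may differ.
async function searchSpec(spec) {
  const hits = [];
  for (const [path, ops] of Object.entries(spec.paths)) {
    if (!path.startsWith("/zones/")) continue;
    for (const [method, op] of Object.entries(ops)) {
      const text = `${op.summary ?? ""} ${op.description ?? ""}`;
      if (/ruleset/i.test(text)) hits.push({ method, path, summary: op.summary });
    }
  }
  return hits;
}

// A tiny stand-in spec to show the query in action.
const spec = {
  paths: {
    "/zones/{zone_id}/rulesets": {
      get: { summary: "List zone rulesets" },
      post: { summary: "Create a zone ruleset" },
    },
    "/accounts/{account_id}/members": {
      get: { summary: "List account members" },
    },
  },
};
searchSpec(spec).then((hits) => console.log(hits.length)); // 2
```

The point is that only the *query result* — two matching operations — ever enters the context window, not the 2,500-endpoint catalog.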

The key is that both are executed inside a Workers isolate. You’re basically treating “agent tool use” as “user-submitted untrusted code,” and you wrap it in the same kind of sandboxing you’d use for multitenant compute.

Why this matters: context windows are now an architectural constraint

We talk about context windows like they’re a model feature. In practice, they’re an infrastructure limit with budget pressure. Every tool schema competes with:

  • system prompts and safety rails
  • conversation history
  • retrieved documentation and snippets
  • the model’s own chain-of-thought (hidden or not)

Once you accept that, you stop treating MCP servers as “API adapters” and start treating them as “token budgets with an IO boundary.” Code Mode is one answer: keep the interface small, and make the model generate compact programs as the plan.

Security isn’t optional when your tools can change production

Cloudflare’s post is refreshingly explicit: prompt injection and untrusted external content are expected, and the execution environment should not have ambient access to secrets. Their server-side execution uses a sandbox that (by default) avoids file system access, avoids leaking environment variables, and disables outbound fetches unless explicitly allowed.
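That “disabled unless explicitly allowed” posture is cheap to approximate at the boundary. A minimal sketch (not Cloudflare’s implementation; names and hosts are illustrative) of an allowlist-gated fetch the sandbox could hand to model-generated code:

```javascript
// Sketch: an allowlist-gated fetch for sandboxed, model-generated code.
// Hostnames and wiring are illustrative, not Cloudflare's actual sandbox.
function makeGuardedFetch(allowedHosts, realFetch) {
  return async function guardedFetch(url, init) {
    const host = new URL(url).hostname;
    if (!allowedHosts.has(host)) {
      throw new Error(`outbound fetch to ${host} is not allowed`);
    }
    return realFetch(url, init);
  };
}

const guarded = makeGuardedFetch(
  new Set(["api.cloudflare.com"]),
  async () => ({ ok: true }) // stub; real code would pass through globalThis.fetch
);
```

Everything else in the isolate — file system, environment variables — simply isn’t exposed, so there is nothing for injected instructions to exfiltrate.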

This is worth calling out because agent tooling has a tendency to re-learn old lessons. If you’ve ever operated multi-tenant systems, “don’t let arbitrary user input run with your credentials” is not a new insight. But in the agent space, it’s easy to accidentally create exactly that: a model that reads a web page, gets tricked by an instruction, and then calls a privileged tool.

A Workers isolate isn’t magic, but it’s a real boundary — and boundaries are the only thing that keep “helpful automation” from turning into “automated incident.”

A practical workflow: search → narrow → execute

The post’s example (configure DDoS/WAF controls) shows the “agent loop” you want for large APIs:

  1. Discover: use search() to find relevant endpoints and confirm schemas (including enums like ruleset phases).
  2. Plan: decide what needs to change based on current state (list existing rulesets, inspect configurations).
  3. Act: use execute() to chain multiple API requests in one tool call, including pagination and response handling.
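Step 3 might look like the following — a sketch of an execute() payload using a hypothetical authenticated `client`, where the zone ID, endpoint paths, response shape, and pagination fields are all illustrative assumptions, not Cloudflare’s exact API:

```javascript
// Sketch: an async function an agent might submit to execute().
// `client`, the paths, and the pagination fields are illustrative assumptions.
async function execute(client) {
  // 1. List all rulesets for the zone, following pagination.
  const rulesets = [];
  let page = 1;
  while (true) {
    const res = await client.get("/zones/abc123/rulesets", { page });
    rulesets.push(...res.result);
    if (page >= res.result_info.total_pages) break;
    page++;
  }

  // 2. Find the DDoS entry-point ruleset and act on it in the same tool call.
  const ddos = rulesets.find((r) => r.phase === "ddos_l7");
  if (!ddos) return { updated: false, count: rulesets.length };
  await client.patch(`/zones/abc123/rulesets/${ddos.id}`, {
    description: "Tuned by agent",
  });
  return { updated: true, count: rulesets.length };
}
```

One tool call, several API requests, and only the final summary object returns to the model’s context.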

Notice what’s missing: “pick from a giant tool list.” Instead, the model writes code that navigates a spec and uses a client. That’s a closer match to how human engineers actually work with big APIs.

Where this pattern breaks (and where it shines)

Code Mode isn’t a universal win. It shifts complexity from “tool catalog” to “runtime.” You now need:

  • a safe execution environment for model-generated code
  • a stable typed interface (SDK/OpenAPI representation)
  • strong guardrails for what outbound calls are allowed
  • observability on what code ran and what it changed
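The last point is also the cheapest to get right at the boundary. A sketch of an audit wrapper (the `runInSandbox` entry point is hypothetical) that records every program the model ran and what came back:

```javascript
// Sketch: audit-log every model-generated program before/after execution.
// `runInSandbox` is a hypothetical sandbox entry point, not a real API.
function withAudit(runInSandbox, log) {
  return async function auditedRun(code) {
    const entry = { code, startedAt: Date.now() };
    try {
      entry.result = await runInSandbox(code);
      return entry.result;
    } catch (err) {
      entry.error = String(err);
      throw err;
    } finally {
      log.push(entry); // in production: ship to your observability pipeline
    }
  };
}
```

If an agent ever does something surprising, the audit log is the difference between a postmortem and a shrug.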

But for large, fast-evolving APIs, it solves a real operational problem: a tool catalog that grows linearly with endpoint count eventually stops scaling. Two tools stay two, no matter how many endpoints ship.

What platform teams should take away

If you’re building internal MCP servers, Cloudflare’s approach suggests three heuristics:

  • Minimize tool count when the API surface is huge; don’t pay token rent for thousands of endpoints.
  • Prefer typed specs over natural-language endpoint descriptions; code is compressible.
  • Sandbox execution as if every prompt is hostile, because eventually one will be.

In other words: treat agents as distributed systems clients. The best agent tooling will look less like “prompt wizardry” and more like “a careful API runtime with constrained capabilities.” Cloudflare’s two-tool server is a clean step in that direction.

Sources