From Sandboxes to Security: How Cloud Native Infrastructure Is Adapting to the Agentic AI Era

The line between cloud-native infrastructure and agentic AI is getting blurrier by the week. What started as isolated experiments—running Claude Code in a terminal, or asking a Copilot to refactor a module—is quickly becoming a production-grade pattern that demands the same rigor we apply to microservices. The CNCF ecosystem is responding with new tools, new integrations, and new security models built specifically for a world where autonomous agents read files, run commands, and deploy code inside your clusters.

Over the past two weeks, several major projects and vendors have shipped features that signal a broader shift: cloud native is no longer just about orchestrating containers; it is about orchestrating agents. From Falco’s new Prempti tool for agent governance, to Cloudflare’s Claude Managed Agents integration, to k6 2.0’s AI-assisted testing workflows, the infrastructure layer is being retooled for an agent-driven future. This article examines the most significant developments and what they mean for platform teams.

Agents Need Harnesses, and Harnesses Need Feedback

A recent analysis in The New Stack reframed how we should think about coding agents. The argument, drawn from Addy Osmani’s work on agent harness engineering, is that the harness—the prompts, tools, sandboxes, feedback loops, and policies wrapped around a model—often matters more than the model itself. Data from Viv Trivedy showed that the same model, dropped into a different harness, jumped from rank 30 to rank 5 on Terminal Bench 2.0. The implication is clear: if you want better agent outcomes, invest in the scaffolding.

But in cloud-native systems, that scaffolding is harder to build. When an agent works on a single local application, feedback is straightforward: run the dev server, click around, read the test output. In distributed systems, the agent might deploy a canary to Kubernetes, trigger a service mesh route change, or patch a CRD—and the feedback loop spans logs, metrics, traces, alerts, and possibly multiple clusters. The infrastructure to close that loop does not exist by default. It must be built, instrumented, and secured.

This is exactly where the recent wave of CNCF and vendor updates fits in.

Falco Introduces Prempti: Runtime Security for AI Agents

On May 12, the Falco team introduced Prempti, an experimental project that extends Falco’s runtime security model to the AI agent tool-call lifecycle. Falco, a CNCF graduated project and the de facto standard for cloud-native runtime security, has historically monitored containers, Kubernetes, and hosts for anomalous behavior. Prempti applies that same policy-driven detection to what an AI agent does on your machine.

The problem Prempti addresses is real: when Claude Code or Cursor reads your files, runs shell commands, and makes network requests, those actions happen inside your user session with your permissions. Most developers have no structured visibility into that activity. You see the agent’s chat output, but you do not see what happens under the hood.

Prempti runs as a lightweight user-space service alongside the agent. It intercepts tool calls before they execute, evaluates them against Falco rules, and delivers a verdict: Allow, Deny, or Ask. The architecture is simple: a hook fires before each tool call, an interceptor sends the event to Falco via a Unix socket, Falco’s rule engine evaluates it, and the verdict is returned to the agent.
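
The verdict flow described above can be sketched in a few lines of Python. This is an illustration only, not Prempti's code or API: the names Verdict, ToolCall, evaluate, run_with_hook, and both example rules are hypothetical, and the rules are evaluated in-process here rather than being sent to Falco over a Unix socket.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List, Optional


class Verdict(Enum):
    # Ordered by strictness so the strictest matching rule wins.
    ALLOW = 0
    ASK = 1
    DENY = 2


@dataclass(frozen=True)
class ToolCall:
    tool: str       # hypothetical tool name, e.g. "shell" or "read_file"
    argument: str   # command line or file path


# A rule returns a verdict, or None when it does not apply to the call.
Rule = Callable[[ToolCall], Optional[Verdict]]


def evaluate(call: ToolCall, rules: List[Rule]) -> Verdict:
    """Return the strictest verdict any rule produces; default to ALLOW."""
    verdict = Verdict.ALLOW
    for rule in rules:
        result = rule(call)
        if result is not None and result.value > verdict.value:
            verdict = result
    return verdict


def run_with_hook(call, rules, execute, confirm):
    """Pre-execution hook: the tool runs only if the verdict permits it."""
    verdict = evaluate(call, rules)
    if verdict is Verdict.DENY:
        raise PermissionError(f"denied: {call.tool} {call.argument}")
    if verdict is Verdict.ASK and not confirm(call):
        raise PermissionError(f"declined by user: {call.tool} {call.argument}")
    return execute(call)


# Two hypothetical rules, for illustration only.
def deny_env_reads(call: ToolCall) -> Optional[Verdict]:
    if call.tool == "read_file" and call.argument.endswith(".env"):
        return Verdict.DENY
    return None


def ask_before_shell(call: ToolCall) -> Optional[Verdict]:
    return Verdict.ASK if call.tool == "shell" else None


RULES = [deny_env_reads, ask_before_shell]
```

The key design point is that the check happens before execution: a denied call never reaches the tool, and an Ask verdict hands the decision back to the human.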

The default ruleset covers six areas: working-directory boundaries, sensitive path access (SSH keys, AWS credentials, .env files), sandbox disable attempts, threat vectors like pipe-to-shell and encoded payloads, MCP and skill content poisoning, and persistence vectors such as git hook injection. Custom rules go in ~/.prempti/rules/user/ and are preserved across upgrades.
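
To make a few of these areas concrete, here is a rough Python sketch of the kind of checks involved. These are not Prempti's actual rules, which are written as Falco rules; the function names, the regular expression, and the list of sensitive paths are assumptions chosen for illustration, and real detections would need to cover far more cases.

```python
import posixpath
import re

# Deliberately narrow pattern: a download tool piped straight into a shell.
PIPE_TO_SHELL = re.compile(r"\b(?:curl|wget)\b[^|]*\|\s*(?:sudo\s+)?(?:ba|z)?sh\b")


def outside_workdir(path: str, workdir: str) -> bool:
    """True when path, resolved against workdir, escapes the working directory."""
    root = posixpath.normpath(workdir)
    resolved = posixpath.normpath(posixpath.join(root, path))
    return not (resolved == root or resolved.startswith(root.rstrip("/") + "/"))


def is_sensitive_path(path: str) -> bool:
    """True for SSH keys, AWS credentials, and .env files (illustrative list)."""
    parts = posixpath.normpath(path).split("/")
    name = parts[-1]
    return (
        ".ssh" in parts
        or ".aws" in parts
        or name == ".env"
        or name.startswith(".env.")
    )


def is_pipe_to_shell(command: str) -> bool:
    """True when a command downloads content and pipes it into a shell."""
    return PIPE_TO_SHELL.search(command) is not None


def targets_git_hook(path: str) -> bool:
    """True when a path points inside a .git/hooks directory."""
    parts = posixpath.normpath(path).split("/")
    return any(a == ".git" and b == "hooks" for a, b in zip(parts, parts[1:]))
```

Note that the working-directory check normalizes the path before comparing it, so a relative path such as ../../etc/passwd is caught even though it never starts with a slash.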

Prempti is not a sandbox; it is a policy layer at the agent level. For deep syscall-level visibility, Falco’s kernel instrumentation (eBPF or kernel module) remains the right tool. But for developers who want structured visibility and enforcement without root access, Prempti fills a gap that kernel-level tooling does not.

Cloudflare Becomes the Agent Cloud

On May 19, Cloudflare announced an integration with Anthropic’s Claude Managed Agents, giving developers a fast, isolated execution environment for autonomous code delivery. The pitch is compelling: run the agent loop on Anthropic’s platform (the “brain”), but execute code, secure connections, and run custom tool calls on Cloudflare’s edge (the “hands”).

Cloudflare has spent the past year building out its Developer Platform for exactly this use case. It now offers Sandboxes for stateful Linux microVMs, an Agents SDK for customizable agent frameworks, Browser Run for programmable agent browsers, and Dynamic Workers for sandboxed code execution at scale. The Claude integration brings these together into a turnkey deployment template.

The security model is particularly interesting. Cloudflare sandboxes can use an outbound proxy for dynamic, zero-trust authentication to private services. This lets you inject secrets into requests outside the sandbox, so the agent never has access to them. Combined with Cloudflare Mesh and Workers VPC, agents can connect to internal services running on AWS or on-prem without ever exposing them to the open Internet.
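
The secret-injection idea is easier to see in code. The Python sketch below is a conceptual model only and does not use any Cloudflare API: OutboundRequest, PROXY_SECRETS, proxy_outbound, and the host name are all hypothetical. It assumes the proxy holds a per-host token table that the sandbox cannot read.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class OutboundRequest:
    host: str
    path: str
    headers: Dict[str, str] = field(default_factory=dict)


# Lives in the proxy, outside the sandbox. The agent never sees these values.
PROXY_SECRETS = {"api.internal.example": "token-held-outside-sandbox"}


def proxy_outbound(request: OutboundRequest, secrets: Dict[str, str]) -> OutboundRequest:
    """Return the request as forwarded upstream, with credentials attached.

    Hosts without an entry in the secret table are rejected, so the table
    doubles as an egress allow-list in this sketch.
    """
    token = secrets.get(request.host)
    if token is None:
        raise PermissionError(f"egress to {request.host} is not allowed")
    # Drop any Authorization header supplied from inside the sandbox, then
    # attach the real credential on the way out.
    headers = {k: v for k, v in request.headers.items() if k.lower() != "authorization"}
    headers["Authorization"] = f"Bearer {token}"
    return OutboundRequest(request.host, request.path, headers)
```

Because the token is added after the request leaves the sandbox, a compromised or confused agent can at worst make requests to allow-listed hosts; it cannot read or exfiltrate the credential itself.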

Cloudflare is also offering a lightweight alternative to microVMs: V8 isolates via Dynamic Workers and Codemode. For agents that need to execute arbitrary code but do not need a full Linux environment, isolates boot in milliseconds and scale to tens of thousands of concurrent agents. If you need the full VM, Cloudflare Containers are available. The choice between microVMs and isolates is a single config toggle.

k6 2.0 Brings AI Into the Testing Loop

Performance testing is another domain being reshaped by agents. Grafana’s k6 2.0 release, announced on May 12, introduces a suite of AI-assisted commands designed to make testing faster, more automated, and more composable with agentic workflows.

The new commands are built around a simple insight: as AI agents generate more code, the need for validation grows proportionally. k6 2.0 adds four commands:

k6 x agent: bootstraps agentic testing workflows.
k6 x mcp: exposes k6 through a Model Context Protocol server.
k6 x docs: gives agents CLI access to documentation.
k6 x explore: browses the extension registry programmatically.

The release also improves Playwright compatibility in the browser module, making it easier to migrate existing browser tests to k6. A new Assertions API brings Playwright-inspired expect() matchers to both protocol and browser tests, with auto-retrying assertions for UI elements and non-retrying assertions for static values like HTTP status codes.
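
The distinction between retrying and non-retrying assertions is worth spelling out. k6 tests are written in JavaScript, so the Python sketch below is not the k6 Assertions API; it only models the two behaviors, and the names expect_equal and expect_eventually, along with the timeout and interval defaults, are assumptions.

```python
import time
from typing import Any, Callable


def expect_equal(actual: Any, expected: Any) -> None:
    """Non-retrying: the value is already final (e.g. an HTTP status code)."""
    if actual != expected:
        raise AssertionError(f"expected {expected!r}, got {actual!r}")


def expect_eventually(
    read: Callable[[], Any],
    expected: Any,
    timeout: float = 1.0,
    interval: float = 0.01,
) -> None:
    """Retrying: poll a changing value (e.g. UI text) until it matches or times out."""
    deadline = time.monotonic() + timeout
    while True:
        actual = read()
        if actual == expected:
            return
        if time.monotonic() >= deadline:
            raise AssertionError(
                f"expected {expected!r} within {timeout}s, last saw {actual!r}"
            )
        time.sleep(interval)
```

A status code is checked once because retrying cannot change it, whereas a UI element may still be rendering, so polling until a deadline avoids flaky failures without hiding real ones.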

For teams running tests at scale, k6 2.0 adds a JSON summary output for machine-readable CI/CD consumption and native OpenTelemetry output for real-time observability. The k6 Operator 1.0 is now stable, enabling distributed performance testing on Kubernetes.

OpenTelemetry Expands: Blueprints, Profiles, and GenAI Conventions

None of this agent infrastructure matters without observability. OpenTelemetry introduced Blueprints and Reference Implementations on May 12, a new effort to reduce the complexity of adopting OTel across the stack. The project addresses a real pain point: users often have to piece together SDK configuration, Collector deployments, instrumentation libraries, and semantic conventions from scattered documentation. Blueprints provide opinionated, end-to-end reference architectures for common deployment patterns.

OTel also continues to expand its signal coverage. The Profiles signal entered public Alpha in March, adding continuous production profiling alongside traces, metrics, and logs. And the Semantic Conventions for Generative AI, published in May, standardize how GenAI operations are recorded—including model names, token counts, prompt content, tool calls, and tool results. This is the telemetry foundation that agent harnesses will need to close the feedback loop.
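
As a rough illustration of what that standardization looks like, the Python sketch below builds attribute dictionaries of the kind that would be attached to spans. The gen_ai.* keys follow the GenAI semantic conventions as best understood at the time of writing; the conventions are still evolving, so check the current specification before relying on any name. The helper functions, the model name, and the tool name are hypothetical, and no OpenTelemetry SDK is used here.

```python
from typing import Any, Dict, Iterable


def chat_span_attributes(model: str, input_tokens: int, output_tokens: int) -> Dict[str, Any]:
    """Attributes for a model call, keyed by GenAI semantic convention names."""
    return {
        "gen_ai.operation.name": "chat",
        "gen_ai.request.model": model,
        "gen_ai.usage.input_tokens": input_tokens,
        "gen_ai.usage.output_tokens": output_tokens,
    }


def tool_span_attributes(tool_name: str) -> Dict[str, Any]:
    """Attributes for a tool call made by the agent."""
    return {
        "gen_ai.operation.name": "execute_tool",
        "gen_ai.tool.name": tool_name,
    }


def total_tokens(spans: Iterable[Dict[str, Any]]) -> int:
    """Sum token usage across spans; spans without usage data count as zero."""
    return sum(
        span.get("gen_ai.usage.input_tokens", 0)
        + span.get("gen_ai.usage.output_tokens", 0)
        for span in spans
    )


# One agent turn: a model call, a tool call, then a second model call.
TRACE = [
    chat_span_attributes("example-model", 1200, 300),
    tool_span_attributes("run_tests"),
    chat_span_attributes("example-model", 1800, 150),
]
```

Because every producer uses the same keys, a harness can compute something like total_tokens(TRACE) without knowing which agent framework or vendor emitted the spans.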

At scale, the challenges are well documented. Skyscanner’s recent case study describes managing collectors across 24 production Kubernetes clusters and over 1,000 microservices, and Adobe’s pipeline runs thousands of collectors per signal type. Agent feedback loops will depend on telemetry backbones like these.

Kubernetes Keeps Evolving Underneath It All

While much of the attention is on AI integrations, Kubernetes itself continues to mature. In v1.36, the Mixed Version Proxy (MVP) graduated to Beta and is enabled by default. MVP solves a subtle but serious problem: during a control plane upgrade, API servers of different versions run side by side, and a request can land on an older API server that does not yet serve the requested resource. That server returns an incorrect 404 Not Found. MVP instead proxies the request to a peer API server that can serve it, preventing mistaken garbage collection or blocked namespace deletions.
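
The routing decision can be modeled in a few lines of Python. This is a simplified sketch, not the API server's implementation: route_request, the server names, and the resource strings are hypothetical, and how peers discover which resources each one serves is out of scope here.

```python
from typing import Dict, Set


def route_request(resource: str, local_served: Set[str], peers: Dict[str, Set[str]]) -> str:
    """Decide how an API server should handle a request for `resource`.

    Returns "local" if this server serves it, "proxy:<peer>" if a peer does,
    and "404" only when no server in the control plane serves it.
    """
    if resource in local_served:
        return "local"
    for peer_name in sorted(peers):
        if resource in peers[peer_name]:
            return f"proxy:{peer_name}"
    return "404"


# Mid-upgrade control plane: the old server lacks a resource the new one serves.
OLD_SERVER = {"apps/v1/deployments"}
PEERS = {"apiserver-new": {"apps/v1/deployments", "example.io/v1/widgets"}}
```

Without the proxy step, the old server would answer 404 for example.io/v1/widgets even though the resource exists, which is exactly the false negative that can mislead controllers such as the garbage collector.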

v1.36 also introduces a new alpha metric, route_controller_route_sync_total, to support A/B testing of the CloudControllerManagerWatchBasedRoutesReconciliation feature gate. The watch-based approach only reconciles routes when nodes actually change, reducing unnecessary API calls to infrastructure providers—an important efficiency gain for large-scale clusters.
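
A toy model shows why the watch-based approach saves work. The Python sketch below is not the route controller's code: it assumes a reconcile is triggered by any change to the set of node names, which is a simplification, and the function names and sample data are hypothetical. A counter like route_controller_route_sync_total is the kind of signal one would compare between the two modes.

```python
from typing import FrozenSet, List


def count_periodic_syncs(snapshots: List[FrozenSet[str]]) -> int:
    """Periodic loop: reconcile routes on every tick, changed or not."""
    return len(snapshots)


def count_watch_syncs(snapshots: List[FrozenSet[str]]) -> int:
    """Watch-based: reconcile only when the node set differs from the last one seen."""
    syncs = 0
    previous = None
    for nodes in snapshots:
        if nodes != previous:
            syncs += 1
            previous = nodes
    return syncs


# Six observation points; the node set changes twice after the initial sync.
SNAPSHOTS = [
    frozenset({"node-a", "node-b"}),
    frozenset({"node-a", "node-b"}),
    frozenset({"node-a", "node-b", "node-c"}),
    frozenset({"node-a", "node-b", "node-c"}),
    frozenset({"node-a", "node-b", "node-c"}),
    frozenset({"node-a", "node-c"}),
]
```

In this example the periodic loop performs six reconciles and the watch-based one performs three, and the gap widens as clusters get larger and quieter.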

What This Means for Platform Teams

The convergence of AI agents and cloud-native infrastructure is not a hypothetical future. It is happening now, across multiple layers of the stack. Platform teams should be thinking about three things:

Agent governance: Tools like Prempti give you policy-driven visibility into what agents do on your infrastructure. If you do not know what your agents are touching, you cannot secure it.
Agent infrastructure: Sandboxes, isolates, and VPC connectivity are becoming first-class primitives. Cloudflare’s Claude integration is a template; expect more vendors to follow.
Agent observability: OTel’s GenAI conventions and k6’s MCP integration show that observability is being rebuilt around the assumption that agents, not just humans, are the consumers of telemetry.

The harness framing is useful here. A good agent harness needs tools (MCP servers, CLI access), policies (Falco rules, egress controls), feedback (OTel traces, test assertions), and infrastructure (sandboxes, Kubernetes, service mesh). The CNCF ecosystem is building all of these, but they are not yet a turnkey package. Platform teams that assemble them early will have a significant operational advantage.

The next phase will likely involve standardization: common agent identity models, standardized harness APIs, and perhaps even a Kubernetes-native agent controller. For now, the pieces are falling into place rapidly. The cloud-native stack is becoming the agent-native stack—and that is a transformation worth watching closely.