The Agentic AI Era Has Officially Arrived
At Google I/O 2026, CEO Sundar Pichai did not mince words. The keynote was titled “Welcome to the agentic Gemini era.” That is not marketing fluff. It is a declaration that the industry has shifted from chatbots that answer questions to agents that take action. Ten years after Google pivoted to AI-first, the company is now all-in on autonomous systems that work across Search, Docs, Maps, YouTube, and the web itself.
The numbers back up the ambition. Google is now processing over 3.2 quadrillion tokens per month across its surfaces, a 7x jump from the prior year. The Gemini app has surged past 900 million monthly active users, more than doubling in twelve months. Daily requests have grown over 7x in the same period. AI Overviews in Search now reaches 2.5 billion users. The world is using AI at a scale that would have seemed impossible just two years ago, when monthly token counts were measured in single-digit trillions.
But what makes this year different is not the scale. It is the shift from passive assistance to active agency.
Google’s Agentic Stack: Spark, Flash, and Antigravity
Google I/O 2026 unveiled a coherent agentic platform rather than a collection of disjointed features. At the center is Gemini 3.5 Flash, which Google says is both frontier-capable and four times faster than competing models in output tokens per second. Pichai positioned it in the “top right quadrant” of the intelligence-versus-speed chart, delivering frontier-level accuracy at less than half the price of comparable models. Google claims that a company processing a trillion tokens daily would save over $1 billion annually by shifting 80% of its workloads to 3.5 Flash.
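As a rough sanity check on that savings claim, the arithmetic can be sketched in a few lines of Python. The inputs are only the figures quoted above (one trillion tokens per day, 80% shifted, $1 billion saved per year). The 365-day year, the constant daily volume, and the function name are assumptions made for illustration, and the result is an implied average saving, not a published price.

```python
def implied_saving_per_million_tokens(
    tokens_per_day: float,
    shifted_fraction: float,
    annual_saving_usd: float,
    days_per_year: int = 365,  # assumption: volume is constant all year
) -> float:
    """Average saving per million shifted tokens implied by an annual total."""
    shifted_tokens_per_year = tokens_per_day * shifted_fraction * days_per_year
    return annual_saving_usd / (shifted_tokens_per_year / 1_000_000)


# Figures quoted in the keynote claim: 1 trillion tokens/day, 80% shifted, $1B/year.
saving = implied_saving_per_million_tokens(1e12, 0.80, 1e9)
print(f"Implied saving: ${saving:.2f} per million tokens")
```

Under these assumptions the claim works out to an average saving of roughly $3.42 per million shifted tokens, which is the number to compare against whatever a given workload currently pays.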
The model powers the most visible new product: Gemini Spark, a 24/7 personal AI agent that runs on dedicated virtual machines in Google Cloud. Spark integrates with tools through the Model Context Protocol (MCP), operates in the background even when devices are offline, and will soon work through email, chat, and within Chrome itself. It is Google’s answer to the question of what happens when an AI assistant stops waiting for prompts and starts proactively managing your digital life.
Behind the consumer-facing Spark sits Antigravity 2.0, a desktop application for orchestrating cohorts of autonomous agents. Google has been using an optimized version of Flash internally that is reportedly 12x faster than other frontier models. The company also disclosed that its internal AI developer tools now process over three trillion tokens per day, a volume it says is doubling every few weeks.
Google is backing this push with staggering infrastructure spending. Capital expenditures this year are expected to reach $180 to $190 billion, roughly six times the $31 billion spent in 2022. A key part of that investment is custom silicon: the eighth generation of TPUs introduces a dual-chip architecture with TPU 8t for training and TPU 8i for inference. Google can now distribute training across more than one million TPUs globally, creating what it calls the largest training cluster in the world. For inference, TPU 8i is designed to minimize latency while delivering up to two times better performance per watt.
Search itself is being rebuilt for the agentic era. Information agents will run 24/7 in the background, proactively finding what users need. Search will also gain agentic coding capabilities, building custom dynamic layouts and interactive visuals for individual queries. For longer-running tasks, it will construct persistent custom dashboards that users can return to, effectively creating “mini apps” inside Search powered by Antigravity.
OpenAI’s Self-Improving Agents
While Google was making headlines at I/O, OpenAI published a detailed engineering account of how it built self-improving tax preparation agents using Codex. Working with Thrive Holdings and Crete’s network of over 30 accounting firms, OpenAI deployed a system that processed 7,000 tax returns during the season and measurably improved itself over six weeks.
The results are striking. At launch, only 25% of returns reached 75% correct field completion. Within six weeks, the share of returns clearing that bar had climbed to 86%. The system expanded from simple W-2 and 1099 forms into complex schedules with K-1s and rental properties. The key innovation is a three-part loop: practitioner corrections generate structured evidence; production traces turn that evidence into evaluation targets; and Codex investigates, proposes fixes, validates them against regression suites, and ships improvements autonomously.
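The shape of that loop can be illustrated with a toy sketch. This is not OpenAI's implementation: every name here (Correction, EvalCase, improvement_step) is hypothetical, the "agent" is a plain lookup function, and the candidate fix is supplied by hand where the real system would have Codex propose it. The sketch only shows the gating logic: corrections become evaluation cases, and a candidate ships only if it improves on the new cases without regressing on the existing suite.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Correction:
    """A practitioner's fix to one field on one return (hypothetical shape)."""
    return_id: str
    field: str
    agent_value: str
    corrected_value: str


@dataclass(frozen=True)
class EvalCase:
    """An evaluation target derived from a correction."""
    return_id: str
    field: str
    expected: str


Agent = Callable[[str, str], str]  # (return_id, field) -> predicted value


def to_eval_cases(corrections: list[Correction]) -> list[EvalCase]:
    """Turn structured corrections into evaluation targets."""
    return [EvalCase(c.return_id, c.field, c.corrected_value) for c in corrections]


def pass_rate(agent: Agent, cases: list[EvalCase]) -> float:
    """Fraction of cases where the agent's value matches the expected one."""
    if not cases:
        return 1.0
    return sum(agent(c.return_id, c.field) == c.expected for c in cases) / len(cases)


def improvement_step(current, candidate, new_cases, regression_suite):
    """Ship the candidate only if it fixes new cases and does not regress."""
    no_regression = pass_rate(candidate, regression_suite) >= pass_rate(current, regression_suite)
    improves = pass_rate(candidate, new_cases) > pass_rate(current, new_cases)
    if no_regression and improves:
        return candidate, regression_suite + new_cases  # new cases join the suite
    return current, regression_suite


def make_agent(answers: dict) -> Agent:
    return lambda return_id, field: answers.get((return_id, field), "")


# Toy data: the baseline misses rental income on return r2; the candidate fixes it.
baseline_answers = {("r1", "wages"): "50000", ("r2", "rental_income"): "0"}
baseline = make_agent(baseline_answers)
candidate = make_agent({**baseline_answers, ("r2", "rental_income"): "12000"})

regression = [EvalCase("r1", "wages", "50000")]
corrections = [Correction("r2", "rental_income", "0", "12000")]
shipped, regression = improvement_step(
    baseline, candidate, to_eval_cases(corrections), regression
)
```

The design point is that each accepted correction permanently enlarges the regression suite, so later candidates are held to everything practitioners have already flagged.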
This is not a lab demo. It is a production system that shows agents can learn from real-world failure signals rather than requiring engineers to manually inspect every edge case. OpenAI has also open-sourced Symphony, its orchestration framework for building such loops, and published a Frontier Governance Framework aligning its safety practices with emerging regulations including California’s Transparency in Frontier AI Act and the EU AI Act.
The Competition Consolidates
Mistral renamed its consumer product from Le Chat to Vibe, positioning it as a unified agent for both work and code. Vibe’s Work Mode handles multi-step tasks across Google Workspace, Slack, GitHub, and custom connectors. Its Code Mode launches remote coding agents from a web surface or through a new VS Code extension. The Vibe CLI supports session teleportation between local terminals and the cloud, custom skills as slash commands, and sub-agents for specialized tasks.
Cohere released Command A+, a 218-billion-parameter mixture-of-experts model optimized for agentic workflows, tool use, and multimodal document processing, under an Apache 2.0 license. It expands language support from 23 to 48 languages and runs on as little as two NVIDIA H100s thanks to aggressive quantization. Cohere claims it is the first model to consolidate reasoning, multimodal understanding, tool use, and multilingual capability into a single open-weights package.
Anthropic is taking a different tack with Project Glasswing, a security initiative bringing together Amazon Web Services, Apple, Broadcom, Cisco, CrowdStrike, Google, JPMorgan Chase, the Linux Foundation, Microsoft, NVIDIA, and Palo Alto Networks. The goal is to secure the world’s most critical software supply chains, a foundational requirement if agentic systems are to operate safely at scale across enterprise infrastructure.
The Reality Check: Frontier Models Still Struggle with Enterprise Tasks
For all the excitement, a new benchmark serves as a sobering reminder of how far agentic AI still has to go. ITBench-AA, developed by Artificial Analysis and IBM, evaluates frontier models on real-world Site Reliability Engineering tasks including Kubernetes incident diagnosis. The results: even the best models score below 50%.
Claude Opus 4.7 leads at 47%, followed by GPT-5.5 at 46% and Qwen3.7 Max at 42%. The benchmark requires agents to read logs, trace dependencies, and identify root-cause entities across complex infrastructure snapshots. Models that over-investigate, averaging 80+ turns per task, often perform worse than more focused agents. The open-weights Gemma 4 31B scores 37% at $0.14 per task, outperforming Gemini 3.1 Pro Preview at $2.23 per task on both accuracy and cost.
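To put the cost gap in concrete terms, the two per-task prices quoted above can be compared directly. This is a small sketch that uses only those two figures; the batch size of 1,000 tasks is an arbitrary illustration, not a number from the benchmark.

```python
# Per-task costs in USD as quoted in the ITBench-AA results above.
cost_per_task_usd = {
    "Gemma 4 31B": 0.14,
    "Gemini 3.1 Pro Preview": 2.23,
}

# How many times more expensive the pricier model is per task.
ratio = cost_per_task_usd["Gemini 3.1 Pro Preview"] / cost_per_task_usd["Gemma 4 31B"]

# Cost of a hypothetical batch of 1,000 tasks for each model.
batch_size = 1_000
batch_cost = {name: cost * batch_size for name, cost in cost_per_task_usd.items()}

print(f"Cost ratio: {ratio:.1f}x")
for name, cost in batch_cost.items():
    print(f"{name}: ${cost:,.0f} per {batch_size:,} tasks")
```

On those figures the open-weights model is roughly 16 times cheaper per task, a gap that adds up quickly when an incident-diagnosis agent runs continuously.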
The message is clear: agents are impressive in controlled environments, but enterprise infrastructure tasks remain a frontier that even the most advanced models have yet to cross.
Infrastructure and Security for the Agentic Era
As agents gain authority to act across systems, the attack surface expands dramatically. NVIDIA is addressing this with DOCA in-silicon security on BlueField-4 DPUs, embedding threat detection, data access control, and network policy enforcement directly into AI factory hardware. The company claims runtime threat detection up to 1,000x faster than software-only approaches, operating independently of the host system so detection remains intact even if workloads are compromised.
The terminology around agents is also maturing. A new glossary from Hugging Face attempts to standardize concepts that have blurred together: a model generates text, a harness executes the loop that makes it act, and scaffolding defines the behavior through prompts, tools, and context management. An agent is the complete system: model plus harness plus scaffolding. This clarity matters as the field moves from experimentation to engineering discipline.
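The three terms are easier to keep apart when each maps to a separate piece of code. The sketch below is a toy illustration of the distinction, not the glossary's own code: the model is a stub function rather than a real LLM, and the "CALL"/"ANSWER" text protocol, the Scaffolding class, and the add tool are all assumptions invented for the example.

```python
from dataclasses import dataclass
from typing import Callable

# Model: text in, text out. A stub here; a real one would call an LLM.
Model = Callable[[str], str]


@dataclass
class Scaffolding:
    """Defines behavior: the prompt, the available tools, and context limits."""
    system_prompt: str
    tools: dict[str, Callable[[str], str]]
    max_context_entries: int = 20  # crude context management


def harness(model: Model, scaffold: Scaffolding, task: str, max_turns: int = 5) -> str:
    """Executes the loop: call the model, run any requested tool, repeat."""
    context = [scaffold.system_prompt, f"TASK: {task}"]
    for _ in range(max_turns):
        # Always keep the prompt and task, plus the most recent history entries.
        window = context[:2] + context[2:][-scaffold.max_context_entries:]
        reply = model("\n".join(window))
        if not reply.startswith("CALL "):
            return reply  # anything that is not a tool call is the final answer
        name, _, arg = reply[len("CALL "):].partition(" ")
        tool = scaffold.tools.get(name)
        result = tool(arg) if tool else f"unknown tool {name}"
        context.append(f"{reply}\nRESULT: {result}")
    return "gave up: turn limit reached"


def stub_model(context: str) -> str:
    """Stand-in model: requests one tool call, then reports the result."""
    if "RESULT: " in context:
        return "ANSWER " + context.rsplit("RESULT: ", 1)[1]
    return "CALL add 2,3"


scaffold = Scaffolding(
    system_prompt="Use tools when needed.",
    tools={"add": lambda arg: str(sum(int(x) for x in arg.split(",")))},
)

# The agent is the combination: model plus harness plus scaffolding.
answer = harness(stub_model, scaffold, "What is 2 + 3?")
print(answer)
```

Swapping the stub for a real model, or the scaffolding for a different prompt and tool set, changes the agent without touching the loop, which is the separation the glossary is trying to name.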
What It All Means
The agentic AI era is not a future possibility. It is the present reality, being built by Google, OpenAI, Mistral, Cohere, NVIDIA, and dozens of others simultaneously. The products announced in the past month represent a coherent shift: models are getting faster and cheaper, harnesses are getting more capable, and agents are moving from demos to production systems that handle real workloads.
Yet the ITBench-AA results are a necessary corrective. The gap between demo and deployment remains wide. Self-improving loops like OpenAI’s Codex-driven tax system point to how that gap closes: not through bigger models alone, but through tighter feedback loops between production use, evaluation, and autonomous improvement. The winners in this era will not just build the smartest models. They will build the systems that learn fastest from real-world use.
For developers and enterprises, the practical takeaway is to start experimenting now. The infrastructure, tooling, and open models are available. The agentic era is no longer coming. It is here, it is shipping, and it is learning.
Sources
- Google I/O 2026: Welcome to the agentic Gemini era — Sundar Pichai keynote transcript and announcements
- Building self-improving tax agents with Codex — OpenAI engineering blog
- OpenAI’s Frontier Governance Framework
- Vibe gets to work — Mistral AI product announcement
- Introducing Command A+ — Cohere open-source release
- ITBench-AA: Frontier Models Score Below 50% — Artificial Analysis and IBM
- Harness, Scaffold, and the AI Agent Terms Worth Getting Right — Hugging Face glossary
- Advancing AI Infrastructure for Agentic AI with NVIDIA DOCA
- Anthropic News — Project Glasswing, Claude Design
