“Agentic AI” becomes real the moment you try to operate it. The first demo works on a laptop. The second demo needs state. The third demo needs audit logs, retries, credentials, and a way to keep tools from doing something dumb at 2 AM.
One useful framing for 2026 is a three-layer stack:
- LangGraph for durable, stateful workflows (graphs that can resume after failure).
- MCP for standardized tool integration and boundary-setting (what tools exist, what they can do, what context they see).
- Ollama for local inference (lower latency, improved privacy, predictable cost).
This combination shows up in the field because it maps nicely to platform engineering instincts: separate concerns, make state explicit, and put hard seams between components.
Why monolithic chatbots don’t scale
A single “chatbot + plugins” design tends to collapse under operational pressure:
- State is implicit and fragile (context windows, prompt history, hidden memory).
- Tool calls are ad hoc (no consistent contract, no governance).
- Failures are unrecoverable (one timeout and the whole interaction resets).
- Observability is shallow (hard to know which step caused a bad action).
A graph-based workflow plus standardized tool boundaries is a pragmatic response.
Component roles in a production-minded stack
LangGraph: durable execution and state checkpoints
LangGraph’s value is that it treats an agent workflow as a graph with explicit state transitions. That enables checkpoints and resumption. From an ops perspective, this is the difference between “it usually works” and “we can retry step 7 after the database comes back.”
In practice, you want:
- A persistent store for checkpoints (SQLite for local, Postgres for shared environments).
- Idempotent nodes (rerunning a node shouldn’t double-apply a change).
- A dead-letter / manual review path for high-risk actions.
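The checkpoint-and-resume pattern above can be sketched in plain Python. This is a minimal illustration of what LangGraph's checkpointers give you out of the box, not the real LangGraph API: node names, the `pipeline` list, and the table schema are all hypothetical.

```python
import json
import sqlite3

# Two illustrative workflow nodes. Each takes and returns the full state
# dict, and each is idempotent: rerunning it is safe.
def fetch_ticket(state):
    state["ticket"] = {"id": 42, "title": "disk full"}
    return state

def summarize(state):
    state["summary"] = f"Ticket {state['ticket']['id']}: {state['ticket']['title']}"
    return state

pipeline = [("fetch_ticket", fetch_ticket), ("summarize", summarize)]

def run(db_path, run_id):
    db = sqlite3.connect(db_path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS checkpoints "
        "(run_id TEXT, step INTEGER, state TEXT, PRIMARY KEY (run_id, step))"
    )
    # Resume from the last persisted checkpoint, if any.
    row = db.execute(
        "SELECT step, state FROM checkpoints WHERE run_id=? ORDER BY step DESC LIMIT 1",
        (run_id,),
    ).fetchone()
    start, state = (row[0] + 1, json.loads(row[1])) if row else (0, {})
    for i in range(start, len(pipeline)):
        name, node = pipeline[i]
        state = node(state)
        # Persist after every node so a crash mid-run resumes here, not at step 0.
        db.execute("INSERT OR REPLACE INTO checkpoints VALUES (?, ?, ?)",
                   (run_id, i, json.dumps(state)))
        db.commit()
    return state
```

Swap the SQLite path for a Postgres-backed store in shared environments; the shape of the loop is the same.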
MCP: tool boundaries, not tool sprawl
MCP is useful when you stop treating tools as random code snippets and start treating them as interfaces. A tool is a declared capability with a schema and behavior that clients can reason about.
In an ops setting, MCP becomes a governance surface:
- Which tools exist?
- Which tools can run without approval?
- Which tools can access which data sources?
- How do you log and audit tool calls?
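Those governance questions can be answered in code. Here is a hedged sketch of a tool registry in the spirit of MCP: each tool is a declared capability with a schema and governance metadata. The class and function names are illustrative; the real protocol is JSON-RPC between an MCP client and server.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Tool:
    name: str
    description: str
    input_schema: dict          # JSON-Schema-style declaration clients can inspect
    requires_approval: bool     # governance flag: gate destructive actions
    handler: Callable[[dict], dict]

registry: dict[str, Tool] = {}

def register(tool: Tool) -> None:
    registry[tool.name] = tool

def call(name: str, args: dict, approved: bool = False) -> dict:
    tool = registry[name]
    if tool.requires_approval and not approved:
        raise PermissionError(f"{name} requires human approval")
    # Audit every call; a real system ships this to a log store.
    print(f"AUDIT tool={name} args={args}")
    return tool.handler(args)

register(Tool(
    name="create_ticket",
    description="Open a ticket in the tracker",
    input_schema={"type": "object", "properties": {"title": {"type": "string"}}},
    requires_approval=False,
    handler=lambda args: {"ticket_id": 1, "title": args["title"]},
))
```

The point is the contract: a client can enumerate `registry`, read each schema, and know before calling which tools are gated.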
Ollama: local inference for cost and control
Ollama’s appeal is straightforward: run modern open models locally (CPU or GPU), reduce latency, and avoid external API costs for every step of a long workflow. For many internal assistants, “good enough” local models are preferable to “best possible” remote models if the workflows touch sensitive data.
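Calling Ollama from a workflow step is a plain HTTP request to its local API. The sketch below uses Ollama's default `/api/generate` endpoint; it assumes a running Ollama daemon on the default port, and the model tag in the comment is just an example.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default endpoint

def build_request(model: str, prompt: str) -> bytes:
    # stream=False asks for one JSON object instead of a token stream,
    # which is simpler to handle inside a workflow node.
    return json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()

def generate(model: str, prompt: str) -> str:
    req = urllib.request.Request(
        OLLAMA_URL,
        data=build_request(model, prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Requires a running Ollama daemon and a pulled model, e.g.:
# generate("llama3.1:8b", "Classify this alert: 'disk 95% full'")
```

Because it is just HTTP, swapping the model plane (or pointing one step at a remote provider) does not change the orchestrator or the tool plane.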
A reference architecture (simple but real)
Here’s a blueprint that maps to how platform teams already deploy services:
- Agent Orchestrator (LangGraph runtime): runs graphs, manages retries, checkpointing.
- Tool Plane (MCP servers): each server owns a domain (tickets, metrics, repos, deployment).
- Model Plane (Ollama): provides local inference endpoints; optionally a hybrid path for high-accuracy steps.
- Storage: checkpoint DB + audit log store.
- Policy: allowlists, approval gates, and secrets scoping.
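The policy layer can be as simple as a table the orchestrator consults before every tool call. A minimal sketch, with hypothetical agent and tool names:

```python
# Illustrative policy table: which tools each agent may call,
# and which calls need a human gate before executing.
POLICY = {
    "incident-agent": {
        "allowed_tools": ["get_metrics", "create_ticket", "restart_service"],
        "needs_approval": ["restart_service"],
    },
}

def authorize(agent: str, tool: str) -> str:
    rules = POLICY.get(agent)
    if rules is None or tool not in rules["allowed_tools"]:
        return "deny"
    return "approve_first" if tool in rules["needs_approval"] else "allow"
```

Keeping this table outside the prompt means the model cannot talk its way past it.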
The payoff is operational clarity: when something breaks, you know which layer broke.
Guardrails that separate a system from a demo
- Human-in-the-loop by default for destructive actions (delete, rotate, downgrade).
- Tool idempotency: “create ticket” should detect duplicates; “apply config” should be declarative.
- Context minimization: MCP tools should fetch only what’s needed (reduce accidental data leakage).
- Versioning: version MCP servers and their schemas; treat breaking changes like APIs.
- Observability: log every node execution and tool call with correlation IDs.
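The observability guardrail above amounts to structured logging with a shared ID. A minimal sketch (the field names are illustrative):

```python
import json
import time
import uuid

def new_run_id() -> str:
    return uuid.uuid4().hex

def audit(run_id: str, event: str, **fields) -> str:
    # Every node execution and tool call in one workflow run shares run_id,
    # so a bad action can be traced end to end.
    record = {"run_id": run_id, "ts": time.time(), "event": event, **fields}
    line = json.dumps(record)
    print(line)  # ship to your log store in a real deployment
    return line

run_id = new_run_id()
audit(run_id, "node_start", node="summarize")
audit(run_id, "tool_call", tool="create_ticket", args={"title": "disk full"})
```

Grepping one `run_id` then reconstructs the whole interaction, which is exactly what a monolithic chatbot cannot give you.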
When to go hybrid (local + remote)
Local inference is great for routine steps (classification, summarization, extracting structured data). For rare but high-stakes steps (complex reasoning, unusual incidents), a hybrid approach can escalate a step to a stronger remote model—while keeping the tool plane and audit paths unchanged.
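The escalation decision is a routing policy, not a new architecture. A hedged sketch, with illustrative step types and placeholder model names:

```python
# Routine step types stay on the local Ollama model; anything else,
# or anything marked critical, escalates to a stronger remote model.
LOCAL_STEPS = {"classify", "summarize", "extract"}

def route(step_type: str, severity: str = "low") -> str:
    if step_type in LOCAL_STEPS and severity != "critical":
        return "local:llama3.1:8b"      # served by Ollama
    return "remote:frontier-model"      # placeholder for a hosted model
```

Because routing only changes which model endpoint a node calls, the tool plane, checkpoints, and audit logs are identical on both paths.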