When Tiffany Wang and Simon Bourassa from Morgan Stanley took the stage at FluxCon NA 2025, they brought receipts. Their team manages over 500 Kubernetes clusters, 2,000 nodes, 100,000 containers, and tens of thousands of Flux resources. Their talk, “Stairway to GitOps,” documented a five-year journey from traditional push-based pipelines to a self-service GitOps platform that serves one of the world’s largest financial institutions.
The Starting Point: Push-Based Pain
Like many organizations, Morgan Stanley began with conventional CI/CD. Application teams used Helm to push manifests directly to clusters. This worked—until it didn’t. As the deployment footprint grew, familiar problems emerged:
- Configuration drift: Without continuous reconciliation, clusters diverged from the source of truth. Manual changes and failed deployments left systems in inconsistent states.
- Fragile recovery: Rebuilding clusters required extensive coordination. Platform teams could restore infrastructure, but application teams had to manually redeploy workloads—often across time zones, often at inconvenient hours.
The team recognized they needed to decouple delivery from the pipeline and embrace continuous reconciliation—core principles of the GitOps methodology.
Step 1: Security-First Multi-Tenancy
In a regulated financial environment, security isn’t negotiable. Morgan Stanley chose Flux specifically because it fit their strict multi-tenancy requirements. The platform combines Flux’s service account impersonation with native Kubernetes RBAC to enforce least-privilege access: each team’s manifests are reconciled under that team’s own service account, so a reconciliation for one team cannot read or modify another team’s resources.
To streamline adoption, they built a self-service onboarding platform that automates entitlement checks, registers services in their CMDB, and primes target namespaces with the necessary Flux GitRepository and Kustomization resources. New application teams receive scaffolded, ready-to-use repositories without needing deep Kubernetes expertise.
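As a rough illustration of what such scaffolding might generate, the sketch below builds the two Flux objects for one tenant as plain Python dictionaries. The function name, the `-reconciler` service account suffix, the `./deploy` path, the `main` branch, and the example URL are assumptions made for illustration, not details from the talk. `spec.serviceAccountName` is the Kustomization field that enables impersonation.

```python
def tenant_manifests(tenant: str, repo_url: str, interval: str = "10m") -> list[dict]:
    """Build a GitRepository and a Kustomization for one tenant namespace."""
    source = {
        "apiVersion": "source.toolkit.fluxcd.io/v1",
        "kind": "GitRepository",
        "metadata": {"name": tenant, "namespace": tenant},
        "spec": {
            "interval": interval,
            "url": repo_url,
            "ref": {"branch": "main"},  # assumed default branch
        },
    }
    kustomization = {
        "apiVersion": "kustomize.toolkit.fluxcd.io/v1",
        "kind": "Kustomization",
        "metadata": {"name": tenant, "namespace": tenant},
        "spec": {
            "interval": interval,
            "sourceRef": {"kind": "GitRepository", "name": tenant},
            "path": "./deploy",  # hypothetical path inside the scaffolded repo
            "prune": True,
            # Apply manifests as the tenant's own service account, so
            # Kubernetes RBAC limits what this reconciliation can touch.
            "serviceAccountName": f"{tenant}-reconciler",  # hypothetical name
            "targetNamespace": tenant,
        },
    }
    return [source, kustomization]


manifests = tenant_manifests("payments", "https://git.example.com/payments/deploy.git")
```

Serialized to YAML or JSON and applied to the tenant namespace, these two objects are enough for Flux to start reconciling. What the tenant may actually deploy is then decided by the RBAC bound to the service account.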
Step 2: Operating at Scale
With tens of thousands of resources reconciling continuously, performance tuning became essential. The Morgan Stanley team focused on three key areas:
Reconciliation Intervals
The team raised the platform’s default reconciliation intervals, tuning them to balance responsiveness against control plane load. Not every resource needs near-immediate reconciliation, and choosing a longer cadence where it was acceptable reduced unnecessary API calls.
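A back-of-the-envelope calculation shows why the interval matters. The sketch below counts only scheduled reconciliations, assumes every resource shares one interval, and uses a hypothetical fleet of 30,000 Flux resources; the talk gives no exact figure, and each reconciliation can issue several API requests.

```python
def scheduled_reconciles_per_minute(resource_count: int, interval_seconds: int) -> float:
    """Average scheduled reconciliations per minute for resources sharing one interval."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return resource_count * 60 / interval_seconds


# Hypothetical fleet size, chosen only to make the arithmetic concrete.
for interval in (60, 300, 600):
    rate = scheduled_reconciles_per_minute(30_000, interval)
    print(f"interval={interval}s -> {rate:.0f} reconciles/min")
```

Under these assumptions, moving from a 1-minute to a 10-minute interval cuts the scheduled rate tenfold, from 30,000 to 3,000 reconciliations per minute.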
Controller Concurrency
By adjusting the --concurrent flags on Flux controllers, they increased how many reconciliations could happen in parallel. This prevents bottlenecks during bulk operations while maintaining system stability.
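One common way to set this flag is a JSON patch that appends an argument to each controller Deployment. The sketch below generates such patch entries. The worker count of 20 is a hypothetical value, not Morgan Stanley’s setting, and the patch body is serialized as JSON text for simplicity.

```python
import json

FLUX_CONTROLLERS = ("source-controller", "kustomize-controller", "helm-controller")


def concurrency_patches(workers: int, controllers=FLUX_CONTROLLERS) -> list[dict]:
    """Return patch entries that append --concurrent=<workers> to each controller."""
    if workers < 1:
        raise ValueError("workers must be at least 1")
    # JSON Patch (RFC 6902): "-" appends to the end of the args array.
    ops = [{
        "op": "add",
        "path": "/spec/template/spec/containers/0/args/-",
        "value": f"--concurrent={workers}",
    }]
    return [
        {"target": {"kind": "Deployment", "name": name}, "patch": json.dumps(ops)}
        for name in controllers
    ]


patches = concurrency_patches(20)  # 20 is an illustrative value
```

Higher concurrency generally needs more CPU and memory for the controller, so this setting is usually tuned together with resource limits.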
Resource Management
They monitored and adjusted resource limits for Flux components to keep them reliable under sustained load. Giving the controllers appropriate CPU and memory guarantees helps prevent eviction during peak periods.
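One way to give a controller firm guarantees is to set its requests equal to its limits, which places the pod in the Kubernetes Guaranteed QoS class. The talk does not say whether the team did exactly this, so treat the sketch below as one possible approach. The CPU and memory values are hypothetical, and `manager` is assumed to be the container name in the controller Deployment.

```python
def guaranteed_resources_patch(controller: str, cpu: str, memory: str,
                               namespace: str = "flux-system") -> dict:
    """Strategic merge patch that sets requests equal to limits for one controller."""
    resources = {"cpu": cpu, "memory": memory}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": controller, "namespace": namespace},
        "spec": {"template": {"spec": {"containers": [{
            "name": "manager",  # assumed container name; verify against your install
            "resources": {
                "requests": dict(resources),
                "limits": dict(resources),
            },
        }]}}},
    }


# Illustrative sizing only.
patch = guaranteed_resources_patch("kustomize-controller", cpu="2", memory="2Gi")
```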
Step 3: Git to S3 Transition
An interesting architectural decision: Morgan Stanley moved from a self-hosted Git provider to S3 buckets as the source of truth. Driven by high availability and compliance requirements, they built a mechanism to push artifacts from CI to S3.
This transition was possible because Flux’s Source Controller supports multiple backends: Git, Helm repositories, OCI repositories, and S3-compatible buckets. The GitOps Toolkit architecture makes this kind of swap straightforward—you change the source layer while keeping the delivery pipeline intact.
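To make the swap concrete, the sketch below defines a Bucket source and repoints an existing Kustomization at it, leaving every other field untouched. The bucket name, endpoint, region, and secret name are hypothetical. The `v1` API version is an assumption about a recent Flux release, so check it against the CRDs installed in your cluster.

```python
import copy


def bucket_source(name: str, namespace: str, bucket_name: str,
                  endpoint: str, region: str, interval: str = "5m") -> dict:
    """Build a Flux Bucket source for an S3-compatible endpoint."""
    return {
        "apiVersion": "source.toolkit.fluxcd.io/v1",  # assumed; older installs may differ
        "kind": "Bucket",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "provider": "generic",
            "bucketName": bucket_name,
            "endpoint": endpoint,
            "region": region,
            "interval": interval,
            "secretRef": {"name": f"{name}-s3-credentials"},  # hypothetical secret
        },
    }


def switch_source_to_bucket(kustomization: dict, bucket: dict) -> dict:
    """Return a copy of the Kustomization whose sourceRef points at the Bucket."""
    updated = copy.deepcopy(kustomization)
    updated["spec"]["sourceRef"] = {"kind": "Bucket", "name": bucket["metadata"]["name"]}
    return updated
```

Only `spec.sourceRef` changes on the delivery side. The path, interval, pruning, and impersonation settings of the Kustomization stay as they were.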
Observability at Fleet Scale
Managing 500 clusters requires comprehensive visibility. The team built centralized Grafana dashboards providing a unified view of their entire fleet. They extended open-source Flux dashboards with custom metrics from kube-state-metrics, tailored to their developers’ needs.
The Notification Controller plays a critical role, sending success and failure alerts directly to the pipelines and tools developers already use. This closes the feedback loop—developers don’t need to monitor Kubernetes directly to know whether their deployments succeeded.
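A minimal sketch of that wiring, assuming a generic webhook receiver on the developer side: a Provider describes where events go, and an Alert selects which events to forward. The object names and webhook URL are hypothetical, and the `v1beta3` API version is an assumption that should be checked against the installed notification CRDs.

```python
NOTIFICATION_API = "notification.toolkit.fluxcd.io/v1beta3"  # assumed version


def deployment_alerts(tenant: str, webhook_url: str) -> list[dict]:
    """Build a Provider and an Alert that forward Kustomization events for a tenant."""
    provider = {
        "apiVersion": NOTIFICATION_API,
        "kind": "Provider",
        "metadata": {"name": f"{tenant}-webhook", "namespace": tenant},
        "spec": {"type": "generic", "address": webhook_url},
    }
    alert = {
        "apiVersion": NOTIFICATION_API,
        "kind": "Alert",
        "metadata": {"name": f"{tenant}-deployments", "namespace": tenant},
        "spec": {
            "providerRef": {"name": provider["metadata"]["name"]},
            # "info" forwards informational events as well as errors,
            # so both successes and failures reach the receiver.
            "eventSeverity": "info",
            "eventSources": [{"kind": "Kustomization", "name": "*"}],
        },
    }
    return [provider, alert]


objects = deployment_alerts("payments", "https://ci.example.com/hooks/flux")
```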
What’s Next
Even after five years, the Morgan Stanley team continues evolving their platform. Their roadmap includes:
- Flux sharding: Distributing controller load across multiple instances within clusters
- OCI artifacts: Moving toward “Git-less GitOps” for improved performance and security
- Progressive delivery: Adopting Flagger for canary and blue-green deployments
Their journey demonstrates that GitOps at enterprise scale isn’t just possible; it’s an operational reality. The key is treating the platform as a product, iterating based on real user feedback, and using the extensibility of tools like Flux to integrate with existing organizational processes.
Sources
- Flux Blog – “Stairway to GitOps: Scaling Flux at Morgan Stanley” (March 15, 2026)
- FluxCon NA 2025 presentation recording
