Stairway to GitOps: How Morgan Stanley Scaled Flux to 500 Kubernetes Clusters

At FluxCon NA 2025, Morgan Stanley’s platform engineering team presented a case study on running GitOps at financial-services scale. Tiffany Wang and Simon Bourassa described their five-year migration from push-based CI/CD pipelines to a self-service GitOps platform powered by Flux, which now manages over 500 clusters, 2,000 nodes, and 100,000 containers.

The Starting Point: Push-Based Pipelines

Morgan Stanley began with traditional push-based CI/CD. Application teams used Helm to push manifests directly to clusters—a pattern familiar to most enterprises. At smaller scales, this worked. But as adoption grew, two failure modes emerged:

  • Configuration drift: Without continuous reconciliation, clusters inevitably diverged from their source of truth. Manual changes and failed deployments left systems in unknown states.
  • Fragile recovery: Cluster rebuilds required heavy coordination. Platform engineers could restore the infrastructure, but application teams then had to redeploy their workloads manually. Coordinating that across teams and time zones was slow, and it was especially painful during a 2 AM incident response.

These pain points motivated the platform team to decouple delivery from CI/CD pipelines and embrace continuous reconciliation via Flux.
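To make continuous reconciliation concrete, here is a minimal sketch of the idea. It is not Flux’s implementation (Flux applies manifests through the Kubernetes API), and the function names and dictionary-based state model are hypothetical, chosen only to show how a reconciler detects and corrects drift on every pass.

```python
from typing import Any, Dict, List

Manifest = Dict[str, Any]


def detect_drift(
    desired: Dict[str, Manifest], actual: Dict[str, Manifest]
) -> Dict[str, List[str]]:
    """Compare desired state (from the source of truth) with live state.

    Both arguments map a resource key such as "deploy/web" to its manifest.
    """
    return {
        # In the source but missing from the cluster.
        "create": sorted(k for k in desired if k not in actual),
        # Present in both, but the live object was changed out of band.
        "update": sorted(
            k for k in desired if k in actual and desired[k] != actual[k]
        ),
        # In the cluster but no longer in the source.
        "prune": sorted(k for k in actual if k not in desired),
    }


def reconcile(
    desired: Dict[str, Manifest], actual: Dict[str, Manifest]
) -> Dict[str, Manifest]:
    """Run one reconciliation pass and return the converged live state."""
    drift = detect_drift(desired, actual)
    converged = {k: v for k, v in actual.items() if k not in drift["prune"]}
    for key in drift["create"] + drift["update"]:
        converged[key] = desired[key]
    return converged
```

Running reconcile on a fixed interval is what removes the two failure modes above: a manual edit is reverted on the next pass, and a rebuilt (empty) cluster converges to the desired state without anyone redeploying by hand.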

Security and Self-Service

In a regulated financial environment, security controls are non-negotiable. Morgan Stanley leveraged Flux’s native capabilities to implement strict multi-tenancy:

  • Service account impersonation: Flux controllers reconcile each team’s manifests by impersonating a service account scoped to that team, so no tenant has visibility into another team’s resources.
  • Kubernetes RBAC: Native Kubernetes authorization enforces least-privilege access boundaries.
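The two controls above can be sketched as plain manifests built in Python. The Kustomization field spec.serviceAccountName is how Flux is told which account to impersonate; everything else here (the account name "flux-reconciler", the choice of the built-in "edit" ClusterRole, the interval) is an assumption for illustration, since the talk summary does not describe Morgan Stanley’s actual roles.

```python
from typing import Any, Dict, List

Manifest = Dict[str, Any]


def tenant_rbac(namespace: str, service_account: str = "flux-reconciler") -> List[Manifest]:
    """Build a ServiceAccount and a namespace-scoped RoleBinding for one tenant."""
    return [
        {
            "apiVersion": "v1",
            "kind": "ServiceAccount",
            "metadata": {"name": service_account, "namespace": namespace},
        },
        {
            "apiVersion": "rbac.authorization.k8s.io/v1",
            "kind": "RoleBinding",
            "metadata": {"name": f"{service_account}-edit", "namespace": namespace},
            # A RoleBinding that references a ClusterRole grants its rules
            # only inside the RoleBinding's own namespace.
            "roleRef": {
                "apiGroup": "rbac.authorization.k8s.io",
                "kind": "ClusterRole",
                "name": "edit",
            },
            "subjects": [
                {
                    "kind": "ServiceAccount",
                    "name": service_account,
                    "namespace": namespace,
                }
            ],
        },
    ]


def tenant_kustomization(
    name: str,
    namespace: str,
    source_name: str,
    path: str,
    service_account: str = "flux-reconciler",
) -> Manifest:
    """Build a Flux Kustomization that is applied as the tenant's service account."""
    return {
        "apiVersion": "kustomize.toolkit.fluxcd.io/v1",
        "kind": "Kustomization",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "interval": "10m",
            "path": path,
            "prune": True,
            # Flux impersonates this account, so RBAC limits what it can apply.
            "serviceAccountName": service_account,
            "sourceRef": {"kind": "GitRepository", "name": source_name},
        },
    }
```

With this shape, a manifest that targets another team’s namespace is rejected by the API server rather than by convention, because the impersonated account has no binding there.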

Rather than forcing developers to manage low-level Kubernetes resources, the team built a self-service onboarding platform that:

  • Automates entitlement checks and change control processes
  • Registers services in their CMDB (Configuration Management Database)
  • Primes target namespaces with Flux GitRepository and Kustomization resources
  • Scaffolds ready-to-use application repositories
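As a rough illustration of the namespace-priming step in the list above, the sketch below generates the pair of Flux resources an onboarding service might create for a new application. The function name, default branch, path, and intervals are hypothetical; only the resource kinds and field names come from the Flux APIs.

```python
from typing import Any, Dict, List

Manifest = Dict[str, Any]


def prime_namespace(
    service: str,
    namespace: str,
    repo_url: str,
    branch: str = "main",
    path: str = "./deploy",
) -> List[Manifest]:
    """Return the GitRepository and Kustomization that bootstrap one service."""
    git_repository = {
        "apiVersion": "source.toolkit.fluxcd.io/v1",
        "kind": "GitRepository",
        "metadata": {"name": service, "namespace": namespace},
        "spec": {
            "interval": "5m",  # how often the source is polled (assumed value)
            "url": repo_url,
            "ref": {"branch": branch},
        },
    }
    kustomization = {
        "apiVersion": "kustomize.toolkit.fluxcd.io/v1",
        "kind": "Kustomization",
        "metadata": {"name": service, "namespace": namespace},
        "spec": {
            "interval": "10m",  # how often drift is corrected (assumed value)
            "path": path,
            "prune": True,  # delete objects that were removed from the repo
            "sourceRef": {"kind": "GitRepository", "name": service},
            "targetNamespace": namespace,
        },
    }
    return [git_repository, kustomization]
```

Once these two objects exist, the application team only needs to commit to the scaffolded repository; Flux picks up the change on the next poll.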

This demonstrates Flux’s extensibility: the controllers serve as glue between enterprise systems. Developers interact with familiar tooling while CMDB and compliance workflows integrate into GitOps pipelines.

Operating at Scale

The platform team’s scale metrics illustrate the magnitude of their environment:

“And now we have over 500 clusters, over 2,000 nodes, over 100,000 containers, and tens of thousands of Flux resources.”

— Tiffany Wang, Morgan Stanley

To handle this load without overwhelming Kubernetes control planes, the team tuned Flux’s runtime parameters:

  • Reconciliation intervals: Lengthened reconciliation intervals beyond the defaults to balance responsiveness against API server load
  • Controller concurrency: Raised the controllers’ --concurrent flags to increase parallel reconciliation throughput
  • Resource management: Monitored and tuned Flux controller resource limits for sustained reliability
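The trade-off between interval and concurrency can be reasoned about with simple arithmetic. The sketch below is a back-of-the-envelope sizing heuristic, not a formula from the talk or from Flux: it assumes reconciliations are spread evenly over the interval and that each one takes a known average time. The numbers in the usage note are hypothetical per-cluster values.

```python
import math


def reconciliations_per_second(resource_count: int, interval_seconds: float) -> float:
    """Steady-state reconcile rate if every resource runs once per interval."""
    return resource_count / interval_seconds


def min_concurrency(
    resource_count: int, interval_seconds: float, avg_reconcile_seconds: float
) -> int:
    """Smallest worker count that keeps up with the steady-state rate.

    Uses Little's law: work in flight = arrival rate * average duration.
    The result is a floor for a controller's --concurrent setting, not a
    recommendation; bursts after a source change need extra headroom.
    """
    rate = reconciliations_per_second(resource_count, interval_seconds)
    return max(1, math.ceil(rate * avg_reconcile_seconds))
```

For example, 600 resources on a 10-minute interval with a 4-second average reconcile need at least min_concurrency(600, 600, 4.0) = 4 workers, while shortening the interval to 1 minute raises that floor to 40. This is why lengthening intervals and raising concurrency are tuned together.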

Source Migration: From Git to S3

One notable architectural decision was moving the source of truth from a self-hosted Git provider to S3 buckets. Driven by high-availability and compliance requirements, the team built a mechanism that pushes artifacts from CI to S3.

Because Flux’s Source Controller supports Git, OCI, and S3-compatible sources, this transition required only configuration changes—not architectural redesign. The GitOps Toolkit abstraction kept delivery pipelines intact while the source layer changed.
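A sketch of what such a configuration-only switch can look like follows. The Bucket kind and its bucketName, endpoint, region, and provider fields are part of Flux’s source API, but the apiVersion depends on the Flux release in use, and the names, endpoint, and "generic" provider shown here are assumptions rather than details from the talk.

```python
import copy
from typing import Any, Dict

Manifest = Dict[str, Any]


def bucket_source(
    name: str, namespace: str, bucket_name: str, endpoint: str, region: str
) -> Manifest:
    """Build a Flux Bucket source pointing at an S3-compatible bucket."""
    return {
        # Assumed API version; older Flux releases serve Bucket under a beta version.
        "apiVersion": "source.toolkit.fluxcd.io/v1",
        "kind": "Bucket",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "interval": "5m",
            "provider": "generic",  # any S3-compatible endpoint
            "bucketName": bucket_name,
            "endpoint": endpoint,
            "region": region,
            # Hypothetical Secret holding the access credentials.
            "secretRef": {"name": f"{name}-credentials"},
        },
    }


def switch_source(kustomization: Manifest, kind: str, name: str) -> Manifest:
    """Return a copy of a Kustomization that reads from a different source.

    Only spec.sourceRef changes; path, prune, and interval are left untouched.
    """
    updated = copy.deepcopy(kustomization)
    updated["spec"]["sourceRef"] = {"kind": kind, "name": name}
    return updated
```

Calling switch_source(existing, "Bucket", "app-artifacts") on an existing Kustomization changes a single field, which is the sense in which the delivery side stays intact while the source layer moves.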

Observability and Feedback Loops

Managing 500 clusters requires comprehensive visibility. The team built centralized Grafana dashboards that extend Flux’s open-source observability with custom metrics from kube-state-metrics, tailored to developer needs.

They closed the developer experience loop by integrating Flux’s Notification Controller, sending reconciliation events directly to developer-facing pipelines and tools. This provides immediate feedback on GitOps operations without requiring cluster access.
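One way to wire up this kind of feedback is a Provider plus an Alert in the tenant namespace. The sketch below builds both; the webhook address, object names, and severity are hypothetical, and the apiVersion shown is an assumption that varies by Flux release.

```python
from typing import Any, Dict, List

Manifest = Dict[str, Any]

# Assumed API version; check which version your Flux release serves.
NOTIFICATION_API = "notification.toolkit.fluxcd.io/v1beta3"


def reconciliation_alerts(namespace: str, webhook_url: str) -> List[Manifest]:
    """Forward Kustomization events in one namespace to an HTTP endpoint."""
    provider = {
        "apiVersion": NOTIFICATION_API,
        "kind": "Provider",
        "metadata": {"name": "dev-portal", "namespace": namespace},
        # The "generic" provider POSTs each event as JSON to the address.
        "spec": {"type": "generic", "address": webhook_url},
    }
    alert = {
        "apiVersion": NOTIFICATION_API,
        "kind": "Alert",
        "metadata": {"name": "reconciliation-events", "namespace": namespace},
        "spec": {
            "providerRef": {"name": "dev-portal"},
            "eventSeverity": "info",  # include successes as well as errors
            # "*" matches every Kustomization in this namespace.
            "eventSources": [{"kind": "Kustomization", "name": "*"}],
        },
    }
    return [provider, alert]
```

The receiving tool can then show a developer whether a commit was applied or failed, without that developer needing access to the cluster.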

Next Steps

Even after five years, the team continues to evolve its platform:

  • Flux Sharding: Distributing Flux controller load across multiple instances within clusters
  • OCI Artifacts: Evaluating OCI registries as the primary source of truth for improved performance and immutability
  • Progressive Delivery: Planning Flagger adoption for canary and blue-green deployments
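For the sharding item above, Flux assigns a resource to a controller shard through the sharding.fluxcd.io/key label, and each shard instance watches only its own label value. How resources get distributed across shards is left to the platform; the stable-hash assignment below is one hypothetical strategy, not something described in the talk.

```python
import hashlib
from typing import Any, Dict, List

Manifest = Dict[str, Any]

SHARD_LABEL = "sharding.fluxcd.io/key"


def pick_shard(resource_key: str, shards: List[str]) -> str:
    """Map a resource key to a shard deterministically.

    Uses SHA-256 rather than Python's built-in hash(), which is randomized
    per process and would reshuffle assignments on every run.
    """
    digest = hashlib.sha256(resource_key.encode("utf-8")).digest()
    return shards[int.from_bytes(digest[:8], "big") % len(shards)]


def label_for_shard(manifest: Manifest, shards: List[str]) -> Manifest:
    """Return a copy of a Flux resource labeled with its assigned shard."""
    meta = manifest["metadata"]
    key = f'{meta.get("namespace", "")}/{meta["name"]}'
    labels = dict(meta.get("labels", {}))
    labels[SHARD_LABEL] = pick_shard(key, shards)
    return {**manifest, "metadata": {**meta, "labels": labels}}
```

Note that a plain modulo assignment moves many resources when the shard count changes; a platform that resizes shards often would likely want consistent hashing or explicit per-tenant assignment instead.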

The Morgan Stanley case study demonstrates that GitOps principles scale to the largest enterprise environments when combined with operational discipline and thoughtful platform engineering.

Sources