From diffusion language models that break free from token-by-token generation to async batching that reclaims 25% of wasted GPU time, AI inference infrastructure is undergoing a fundamental transformation in 2026.
May 2026 brings major milestones for three CNCF projects: Kyverno 1.18 hardens security post-graduation, Microcks reaches incubation with 2.5M downloads, and Fluid helps NetEase Games cut LLM cold starts from 42 minutes to 30 seconds.
OpenTelemetry reaches CNCF graduated status, k6 2.0 introduces AI-assisted testing workflows, Prometheus 3.12 patches security vulnerabilities, and Kubernetes policy enforcement shifts left.
The CNCF ecosystem is being re-architected for AI workloads — from Fluid’s 30-second LLM cold starts to OpenTelemetry’s GenAI observability standards, Cloudflare’s agent sandboxes, and k6 2.0’s AI-assisted testing.
Kubernetes is evolving into the operating system for the AI era, with new GKE Agent Sandbox, Dynamic Resource Allocation, and AI-powered GitOps operations leading the charge across the ecosystem.
The AI revolution is shifting from training to inference. Explore how vLLM, TensorRT-LLM, and MLOps practices are reshaping computing infrastructure for the inference era.
Kubernetes positions itself as the definitive operating system for AI data centers, backed by 15.6 million cloud native developers and rapidly expanding AI conformance standards.
Ten years after CNCF's founding, the ecosystem has grown to over 200 projects. From OpenTelemetry's declarative configuration milestone to Cilium's dominance in Kubernetes networking, here's what's shaping cloud native in 2026.
Kubernetes 1.36 drops April 22 with 80 enhancements including stable user namespaces, OCI VolumeSource, and the retirement of Ingress NGINX. Plus: CNCF warns that Kubernetes alone isn't enough to secure LLM workloads.
A practical guide to migrating from the deprecated ingress-nginx controller to Kubernetes Gateway API before the March 2026 retirement deadline.
When adding GPUs doesn't reduce latency, the problem isn't capacity—it's routing. Discover how llm-d's cache-aware scheduling delivers 57x faster TTFT and 2x throughput on the same hardware.
Kubernetes 1.36 brings 22 security enhancements, ProtoMessage method removal, and production hardening aligned with NSA/CISA guidelines. Explore the security improvements, observability enhancements, and Nutanix NKP Metal's bare-metal Kubernetes capabilities.
The CNCF's new Kubernetes AI conformance program aims to solve portability and predictability challenges for AI workloads at the 80% of enterprises already using Kubernetes.
At KubeCon EU 2026 in Amsterdam, Broadcom announced that Velero—the Kubernetes-native backup, restore, and migration tool—has been accepted into the CNCF Sandbox. The move traces a…
The vLLM Korea Meetup 2026, held in Seoul on April 2nd, delivered more than just technical presentations—it offered a window into how AI inference infrastructure is…
vLLM v0.19.0 brings full Google Gemma 4 architecture support, speculative decoding with zero-bubble async scheduling, and significant Model Runner V2 maturation for improved throughput and efficiency.
Learn how to migrate from Ingress-NGINX to Gateway API using the stable 1.0 release of Ingress2Gateway, featuring support for over 30 annotations and comprehensive integration testing.
Flux 2.8.0 introduces Helm v4 support, server-side apply for HelmReleases, kstatus-based health checking, faster recovery from failed deployments, and GitHub App integration for source authentication.
At FluxCon NA 2025, Morgan Stanley shared their five-year journey from push-based CI/CD to GitOps with Flux, now managing 500+ clusters, 2,000+ nodes, and 100,000+ containers with a self-service platform.
Kubernetes v1.36, scheduled for late April 2026, introduces Dynamic Resource Allocation (DRA) for partitionable devices, faster SELinux volume mounting, and external token signing, and it deprecates service.spec.externalIPs.