Month: February 2026

vLLM 0.16.0 Raises the Bar for Open-Source Inference Serving

vLLM 0.16.0 lands with async scheduling and pipeline parallelism, a new WebSocket-based Realtime API, speculative decoding improvements, and major platform work, including an overhaul of XPU support. Here’s why those details matter to teams building reliable, cost-efficient inference stacks.
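For orientation, a minimal sketch of how the two serving features headline above are typically switched on from the command line; the model name is a placeholder, and flag spellings may differ between vLLM versions, so check `vllm serve --help` for your install:

```shell
# Sketch: launch vLLM's OpenAI-compatible server with pipeline parallelism
# across 2 GPUs and async scheduling enabled.
# Model name is a placeholder; verify flag names against your vLLM version.
vllm serve meta-llama/Llama-3.1-8B-Instruct \
  --pipeline-parallel-size 2 \
  --async-scheduling
```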

OpenStack 2026.1 ‘Gazpacho’ Is in Development: How to Plan an Upgrade Path Without Surprises

OpenStack’s 2026.1 release series (‘Gazpacho’) is tracking toward an April 2026 initial release, with SLURP (Skip Level Upgrade Release Process) guarantees shaping how operators should plan rollouts. Here’s what the release series table really tells you, how to map it to your internal maintenance windows, and where the OpenInfra community’s ‘digital sovereignty’ messaging intersects with real operations.
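To make the upgrade-path planning concrete, here is a rough sketch of the SLURP rule as commonly described: ‘.1’ releases are SLURP releases, and a supported direct upgrade is either release-to-next-release or SLURP-to-next-SLURP. This is an illustrative model of the naming convention, not an official tool:

```python
# Sketch of OpenStack's SLURP upgrade rule, assuming releases are named
# "YYYY.1" / "YYYY.2" and that ".1" releases are the SLURP releases.

def is_slurp(release: str) -> bool:
    """A release like '2026.1' is a SLURP release; '2025.2' is not."""
    return release.endswith(".1")

def next_release(release: str) -> str:
    """'2026.1' -> '2026.2', '2026.2' -> '2027.1'."""
    year, minor = release.split(".")
    return f"{year}.2" if minor == "1" else f"{int(year) + 1}.1"

def direct_upgrade_supported(src: str, dst: str) -> bool:
    """True if src -> dst is a supported single upgrade step."""
    if dst == next_release(src):
        return True  # consecutive releases are always upgradable
    # SLURP-to-SLURP: skip exactly one intermediate non-SLURP release
    return (
        is_slurp(src)
        and is_slurp(dst)
        and dst == next_release(next_release(src))
    )
```

Mapping this onto maintenance windows: an operator on 2025.1 can jump straight to Gazpacho (2026.1) and skip 2025.2, but someone on 2025.2 must take 2026.1 before 2026.2.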

Multi-LoRA at Scale: How vLLM + AWS Aim to Stop Paying for Idle GPUs

AWS and the vLLM community describe multi-LoRA serving for Mixture-of-Experts models, with kernel and execution optimizations that let many fine-tuned variants share a single GPU. The pitch: higher utilization, better latency, and a clearer path to serving ‘dozens of models’ without dozens of endpoints.
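The ‘dozens of models, one endpoint’ idea boils down to addressing each LoRA adapter by name on a shared server. A minimal client-side sketch, assuming a vLLM OpenAI-compatible server started with LoRA serving enabled; the adapter name `sql-adapter` is a hypothetical placeholder:

```python
# Sketch: with multi-LoRA serving, many fine-tuned variants share one GPU
# behind one endpoint; a client selects the variant by putting the adapter's
# registered name in the "model" field of an otherwise ordinary request.
# "sql-adapter" below is a hypothetical adapter name, not a real deployment.
import json

def chat_request(adapter_name: str, prompt: str) -> str:
    """Build the JSON body for a chat completion routed to one adapter."""
    return json.dumps({
        "model": adapter_name,  # adapter name, not the base model
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 128,
    })

body = chat_request("sql-adapter", "Write a query for top customers.")
```

The design point is that routing is per-request rather than per-endpoint, which is what lets utilization stay high while idle variants cost nothing extra.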