%te%% - Page 6 of 6 - The Stack Observer

Month: March 2026

vLLM 0.16.0 ships async scheduling + pipeline parallelism: what it means for serving LLMs at scale

March 1, 2026•Stackxx•AI

vLLM 0.16.0 lands with async scheduling and full pipeline parallelism support, plus speculative decoding improvements. Here’s how to think about throughput, tail latency, and operational rollout.

Ollama 0.17.4/0.17.5: new models, better tool-call parsing, and why local inference UX is converging

March 1, 2026•Stackxx•AI

Ollama’s latest releases add new model options (including Qwen-family variants) and tighten tool-call handling. The bigger story: local inference is standardizing around ‘agent-ready’ APIs.