Ollama v0.19.0 Brings MLX to Apple Silicon: Local AI Gets Apple’s Machine Learning Muscle

Key Takeaways

  • Ollama v0.19.0 now runs on Apple’s MLX framework on Apple Silicon Macs
  • MLX’s unified memory architecture eliminates copying between CPU and GPU
  • This represents a major architectural shift from previous Ollama implementations on macOS
  • Version 0.19.0 also includes KV cache improvements and bug fixes for Anthropic API compatibility

Apple Silicon Mac users running local large language models just got a significant upgrade. Ollama v0.19.0, released on March 31, 2026, represents a fundamental architectural change for macOS users: the project has rebuilt its core on top of Apple’s machine learning framework, MLX.

Why MLX Changes Everything

MLX is Apple’s machine learning framework, designed specifically for Apple Silicon’s unified memory architecture. Unlike traditional GPU computing, where data must be copied between separate CPU and GPU memory pools, MLX lets the CPU and GPU operate on the same memory with no duplication. This removes a major bottleneck in local LLM inference.

For Ollama users, this means:

  • Higher context lengths with the same physical memory
  • Faster model loading and prompt processing
  • Better memory utilization across the system
  • Reduced latency during inference

Apple has been iterating rapidly on MLX since its release, making this the right time for Ollama to make the switch. The framework is now mature enough to power a production local inference engine.

More Than Just MLX

Ollama 0.19.0 brings several quality-of-life improvements alongside the MLX migration:

KV Cache Snapshot Improvements: The MLX runner now creates periodic snapshots during prompt processing, allowing for more efficient memory management during long conversations. This addresses a previous pain point where very long contexts could exhaust available memory.
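The snapshotting idea can be sketched in a few lines: every fixed number of tokens, record how far prompt processing has gotten, so a long conversation can resume from the most recent checkpoint instead of reprocessing the whole prompt. This is a hypothetical illustration of the technique; the names and interval are ours, not Ollama's code.

```python
# Hypothetical sketch of periodic snapshots during prompt processing.
# Every `interval` tokens we record a checkpoint position; real code
# would persist the KV tensors at that point, not just the position.

def process_prompt(tokens, interval=512):
    snapshots = []   # checkpoint positions taken so far
    cache = []       # stand-in for the per-layer K/V tensors
    for pos, tok in enumerate(tokens, start=1):
        cache.append(tok)
        if pos % interval == 0:
            snapshots.append(pos)
    return snapshots
```

For example, a 2,000-token prompt with a 512-token interval yields checkpoints at positions 512, 1024, and 1536, so at most 511 tokens ever need reprocessing.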

Anthropic API Compatibility: The Anthropic-compatible API now achieves higher KV cache hit rates, making Ollama a better drop-in replacement for Claude when running locally.
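A client targeting that compatibility layer would send requests in the Anthropic Messages format. The sketch below only builds the JSON body; the endpoint path and model name are assumptions for illustration, and no network call is made.

```python
import json

# Assumed endpoint path for the Anthropic-compatible layer; check the
# release notes for the actual route your Ollama version exposes.
OLLAMA_MESSAGES_URL = "http://localhost:11434/v1/messages"

def build_messages_request(model: str, prompt: str,
                           max_tokens: int = 1024) -> str:
    """Serialize an Anthropic-style Messages request body."""
    payload = {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }
    return json.dumps(payload)

body = build_messages_request("llama3.2", "Summarize MLX in one sentence.")
```

Because the body matches what an Anthropic SDK would send, existing Claude client code can be pointed at the local URL with minimal changes.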

Tool Call Parsing Fixed: The release resolves an issue where Qwen3.5 models would emit tool calls inside their thinking blocks instead of as properly formatted tool-call outputs.

Web Search Integration: The ollama launch pi command now bundles a web search plugin powered by Ollama’s own search implementation, letting your local models query the web when needed.

What This Means for Developers

If you’re building AI applications on Apple Silicon, Ollama’s MLX adoption is significant. It means:

  • The primary local LLM tool is now optimized for the unified memory architecture
  • Performance that is competitive with proprietary solutions like Apple’s own Core ML
  • A path toward optimized kernels for Apple’s latest silicon (M3, M4)

On x86 Macs and other platforms, Ollama continues to use its existing backends. The MLX transition applies specifically to Apple Silicon systems.

Try It Yourself

Updating is straightforward. On macOS, simply run:

brew upgrade ollama

Or download the latest version from ollama.com. The update happens automatically when you launch the app.

The MLX backend is enabled by default on Apple Silicon—no configuration required. Your existing models will load and run on the new backend automatically.
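The platform split described above (MLX on Apple Silicon, existing backends elsewhere) amounts to a simple OS-and-architecture check. The function below is an illustrative sketch of that logic, not an Ollama API.

```python
import platform

def uses_mlx_backend(system: str, machine: str) -> bool:
    """Illustrative check: MLX applies only on macOS running on arm64."""
    return system == "Darwin" and machine == "arm64"

if __name__ == "__main__":
    # On an Apple Silicon Mac this prints True; on x86 Macs and other
    # platforms it prints False, matching the release's behavior.
    print(uses_mlx_backend(platform.system(), platform.machine()))
```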


Sources

Ollama is now powered by MLX on Apple Silicon — Ollama Blog
v0.19.0 Release Notes — Ollama GitHub
MLX Framework — Apple Machine Learning Research

— The Stack Observer