Ollama, a runtime for operating large language models on a local computer, has introduced support for Apple's open-source MLX machine-learning framework. Additionally, Ollama says it has improved caching performance and now supports Nvidia's NVFP4 format for model compression, enabling much more efficient memory usage in certain models. Combined, these developments promise significantly improved performance on Macs with Apple Silicon chips (M1 or later), and the timing couldn't be...