Key Findings
The most interesting takeaways from the community benchmarks
Memory bandwidth is king. The M1 Ultra's 800 GB/s of bandwidth lets it outrun the base M4 (120 GB/s) on large models despite being three generations older, and the M2 Max at 400 GB/s frequently beats the M4 Pro at 273 GB/s on the same models.
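A back-of-envelope calculation shows why: at batch size 1, every generated token streams the full set of weights through memory, so decode speed is capped at roughly bandwidth divided by model size. A minimal sketch, assuming 4-bit quantization and ignoring KV-cache traffic:

```python
# Decode-speed ceiling: tok/s <= bandwidth / bytes of weights read per token.
# Batch size 1, 4-bit weights; KV-cache and activation traffic ignored.
CHIPS_GBPS = {"M1 Ultra": 800, "M2 Max": 400, "M4 Pro": 273, "M4": 120}

def ceiling_tok_s(params_b: float, bw_gbps: float, bits: int = 4) -> float:
    weights_gb = params_b * bits / 8  # GB of weights streamed per token
    return bw_gbps / weights_gb

for chip, bw in CHIPS_GBPS.items():
    print(f"{chip}: ~{ceiling_tok_s(70, bw):.1f} tok/s ceiling on a dense 70B model")
# M1 Ultra: ~22.9, M2 Max: ~11.4, M4 Pro: ~7.8, M4: ~3.4
```

No FLOPS figure appears anywhere in these ceilings, which is exactly the point: for single-stream decoding, the compute generation matters far less than memory bandwidth.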
The M4 Mac mini is the efficiency champ. At roughly 8W average system power it delivers 5.35 tok/W, matching the per-watt efficiency of the M4 Max running at 23W. The A18 Pro (MacBook Neo) leads on efficiency at 6.75 tok/W but is limited to small models by its 8GB of RAM.
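Since efficiency here is just throughput divided by average system power, the quoted figures can be inverted to get implied raw throughput. A quick sketch; the M4 Max number is a back-derived assumption from "comparable" per-watt efficiency, not a measurement from the benchmarks:

```python
# tok/W = throughput / average system power, so throughput = tok/W * watts.
def implied_tok_s(tok_per_watt: float, watts: float) -> float:
    return tok_per_watt * watts

print(implied_tok_s(5.35, 8))   # M4 Mac mini: ~42.8 tok/s at ~8W
print(implied_tok_s(5.35, 23))  # M4 Max at a comparable tok/W: ~123 tok/s at 23W
```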
MoE models are the sweet spot for Mac. GPT-OSS-120B (a Mixture-of-Experts model) runs at 74 tok/s on an M4 Max with 128GB, nearly as fast as dense 20B models, and Qwen3.5-122B-A10B hits 54 tok/s. These massive models run smoothly because only a small subset of parameters, on the order of 10B, activates per token.
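The bandwidth ceiling from the earlier sketch explains this directly: only the active parameters stream through memory per token, so an MoE's ceiling is set by its active size, not its total size. Assuming 4-bit weights, the ~10B-active figure from the text, and the full M4 Max's 546 GB/s:

```python
# MoE decode ceilings scale with ACTIVE parameters, not total parameters.
M4_MAX_BW_GBPS = 546  # memory bandwidth of the full (128GB) M4 Max

def ceiling_tok_s(active_params_b: float, bw_gbps: float, bits: int = 4) -> float:
    return bw_gbps / (active_params_b * bits / 8)

print(ceiling_tok_s(120, M4_MAX_BW_GBPS))  # dense 120B: ~9 tok/s ceiling
print(ceiling_tok_s(10, M4_MAX_BW_GBPS))   # 10B active: ~109 tok/s ceiling
# The observed 74 tok/s for GPT-OSS-120B sits comfortably under the MoE
# ceiling and lands in dense-10-20B territory, matching the benchmarks.
```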
Backend matters, sometimes a lot. The same Llama-3.2-3B model runs at 44.5 tok/s under MLX versus 41 tok/s under Ollama on identical M4/24GB hardware, an 8.5% gap. For small models, the MLX-native format consistently edges out GGUF through Ollama by 5-10%.
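Reproducing the MLX side takes only a few lines with the mlx_lm package; the model name below assumes the mlx-community 4-bit conversion, and the exact generate keywords may shift between mlx_lm releases:

```python
# pip install mlx-lm
from mlx_lm import load, generate

# Assumed model: the mlx-community 4-bit conversion of Llama-3.2-3B-Instruct.
model, tokenizer = load("mlx-community/Llama-3.2-3B-Instruct-4bit")

# verbose=True prints prompt-processing and generation speed in tok/s.
text = generate(model, tokenizer,
                prompt="Explain unified memory in one paragraph.",
                max_tokens=256, verbose=True)
```

For the Ollama side, `ollama run llama3.2:3b --verbose` reports an eval rate in the same tok/s units, which is what makes the two backends directly comparable.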
The MacBook Neo (A18 Pro) can run LLMs. With only 8GB of RAM and 100 GB/s of bandwidth, it manages 50 tok/s on 1B models and 23 tok/s on 3B models: viable for lightweight assistants and edge inference, but forget anything above 7B.
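The 8GB ceiling is easy to sanity-check: weights plus KV cache must fit in the slice of unified memory the OS will let the GPU wire, typically around two-thirds to three-quarters of total RAM. A rough fit check, where the 70% GPU share and the 1GB KV/runtime overhead are assumptions:

```python
# Rough will-it-fit check for small-RAM unified-memory devices.
def fits(params_b: float, ram_gb: float, bits: int = 4,
         gpu_share: float = 0.70, overhead_gb: float = 1.0) -> bool:
    weights_gb = params_b * bits / 8  # quantized weight footprint
    budget_gb = ram_gb * gpu_share    # memory the GPU can actually wire
    return weights_gb + overhead_gb <= budget_gb

for size_b in (1, 3, 7, 13):
    verdict = "fits" if fits(size_b, ram_gb=8) else "does not fit"
    print(f"{size_b}B at 4-bit on 8GB: {verdict}")
# 1B (~0.5GB) and 3B (~1.5GB) fit comfortably; 7B (~3.5GB) squeaks in with
# almost no KV-cache headroom, and 13B is out, matching the advice above.
```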