Mac LLM Benchmark Analysis

Community benchmark results from 158 runs across 36 users, 85 models, and 8 Apple Silicon chips

158 benchmark runs · 36 contributors · 85 models tested · 8 Apple Silicon chips · 466 peak tok/s · 13 Mac configurations

Key Findings

The most interesting takeaways from the community benchmarks
Memory bandwidth is king. The M1 Ultra's 800 GB/s of bandwidth lets it outrun the M4 on large models despite being three generations older, and the M2 Max at 400 GB/s frequently beats the M4 Pro at 273 GB/s on the same models. A back-of-envelope sketch after these findings shows why.
The M4 Mac mini is the efficiency champ. At ~8W average system power, it delivers 5.35 tok/W — comparable to the M4 Max at 23W. The A18 Pro (MacBook Neo) leads per-watt at 6.75 tok/W but is limited to small models by its 8GB RAM.
MoE models are the sweet spot for Mac. GPT-OSS-120B (a Mixture-of-Experts model) runs at 74 tok/s on M4 Max 128GB — nearly as fast as dense 20B models. Qwen3.5-122B-A10B MoE hits 54 tok/s. These massive models run smoothly because only a small fraction of their parameters (roughly 5-10B) activates per token.
Backend matters — sometimes a lot. The same Llama-3.2-3B model runs at 44.5 tok/s on MLX vs 41 tok/s on Ollama on the same M4/24GB hardware. For small models, MLX-native format consistently edges out GGUF through Ollama by 5-10%.
The MacBook Neo (A18 Pro) can run LLMs. With only 8GB and 100 GB/s bandwidth, it manages 50 tok/s on 1B models and 23 tok/s on 3B models. Viable for lightweight assistants and edge inference, but forget anything >7B.
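The first three findings fall out of one back-of-envelope model: in memory-bound decoding, every generated token has to stream the active weights through RAM, so bandwidth divided by active-weight bytes gives a throughput ceiling. Below is a minimal Python sketch of that estimate, assuming 4-bit quantization and the 546 GB/s M4 Max variant; the function name and modeled figures are illustrative, not taken from the benchmark data.

```python
# Roofline-style ceiling for memory-bound LLM decoding: each token must
# read all active weights from unified memory once, so throughput is
# capped at bandwidth / bytes_of_active_weights. Illustrative only.

def est_tok_per_s(bandwidth_gb_s: float, active_params_b: float,
                  bits_per_weight: float = 4.0) -> float:
    """Upper-bound tokens/sec given bandwidth (GB/s) and active params (billions)."""
    bytes_per_token = active_params_b * 1e9 * bits_per_weight / 8
    return bandwidth_gb_s * 1e9 / bytes_per_token

# Why bandwidth beats generation: a dense 70B at 4-bit.
print(est_tok_per_s(800, 70))   # M1 Ultra: ~22.9 tok/s ceiling
print(est_tok_per_s(273, 70))   # M4 Pro:   ~7.8 tok/s ceiling

# Why MoE flies: ~10B active params behave like a small dense model.
print(est_tok_per_s(546, 10))   # M4 Max (546 GB/s): ~109 ceiling; 74 measured

# Efficiency is simply sustained throughput over system power.
print(43 / 8)                   # M4 mini: ~5.4 tok/W, matching the 5.35 figure
```

Real runs land below these ceilings because of compute overhead and KV-cache traffic, but the ordering the model predicts matches the charts below.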

Throughput by Apple Silicon Chip

Average, median, and max tokens/sec across all tested models per chip

Memory Bandwidth vs Throughput

Each dot is a benchmark run — bandwidth is the primary driver of LLM inference speed

Power Efficiency

Tokens per Watt of system power by chip

Time to First Token

Median TTFT by chip (lower is better)

llama3.2:3b — The Community Benchmark Standard

23 runs of the same model across different hardware. The M4 24GB results are remarkably consistent (37-43 tok/s), suggesting the benchmarking methodology is sound. A sketch of how one such run can be measured follows.
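For readers who want to reproduce a run, here is a minimal sketch of timing the standard model against a local Ollama server. The prompt is arbitrary; the field names follow Ollama's /api/generate response, which reports durations in nanoseconds.

```python
import requests

# One non-streaming generation against a local Ollama instance
# (default port 11434); the model tag matches the community standard.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3.2:3b",
        "prompt": "Explain memory bandwidth in one paragraph.",
        "stream": False,
    },
).json()

# Decode throughput: generated tokens over generation time.
tok_per_s = resp["eval_count"] / resp["eval_duration"] * 1e9

# TTFT is approximately model load time plus prompt evaluation.
ttft_s = (resp.get("load_duration", 0) + resp["prompt_eval_duration"]) / 1e9

print(f"{tok_per_s:.1f} tok/s, TTFT ~ {ttft_s:.2f} s")
```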

Backend Showdown

Average throughput by inference backend (backends with fewer than 2 runs excluded)
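On the MLX side, the mlx-lm package reports the same throughput numbers directly when generation is run verbosely. A minimal sketch; the specific model repo is an assumption, and any MLX-format conversion of the model would do.

```python
from mlx_lm import load, generate

# Load an MLX-native 4-bit conversion (repo name is illustrative).
model, tokenizer = load("mlx-community/Llama-3.2-3B-Instruct-4bit")

# verbose=True prints prompt and generation tokens-per-sec, the
# figures compared across backends in the chart above.
generate(
    model,
    tokenizer,
    prompt="Explain memory bandwidth in one paragraph.",
    max_tokens=256,
    verbose=True,
)
```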

Leaderboard: Top 15 Fastest Runs

Highest tokens/sec across all submissions
# | Model | Chip | RAM | tok/s | TTFT | W/tok | Backend | User

Big Model Club: 100B+ Parameters

Running frontier-class models locally on a Mac
Model | Chip | RAM | tok/s | Quant | Backend

MacBook Neo (A18 Pro) — LLMs in Your Pocket

What can you run on 8GB with 100 GB/s bandwidth?
Model | tok/s | TTFT | W/tok | Backend
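Plugging the A18 Pro's 100 GB/s into the bandwidth ceiling sketched under the key findings (est_tok_per_s is the hypothetical helper defined there) puts the measured numbers in context:

```python
# Bandwidth ceilings for the A18 Pro's 100 GB/s, 4-bit weights assumed.
print(est_tok_per_s(100, 1))  # ~200 tok/s ceiling vs ~50 measured on 1B
print(est_tok_per_s(100, 3))  # ~67 tok/s ceiling vs ~23 measured on 3B
```

The measured numbers sit at roughly a quarter to a third of the theoretical ceiling, which is plausible once compute and KV-cache overhead are counted.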

Model Format Distribution

MLX-native vs GGUF vs other

Memory Tier Performance

Average tok/s by RAM capacity

Community Contributors

Top contributors by number of benchmark submissions