Local LLM benchmarking for Apple Silicon with real-time hardware telemetry
The local LLM ecosystem on macOS is fragmented. Chat wrappers focus on conversation, performance monitors are CLI-only, and no tool correlates hardware metrics with inference speed in real time.
Real-time dashboard with 8 metric cards, 7 live charts, power telemetry, and configurable prompt presets. Stream responses with live hardware overlay.
Side-by-side A/B model comparison. Sequential or parallel execution. Vote for a winner — results are persisted with full stats.
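For illustration, a persisted comparison can be thought of as a small Codable record like the one below; the field names are assumptions for the sketch, not the app's actual schema.

```swift
import Foundation

// Hypothetical shape of a persisted A/B comparison result.
// Field names are illustrative; the real schema may differ.
struct ComparisonResult: Codable {
    let prompt: String
    let modelA: String
    let modelB: String
    let tokensPerSecondA: Double
    let tokensPerSecondB: Double
    let winner: String?          // nil if no vote was cast
    let ranInParallel: Bool
    let timestamp: Date
}

// Append a result to a JSON file (illustrative persistence strategy).
func persist(_ result: ComparisonResult, to url: URL) throws {
    var results: [ComparisonResult] = []
    if let data = try? Data(contentsOf: url) {
        results = (try? JSONDecoder().decode([ComparisonResult].self, from: data)) ?? []
    }
    results.append(result)
    let encoder = JSONEncoder()
    encoder.dateEncodingStrategy = .iso8601
    try encoder.encode(results).write(to: url, options: .atomic)
}
```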
Unified model management across all backends. Pull, delete, inspect, and unload models. Automatic metadata enrichment from HuggingFace and LM Studio caches.
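As a concrete example of one backend's surface (Ollama's local REST API, assumed here to be running on its default port 11434), listing installed models is a single GET request:

```swift
import Foundation

// Minimal sketch: list installed models from a local Ollama server.
// Assumes Ollama's default endpoint at http://localhost:11434.
struct OllamaTags: Decodable {
    struct Model: Decodable {
        let name: String
        let size: Int64
    }
    let models: [Model]
}

func listOllamaModels() async throws -> [OllamaTags.Model] {
    let url = URL(string: "http://localhost:11434/api/tags")!
    let (data, _) = try await URLSession.shared.data(from: url)
    return try JSONDecoder().decode(OllamaTags.self, from: data).models
}
```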
GPU, CPU, ANE, and DRAM power in watts via IOReport. See watts-per-token efficiency — compare quantizations by their actual power cost.
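The arithmetic behind that efficiency number is simple: mean power over the generation window times duration gives energy, and energy divided by tokens gives joules per token. A minimal sketch, with illustrative names:

```swift
import Foundation

// Illustrative arithmetic for power-per-token efficiency (names are assumptions).
// powerSamples: power readings in watts taken at a fixed interval during generation.
// tokens: tokens generated; duration: wall-clock generation time in seconds.
func joulesPerToken(powerSamples: [Double], tokens: Int, duration: TimeInterval) -> Double? {
    guard !powerSamples.isEmpty, tokens > 0, duration > 0 else { return nil }
    let meanPower = powerSamples.reduce(0, +) / Double(powerSamples.count)  // W
    let energy = meanPower * duration                                       // J = W * s
    return energy / Double(tokens)                                          // J/token
}
// Note: mean watts divided by tokens-per-second gives the same number,
// since W / (tok/s) = W*s/tok = J/tok.
```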
Auto-detects the backend process by port. Tracks real memory footprint including Metal/GPU allocations. Manual override available.
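A minimal sketch of the footprint read, assuming the libproc declarations are visible through the Darwin module; phys_footprint is reported in bytes:

```swift
import Darwin

// Minimal sketch: read a process's phys_footprint via proc_pid_rusage.
func physFootprint(of pid: pid_t) -> UInt64? {
    var usage = rusage_info_v4()
    let status = withUnsafeMutablePointer(to: &usage) { ptr in
        ptr.withMemoryRebound(to: rusage_info_t?.self, capacity: 1) { rebound in
            proc_pid_rusage(pid, RUSAGE_INFO_V4, rebound)
        }
    }
    guard status == 0 else { return nil }
    return usage.ri_phys_footprint   // bytes, including Metal/GPU allocations charged to the process
}
```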
Session history with full replay. Export as CSV, Markdown, or shareable 2x retina PNG images with one click. Respects light/dark mode.
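For CSV, the export can be as simple as one header row plus one row per sample; the record shape and column names below are illustrative, not the app's actual export schema.

```swift
import Foundation

// Hypothetical sample record and CSV export (columns are illustrative).
struct Sample {
    let timestamp: Date
    let tokensPerSecond: Double
    let gpuPowerWatts: Double
}

func csv(from samples: [Sample]) -> String {
    let formatter = ISO8601DateFormatter()
    var lines = ["timestamp,tokens_per_second,gpu_power_w"]
    for s in samples {
        lines.append("\(formatter.string(from: s.timestamp)),\(s.tokensPerSecond),\(s.gpuPowerWatts)")
    }
    return lines.joined(separator: "\n")
}
```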
| Metric | Source | Description |
|---|---|---|
| GPU Utilization | IOReport | GPU active residency percentage |
| CPU Utilization | host_processor_info | Utilization percentage across all cores |
| GPU Power | IOReport Energy Model | GPU power consumption in watts |
| CPU Power | IOReport Energy Model | CPU (E-cores + P-cores) power in watts |
| ANE Power | IOReport Energy Model | Neural Engine power consumption |
| DRAM Power | IOReport Energy Model | Memory subsystem power |
| GPU Frequency | IOReport GPU Stats | Weighted average from P-state residency |
| Process Memory | proc_pid_rusage | Backend phys_footprint (includes Metal/GPU) |
| Thermal State | ProcessInfo | System thermal pressure level |
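Most of these sources are low-level or private interfaces, but host_processor_info is a plain Mach call. A sketch of taking per-core tick snapshots; utilization over an interval is the busy-tick delta divided by the total-tick delta between two snapshots:

```swift
import Darwin

// Snapshot of per-core CPU tick counters (user, system, idle, nice).
func cpuTickSnapshot() -> [[UInt32]]? {
    var cpuCount: natural_t = 0
    var info: processor_info_array_t?
    var infoCount: mach_msg_type_number_t = 0
    guard host_processor_info(mach_host_self(), PROCESSOR_CPU_LOAD_INFO,
                              &cpuCount, &info, &infoCount) == KERN_SUCCESS,
          let info = info else { return nil }
    defer {
        // The kernel allocates the array; release it when done.
        vm_deallocate(mach_task_self_,
                      vm_address_t(UInt(bitPattern: info)),
                      vm_size_t(infoCount) * vm_size_t(MemoryLayout<integer_t>.size))
    }
    return (0..<Int(cpuCount)).map { cpu in
        let base = cpu * Int(CPU_STATE_MAX)
        return (0..<Int(CPU_STATE_MAX)).map { UInt32(bitPattern: info[base + $0]) }
    }
}
```

Diffing two snapshots taken a short interval apart, with CPU_STATE_IDLE as the idle bucket and the rest as busy, yields per-core utilization over that interval.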
Free and open source. Download the notarized app or build from source.