
ANUBIS

Local LLM Testing & Benchmarking

Measure the performance of your models with precision. Real-time metrics, side-by-side comparisons, and unified model management for Apple Silicon. Export results, review all past runs with graphs and full details, and pit two models on different backends against each other in Arena Mode to stamp a winner. Supports Ollama, LM Studio, MLX, and any OpenAI-API-compatible endpoint. Now with canned performance requests and direct Ollama model pulls from inside the app!

Ollama · mlx-lm · OpenAI-Compatible · Apple Silicon · LM Studio · OpenWebUI
[Screenshot] Benchmark Dashboard - all details stored for recall in run history
[Screenshot] Arena Mode - two models on the same or different backends go head to head with the same prompt, with a winner stored alongside metadata
[Screenshot] Live Metrics - every metric Ollama exposes, stamped in your benchmark history to recall and export, with live graphs as responses stream in


Simple but effective local LLM benchmarking tools.

FEATURES

Three powerful modules to test, compare, and manage your local LLMs

⚡

BENCHMARK

Real-time performance dashboard with live metrics during inference (sketched below).

  • Tokens per second
  • GPU & CPU utilization
  • Time to first token
  • Memory tracking
  • Session history
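
To make the first two bullets concrete, here is a minimal sketch of how time to first token and tokens per second can be measured against Ollama's streaming /api/generate endpoint. This is an illustration, not Anubis's internals; the model name and prompt are placeholders:

    import json, time, urllib.request

    # Ask Ollama to generate with streaming (the default), timing each chunk.
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=json.dumps({"model": "llama3.2:3b", "prompt": "Explain mmap."}).encode(),
        headers={"Content-Type": "application/json"},
    )
    start = time.perf_counter()
    first_token_at = None
    with urllib.request.urlopen(req) as resp:
        for line in resp:  # one JSON object per line while streaming
            chunk = json.loads(line)
            if first_token_at is None and chunk.get("response"):
                first_token_at = time.perf_counter()
            if chunk.get("done"):
                # the final chunk carries eval_count and eval_duration (nanoseconds)
                tps = chunk["eval_count"] / (chunk["eval_duration"] / 1e9)
                print(f"TTFT: {first_token_at - start:.3f}s, tokens/sec: {tps:.1f}")
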
⚔️

ARENA

Side-by-side A/B testing to compare models head-to-head.

  • Same prompt, two models
  • Sequential or parallel
  • Vote for winners
  • Cross-backend comparison
  • Comparison history
📦

VAULT

Unified view of all models across all configured backends (listing sketched below).

  • All models, one view
  • Filter by backend
  • Model metadata
  • Size & quantization
  • Running model status
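
As an illustration of what a unified model view draws on, this sketch lists installed models with their size and quantization from Ollama's /api/tags endpoint; other backends would need their own listing calls:

    import json, urllib.request

    # /api/tags lists installed Ollama models with size and quantization details.
    with urllib.request.urlopen("http://localhost:11434/api/tags") as resp:
        models = json.load(resp)["models"]

    for m in models:
        quant = m.get("details", {}).get("quantization_level", "?")
        print(f"{m['name']:40s} {m['size'] / 1e9:6.2f} GB  {quant}")
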

ARENA MODE

Run the same prompt against two different models and compare results side-by-side. Vote for winners and track comparison history (sketched below).

  • ▸ Sequential mode for memory efficiency
  • ▸ Parallel mode for speed
  • ▸ Compare across backends
[Screenshot] Arena Comparison - pit two models against each other and document a winner
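
At the API level, a sequential cross-backend face-off amounts to something like the sketch below: the same prompt is sent to two OpenAI-compatible chat endpoints one after the other. The ports match the defaults in the guide further down; the model names are placeholders, and this is an illustration rather than Anubis's actual code:

    import json, urllib.request

    # Two OpenAI-compatible backends on their default ports.
    contenders = [
        ("http://localhost:11434/v1/chat/completions", "llama3.2:3b"),  # Ollama
        ("http://localhost:1234/v1/chat/completions", "qwen2.5-7b"),    # LM Studio
    ]
    prompt = "Summarize the TCP handshake in three sentences."

    for url, model in contenders:  # sequential: one model active at a time
        body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
        req = urllib.request.Request(url, data=json.dumps(body).encode(),
                                     headers={"Content-Type": "application/json"})
        with urllib.request.urlopen(req) as resp:
            answer = json.load(resp)["choices"][0]["message"]["content"]
        print(f"=== {model} ===\n{answer}\n")
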

BENCHMARK HISTORY

[Screenshot] Benchmark History - view and export all previous benchmarks, with their charts


[Screenshot] Metrics Dashboard - all the good ones, with graphs

REAL-TIME METRICS

Every metric card includes a help tooltip explaining exactly where the data comes from and how it's calculated.

  • Tokens/sec - completion_tokens ÷ generation_time
  • GPU % - IOReport utilization percentage
  • Model Memory - Ollama /api/ps size_vram field
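
For example, you can pull the model-memory figure and a throughput number by hand from Ollama. The /api/ps and /api/generate fields below come from Ollama's documented API; the model name is a placeholder:

    import json, urllib.request

    # /api/ps reports currently loaded models, including VRAM residency.
    with urllib.request.urlopen("http://localhost:11434/api/ps") as resp:
        for m in json.load(resp)["models"]:
            print(f"{m['name']}: {m['size_vram'] / 1e9:.2f} GB in VRAM")

    # A non-streaming generate call returns token counts and timings directly.
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=json.dumps({"model": "llama3.2:3b", "prompt": "Hi",
                         "stream": False}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        r = json.load(resp)
    print(f"tokens/sec: {r['eval_count'] / (r['eval_duration'] / 1e9):.1f}")
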

GUIDE

Everything you need to get started with Anubis

SUPPORTED BACKENDS

Backend            Port    Setup
Ollama             11434   Install from ollama.ai
mlx-lm             8080    mlx_lm.server --model <model>
LM Studio          1234    Enable server in settings
vLLM               8000    Configure in Settings
OpenWebUI/Docker   3000    Launch OpenWebUI through Docker and pull a model
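
A quick way to check which backends are up is to probe those default ports. GET /v1/models is a common OpenAI-style listing route, but support varies by backend (OpenWebUI, for instance, expects an API key), so treat this sketch as a connectivity check only:

    import urllib.request

    # Default ports from the table above.
    backends = {"Ollama": 11434, "mlx-lm": 8080, "LM Studio": 1234, "vLLM": 8000}

    for name, port in backends.items():
        try:
            urllib.request.urlopen(f"http://localhost:{port}/v1/models", timeout=2)
            print(f"{name:10s} reachable on :{port}")
        except OSError:  # connection refused, timeout, or HTTP error
            print(f"{name:10s} not reachable on :{port}")
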

QUICK START

1. Install Ollama

Download from ollama.ai and run ollama serve

2. Pull a Model

ollama pull llama3.2:3b

3. Launch Anubis

Select your model and click Run to benchmark

MODEL VAULT

[Screenshot] Model Vault - pull Ollama models directly from Anubis

View all models across backends, see what's loaded, and manage disk usage.
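
The in-app pull corresponds to Ollama's streaming /api/pull endpoint; a bare-bones equivalent looks roughly like this (the model name is a placeholder):

    import json, urllib.request

    # POST /api/pull streams JSON status lines while Ollama downloads the model.
    req = urllib.request.Request(
        "http://localhost:11434/api/pull",
        data=json.dumps({"model": "llama3.2:3b"}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        for line in resp:
            status = json.loads(line)
            done, total = status.get("completed"), status.get("total")
            if done and total:
                print(f"\r{status['status']}: {100 * done // total}%", end="")
            else:
                print(f"\r{status['status']}", end="")
    print()
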

TIPS

  • ๐“‚€ Use Sequential mode in Arena to conserve memory when comparing large models
  • ๐“‚€ Click the (?) on any metric card to see how it's calculated
  • ๐“‚€ Use Preset Prompts for consistent benchmarking across models
  • ๐“‚€ Export benchmark history as CSV for external analysis
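
Once exported, the CSV loads straight into standard tooling. A per-model throughput summary might look like the sketch below; the column names here are hypothetical, so check your export's header row first:

    import csv

    # Column names ("model", "tokens_per_sec") are assumed, not guaranteed.
    with open("anubis_benchmarks.csv", newline="") as f:
        rows = list(csv.DictReader(f))

    by_model = {}
    for r in rows:
        by_model.setdefault(r["model"], []).append(float(r["tokens_per_sec"]))

    for model, tps in sorted(by_model.items()):
        print(f"{model:30s} avg {sum(tps) / len(tps):.1f} tok/s over {len(tps)} runs")
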


REQUIREMENTS

๐ŸŽ

macOS 15+

๐Ÿฆ™

Ollama, LMStudio, mlx-lm, OpenWebUI etc

๐Ÿ’ป

Apple Silicon
