Local LLM Testing & Benchmarking
Measure the performance of your models with precision. Real-time metrics, side-by-side comparisons, and unified model management for Apple Silicon. Export results, view every past run with graphs and full details, and face off two models on different backends in Arena Mode to stamp a winner. Supports Ollama, LM Studio, MLX, and any OpenAI-compatible API endpoint. Now with canned performance requests and direct Ollama model pulls from inside the app!
Simple but effective local LLM benchmarking tools.
Three powerful modules to test, compare, and manage your local LLMs.
Real-time performance dashboard with live metrics during inference.
Side-by-side A/B testing to compare models head-to-head.
Unified view of all models across all configured backends.
Run the same prompt against two different models and compare results side-by-side. Vote for winners and track comparison history.
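At the API level, an Arena Mode face-off amounts to sending one prompt to two backends and lining the answers up. Here is a minimal Python sketch using the `requests` library; the ports, endpoints, and model names are illustrative assumptions, not the app's internals:

```python
import requests

PROMPT = "Summarize the benefits of quantization in two sentences."

# Hypothetical pairing: one model served by Ollama, one by LM Studio, both
# exposing OpenAI-compatible chat completion endpoints on their default ports.
contenders = [
    ("Ollama", "http://localhost:11434/v1/chat/completions", "llama3.2:3b"),
    ("LM Studio", "http://localhost:1234/v1/chat/completions", "qwen2.5-7b-instruct"),
]

for backend, url, model in contenders:
    body = {
        "model": model,
        "messages": [{"role": "user", "content": PROMPT}],
        "stream": False,
    }
    reply = requests.post(url, json=body, timeout=300).json()
    print(f"--- {model} via {backend} ---")
    print(reply["choices"][0]["message"]["content"].strip())
```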
View all models across backends, see what's loaded, and manage disk usage.
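Under the hood, that inventory can be assembled from each backend's own listing API: Ollama reports installed models and their on-disk size via `/api/tags`, and LM Studio's local server exposes an OpenAI-style `/v1/models` list. A rough Python sketch, assuming default ports and omitting error handling:

```python
import requests

# Installed Ollama models and their on-disk size, from /api/tags ("size" is bytes).
tags = requests.get("http://localhost:11434/api/tags", timeout=10).json()
for m in tags.get("models", []):
    print(f'ollama     {m["name"]:<32} {m["size"] / 1e9:6.2f} GB')

# Models visible to LM Studio's local server, via its OpenAI-style /v1/models.
lms = requests.get("http://localhost:1234/v1/models", timeout=10).json()
for m in lms.get("data", []):
    print(f'lm studio  {m["id"]}')
```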
Every metric card includes a help tooltip explaining exactly where the data comes from and how it's calculated.
For example:

| Metric | Source |
|---|---|
| Tokens/sec | completion_tokens ÷ generation_time |
| GPU utilization | IOReport utilization percentage |
| Model VRAM | Ollama /api/ps size_vram field |
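As a point of reference, here is a minimal sketch of how figures like these can be derived against a local Ollama instance with Python and `requests`. The model name and the wall-clock timing are illustrative assumptions; Anubis may measure generation time differently (for example, excluding prompt processing):

```python
import time
import requests

OLLAMA = "http://localhost:11434"  # default Ollama port

# Time one completion via Ollama's OpenAI-compatible endpoint and compute
# tokens/sec as completion_tokens ÷ generation_time.
start = time.monotonic()
resp = requests.post(
    f"{OLLAMA}/v1/chat/completions",
    json={
        "model": "llama3.2:3b",  # assumed model; use whatever you have pulled
        "messages": [{"role": "user", "content": "Explain KV caching in one paragraph."}],
        "stream": False,
    },
    timeout=300,
).json()
generation_time = time.monotonic() - start

completion_tokens = resp["usage"]["completion_tokens"]
print(f"tokens/sec: {completion_tokens / generation_time:.1f}")

# VRAM of currently loaded models, from the size_vram field of /api/ps.
for m in requests.get(f"{OLLAMA}/api/ps", timeout=10).json().get("models", []):
    print(f'{m["name"]}: {m["size_vram"] / 1e9:.2f} GB VRAM')
```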
Everything you need to get started with Anubis
| Backend | Port | Setup |
|---|---|---|
| Ollama | 11434 | Install from ollama.ai |
| mlx-lm | 8080 | `mlx_lm.server --model <model>` |
| LM Studio | 1234 | Enable server in settings |
| vLLM | 8000 | Configure in Settings |
| OpenWebUI (Docker) | 3000 | Launch OpenWebUI via Docker and pull a model |
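Before pointing Anubis at a backend, it can help to confirm that something is actually listening on the expected port. A quick Python check using the default ports from the table above (adjust to your setup):

```python
import socket

# Default ports from the table above.
backends = {
    "Ollama": 11434,
    "mlx-lm": 8080,
    "LM Studio": 1234,
    "vLLM": 8000,
    "OpenWebUI": 3000,
}

for name, port in backends.items():
    try:
        with socket.create_connection(("localhost", port), timeout=1):
            status = "listening"
    except OSError:
        status = "not reachable"
    print(f"{name:<10} port {port:<6} {status}")
```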
1. Download Ollama from ollama.ai and run `ollama serve`
2. Pull a model, e.g. `ollama pull llama3.2:3b`
3. Select your model and click Run to benchmark
Be the first to know when Anubis is available.
macOS 15+
Ollama, LM Studio, mlx-lm, OpenWebUI, etc.
Apple Silicon