
Agent Synth · Catalog

Curated models

Hand-picked local models, sized for Apple Silicon. We update this list as the landscape shifts. The menu-bar agent uses this catalog to recommend swaps that fit your machine.
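As a rough illustration of how such a recommendation could work, the sketch below follows the "Supersedes" links between catalog entries and proposes the furthest upgrade whose recommended memory still fits the machine. The `CatalogEntry` shape, the `recommend_swap` function, and the rule of comparing against recommended (rather than minimum) memory are assumptions made for this example, not the agent's actual logic. The memory figures are copied from three entries on this page.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CatalogEntry:
    """Hypothetical shape for one catalog card (not the agent's real schema)."""
    model_id: str
    min_gb: float
    recommended_gb: float
    supersedes: Optional[str] = None  # id of the model this entry replaces


# Three entries copied from the catalog on this page.
CATALOG = [
    CatalogEntry("qwen2.5-coder:7b", 8.6, 17.2),
    CatalogEntry("qwen2.5-coder:14b", 17.2, 25.8, supersedes="qwen2.5-coder:7b"),
    CatalogEntry("qwen2.5-coder:32b", 25.8, 38.7, supersedes="qwen2.5-coder:14b"),
]


def recommend_swap(current_id: str, machine_gb: float) -> Optional[str]:
    """Return the furthest upgrade from current_id that fits, or None.

    Assumption: a model "fits" when its recommended memory is at most
    the machine's total memory.
    """
    # Map each superseded id to the entry that replaces it.
    successor = {e.supersedes: e for e in CATALOG if e.supersedes}
    best = None
    entry = successor.get(current_id)
    # Walk up the chain until the next step no longer fits.
    while entry is not None and entry.recommended_gb <= machine_gb:
        best = entry.model_id
        entry = successor.get(entry.model_id)
    return best
```

Under these assumptions, a machine with 34.4 GB running `qwen2.5-coder:7b` would be pointed at `qwen2.5-coder:14b`, because the 32B entry recommends 38.7 GB. A machine with exactly 17.2 GB would get no suggestion.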

Showing 15 of 15 models

Qwen 2.5 Coder 1.5B

qwen2.5-coder:1.5b

qwen2.5-coder
coding

Tiny coding model for fast autocomplete on 8 GB Macs.

Runtimes
ollama · mlx
Memory
≥ 2.1 GB · 8.6 GB recommended

Best as a draft/autocomplete model — not for long-form code generation.

Community-curated · added 2026-05-28

Qwen 2.5 Coder 7B

qwen2.5-coder:7b

qwen2.5-coder
coding

Solid coding default for 16 GB+ Macs; strong instruction following.

Runtimes
ollama · mlx
Memory
≥ 8.6 GB · 17.2 GB recommended

Community-curated · added 2026-05-28

Qwen 2.5 Coder 14B

qwen2.5-coder:14b

qwen2.5-coder
coding

Stronger coding — fits 24 GB+ Apple Silicon at default quant.

Runtimes
ollama · mlx
Memory
≥ 17.2 GB · 25.8 GB recommended

Supersedes

  • qwen2.5-coder:7b Materially better at multi-file refactors and longer code spans, given the headroom.

Community-curated · added 2026-05-28

Qwen 2.5 Coder 32B

qwen2.5-coder:32b

qwen2.5-coder
coding

Frontier-tier local coding model — fits M-series Max/Ultra.

Runtimes
ollama · mlx
Memory
≥ 25.8 GB · 38.7 GB recommended

Supersedes

  • qwen2.5-coder:14b Better at architectural reasoning and unfamiliar codebases; needs the memory.

Community-curated · added 2026-05-28

Llama 3.2 3B

llama3.2:3b

llama3.2
chat

Small, fast chat — viable on 8 GB Macs and as a low-latency draft model.

Runtimes
ollama · mlx · lmstudio
Memory
≥ 4.3 GB · 8.6 GB recommended

Community-curated · added 2026-05-28

Qwen 2.5 7B

qwen2.5:7b

qwen2.5
chat · rag

Strong general chat at 7B. Use Coder variant for code-heavy work.

Runtimes
ollama · mlx
Memory
≥ 8.6 GB · 17.2 GB recommended

Community-curated · added 2026-05-28

Llama 3.1 8B

llama3.1:8b

llama3.1
chat · rag

Older default — still solid for chat/RAG with the broadest runtime support.

Runtimes
ollama · mlx · lmstudio · llamacpp
Memory
≥ 8.6 GB · 17.2 GB recommended

Community-curated · added 2026-05-28

Qwen 2.5 14B

qwen2.5:14b

qwen2.5
chat · rag

Capable general chat — preferred over 7B once memory allows.

Runtimes
ollama · mlx
Memory
≥ 17.2 GB · 25.8 GB recommended

Supersedes

  • qwen2.5:7b Better at longer context and nuanced instruction following.

Community-curated · added 2026-05-28

DeepSeek R1 Distill (Qwen 7B)

deepseek-r1-distill-qwen:7b

deepseek-r1
reasoning · chat

Compact reasoning model — visible chain-of-thought useful for code planning + math.

Runtimes
ollama · mlx
Memory
≥ 8.6 GB · 17.2 GB recommended

Verbose by default: outputs <think> blocks. Strip them or show them, depending on where the output is displayed.

Community-curated · added 2026-05-28
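For the <think> blocks mentioned in the note above, one possible way to separate reasoning from the final answer is a small regex pass over the raw output. This is a minimal sketch: the function name `split_reasoning` is hypothetical, and it assumes each block is fully closed with </think>. Partial blocks seen during streaming are not handled.

```python
import re

# Captures the body of a complete <think>...</think> block, newlines included.
_THINK_BODY = re.compile(r"<think>(.*?)</think>", re.DOTALL)
# Matches the whole block plus trailing whitespace, for removal.
_THINK_BLOCK = re.compile(r"<think>.*?</think>\s*", re.DOTALL)


def split_reasoning(raw: str) -> tuple[str, str]:
    """Return (answer, reasoning) extracted from raw model output."""
    reasoning = "\n".join(body.strip() for body in _THINK_BODY.findall(raw))
    answer = _THINK_BLOCK.sub("", raw).strip()
    return answer, reasoning
```

A chat view could display only `answer`, while a debugging or planning view could also show `reasoning`.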

DeepSeek R1 Distill (Qwen 14B)

deepseek-r1-distill-qwen:14b

deepseek-r1
reasoning · chat

Mid-tier reasoning: stronger on multi-step problems than the 7B, at a higher memory cost.

Runtimes
ollama · mlx
Memory
≥ 17.2 GB · 25.8 GB recommended

Supersedes

  • deepseek-r1-distill-qwen:7b Materially better at multi-step reasoning, provided the memory headroom is available.

Community-curated · added 2026-05-28

Nomic Embed Text v1.5

nomic-embed-text

nomic
embedding

Default embedding model for local RAG. Tiny footprint, broad runtime support.

Runtimes
ollama · mlx
Memory
≥ 1.1 GB · 2.1 GB recommended

Community-curated · added 2026-05-28

MixedBread Embed Large

mxbai-embed-large

mxbai
embedding

Alternative embedding — competitive quality, slightly larger vectors than Nomic.

Runtimes
ollama
Memory
≥ 1.1 GB · 2.1 GB recommended

Community-curated · added 2026-05-28

Llama 3.2 Vision 11B

llama3.2-vision:11b

llama3.2
vision · chat

Image-aware chat — for screenshot Q&A, OCR-like extraction, and visual reasoning.

Runtimes
ollama
Memory
≥ 12.9 GB · 17.2 GB recommended

Ollama-only at default quant today. MLX support exists for separate weights.

Community-curated · added 2026-05-28

Whisper Large v3

whisper-large-v3

whisper
stt

Gold-standard local transcription: strong multilingual support and a low word error rate (WER).

Runtimes
llamacpp · mlx
Memory
≥ 4.3 GB · 8.6 GB recommended

Typically loaded via whisper.cpp or mlx-whisper, not via Ollama. Used by Hermes for STT.

Community-curated · added 2026-05-28

Distil-Whisper Large v3

distil-whisper-large-v3

whisper
stt

~6× faster than Whisper Large v3 with a small accuracy trade-off — good for live transcription.

Runtimes
llamacpp · mlx
Memory
≥ 2.1 GB · 4.3 GB recommended

Supersedes

  • whisper-large-v3 Materially faster speech-to-text loop; preferred for real-time use cases.

Optimized for English. Pair with Whisper Large for multilingual.

Community-curated · added 2026-05-28
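The pairing suggested in the Distil-Whisper note (the distilled model for English, Whisper Large v3 for everything else) could be expressed as a simple routing rule. The sketch below is illustrative only: `pick_stt_model` is a hypothetical helper, and treating any code that starts with "en" as English is an assumption.

```python
def pick_stt_model(language: str) -> str:
    """Choose a catalog id for transcription from a language code."""
    # Assumption: codes such as "en", "en-US", and "en_GB" all count as English.
    primary = language.replace("_", "-").split("-")[0].lower()
    if primary == "en":
        return "distil-whisper-large-v3"  # faster, optimized for English
    return "whisper-large-v3"  # multilingual fallback
```

The returned id would then be handed to whichever runtime loads the weights.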