I tested local models on 100+ real RAG tasks. Here are the best 1B model picks
Published on October 13, 2025

I ran a head-to-head comparison of lightweight, on-device models over 100+ real retrieval-augmented generation (RAG) tasks to see what actually performs best in practical workflows.
This post summarizes key takeaways and links to the full write-up with benchmarks, observations, and recommendations.
TL;DR: Best model by task
(Tested on a 16 GB MacBook Air M2)
- A: Find facts + cite sources → Qwen3-1.7B-MLX-8bit
- B: Compare evidence across files → LFM2-1.2B-MLX
- C: Build timelines → LFM2-1.2B-MLX
- D: Summarize documents → Qwen3-1.7B-MLX-8bit & LFM2-1.2B-MLX
- E: Organize themed collections → requires models larger than 1B
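The TL;DR above amounts to a simple task→model routing table. A minimal sketch of that lookup in Python (the task keys are my own illustrative names; the model identifiers are the ones benchmarked here):

```python
# Task -> recommended model, distilled from the TL;DR above.
# A list means either model performed comparably; None means no
# sub-1B model handled the task well (larger models are needed).
BEST_MODEL_BY_TASK = {
    "find_facts_cite_sources": "Qwen3-1.7B-MLX-8bit",
    "compare_evidence_across_files": "LFM2-1.2B-MLX",
    "build_timelines": "LFM2-1.2B-MLX",
    "summarize_documents": ["Qwen3-1.7B-MLX-8bit", "LFM2-1.2B-MLX"],
    "organize_themed_collections": None,
}


def pick_model(task):
    """Return the recommended model name for a task.

    Returns None when the task is unknown or when no ~1B model
    was good enough in the tests.
    """
    choice = BEST_MODEL_BY_TASK.get(task)
    if isinstance(choice, list):
        # Either model works; default to the first listed.
        return choice[0]
    return choice
```

In a local RAG pipeline this kind of router lets you keep one small default model loaded and swap in the better-suited one per task, e.g. `pick_model("build_timelines")` returns `"LFM2-1.2B-MLX"`.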
Read the full write-up with benchmarks and methodology on Medium: Read the full article →
Thanks for reading! If you have thoughts or want to compare notes, feel free to reach out.