Complete install guide
No experience with AI or command lines needed. Follow each step in order.
These instructions work on Mac, Windows, and Linux. Windows users: use the Terminal app (search for “Terminal” in Start) or PowerShell.
A terminal lets you type commands directly to your computer.
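If you have never used a terminal before, open one, type this, and press Enter (a harmless warm-up command, not part of the install):

```shell
echo "hello from the terminal"
# prints: hello from the terminal
```

Whatever you put after `echo` is printed back, which confirms the terminal is reading your commands.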
Ollama runs AI models on your computer. autotune makes it faster.
**Mac**

```shell
# Download the Mac installer from https://ollama.com/download
# Double-click the downloaded file and follow the instructions.
# Then verify it's working:
ollama --version
```

**Linux**

```shell
curl -fsSL https://ollama.com/install.sh | sh
ollama --version
```

**Windows**

```shell
# Download OllamaSetup.exe from https://ollama.com/download
# Run the installer. Then in PowerShell:
ollama --version
```

💡 If `ollama --version` prints a version number like 0.7.2, Ollama is installed correctly.
autotune is a Python tool. You need Python 3.10 or newer.

```shell
# Check if Python is already installed:
python3 --version
# Should print: Python 3.10.x or higher
# If not installed, download from https://python.org/downloads
# Mac users: you can also use Homebrew: brew install python@3.13
```

Then install autotune with pip:

```shell
pip install llm-autotune
```
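If you don't want to read version strings by eye, you can ask Python itself whether it meets the 3.10 requirement. This is a stock Python one-liner, not part of autotune:

```shell
# Exits 0 when python3 is at least 3.10, non-zero otherwise:
python3 -c 'import sys; raise SystemExit(0 if sys.version_info >= (3, 10) else 1)' \
  && echo "Python is new enough" \
  || echo "Python is too old; install 3.10+ first"
```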
```shell
# If you get a "command not found" error, try:
pip3 install llm-autotune
# Apple Silicon Mac (M1/M2/M3/M4)? Get faster inference too:
pip install "llm-autotune[mlx]"
```

💡 After install, the `autotune` command will be available in your terminal. Ollama is started automatically; no separate `ollama serve` is needed.
autotune scans your CPU, RAM, and GPU and tells you exactly which model to run. You don't need to guess: it calculates the optimal model and quantization for your exact setup.

```shell
autotune recommend
```

This prints the recommended model with an exact download command. Copy the `autotune pull` command it shows and use it in the next step.
Use the model name from `autotune recommend`, or pick from the table below.
| Your RAM | Run this |
|---|---|
| 8 GB | autotune pull qwen3:4b |
| 16 GB | autotune pull qwen3:8b (recommended) |
| 24 GB | autotune pull qwen3:14b |
| 32 GB+ | autotune pull qwen3:30b-a3b |
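The table above boils down to a simple RAM-to-model mapping. As a sketch, here is a hypothetical helper that mirrors those tiers (it is not part of autotune, which also weighs CPU, GPU, and quantization):

```shell
# Hypothetical helper: pick a model tier from installed RAM in GB.
pick_model() {
  ram_gb=$1
  if   [ "$ram_gb" -ge 32 ]; then echo "qwen3:30b-a3b"
  elif [ "$ram_gb" -ge 24 ]; then echo "qwen3:14b"
  elif [ "$ram_gb" -ge 16 ]; then echo "qwen3:8b"
  else echo "qwen3:4b"
  fi
}

pick_model 16   # prints qwen3:8b
```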
```shell
# For most people (16 GB RAM):
autotune pull qwen3:8b
# Watch the download progress in your terminal.
# This takes a few minutes depending on your internet speed.
```

ℹ️ Models are downloaded once and stored on your computer. Nothing is sent to the cloud when you chat.
That's it. autotune handles all the optimization automatically.
```shell
# Replace qwen3:8b with whatever model you downloaded:
autotune chat --model qwen3:8b
# Type your question and press Enter.
# Type /quit to exit, or press Ctrl+C.
```

💡 You should see the first word appear about 39% faster than when running Ollama alone. The second message will be even faster: autotune caches your conversation context.
Run a 30-second benchmark using Ollama's own internal timers to see exactly how much faster autotune is on your machine:

```shell
autotune proof -m qwen3:8b
# Saves a proof_qwen3_8b.json file you can inspect or share.
```

autotune is updated frequently. Run this any time to check for a new version and upgrade:
```shell
autotune upgrade
```

Or upgrade directly with pip:

```shell
pip install --upgrade llm-autotune
```
| Command | What it does |
|---|---|
| `autotune ls` | See all your models and how well they fit your hardware |
| `autotune ps` | Check which models are currently loaded in memory |
| `autotune serve` | Start an API server (works with any app that uses the OpenAI API) |
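Because `autotune serve` speaks the OpenAI API, any OpenAI client should be able to talk to it. Here is a sketch of a chat request; the `/v1/chat/completions` path is the standard OpenAI shape, but the host and port below are assumptions, so check the address `autotune serve` prints when it starts:

```shell
# Build an OpenAI-style chat request body:
BODY='{"model": "qwen3:8b", "messages": [{"role": "user", "content": "Hello!"}]}'

# Sanity-check the JSON before sending:
echo "$BODY" | python3 -m json.tool

# With `autotune serve` running, send it (adjust the host/port to what serve prints):
# curl -s http://localhost:8000/v1/chat/completions \
#   -H "Content-Type: application/json" -d "$BODY"
```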