Complete install guide

Get autotune running in 5 minutes

No experience with AI or command lines needed. Follow each step in order.

These instructions work on Mac, Windows, and Linux. Windows users: use the Terminal app (search for “Terminal” in Start) or PowerShell.

1. Open a terminal

A terminal lets you type commands directly to your computer.

Mac: press ⌘ + Space, type Terminal, press Enter
Windows: press Win, type Terminal or PowerShell, press Enter
Linux: press Ctrl + Alt + T
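
Not sure it worked? Type one harmless command to confirm the terminal responds:

```shell
# Type this and press Enter; the terminal should print the word back:
echo hello
```

If you see hello printed on the next line, you're ready to continue.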
2. Install Ollama (the AI engine)

Ollama runs AI models on your computer. autotune makes it faster.

Mac (easiest)
Terminal
# Download the Mac installer from https://ollama.com/download
# Double-click the downloaded file and follow the instructions.
# Then verify it's working:
ollama --version
Linux (one command)
Terminal
curl -fsSL https://ollama.com/install.sh | sh
ollama --version
Windows
Terminal
# Download OllamaSetup.exe from https://ollama.com/download
# Run the installer. Then in PowerShell:
ollama --version

💡 If ollama --version prints a version number like 0.7.2, Ollama is installed correctly.

3. Install Python (if you don't have it)

autotune is a Python tool. You need Python 3.10 or newer.

Terminal
# Check if Python is already installed (on Windows, the command is usually python, not python3):
python3 --version
# Should print: Python 3.10.x or higher

# If not installed, download from https://python.org/downloads
# Mac users: you can also use Homebrew: brew install python@3.13
4. Install autotune

Terminal
pip install llm-autotune

# If you get a "command not found" error, try:
pip3 install llm-autotune
# or run pip through Python directly:
python3 -m pip install llm-autotune

# Apple Silicon Mac (M1/M2/M3/M4)? Get faster inference too:
pip install "llm-autotune[mlx]"

💡 After install, the autotune command will be available in your terminal. Ollama is started automatically — no separate ollama serve needed.

5. Find the best model for your hardware

autotune scans your CPU, RAM, and GPU and tells you exactly which model to run. You don't need to guess — it calculates the optimal model and quantization for your exact setup.

Terminal
autotune recommend

This prints the recommended model with an exact download command. Copy the autotune pull command it shows and use it in the next step.

6. Download your model

Use the model name from autotune recommend, or pick from the table below.

Your RAM   Run this
8 GB       autotune pull qwen3:4b
16 GB      autotune pull qwen3:8b   (recommended)
24 GB      autotune pull qwen3:14b
32 GB+     autotune pull qwen3:30b-a3b
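
Not sure how much RAM you have? Since Python is installed (step 3), a quick way to check on Mac or Linux is a standard-library one-liner (Windows users: open Task Manager → Performance → Memory instead):

```shell
# Total physical memory in GB (rounded down), from POSIX sysconf via Python's stdlib:
python3 -c "import os; print(os.sysconf('SC_PAGE_SIZE') * os.sysconf('SC_PHYS_PAGES') // 2**30, 'GB')"
```

A machine sold as "16 GB" may print 15 because of rounding; pick the table row that matches what your computer was sold with.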
Terminal
# For most people (16 GB RAM):
autotune pull qwen3:8b

# Watch the download progress in your terminal.
# This takes a few minutes depending on your internet speed.

ℹ️ Models are downloaded once and stored on your computer. Nothing is sent to the cloud when you chat.

7. Start chatting!

That's it. autotune handles all the optimization automatically.

Terminal
# Replace qwen3:8b with whatever model you downloaded:
autotune chat --model qwen3:8b

# Type your question and press Enter.
# Type /quit to exit, or press Ctrl+C.

💡 You should see the first word appear about 39% faster than running Ollama alone. The second message will be even faster — autotune caches your conversation context.

8. Prove it on your hardware (optional)

Run a 30-second benchmark using Ollama's own internal timers to see exactly how much faster autotune is on your machine.

Terminal
autotune proof -m qwen3:8b
# Saves a proof_qwen3_8b.json file you can inspect or share.
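
The proof file is plain JSON, so you can pretty-print it with Python's built-in json.tool module. The filename below is the one the command above reports; the if-guard simply skips the step when the file doesn't exist yet:

```shell
# Pretty-print the benchmark results, if the proof file exists:
if [ -f proof_qwen3_8b.json ]; then
  python3 -m json.tool proof_qwen3_8b.json
fi
```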

Keeping autotune up to date

autotune is updated frequently. Run this any time to check for a new version and upgrade:

Terminal
autotune upgrade

Or upgrade directly with pip: pip install --upgrade llm-autotune

Something went wrong?

"command not found: autotune"
Run pip install llm-autotune again. If that doesn't work, try pip3 install llm-autotune or python3 -m pip install llm-autotune.
"Ollama is not running"
autotune will try to start Ollama automatically. If that fails, open the Ollama desktop app or run: ollama serve
"No models found"
You need to download a model first. Run: autotune pull qwen3:8b
First message is very slow (5–10 seconds)
Normal on first use. The model is being loaded from disk. Every message after this will be much faster.
"Not enough RAM" or the computer gets slow
Try a smaller model. For 8 GB RAM, use qwen3:4b instead of qwen3:8b

Next steps

autotune ls      See all your models and how well they fit your hardware
autotune ps      Check which models are currently loaded in memory
autotune serve   Start an API server that works with any app that speaks the OpenAI API
→ Full command reference