Cloud models run on Ollama’s infrastructure — ideal if your system doesn’t have enough VRAM for local models. Pull the manifest first (tiny download, the model runs remotely):
```bash
export OLLAMA_MODEL_MID="qwen3-coder"          # Default model
export OLLAMA_SMALL_FAST_MODEL="qwen3-coder"   # Background model (or leave empty to use same)
```
By default, Ollama uses the same model for both main and background operations to avoid VRAM swapping. Only set OLLAMA_SMALL_FAST_MODEL if you have 24GB+ VRAM.
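That rule of thumb can be sketched as a small shell helper. This is a hypothetical illustration, not part of Andi AIRun: the function name and the fixed model name are assumptions; only the two environment variables come from the docs above.

```bash
# Illustrative helper: reuse the main model as the background model
# unless there is 24GB+ of VRAM available.
export OLLAMA_MODEL_MID="qwen3-coder"

choose_background_model() {
  vram_gb="$1"
  if [ "$vram_gb" -ge 24 ]; then
    # Enough headroom for a separate small/fast background model
    echo "qwen3-coder"
  else
    # Reuse the main model to avoid VRAM swapping
    echo "$OLLAMA_MODEL_MID"
  fi
}

export OLLAMA_SMALL_FAST_MODEL="$(choose_background_model 16)"
```

With 16GB the helper falls back to `OLLAMA_MODEL_MID`, so both variables point at the same model and nothing gets swapped in and out of VRAM.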
When you specify a model that isn’t installed locally, Andi AIRun offers a choice between local and cloud:
```bash
ai --ollama --model qwen3-coder
# Model 'qwen3-coder' not found locally.
#
# Your system has ~32GB usable VRAM.
#
# Options:
# 1) Pull local version (recommended) - qwen3-coder
# 2) Use cloud version - qwen3-coder:cloud
#
# Choice [1]: 1
# Pulling model: qwen3-coder
# [##################################################] 100%
# Model pulled successfully
```
For systems with limited VRAM (< 20GB), cloud is recommended first.
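The recommendation rule above can be expressed as a short sketch. The function and its name are illustrative assumptions, not the actual Andi AIRun implementation; only the 20GB threshold and the `:cloud` suffix come from the docs.

```bash
# Sketch: below 20GB of usable VRAM, suggest the cloud variant first.
recommend_variant() {
  model="$1"; vram_gb="$2"
  if [ "$vram_gb" -lt 20 ]; then
    echo "${model}:cloud"    # limited VRAM: run remotely
  else
    echo "$model"            # enough VRAM: pull locally
  fi
}

recommend_variant qwen3-coder 16   # prints qwen3-coder:cloud
recommend_variant qwen3-coder 32   # prints qwen3-coder
```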
```bash
# Use default model
ai --ollama

# Use cloud model
ai --ollama --model glm-5:cloud

# Use specific local model
ai --ollama --model qwen3-coder

# Use with tier flags
ai --ollama --opus   # Uses OLLAMA_MODEL_HIGH if set
```
LM Studio runs local models with Anthropic API compatibility, and it is especially powerful on Apple Silicon with MLX models. It requires sufficient RAM/VRAM for the model you choose.
LM Studio supports MLX models, which are significantly faster than GGUF on M1/M2/M3/M4 chips. When downloading models, look for MLX versions for best performance.
MLX models can be 2-3x faster than GGUF on Apple Silicon due to optimized Metal acceleration.
When you specify a model that isn’t available, Andi AIRun will offer to download it:
```bash
ai --lm --model lmstudio-community/qwen3-8b-gguf
# Model 'lmstudio-community/qwen3-8b-gguf' not found in LM Studio.
# Download it? [Y/n]: y
# Downloading model: lmstudio-community/qwen3-8b-gguf
# Progress: 100.0%
# Model downloaded successfully
# Load it now? [Y/n]: y
# Model loaded
```
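The `[Y/n]` prompts above follow the usual convention that the capitalized option is the default. A hypothetical re-creation of that logic (the function name is an assumption, not Andi AIRun's code):

```bash
# An empty reply takes the capitalized default (yes);
# anything starting with n/N declines.
confirm_action() {
  reply="$1"
  case "$reply" in
    ""|[Yy]*) echo "yes" ;;
    *)        echo "no"  ;;
  esac
}
```

So pressing Enter at `Download it? [Y/n]:` proceeds with the download, while `n` skips it.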
```bash
# Use default model
ai --lmstudio

# Use short alias
ai --lm

# Use specific model
ai --lm --model openai/gpt-oss-20b

# Use with tier flags
ai --lm --opus   # Uses LMSTUDIO_MODEL_HIGH if set
```
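A hedged sketch of how a tier flag like `--opus` could resolve to a model name, assuming the documented behavior that `LMSTUDIO_MODEL_HIGH` is used only when set; the function name and fallback argument are illustrative assumptions.

```bash
# Use LMSTUDIO_MODEL_HIGH when set, otherwise fall back
# to the provided default model.
resolve_opus_model() {
  default_model="$1"
  echo "${LMSTUDIO_MODEL_HIGH:-$default_model}"
}
```

This is the standard shell `${VAR:-fallback}` expansion, so leaving `LMSTUDIO_MODEL_HIGH` unset keeps `--opus` on the default model.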