What GPU do I need for AI?
Tell us what you want to run and your budget — we'll recommend the best GPUs with live prices. Or browse the full comparison table.
| Model Size | Min VRAM | Sweet Spot | Example Models |
|---|---|---|---|
| 7-8B | 8GB | 12GB | Mistral 7B, Llama 3.1 8B |
| 14B | 12GB | 16GB | Qwen 2.5 14B |
| 30B-class / MoE | 16GB | 24GB | Mixtral 8x7B, Qwen 32B Q4 |
| 70B+ | 48GB | 48-80GB | Llama 3.1 70B Q4, Qwen 72B |
A 70B model at Q4 needs ~40GB: 70B parameters × 4 bits ≈ 35GB of weights, plus KV cache and runtime overhead. A single 24GB card cannot run it; you need 48GB+ (e.g., an A6000, or dual P40s).
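As a sanity check on those numbers, here is a back-of-the-envelope sketch. The `estimate_vram_gb` helper and its 1.2× overhead factor for KV cache and buffers are illustrative assumptions; real usage varies with context length and inference runtime.

```python
def estimate_vram_gb(params_billion: float, bits: int, overhead: float = 1.2) -> float:
    """Rough VRAM estimate: weight size plus a flat allowance for KV cache/buffers."""
    weight_gb = params_billion * bits / 8  # 1B params at 8 bits ~= 1 GB of weights
    return weight_gb * overhead

print(f"70B @ Q4: ~{estimate_vram_gb(70, 4):.0f} GB")  # ~42 GB -> needs 48GB+
print(f"32B @ Q4: ~{estimate_vram_gb(32, 4):.0f} GB")  # ~19 GB -> fits in 24GB
print(f"8B  @ Q4: ~{estimate_vram_gb(8, 4):.0f} GB")   # ~5 GB  -> fits in 8GB
```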
For local AI inference, VRAM is the bottleneck: it determines the largest model you can load, and larger models produce smarter, more coherent output. Price per GB of VRAM therefore tells you how much capability you get per dollar spent.
This is why a used Tesla P40 (24GB) often beats a new RTX 4060 Ti (8GB) for AI work despite being older hardware: the P40 can run 30B-class models that simply won't fit in 8GB. See the full comparison table ranked by this metric, or the sketch below.
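A minimal sketch of that ranking metric; the prices here are made-up placeholders, not live data.

```python
# Hypothetical price snapshots in USD; real values come from live listings.
gpus = {
    "Tesla P40 (used)": {"vram_gb": 24, "price_usd": 300},
    "RTX 4060 Ti":      {"vram_gb": 8,  "price_usd": 400},
    "RTX 3090 (used)":  {"vram_gb": 24, "price_usd": 700},
}

# Lower price per GB of VRAM = more model capacity per dollar.
for name, g in sorted(gpus.items(), key=lambda kv: kv[1]["price_usd"] / kv[1]["vram_gb"]):
    print(f"{name}: ${g['price_usd'] / g['vram_gb']:.2f}/GB ({g['vram_gb']}GB VRAM)")
```

With these placeholder prices, the P40 lands at $12.50/GB versus $50/GB for the 4060 Ti, which is the whole argument in one number.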
An LLM's weights are stored as floating-point numbers. Quantization reduces their precision (e.g., from 16-bit to 4-bit) so larger models fit in less VRAM, at a small cost in output quality.
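To make the trade-off concrete, here is a toy symmetric 4-bit round-trip on a few weights. Real quantizers (GPTQ, AWQ, GGUF k-quants) use per-group scales and much cleverer schemes, so treat this purely as an illustration.

```python
def quantize_4bit(weights: list[float]) -> tuple[list[int], float]:
    """Map floats to signed 4-bit integers (-8..7) using one shared scale."""
    scale = max(abs(w) for w in weights) / 7  # 7 = largest positive 4-bit value
    return [max(-8, min(7, round(w / scale))) for w in weights], scale

def dequantize(q: list[int], scale: float) -> list[float]:
    return [v * scale for v in q]

weights = [0.82, -0.31, 0.05, -0.67]
q, scale = quantize_4bit(weights)
print(q)                      # [7, -3, 0, -6]: each weight now fits in 4 bits
print(dequantize(q, scale))   # close to the originals, but not exact
```

The restored values land near the originals but not on them; that small per-weight error, accumulated across billions of weights, is the quality trade-off quantization accepts in exchange for fitting a bigger model in VRAM.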