Tesla P40: The Best Budget GPU for Local AI
~$6/GB
24GB VRAM for ~$150
The best VRAM-per-dollar in the used GPU market
If you want to run large language models locally on a budget, the NVIDIA Tesla P40 24GB is the GPU the AI community keeps coming back to. At around $150 used on eBay, it gives you 24GB of VRAM - enough to run models that would otherwise demand a $700+ RTX 3090 or a $1,600 RTX 4090 just for the memory capacity. It's not the fastest GPU, but for usable VRAM-per-dollar it is very hard to beat.
$/GB VRAM Comparison
VRAM is the single most important spec for running LLMs. Here's how the P40 stacks up against the competition:
| GPU | VRAM | Typical Used Price | $/GB | Notes |
|---|---|---|---|---|
| Tesla K80 | 12GB (per GPU) | $40 | $3.33/GB | Don't buy - only 12GB usable per GPU |
| Tesla M40 | 24GB | $80 | $3.33/GB | Cheapest real 24GB, but slow |
| Tesla P40 | 24GB | $150 | $6.25/GB | Best overall value |
| Tesla P100 | 16GB | $170 | $10.63/GB | Faster but less VRAM |
| RTX 3090 | 24GB | $700 | $29.17/GB | Much faster, but 4.7x the price |
| RTX 4090 | 24GB | $1,600 | $66.67/GB | Fastest consumer GPU |
| RTX 3060 12GB | 12GB | $200 | $16.67/GB | Has display out, but only 12GB |
While the M40 technically has a better $/GB ratio, the P40's 20% faster bandwidth and better software ecosystem make it the best overall value when considering the complete picture. You're paying $70 more for a significantly better experience.
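The $/GB column is just price divided by capacity. If you want to rerun the math against current eBay prices, here is a throwaway sketch (prices hardcoded from the table above; half-up rounding to match the table's figures):

```python
from decimal import Decimal, ROUND_HALF_UP

# Typical used prices from the table above (USD) and VRAM capacity (GB).
GPUS = {
    "Tesla M40":  (80, 24),
    "Tesla P40":  (150, 24),
    "Tesla P100": (170, 16),
    "RTX 3090":   (700, 24),
    "RTX 4090":   (1600, 24),
}

def dollars_per_gb(price_usd: int, vram_gb: int) -> float:
    """Price divided by capacity, rounded half-up to cents (as in the table)."""
    cents = (Decimal(price_usd) / Decimal(vram_gb)).quantize(
        Decimal("0.01"), rounding=ROUND_HALF_UP)
    return float(cents)

# Print the cards sorted from best to worst value.
for name, (price, vram) in sorted(GPUS.items(),
                                  key=lambda kv: dollars_per_gb(*kv[1])):
    print(f"{name:10s} ${dollars_per_gb(price, vram):6.2f}/GB")
```

Swap in whatever price you actually paid - the ranking only shifts if P40 prices move well above $200.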
Tesla P40 24GB - Full Specs
| GPU Architecture | Pascal (GP102) |
|---|---|
| CUDA Cores | 3,840 |
| VRAM | 24GB GDDR5 |
| Memory Bandwidth | 347 GB/s |
| FP32 Performance | 12 TFLOPS |
| INT8 Performance | 47 TOPS |
| FP16 Performance | Effectively none (native FP16 runs at 1/64 of FP32 rate) |
| TDP | 250W |
| Cooling | Passive (requires aftermarket) |
| Compute Capability | 6.1 |
| PCIe | PCIe 3.0 x16 |
| Power Connector | 8-pin EPS (CPU-style) on most units; an adapter from PCIe power is often required |
What Can You Run on 24GB?
24GB of VRAM opens up a wide range of models. Here's what fits:
- Llama 3 8B (Q4-Q8) - Fits easily with plenty of room for context. The bread-and-butter use case.
- Qwen 2.5 14B (Q4-Q6) - Comfortable fit. Great quality for general tasks.
- Qwen 2.5 32B (Q4) - Fits at ~18GB. Leaves room for ~4K context. Excellent model quality.
- DeepSeek Coder 33B (Q4) - Tight fit at ~20GB but works for coding tasks.
- Mixtral 8x7B (Q3-Q4) - Fits the MoE model with some quantization. Good results.
- Llama 3 8B (FP16) - Full precision fits in ~16GB. Best quality for the 8B class.
- Any 7B model at any quantization - Never worry about 7B models again.
Models That Won't Fit
- Llama 3 70B (Q4) - Needs ~40GB. Requires two P40s. See our 70B under $500 guide.
- Any 70B+ model at Q4 or above - Single P40 can't do it.
- 30B-class models at Q6+ (e.g. Qwen 2.5 32B) - quantization above Q4 gets tight for 30B+ models on a single card.
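Whether a model fits comes down to simple arithmetic: the weights take roughly params × bits-per-weight / 8 bytes, plus headroom for KV cache and buffers. A back-of-the-envelope sketch (the ~4.8 effective bits for Q4_K_M and the flat 2 GB overhead are approximations; real GGUF sizes and KV-cache needs vary with the quant mix and context length):

```python
def est_vram_gb(params_b: float, bits_per_weight: float,
                overhead_gb: float = 2.0) -> float:
    """Rough VRAM estimate: weights plus a flat allowance for KV cache/buffers.

    params_b: parameter count in billions
    bits_per_weight: effective bits per weight (Q4_K_M ~4.8, Q8 ~8.5, FP16 = 16)
    The flat 2 GB overhead is a simplification; real KV-cache size scales
    with context length and model architecture.
    """
    weights_gb = params_b * bits_per_weight / 8
    return round(weights_gb + overhead_gb, 1)

P40_VRAM = 24
for name, params, bits in [("Llama 3 8B, Q4_K_M", 8, 4.8),
                           ("Qwen 2.5 32B, Q4_K_M", 32, 4.8),
                           ("Llama 3 70B, Q4_K_M", 70, 4.8)]:
    need = est_vram_gb(params, bits)
    verdict = "fits" if need <= P40_VRAM else "does NOT fit"
    print(f"{name}: ~{need} GB -> {verdict} in 24 GB")
```

The 70B result (~44 GB) is why that class needs two P40s, as noted above.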
Real-World Performance
Tesla P40 24GB - Estimated tok/s (llama.cpp, Q4_K_M)
The P40 typically lands around 45 tok/s on 8B models at Q4 - comfortable for interactive use with 7B-14B models and usable for 30B+ models. For reference, comfortable reading speed is about 15-20 tok/s, and ChatGPT typically generates around 30-50 tok/s.
For a deeper dive into how we calculate these estimates, see our speed estimation methodology.
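The intuition behind these estimates: token generation is roughly memory-bandwidth-bound, because producing each token requires streaming (most of) the model weights from VRAM once. A simplified model of that ceiling (the 60% efficiency factor is an assumption for illustration, not a measurement):

```python
def est_tokens_per_sec(bandwidth_gbps: float, model_gb: float,
                       efficiency: float = 0.6) -> float:
    """Bandwidth-bound decode estimate: each generated token reads roughly
    the whole model from VRAM, so the theoretical ceiling is
    bandwidth / model size. `efficiency` (assumed 60% here) accounts for
    real-world overhead (compute, kernel launches, KV-cache reads)."""
    return round(bandwidth_gbps / model_gb * efficiency, 1)

# P40: 347 GB/s; Llama 3 8B at Q4_K_M is roughly 4.9 GB of weights.
print(est_tokens_per_sec(347, 4.9))
```

This simple model lands in the same ballpark as the ~45 tok/s figure in the comparison table below; it also explains why the 936 GB/s RTX 3090 is roughly 3x faster despite having the same VRAM.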
Pros
- 24GB VRAM for ~$150 - unbeatable value
- Full CUDA 12 / PyTorch support
- INT8 inference acceleration
- 347 GB/s bandwidth - fast enough for interactive use
- Huge community - countless guides and support
- Abundant supply on eBay
- Compute capability 6.1 - wide software support
- ECC memory - reliable for long runs
Cons
- No display output (need a second GPU for monitor)
- Passive cooling - requires aftermarket fan
- Effectively no FP16 (native rate is 1/64 of FP32, slower than just running FP32)
- 250W TDP - significant power draw
- Large card - may not fit in small cases
- 8-pin EPS power connector on some variants
- Not great for training (no FP16 tensor cores)
Cooling Guide
The #1 concern with the P40 is cooling. It's a passive card with no fans. Without adequate airflow, it will thermal throttle and may shut down. Here are your options, starting with our recommendation:
Cooling Solutions
Option 1: 3D-Printed Fan Shroud (Recommended)
Print a custom shroud from Thingiverse/Printables that mounts 1-2 fans (92mm or 80mm) onto the P40's heatsink. This provides targeted, effective cooling. Search "Tesla P40 fan shroud" - several tested designs are available. If you don't have a 3D printer, check if your local library or makerspace has one, or order a print from a service.
Option 2: Zip-Tied Fan
Zip-tie a 92mm fan (like Noctua NF-A9) directly to the card's heatsink. Cost: $15-20 for the fan. It's not pretty, but it works well enough for most use cases. Point the fan to blow air through the heatsink fins.
Option 3: High-Airflow Case
Some cases with strong front-to-back airflow (multiple 140mm intake fans) can keep a P40 cool enough without direct modifications. This works best if the P40 is the only card and the case has good airflow design. Not guaranteed - monitor temps carefully.
Option 4: Aftermarket GPU Cooler
Some people have mounted aftermarket coolers (like the Raijintek Morpheus) on the P40. This provides excellent cooling but requires more effort and money ($50+). Overkill for most, but an option if you want the best thermal performance.
Temperature Monitoring
Always monitor GPU temperature during first use: `nvidia-smi -l 1` or `watch -n 1 nvidia-smi`. Target is under 80C under sustained load. The P40 starts throttling around 90C and will shut down around 95C.
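If you'd rather script the monitoring than eyeball the output, `nvidia-smi` can emit machine-readable CSV. A small helper sketch (the 80C/90C thresholds mirror the guidance above; `read_temps`, `parse_temps`, and `check_temps` are illustrative names, not a standard tool):

```python
import subprocess

TARGET_C = 80     # stay under this for sustained loads
THROTTLE_C = 90   # approximate slowdown threshold noted above

def parse_temps(csv_output: str) -> list[int]:
    """Parse the output of:
        nvidia-smi --query-gpu=temperature.gpu --format=csv,noheader
    (one integer Celsius reading per line, one line per GPU)."""
    return [int(line.strip()) for line in csv_output.strip().splitlines()]

def check_temps(temps: list[int]) -> str:
    """Summarize the hottest GPU against the thresholds above."""
    hottest = max(temps)
    if hottest >= THROTTLE_C:
        return f"THROTTLING LIKELY: {hottest}C - improve cooling now"
    if hottest > TARGET_C:
        return f"warm: {hottest}C - watch under sustained load"
    return f"ok: {hottest}C"

def read_temps() -> list[int]:
    """Query live temperatures from nvidia-smi (requires an NVIDIA GPU)."""
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=temperature.gpu", "--format=csv,noheader"],
        text=True)
    return parse_temps(out)

# Example usage on a machine with the P40 installed:
#   print(check_temps(read_temps()))
```

Run it in a loop (or from cron) during your first long inference session to confirm your cooling solution holds under sustained load.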
P40 vs the Competition
| Factor | P40 ($150) | RTX 3090 ($700) | M40 ($80) | P100 ($170) |
|---|---|---|---|---|
| VRAM | 24GB | 24GB | 24GB | 16GB |
| Bandwidth | 347 GB/s | 936 GB/s | 288 GB/s | 732 GB/s |
| tok/s (8B Q4) | ~45 | ~130 | ~32 | ~80 |
| FP16 | Crippled (1/64 rate) | Native | None | Native |
| Display Output | No | Yes | No | No |
| Cooling | Passive | Active (fans) | Passive | Passive |
| Value ($/GB) | $6.25 | $29.17 | $3.33 | $10.63 |
| Max model (Q4) | ~32B | ~32B | ~32B | ~14B |
The RTX 3090 is objectively better in every performance metric, but at 4.7x the price. The P40 wins on value. The M40 is cheaper but noticeably slower. The P100 is faster but limited by 16GB VRAM.
For a detailed comparison with the M40, see our M40 review. For a detailed look at the P100, see our P100 review.
Verdict
The #1 Budget GPU for Local AI
The Tesla P40 24GB is our top recommendation for anyone getting started with local AI on a budget. For ~$150, you get 24GB of VRAM that can run models up to 32B parameters at Q4 quantization. It has broad software support, a huge community, and abundant availability on eBay.
Yes, you need to solve cooling. Yes, it has no display output. Yes, it's slower than an RTX 3090. But at $150 vs $700, the P40 lets you experience local AI for the cost of a nice dinner. You can always upgrade later once you know what you need.
If you're reading this and wondering which GPU to buy for local LLMs, the answer is almost certainly the P40.
Ready to Buy a P40?
Check current Tesla P40 prices and listings on GPUDojo.
View P40 Listings

Also see our eBay buying guide for tips on buying used GPUs safely.