Tesla P40: The Best Budget GPU for Local AI

~$6/GB

24GB VRAM for ~$150

The best VRAM-per-dollar in the used GPU market

If you want to run large language models locally on a budget, the NVIDIA Tesla P40 24GB is the GPU the AI community keeps coming back to. At around $150 used on eBay, it gives you 24GB of VRAM - enough to run models that would otherwise demand the VRAM of a $700+ RTX 3090 or $1,600 RTX 4090. It's not the fastest GPU, but for pure VRAM per dollar, nothing beats it.

$/GB VRAM Comparison

VRAM is the single most important spec for running LLMs. Here's how the P40 stacks up against the competition:

GPU             VRAM             Typical Used Price   $/GB        Notes
Tesla K80       12GB (per GPU)   $40                  $3.33/GB    Don't buy - only 12GB usable per GPU
Tesla M40       24GB             $80                  $3.33/GB    Cheapest real 24GB, but slow
Tesla P40       24GB             $150                 $6.25/GB    Best overall value
Tesla P100      16GB             $170                 $10.63/GB   Faster but less VRAM
RTX 3090        24GB             $700                 $29.17/GB   Much faster, but 4.7x the price
RTX 4090        24GB             $1,600               $66.67/GB   Fastest consumer GPU
RTX 3060 12GB   12GB             $200                 $16.67/GB   Has display out, but only 12GB

While the M40 technically has a better $/GB ratio, the P40's roughly 20% higher memory bandwidth (347 vs 288 GB/s) and better software ecosystem make it the best overall value. You're paying $70 more for a significantly better experience.
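The value math above is easy to reproduce. A quick sketch, using the approximate used (eBay) prices quoted in this article:

```python
# Dollars-per-GB of VRAM for the cards in the table above.
# Prices are the rough used-market figures from this article, not live quotes.
gpus = {
    "Tesla K80 (per GPU)": (12, 40),
    "Tesla M40": (24, 80),
    "Tesla P40": (24, 150),
    "Tesla P100": (16, 170),
    "RTX 3090": (24, 700),
    "RTX 4090": (24, 1600),
    "RTX 3060 12GB": (12, 200),
}

def dollars_per_gb(vram_gb: int, price_usd: int) -> float:
    """Price divided by VRAM capacity: the article's headline metric."""
    return price_usd / vram_gb

# Print the cards sorted by value, best first.
for name, (vram, price) in sorted(gpus.items(), key=lambda kv: dollars_per_gb(*kv[1])):
    print(f"{name:22s} {vram:>2d}GB  ${price:>5,d}  ${dollars_per_gb(vram, price):.2f}/GB")
```

Sorting by $/GB alone puts the K80 and M40 first, which is exactly why the article weighs bandwidth and software support alongside raw value.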

Tesla P40 24GB - Full Specs

GPU Architecture:     Pascal (GP102)
CUDA Cores:           3,840
VRAM:                 24GB GDDR5
Memory Bandwidth:     347 GB/s
FP32 Performance:     12 TFLOPS
INT8 Performance:     47 TOPS
FP16 Performance:     Emulated only (no native FP16)
TDP:                  250W
Cooling:              Passive (requires aftermarket)
Compute Capability:   6.1
PCIe:                 PCIe 3.0 x16
Power Connector:      8-pin EPS (on some models) or 8-pin PCIe

What Can You Run on 24GB?

24GB of VRAM opens up a wide range of models. Here's what fits:

Models That Fit

  • Llama 3 8B - comfortable at any quantization, Q4 through FP16
  • 13-14B models - Q4 to Q6 with room to spare
  • Mixtral 8x7B (Q4) - tight but workable
  • 30-33B models (Qwen 2.5 32B, DeepSeek Coder 33B) - Q4 and below

Models That Won't Fit

  • Llama 3 70B (Q4) - Needs ~40GB. Requires two P40s. See our 70B under $500 guide.
  • Any 70B+ model at Q4 or above - Single P40 can't do it.
  • 30B-class models at Q6+ - above Q4 quantization, 30B+ models no longer fit in 24GB.
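A rough way to check whether a model fits is bytes per parameter: multiply parameter count by bits-per-weight, then leave headroom for KV cache and runtime overhead. A minimal sketch - the ~4.85 bits/weight figure for Q4_K_M and the 2GB overhead are rough assumptions for illustration, not exact numbers:

```python
def fits_in_vram(params_billions: float, bits_per_weight: float,
                 vram_gb: float = 24.0, overhead_gb: float = 2.0) -> bool:
    """Rough check: weight bytes plus KV-cache/runtime overhead vs. VRAM.
    1B params at 8 bits/weight is ~1GB of weights."""
    weight_gb = params_billions * bits_per_weight / 8
    return weight_gb + overhead_gb <= vram_gb

# Q4_K_M averages roughly 4.85 bits/weight (approximate)
print(fits_in_vram(8, 4.85))    # Llama 3 8B at Q4 -> True, fits easily
print(fits_in_vram(32, 4.85))   # 32B at Q4 -> True, fits
print(fits_in_vram(70, 4.85))   # 70B at Q4 -> False, needs ~40GB+
```

The same function shows why Q6+ is a problem for 30B-class models: at ~6.5 bits/weight, a 32B model's weights alone exceed 24GB.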

Real-World Performance

Tesla P40 24GB - Estimated tok/s (llama.cpp, Q4_K_M unless noted)

Model                 Quant   Estimated Speed
Llama 3 8B            Q4      ~40-50 tok/s
Llama 3 8B            Q8      ~25-30 tok/s
Llama 3 8B            FP16    ~20-24 tok/s
Qwen 2.5 14B          Q4      ~22-28 tok/s
Mixtral 8x7B          Q4      ~15-20 tok/s
Qwen 2.5 32B          Q4      ~14-18 tok/s
DeepSeek Coder 33B    Q3      ~10-14 tok/s

These speeds are comfortable for interactive use with 7B-14B models and usable for 30B+ models. For reference, comfortable reading speed is about 15-20 tok/s, and ChatGPT typically generates around 30-50 tok/s.

For a deeper dive into how we calculate these estimates, see our speed estimation methodology.
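The core of such an estimate is simple: token generation is memory-bandwidth-bound, because every generated token reads roughly every weight once. So the ceiling is bandwidth divided by model size, discounted by an efficiency factor. A sketch - the ~0.6 efficiency factor is an illustrative assumption, not a measured constant:

```python
def estimate_tok_per_s(bandwidth_gb_s: float, model_size_gb: float,
                       efficiency: float = 0.6) -> float:
    """Bandwidth-bound estimate: theoretical ceiling is bandwidth / model size;
    real runs reach only a fraction of it (efficiency is a rough assumption)."""
    return bandwidth_gb_s / model_size_gb * efficiency

# Tesla P40: 347 GB/s. Llama 3 8B at Q4_K_M is roughly 4.9GB of weights.
print(round(estimate_tok_per_s(347, 4.9)))   # ~42 tok/s, in line with the table above
```

Plugging in a ~19GB 32B Q4 model gives roughly 11 tok/s by the same formula, which is why 30B-class models land in the "usable but not snappy" range.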

Pros

  • 24GB VRAM for ~$150 - unbeatable value
  • Full CUDA 12 / PyTorch support
  • INT8 inference acceleration
  • 347 GB/s bandwidth - fast enough for interactive use
  • Huge community - countless guides and support
  • Abundant supply on eBay
  • Compute capability 6.1 - wide software support
  • ECC memory - reliable for long runs

Cons

  • No display output (need a second GPU for monitor)
  • Passive cooling - requires aftermarket fan
  • No native FP16 (emulated only, slower than true FP16)
  • 250W TDP - significant power draw
  • Large card - may not fit in small cases
  • 8-pin EPS power connector on some variants
  • Not great for training (no FP16 tensor cores)

Cooling Guide

The #1 concern with the P40 is cooling. It's a passive card with no fans. Without adequate airflow, it will thermal throttle and may shut down. Here are your options, ranked from best to simplest:

Cooling Solutions

Option 1: 3D-Printed Fan Shroud (Recommended)

Print a custom shroud from Thingiverse/Printables that mounts 1-2 fans (92mm or 80mm) onto the P40's heatsink. This provides targeted, effective cooling. Search "Tesla P40 fan shroud" - several tested designs are available. If you don't have a 3D printer, check if your local library or makerspace has one, or order a print from a service.

Option 2: Zip-Tied Fan

Zip-tie a 92mm fan (like Noctua NF-A9) directly to the card's heatsink. Cost: $15-20 for the fan. It's not pretty, but it works well enough for most use cases. Point the fan to blow air through the heatsink fins.

Option 3: High-Airflow Case

Some cases with strong front-to-back airflow (multiple 140mm intake fans) can keep a P40 cool enough without direct modifications. This works best if the P40 is the only card and the case has good airflow design. Not guaranteed - monitor temps carefully.

Option 4: Aftermarket GPU Cooler

Some people have mounted aftermarket coolers (like the Raijintek Morpheus) on the P40. This provides excellent cooling but requires more effort and money ($50+). Overkill for most, but an option if you want the best thermal performance.

Temperature Monitoring

Always monitor GPU temperature during first use: nvidia-smi -l 1 or watch -n 1 nvidia-smi. Target is under 80C under sustained load. The P40 starts throttling around 90C and will shut down at 95C.
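If you want to script this check, the thresholds above translate directly. A hypothetical helper sketch - the classification logic is this guide's thresholds, and the temperature query uses nvidia-smi's query-gpu interface:

```python
import subprocess

def throttle_status(temp_c: int) -> str:
    """Classify a P40 core temperature against the thresholds in this guide:
    target < 80C, throttling ~90C, shutdown at 95C."""
    if temp_c >= 95:
        return "critical: shutdown imminent"
    if temp_c >= 90:
        return "throttling"
    if temp_c >= 80:
        return "warm: improve airflow"
    return "ok"

def read_gpu_temp(gpu_index: int = 0) -> int:
    """Ask the NVIDIA driver for the current core temperature in Celsius."""
    out = subprocess.check_output([
        "nvidia-smi", f"--id={gpu_index}",
        "--query-gpu=temperature.gpu", "--format=csv,noheader,nounits",
    ])
    return int(out.strip())

# On a machine with the P40 installed: print(throttle_status(read_gpu_temp()))
print(throttle_status(85))  # -> "warm: improve airflow"
```

Run it in a loop (or under watch) during your first sustained inference session to confirm your cooling solution holds.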

P40 vs the Competition

Factor           P40 ($150)   RTX 3090 ($700)   M40 ($80)   P100 ($170)
VRAM             24GB         24GB              24GB        16GB
Bandwidth        347 GB/s     936 GB/s          288 GB/s    732 GB/s
tok/s (8B Q4)    ~45          ~130              ~32         ~80
FP16             Emulated     Native            None        Native
Display Output   No           Yes               No          No
Cooling          Passive      Active (fans)     Passive     Passive
Value ($/GB)     $6.25        $29.17            $3.33       $10.63
Max model (Q4)   ~32B         ~32B              ~32B        ~14B

The RTX 3090 is objectively better in every performance metric, but at 4.7x the price. The P40 wins on value. The M40 is cheaper but noticeably slower. The P100 is faster but limited by 16GB VRAM.

For a detailed comparison with the M40, see our M40 review. For a detailed look at the P100, see our P100 review.

Verdict

The #1 Budget GPU for Local AI

The Tesla P40 24GB is our top recommendation for anyone getting started with local AI on a budget. For ~$150, you get 24GB of VRAM that can run models up to 32B parameters at Q4 quantization. It has broad software support, a huge community, and abundant availability on eBay.

Yes, you need to solve cooling. Yes, it has no display output. Yes, it's slower than an RTX 3090. But at $150 vs $700, the P40 lets you experience local AI for the cost of a nice dinner. You can always upgrade later once you know what you need.

If you're reading this and wondering which GPU to buy for local LLMs, the answer is almost certainly the P40.

Ready to Buy a P40?

Check current Tesla P40 prices and listings on GPUDojo.

View P40 Listings

Also see our eBay buying guide for tips on buying used GPUs safely.