Tesla P40: The Best Budget GPU for Local AI

$11.88/GB

24GB VRAM for $285

Prices as of May 2026.

The NVIDIA Tesla P40 24GB is the GPU the AI community keeps coming back to. At $285 used on eBay, you get 24GB of VRAM, the same capacity as a $700+ RTX 3090 or a $1,600 RTX 4090. It's not the fastest card, but on usable VRAM per dollar it's very hard to beat.

$/GB VRAM Comparison

VRAM is the single most important spec for running LLMs. Here's how the P40 stacks up against the competition:

| GPU | VRAM | Typical Used Price | $/GB | Notes |
|---|---|---|---|---|
| Tesla K80 | 12GB (per GPU) | $40 | $3.33/GB | Don't buy - only 12GB usable per GPU |
| Tesla M40 | 24GB | $80 | $3.33/GB | Cheapest real 24GB, but slow |
| Tesla P40 | 24GB | $285 | $11.88/GB | Best overall value |
| Tesla P100 | 16GB | $170 | $10.63/GB | Faster but less VRAM |
| RTX 3090 | 24GB | $700 | $29.17/GB | Much faster, but ~2.5x the price |
| RTX 4090 | 24GB | $1,600 | $66.67/GB | Fastest consumer GPU |
| RTX 3060 12GB | 12GB | $200 | $16.67/GB | Has display out, but only 12GB |

While the M40 technically has a better $/GB ratio, the P40's roughly 20% higher memory bandwidth and better software ecosystem make it the best overall value when considering the complete picture. You're paying $205 more for a significantly better experience.
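
If you want to redo the value math with whatever prices you actually see on eBay, it is just price divided by VRAM. A minimal sketch (the prices below are the May 2026 snapshot figures from the table, not live data):

```python
# Rough $/GB value comparison; prices are illustrative snapshots, swap in current listings.
gpus = {
    "Tesla M40":  {"vram_gb": 24, "price_usd": 80},
    "Tesla P40":  {"vram_gb": 24, "price_usd": 285},
    "Tesla P100": {"vram_gb": 16, "price_usd": 170},
    "RTX 3090":   {"vram_gb": 24, "price_usd": 700},
    "RTX 4090":   {"vram_gb": 24, "price_usd": 1600},
}

# Sort by dollars per GB of VRAM, cheapest first.
for name, g in sorted(gpus.items(), key=lambda kv: kv[1]["price_usd"] / kv[1]["vram_gb"]):
    per_gb = g["price_usd"] / g["vram_gb"]
    print(f"{name:11s} {g['vram_gb']:>2d}GB  ${g['price_usd']:>4d}  ${per_gb:.2f}/GB")
```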

Tesla P40 24GB - Full Specs

GPU Architecture: Pascal (GP102)
CUDA Cores: 3,840
VRAM: 24GB GDDR5
Memory Bandwidth: 347 GB/s
FP32 Performance: 12 TFLOPS
INT8 Performance: 47 TOPS
FP16 Performance: Emulated only (no native FP16)
TDP: 250W
Cooling: Passive (requires aftermarket)
Compute Capability: 6.1
PCIe: PCIe 3.0 x16
Power Connector: 8-pin EPS (on some models) or 8-pin PCIe

What Can You Run on 24GB?

24GB of VRAM opens up a wide range of models. Here's what fits (a rough sizing sketch follows this list):

  • Llama 3 8B (Q4-Q8) - Fits easily with plenty of room for context. The bread-and-butter use case.
  • Qwen 2.5 14B (Q4-Q6) - Comfortable fit. Great quality for general tasks.
  • Qwen 2.5 32B (Q4) - Fits at ~18GB. Leaves room for ~4K context. Excellent model quality.
  • Deepseek Coder 33B (Q4) - Tight fit at ~20GB but works for coding tasks.
  • Mixtral 8x7B (Q3) - The MoE model fits at roughly 20GB at Q3; Q4 variants overflow 24GB and need partial CPU offload. Good results.
  • Llama 3 8B (FP16) - Full precision fits in ~16GB. Best quality for the 8B class.
  • Any 7B model at any quantization - Never worry about 7B models again.
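
For a model that isn't listed here, a back-of-the-envelope check is: weight memory is roughly parameter count times bits per weight, plus a couple of GB of headroom for the KV cache and CUDA buffers. A minimal sketch (the 2.5GB overhead and ~4.8 bits per weight for Q4_K_M are rough assumptions, not measured values):

```python
def fits_in_24gb(params_billion: float, bits_per_weight: float, overhead_gb: float = 2.5) -> bool:
    """Very rough VRAM estimate: weights plus fixed headroom for KV cache / CUDA buffers."""
    weights_gb = params_billion * bits_per_weight / 8
    total_gb = weights_gb + overhead_gb
    print(f"{params_billion:>5.1f}B @ {bits_per_weight} bpw -> ~{total_gb:.1f}GB total")
    return total_gb <= 24

fits_in_24gb(8.0, 4.8)    # Llama 3 8B, Q4_K_M: fits easily
fits_in_24gb(32.5, 4.8)   # Qwen 2.5 32B, Q4_K_M: tight but fits
fits_in_24gb(70.0, 4.8)   # Llama 3 70B, Q4_K_M: ~44GB, needs two P40s
```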

Models That Won't Fit

  • Llama 3 70B (Q4) - Needs ~40GB. Requires two P40s. See our 70B under $500 guide.
  • Any 70B+ model at Q4 or above - Single P40 can't do it.
  • 30B-class models at Q6+ - Quantization above Q4 pushes 30B+ models past 24GB.

Real-World Performance

Tesla P40 24GB - Estimated tok/s (llama.cpp)

Llama 3 8B (Q4) ~40-50 tok/s
Llama 3 8B (Q8) ~25-30 tok/s
Qwen 2.5 14B (Q4) ~22-28 tok/s
Qwen 2.5 32B (Q4) ~14-18 tok/s
Llama 3 8B (FP16) ~20-24 tok/s
Deepseek Coder 33B (Q3) ~10-14 tok/s
Mixtral 8x7B (Q3) ~15-20 tok/s

Comfortable for 7B-14B models, usable for 30B+. For reference, ~15-20 tok/s feels fluid for interactive chat, while ChatGPT generates roughly 30-50 tok/s. See our speed estimation methodology.
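
These figures line up with the usual rule of thumb that single-stream generation on a card like this is memory-bandwidth bound: the theoretical ceiling is roughly bandwidth divided by the bytes read per generated token (about the quantized model size on disk), and real-world throughput lands at some fraction of that. A hedged sketch (the 0.6 efficiency factor and the model file sizes are assumptions for illustration, not benchmarks):

```python
P40_BANDWIDTH_GBPS = 347  # GB/s, from the spec table above

def estimate_tok_s(model_size_gb: float, efficiency: float = 0.6) -> float:
    """Rough decode-speed ceiling: each generated token reads roughly all weights once."""
    ceiling = P40_BANDWIDTH_GBPS / model_size_gb
    return ceiling * efficiency  # efficiency is an assumed fudge factor, not measured

print(f"Llama 3 8B Q4 (~4.9GB file):   ~{estimate_tok_s(4.9):.0f} tok/s")
print(f"Qwen 2.5 14B Q4 (~8.9GB file): ~{estimate_tok_s(8.9):.0f} tok/s")
```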

Pros

  • 24GB VRAM for $285 - unbeatable value
  • Full CUDA 12 / PyTorch support (see the quick check after this list)
  • INT8 inference acceleration
  • 347 GB/s bandwidth - fast enough for interactive use
  • Huge community - countless guides and support
  • Abundant supply on eBay
  • Compute capability 6.1 - wide software support
  • ECC memory - reliable for long runs
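
To confirm a fresh PyTorch install actually sees the card, a quick sanity check looks something like this (a minimal sketch; it assumes a CUDA-enabled PyTorch build and that the P40 is device 0):

```python
import torch

# Quick sanity check that PyTorch sees the P40 and reports the expected specs.
assert torch.cuda.is_available(), "No CUDA device visible - check drivers"
name = torch.cuda.get_device_name(0)                 # expect something like "Tesla P40"
major, minor = torch.cuda.get_device_capability(0)   # expect (6, 1) for Pascal GP102
vram_gb = torch.cuda.get_device_properties(0).total_memory / 1024**3

print(f"{name}: compute capability {major}.{minor}, {vram_gb:.1f} GB VRAM")
```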

Cons

  • No display output (need a second GPU for monitor)
  • Passive cooling - requires aftermarket fan
  • No native FP16 (emulated only, slower than true FP16)
  • 250W TDP - significant power draw
  • Large card - may not fit in small cases
  • 8-pin EPS power connector on some variants
  • Not great for training (no FP16 tensor cores)

Cooling Guide

The P40 is a passive card with no fans. Without cooling, it will thermal throttle and shut down. Options ranked from best to simplest:

Cooling Solutions

Option 1: 3D-Printed Fan Shroud (Recommended)

Search "Tesla P40 fan shroud" on Thingiverse/Printables. Mounts 1-2 fans (92mm or 80mm) directly onto the heatsink. Several tested designs available. No printer? Try your local library, makerspace, or an online print service.

Option 2: Zip-Tied Fan

Zip-tie a 92mm fan (Noctua NF-A9, ~$15) to the heatsink. Not pretty, but effective.

Option 3: High-Airflow Case

Strong front-to-back airflow (multiple 140mm intake fans) can work if the P40 is the only card. Monitor temps carefully.

Option 4: Aftermarket GPU Cooler

Raijintek Morpheus or similar ($50+). Excellent cooling but overkill for most.

Temperature Monitoring

Always monitor GPU temperature during first use: nvidia-smi -l 1 or watch -n 1 nvidia-smi. Aim to stay under 80C at sustained load; the P40 starts throttling around 90C and shuts down at 95C.
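
For longer runs, a small polling script that logs the temperature and flags when it creeps up can be handy. A minimal sketch using nvidia-smi's query mode (the 80C warning threshold mirrors the target above; adjust to taste):

```python
import subprocess
import time

WARN_C = 80  # matches the sustained-load target above

while True:
    # Query GPU core temperatures as plain CSV, one line per GPU.
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=temperature.gpu", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    for gpu_id, line in enumerate(out.stdout.strip().splitlines()):
        temp = int(line)
        flag = "  <-- WARNING: check your cooling" if temp >= WARN_C else ""
        print(f"GPU {gpu_id}: {temp}C{flag}")
    time.sleep(5)
```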

P40 vs the Competition

| Factor | P40 ($285) | RTX 3090 ($700) | M40 ($80) | P100 ($170) |
|---|---|---|---|---|
| VRAM | 24GB | 24GB | 24GB | 16GB |
| Bandwidth | 347 GB/s | 936 GB/s | 288 GB/s | 732 GB/s |
| tok/s (8B Q4) | ~45 | ~130 | ~32 | ~80 |
| FP16 | Emulated | Native | None | Native |
| Display Output | No | Yes | No | No |
| Cooling | Passive | Active (fans) | Passive | Passive |
| Value ($/GB) | $11.88/GB | $29.17/GB | $3.33/GB | $10.63/GB |
| Max model (Q4) | ~32B | ~32B | ~32B | ~14B |

The RTX 3090 is faster in every metric but costs about 2.5x more. The M40 is cheaper but slower. The P100 is faster but limited to 16GB. See our M40 review and P100 review for detailed comparisons.

Verdict

The #1 Budget GPU for Local AI

The Tesla P40 24GB is our top recommendation for anyone getting started with local AI on a budget. For $285, you get 24GB of VRAM that can run models up to 32B parameters at Q4 quantization. It has broad software support, a huge community, and abundant availability on eBay.

Yes, you need to solve cooling. Yes, it has no display output. Yes, it's slower than an RTX 3090. But at $285 vs $700+ for a 3090, the P40 lets you experience local AI without breaking the bank. You can always upgrade later once you know what you need.

If you're reading this and wondering which GPU to buy for local LLMs, the answer is almost certainly the P40.

Ready to Buy a P40?

Check current Tesla P40 prices and listings on GPUDojo.


Also see our eBay buying guide for tips on buying used GPUs safely.