Tesla P40: The Best Budget GPU for Local AI
~$6/GB
24GB VRAM for ~$150
The best VRAM-per-dollar in the used GPU market
If you want to run large language models locally on a budget, the NVIDIA Tesla P40 24GB is the GPU the AI community keeps coming back to. At around $150 used on eBay, it gives you 24GB of VRAM - enough to run models that would otherwise demand a $700+ RTX 3090 or a $1,600 RTX 4090 just for the memory capacity. It's not the fastest GPU, but for usable VRAM-per-dollar it is very hard to beat.
$/GB VRAM Comparison
VRAM is the single most important spec for running LLMs. Here's how the P40 stacks up against the competition:
| GPU | VRAM | Typical Used Price | $/GB | Notes |
|---|---|---|---|---|
| Tesla K80 | 12GB (per GPU) | $40 | $3.33/GB | Don't buy - only 12GB usable per GPU |
| Tesla M40 | 24GB | $80 | $3.33/GB | Cheapest real 24GB, but slow |
| Tesla P40 | 24GB | $150 | $6.25/GB | Best overall value |
| Tesla P100 | 16GB | $170 | $10.63/GB | Faster but less VRAM |
| RTX 3090 | 24GB | $700 | $29.17/GB | Much faster, but 4.7x the price |
| RTX 4090 | 24GB | $1,600 | $66.67/GB | Fastest consumer GPU |
| RTX 3060 12GB | 12GB | $200 | $16.67/GB | Has display out, but only 12GB |
While the M40 technically has a better $/GB ratio, the P40's 20% faster bandwidth and better software ecosystem make it the best overall value when considering the complete picture. You're paying $70 more for a significantly better experience.
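The $/GB column is just price divided by capacity. If you want to rerun the math against current eBay prices, here is a throwaway sketch (prices hardcoded from the table above; half-up rounding to match the table's figures):

```python
from decimal import Decimal, ROUND_HALF_UP

# Typical used prices from the table above (USD) and VRAM capacity (GB).
GPUS = {
    "Tesla M40":  (80, 24),
    "Tesla P40":  (150, 24),
    "Tesla P100": (170, 16),
    "RTX 3090":   (700, 24),
    "RTX 4090":   (1600, 24),
}

def dollars_per_gb(price_usd: int, vram_gb: int) -> float:
    """Price divided by capacity, rounded half-up to cents (as in the table)."""
    cents = (Decimal(price_usd) / Decimal(vram_gb)).quantize(
        Decimal("0.01"), rounding=ROUND_HALF_UP)
    return float(cents)

# Print the cards sorted from best to worst value.
for name, (price, vram) in sorted(GPUS.items(),
                                  key=lambda kv: dollars_per_gb(*kv[1])):
    print(f"{name:10s} ${dollars_per_gb(price, vram):6.2f}/GB")
```

Swap in whatever price you actually paid - the ranking only shifts if P40 prices move well above $200.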
Tesla P40 24GB - Full Specs
| GPU Architecture | Pascal (GP102) |
|---|---|
| CUDA Cores | 3,840 |
| VRAM | 24GB GDDR5 |
| Memory Bandwidth | 347 GB/s |
| FP32 Performance | 12 TFLOPS |
| INT8 Performance | 47 TOPS |
| FP16 Performance | Effectively none (native FP16 runs at 1/64 of FP32 rate) |
| TDP | 250W |
| Cooling | Passive (requires aftermarket) |
| Compute Capability | 6.1 |
| PCIe | PCIe 3.0 x16 |
| Power Connector | 8-pin EPS (CPU-style) on most units; an adapter from PCIe power is often required |
What Can You Run on 24GB?
24GB of VRAM opens up a wide range of models. Here's what fits:
- Llama 3 8B (Q4-Q8) - Fits easily with plenty of room for context. The bread-and-butter use case.
- Qwen 2.5 14B (Q4-Q6) - Comfortable fit. Great quality for general tasks.
- Qwen 2.5 32B (Q4) - Fits at ~18GB. Leaves room for ~4K context. Excellent model quality.
- DeepSeek Coder 33B (Q4) - Tight fit at ~20GB but works for coding tasks.
- Mixtral 8x7B (Q3-Q4) - Fits the MoE model with some quantization. Good results.
- Llama 3 8B (FP16) - Full precision fits in ~16GB. Best quality for the 8B class.
- Any 7B model at any quantization - Never worry about 7B models again.
Models That Won't Fit
- Llama 3 70B (Q4) - Needs ~40GB. Requires two P40s. See our 70B under $500 guide.
- Any 70B+ model at Q4 or above - Single P40 can't do it.
- 30B-class models at Q6+ (e.g. Qwen 2.5 32B) - quantization above Q4 gets tight for 30B+ models on a single card.
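Whether a model fits comes down to simple arithmetic: the weights take roughly params × bits-per-weight / 8 bytes, plus headroom for KV cache and buffers. A back-of-the-envelope sketch (the ~4.8 effective bits for Q4_K_M and the flat 2 GB overhead are approximations; real GGUF sizes and KV-cache needs vary with the quant mix and context length):

```python
def est_vram_gb(params_b: float, bits_per_weight: float,
                overhead_gb: float = 2.0) -> float:
    """Rough VRAM estimate: weights plus a flat allowance for KV cache/buffers.

    params_b: parameter count in billions
    bits_per_weight: effective bits per weight (Q4_K_M ~4.8, Q8 ~8.5, FP16 = 16)
    The flat 2 GB overhead is a simplification; real KV-cache size scales
    with context length and model architecture.
    """
    weights_gb = params_b * bits_per_weight / 8
    return round(weights_gb + overhead_gb, 1)

P40_VRAM = 24
for name, params, bits in [("Llama 3 8B, Q4_K_M", 8, 4.8),
                           ("Qwen 2.5 32B, Q4_K_M", 32, 4.8),
                           ("Llama 3 70B, Q4_K_M", 70, 4.8)]:
    need = est_vram_gb(params, bits)
    verdict = "fits" if need <= P40_VRAM else "does NOT fit"
    print(f"{name}: ~{need} GB -> {verdict} in 24 GB")
```

The 70B result (~44 GB) is why that class needs two P40s, as noted above.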
Real-World Performance
Tesla P40 24GB - Estimated tok/s (llama.cpp, Q4_K_M)
The P40 typically lands around 45 tok/s on 8B models at Q4 - comfortable for interactive use with 7B-14B models and usable for 30B+ models. For reference, comfortable reading speed is about 15-20 tok/s, and ChatGPT typically generates around 30-50 tok/s.
For a deeper dive into how we calculate these estimates, see our speed estimation methodology.
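The intuition behind these estimates: token generation is roughly memory-bandwidth-bound, because producing each token requires streaming (most of) the model weights from VRAM once. A simplified model of that ceiling (the 60% efficiency factor is an assumption for illustration, not a measurement):

```python
def est_tokens_per_sec(bandwidth_gbps: float, model_gb: float,
                       efficiency: float = 0.6) -> float:
    """Bandwidth-bound decode estimate: each generated token reads roughly
    the whole model from VRAM, so the theoretical ceiling is
    bandwidth / model size. `efficiency` (assumed 60% here) accounts for
    real-world overhead (compute, kernel launches, KV-cache reads)."""
    return round(bandwidth_gbps / model_gb * efficiency, 1)

# P40: 347 GB/s; Llama 3 8B at Q4_K_M is roughly 4.9 GB of weights.
print(est_tokens_per_sec(347, 4.9))
```

This simple model lands in the same ballpark as the ~45 tok/s figure in the comparison table below; it also explains why the 936 GB/s RTX 3090 is roughly 3x faster despite having the same VRAM.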
Pros
- 24GB VRAM for ~$150 - unbeatable value
- Full CUDA 12 / PyTorch support
- INT8 inference acceleration
- 347 GB/s bandwidth - fast enough for interactive use
- Huge community - countless guides and support
- Abundant supply on eBay
- Compute capability 6.1 - wide software support
- ECC memory - reliable for long runs
Cons
- No display output (need a second GPU for monitor)
- Passive cooling - requires aftermarket fan
- Effectively no FP16 (native rate is 1/64 of FP32, slower than just running FP32)
- 250W TDP - significant power draw
- Large card - may not fit in small cases
- 8-pin EPS power connector on some variants
- Not great for training (no FP16 tensor cores)
Cooling Guide
The #1 concern with the P40 is cooling. It's a passive card with no fans. Without adequate airflow, it will thermal throttle and may shut down. Here are your options, starting with our recommendation:
Cooling Solutions
Option 1: 3D-Printed Fan Shroud (Recommended)
Print a custom shroud from Thingiverse/Printables that mounts 1-2 fans (92mm or 80mm) onto the P40's heatsink. This provides targeted, effective cooling. Search "Tesla P40 fan shroud" - several tested designs are available. If you don't have a 3D printer, check if your local library or makerspace has one, or order a print from a service.
Option 2: Zip-Tied Fan
Zip-tie a 92mm fan (like Noctua NF-A9) directly to the card's heatsink. Cost: $15-20 for the fan. It's not pretty, but it works well enough for most use cases. Point the fan to blow air through the heatsink fins.
Option 3: High-Airflow Case
Some cases with strong front-to-back airflow (multiple 140mm intake fans) can keep a P40 cool enough without direct modifications. This works best if the P40 is the only card and the case has good airflow design. Not guaranteed - monitor temps carefully.
Option 4: Aftermarket GPU Cooler
Some people have mounted aftermarket coolers (like the Raijintek Morpheus) on the P40. This provides excellent cooling but requires more effort and money ($50+). Overkill for most, but an option if you want the best thermal performance.
Temperature Monitoring
Always monitor GPU temperature during first use: `nvidia-smi -l 1` or `watch -n 1 nvidia-smi`. Target is under 80C under sustained load. The P40 starts throttling around 90C and will shut down around 95C.
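If you'd rather script the monitoring than eyeball the output, `nvidia-smi` can emit machine-readable CSV. A small helper sketch (the 80C/90C thresholds mirror the guidance above; `read_temps`, `parse_temps`, and `check_temps` are illustrative names, not a standard tool):

```python
import subprocess

TARGET_C = 80     # stay under this for sustained loads
THROTTLE_C = 90   # approximate slowdown threshold noted above

def parse_temps(csv_output: str) -> list[int]:
    """Parse the output of:
        nvidia-smi --query-gpu=temperature.gpu --format=csv,noheader
    (one integer Celsius reading per line, one line per GPU)."""
    return [int(line.strip()) for line in csv_output.strip().splitlines()]

def check_temps(temps: list[int]) -> str:
    """Summarize the hottest GPU against the thresholds above."""
    hottest = max(temps)
    if hottest >= THROTTLE_C:
        return f"THROTTLING LIKELY: {hottest}C - improve cooling now"
    if hottest > TARGET_C:
        return f"warm: {hottest}C - watch under sustained load"
    return f"ok: {hottest}C"

def read_temps() -> list[int]:
    """Query live temperatures from nvidia-smi (requires an NVIDIA GPU)."""
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=temperature.gpu", "--format=csv,noheader"],
        text=True)
    return parse_temps(out)

# Example usage on a machine with the P40 installed:
#   print(check_temps(read_temps()))
```

Run it in a loop (or from cron) during your first long inference session to confirm your cooling solution holds under sustained load.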
P40 vs the Competition
| Factor | P40 ($150) | RTX 3090 ($700) | M40 ($80) | P100 ($170) |
|---|---|---|---|---|
| VRAM | 24GB | 24GB | 24GB | 16GB |
| Bandwidth | 347 GB/s | 936 GB/s | 288 GB/s | 732 GB/s |
| tok/s (8B Q4) | ~45 | ~130 | ~32 | ~80 |
| FP16 | Crippled (1/64 rate) | Native | None | Native |
| Display Output | No | Yes | No | No |
| Cooling | Passive | Active (fans) | Passive | Passive |
| Value ($/GB) | $6.25 | $29.17 | $3.33 | $10.63 |
| Max model (Q4) | ~32B | ~32B | ~32B | ~14B |
The RTX 3090 is objectively better in every performance metric, but at 4.7x the price. The P40 wins on value. The M40 is cheaper but noticeably slower. The P100 is faster but limited by 16GB VRAM.
For a detailed comparison with the M40, see our M40 review. For a detailed look at the P100, see our P100 review.
Verdict
The #1 Budget GPU for Local AI
The Tesla P40 24GB is our top recommendation for anyone getting started with local AI on a budget. For ~$150, you get 24GB of VRAM that can run models up to 32B parameters at Q4 quantization. It has broad software support, a huge community, and abundant availability on eBay.
Yes, you need to solve cooling. Yes, it has no display output. Yes, it's slower than an RTX 3090. But at $150 vs $700, the P40 lets you experience local AI for the cost of a nice dinner. You can always upgrade later once you know what you need.
If you're reading this and wondering which GPU to buy for local LLMs, the answer is almost certainly the P40.
Ready to Buy a P40?
Check current Tesla P40 prices and listings on GPUDojo.
View P40 Listings

Also see our eBay buying guide for tips on buying used GPUs safely.