NVIDIA Tesla M40 — 24GB

Cheapest 24GB GPU at ~$80. Maxwell architecture is slow but the VRAM capacity is real. Needs aftermarket cooling.

Specifications

Brand: NVIDIA
Model: Tesla M40
VRAM: 24GB
Architecture: Maxwell
CUDA cores: 3,072
Memory bandwidth: 288 GB/s
TDP: 250W
FP32 compute: ~7 TFLOPS
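
If you pick one up used, it is worth confirming the card reports what the listing claims. A minimal sketch, assuming the nvidia-ml-py package (imported as pynvml), a working NVIDIA driver, and the M40 sitting at GPU index 0:

    # Confirm the card identifies itself and reports the full 24GB.
    # Assumes: nvidia-ml-py installed, NVIDIA driver loaded, M40 is GPU 0.
    import pynvml

    pynvml.nvmlInit()
    handle = pynvml.nvmlDeviceGetHandleByIndex(0)

    print(pynvml.nvmlDeviceGetName(handle))          # e.g. "Tesla M40 24GB"
    total = pynvml.nvmlDeviceGetMemoryInfo(handle).total
    print(f"{total / 1024**3:.1f} GiB VRAM")         # should be close to 24
    pynvml.nvmlShutdown()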

For AI / LLM Use

Solid choice for 30B models and comfortable 14B inference. Slower generation — usable but not snappy. Older architecture may have limited software support (check CUDA compatibility). Datacenter card — no display output, may need aftermarket cooling.
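
One concrete way to run that compatibility check: a sketch, assuming a CUDA build of PyTorch, that verifies the installed build still targets Maxwell (the M40 is compute capability 5.2, i.e. sm_52).

    # Check whether this PyTorch build can drive a Maxwell card.
    # Assumes: PyTorch installed with CUDA support.
    import torch

    if torch.cuda.is_available():
        major, minor = torch.cuda.get_device_capability(0)
        print(f"Device capability: {major}.{minor}")          # 5.2 on an M40
        print(f"Compiled for: {torch.cuda.get_arch_list()}")  # needs sm_52 listed
    else:
        print("No usable CUDA device; check driver and toolkit versions")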

What Models Can It Run?

  • 30B at Q4_K_M (~18GB of weights; see the fit sketch below)
  • 14B at full precision (FP16)
  • 70B at Q2 (tight, with little room left for context)
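
The fit sketch mentioned above: weights are approximated as parameter count times bits per weight, plus a flat overhead guess for KV cache and buffers. The bits-per-weight figures are rough averages for GGUF-style quants, not exact file sizes.

    # Rough VRAM fit estimate; an approximation, not a benchmark.
    # Bits-per-weight values are approximate averages for GGUF quants.
    BITS_PER_WEIGHT = {"Q2_K": 2.6, "Q4_K_M": 4.8, "Q6_K": 6.6, "F16": 16.0}

    def fits_in_vram(params_b: float, quant: str,
                     vram_gb: float = 24.0, overhead_gb: float = 2.0) -> bool:
        """Does a params_b-billion-parameter model at `quant` fit?"""
        weights_gb = params_b * BITS_PER_WEIGHT[quant] / 8
        return weights_gb + overhead_gb <= vram_gb

    print(fits_in_vram(30, "Q4_K_M"))   # True: ~18GB of weights
    print(fits_in_vram(70, "Q4_K_M"))   # False on a single 24GB card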

Estimated Performance

Generation: ~22 tokens/sec

Prefill: ~125 tokens/sec
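
These figures are consistent with a simple bandwidth-bound model of decoding: each generated token reads roughly the whole weight file, so tokens/sec is about effective bandwidth divided by model size. A sketch with assumed inputs (a ~9GB model, ~70% of peak bandwidth achieved):

    # Back-of-envelope decode speed: bandwidth-bound estimate.
    bandwidth_gbs = 288      # M40 spec from the table above
    model_gb = 9.0           # assumption: a 14B-class model at Q4_K_M
    efficiency = 0.7         # assumption: fraction of peak bandwidth achieved

    print(f"~{bandwidth_gbs * efficiency / model_gb:.0f} tokens/sec")  # ~22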

Recommended Quantisations

  • Q4_K_M recommended for 30B models (loading sketch below)
  • Q6_K or Q8_0 for 14B and below
  • Full precision for 7B
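
The loading sketch referenced above, using llama-cpp-python; assumes a CUDA-enabled build of the package and a hypothetical local GGUF path.

    # Load a 30B Q4_K_M model fully onto the M40 with llama-cpp-python.
    # Assumes: llama-cpp-python built with CUDA; model path is hypothetical.
    from llama_cpp import Llama

    llm = Llama(
        model_path="./models/model-30b.Q4_K_M.gguf",  # hypothetical filename
        n_gpu_layers=-1,   # offload all layers; ~18GB of weights fits in 24GB
        n_ctx=4096,        # KV cache also lives in VRAM, so keep context modest
    )

    out = llm("Explain KV caching in one sentence.", max_tokens=64)
    print(out["choices"][0]["text"])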

Pros & Cons

Pros

  • 24GB VRAM — handles large models
  • Around $80 used, the cheapest path to 24GB

Cons

  • Low memory bandwidth — slower token generation
  • Older Maxwell architecture (compute capability 5.2); check CUDA toolkit support
  • No display output — headless only
  • May need aftermarket cooling solution

Community Verdict

  • r/LocalLLaMA

    The absolute cheapest 24GB card. Passively cooled, so you need a blower mod. Slow, but VRAM is VRAM.

    Source