NVIDIA Tesla K80 — 24GB

It looks like a 24GB card, but it is really two 12GB GPUs on one board. Kepler is too old for most modern AI frameworks. Avoid unless it is nearly free.

Specifications

Brand: NVIDIA
Model: Tesla K80
VRAM: 24GB (2 × 12GB)
Architecture: Kepler
CUDA / Stream Processors: 4,992
Memory Bandwidth: 480 GB/s
TDP: 300W
FP32 TFLOPS: 8.7


For AI / LLM Use

On paper this is a 24GB card, but it is two 12GB GPUs on one board: a single model only sees 12GB unless your runtime can split it across both dies. Kepler (compute capability 3.7) was dropped in CUDA 12, and current PyTorch and llama.cpp builds no longer target it, so expect to compile from source, if it works at all. Datacenter card: no display output and passive cooling, so it needs server airflow or an aftermarket fan shroud.
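
A quick way to find out whether your stack still supports Kepler is to probe the device directly. Below is a minimal sketch, assuming a CUDA-enabled PyTorch build is installed; recent official PyTorch wheels no longer ship Kepler (sm_37) kernels, so a failure here is itself the answer.

    # check_kepler.py: probe CUDA support for the K80's two GK210 dies.
    import torch

    if not torch.cuda.is_available():
        print("No usable CUDA device: driver or framework dropped Kepler.")
    else:
        for i in range(torch.cuda.device_count()):
            major, minor = torch.cuda.get_device_capability(i)
            name = torch.cuda.get_device_name(i)
            # The K80 reports compute capability 3.7; CUDA 12 removed it.
            print(f"GPU {i}: {name}, compute capability {major}.{minor}")
            if (major, minor) < (5, 0):
                print("  Kepler-class: unsupported by current framework builds.")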

What Models Can It Run?

These tiers assume the full 24GB is usable, which on a K80 means splitting the model across both 12GB dies; a single die tops out around the 14B Q4_K_M row (see the fit check after this list).

  • 30B Q4_K_M, 14B full precision, 70B Q2 (tight)
  • 14B Q6_K, 30B Q3_K (tight)
  • 14B Q4_K_M, 7B full precision
  • 7B Q6_K, 14B Q3_K (tight)
  • 7B Q4_K_M only
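
As a rough cross-check on these tiers, weight memory is approximately parameters × bits-per-weight ÷ 8, plus an allowance for KV cache and buffers. A minimal sketch follows; the bits-per-weight figures are approximate GGUF averages and the 1.5GB overhead is an assumption, not a measurement.

    # vram_fit.py: rough fit check against the K80's 12GB-per-die limit.
    # Bits-per-weight values are approximate averages for GGUF quant types.
    BPW = {"Q3_K": 3.4, "Q4_K_M": 4.8, "Q6_K": 6.6, "Q8_0": 8.5, "F16": 16.0}
    OVERHEAD_GB = 1.5  # assumed allowance for KV cache, buffers, CUDA context

    def needed_gb(params_b: float, quant: str) -> float:
        """Approximate VRAM in GB for a model with params_b billion weights."""
        return params_b * BPW[quant] / 8 + OVERHEAD_GB

    for params_b, quant in [(7, "Q4_K_M"), (14, "Q4_K_M"), (14, "Q6_K"), (30, "Q4_K_M")]:
        gb = needed_gb(params_b, quant)
        verdict = "one die" if gb <= 12 else ("needs both dies" if gb <= 24 else "too big")
        print(f"{params_b}B {quant}: ~{gb:.1f} GB ({verdict})")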

Estimated Performance

Generation: ~36 tokens/sec

Prefill: ~155 tokens/sec
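
Token generation on a card like this is usually memory-bandwidth-bound: every generated token reads each active weight once, so tokens/sec is capped near effective bandwidth divided by model size in bytes. A back-of-envelope sketch, assuming a single 12GB die at 240 GB/s (half the 480 GB/s aggregate spec) and an assumed ~55% achievable efficiency:

    # decode_estimate.py: bandwidth-bound ceiling on generation speed.
    PEAK_GBPS = 240.0   # one GK210 die; the 480 GB/s spec is both dies combined
    EFFICIENCY = 0.55   # assumed fraction of peak actually reached on Kepler

    def decode_tps(model_gb: float) -> float:
        """Upper bound on tokens/sec: each weight byte read once per token."""
        return PEAK_GBPS * EFFICIENCY / model_gb

    # A 7B Q4_K_M model is roughly 4.2 GB of weights:
    print(f"~{decode_tps(4.2):.0f} tokens/sec")  # ~31, near the ~36 figure above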

Recommended Quantisations

  • Q4_K_M recommended for 30B models
  • Q6_K or Q8 for 14B and below
  • Full precision for 7B
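
If you do run one of these, splitting across both 12GB dies is the only way to reach a 30B Q4_K_M model. A hypothetical sketch using llama-cpp-python is below; tensor_split is a real parameter, but the model path is a placeholder, and prebuilt wheels almost certainly lack Kepler kernels, so this assumes a self-compiled build targeting compute capability 3.7, which may not work on current versions at all.

    # k80_split.py: load a 30B Q4_K_M GGUF across both K80 dies.
    from llama_cpp import Llama

    llm = Llama(
        model_path="models/30b-q4_k_m.gguf",  # hypothetical path
        n_gpu_layers=-1,          # offload every layer to the GPUs
        tensor_split=[0.5, 0.5],  # weight the two 12GB dies equally
        n_ctx=2048,               # modest context; KV cache also eats VRAM
    )
    out = llm("Q: What is the capital of France?\nA:", max_tokens=32)
    print(out["choices"][0]["text"])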

Pros & Cons

Pros

  • 24GB total VRAM across two dies, enough for large models if your runtime can split them
  • Very cheap on the used market, often close to free

Cons

  • 300W TDP: high power draw for modest performance
  • Kepler architecture: dropped by CUDA 12 and by current framework builds
  • Dual-GPU design: a single model sees only 12GB without multi-GPU splitting
  • No display output: headless only
  • Passive datacenter cooler: needs server airflow or an aftermarket fan shroud

Community Verdict

  • r/LocalLLaMA

    Avoid. Dual-GPU means 12GB per die, Kepler lacks modern CUDA support, and it draws 300W. Buy a P40 instead.
