NVIDIA Tesla V100 32GB

32GB of HBM2 memory at datacenter speeds: an enterprise workhorse finding a second life in AI.

Specifications

Brand: NVIDIA
Model: Tesla V100 32GB
VRAM: 32GB
Architecture: Volta
CUDA / Stream Processors: 5,120
Memory Bandwidth: 900 GB/s
TDP: 300W
FP32 TFLOPS: 14
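
The FP32 figure follows directly from the shader count and clock: cores × 2 FLOPs per cycle (fused multiply-add) × boost clock. A quick sanity check; the ~1.38 GHz boost clock is an assumption based on the PCIe variant, since the clock is not listed above:

```python
# FP32 throughput = CUDA cores x 2 FLOPs/cycle (FMA) x boost clock
cuda_cores = 5_120
boost_clock_ghz = 1.38  # PCIe V100 boost clock (assumption; SXM2 boosts higher)
tflops = cuda_cores * 2 * boost_clock_ghz / 1_000
print(f"{tflops:.1f} TFLOPS FP32")  # ~14.1, matching the listed spec
```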

For AI / LLM Use

Solid choice for 30B models and comfortable 14B inference. This is a datacenter card: it has no display output, relies on server chassis airflow, and usually needs an aftermarket cooling solution in a desktop build.
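
Because the card is headless, the quickest check that it is alive and fully usable is software-side. A minimal PyTorch sketch; Volta reports CUDA compute capability 7.0:

```python
import torch

# Confirm the V100 is visible to CUDA and report its key properties.
assert torch.cuda.is_available(), "No CUDA device found; check the driver install"
props = torch.cuda.get_device_properties(0)
print(props.name)                                # e.g. "Tesla V100-PCIE-32GB"
print(f"{props.total_memory / 1024**3:.0f} GB")  # ~32 GB
print(torch.cuda.get_device_capability(0))       # (7, 0) for Volta
```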

What Models Can It Run?

  • 30B Q6_K, 70B Q2_K
  • 30B Q4_K_M, 14B full precision, 70B Q2 (tight)
  • 14B Q6_K, 30B Q3_K (tight)
  • 14B Q4_K_M, 7B full precision
  • 7B Q6_K, 14B Q3_K (tight)
  • 7B Q4_K_M only
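
The fits above come down to simple arithmetic: effective bits per weight for each quantisation, times parameter count, plus headroom for the KV cache and runtime buffers. A minimal sketch; the bits-per-weight values are rough approximations for llama.cpp quants, and the flat 2 GB overhead is an assumption:

```python
# Rough VRAM estimate: quantised weights + flat overhead (KV cache, buffers).
BITS_PER_WEIGHT = {  # approximate effective bits for common llama.cpp quants
    "Q2_K": 2.6, "Q3_K_M": 3.9, "Q4_K_M": 4.8,
    "Q6_K": 6.6, "Q8_0": 8.5, "FP16": 16.0,
}

def vram_estimate_gb(params_billion: float, quant: str, overhead_gb: float = 2.0) -> float:
    weights_gb = params_billion * BITS_PER_WEIGHT[quant] / 8
    return weights_gb + overhead_gb

for params, quant in [(30, "Q6_K"), (70, "Q2_K"), (30, "Q4_K_M")]:
    est = vram_estimate_gb(params, quant)
    verdict = "fits" if est <= 32 else "does not fit"
    print(f"{params}B {quant}: ~{est:.0f} GB, {verdict} in 32GB")
```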

Estimated Performance

Generation (decode): ~68 tokens/sec

Prefill (prompt processing): ~250 tokens/sec
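
Single-GPU generation is memory-bound: each new token streams the full set of weights out of VRAM once, so throughput tops out near bandwidth divided by model size. A back-of-the-envelope sketch; the 75% bandwidth efficiency and the ~10 GB model size are illustrative assumptions, not measurements:

```python
MEM_BANDWIDTH_GBPS = 900  # V100 spec sheet value

def decode_ceiling_tps(model_size_gb: float, efficiency: float = 0.75) -> float:
    """Upper bound on tokens/sec: every token reads all weights from VRAM once.
    `efficiency` discounts for imperfect overlap and KV-cache traffic (assumption)."""
    return MEM_BANDWIDTH_GBPS * efficiency / model_size_gb

print(f"~{decode_ceiling_tps(10.0):.0f} tok/s")  # ~68 for a model occupying ~10 GB
```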

Recommended Quantisations

  • Q4_K_M recommended for 30B models
  • Q6_K or Q8 for 14B and below
  • Full precision for 7B
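
To put these recommendations into practice, one option is the llama-cpp-python bindings, which can offload all layers to the card. A minimal sketch; the model path is a placeholder, and any GGUF in the sizes discussed above should behave similarly:

```python
from llama_cpp import Llama

llm = Llama(
    model_path="./models/30b-instruct.Q4_K_M.gguf",  # placeholder path
    n_gpu_layers=-1,  # -1 offloads every layer; a ~20 GB Q4_K_M 30B fits in 32GB
    n_ctx=8192,       # KV cache grows with context; 8K leaves comfortable headroom
)

out = llm("Explain HBM2 in one sentence.", max_tokens=64)
print(out["choices"][0]["text"])
```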

Pros & Cons

Pros

  • 32GB VRAM — handles large models
  • High memory bandwidth for fast generation
  • Volta architecture — good software support

Cons

  • 300W TDP — high power draw
  • No display output — headless only
  • May need aftermarket cooling solution (see the monitoring sketch below)
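
On the cooling point: if the card runs outside a server chassis, it is worth watching core temperature under load. A minimal monitoring sketch using the nvidia-ml-py (pynvml) bindings; the 85°C warning threshold is an illustrative choice, not an NVIDIA spec:

```python
import time
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)

# Poll core temperature once a second; passive cards heat up fast without airflow.
for _ in range(10):
    temp_c = pynvml.nvmlDeviceGetTemperature(handle, pynvml.NVML_TEMPERATURE_GPU)
    warn = "  <- add airflow!" if temp_c >= 85 else ""
    print(f"GPU temperature: {temp_c} C{warn}")
    time.sleep(1)

pynvml.nvmlShutdown()
```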

Community Verdict

  • r/LocalLLaMA

    32GB HBM2 with excellent bandwidth. Datacenter-only form factor but great performance for the price.
