NVIDIA RTX 4090 — 24GB

Fastest consumer GPU for AI inference. Worth it if raw speed matters more to you than value for money.

Specifications

Brand: NVIDIA
Model: RTX 4090
VRAM: 24GB
Architecture: Ada Lovelace
CUDA cores: 16,384
Memory bandwidth: 1,008 GB/s
TDP: 450W
FP32 TFLOPS: 83
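The TFLOPS figure can be sanity-checked from the core count. A minimal sketch, assuming the RTX 4090's published ~2.52 GHz boost clock (not listed in the table above) and the usual 2 FLOPs per CUDA core per cycle from fused multiply-add:

```python
# Back-of-the-envelope check of the FP32 TFLOPS spec.
# Assumption: ~2.52 GHz boost clock, 2 FLOPs/core/cycle (FMA).
cuda_cores = 16_384
boost_clock_ghz = 2.52

fp32_tflops = cuda_cores * 2 * boost_clock_ghz / 1000
print(f"{fp32_tflops:.1f} TFLOPS")  # ≈ 82.6, in line with the listed 83
```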



For AI / LLM Use

Solid choice for 30B models and comfortable 14B inference.

What Models Can It Run?

  • 30B Q4_K_M, 14B full precision, 70B Q2 (tight)
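Whether a model fits is roughly weights (parameters × bits per weight) plus headroom for the KV cache and runtime overhead. A sketch under common rules of thumb — the ~4.85 and ~2.63 bits-per-weight values are typical llama.cpp figures and the 20% overhead is an assumption, not an exact measurement:

```python
def est_vram_gb(params_b: float, bits_per_weight: float,
                overhead: float = 1.2) -> float:
    """Estimated VRAM: weights plus ~20% for KV cache and runtime overhead."""
    return params_b * bits_per_weight / 8 * overhead

# 30B at Q4_K_M (~4.85 bpw): comfortably inside 24 GB.
print(round(est_vram_gb(30, 4.85), 1))  # ≈ 21.8 GB

# 70B at Q2_K (~2.63 bpw): weights alone ~23 GB, so only "tight"
# configurations (short context, reduced overhead) squeeze in.
print(round(est_vram_gb(70, 2.63), 1))  # ≈ 27.6 GB with full overhead
```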

Estimated Performance

Generation: ~76 tokens/sec

Prefill: ~1482 tokens/sec
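Token generation on a GPU is typically memory-bandwidth-bound: every token requires streaming the full model weights, so peak throughput is roughly bandwidth divided by model size. As a consistency check — the 13.3 GB model size here is illustrative (e.g. a mid-size model at ~8 bits/weight), not a figure from this page:

```python
# Upper bound on decode throughput for a memory-bound GPU:
# tokens/sec ≈ memory bandwidth / bytes read per token (≈ model size).
bandwidth_gbs = 1008   # RTX 4090 memory bandwidth
model_gb = 13.3        # assumed model footprint for illustration

upper_bound_tps = bandwidth_gbs / model_gb
print(round(upper_bound_tps, 1))  # ≈ 75.8, consistent with ~76 tok/s above
```

Real throughput lands at some fraction of this bound; larger models stream more bytes per token and generate proportionally slower.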

Recommended Quantisations

  • Q4_K_M recommended for 30B models
  • Q6_K or Q8 for 14B and below
  • Full precision for 7B
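The recommendations above follow directly from file sizes at each quantisation level. A sketch using typical llama.cpp bits-per-weight values (assumed averages; exact sizes vary by model):

```python
# Approximate weight sizes at common llama.cpp quantisations.
# Bits-per-weight values are typical averages, not exact.
BPW = {"Q4_K_M": 4.85, "Q6_K": 6.56, "Q8_0": 8.5, "FP16": 16.0}

def size_gb(params_b: float, quant: str) -> float:
    return params_b * BPW[quant] / 8

for q in BPW:
    print(f"30B {q}: {size_gb(30, q):.1f} GB")
# 30B Q4_K_M: 18.2 GB  <- fits 24 GB with KV cache headroom
# 30B Q6_K:   24.6 GB  <- already over budget
# 30B Q8_0:   31.9 GB
# 30B FP16:   60.0 GB
```

This is why Q4_K_M is the ceiling for 30B on 24 GB, while 14B and smaller leave room for Q6_K/Q8.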

Pros & Cons

Pros

  • 24GB VRAM — handles large models
  • High memory bandwidth for fast generation
  • Ada architecture — good software support
  • Consumer card — easy to install, with display outputs

Cons

  • 450W TDP — high power draw

Community Verdict

Average score: 9.5/10 (1 scored review)

  • TechPowerUp 9.5/10

    The undisputed king of consumer GPUs. Ada Lovelace architecture delivers massive performance gains.

    Source
  • r/LocalLLaMA

    Fastest consumer card for inference. Overkill for most use cases but unmatched speed for 24GB tier.

    Source