The vLLMulator

Select your GPU and model below, then enter your desired maximum context length and maximum number of output tokens. We’ll show you:

  1. Which precision variants of that model will fit on your GPU
  2. Approximate KV‐cache size in GB for that context length
  3. A suggested vLLM command line (with flags) you can copy & paste
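
For a rough idea of the arithmetic behind these three outputs, here is a minimal sketch in Python. It is not the calculator's actual code. The class and function names (ModelSpec, kv_cache_gb, weights_gb, fitting_precisions, suggested_command), the bytes-per-parameter table, the 0.9 memory-utilization default, and the example model values are all assumptions made for illustration. The estimate covers a single sequence with a 16-bit KV cache and ignores activations and framework overhead, so real usage will be somewhat higher.

```python
from dataclasses import dataclass

# Approximate bytes per weight for a few precisions (assumption: ignores
# quantization metadata such as scales and zero points).
BYTES_PER_PARAM = {"fp16": 2.0, "fp8": 1.0, "int4": 0.5}


@dataclass
class ModelSpec:
    """Hypothetical container; real values come from the model's config."""
    num_params: float   # total parameter count
    num_layers: int     # number of transformer layers
    num_kv_heads: int   # key/value heads (fewer than query heads with GQA)
    head_dim: int       # dimension of each attention head


def kv_cache_gb(spec: ModelSpec, context_len: int, kv_bytes: float = 2.0) -> float:
    """KV-cache size in GB (1e9 bytes) for one sequence of context_len tokens."""
    # Factor of 2: one tensor for keys and one for values, per layer.
    total_bytes = (2 * spec.num_layers * spec.num_kv_heads
                   * spec.head_dim * context_len * kv_bytes)
    return total_bytes / 1e9


def weights_gb(spec: ModelSpec, precision: str) -> float:
    """Approximate weight memory in GB at the given precision."""
    return spec.num_params * BYTES_PER_PARAM[precision] / 1e9


def fitting_precisions(spec: ModelSpec, gpu_gb: float, context_len: int,
                       utilization: float = 0.9) -> list:
    """Precisions whose weights plus KV cache fit within the usable GPU memory."""
    budget = gpu_gb * utilization
    kv = kv_cache_gb(spec, context_len)
    return [p for p in BYTES_PER_PARAM if weights_gb(spec, p) + kv <= budget]


def suggested_command(model_name: str, context_len: int,
                      utilization: float = 0.9) -> str:
    """Build a vLLM launch command; the tool's real suggestion may use other flags."""
    return (f"vllm serve {model_name} --max-model-len {context_len} "
            f"--gpu-memory-utilization {utilization}")


# Illustrative values for a hypothetical 8B-parameter model with
# grouped-query attention; not taken from any specific model card.
spec = ModelSpec(num_params=8.0e9, num_layers=32, num_kv_heads=8, head_dim=128)

print(round(kv_cache_gb(spec, 8192), 2), "GB of KV cache at 8192 tokens")
print("Fits on a 16 GB GPU:", fitting_precisions(spec, 16, 8192))
print(suggested_command("my-org/my-model", 8192))
```

The KV-cache term grows linearly with context length, which is why a longer context can push a precision variant that fit before over the memory budget.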