Hacker News
aurareturn
58 days ago | on: vLLM large scale serving: DeepSeek 2.2k tok/s/h200...
Are you using 16-bit for inference? How many tokens/second do you get with 8-bit?
Given that SOTA models now use 4-bit inference, can you give an estimate for 4-bit + Blackwell?
mycelia
57 days ago
Hi! This benchmarking was done with DeepSeek-V3's published FP8 weights, and Blackwell performance is still being optimized. SGLang hit 14k tok/s per B200 though, pretty cool writeup here:
https://lmsys.org/blog/2025-09-25-gb200-part-2/
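For anyone curious what serving the FP8 checkpoint looks like in practice, here is a minimal sketch using vLLM's offline Python API. The model id, parallelism degree, and sampling settings are illustrative assumptions, not the exact configuration used in the benchmark above.

    from vllm import LLM, SamplingParams

    # Minimal sketch (not the benchmark config): DeepSeek-V3's published
    # checkpoint already stores FP8 weights, so vLLM reads the quantization
    # settings from the model's own config rather than an explicit flag.
    llm = LLM(
        model="deepseek-ai/DeepSeek-V3",  # assumed HF repo id
        tensor_parallel_size=8,           # illustrative; real deployments shard wider
        trust_remote_code=True,
    )

    sampling = SamplingParams(temperature=0.6, max_tokens=256)
    outputs = llm.generate(["Summarize FP8 inference in two sentences."], sampling)
    print(outputs[0].outputs[0].text)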