Hacker News
aurareturn
58 days ago | on: vLLM large scale serving: DeepSeek 2.2k tok/s/h200...
Are you using 16-bit for inference? How many tokens/second do you get with 8-bit?
Given that SOTA models now use 4-bit inference, can you give an estimate for 4-bit + Blackwell?
mycelia
57 days ago
Hi! This benchmarking was done with DeepSeek-V3's published FP8 weights, and Blackwell performance is still being optimized. SGLang hit 14k tok/s per B200 though, pretty cool writeup here:
https://lmsys.org/blog/2025-09-25-gb200-part-2/
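For anyone curious what serving the FP8 checkpoint looks like in practice, here is a minimal sketch using vLLM's offline Python API. The model id, parallelism degree, and sampling settings are illustrative assumptions, not the exact configuration used in the benchmark above.

    from vllm import LLM, SamplingParams

    # Minimal sketch (not the benchmark config): DeepSeek-V3's published
    # checkpoint already stores FP8 weights, so vLLM reads the quantization
    # settings from the model's own config rather than an explicit flag.
    llm = LLM(
        model="deepseek-ai/DeepSeek-V3",  # assumed HF repo id
        tensor_parallel_size=8,           # illustrative; real deployments shard wider
        trust_remote_code=True,
    )

    sampling = SamplingParams(temperature=0.6, max_tokens=256)
    outputs = llm.generate(["Summarize FP8 inference in two sentences."], sampling)
    print(outputs[0].outputs[0].text)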