
Though if true, it does raise the question: how can these services possibly be offered profitably if a single query requires many high-end GPUs to run?

It doesn't really track logically either, though I'm not in the know. Perhaps ChatGPT is run at massive operating losses and it's all VC-subsidized.



One thing to consider: there is considerable asymmetry between the high training costs and the (relatively) lower operating costs.
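
A toy back-of-envelope model of that asymmetry; every figure below is a made-up assumption, purely to show how the one-time training cost amortizes across queries:

    # Toy amortization model -- every figure is a made-up assumption.
    training_cost = 5_000_000        # one-time training run, USD (assumed)
    cost_per_query = 0.002           # marginal inference cost, USD (assumed)
    queries_per_day = 10_000_000     # assumed traffic

    for days in (30, 365):
        inference = cost_per_query * queries_per_day * days
        total = training_cost + inference
        print(f"Day {days}: training is {training_cost / total:.0%} of cumulative cost")
    # Day 30: ~89%; Day 365: ~41% -- the one-time cost fades as queries accumulate.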

> Perhaps ChatGPT is run at massive operating losses and it's all VC-subsidized.

I'm not in the know either, but the current access is providing both marketing awareness and significant human training/feedback to be used for improvements on future commercial projects. It could certainly be the case that the benefits outweigh the costs, compared to the classic, dumber "get eyeballs now, monetize later" strategy.


This post [0] is a good primer, and it's talking about a 20B model; GPT-3 has 175B parameters. Their other posts go into more detail, but yes, it really is a massive operation.

Some hard facts from here [1], talking about BLOOM with its 175B parameters:

>> Installing the full 175B version is a challenge though as it requires around 350GB of GPU VRAM, which is not something one can easily afford.
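
For context, that 350GB figure is just parameter count times bytes per weight, assuming 16-bit precision:

    params = 175e9           # parameter count of the full model
    bytes_per_param = 2      # fp16/bf16 weights; excludes activations and KV cache
    print(f"~{params * bytes_per_param / 1e9:.0f} GB of VRAM for the weights alone")  # ~350 GB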

But hey, why deploy a model like ChatGPT at all when you guys can be confidently incorrect all by yourselves?

[0] https://nlpcloud.com/deploying-gpt-neox-20-production-focus-...

[1] https://nlpcloud.com/chatgpt-open-source-alternatives.html


So it sounds like this is a question of loading the model into VRAM, and not a question of the cost of a single query. I assume once a model is loaded, many queries can be serviced by that model quickly.
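
A minimal sketch of that batching intuition, using a toy PyTorch model (this is not ChatGPT's actual serving stack, just the shape of the idea):

    import torch

    # Toy stand-in for a resident model: load once, serve many requests.
    model = torch.nn.Sequential(
        torch.nn.Linear(512, 2048),
        torch.nn.ReLU(),
        torch.nn.Linear(2048, 512),
    ).eval()

    # A batch of 32 "queries" runs through the same loaded weights in one
    # forward pass, so the fixed cost of holding the model is amortized.
    queries = torch.randn(32, 512)
    with torch.no_grad():
        responses = model(queries)
    print(responses.shape)  # torch.Size([32, 512])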

There's nothing incorrect about my assertion. If it actually took many GPUs to service a single query, then no cost-viable, mass-scale consumer product would be possible. That's just a clear economic fact, regardless of whether a model could theoretically be spun up in a cost-inefficient way.

And even hundreds of GB of VRAM is not far off from consumer hardware. Look at how quickly graphics RAM has expanded over time: roughly 10x in 10 years for high-end cards, at a cursory glance at various Nvidia cards. On the same trajectory we could see a 400GB VRAM card within the next decade (though that rests on a lot of assumptions).
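
A minimal sketch of that extrapolation; the starting capacity and growth rate are assumptions taken from the comment, not measured data:

    import math

    # Naive geometric extrapolation of high-end consumer VRAM.
    current_vram_gb = 24       # flagship consumer card today (assumption)
    growth_per_decade = 10     # ~10x per 10 years, per the comment above
    target_gb = 400

    decades = math.log(target_gb / current_vram_gb, growth_per_decade)
    print(f"~{decades * 10:.0f} years to reach {target_gb} GB")  # ~12 years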


> I assume once a model is loaded, many queries can be serviced by that model quickly.

Depends. If you have room to load the whole model, yes. If you need to swap parts of the model in and out, then it matters whether you have enough RAM.
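
A minimal sketch of that swapping, assuming the model fits in CPU RAM but not in GPU VRAM (real stacks like accelerate or DeepSpeed add prefetching and pinned memory; this is just the core idea):

    import torch

    device = "cuda" if torch.cuda.is_available() else "cpu"

    # Toy "model": a stack of layers resident in CPU RAM.
    layers = [torch.nn.Linear(1024, 1024) for _ in range(8)]
    x = torch.randn(1, 1024, device=device)

    with torch.no_grad():
        for layer in layers:
            layer.to(device)    # swap this layer's weights into VRAM
            x = layer(x)
            layer.to("cpu")     # evict it to make room for the next one
    print(x.shape)  # torch.Size([1, 1024])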


You really are like a chatbot... look at the last three node sizes and the RAM density on them. It's not going to happen as fast as you're dreaming, especially not given how little the last few generations have moved. The hope is to go to fp4 if you want to run it on consumer hardware, and even then we're still talking about 2-3 cards. Why not at least try to Google before hammering out stupid, uninformed hot takes?
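
For what it's worth, the fp4 arithmetic (the 24GB card size is an assumption):

    params = 175e9            # GPT-3-scale parameter count
    bytes_per_param = 0.5     # 4-bit (fp4) weights
    card_gb = 24              # flagship consumer card today (assumption)

    weights_gb = params * bytes_per_param / 1e9
    print(f"~{weights_gb:.0f} GB of weights -> ~{weights_gb / card_gb:.1f} cards")
    # ~88 GB -> ~3.6 cards at 24 GB each, before activations; larger cards
    # get you closer to the 2-3 card figure above.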


$3M/day is what I heard



