> A GPU cloud instance costs $1000+ per month (vs. $10 per month for an AVX-512 CPU). A bargain GPU instance (e.g., Linode) costs $1.50 per hour (and far more on AWS) but an AVX-512 CPU costs maybe $0.02 per hour.
This is a little confused. A T4 on GCP is $0.35/hr at on-demand rates [1]. A single thread of a Skylake+ CPU is in the 2-3c/hr range [2] (so $15-20/month without any committed-use or sustained-use discounts; your $10 is close enough).
So roughly, the T4 GPU itself costs as much as ~10 CPU threads. Both figures are before adding memory, storage, etc., but the T4 is a great inference part and hard to beat.
Comparing a single thread of a CPU to training-optimized GPU parts (like the A100 or V100) is sort of apples and oranges.
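The price-equivalence arithmetic above can be sketched as follows (rates taken from the comment itself — GCP on-demand, no discounts — so a rough check, not exact billing):

```python
# Back-of-envelope from the rates quoted above (GCP on-demand, no discounts).
t4_per_hour = 0.35        # T4 GPU, on-demand [1]
thread_per_hour = 0.03    # one Skylake+ vCPU thread, upper end of 2-3c/hr [2]

# Price equivalence: how many CPU threads does one T4 cost?
print(round(t4_per_hour / thread_per_hour))   # ~12, i.e. "roughly 10 threads"

# Monthly cost of one thread (30-day month), matching the $15-20 figure:
print(round(thread_per_hour * 24 * 30))       # ~$22 at 3c/hr; ~$14 at 2c/hr
```

Landing at ~10 vs. ~12 threads just depends on where in the 2-3c/hr range the thread price falls.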
If you're not holding it for a full month, it's $0.015 per hour.
Here are the $1000 per month ($1.50 per hour) cloud GPUs I use, the cheapest that Linode provides: https://www.linode.com/pricing/
I would like to see a comparison of Linode vs. Google Cloud that takes network bandwidth costs, etc. into account. Maybe the Google Cloud T4's total cost of ownership is lower? I doubt it: for example, the Linode plan includes 16 terabytes of fast network traffic at no additional cost, while Google Compute charges $0.085 per gigabyte for egress.
And the T4 is still 23x more expensive than an AVX-512 core, hourly.
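Both numbers in this comment come out of simple arithmetic on the prices quoted earlier in the thread (a sketch; actual GCP egress pricing is tiered by destination and volume):

```python
# Egress: Linode bundles 16 TB of traffic; GCP charges ~$0.085/GB to push
# the same data out to the internet.
gcp_egress_per_gb = 0.085
bundled_tb = 16
print(round(bundled_tb * 1000 * gcp_egress_per_gb))   # ~$1360 on GCP

# Hourly T4 vs. a single AVX-512 core at the $0.015/hr rate quoted above:
t4_per_hour = 0.35
core_per_hour = 0.015
print(round(t4_per_hour / core_per_hour))             # ~23x
```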
boulos is 100% right that you are choosing to compare to particularly expensive GPUs instead of cheaper GPU instances that are actually intended for inference at low cost.
[1] https://cloud.google.com/compute/gpus-pricing
[2] https://cloud.google.com/compute/all-pricing