giacaglia on Sept 8, 2019, on: Waveglow Inference in CUDA C++

One bottleneck can be running the whole model on the GPU when parts of it would be more efficient on the CPU. But most of the bottlenecks are memory issues. GPUs do not necessarily have enough memory for the model, so you end up having to access "external memory", which slows down the forward pass a ton.

Also, in some cases, like small RNNs/LSTMs, CPUs can be faster.
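
To make the memory point concrete, here is a rough back-of-envelope check of whether a model fits on a given card. This is only a sketch: `fits_in_device_memory` is a hypothetical helper, and every number below (parameter count, activation working set, card size) is an assumption for illustration, not a measurement of WaveGlow or any real model.

```python
GIB = 1024 ** 3  # bytes in one GiB


def fits_in_device_memory(num_params, bytes_per_param, activation_bytes, device_bytes):
    """Rough inference footprint: weights plus activations versus device memory.

    Returns (fits, required_bytes). Ignores framework overhead, fragmentation,
    and workspace buffers, so treat the result as an optimistic lower bound.
    """
    required = num_params * bytes_per_param + activation_bytes
    return required <= device_bytes, required


# Hypothetical numbers, for illustration only.
fits, required = fits_in_device_memory(
    num_params=3_000_000_000,  # assumed 3B weights
    bytes_per_param=4,         # float32
    activation_bytes=2 * GIB,  # assumed activation working set
    device_bytes=8 * GIB,      # assumed 8 GiB card
)
print(f"needs about {required / GIB:.1f} GiB, fits on device: {fits}")
```

If the check fails, some of the weights or activations have to live off-device, and that is where the slow "external memory" accesses come from.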