Much respect to NVIDIA and their software team, but the situation is changing. PlaidML is like cuDNN for every GPU: fully open source, faster in many cases than TF+cuDNN on NVIDIA, beats vendor tools on other architectures, and runs on Linux/Mac/Win. It currently supports Keras, but adding more frameworks is not difficult (patches welcome).
The PlaidML benchmarks are suspect. They compare to Keras + Tensorflow, which is a really unfair comparison since 1) Tensorflow is probably the slowest of the big deep learning frameworks out there (compared to PyTorch, MXNet, etc.), and 2) Keras itself is quite slow. Keras is optimized more for ease of use, introduces lots of abstractions, and often doesn't take advantage of many TF optimizations. For just one example, until very recently Keras did not use TF's fused batch norm, which the TF docs claim provides a 10-30% speedup in overall network performance; that alone could be enough to account for many of the benchmarks showing PlaidML ahead.
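For reference, this is roughly what that fused batch norm option looks like in TF 1.x-era code (just a sketch to show the flag the docs describe; the tensor shapes and names are illustrative):

```python
import tensorflow as tf  # TF 1.x-style API

# Placeholder standing in for some convnet activations (shape is illustrative).
x = tf.placeholder(tf.float32, shape=[None, 224, 224, 64], name="activations")

# fused=True asks TF to use its single fused batch-norm kernel instead of
# composing the op out of several smaller ones.
y = tf.layers.batch_normalization(x, fused=True, training=True)
```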
In my opinion it's extremely fair. The benchmarks compare Keras+PlaidML to Keras+TensorFlow, which lets us run exactly the same nets (just imported from Keras's included applications), so whatever penalty Keras might impose is equal in the two cases. Having one very direct comparison is actually why we constructed the tests that way (none of the other frameworks run on our high-priority platforms).
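Concretely, the setup looks roughly like this (a sketch, not our actual harness; the model choice, backend switch, and timing here are just illustrative):

```python
import os
import time

# Select the backend before importing Keras: "tensorflow" or "plaidml.keras.backend".
os.environ["KERAS_BACKEND"] = "plaidml.keras.backend"

import numpy as np
from keras.applications import ResNet50

# Identical architecture regardless of which backend is selected above.
model = ResNet50(weights=None)
batch = np.random.rand(1, 224, 224, 3).astype("float32")

start = time.time()
model.predict(batch)
print("batch-1 inference took", time.time() - start, "seconds")
```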
That said, we'd be pretty excited if someone wanted to add support for TF, PyTorch, MXNet, etc. We like Keras but are happy to have integrations for all frameworks. With some work you could pair it with Docker and containerize GPU-accelerated workloads without the guests even needing to know what hardware they're running on. Lots of possibilities.
> whatever penalty Keras might impose is equal in the two cases.
The penalty Keras imposes when using Tensorflow depends on its Tensorflow implementation. The penalty Keras imposes when using MXNet depends on its MXNet implementation. The penalty Keras imposes when using PlaidML depends on whatever the PlaidML devs implemented. When you build a Keras layer, it's calling different Keras code for each backend.
The comparison would be fair if Plaid claimed to be the fastest Keras backend, not if it were actually claiming to be faster than Tensorflow.
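To make that concrete: a Keras layer bottoms out in calls to the backend module, and each backend ships its own implementation of those primitives (a sketch; the exact string K.backend() returns depends on how the backend was installed):

```python
from keras import backend as K

print(K.backend())  # e.g. "tensorflow" or "plaidml.keras.backend"

# A Conv2D layer ultimately reduces to a call like this; how fast it runs
# depends entirely on the selected backend's implementation of conv2d.
x = K.placeholder(shape=(1, 32, 32, 3))
w = K.random_uniform_variable(shape=(3, 3, 3, 16), low=-1.0, high=1.0)
y = K.conv2d(x, w, strides=(1, 1), padding="same")
```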
There was someone on reddit/ml who posted some pretty interesting numbers for training.
I think they have a lot of challenges ahead of them, but I’m still more optimistic about Plaid than AMD’s own efforts.
AMD says that they don’t care about ML[1], and their actions back that up.
Edit: and to be clear, I think comparing Keras+Plaid vs Keras+TF is an entirely valid thing to do. Lots of people work in Keras, and if you download random NN code off GitHub it's likely to be Keras (or PyTorch now, of course).
Batch-1 inference on convnets is key for us internally, but training does work pretty well. The underlying machinery can do much more. Here's a blog post that talks about how it works with some links to more detailed docs & the actual implementations:
Two of the big motivators for opening the code were 1) giving students taking the popular courses a way to get started with GPU acceleration on whatever machine they've got (the recent Intel GPUs in, say, a MacBook Air are enough) and 2) giving researchers a platform where it's simple to add efficient GPU-accelerated ops.
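For #1, getting started looks roughly like this (a sketch assuming plaidml-keras is installed and plaidml-setup has been run once to pick a device; the model choice is arbitrary):

```python
import plaidml.keras
plaidml.keras.install_backend()  # route Keras ops to PlaidML instead of TF

import numpy as np
from keras.applications import MobileNet

# Runs on whatever device plaidml-setup selected, e.g. an integrated Intel GPU.
model = MobileNet(weights=None)
out = model.predict(np.zeros((1, 224, 224, 3), dtype="float32"))
print(out.shape)
```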
For scale on #2 check out the entire implementation of convolution: