> Hopefully AMD gets the Rx 6800xt working with ROCm consistently
I am a maintainer for rocSOLVER (the ROCm LAPACK implementation) and I personally own an RX 6800 XT. It is very similar to the officially supported W6800. Are there any specific issues you're concerned about?
I know the software and I have the hardware. I'd be happy to help track down any issues.
I might be working from old news, but IIRC the 6800 wasn't well supported when it first came out, and AMD has been steadily applying patches to get it up to speed.
I wasn't sure what the current state of the 6800 was, since I don't own one myself. As I said a bit earlier, I use the Vega64 with no issues, at least for 256-thread workgroups. I do think there's some obscure bug with 1024-thread workgroups, but I haven't been able to track it down. Sticking with 256 threads is better for my performance anyway, so I never really bothered to figure it out.
Navi 21 launched in November 2020 but it only got official support with ROCm 5.0 in February 2022.
With respect to your issue running 1024 threads per block: if you're running out of VGPRs, you may want to try explicitly specifying the max threads per block as 1024 and see if that helps. I recall that at one point the compiler was defaulting to 256 despite the default being documented as 1024.
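A minimal sketch of what that could look like in HIP, assuming the `__launch_bounds__` attribute is the mechanism meant here. The `scale` kernel and the `max_vgprs_per_thread` helper are hypothetical names for illustration, and the register figures are assumed GCN/Vega values rather than anything stated in this thread; RDNA parts differ. The kernel is guarded so the file also compiles as plain C++ without hipcc.

```cpp
#if defined(__HIPCC__)
#include <hip/hip_runtime.h>

// Declare up front that this kernel may be launched with up to 1024 threads
// per block, so the compiler budgets registers for that launch size.
// `scale` is a hypothetical kernel used only for illustration.
__global__ void __launch_bounds__(1024) scale(float* x, float a, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        x[i] *= a;
    }
}
#endif

// Assumed GCN/Vega figures (not from the thread): 4 SIMDs per compute unit,
// 64-wide wavefronts, and 256 VGPRs available per lane on each SIMD.
constexpr int kSimdsPerCu = 4;
constexpr int kWaveSize = 64;
constexpr int kVgprsPerSimdLane = 256;

// Rough upper bound on VGPRs per thread when a whole block has to be resident
// on one compute unit. Ignores allocation granularity; threads_per_block must
// be positive.
constexpr int max_vgprs_per_thread(int threads_per_block) {
    int waves = (threads_per_block + kWaveSize - 1) / kWaveSize;
    int waves_per_simd = (waves + kSimdsPerCu - 1) / kSimdsPerCu;
    return kVgprsPerSimdLane / waves_per_simd;
}
```

Under these assumptions, `max_vgprs_per_thread(256)` is 256 while `max_vgprs_per_thread(1024)` is 64, so a kernel compiled for a 256-thread maximum can end up using more registers than a 1024-thread launch can provide.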
The main issue I have with the idea of Navi 21 is that it's a 32-wide warp, when CDNA2 (like MI250x) is a 64-wide warp.
Granted, RDNA and CDNA still have largely the same assembly language, so it's still better than using, say, NVidia GPUs. But I have to imagine that the 32-wide vs 64-wide difference is big in some use cases. In particular: low-level programs that use warp-level primitives, like DPP, shared-memory details and such.
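As a rough illustration of where the width shows up, here is a sketch of a shuffle-based wavefront reduction that takes its starting offset from HIP's `warpSize` instead of hard-coding 32 or 64. `wave_sum` and `emulate_wave_sum` are hypothetical helpers; the host-side emulation exists only so the loop can be checked without a GPU, and it assumes a power-of-two lane count.

```cpp
#include <cstddef>
#include <vector>

#if defined(__HIPCC__)
#include <hip/hip_runtime.h>

// Device version: the starting offset comes from warpSize, so the same source
// works for 32-wide and 64-wide wavefronts. After the loop, lane 0 holds the
// sum over the whole wavefront.
__device__ float wave_sum(float v) {
    for (int offset = warpSize / 2; offset > 0; offset /= 2) {
        v += __shfl_down(v, offset);
    }
    return v;
}
#endif

// Host-side emulation of the same loop, one vector element per lane.
// Lanes whose source would be out of range are left unchanged here; they
// never feed into lane 0, which is the only result returned.
inline int emulate_wave_sum(std::vector<int> lanes) {
    const std::size_t width = lanes.size();  // e.g. 32 or 64
    for (std::size_t offset = width / 2; offset > 0; offset /= 2) {
        std::vector<int> next = lanes;
        for (std::size_t lane = 0; lane + offset < width; ++lane) {
            next[lane] = lanes[lane] + lanes[lane + offset];
        }
        lanes = next;
    }
    return lanes.empty() ? 0 : lanes[0];
}
```

A reduction written with a fixed starting offset of 32 (correct for 64 lanes) would do the wrong thing on a 32-lane wavefront, which is the kind of code that needs auditing when moving between the two widths.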
I assume the supercomputer programmers want a cheap system under their desk to prototype code on something similar to the big MI250x system. Vega56/64 is several generations old, while the 6800 XT is pretty different architecturally. It seems weird that they'd have to buy MI200 GPUs for this purpose, especially in light of NVidia's strategy (where an NVidia A2000 could serve as a close stand-in: maybe not perfect, but closer to the big-daddy A100 than the 6800 XT is to the big-daddy MI250x).
--------
EDIT: That being said, this is probably completely moot for my own purposes. I can't afford an MI250x system at all. At best I'd build some kind of consumer rig for personal use, so a 6800 XT would be all I need. VRAM constraints feel quite real, so 16GB of VRAM at that price makes the 6800 XT a very pragmatic choice for personal use and study.