Super interesting post. Tyvm!
Which three sources would you recommend that someone fluent in ML read to arrive at the conclusions you present here (or their own)?
The Variational Autoencoder paper [1] and the DDPM paper [2] are pretty much all you need for this; [6] is also good but is covered by [2]. Going through the derivations helped solidify things for me. I haven't read [9], but it looks very promising: its authors include Jonathan Ho and D. Kingma, who authored [2] and [1], respectively.
From there, [3,4] show improvements to DDPMs, and [5] shows that diffusion models can be very general. [7,8] present diffusion models from the view of score matching.