Hacker News

The logic is not wrong. First off, your statistics showed that it CAN happen, just with low probability.

Second, your way of measuring the probability is incorrect. IF we were to pick an image uniformly at random out of ALL possible 32x32 1-bit images, THEN your probability would apply, but clearly machine learning doesn't do this.
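For reference, here is a minimal sketch of where a figure like 5.56×10^-309 comes from, assuming it refers to drawing one specific 32x32 1-bit image uniformly at random. The function name is made up for illustration, and the result is kept in log10 form to avoid underflow.

```python
from math import log10

def log10_uniform_exact_match(width, height, bits_per_pixel):
    """log10 of the chance of drawing one specific image uniformly at random.

    Hypothetical helper: assumes every possible image is equally likely,
    which is exactly the assumption the comment above argues does not
    hold for a trained model.
    """
    total_bits = width * height * bits_per_pixel
    # There are 2**total_bits distinct images, so p = 2**(-total_bits).
    return -total_bits * log10(2)

log_p = log10_uniform_exact_match(32, 32, 1)
mantissa = 10 ** (log_p - int(log_p) + 1)   # rescale into [1, 10)
exponent = int(log_p) - 1
print(f"p is about {mantissa:.2f}e{exponent}")
```

This only covers the uniform case. It says nothing about the distribution a trained model actually samples from.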

The actual probability is highly, highly dependent on the dataset.

To take the 2D line analogy further: if we perform linear regression on the points (0,0), (1,1), (2,2), (5,5), (329,329), ... and a bunch of other points whose x and y coordinates are equal, the line would touch ALL training set data points 100% of the time, as the equation of the line would be y = x. This isn't even called overfitting; y = x is actually the ONLY solution available. There's no way to prevent this if the data is just really clean.
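The line example can be checked with a small sketch of ordinary least squares in plain Python. The `fit_line` helper is made up for illustration, and the point list uses only the five points named explicitly above.

```python
def fit_line(points):
    """Closed-form ordinary least squares for y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Sum of squared deviations in x, and of x-y co-deviations.
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Training points that all lie exactly on y = x.
points = [(0, 0), (1, 1), (2, 2), (5, 5), (329, 329)]
slope, intercept = fit_line(points)

# Residual = observed y minus the fitted line's prediction.
residuals = [y - (slope * x + intercept) for x, y in points]
print(slope, intercept, max(abs(r) for r in residuals))
```

Every residual is zero, so the fitted line passes through every training point. That happens because the data is perfectly linear, not because the model has excess capacity.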



The set of natural images is going to be smaller than 5.56×10^-309, but it's still going to be astronomically large. You're simply not going to get an accidental hit on a 512x512 image, even with training. The fact that in the paper they only managed to reproduce photos from Stable Diffusion that are likely common in the training set proves this and conclusively disproves your hypothesis. In addition, they observe that unique photos are extractable from a larger model (Imagen), making it obvious that this reproduction is due to memorisation from the larger capacity of the model and is definitely not due to an accident (otherwise the probability to reproduce would not correlate with model capacity).


> The fact that in the paper they only managed to reproduce photos from Stable Diffusion that are likely common in the training set proves this and conclusively disproves your hypothesis.

There's no hypothesis. My conclusion is not scientific. My conclusion is derived from logic. It cannot be disproved.

> The set of natural images is going to be smaller than 5.56×10^-309, but it's still going to be astronomically large. You're simply not going to get an accidental hit on a 512x512 image, even with training

An exact, pixel-perfect match might be low probability. But a similar match that is more or less indistinguishable from the original to the human eye (or even a different but obvious reproduction) has a significantly higher probability.
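As a rough sketch of the exact-versus-near-match gap, take the 32x32 1-bit setting from earlier in the thread and count every image that differs from a target in at most a given number of pixels. Using flipped pixels as a stand-in for "looks the same to a human" is an assumption made here, the helper name is made up, and the sampling is still uniform, so only the ratio between the two cases is meaningful.

```python
from math import comb, log10

def log10_near_match_probability(num_pixels, max_flipped):
    """log10 chance that a uniformly random 1-bit image differs from a
    fixed target in at most max_flipped pixels.

    Assumption: the number of flipped pixels is a crude proxy for
    visual similarity.
    """
    # Number of images within the given Hamming distance of the target.
    count = sum(comb(num_pixels, k) for k in range(max_flipped + 1))
    return log10(count) - num_pixels * log10(2)

NUM_PIXELS = 32 * 32
exact = log10_near_match_probability(NUM_PIXELS, 0)    # pixel-perfect
near = log10_near_match_probability(NUM_PIXELS, 51)    # about 5% flipped
print(f"exact: 10^{exact:.1f}, near: 10^{near:.1f}")
print(f"near matches are about 10^{near - exact:.0f} times more likely")
```

Even under uniform sampling, allowing a small tolerance raises the probability by many orders of magnitude compared with an exact hit, though both stay tiny in absolute terms.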



