
That’s all true, you’re right, but what does that mean we should do? Shouldn’t the models actually guarantee they can’t reproduce their inputs, given that in the U.S. all images are copyrighted by default? The only images that are legal to copy for training, and legal to distribute in the form of a neural network that can spit them back out, are those explicitly licensed, for example under Creative Commons.

The Imagen outlier results confirm that accidental memorization of images happens, even if only for a small number of them. And it might be premature to conclude that one paper’s failure to find memorized outliers in SD means memorization doesn’t happen there. It might be true, but 10k images is less than one ten-thousandth of the training data, and it’s certainly possible that more effective attack methodologies exist. This was a single attempt, performed under many assumptions and run on a tiny fraction of the inputs, and nothing more.
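For a rough sense of scale (assuming a LAION-2B-scale training set of roughly 2 billion images, which is my assumption, not a figure from the paper), the probed sample works out to a few millionths of the data:

  # back-of-envelope: fraction of training data probed (assumed figures)
  probed = 10_000            # images checked, per the comment above
  train = 2_000_000_000      # assumed training-set size (LAION-2B scale)
  print(probed / train)      # 5e-06, i.e. well under one ten-thousandth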

Even if Stable Diffusion doesn’t memorize outliers, or any non-duplicated images, does that matter? SD will be out of date soon and replaced by another network. If no care was taken to prevent memorization, if SD’s memorization behavior (or lack thereof) is accidental, then how do we know it won’t happen more often in the next network? Isn’t this a problem that needs to be explicitly addressed, rather than one we simply claim is uncommon?



Humans don't even guarantee that. Composers not infrequently reproduce, by accident, a melody they've heard somewhere. The only thing that can be done is due diligence.


So? When humans do this they are subject to copyright law, and they can be, and have been, sued for accidentally reproducing a melody.

That’s also not very relevant here. We’re talking about making duplicating machines that effectively memorize pixels, not the same kind of “accident” you’re referring to.



