Hacker News

Nope. Totally different case based on a totally different set of circumstances and outcomes.

This is a great example of coming to a conclusion about copyright based on how you think the system should work vs how it actually works.

Google was able to convince the court their actions constituted fair use.

My guess is training a generative AI will also be fair use. The question is, what about the output of the resulting model? And that is a question it'll take a court to answer.



> My guess is training a generative AI will also be fair use. The question is, what about the output of the resulting model? And that is a question it'll take a court to answer.

I suspect that you are right, and that it will place legal responsibility on the individual operating the software for any infringement.

I think this will force the companies building these models to behave as if training the model was also infringement, because I cannot imagine a scenario where the average end-user has enough understanding/awareness of the implications of their prompts to avoid generating infringing work, and end-users getting sued would create an instant chilling effect on the use of such software.


I strongly suspect you're right about that.

My bet is the court will determine that whether the output of a model is or isn't subject to copyright isn't a black-and-white answer, but rather depends on the work.

Fundamentally, the test as to whether a work represents a copyright violation is about "substantial similarity" (https://en.wikipedia.org/wiki/Substantial_similarity):

> To win a claim of copyright infringement in civil or criminal court, a plaintiff must show he or she owns a valid copyright, the defendant actually copied the work, and the level of copying amounts to misappropriation. Under the doctrine of substantial similarity, a work can be found to infringe copyright even if the wording of text has been changed or visual or audible elements are altered

Okay, so let's say I take a thousand copyrighted images, average their pixels, and produce a single uniform grey output. No jury is going to conclude that work has "substantial similarity" with any of the original works, and I'm in the clear.

But now suppose I do the same, but weight it so 99% of the pixel colour comes from one image, and the remaining 1% comes from the rest.

Well, in that case, odds are very good a jury would find that I had violated the copyright of the original work that supplies 99% of the image.
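The two scenarios above can be sketched in a few lines. This is a hypothetical illustration only (the images, weights, and helper function are made up for the example): each "image" is a flat list of grayscale pixel values, and the output is a pixel-wise weighted average.

```python
# Hypothetical illustration of the averaging thought experiment above.
# Each "image" is a flat list of grayscale pixel values (0-255).

def average_images(images, weights=None):
    """Pixel-wise weighted average of equally sized images."""
    if weights is None:
        weights = [1.0 / len(images)] * len(images)
    size = len(images[0])
    return [sum(w * img[i] for w, img in zip(weights, images))
            for i in range(size)]

# Made-up source images, just to have something to average.
images = [[(j * 37 + k * 11) % 256 for k in range(4)] for j in range(1000)]

# Scenario 1: uniform average of a thousand images -> near-uniform grey,
# with no "substantial similarity" to any single source.
grey = average_images(images)

# Scenario 2: 99% of the weight on one image -> the output is nearly
# identical to that image, and plausibly infringing.
weights = [0.99] + [0.01 / 999] * 999
biased = average_images(images, weights)
```

The point of the sketch is that the same mechanical process produces a clearly non-infringing output in one case and a near-copy in the other; only the weighting differs.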

So my bet is the courts will conclude that the model, itself, doesn't in any way violate copyright, nor did the training itself run afoul of the law, but that any given output might, depending on the substantial similarity test.

And that means every single work is suspect and a potential target for litigation.


The thing is, most models for Stable Diffusion aren't created by companies but rather by end-users. There are literally hundreds of Stable Diffusion models you can download, covering everything from landscapes to animals to (of course) porn. A few of them were created by Stability AI or Hugging Face, but most are trained by end-users. It isn't hard at all to train a model with existing tools -- you don't have to be an AI expert to do it.


I believe those user-created models are still based on the core Stable Diffusion models though, and bring with them all of the same issues.

My understanding is that it's not difficult for an end-user to fine-tune existing models, but training one from scratch would be financially and technically out of reach for most individuals.


It's not clear what you mean by "based on". For example, the Anything model is trained on images from the Danbooru anime image board. Those images aren't in the standard Stable Diffusion model, so the legal question for that model is the legality of including those images, which the standard Stable Diffusion model does not include.


But isn't that model still mixed with the core SD model? I was under the impression that all of these specialized models are created by training against a particular image type/dataset, and then mixing the result with the core SD model.

This is how those specialized models can still generate just about anything. Without the core model mixed in, the specialized model would be nearly useless.
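The "mixing" described above is commonly done as a weighted average of the two models' parameters (this is what community "checkpoint merger" tools do). A minimal sketch, assuming that approach: real tools operate on large torch state dicts, but plain dicts of floats are used here to keep the example self-contained, and all names and values are made up.

```python
# Hypothetical sketch of checkpoint "mixing": a weighted average of two
# models' parameters, keyed by parameter name.

def merge_checkpoints(base, specialized, alpha=0.5):
    """Blend two state dicts: alpha * specialized + (1 - alpha) * base."""
    assert base.keys() == specialized.keys()
    return {k: alpha * specialized[k] + (1 - alpha) * base[k]
            for k in base}

core_sd = {"unet.w": 0.2, "unet.b": -0.1}   # stand-in for the core SD model
anime_ft = {"unet.w": 0.8, "unet.b": 0.3}   # stand-in for a specialized fine-tune

# 70% specialized, 30% core: the merged model keeps the core model's
# general capability while leaning toward the specialized one.
mixed = merge_checkpoints(core_sd, anime_ft, alpha=0.7)
```

On this view, a merged model inherits whatever legal questions attach to *both* parents, which is the point being made above.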


Train one on Disney cartoons, share it on the internet, and see what happens!



