Hacker News

Do you see #1 and #3 conflicting at all? Ex: you produce a model, run it, publish and copyright some output. I can then use that as training data for another model in the style of your existing model?


I see that more as a conflict between #1 and #2, but fair point. In extreme cases, you could probably make a crude copy of a model by training a new model solely on the outputs of the first one. Normally that would be a derivative work, but that's inconsistent with the idea that training on copyrighted works is always permissible.

Maybe one way to resolve this would be to say there ought to be some practical limit on what percentage of the training data can come from any one source. If I train a model solely on the text of one book, for example, such that it's so overfitted it can do nothing but regurgitate passages from that book, it's probably fair to call it a derivative work. The same would apply to a model trained solely on output from another model. (Though if it merely incorporates a few examples from a bunch of different models, that would be okay.)



