Outside the scope of the paper, but sure, it's a valid and interesting question....

kilgnad · on Feb 2, 2023

Paintbrush and paint functions like a NN model. While brush strokes operates as input into the "paint brush" function to output a painting.

Could you say that paint and paintbrushes encode every single painting on the face of the earth into it?

I don't think this meshes with our intuition of the word. The model requires input for the encoding to be complete. If we don't think about it this way then practically anything on the face of the earth "memorizes" everything else. A pencil encodes every sketch that has ever been drawn and will be drawn.

This is a semantic issue. What do we mean by the word "memorize" or "encoding"? Eventually the gap between two isomorphisms becomes so big that the words memorize and encoding no longer apply. NN's are right at the border of this demarcation, but if we want to be consistent then the answer should be that NN's do not encode this information.

babel_ · on Feb 4, 2023

I believe you're mixing your metaphors, as it were. A paintbrush is not like a neural network, nor is it a function. You may describe a paintbrush in a model, and that may be as a mathematical function, but a paintbrush it not itself a function, it is a paintbrush, a real object that's part of the physical world.

On the other hand, a neural network is a function, because that is what it is constructed as, and what it is intended to perform as, and indeed what it "does". It passes the duck test for a function, and then some. A function has a domain and codomain/image, two sets it draws its inputs from and then produces output accordingly, which for the purpose of mathematical definition, are an intrinsic part of the function.

Therein, it may contain and encode exactly every single painting not merely on the Earth but in all possible existence. The rub is that you still need to construct examples by providing input from that domain to have an output in the real world --- whatever your belief in the Platonic ideal, it is agreeable that the infinite or absurdly large finite sets of input and output that are described are not fully reachable/depicatable in the real world until physically performed, which requires the cause and effect of first providing input into a machine performing that function (let alone the literal encoding of a dataset into parameters as per my prior comments).

So, in a sense, with a model that is defined as such, we can indeed state that a paintbrush does encode every painting into it, albeit merely as possibilities that much be realised. Once picked up, it then requires cause and effect to move from one (the input motions) to the other (a resultant image), which has then indeed made one of those possibilities real. The issue with such a metaphor is that the implied model is begging the question, it circularly supposes that the paintbrush is responsible for this, such that it can serve as a function analogue for the metaphor, such that the paintbrush is responsible. Hence why I say you're mixing metaphors, because you've combined two incompatible metaphors (a neural network is a function, and a paintbrush is a "function" due to the cause-effect analogy to input-output) using shared language, that being the word function; which I believe I've shown has very different meanings here in both contexts.

The end result is that this "intuition" of the word/world is correct, but only partially, because it has not yet recognised how these two uses are divergent as per above. Indeed, such a model of a paintbrush would imply everything is also functions that ties together their input and output as possibilities which are retained, with any inversion providing a way to reconstruct inputs from outputs, such as figuring out how specific brushstrokes may have been performed. The critical thing is, again, to not mix the metaphors of the real world and the mathematical world, because that mathematical model can only describe possibilities for our real world. After all, it does not do us much good to state that a pencil contains every sketches as possibilities, because until we make good on it, it's essentially just a truism from the model that hardly makes it any easier to produce such a sketch - the same can be said for every drop of water, and so forth. Therefore, this is not a semantic issue, but a conflation of Platonic idealism with reality such that much is rendered meaningless, when reality has not provided incontravertible evidence for this idealism.

For example, the paintbrush might not have, as possibilities, every single painting. One approach would be to say it is only going to be used for one, and so a more appropriate model would be needed such that "no man stands in the same river twice" and so forth, and so this function metaphor would be demonstrably inappropriate. In contrast, the neural network (or a diffusion model) is still a function in a world that does this, because its parameters have been frozen and its random seed coupled to its input, with no recurrence from any prior usage as part of its operation. Therefore, its model as a function is exact and clear.

This means that "memorise" and "encode" can have very precise definitions for neural networks or diffusion models (as in the paper, memorised images being those reconstructable by given methods to within a known measure by some metric of accuracy), whereas with a paintbrush these do not have such precise definitions (indeed, beyond being reliant upon certain world models that may or may not be incompatible, they at the very least do not have only one obvious interpretation of meaning, unlike a neural network as a function).

As to the final points, it would be more appropriate invoke the notion of transformative derivation and ask whether neural networks or diffusion models are "de facto" justifiable as "always producing derivatives" due to size. I would say your statement can be reconsidered in light of this, the above, and my prior comments, given that they empirically do not always produce derivatives (we can reconstruct input data from prompts alone), and so the determination of whether there is sufficient transformation to be a derivitive work would have to be handled on a case by case basis for each image produced, as with all things - you would not say that an artist be given the carte blanche right to declare all their works wholly original or transformatively derivative when they've clearly just produced a copy of the Mona Lisa.

Additionally, if we were to "be consistent", it would not follow that this implies there is no encoding of such, only that they do not encode images they cannot produce. There is indeed exactly an encoding of possible (and memorised) images, because this is consistent with the mathematical model of a function that the neural network represents. We can state that neural networks do not directly encode all this information in their real world instance, as they do not, their parameters are only directly correlated to the dataset that is a subset of its input domain. However, they do still indirectly encode a much larger dataset than is explicitly given, due to the fact that humans do not produce art completely randomly, and like the paintbrush, we have created art (now used in a dataset) that is not simply a function but a real continuous object that has recurrence and relation to prior and simultaneous art and thought, and this is what gives rise to all their possible outputs that are not exact copies of their inputs (and by combination they should outnumber those memorised works as well). This is as per my previous comments, however to state explicitly in combination with the above, this implies that neural networks do exactly (if indirectly) encode every possible image they may produce (and not every image that may be possible) as the real world instance/approximation of the function it is depicting (which would encode every possible image, not merely those realisable, supposing its codomain/image was an uncomputable set), of which some are memorised by more direct encoding and others are inferred reconstruction. Either way, these images are producable by the model, therefore the model does encode it, either from the perspective of it as a function, or it as a realised machine performing that function with fitted parameters to enable it (within some bounds and margin of error). So, I would say that you may wish to reconsider the statement that a desire to be consistent implies this information is not encoded, when consistency actually implies the exact opposite - I believe that statement to likely be borne from the same conflation as above, so perhaps this paragraph is a clearly redundant summary by the time you finish reading it.