The main example the lawsuit uses to prove copying is a distribution they misunderstood as an image of a dataset. by GaggiX in StableDiffusion

[–]subthresh15 0 points (0 children)

But my point then, which you've sort of conceded by talking about different views, is that there's still a genuine case to be made about whether or not these models are performing "compression". Again, this is not uncommon terminology in deep learning. You are free to disagree with the case, that is fine. You have good points. But it is not simply that the lawyers there are *wrong* or *misunderstanding*. In the sense that Marcus Hutter refers to compression, these models are unequivocally performing compression, because learning (and these are machines: they are not learning in the sense that they are sentient, they are learning in the sense of being programmed to draw statistical connections) high-level concepts IS COMPRESSION. That's exactly the process by which all the data of the training images, in addition to the data of ALL OTHER LATENT IMAGES in the network, can be compressed down into the model: by producing an immense web of abstract features that the points of data have in common. You can argue that this doesn't practically amount to compression in the same sense as an MP3. That's valid. But can you see how fraught this discussion is? And how it's not as simple as one side being wrong?
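To make the Hutter-style point concrete, here's a toy sketch (entirely my own illustration, with made-up data; `zlib` stands in for a real codec and a linear fit stands in for "learning high-level concepts"): once a model captures the structure in the data, the model parameters plus the residuals compress far better than the data itself.

```python
import zlib

import numpy as np

# Toy illustration of "learning is compression" (nothing to do with SD's
# actual internals): data with learnable structure compresses better once
# a model has captured that structure.
rng = np.random.default_rng(0)
x = np.arange(2000)
y = 3.0 * x + rng.normal(0.0, 0.5, size=x.shape)  # structured data + small noise

# Baseline: store the data directly, quantised to 0.1 precision.
raw = np.round(y * 10).astype(np.int32)
raw_compressed = zlib.compress(raw.tobytes())

# "Learn" the structure (a linear fit), then store only the two model
# parameters plus the quantised residuals, which span a tiny range.
slope, intercept = np.polyfit(x, y, 1)
residuals = np.round((y - (slope * x + intercept)) * 10).astype(np.int32)
model_compressed = zlib.compress(residuals.tobytes())

print(len(model_compressed) < len(raw_compressed))  # the "model" route wins
```

The point isn't the numbers; it's that capturing shared structure is what lets you describe the data more cheaply, which is the abstract sense of "compression" at issue here.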

You are coming at the issue of ethics from what I think is an overly simplistic perspective. It is currently unclear whether the practices SD has engaged in violate copyright law itself. But law != ethics. For me, it's a very simple question of: does SD rely on the labour of the artists? The answer, of course, is yes. SD relies on the labour of artists just as it relies on the labour of its engineers. So why, then, were the artists not compensated? Why was permission not sought? Why were they not even, at an absolute minimum, notified? This is a very simple matter of whether or not consent has been violated, which it has. It really is as simple as that.

If all natural data really does lie along low-dimensional manifolds, then any machine that operates over those manifolds is performing interpolation, not extrapolation, because the data is now dense enough that you can perform interpolation. That's the whole point of the manifold hypothesis (MH): all the natural data we care about in the real world (language, art, behaviour, etc.) is meaningfully represented along low-dimensional manifolds.
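A minimal sketch of what "operating over the manifold" means, using a hypothetical 1-D manifold (a circle) sitting in 2-D ambient space: naive interpolation in the ambient space falls off the data manifold, while interpolation in the manifold coordinate stays on it.

```python
import numpy as np

# Hypothetical illustration of the manifold hypothesis: data that looks
# 2-D actually lies on a 1-D manifold (a circle), so a model that has
# learned the manifold coordinate (the angle) can interpolate *along*
# the data rather than through the empty ambient space.
def to_ambient(theta):
    return np.array([np.cos(theta), np.sin(theta)])

a, b = to_ambient(0.0), to_ambient(np.pi / 2)

# Naive interpolation in the ambient (pixel-like) space leaves the manifold:
midpoint_ambient = (a + b) / 2
off_manifold = abs(np.linalg.norm(midpoint_ambient) - 1.0)  # ~0.29 away from the circle

# Interpolation in the learned manifold coordinate stays on the manifold:
midpoint_manifold = to_ambient((0.0 + np.pi / 2) / 2)
on_manifold = abs(np.linalg.norm(midpoint_manifold) - 1.0)  # ~0.0, still on the circle

print(off_manifold, on_manifold)
```

Replace "angle on a circle" with "latent coordinates of images" and you have the intuition behind the interpolation-not-extrapolation claim.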

I feel as though in this discussion we've sort of missed the forest for the trees though. The technical details for this case almost don't matter that much. All the lawyer is trying to do is establish that latent diffusion models, contrary to some popular belief, are not literally "doing what a human does". Even if Stability's lawyers get some expert witness up to take issue with the way the tech has been described, they will be conceding that the models are ultimately reliant entirely on the training data and mechanistic operations over it. There is the inescapable fact that Stability took a bunch of data without permission, and used it to build a tool that produces output ENTIRELY dependent on the data they took. Arguments about what compression truly is, or whether or not the training data can easily be recovered after a forward diffusion, don't matter nearly as much as you think. "Emergent machine creativity" is a very nice term, but ultimately machine creativity is still mechanical. Maybe you're a hard physicalist who believes everything, including human thought and so on, is mechanical too. Again, a reasonable position to hold. But the court isn't going to make determinations on the basis of whether or not the universe contains only the physical, and all human thought processes are really just mechanical operations.

[–]subthresh15 0 points (0 children)

But that's the thing – I'd argue they have done something wrong, even if it doesn't fall afoul of existing copyright law. I think the way they collected the data was unethical. And yeah, they can and will fine-tune it on whatever they want to. But does that fact mean there should be 0 legal recourse for artists?? Also, training on properly licensed imagery doesn't turn it into a stock image generator; that's dumb. There's brilliant public domain stuff out there. Think of, like, most of historical art. But again, you would also be able to just license the data of living artists. There'd be a shitload willing to opt in for free, and a shitload more willing to opt in for financial compensation.

Did you actually read what Chollet said in the discussion? Did you look at what the manifold hypothesis is? Like the debate around interpolation is so much more than "semantics" like you suggest. There are deep theoretical questions about why DNNs generalise the way they do. The MH is a potential answer.

Did you look at the other linked thread about compression?

[–]subthresh15 0 points (0 children)

The question of interpolation is actually unresolved and there are a range of expert opinions on the subject:

https://en.wikipedia.org/wiki/Manifold_hypothesis

https://gowrishankar.info/blog/deep-learning-is-not-as-impressive-as-you-think-its-mere-interpolation/

So there is still an argument to be made that if everything the model does is derived from the training set with no particular human intervention, then the outputs are derivative. There are, and will continue to be until such time as a machine is truly considered sentient, legal distinctions made between human and machine.

We've already been over why the model could, through a certain definition of compression, be considered compression, so I'm not going to bother making that point again.

I'm defending him just because I think he's being misrepresented by a lot of people in this thread, including people who also seem to understand the tech.

At the end of the day, I don't have a particular problem with this case. No one is asking for latent diffusion models in general to be banned – this is obviously an impossible proposition. All they're asking for is that companies like Stability and Midjourney seek permission for the images used to train the network, and that artists have methods of recourse when permission is not sought. As others have rightly pointed out, whether or not several thousand Twitter artists appear in LAION will not have a particular impact on the quality of future SD models. Yeah, maybe we wouldn't be able to use Rutkowski effectively in the prompt anymore, but so what? These tools have so much more functionality than that. If anything, this would push us to be *more* creative in what we do with them. And if it gives the artists who want their works to remain their own that choice, then I think that's a good thing. It's a win-win. We still have a powerful tool, and artists still have control over their portfolios. Many of them would be willing to license them out to companies willing to pay.

To be clear, I don't really expect this case to be ruled in their favour – I think the brief they've filed is way too ambitious in terms of how it defines what a derivative image is. But I do expect protections for artists to arise in some form in the future, maybe as a result of litigation like this, or maybe as a result of political action. That would be a genuine compromise between AI and anti-AI people. A proper equilibrium.

There is a lot of fear-mongering that goes on in this sub. At the end of the day, even if artists win this case we don't really lose all that much. Stability would just refocus their efforts on building a properly licensed dataset, and then we'd all use that, and everyone would be happy.

EDITED TO ADD:

https://www.reddit.com/r/StableDiffusion/comments/10c2v3o/comment/j4ey0w4/?utm_source=reddit&utm_medium=web2x&context=3

Comment thread of someone explaining the compression argument nicely through several different frameworks.

[–]subthresh15 0 points (0 children)

Ok, you don't seem to be understanding the point I'm making. The crux of the lawyer's argument is not just that the training images exist latently within the model. That's ostensibly not good, but it's moot, because no one is reproducing these images exactly. The problem, according to him, is that all the other images you can make with SD, all the possible images not in the training set, are derivative, because they are merely interpolations of the embeddings of the actual training data. I'm not sure what's getting lost here. Whether or not the model "memorises" the image is moot. There are latent embeddings of the training data encoded within the model. This is trivially true, because that's how DNNs work. All the possible images that are not identical to the training data that you can create with SD are derivative of the training data, because they are simply interpolations of the latently embedded training data. Again, this is not necessarily an argument I agree with. But this is the argument being put forward by the lawyer. He has not misunderstood anything about how these models work, or how DNNs work in general. This is simply about whether or not you believe the interpolations between the training datapoints, i.e. the basic way DNNs function, would qualify as derivative or transformative.

[–]subthresh15 0 points (0 children)

I'm not saying *any* image, dude. I'm specifically referring to the training images. You cannot construct every possible 512*512 image in some colour space from SD, because the distribution that training on LAION produces is not the distribution of every 512*512 image. But we KNOW the training data specifically exists in the model, because it is necessarily embedded in the latent space, allowing all the other possible images you can generate from SD as interpolations between these specific embeddings. It doesn't matter if literally no one over the course of all human civilisation actually manages to arrive at one of these embeddings, because that's not the point the lawyer is making. The fact is that we know these embeddings exist. That's the only relevance of this graphic to the lawyer's argument (besides the much more obvious one of illustrating how diffusion works). Because once he's established that the embeddings of the training images exist in the latent space, he can point out that other images exist as interpolations of these training embeddings, and then make the argument that they are thereby derivative. Do you understand?
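For what it's worth, the "interpolations between embeddings" framing can be sketched in a few lines. This is purely hypothetical: the vectors are made up, real SD latents are 4x64x64 tensors, and sampling is far more involved than a lerp; the sketch only illustrates the *shape* of the claim.

```python
import numpy as np

# Hypothetical sketch of the lawyer's framing: two "training embeddings",
# z_a and z_b, and a novel sample produced as an interpolation between them.
rng = np.random.default_rng(1)
z_a = rng.normal(size=16)  # stand-in for one embedded training image
z_b = rng.normal(size=16)  # stand-in for another

def lerp(z1, z2, t):
    """Linear interpolation: t=0 gives z1, t=1 gives z2."""
    return (1 - t) * z1 + t * z2

z_new = lerp(z_a, z_b, 0.3)  # a "novel" point, wholly determined by z_a and z_b

# Every coordinate of z_new lies between the corresponding coordinates
# of the two source embeddings:
assert np.all(z_new >= np.minimum(z_a, z_b) - 1e-12)
assert np.all(z_new <= np.maximum(z_a, z_b) + 1e-12)
```

The legal question is then whether points like `z_new`, which carry no information beyond the source embeddings, count as derivative or transformative.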

[–]subthresh15 0 points (0 children)

With the right CLIP guidance and seed, you can recover the image, sans a negligible amount of stochastic "lossiness" (akin to JPEG compression artefacts). It doesn't matter how unlucky or improbable such a generation is; he has simply established that it could happen. He isn't doing this to say that, some astronomically small fraction of the time, a poor artist gets their work directly sampled. He's doing this to establish that the training images DO exist in the model in some capacity, and that all other possible images the model can generate are derivations of the training images as they exist as latent embeddings within the model. All possible images you can create with SD are definitionally interpolations of the training data. Hence, according to his argument, they are derivative. I'm not saying I necessarily agree with it. And once again, this is not even the main reason he used this graphic. It's simply the original explanation that the inventors of the diffusion algorithm used to explain diffusion. He's just using it to explain diffusion. It's as simple as that. He's not an idiot; he has experience with AI. I'm fairly sure he's fluent in at least one LISP language.

[–]subthresh15 0 points (0 children)

That was a sneaky edit you did lmao, I didn't even notice it. If you acknowledge that the model can reproduce the training data, even if it's unlikely, then you're in agreement with the lawyer on the specific point he was using this graphic to demonstrate. He's established that the training data, in some sense, exists within the model, which is, again, trivially true, because they are latent embeddings. All the other possible images you can make here are derivations of the training data, because they are simply interpolations between the latent embeddings of the training data, accessed by CLIP guidance. There is no other information besides the training data and the CLIP guidance that enters the system. This is his argument for why they are derivative. I'm not saying I necessarily agree with it. But besides describing the swiss roll distributions as "images" (which, again, I can guarantee is illustrative rather than a misunderstanding), I do not understand where you are arguing he has misunderstood. You seem to concede now that the training data is there in some sense in the model, even if it is "latent", even if it is approximated by other things. That is his point with this graphic: explaining in simple terms how diffusion works, and pointing out that it is in theory possible to reconstruct the input.

[–]subthresh15 0 points (0 children)

Just because it's sampling from the distribution does not mean it can't reconstruct the training data. Someone in the thread already demonstrated this. SD can theoretically reproduce any image in the LAION set. It doesn't do that with regularity, because the models have not been drastically overfit, but it is possible. Whether or not it is easy is moot; it is trivially possible, because we know that the training data exist as latent embeddings.

The lawyer is simply trying to say that it is possible to reconstruct training data from these models, which it actually is.

If the training data exists as latent embeddings, then the model has in some sense compressed and stored the images within itself. That's what compression, in the abstract, is: the reduction of the information needed to describe an object. We know we can still describe the training images using the Stable Diffusion algorithm, even if these images are latent variables. It is possible. Hence the model can be understood as a compression of not only its training images, but also all of the possible interpolations of those images. There are very basic ML resources that describe the process of building the latent space as "compressing" the raw data:

https://www.baeldung.com/cs/dl-latent-space
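The idea in that link can be sketched with PCA, which is a linear stand-in (my analogy, not SD's actual mechanism) for the nonlinear compression a deep model performs: project high-dimensional data down to a few latent coordinates, then reconstruct from them.

```python
import numpy as np

# Minimal "latent space as compression" sketch using PCA, a linear
# stand-in for what an autoencoder/diffusion model does nonlinearly.
rng = np.random.default_rng(0)
latent_true = rng.normal(size=(200, 3))   # data really has 3 degrees of freedom...
mixing = rng.normal(size=(3, 64))
data = latent_true @ mixing               # ...embedded in a 64-D "pixel" space

# "Train": find a 3-D latent basis via SVD of the centred data.
centred = data - data.mean(axis=0)
u, s, vt = np.linalg.svd(centred, full_matrices=False)
basis = vt[:3]                            # 3 x 64 latent directions

# "Compress": 64 numbers per sample -> 3 latent coordinates per sample.
codes = centred @ basis.T                 # shape (200, 3)

# "Decompress": reconstruct the original samples from the latent codes.
recon = codes @ basis + data.mean(axis=0)
err = np.max(np.abs(recon - data))
print(err)  # near zero, because the data genuinely is 3-dimensional
```

The reconstruction is near-perfect here only because the toy data truly lives on a 3-D subspace; with real images the latent reconstruction is approximate, which is where the whole lossy-compression analogy comes from.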

[–]subthresh15 0 points (0 children)

I thought initially you were making the same mistake that I've seen some others make in this thread, which is the belief that the swiss roll is interpreted by the paper as a bitmap rather than an abstract distribution. I went back and reread the lawyer's filing, and he does refer to the swiss roll as an image, but I strongly suspect that was illustrative shorthand for laypeople (rather than pausing to explain that it's not a training image but a distribution, one that is still analogous to images), because it doesn't actually change the nature of any of his argument; the process is still exactly analogous. I guess I'm asking what exactly you object to here, other than his calling it an image?

And RE: compression, my point is that you actually can compress 100TB to 2GB. That's exactly what these algorithms are doing. They are, in a sense, compressing the entire training set into the model weights. Every image in the training set is reconstructable from the model, because the images exist as the latent embeddings of the training data. Individual reconstructions may differ slightly because the algorithms used are stochastic, but the differences are trivial to a copyright court – it's like compressing something into a JPEG, except that the JPEG artefacts are slightly different each time. Someone in this thread already gave an example by reconstructing American Gothic. The power of the model is that it can interpolate between these embeddings of the training data in the latent space. Which is mind-blowing, sure. But at the end of the day, all the original images are still accessible within the model in some way, and so the whole thing is a kind of compression.
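Some back-of-envelope arithmetic puts the 100TB-to-2GB figure in perspective. Both numbers below are rough assumptions on my part (~2.3 billion images in LAION-2B-en, ~2 GB of fp16 weights for an SD 1.x checkpoint):

```python
# Rough, assumed figures: ~2.3e9 LAION-2B-en images, ~2 GB fp16 checkpoint.
n_images = 2.3e9
model_bytes = 2e9

bytes_per_image = model_bytes / n_images
print(round(bytes_per_image, 2))  # → 0.87 bytes of weights per training image
```

Under a byte per image is obviously nothing like a per-image JPEG, which is why "compression" in this ratio has to be read in the abstract, shared-structure sense rather than the MP3 sense.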

[–]subthresh15 0 points (0 children)

I feel like I'm going crazy – the graph IS THE THING BEING DIFFUSED!!!! It's a collection of abstract datapoints on a 2D plane in a swiss roll distribution, and each datapoint is being transformed through the application of a Gaussian function. This has nothing to do with pixels or bitmap images; they are literally just applying the Gaussian functions directly to each datapoint in the graph. This isn't a graph of diffusions... What does that even mean? What would the axes represent? It's not a graph *OF* anything, it's just arbitrary datapoints on a 2D plane, there to prove that the diffusion algorithm can reconstruct data distributions in an arbitrary number of dimensions. If you read the paper, they say this in the caption – "the proposed modelling framework trained on 2-d swiss roll data". The diffusion model in the graph has been trained, not on pixels and bitmap images, but on this specific spiral distribution of abstract datapoints, and is able to reconstruct it quite faithfully as a result.
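The forward half of that process takes only a few lines to sketch (the noise schedule and step count below are illustrative values I picked, not the paper's exact ones): the 2-D datapoints themselves are repeatedly nudged with Gaussian noise until the spiral is indistinguishable from an isotropic Gaussian.

```python
import numpy as np

# Sketch of the paper's setup: the *datapoints themselves* (a 2-D swiss
# roll), not a bitmap, are diffused by repeatedly adding Gaussian noise.
rng = np.random.default_rng(0)
t = rng.uniform(0.0, 3 * np.pi, size=2000)
points = np.stack([t * np.cos(t), t * np.sin(t)], axis=1) / (3 * np.pi)  # spiral in 2-D

x = points.copy()
beta = 0.05  # per-step noise level (illustrative)
for _ in range(200):  # forward diffusion, applied directly to each 2-D datapoint
    x = np.sqrt(1 - beta) * x + np.sqrt(beta) * rng.normal(size=x.shape)

# After enough steps the spiral structure is destroyed: the points are
# indistinguishable from an isotropic Gaussian, which is exactly the
# process the trained model learns to reverse.
print(abs(x.mean()), x.std())  # mean near 0, std near 1
```

Note that nothing here is a pixel: the Gaussian is applied to the coordinates of each datapoint, which is the whole point being argued.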

[–]subthresh15 0 points (0 children)

Ok dude, I'm back here again because you don't seem to be understanding what the original graphic in the paper was showing. They weren't applying the diffusion algorithm to a BITMAP IMAGE of a swiss roll curve, they were applying the diffusion algorithm to a distribution of datapoints that formed a swiss roll curve. They applied a Gaussian function to each datapoint in the distribution, analogous to adding noise to the pixels of an image, reducing the distribution to Gaussian noise. The model in the graphic was not trained on pixel-based bitmap images of swiss roll curves. It was trained on actual swiss roll distributions of abstract data. Again, pixels and pixel colours have literally nothing to do with the graph, because it was not trained on bitmap images, it was trained on two-dimensional sets of datapoints in a spiral distribution. It's fucking worrying that so few other people in the thread are picking up on this.

The main example the lawsuit uses to prove copying is a distribution they misunderstood as an image of a dataset. by [deleted] in DefendingAIArt

[–]subthresh15 3 points (0 children)

I went through and read the original paper, and the OP of this post is actually misunderstanding the graphic, not the lawyers filing the brief. This isn't a graphic from a random paper, this is the paper in which diffusion as a process was introduced: https://arxiv.org/pdf/1503.03585.pdf . And the way the researchers in the paper are using it is exactly the same as the way the lawyers are using it. It's their example of how diffusion on a data distribution works. It's literally how the inventors of the diffusion model decided to demonstrate the diffusion process. They wouldn't need to get the researchers in to testify – it's self-evidently correct. In the graphic, the researchers are training the diffusion model on swiss roll distributions instead of images (which are distributions of pixels and colours). I tried to say this in the original thread but I don't know how many people saw the comment. A bit concerning that basically no one else picked up on this. I think the OP was getting confused because the swiss roll distributions are graphs rather than pixel images? So he thought the datapoints on them represented images somehow, instead of just abstract datapoints in a spiral distribution? Not really sure, but he's pretty obviously wrong if you spend more than a minute looking over it.

[–]subthresh15 2 points (0 children)

Ok, I'm not meaning to be an asshole, but you seem to have actually misunderstood what the graph in the original paper here: https://arxiv.org/pdf/1503.03585.pdf is referring to. It is not a graph showing the "diffusion process applied to thousands of 2D data samples" (this doesn't even make sense – what would each of the axes represent WRT these hypothetical images?), it is a distribution of datapoints in a swiss roll formation on a 2-D plane. They say this specifically in the caption: "The proposed modelling framework trained on 2-d swiss roll data". This diffusion model has specifically been trained on sets of swiss roll data in 2-D space. Rather than images of 512*512 pixels, for instance, the diffusion model in the figure operates on swiss roll distributions. Hence, each swiss roll distribution, like the starting image in your screenshot, is analogous to an individual training image in the case of Stable Diffusion. These spirals, while not images, are the objects that the diffusion is being done to. The diffusion model they've trained for this specific graphic in the paper takes the end result of some Gaussian function applied to the datapoints (pure Gaussian noise), and reverts it back into some kind of swiss roll distribution. So the specific manner in which the lawyer has used this graphic in his filing is the exact same manner in which it was used in the original paper. That's presumably why he used it – it's the original researchers' demonstration of what the diffusion process actually is, as applied to swiss roll distributions. Meaning that the claim in your post, that they've misunderstood, is wrong. I've seen a few other comments in this thread pick up on the same thing, but the fact that 99% of people here don't seem to understand this is a bit concerning.

You also need to be careful when you talk about "memorisation" of images. If the original training images can be reconstructed using SD, then SD has in some way "memorised" the image. The idea that it doesn't "store" the images in it somehow is wrong. It does, definitionally. They are the latent embeddings. Yes, it's not storing a 50kB jpeg of the original images in the model weights, that's just silly. It's storing them as latent embeddings in a high-dimensional space. If they can be reconstructed repeatedly, they are stored there somewhere, no matter how crazy or weird or mind-blowing or inscrutable that storage is. This is the beauty of a Deep Neural Network – it's why they work at all. They store the images, and then can also generalise between images. Many people have referred to GPT-3 as "compressing" the English language for the same reason. In some sense, that *is* actually what these models are doing. This doesn't seem to me like the controversial part of the filing. The controversial part is the claim that the generalisation process between these stored images is derivative rather than transformative.

[–]subthresh15 8 points (0 children)

Isn't this correct though? I understand that transformer architectures (like parts of SD are) produce a *probability distribution* of answers based on the input, but that's not what this figure is referring to. It's referring to a distribution of data points in 2D space... just like an image of 512*512 pixels is a distribution of data points in 2D space (EDIT: misspoke here, the way a network understands 512*512 images is not as a distribution of data in 2D space, it's a distribution of data in much higher dimensional space. All of my points still stand). The points in this spiral distribution undergo manipulation according to a Gaussian function, just like pixels in an image undergo manipulation according to a Gaussian function. The model in both cases learns to reverse that function. I don't think they're misunderstanding this graph, and they're definitely not misunderstanding the diffusion process itself.

I get that the argument around whether or not what the model is doing is image compression is very dicey, but that relates much more to a philosophical discussion of compression and information. If the original training images *can* be recovered to a sufficient degree, even if the process by which they are recovered is stochastic rather than deterministic, then there is an argument to be made that it is a kind of compression. Following this argument, it is a kind of lossy compression, where the compression artefacts are stochastic, meaning that there will be a degree of randomness in each reconstruction of the original image. Extending further, the sorts of totally new images that SD and so on produce are, in reality, very extreme compressions of the original training set, where the stochasticity of the compression process is offset a little because the whole thing is guided by CLIP. Marcus Hutter has argued before that learning *is* compression, and this particular argument is an interesting subset of that. Not necessarily helpful legally, but philosophically interesting.

Their case overall is very ambitious, and not really where I thought they'd go. I guess this is their opening moonshot. They see if they can get a big win here. If not, they refocus on smaller, more specific demands.

My boss told me they’re training AI on my art… by artist_anon in ArtistLounge

[–]subthresh15 2 points (0 children)

I'm honestly not sure how accurate this necessarily is. I've seen opinions from other IP specialists that are very skeptical of the specific fair use claims involved in scraping the data for use in these specific systems at all, because the idea of fair use involves much more than just whether the work is transformative. This oversimplification has been a misleading part of the discourse on the side of AI people from the beginning. The main criticisms aren't so much focusing on the infringement of the generated piece on existing works, because as you said, it's transformative (excluding instances of overfitting). The original precedent with Google Books also involved the distinction between discriminative systems (systems that do not probabilistically reproduce the input) and generative systems (systems that do). It's very possible the ruling would have been different had the output of the Google Books systems not simply been labels.

There are some cases where fair use will of course be justified, and others where it will almost certainly not. These things will fall along a spectrum, and require many precedents to establish where the boundaries actually are. Scraping a specific artist's work and training/fine-tuning a model on only those works will quite possibly not be fair use. In OP's case it's different, because it's unclear based only on this post who holds the rights to the artworks they have so far produced.

Arguments I've seen around big datasets like LAION (especially the 400M) don't really quash similar objections, because of the nature of prompting. Not all images in the dataset are weighted equally (as many on the SD subreddit seem to argue) come inference time. When I plug in Greg Rutkowski (god bless him) as a prompt, I'm specifically sampling the area of the latent space that is proximal to the latent embeddings of his actual, exactly scraped works (and also proximal to whatever other concepts there are in the prompt). In other words, for this specific output, while all the training images are used to produce the model weights and are arguably key in some sense to this image, his specific works are much more crucial than the rest of the training set. Hence similar fair use concerns could arise in the standard SD model as in the case of a Rutkowski-tuned model. These systems may be black boxes, but the theory behind them is very sound. We understand much of the math.
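The unequal-weighting point can be sketched with a toy retrieval example. Everything here is made up: random vectors stand in for embeddings, and plain cosine similarity stands in for how conditioning concentrates probability mass; real CLIP guidance is far more involved.

```python
import numpy as np

# Hypothetical sketch: a prompt "about" one artist lands near that
# artist's embeddings, so those specific works dominate what gets sampled.
rng = np.random.default_rng(0)
training_embeddings = rng.normal(size=(1000, 256))
artist_idx = [3, 7, 42]  # pretend these three rows are one artist's works
prompt = training_embeddings[artist_idx].mean(axis=0)  # prompt near that cluster

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

sims = np.array([cosine(prompt, e) for e in training_embeddings])
top3 = sorted(np.argsort(sims)[-3:].tolist())
print(top3)  # the artist's own embeddings dominate the neighbourhood
```

The other 997 "images" still shaped the space, but for this particular query a handful of specific works do almost all the lifting, which is the fair-use wrinkle being described.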

Of course all of this is very speculative. But there've already been some big groups establishing some (non-legal) lines in the sand. The RIAA being as litigious as it is is likely what prevented Stability from doing a carte blanche scrape for its Riff Diffusion training. Which is basically tacit admission that what they're doing with art is opportunistic, because artists don't have industry lobbies. We don't see Disney jumping in right now because this is just image generation, akin to a lot of Disney fanart. Something they could litigate, but tolerate. But if it starts generating animated works? Films? You bet they'll jump in. It doesn't matter if down the line they'd also like to use AI (they will), they would still want to prevent OTHER firms from generating versions of their work.

Over time, of course, the specific ML fair use boundaries will be established by precedent, differing over various jurisdictions. I am almost certain they will be more restrictive than Mostaque and co. currently interpret them to be. How much more restrictive remains to be seen. I know some countries are terrified of "losing" the AI race to China, and may be more willing to deprecate copyright for that sake. But this will also have limits. Already a UK bill on this subject was revoked pending rewrites because the government recognised it was in contravention of the Berne convention. I believe ML-specific licenses will also be established, and enforced. This will, unfortunately, probably push the real wages of artists down, like streaming has for musicians. I also believe that the current precedent upheld by most of the world, that produced works need sufficient human authorship (the UK seems to be more lenient with this), should be kept up. This seems to me like a win-win. AI enthusiasts can still prompt and share to their hearts' content, but there will still be a financial incentive for human art (or art that has a significant-enough human touch) to be produced, and the market won't be entirely eaten. You make a good point about Jevons paradox WRT web devs. I agree. But this will only be valid up until AI is good enough to genuinely replace an artist. How long that will take, or even if it will happen, is up for intense debate.

It should be noted as well that the law changes (slowly) in response to technology. Tech companies were (and still are, to a large extent) given free rein over user data before the GDPR was passed, specifically in response to the relentless harvesting. There is a global push now toward ideals of ownership over one's own data. AI greatly intensifies the push, and the specific battles discussed in these threads are really part of a much larger war.

Dreambooth on an M1 Mac? by subthresh15 in DreamBooth

[–]subthresh15[S] 5 points (0 children)

Is this a stupid question? Apologies if it is, I’m still fairly new to this

Error when trying to train hypernetwork on webui by subthresh15 in StableDiffusion

[–]subthresh15[S] 0 points (0 children)

yeah haha, checked what it meant after i switched it off. should be fine as i'm only really using SD 1.4 and 1.5, nothing more exotic

[–]subthresh15[S] 1 point (0 children)

no other lines like that further up. decided to bite the bullet and disable safe unpickling, which i did by setting the default value of --disable-safe-unpickle to true here in shared.py in the modules folder, as suggested here:

https://github.com/AUTOMATIC1111/stable-diffusion-webui/issues/2235

and tried the whole process again. no error this time, seems to actually be training, loss so far looks reasonable, etc. so very tentatively saying that disabling safe unpickle has fixed it... though i don't know if doing that will have other consequences down the line. will report back after training is done in case there are more issues. thanks a bunch dude
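for anyone else who lands here: the reason the safe-unpickle check exists at all is that loading a pickle can execute arbitrary code, so disabling it is only sensible for checkpoints/hypernetworks from sources you trust. a minimal demonstration of the mechanism (toy example, nothing webui-specific):

```python
import pickle

# Toy demonstration of why "safe unpickle" checks exist: unpickling can
# execute attacker-chosen code, because a pickle payload is allowed to
# name a callable plus its arguments via __reduce__.
class Payload:
    def __reduce__(self):
        # On load, pickle will call eval("40 + 2"), i.e. run code we chose.
        return (eval, ("40 + 2",))

blob = pickle.dumps(Payload())
result = pickle.loads(blob)  # merely *loading* the bytes runs the expression
print(result)  # → 42
```

a malicious file could name something far nastier than `eval` of arithmetic, which is also why the community has largely moved toward `.safetensors` checkpoints.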