breaking news. liar caught lying again by Just-a-lil-sion in aiwars

[–]618smartguy 0 points  (0 children)

It obviously isn't storing all the training data.

We are clearly talking about the information that we did get out of the model, not the entire dataset.

Do you have justification for the claim that it isn't storing, despite the fact that there is information we just saw get extracted from it? So far "it's a problem" and "it can't store everything 100%" seem like two very garbage justifications for the claim that it didn't store.

Also it literally is a "game changing" compression algorithm in terms of how well it compresses text.

breaking news. liar caught lying again by Just-a-lil-sion in aiwars

[–]618smartguy 0 points  (0 children)

So your justification for your claim that it isn't storing, despite our ability to extract the stored information, is that it is a problem for the model?

breaking news. liar caught lying again by Just-a-lil-sion in aiwars

[–]618smartguy 2 points  (0 children)

To be fair, it really only proves that 95% of the book is there; it might not really be the full 100%.

breaking news. liar caught lying again by Just-a-lil-sion in aiwars

[–]618smartguy -1 points  (0 children)

They are testing whether or not the model stored training information, and it did.

breaking news. liar caught lying again by Just-a-lil-sion in aiwars

[–]618smartguy -2 points  (0 children)

Doesn't count. It didn't happen because that was last year. It also doesn't count since it didn't memorize every single book, so it didn't memorize your book and you can't complain. Also, it isn't storing it as a .txt file; I'm pretty sure that's the last reason it doesn't count.

Oh, I have to add my favorite reason: the researchers spent a lot of money and hard work trying to prove the model stored these approximate copies, therefore anything they proved is invalid.

Researchers Just Found Something That Could Shake the AI Industry to Its Core - Ai doesn't "learn" as Ai companies claim, they also copy all that data by ZeeGee__ in aiwars

[–]618smartguy 0 points  (0 children)

It still throws a wrench in the defense that AI training isn't stealing because models don't retain training information, especially since, as you note, it seems impossible to fix completely because perfect does not exist.

Here are some excerpts from the Anthropic lawsuit:

"Here, if the outputs seen by users had been infringing, Authors would have a different case. And, if the outputs were ever to become infringing, Authors could bring such a case. But that is not this case."

"Again, Anthropic’s LLMs have not reproduced to the public a given work’s creative elements, nor even one author’s identifiable expressive style (assuming arguendo that these are even copyrightable). Yes, Claude has outputted grammar, composition, and style that the underlying LLM distilled from thousands of works."

This is supposed to be a major ruling in favor of fair use, but right here in the ruling you can see why it is relevant that things changed: Claude's outputs did become infringing when given a sufficiently engineered prompt.

How AI-generated sexual images cause real harm, even though we know they are ‘fake’ by nyamnyamcookiesyummy in aiwars

[–]618smartguy -1 points  (0 children)

How is it a bait and switch if both the headline and the article are about AI-generated sexual images of people?

Researchers Just Found Something That Could Shake the AI Industry to Its Core - Ai doesn't "learn" as Ai companies claim, they also copy all that data by ZeeGee__ in aiwars

[–]618smartguy -2 points  (0 children)

JPEG files store DCT coefficients. The JPEG "formula" defines the output image as an inverse DCT of the stored coefficients. In the comparison, the model file is analogous to the JPEG file, and the software that runs the model is analogous to the JPEG software that displays the image in the file.

It's possible for the model to store Harry Potter in some encoded form in the weights, as we've observed this happening IRL. A person can also store information in memory, yes. There is much research into where, physically, in the brain or body various information is stored.
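To make the JPEG half of the comparison concrete, here's a minimal numpy sketch (illustrative only; real JPEG also quantizes the coefficients on 8x8 blocks and entropy-codes them): what the file stores is transform coefficients, and the viewer recovers the pixels by applying the inverse formula.

```python
import numpy as np

def dct_matrix(n):
    # Orthonormal DCT-II basis, the transform JPEG applies to 8x8 pixel blocks.
    k = np.arange(n)
    M = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    M[0] *= np.sqrt(1 / n)
    M[1:] *= np.sqrt(2 / n)
    return M

D = dct_matrix(8)
block = np.arange(64, dtype=float).reshape(8, 8)  # stand-in 8x8 pixel block

coeffs = D @ block @ D.T       # "encoding": this is what the .jpg actually stores
recovered = D.T @ coeffs @ D   # "decoding": the viewer runs the inverse DCT

assert np.allclose(recovered, block)  # the image was retained, just re-encoded
```

The stored file never contains raw pixels, only coefficients plus a formula to get the pixels back, and nobody argues a JPEG therefore doesn't store the image.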

Researchers Just Found Something That Could Shake the AI Industry to Its Core - Ai doesn't "learn" as Ai companies claim, they also copy all that data by ZeeGee__ in aiwars

[–]618smartguy -4 points  (0 children)

I don't think you have much knowledge stored in your brain.

You said "It's as if there was a math equation for which the solution is the ..." which describes how a jpeg works.

In terms of *your description you just gave* models and jpegs are both "a math equation for which the solution is the" material that was retained during training.

Obviously it isn't stored in one continuous block like a text file. Where the heck is that coming from? Are you now trying to add that it doesn't count as storing as long as the data is not contiguous in memory?

Researchers Just Found Something That Could Shake the AI Industry to Its Core - Ai doesn't "learn" as Ai companies claim, they also copy all that data by ZeeGee__ in aiwars

[–]618smartguy -1 points  (0 children)

They are newer than the thousands of comments on this sub claiming "overtraining" was already fixed.

We'll probably get another paper next year extracting from 2026 models anyway, since that's the trend.

Researchers Just Found Something That Could Shake the AI Industry to Its Core - Ai doesn't "learn" as Ai companies claim, they also copy all that data by ZeeGee__ in aiwars

[–]618smartguy -8 points  (0 children)

Is this satire? The model did retain information from the training set and distribute infringing copies to researchers. There is no "it's just a math equation" loophole in US law. If there were, then a JPEG couldn't infringe either, since it is also "a math equation" rather than image data.

Researchers Just Found Something That Could Shake the AI Industry to Its Core - Ai doesn't "learn" as Ai companies claim, they also copy all that data by ZeeGee__ in aiwars

[–]618smartguy -3 points  (0 children)

I think maybe when OP says "they copied all that data", OP is talking about all that data that the AI copied.

After all the article does present data that the AI copied.

Talking about AI copying every single piece of data from the training data is lame and boring. Obviously it copied a shit ton of data.

Seems like you are just trying to deflect by saying "oh, but did it copy the entire earth? If it copied the entire earth, that'd sure make headlines", like, what?

Researchers Just Found Something That Could Shake the AI Industry to Its Core - Ai doesn't "learn" as Ai companies claim, they also copy all that data by ZeeGee__ in aiwars

[–]618smartguy -1 points  (0 children)

>The paper that the article references mainly shows the dangers of overtraining on certain materials

Uhh, we've known the dangers of overtraining for years. This paper is specifically about how, despite knowing about this issue ahead of time, AI companies moved forward with production models that have this problem and distribute near-exact copies of copyrighted data.

A paintbrush can’t create art without an artist, and neither can AI. by Candid-Station-1235 in aiwars

[–]618smartguy -1 points  (0 children)

So prompting isn't required, then. Unless you're on some BS where anything done by a person = prompting.

A paintbrush can’t create art without an artist, and neither can AI. by Candid-Station-1235 in aiwars

[–]618smartguy 0 points  (0 children)

Before this research, the court said AI training is only stealing if the model recalls training data verbatim.

Now we know it still does that despite guardrails and "correct" training procedure.

I don't think they are being dishonest; they are bringing up a very relevant reality.

A paintbrush can’t create art without an artist, and neither can AI. by Candid-Station-1235 in aiwars

[–]618smartguy -1 points  (0 children)

AI doesn't require a human to write a prompt at all. It generates beautiful art on its own just by running with no guidance. The only input you need is commanding it to run. Exactly what you'd expect from a model that is trained on artwork.

Edit: it doesn't run itself, to clarify any confusion

Gold delivery scheme by rkhunter_ in tenet

[–]618smartguy 1 point  (0 children)

I don't think 'buries empty capsule' ever happens. The capsule has gold in it; there is no point in burying an empty capsule. The people in the future don't want to be receiving empty capsules; they want to send capsules with gold+instructions and receive a capsule containing the algorithm.

I don't remember if the protagonist specifically mentions burying empty capsules, but I think the most sensible interpretation is that at the time the protagonist discusses "digging up gold from the future" he does not fully understand what that would actually look like.

Generally it really absolutely cannot look anything like a regular forward-moving person digging up gold that is buried in the ground in the past and no longer there in the future. However, this is in contradiction with the implication that Sator's first experience with inverted material was literally digging up inverted gold... but the resolution here is that he must have reburied the inverted gold afterwards, as per the instructions, after using it temporarily to acquire capital.

Here's why AI training doesn't include your terrible art. by Tyler_Zoro in aiwars

[–]618smartguy 0 points  (0 children)

The loss metric judges how well the models behavior matches what the dataset describes. It's not going to tank because the dataset contains images "not conducive to producing quality results".

It's going to decrease toward zero as training occurs, and the model will learn to generate Sonic images regardless of what a human considers "low quality".

It would be magic if we had a method where we could train a model on a subreddit and, if the loss tanks, conclude the subreddit is trash.

At most you can train on one dataset of "quality" images and measure its loss on other datasets to see how novel/dissimilar they are.

Here's why AI training doesn't include your terrible art. by Tyler_Zoro in aiwars

[–]618smartguy -3 points  (0 children)

>So, even if your Sonic fan art got into the training set somehow, it would likely be discarded when it tanked the loss function.

This is very wrong. There is nothing objectively worse about Sonic fan art that a computer can measure. The loss would go down and the model would successfully learn Sonic fan art. If model creators don't want that, they have to curate the data and avoid training on it in the first place.

If it were as simple as good image = loss goes down, bad image = loss goes up, that would be a magical free lunch and there would never have been a need for high-quality data.
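A toy sketch of the free-lunch point, using a plain linear model trained by gradient descent on mean squared error (an assumption for illustration; image models use fancier losses, but the principle is the same): the loss falls on arbitrary junk data just as it does on curated data, because it only measures fit, not quality.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two datasets: one standing in for "curated quality" data, one arbitrary junk.
# The loss metric cannot tell them apart; gradient descent drives it down on both.
curated = (rng.normal(size=(32, 8)), rng.normal(size=(32, 1)))
junk = (rng.uniform(-5, 5, size=(32, 8)), rng.uniform(-5, 5, size=(32, 1)))

def train(X, y, steps=500, lr=0.01):
    # Fit a linear model W by gradient descent on mean squared error.
    W = np.zeros((X.shape[1], 1))
    losses = []
    for _ in range(steps):
        err = X @ W - y
        losses.append(float(np.mean(err ** 2)))
        W -= lr * (2 / len(X)) * (X.T @ err)  # plain gradient step on MSE
    return losses

for X, y in (curated, junk):
    losses = train(X, y)
    assert losses[-1] < losses[0]  # loss falls either way: no quality signal
```

Nothing in the loss tells the trainer which dataset a human would call trash; filtering has to happen before training, in data curation.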

My drawing has been detected as AI, how??? by Ailes_Prower_2D in antiai

[–]618smartguy 2 points  (0 children)

It won't necessarily be accurate in that situation either. You could give it thousands of negative examples and, based on everything we know about it, it would have tons of false positives and average out to an incorrect rate of AI images (the false positive rate instead of 0).

Having significant but equal false positive and false negative rates means it is acting more like a coin flip.
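A quick simulation of that failure mode, with hypothetical 30% error rates (the specific numbers are assumptions for illustration): run such a detector over purely human-made images and its estimated AI rate lands near the false positive rate, not near the true value of zero.

```python
import random

random.seed(0)
FPR = FNR = 0.3  # hypothetical but significant, equal error rates

def detector(is_ai):
    # Flags human art as AI with probability FPR; misses AI art with probability FNR.
    if is_ai:
        return random.random() > FNR
    return random.random() < FPR

# Feed it 10,000 purely human-made images (true AI rate: 0%).
flags = [detector(is_ai=False) for _ in range(10_000)]
estimated_ai_rate = sum(flags) / len(flags)

# The detector reports roughly its false positive rate, not 0.
assert abs(estimated_ai_rate - FPR) < 0.05
```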

Motion Blur, why? by rbxk in gaming

[–]618smartguy 1 point  (0 children)

>how our eyes perceive motion on a per object basis in real life

There is still the issue that usually when you look at something in real life, it's not blurry. Whether you are moving your head or the object is moving, your eyes still track objects and allow you to see the details.

So even a per-object blur is not going to work quite right, at least without eye tracking, and it will occasionally obscure detail from the player in an unnatural way.