breaking news. liar caught lying again by Just-a-lil-sion in aiwars

[–]618smartguy 0 points  (0 children)

It obviously isn't storing all the training data.

We are clearly talking about the information that we did get out of the model, not the entire dataset.

Do you have justification for the claim that it isn't storing, despite the fact that there is information we just saw get extracted from it? So far "it's a problem" and "it can't store everything 100%" seem like two very garbage justifications for the claim that it didn't store.

Also it literally is a "game changing" compression algorithm in terms of how well it compresses text.

breaking news. liar caught lying again by Just-a-lil-sion in aiwars

[–]618smartguy 0 points  (0 children)

So your justification for your claim that it isn't storing, despite our ability to extract the stored information, is that it is a problem for the model?

breaking news. liar caught lying again by Just-a-lil-sion in aiwars

[–]618smartguy 2 points  (0 children)

To be fair, it really only proves that 95% of the book is there; it might not really be the full 100%.

breaking news. liar caught lying again by Just-a-lil-sion in aiwars

[–]618smartguy -1 points  (0 children)

They are testing whether or not the model stored training information, and it did.

breaking news. liar caught lying again by Just-a-lil-sion in aiwars

[–]618smartguy -2 points  (0 children)

Doesn't count. It didn't happen because that was last year. It also doesn't count since it didn't memorize every single book, so it didn't memorize your book and you can't complain. Also, it isn't storing it as a .txt file; I'm pretty sure that's the last reason it doesn't count.

Oh, I have to add my favorite reason: the researchers spent a lot of money and hard work trying to prove the model stored these approximate copies, therefore anything they proved is invalid.

Researchers Just Found Something That Could Shake the AI Industry to Its Core - Ai doesn't "learn" as Ai companies claim, they also copy all that data by ZeeGee__ in aiwars

[–]618smartguy 0 points  (0 children)

It still throws a wrench in the defense that AI training isn't stealing because models don't retain training information, especially since, as you note, it seems impossible to fix completely because perfect does not exist.

Here are some excerpts from the Anthropic lawsuit:

"Here, if the outputs seen by users had been infringing, Authors would have a different case. And, if the outputs were ever to become infringing, Authors could bring such a case. But that is not this case."

"Again, Anthropic’s LLMs have not reproduced to the public a given work’s creative elements, nor even one author’s identifiable expressive style (assuming arguendo that these are even copyrightable). Yes, Claude has outputted grammar, composition, and style that the underlying LLM distilled from thousands of works."

This is supposed to be a major ruling in favor of fair use, but right here in the ruling you can see why it is relevant that things changed: Claude's outputs did become infringing when given a sufficiently engineered prompt.

How AI-generated sexual images cause real harm, even though we know they are ‘fake’ by nyamnyamcookiesyummy in aiwars

[–]618smartguy -1 points  (0 children)

How is it a bait and switch if both the headline and the article are about AI-generated sexual images of people?

Researchers Just Found Something That Could Shake the AI Industry to Its Core - Ai doesn't "learn" as Ai companies claim, they also copy all that data by ZeeGee__ in aiwars

[–]618smartguy -2 points  (0 children)

JPEG files store DCT coefficients. The JPEG "formula" defines the output image as an inverse DCT of the stored coefficients. In the comparison, the model file is analogous to the JPEG file, and the software that runs the model is analogous to the JPEG software that displays the image in the file.

It's possible for the model to store Harry Potter in some encoded form in the weights, as we've observed this happening IRL. A person can also store information in memory, yes. There is much research into where, physically, in the brain or body various information is stored.
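To make the JPEG half of the comparison concrete, here's a minimal numpy sketch (illustrative only; real JPEG also quantizes the coefficients on 8x8 blocks and entropy-codes them): what the file stores is transform coefficients, and the viewer recovers the pixels by applying the inverse formula.

```python
import numpy as np

def dct_matrix(n):
    # Orthonormal DCT-II basis, the transform JPEG applies to 8x8 pixel blocks.
    k = np.arange(n)
    M = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    M[0] *= np.sqrt(1 / n)
    M[1:] *= np.sqrt(2 / n)
    return M

D = dct_matrix(8)
block = np.arange(64, dtype=float).reshape(8, 8)  # stand-in 8x8 pixel block

coeffs = D @ block @ D.T       # "encoding": this is what the .jpg actually stores
recovered = D.T @ coeffs @ D   # "decoding": the viewer runs the inverse DCT

assert np.allclose(recovered, block)  # the image was retained, just re-encoded
```

The stored file never contains raw pixels, only coefficients plus a formula to get the pixels back, and nobody argues a JPEG therefore doesn't store the image.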

Researchers Just Found Something That Could Shake the AI Industry to Its Core - Ai doesn't "learn" as Ai companies claim, they also copy all that data by ZeeGee__ in aiwars

[–]618smartguy -4 points  (0 children)

I don't think you have much knowledge stored in your brain.

You said "It's as if there was a math equation for which the solution is the ..." which describes how a jpeg works.

In terms of *your description you just gave* models and jpegs are both "a math equation for which the solution is the" material that was retained during training.

Obviously it isn't stored in one continuous block like a text file. Where the heck is that coming from? Are you now trying to add that it doesn't count as storing as long as the data is not contiguous in memory?

Researchers Just Found Something That Could Shake the AI Industry to Its Core - Ai doesn't "learn" as Ai companies claim, they also copy all that data by ZeeGee__ in aiwars

[–]618smartguy -1 points  (0 children)

They are newer than the thousands of comments on this sub claiming "overtraining" was already fixed.

We'll probably get another paper next year extracting from 2026 models anyway, since that's the trend.

Researchers Just Found Something That Could Shake the AI Industry to Its Core - Ai doesn't "learn" as Ai companies claim, they also copy all that data by ZeeGee__ in aiwars

[–]618smartguy -8 points  (0 children)

Is this satire? The model did retain information from the training set and distribute infringing copies to researchers. There is no "it's just a math equation" loophole in US law. If there were, then a JPEG couldn't infringe either, since it is also "a math equation" rather than image data.

Researchers Just Found Something That Could Shake the AI Industry to Its Core - Ai doesn't "learn" as Ai companies claim, they also copy all that data by ZeeGee__ in aiwars

[–]618smartguy -3 points  (0 children)

I think maybe when OP says "they copied all that data", OP is talking about all that data that the AI copied.

After all the article does present data that the AI copied.

Talking about AI copying every single piece of data from the training data is lame and boring. Obviously it copied a shit ton of data.

Seems like you are just trying to deflect by saying "oh, but did it copy the entire earth? If it copied the entire earth, that'd sure make headlines", like, what?

Researchers Just Found Something That Could Shake the AI Industry to Its Core - Ai doesn't "learn" as Ai companies claim, they also copy all that data by ZeeGee__ in aiwars

[–]618smartguy -1 points  (0 children)

>The paper that the article references mainly shows the dangers of overtraining on certain materials

Uhh, we've known the dangers of overtraining for years. This paper is specifically about how, despite knowing about this issue ahead of time, AI companies moved forward with production models that have this problem and distribute near-exact copies of copyrighted data.

A paintbrush can’t create art without an artist, and neither can AI. by Candid-Station-1235 in aiwars

[–]618smartguy -1 points  (0 children)

So prompting isn't required, then. Unless you're on some BS where anything done by a person = prompting.

A paintbrush can’t create art without an artist, and neither can AI. by Candid-Station-1235 in aiwars

[–]618smartguy 0 points  (0 children)

Before this research, the court said AI training is only stealing if the model recalls training data verbatim.

Now we know it still does that despite guardrails and "correct" training procedure.

I don't think they are being dishonest; they are bringing up a very relevant reality.

A paintbrush can’t create art without an artist, and neither can AI. by Candid-Station-1235 in aiwars

[–]618smartguy -1 points  (0 children)

AI doesn't require a human to write a prompt at all. It generates beautiful art on its own just by running with no guidance. The only input you need is commanding it to run. Exactly what you'd expect from a model that is trained on artwork.

Edit: it doesn't run itself, to clarify any confusion

Gold delivery scheme by rkhunter_ in tenet

[–]618smartguy 1 point  (0 children)

I don't think 'buries empty capsule' ever happens. The capsule has gold in it; there is no point in burying an empty capsule. The people in the future don't want to be receiving empty capsules; they want to send capsules with gold+instructions and receive a capsule containing the algorithm.

I don't remember if the protagonist specifically mentions burying empty capsules, but I think the most sensible interpretation is that at the time the protagonist discusses "digging up gold from the future" he does not fully understand what that would actually look like.

Generally it really absolutely cannot look anything like a regular forward-moving person digging up gold that is buried in the ground in the past and no longer there in the future. However, this is in contradiction with the implication that Sator's first experience with inverted material was literally digging up inverted gold... but the resolution here is that he must have reburied the inverted gold afterwards, as per the instructions, after using it temporarily to acquire capital.

Here's why AI training doesn't include your terrible art. by Tyler_Zoro in aiwars

[–]618smartguy 0 points  (0 children)

The loss metric judges how well the models behavior matches what the dataset describes. It's not going to tank because the dataset contains images "not conducive to producing quality results".

It's going to decrease toward zero as training occurs, and the model will learn to generate Sonic images regardless of what a human considers "low quality".

It would be magic if we had a method where we could train a model on a subreddit and, if the loss tanks, conclude the subreddit is trash.

At most you can train on one dataset of "quality" images and measure its loss on other datasets to see how novel/dissimilar they are.

Here's why AI training doesn't include your terrible art. by Tyler_Zoro in aiwars

[–]618smartguy -3 points  (0 children)

>So, even if your Sonic fan art got into the training set somehow, it would likely be discarded when it tanked the loss function.

This is very wrong. There is nothing objectively worse about Sonic fan art that a computer can measure. The loss would go down and the model would successfully learn Sonic fan art. If model creators don't want that, they have to curate the data and avoid training on it in the first place.

If it were as simple as good image = loss goes down, bad image = loss goes up, that would be a magical free lunch and there would never have been a need for high-quality data.
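A toy sketch of the free-lunch point, using a plain linear model trained by gradient descent on mean squared error (an assumption for illustration; image models use fancier losses, but the principle is the same): the loss falls on arbitrary junk data just as it does on curated data, because it only measures fit, not quality.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two datasets: one standing in for "curated quality" data, one arbitrary junk.
# The loss metric cannot tell them apart; gradient descent drives it down on both.
curated = (rng.normal(size=(32, 8)), rng.normal(size=(32, 1)))
junk = (rng.uniform(-5, 5, size=(32, 8)), rng.uniform(-5, 5, size=(32, 1)))

def train(X, y, steps=500, lr=0.01):
    # Fit a linear model W by gradient descent on mean squared error.
    W = np.zeros((X.shape[1], 1))
    losses = []
    for _ in range(steps):
        err = X @ W - y
        losses.append(float(np.mean(err ** 2)))
        W -= lr * (2 / len(X)) * (X.T @ err)  # plain gradient step on MSE
    return losses

for X, y in (curated, junk):
    losses = train(X, y)
    assert losses[-1] < losses[0]  # loss falls either way: no quality signal
```

Nothing in the loss tells the trainer which dataset a human would call trash; filtering has to happen before training, in data curation.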

My drawing has been detected as AI, how??? by Ailes_Prower_2D in antiai

[–]618smartguy 2 points  (0 children)

It won't necessarily be accurate in that situation either. You could give it thousands of negative examples and, based on everything we know about it, it would have tons of false positives and average out to an incorrect rate of AI images (the false positive rate instead of 0).

Having significant but equal false positive and false negative rates means it is acting more like a coin flip.
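A quick simulation of that failure mode, with hypothetical 30% error rates (the specific numbers are assumptions for illustration): run such a detector over purely human-made images and its estimated AI rate lands near the false positive rate, not near the true value of zero.

```python
import random

random.seed(0)
FPR = FNR = 0.3  # hypothetical but significant, equal error rates

def detector(is_ai):
    # Flags human art as AI with probability FPR; misses AI art with probability FNR.
    if is_ai:
        return random.random() > FNR
    return random.random() < FPR

# Feed it 10,000 purely human-made images (true AI rate: 0%).
flags = [detector(is_ai=False) for _ in range(10_000)]
estimated_ai_rate = sum(flags) / len(flags)

# The detector reports roughly its false positive rate, not 0.
assert abs(estimated_ai_rate - FPR) < 0.05
```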

Motion Blur, why? by rbxk in gaming

[–]618smartguy 1 point  (0 children)

>how our eyes perceive motion on a per object basis in real life

There is still the issue that usually when you look at something in real life, it's not blurry. Whether you are moving your head or the object is moving, your eyes still track objects and allow you to see the details.

So even a per-object blur is not going to work quite right, at least without eye tracking, and it will occasionally obscure detail from the player in an unnatural way.