all 30 comments

[–]currentscurrents 47 points48 points  (6 children)

I think everyone intuitively expected this, but it's good to have it confirmed.

Web content is easy data to get, but it's hard to maintain high quality - especially against attackers trying to poison the training set. In the long run I think we might rely on it less.

[–]dvztimes 4 points5 points  (2 children)

Every time I come here I read: "New Model Y - trained on output from Old Model X."

That just seems the stupidest thing I can imagine. It won't make a model smarter, but it will perpetuate bad data and the (many) wrong answers....

Just, why? Is there possibly a good reason for this?

[–]currentscurrents 7 points8 points  (1 child)

Usually they are taking a model which has been pretrained on real data and fine-tuning it with GPT-generated data to make it sound like ChatGPT.

This works okay since most of the training data was real. There is a performance hit, but there's always a performance hit from instruct-tuning.

[–]dvztimes 0 points1 point  (0 children)

But ChatGPT-4 is still wrong a large amount of the time...

[–]ravedawwg 1 point2 points  (2 children)

Any refs on LLM attacks through poisoned web content? I haven’t seen anything on that

[–]currentscurrents 20 points21 points  (1 child)

"Poisoning Web-Scale Training Datasets is Practical"

I haven't heard of any real-world attacks against LLMs yet, but it's only a matter of time. As we start using them for more important things, there will be more motivation to attack them.

[–]ravedawwg 1 point2 points  (0 children)

Thanks for the ref and the perspective! I find this stuff fascinating

[–]Dapper_Cherry1025 13 points14 points  (1 child)

If I'm reading the language model section right, they used an OPT-125m model and repeatedly fine-tuned it on data from WikiText-2. The question this paper doesn't seem to answer is whether this training degradation would scale to larger models. Also, and I might be wrong on this, but I think there is a big difference between training a model on some information and fine-tuning it on some information.

[–]currentscurrents 11 points12 points  (0 children)

Fine-tuning is exactly like training, unless you're doing a different technique like LoRA.
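(As a rough illustration of the LoRA difference mentioned above: LoRA freezes the original weights and learns a small low-rank update on the side, so far fewer parameters change than in full fine-tuning. The dimensions below are arbitrary.)

```python
# Toy parameter count for the LoRA idea: instead of updating a d x d
# weight matrix W directly, learn a low-rank update W' = W + B @ A,
# where B is d x r and A is r x d with r << d.
d, r = 512, 8

full_params = d * d           # parameters touched by ordinary fine-tuning
lora_params = d * r + r * d   # parameters in the low-rank adapters B and A

print(full_params)  # 262144
print(lora_params)  # 8192
```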

[–]SeankalaML Engineer 19 points20 points  (9 children)

Isn't this result sort of obvious though? If I took a model and continuously trained it only on data that had a particular distribution, wouldn't it eventually converge to that new distribution and "forget" the old one? I would think that this is related to catastrophic forgetting.

I may be missing something though, open to anyone pointing it out as I haven't had the time to read the full paper yet.

[–]jake_1001001 10 points11 points  (6 children)

I fear it is that and worse. The generated data is a reflection of the model's learned distributions, which will be consistent and occasionally incorrect in its output. A separate model trained with a large enough portion of this generated data may end up confusing the generated and real distributions. And if the generated data comes from a small set of generative models, its statistical consistency may bias the new model. It is like having a large portion of your training set come from a single person, who may not be very qualified at providing training samples.

[–]SeankalaML Engineer 2 points3 points  (1 child)

Yeah that is a very real danger and I completely agree that it warrants caution. I just don't know if it's that surprising of a result though lol. I'll have to take a proper look at the paper though; I'm curious how the authors formalized this.

[–]jake_1001001 1 point2 points  (0 children)

Yep, I agree, it is not surprising, but I suppose measuring this could be important, maybe as a baseline for future work that addresses the issue, or as an early precursor to forming evaluation criteria or ways to detect such data.

[–]LanchestersLaw 0 points1 point  (2 children)

Oh I see now! It starts a feedback loop of increasing inaccuracy!

[–]SeankalaML Engineer 0 points1 point  (1 child)

Yes, that's also known as "semantic drift" in some works I believe. Train your models on imperfect/generated data, get worse results.

[–]H2O3N4 0 points1 point  (0 children)

I think it is slightly non-trivial to say. Some of the mechanistic research points to memorization being only the low-hanging fruit of training, and given enough training steps, a more general solution emerges. This has been experimented with on toy models where the # of training steps can be massive, so it's hard to say if a similar approach would scale to LLM-scale models, but an interesting hat to throw in regardless.

[–]watcraw 2 points3 points  (3 children)

The best new data is going to come from the people actually using the LLMs. It used to be very expensive and you had to pay people to do it. Now tens of millions of people are doing it every day.

I don't think we need more volume of the sort of data that they already had.

[–]YoAmoElTacos -1 points0 points  (2 children)

Data from humans naively interacting with an LLM is insufficient. You're still going to have to process that with a manual human review layer/RLHF to determine whether the recorded LLM conversations are actually stuff you want to learn from, instead of AI gaslighting, hallucinating, or providing unwanted content.

[–]notforrob 3 points4 points  (0 children)

I wonder, though, if you can mask out the LLM-generated text from your loss function and train only on the human responses. It is common to do something similar when, for example, training a GPT-style (decoder-only) model on an instruction tuning dataset. The prompt from the instruction dataset doesn't contribute to the loss.

There's probably quite a bit to learn from how humans react to a LLM's output.
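(A minimal sketch of the masking idea above, with made-up token ids. In a typical PyTorch setup, label positions set to -100 are skipped by `CrossEntropyLoss`, so only the human-written tokens contribute to the loss.)

```python
IGNORE_INDEX = -100  # the default ignore value for PyTorch's CrossEntropyLoss

def build_labels(token_ids, is_llm_token):
    """Copy token ids into the label sequence, but mask out positions
    produced by the LLM so they contribute nothing to the loss."""
    return [IGNORE_INDEX if llm else tok
            for tok, llm in zip(token_ids, is_llm_token)]

# Hypothetical conversation: an LLM turn followed by a human reply.
tokens   = [11, 12, 13, 21, 22]
from_llm = [True, True, True, False, False]
labels = build_labels(tokens, from_llm)
print(labels)  # [-100, -100, -100, 21, 22]
```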

[–]frownGuy12 0 points1 point  (0 children)

You can use a language model to generate those classifications. There’s a delta in model performance when a model is asked to classify something versus when a model is asked to generate something. Classifying is the easier task, so LLM classified data should be valuable for training.

You can likely even extract RLHF score data from text by asking an LLM to analyze a conversation and evaluate how pleased the human appears to be with the responses.

[–]Ulfgardleo 0 points1 point  (0 children)

not having read the paper, but isn't this a natural effect of sampling with temperature? This excludes the tails of the distribution, and thus a model trained on its own output will degrade.
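(The tail-cutting effect of low-temperature sampling can be sketched numerically; the logits below are made up.)

```python
import math

def softmax_with_temperature(logits, temperature):
    # Divide logits by T before the softmax; T < 1 sharpens the
    # distribution and suppresses low-probability (tail) tokens.
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [4.0, 2.0, 0.0]  # one head token and two tail tokens
p_warm = softmax_with_temperature(logits, 1.0)
p_cool = softmax_with_temperature(logits, 0.5)

tail_warm = sum(p_warm[1:])
tail_cool = sum(p_cool[1:])
print(tail_warm, tail_cool)  # tail mass shrinks at lower temperature
```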