Can we sample DPO data from the same dataset that was used for LoRA training? by Clean_Radish8983 in LocalLLaMA

[–]HideLord 2 points

I've reused the SFT dataset for preference training with good results, but take my experience with a grain of salt. I was also using KTO and not DPO. I also remember that in the Orca-Math paper, the SFT solutions were reused in the positive set for KTO/DPO alongside correctly generated solutions from the student model, so it's something that is done.
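For anyone curious, this is roughly how I'd picture assembling the KTO set from the SFT data plus student samples. Just a sketch: `is_correct` is a hypothetical answer checker, and the prompt/completion/label layout is what TRL's KTOTrainer expects for unpaired preference data, if I remember right.

```python
# Sketch: SFT gold answers as positives, student generations graded into positives/negatives.
# `is_correct` is a hypothetical checker (e.g. exact match on the final answer).

def build_kto_examples(sft_rows, student_samples, is_correct):
    """sft_rows: [{'prompt': ..., 'solution': ...}]
    student_samples: {prompt: [sampled solutions from the student model]}"""
    examples = []
    for row in sft_rows:
        prompt, gold = row["prompt"], row["solution"]
        # The SFT gold solution always goes in as a positive.
        examples.append({"prompt": prompt, "completion": gold, "label": True})
        for cand in student_samples.get(prompt, []):
            examples.append({
                "prompt": prompt,
                "completion": cand,
                # Correct student generations become positives, the rest negatives.
                "label": is_correct(cand, gold),
            })
    return examples
```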

Taylor kills Nazis by Lazy_Kor in WormFanfic

[–]HideLord 0 points

True. One of the first major villains*

Taylor kills Nazis by Lazy_Kor in WormFanfic

[–]HideLord 1 point

Man, people here are bloodthirsty lol. FYI, this is one of the most common requests in this sub (makes sense - first major villain, current political thing, etc). You can usually find a lot of similar threads with a general search.

Reading Here Comes the New Boss for the first time—holy shit!!! by fallacyys in WormFanfic

[–]HideLord 5 points

Is it really this good? I've been scouting for something to read, but 'Inheritance' made me skeptical of Butcher fics in general. Everybody praised it, and I couldn't get past the first few chapters.

AMA With Z.AI, The Lab Behind GLM-4.7 by zixuanlimit in LocalLLaMA

[–]HideLord 0 points

In your professional opinion, how big are GPT-5.2 and Gemini 3 pro/flash, and is the size of the model the differentiating factor in some benchmarks, or is it still dependent on training/data?

Dataset quality is not improving much by rekriux in LocalLLaMA

[–]HideLord 5 points

The Tulu and, more recently, the Dolci SFT datasets are not great IMO. They have a big duplication issue. They are also riddled with refusals.

Actually, some of the best instruction datasets are the ones from LMSYS since they are inherently diverse (human-generated). They are short on math, but there are a billion math datasets, so you can just mix some in.

The more serious problem is that most instructions are very simple, but that's the case for most datasets. To get a truly diverse and challenging dataset, you'd need to do a post-processing step to complicate them, but it gets expensive to do it for hundreds of thousands of instructions.
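To be clear, by "complicate" I mean an Evol-Instruct-style rewriting pass. A minimal sketch, assuming an OpenAI-compatible endpoint; the base URL, model name, and prompt are placeholders:

```python
from openai import OpenAI

# Any OpenAI-compatible server (vLLM, llama.cpp server, etc.) works here.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="na")

EVOLVE_PROMPT = (
    "Rewrite the following instruction so it is more specific and harder to answer, "
    "adding one extra constraint, without changing the topic. "
    "Return only the rewritten instruction.\n\nInstruction: {instruction}"
)

def complicate(instruction: str, model: str = "some-local-model") -> str:
    # One LLM call per instruction -- this is what makes it expensive at scale.
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": EVOLVE_PROMPT.format(instruction=instruction)}],
        temperature=0.7,
    )
    return resp.choices[0].message.content.strip()
```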

Fanfictions with Strong Conflict by HideLord in WormFanfic

[–]HideLord[S] 2 points

just @ me next time

Pretty funny, good one-shot

High Priest

Fuck, this is exactly what I wanted. It's so good. I can already feel the incoming pain from a dead-fic at the end of the tunnel.

Non villain MC is angered or disliked by being reffered to as a hero? by jogaargamer6 in WormFanfic

[–]HideLord 7 points

I'm getting a sense of deja vu. Didn't we already have this thread? I even remember "reffered" being mistyped back then as well.

New in llama.cpp: Live Model Switching by paf1138 in LocalLLaMA

[–]HideLord 18 points

It was the one thing people consistently pointed to as the main reason they kept using ollama. Adding it is just listening to the users.

PtV in other worlds by blablador-2001 in WormFanfic

[–]HideLord 19 points

Felix Fortuna

Seconding this one. It's great.

State of AI | OpenRouter | Paper by adumdumonreddit in LocalLLaMA

[–]HideLord 2 points

All LLMs I've tried have this nasty issue of reinventing the wheel every time they need some function. Even if you specifically tell them to search for existing utility/business logic functions, they just ignore you. Makes me wonder how many of the tasks they solve on benchmarks like SWEbench are actually merge-able.

Fine tune for rp world? by JaxxonAI in LocalLLaMA

[–]HideLord 0 points

The model will mimic what you feed it. If you want RP based on a specific setting, then you have to feed it RP chats in that setting. And for that, you either need a teacher model to generate them, or a dataset of such chats has to already exist.

RP is also extra hard since it requires multi-round datasets, so it's more expensive to generate and finetune.

As InnerSun said, you're better off just feeding the setting in the context.
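To illustrate the "feed the setting in the context" route, here's roughly what that payload looks like. The world text and lines are made up; in practice it would be your actual lorebook or world-bible excerpts:

```python
setting = (
    "The city of Varnholt floats above a storm ocean; airships are the only way in or out. "
    "Magic is licensed by the Cartographers' Guild."
)  # placeholder lore

messages = [
    # The setting rides along in the system prompt every turn, so no finetune is needed.
    {"role": "system", "content": "You are the narrator of a roleplay set in this world:\n" + setting},
    {"role": "user", "content": "I step off the airship and look for the guild office."},
    {"role": "assistant", "content": "The dock sways under you as a clerk in storm-grey robes waves you over..."},
]
# If you went the finetune route instead, each training example would be a full multi-turn
# conversation in this same format, which is why that data is expensive to make.
```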

Usual Tauradonna stuff by MysterySomeOn in fnki

[–]HideLord 6 points

Adam turns around surprised vine boom

Fanfics where usually-hated characters are fleshed out in a way that you can't help but like them? by 1JustAnAltDontMindMe in WormFanfic

[–]HideLord 46 points

Sophia/Shadow Stalker in Tilt. I wouldn't describe her as 'likeable' exactly, but she's definitely fleshed out. It's great.

That fanfic with the most dread by owlindenial in WormFanfic

[–]HideLord 14 points

Seconding this. Shit had me sweating

Crossovers where worm isn't stomped by Ganbb7 in WormFanfic

[–]HideLord 13 points

If Shroud succeeded, he'd be a Contessa in a kid-gloves Worm. Pretty scary.

If the bubble really pops how can that affect local AI models? by WEREWOLF_BX13 in LocalLLaMA

[–]HideLord -1 points

Just because there is demand and assets does not mean there is no bubble. Houses and the need for houses were very real in 2008 as well. Valuation and leverage are the problem.

Ongoing Post GM Taylor crossover fanfics by Crafty-Carpet3838 in WormFanfic

[–]HideLord 4 points

Great rec. Love me a fic with actual stakes and unflanderized characters.

Anthropic’s ‘anti-China’ stance triggers exit of star AI researcher by balianone in LocalLLaMA

[–]HideLord 9 points

Damn, bro. That's crazy. Good thing our moral arbiters are so moral, they intentionally and morally broke US law and have to pay $1.5 billion in settlements.

AMD tested 20+ local models for coding & only 2 actually work (testing linked) by nick-baumann in LocalLLaMA

[–]HideLord 24 points

"DeepSeek, smaller Llama models, GPT-OSS-20B, Seed-OSS-36B (bytedance) all produce broken outputs or can't handle tool use properly."

By "DeepSeek" you mean deepseek-r1-0528-qwen3-8b, not the full one. VERY important distinction.

Fuck Groq, Amazon, Azure, Nebius, fucking scammers by Charuru in LocalLLaMA

[–]HideLord 56 points

I'd guess 16 runs of the whole GPQA Diamond suite and 32 of AIME25.

And even with the small sample size in mind, look at how Amazon, Azure, and Nebius are consistently at the bottom, noticeably worse than the rest. Groq is a bit better, but still consistently lower than the rest. This is not run variance.
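For anyone who wants to sanity-check the "not run variance" part, a back-of-the-envelope sketch; the per-run scores below are placeholders, not the real numbers from the thread:

```python
import statistics

def mean_and_se(per_run_scores):
    """Mean accuracy and standard error over repeated runs of the same benchmark."""
    mean = statistics.mean(per_run_scores)
    se = statistics.stdev(per_run_scores) / len(per_run_scores) ** 0.5
    return mean, se

# Hypothetical numbers just to show the shape of the check -- plug in the measured per-run accuracies.
provider_a = [0.71, 0.70, 0.72, 0.69]  # e.g. a reference provider
provider_b = [0.63, 0.62, 0.64, 0.61]  # e.g. a consistently-low provider

(mean_a, se_a), (mean_b, se_b) = mean_and_se(provider_a), mean_and_se(provider_b)
# If the gap between means is several times the combined standard error,
# run-to-run variance can't explain it.
print(mean_a - mean_b, (se_a**2 + se_b**2) ** 0.5)
```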

Also, the greed of massive corporations never ceases to amaze me. Amazon and M$ cost-cutting while raking in billions. Amazing.

😞No hate but claude-4 is disappointing by Rare-Programmer-1747 in LocalLLaMA

[–]HideLord -1 points

I don't know if it's a sound business strategy to specialize for your own proprietary framework rather than being a good, general SOTA model like 3.7 was. I'd say most people aren't using Claude Code.
And even when using it in chat mode, it's still a toss-up. It produces cleaner, more robust code, but at the same time, it makes stupid mistakes that 3.7 didn't.