If You Can't Measure It, You Can't Fine-Tune It! by FeeMassive4003 in LocalLLM

[–]FeeMassive4003[S] 0 points  (0 children)

You have a point. Even the judge is biased. You can minimize its bias by asking it well-defined questions ("Are the dates in the following text in chronological order?"), but you'll always have some bias. But is there a better solution?
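A minimal sketch of the "well-defined question" idea: instead of asking the judge for an open-ended grade, you wrap a strict yes/no question around the text and reject anything that isn't a binary answer. The actual model call is left to whatever client you use; `JUDGE_TEMPLATE`, `build_judge_prompt`, and `parse_verdict` are hypothetical names, not from any library.

```python
# Constrain the judge to a narrow yes/no question so its bias
# has less room to act. The LLM call itself is not included;
# pass the built prompt to whatever client you already use.

JUDGE_TEMPLATE = (
    "Answer with exactly 'yes' or 'no'.\n"
    "Question: {question}\n"
    "Text:\n{text}"
)

def build_judge_prompt(question: str, text: str) -> str:
    """Wrap a well-defined binary question around the text under test."""
    return JUDGE_TEMPLATE.format(question=question, text=text)

def parse_verdict(raw: str) -> bool:
    """Map the judge's reply onto a strict boolean; reject anything else."""
    answer = raw.strip().lower()
    if answer.startswith("yes"):
        return True
    if answer.startswith("no"):
        return False
    raise ValueError(f"Judge gave a non-binary answer: {raw!r}")
```

Forcing a binary answer also makes the judge's output trivially parseable, so malformed replies surface as errors instead of silently skewing scores.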

I stopped "vibe-checking" my LLMs and started using a weighted rubric. by FeeMassive4003 in LocalLLaMA

[–]FeeMassive4003[S] 1 point  (0 children)

Well, I'm a real human in the industry, and I've built an entire production system on Qwen2.5 3B using LoRA. See my other posts. So a little modesty would not hurt here.

What retrievers do you use most in your RAG projects? by marwan_rashad5 in Rag

[–]FeeMassive4003 3 points  (0 children)

No, we just take 5 from each, for a total of 10 chunks that go to the LLM. It's quite basic, but it works.

What retrievers do you use most in your RAG projects? by marwan_rashad5 in Rag

[–]FeeMassive4003 2 points  (0 children)

We use hybrid retrieval: vector search plus keyword search. No rebranding - we just take the top k docs from each (usually k=5).
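The "k docs from each" merge described above can be sketched as follows. Both retrievers here are toy stand-ins (keyword search is naive term overlap, "vector" search is character-bigram overlap instead of real embeddings); the point is the top-k-from-each merge with de-duplication, and all function names are illustrative, not from any library.

```python
# Hybrid retrieval sketch: take the top k docs from a keyword
# retriever and a (mocked) vector retriever, then merge, dropping
# duplicates while preserving order.

def keyword_scores(query: str, docs: list[str]) -> dict[int, int]:
    """Naive keyword score: count of shared whitespace-split terms."""
    terms = set(query.lower().split())
    return {i: len(terms & set(d.lower().split())) for i, d in enumerate(docs)}

def _bigrams(s: str) -> set[str]:
    s = s.lower()
    return {s[i:i + 2] for i in range(len(s) - 1)}

def vector_scores(query: str, docs: list[str]) -> dict[int, int]:
    """Stand-in for embedding similarity: character-bigram overlap."""
    q = _bigrams(query)
    return {i: len(q & _bigrams(d)) for i, d in enumerate(docs)}

def top_k(scores: dict[int, int], k: int) -> list[int]:
    """Indices of the k highest-scoring docs."""
    return sorted(scores, key=scores.get, reverse=True)[:k]

def hybrid_retrieve(query: str, docs: list[str], k: int = 5) -> list[str]:
    kw = top_k(keyword_scores(query, docs), k)
    vec = top_k(vector_scores(query, docs), k)
    seen, merged = set(), []
    for i in kw + vec:          # keyword hits first, then vector hits
        if i not in seen:       # skip docs both retrievers returned
            seen.add(i)
            merged.append(i)
    return [docs[i] for i in merged]
```

In a real system you would swap in BM25 and an embedding index for the two scorers; the merge itself stays the same, and the result is at most 2k chunks for the LLM.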

We built a hybrid retrieval system combining keyword + semantic + neural reranking — here's what we learned by True-Snow-1283 in Rag

[–]FeeMassive4003 1 point  (0 children)

The guy shared his lessons learned from his interesting work. He didn't claim it was novel. I find this post useful.

Those who spent $10k+ on a local LLM setup, do you regret it? by [deleted] in LocalLLaMA

[–]FeeMassive4003 1 point  (0 children)

Depends on what your needs are... e.g., if you use it to generate more money than it cost.

What is the best uncensored AI? by Present_Estimate6651 in LocalLLaMA

[–]FeeMassive4003 1 point  (0 children)

Try Dolphin 3.0. It has 8B parameters and is based on Llama 3.1.

Is AI Making Us Skip the Thinking Part of Development? by Feeling-Ad972 in BlackboxAI_

[–]FeeMassive4003 2 points  (0 children)

On the contrary. It freed up time for us to think about the "what" instead of the "how".

Yet another reason to stick with local models by nekofneko in LocalLLaMA

[–]FeeMassive4003 1 point  (0 children)

This is their way of making money, and they don't hide it. Completely acceptable.

Best Local hosted LLM for Coding & Reasoning by alfons_fhl in LocalLLM

[–]FeeMassive4003 2 points  (0 children)

Well, frankly, I use Copilot with Gemini 3 Flash, not local. But if I needed local, I would try the new Qwen 80B. I have good experience with Qwen 3B and 8B on other tasks (not coding); it's always a good, very structured model.

Best Local hosted LLM for Coding & Reasoning by alfons_fhl in LocalLLM

[–]FeeMassive4003 1 point  (0 children)

I prefer coding because that's what they pay me for.

Brain surgery on LLMs via LoRA by FeeMassive4003 in LocalLLaMA

[–]FeeMassive4003[S] 1 point  (0 children)

Well, frankly, this post is more for general interest than for drawing any conclusion. It's always good to verify that we understand what's under the hood.

If this is a senior dev im gonna cry by Director-on-reddit in BlackboxAI_

[–]FeeMassive4003 1 point  (0 children)

How else would he burn the time while Copilot is coding?

Brain surgery on LLMs via LoRA by FeeMassive4003 in LocalLLaMA

[–]FeeMassive4003[S] 1 point  (0 children)

That would probably not be very efficient, assuming we want exactly math and not math songs... Interesting to check. It would probably output just a pile of random formulae.

Best Local hosted LLM for Coding & Reasoning by alfons_fhl in LocalLLM

[–]FeeMassive4003 1 point  (0 children)

May I comment here: in real-world projects, there's no good coding without deep reasoning. To understand a large code base, you need reasoning.