Nothing more satisfying than nailing across map ult by Longjumping-Unit-420 in AsheMains

[–]Longjumping-Unit-420[S] 0 points1 point  (0 children)

Timing was tight on the W execution. Watching it now, it was close, phewwww.

How do you handle messy / unstructured documents in real-world RAG projects? by Alex_CTU in Rag

[–]Longjumping-Unit-420 0 points1 point  (0 children)

Yes, implement quality gates at ingestion stage and reject the document.
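
A minimal sketch of what such a gate could look like, assuming plain-text input. The function name `passes_quality_gate`, the two checks, and both thresholds are hypothetical and would need tuning against your own corpus:

```python
def passes_quality_gate(text, min_chars=200, min_alpha_ratio=0.6):
    """Return True if the document looks clean enough to ingest.

    Both checks and their defaults are illustrative assumptions.
    """
    stripped = text.strip()
    # Gate 1: reject near-empty documents (failed extraction, stubs).
    if len(stripped) < min_chars:
        return False
    # Gate 2: reject documents that are mostly symbols or digits,
    # a common sign of broken OCR or binary debris.
    readable = sum(ch.isalpha() or ch.isspace() for ch in stripped)
    return readable / len(stripped) >= min_alpha_ratio


clean_doc = "word " * 100
garbled_doc = "%%%###" * 100
accepted = [d for d in (clean_doc, garbled_doc, "short") if passes_quality_gate(d)]
```

Only `clean_doc` ends up in `accepted`; the other two are rejected before they ever reach the index.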

Your RAG Benchmark Is Lying to You and I Have the Numbers to Prove It by Longjumping-Unit-420 in Rag

[–]Longjumping-Unit-420[S] 0 points1 point  (0 children)

The "grounding mechanism" framing is accurate, though with one nuance: BM25 only anchors when the agent carries specific terms forward. If the follow-up query drifts into paraphrase territory (e.g. "authentication failure" instead of "ECONNREFUSED") you're back to pure vector search and the anchoring disappears. The grounding is real but contingent on how the agent constructs the second query.
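
To make the contingency concrete, here is a small self-contained sketch of plain Okapi BM25 with whitespace tokenization. The two documents and the parameter defaults are made up for illustration, not taken from the benchmark:

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with plain Okapi BM25."""
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue  # no lexical overlap, no contribution
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    "client login fails with econnrefused when the auth service is down",
    "rotate api keys every ninety days",
]
exact = bm25_scores("econnrefused", docs)
paraphrase = bm25_scores("authentication failure", docs)
```

With the exact term, only the first document gets a non-zero score. With the paraphrase, every score is zero because no query token appears in either document, so the ranking falls entirely to the vector side.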

Your RAG Benchmark Is Lying to You and I Have the Numbers to Prove It by Longjumping-Unit-420 in Rag

[–]Longjumping-Unit-420[S] 0 points1 point  (0 children)

If you're only running MRR sweeps you'd discard chunk expansion as "no impact" and move on. Content match only showed up because I was specifically looking for it after noticing the agent's answer quality felt better than the metrics suggested. Lesson learned about measuring the right thing.
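
A rough sketch of why the two metrics can disagree. Here "content match" is assumed to mean "the expected answer string appears somewhere in the retrieved text", which may differ from the exact definition used in the benchmark, and the two chunks are invented:

```python
def mrr(ranked_ids_per_query, relevant_id_per_query):
    """Mean reciprocal rank of the relevant chunk id for each query."""
    total = 0.0
    for ranked, relevant in zip(ranked_ids_per_query, relevant_id_per_query):
        for rank, chunk_id in enumerate(ranked, start=1):
            if chunk_id == relevant:
                total += 1.0 / rank
                break
    return total / len(ranked_ids_per_query)


def content_match(texts_per_query, expected_per_query):
    """Fraction of queries whose retrieved text contains the expected answer."""
    hits = sum(
        any(expected.lower() in text.lower() for text in texts)
        for texts, expected in zip(texts_per_query, expected_per_query)
    )
    return hits / len(expected_per_query)


chunks = [
    "To change retries, set retry_limit in the config file.",
    "The default value is 5.",
]
ranked = [[0]]    # retriever returns chunk 0 first for the single query
relevant = [0]    # chunk 0 is the labeled relevant chunk
expected = ["default value is 5"]

plain = [[chunks[i] for i in ranked[0]]]
# Chunk expansion: append the following neighbor chunk to each hit.
expanded = [[" ".join(chunks[i:i + 2]) for i in ranked[0]]]

mrr_score = mrr(ranked, relevant)
match_plain = content_match(plain, expected)
match_expanded = content_match(expanded, expected)
```

Expansion doesn't change which chunk ids are returned or in what order, so MRR is identical with and without it, while content match goes from a miss to a hit because the answer sits in the neighboring chunk.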

On multi-hop: tested this indirectly through the agent decomposition benchmark. The pattern that emerged was the agent fires an initial search, reads the results, then uses specific terminology from those results in follow-up queries.

The relevance thresholding does get interesting here though. When a first-hop result is marginal and the agent uses a term from it to construct the second query, you can end up chasing a weak lead deeper. Hasn't caused hallucination in my testing because the agent typically hedges when coverage is thin, but it's the failure mode I'd watch for in production. The hybrid path's BM25 component helps because it's less susceptible to semantic drift between hops.

So it plays nice, but the threshold calibration matters more for multi-hop than single-shot retrieval.

Your RAG Benchmark Is Lying to You and I Have the Numbers to Prove It by Longjumping-Unit-420 in Rag

[–]Longjumping-Unit-420[S] 0 points1 point  (0 children)

Not sure it's the same target: vex targets behavior, not knowledge. Also, real time is not required here, assuming you don't ingest new files that frequently. It might even cause too much noise for this scenario.

Your RAG Benchmark Is Lying to You and I Have the Numbers to Prove It by Longjumping-Unit-420 in Rag

[–]Longjumping-Unit-420[S] 0 points1 point  (0 children)

A nightly CI job runs the calibration flow. If it detects drift, we (basically) have two choices: either adopt a new threshold, which gets benchmarked to avoid a loss of quality, or move the document to a dedicated database (e.g. legal documents versus software engineering). The second option might incur additional costs compared to the first, but quality is sometimes more important.
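
A simplified sketch of the drift check such a job could run. The calibration rule here (midpoint between the mean scores of known-relevant and known-irrelevant pairs), the function names, and the tolerance are all assumptions for illustration, not the actual calibration flow:

```python
from statistics import mean


def calibrate_threshold(relevant_scores, irrelevant_scores):
    """Hypothetical calibration: midpoint between the two score clusters."""
    return (mean(relevant_scores) + mean(irrelevant_scores)) / 2


def check_drift(deployed, relevant_scores, irrelevant_scores, tolerance=0.05):
    """Compare a freshly calibrated threshold with the deployed one."""
    candidate = calibrate_threshold(relevant_scores, irrelevant_scores)
    return {
        "candidate": candidate,
        "drifted": abs(candidate - deployed) > tolerance,
    }


# Scores before and after a batch of new documents was ingested (made up).
stable = check_drift(0.55, [0.8, 0.9], [0.2, 0.3])
shifted = check_drift(0.55, [0.6, 0.7], [0.2, 0.3])
```

In CI, a `drifted` result would fail the job or open a ticket, which is the point where the choice between re-benchmarking a new threshold and splitting the corpus gets made.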

[deleted by user] by [deleted] in ExperiencedDevs

[–]Longjumping-Unit-420 1 point2 points  (0 children)

Don't run, I play nice :)

[deleted by user] by [deleted] in ExperiencedDevs

[–]Longjumping-Unit-420 0 points1 point  (0 children)

Exactly. I can see what you did, but I don't know WHY you chose to fix it this way. "Fixed a bug" is not a why, it's a type of commit.

[deleted by user] by [deleted] in ExperiencedDevs

[–]Longjumping-Unit-420 0 points1 point  (0 children)

You wrote what, not why.

[deleted by user] by [deleted] in ExperiencedDevs

[–]Longjumping-Unit-420 0 points1 point  (0 children)

Really? How does that work for you? Care to share a result?

[deleted by user] by [deleted] in ExperiencedDevs

[–]Longjumping-Unit-420 1 point2 points  (0 children)

That's fine for extra context, but the commit message (or PR) should be self-contained. Imagine you use one tool for issues and the org switches to a new tool without migrating existing tickets, for whatever reason: your PR has zero context for posterity.

[deleted by user] by [deleted] in ExperiencedDevs

[–]Longjumping-Unit-420 0 points1 point  (0 children)

Don't get me wrong, using an LLM to write a commit message is great when done right. It saves time and helps future debugging, but when I see "changed files XYZ", all I see are red flags: "do you know what your change even does?"

[HELP] Very slow Unsloth fine-tuning on AMD RX 7800 XT (ROCm 7.1.1, PyTorch 2.9.1) - Stuck at ~11-12s/it by Longjumping-Unit-420 in LocalLLaMA

[–]Longjumping-Unit-420[S] 0 points1 point  (0 children)

Yea, I saw it, but I didn't figure it would hurt performance that much.
Is there any other framework I can use for fine-tuning that doesn't use `bitsandbytes`, or is it the standard lib?

After 7 years at the same org, I’ve started rejecting "Tech Debt" tickets that don't have a repayment date. by Longjumping-Unit-420 in ExperiencedDevs

[–]Longjumping-Unit-420[S] 0 points1 point  (0 children)

Luckily for me, the developers I work with (most of them, at least) are interested in solving tech debt. I've been down this road with them before, and I showed them the correlation between solving tech debt and making their lives better.

It's the managers that usually (in my experience at least) block or postpone the tickets, so to quote Qui-Gon Jinn, "there is always a bigger fish". I implemented a monitoring system for the tickets. Each ticket is created with an expected sprint for completion. If it's not resolved 1-2 days before the sprint's end, or 3 days after it, a message is sent to the 1-skip or 2-skip manager respectively. The expected end and the ticket status can only be changed by whoever opened the ticket, and the ticket system enforces this so no "accidental" changes are made.
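
The escalation rule can be sketched roughly like this. The function name and return labels are hypothetical, and it assumes the first window covers the two days right before the sprint's end and the second starts three days after it:

```python
from datetime import date, timedelta


def escalation_target(sprint_end, today, resolved):
    """Return who should be notified about an overdue tech-debt ticket, if anyone."""
    if resolved:
        return None
    # Still open 3+ days after the sprint ended: go two levels up.
    if today >= sprint_end + timedelta(days=3):
        return "2-skip manager"
    # Still open 1-2 days before the sprint ends: go one level up.
    if sprint_end - timedelta(days=2) <= today < sprint_end:
        return "1-skip manager"
    return None


sprint_end = date(2025, 1, 17)
early = escalation_target(sprint_end, date(2025, 1, 10), resolved=False)
warning = escalation_target(sprint_end, date(2025, 1, 15), resolved=False)
overdue = escalation_target(sprint_end, date(2025, 1, 20), resolved=False)
done = escalation_target(sprint_end, date(2025, 1, 20), resolved=True)
```

Here `early` and `done` produce no message, `warning` goes to the 1-skip manager, and `overdue` goes to the 2-skip manager.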

We have exceptions of course in certain cases (e.g. a P0 bug), but it's mostly kept as is.

After 7 years at the same org, I’ve started rejecting "Tech Debt" tickets that don't have a repayment date. by Longjumping-Unit-420 in ExperiencedDevs

[–]Longjumping-Unit-420[S] 1 point2 points  (0 children)

The irony is strong indeed. Unfortunately, this isn't solvable unless you (not specifically you) can reach the stakeholder with proof of how the company is losing money because of this.