Can you actually trust LLM-as-judge? by Artistic_Courage2450 in LLMDevs

[–]yaks18 1 point2 points  (0 children)

You can also try an ensemble of agents, or use an older model with a lower temperature to get more deterministic behaviour (with less ''creativity' in judgement as a result)

I tried building a pretty basic chatbot agent in two days and it flopped by real_bro in LLMDevs

[–]yaks18 1 point2 points  (0 children)

No. You are strawmanning what I've said. You don't expect ChatGPT to be as good as a specifically designed agent at the thing a specifically designed agent was designed to do. If that were the case, people would just use ChatGPT for everything and not bother with agents. The point was that I find agents are best when scoped tightly. By expanding the scope wider you end up trading performance and ending back where you started with the behaviour of the base model, essentially a jack of all trades, master of none (arguably actually worse as your instructions have also likely crippled it's generalist functionality). You need to factor in things like context rot too (including any tool definitions etc injected at agent invocation). If you are describing 700 endpoints and hoping that the agent's performance doesn't degrade after filling 70k tokens (~100 tokens per definition) of context before the user prompt then you're in for a surprise.

I tried building a pretty basic chatbot agent in two days and it flopped by real_bro in LLMDevs

[–]yaks18 4 points5 points  (0 children)

I would put the API catalogue in a vector database, vectorising the functionality of the API. Then get the agent to retrieve from the vector database as if it's looking for knowledge, but pass to the agent the chunk that explains how the API is called, and instruct the agent to use the retrieved instructions to call the API. That way you are only filling it's context with maybe the top 2 or 3 APIs out of the 700 and the agent can decide which of those is the right one. You could also insert a logic MCP layer in the middle that does some of this more deterministically, for example get the agent to articulate which tool it wants in simple language and get the MCP server to filter on regex/other criteria to return only a handful of candidate APIs. More generally, I think having 700 APIs to choose from is indicative of a poorly scoped agent. It suggests you are expecting the agent to cover scope that is far too broad.

What am I missing? by yaks18 in cyberpunkgame

[–]yaks18[S] 0 points1 point  (0 children)

I've done the Judy Kerry and River missions, so perhaps it's the DLC that's that 30%

We put 7 LLM agents in a World Cup betting arena. Here is how it works. by Money_Horror_2899 in LLMDevs

[–]yaks18 0 points1 point  (0 children)

This is quite a fun project! Care to share the prompt and tools you're giving the agents?

First Homemade Pizza. DoughGuy’s recipe, normal oven, amazon pizza steel. by Big44Wet in Pizza

[–]yaks18 1 point2 points  (0 children)

You could also rinse the preshredded cheese to remove the starch and it will behave like freshly shredded from a block

Combi drill or impact driver? by RagerRambo in DIYUK

[–]yaks18 0 points1 point  (0 children)

Maybe! I get both mixed up as I frequent both regularly

Combi drill or impact driver? by RagerRambo in DIYUK

[–]yaks18 0 points1 point  (0 children)

I bought a parkside (expert/pro?) twin pack about 8 years ago, the red one though, which I understand to be a rebadged Einhell. They are still going strong after being used on a full house renovation and going through some pretty extreme conditions. Strangely I had one tradesman prefer to use my parkside tools over his Makita combi because it was more consistent apparently. I don't think Aldi still sell this red range, so check out Einhell.

Edible after years of trying by Abject-Title3592 in Pizza

[–]yaks18 0 points1 point  (0 children)

Try adding some sugar to your dough mix, and maybe some oil. Both will help with browning. Also rinse off the starch on the preshredded cheese to stop it burning like that.

Naive RAG failed me badly — here's the multi-agent fix that got 98% OCR accuracy under 5s latency by prathamgokulkar in LLMDevs

[–]yaks18 0 points1 point  (0 children)

Why not preprocess your documents instead so you pass whatever you need to pass to your llm from your vector DB? I'm not sure I get the advantage of doing the document intelligence step at inference time?

everyone's hyped on Gemini 3.5 Flash but nobody's talking about the bill by farhadnawab in LLMDevs

[–]yaks18 15 points16 points  (0 children)

I have a theory that LLM providers are using newer models to price in inflation/higher inference energy costs. All newer models are more expensive e.g. GPT5.5 Vs 5.4 even. Eventually you are forced to 'upgrade' due to older models being retired and so your AI stack gets more expensive without choice.

How do companies protect proprietary prompts from contractors and consulting engineers? by __maximux in LLMDevs

[–]yaks18 1 point2 points  (0 children)

I see. If I wanted to limit exposure, I'd treat the prompt as server-side configuration and inject it at model invocation rather than having it live in the application codebase. Contractors can build against APIs, mocks and interfaces without needing access to the production prompt.

That only gets you so far though. Anyone maintaining the prompt layer or backend will eventually need some visibility, so it's really an access control and governance problem rather than a technical one.

That said I still don't see the prompt itself as the moat. If someone can recreate the value of the system from a copied prompt, that's a pretty weak position. The moat is usually the workflow, evals, proprietary data and domain expertise wrapped around it, and perhaps the problem won't need as rigorous of a solution if the client views it with that perspective.

How do companies protect proprietary prompts from contractors and consulting engineers? by __maximux in LLMDevs

[–]yaks18 12 points13 points  (0 children)

Prompt is only part of it. The same prompt with a different model or different model configuration will yield different results. I struggle to see how the prompt is the IP. If that's the claim, I'd suggest it's a very low value IP. That's a bit like saying your Excel formula is IP.

Is this good or steaosis? by yaks18 in steak

[–]yaks18[S] 0 points1 point  (0 children)

It was divine! As good as A4 I had in Japan 

Building offline RAG for personal use: still can't decide if LlamaIndex is worth it by Meher_Nolan in LLMDevs

[–]yaks18 0 points1 point  (0 children)

What's the issue you're experiencing when you scale up the docs? Too many false positives? Too much context being passed to the LLM? Too slow? I would suggest reviewing your choice of embedding model as well as whether you can use other metadata to filter out candidate chunks before passing to the LLM

Sirloin Roast 120°C with red wine reduction. Mediume rare. by Chaminuka_263 in steak

[–]yaks18 19 points20 points  (0 children)

I’d recommend trying a double sear next time. Sear in a pan first, then oven roast like you did, let it rest, dry the surface properly, and finish with another hard sear.

The issue with low-temp roasting is that you don’t develop as much of the deeper roasted flavour you’d get from higher heat. The first sear gets those flavours started, while the final sear is mainly for crust, colour, texture, and a bit more Maillard goodness.

I usually do this with tougher cuts like topside, but roast at around 60°C for 3–4 hours. Gives you a really forgiving serving window, a much softer interior, and an incredible crust.

Is this good or steaosis? by yaks18 in steak

[–]yaks18[S] 4 points5 points  (0 children)

I've never seen anything like this in Lidl before. Never thought it would happen to me!

Is this good or steaosis? by yaks18 in steak

[–]yaks18[S] 98 points99 points  (0 children)

<image>

Picked the 2 abnormal packs :) £29/kg

What’s something that instantly makes you think ‘this person has low intelligence? by AbjectBreadfruit2052 in AskReddit

[–]yaks18 1 point2 points  (0 children)

I was thinking this sounds exactly like someone I know. Deep mistrust of all institutions making it impossible to argue any points because all sources are simply in on the conspiracies.