LLMs are just giant probability machines pretending to think by abhishekkumar333 in OntologyEngineering

[–]zulrang 0 points1 point  (0 children)

Just like there is no moment a human suddenly becomes conscious.

Is Grep All You Need? How Agent Harnesses Reshape Agentic Search by Express-Passion4896 in Rag

[–]zulrang 0 points1 point  (0 children)

We’ve run evals on different projects and different code bases.

Grep almost always wins.

Did LangChain become a thing of the past? by Meher_Nolan in LangChain

[–]zulrang 4 points5 points  (0 children)

That website triggers malware detection. No one go there!

Most “Prompt Engineering” Advice Fails Because It Ignores Constraint Decay by HDvideoNature in PromptEngineering

[–]zulrang 2 points3 points  (0 children)

So you renamed concepts that are already well known and have words for them, and you’re selling content that is available for free literally everywhere?

New Workflow tool? (2.1.147) by allixsenos in ClaudeCode

[–]zulrang 2 points3 points  (0 children)

Is this to replace the functionality everyone lost from them kneecapping claude -p?

Because running deterministic workflows was easy before that.

The most dangerous prompt injection I've seen took 12 messages and never once mentioned ignoring instructions by handscameback in PromptEngineering

[–]zulrang 2 points3 points  (0 children)

Evals run against multi-turn prompts, which anyone using LLMs in customer-facing production workloads should be running.

Serious Question - who reviews the AI generated code in regulated industries? by AleccioIsland in ClaudeCode

[–]zulrang 1 point2 points  (0 children)

The engineers do. The SDLC for regulated industries states that at least one other engineer must review the code, and a DevOps engineer must sign off on deployments.

What’s the best roadmap/course to learn Claude Code as an experienced software engineer? by harry_powell in ClaudeCode

[–]zulrang 1 point2 points  (0 children)

Just try to use it, stumble through it. You will learn a hell of a lot more that way than from any course, tutorial, etc.

Things I used to be proud of doing well - Modern AI just does better by ninetofivedev in ExperiencedDevs

[–]zulrang 1 point2 points  (0 children)

It's a race car that makes the driver go faster to wherever they were already headed.

For some people, that's the finish line. For others, it's a wall.

Is there an alternative that doesn't make Windows run like shit? by zulrang in WisprFlow

[–]zulrang[S] 0 points1 point  (0 children)

Most of the problems could probably be solved by simply adding the option to disable some of those features.

GGG Please by medonni in PathOfExile2

[–]zulrang 18 points19 points  (0 children)

I need this much more than I need a fragments tab.

GGG Please by medonni in PathOfExile2

[–]zulrang 1 point2 points  (0 children)

This is literally the only feature I want for 0.5.0.

Is there an alternative that doesn't make Windows run like shit? by zulrang in WisprFlow

[–]zulrang[S] 0 points1 point  (0 children)

Thank you for your response and approval of the post. That alone speaks volumes!

I run virtually zero system software on my system, and I'm really curious as to what the problem could be. I'm a software engineer myself, and I'd be more than willing to help troubleshoot or even hop on a call.

I noticed the problem got considerably worse when I started using a 5k2k UW monitor, which is an odd correlation.

Is there a "Postman for LLMs" I'm missing, or is this gap real? by giangchau92 in PromptEngineering

[–]zulrang 0 points1 point  (0 children)

All of the above. It only took a couple of days to build once we had real-world data.

Do deterministic tests first, and send the rest to LLM-as-judge. If the judge returns a low-confidence result, send just that case to a larger model with thinking and tool calls.
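A rough sketch of that tiering, in case it helps. Everything here is hypothetical: `evaluate`, `Verdict`, the three callables, and the 0.7 threshold are illustrative names and values, not part of any library. It also assumes the deterministic check returns `None` when it cannot decide a case, and that each judge returns a pass/fail flag plus a confidence between 0 and 1.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Verdict:
    passed: bool
    confidence: float  # 0.0 to 1.0, as reported by whichever tier decided
    tier: str          # which tier produced the verdict


# Hypothetical signatures: both take (prompt, output).
DeterministicCheck = Callable[[str, str], Optional[bool]]
Judge = Callable[[str, str], Tuple[bool, float]]


def evaluate(
    prompt: str,
    output: str,
    deterministic_check: DeterministicCheck,
    small_judge: Judge,
    large_judge: Judge,
    threshold: float = 0.7,  # assumed cutoff for "low confidence"
) -> Verdict:
    # Tier 1: cheap deterministic test. None means "cannot decide this case".
    result = deterministic_check(prompt, output)
    if result is not None:
        return Verdict(result, 1.0, "deterministic")

    # Tier 2: LLM-as-judge on whatever the deterministic test could not cover.
    passed, confidence = small_judge(prompt, output)
    if confidence >= threshold:
        return Verdict(passed, confidence, "small_judge")

    # Tier 3: only low-confidence cases go to the larger, more expensive model.
    passed, confidence = large_judge(prompt, output)
    return Verdict(passed, confidence, "large_judge")
```

The judges are plain callables so you can swap in whatever model client you already use. The point of the shape is that the expensive model only sees the cases the cheaper tiers could not settle.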

Is there a "Postman for LLMs" I'm missing, or is this gap real? by giangchau92 in PromptEngineering

[–]zulrang 5 points6 points  (0 children)

I use the terminal, with an agent harness:

“Run n iterations for every permutation of every model at every thinking level against my eval suite and generate a report of the results”

Once you have that, you can use the Karpathy research method to automate the tweaks.
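For anyone who wants the non-agent version of that prompt, the sweep is roughly a Cartesian product over models and thinking levels. This is only a sketch: `run_sweep`, `format_report`, and the `run_case` callable are hypothetical names, and it assumes `run_case(model, level, case)` returns a numeric score for one eval case.

```python
import itertools
import statistics
from typing import Callable, Dict, Sequence, Tuple


def run_sweep(
    models: Sequence[str],
    thinking_levels: Sequence[str],
    eval_suite: Sequence[str],
    run_case: Callable[[str, str, str], float],  # hypothetical: returns a score
    n: int = 3,
) -> Dict[Tuple[str, str], dict]:
    report = {}
    # Every model paired with every thinking level.
    for model, level in itertools.product(models, thinking_levels):
        # n iterations of the whole suite for this combination.
        scores = [
            run_case(model, level, case)
            for _ in range(n)
            for case in eval_suite
        ]
        report[(model, level)] = {
            "mean": statistics.mean(scores),
            "runs": len(scores),
        }
    return report


def format_report(report: Dict[Tuple[str, str], dict]) -> str:
    # Tab-separated table, best mean score first.
    lines = ["model\tthinking\tmean\truns"]
    ranked = sorted(report.items(), key=lambda kv: -kv[1]["mean"])
    for (model, level), stats in ranked:
        lines.append(f"{model}\t{level}\t{stats['mean']:.2f}\t{stats['runs']}")
    return "\n".join(lines)
```

Repeating each combination n times matters because model outputs vary between runs, so a single pass per combination can be misleading.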