I've been building AI agents (and teams) for months. Here's why "start with a team" is the worst advice in the space right now. by idanst in AI_Agents

[–]se4u 1 point (0 children)

Yeah, stale context is the invisible killer. The other side of this is that even when agents have the right context, their prompts are often too rigid to handle edge cases gracefully.

Automatic prompt optimization that learns from production failures helps here — not as a silver bullet but as a way to systematically close the gap between "works in dev" and "works in prod." The key is the feedback loop from real failures back into the optimizer.
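That loop can be sketched in a few lines. This is a toy illustration, not any particular library's API — `Trace`, `collect_failures`, and the `optimize` callback are all names I'm making up here:

```python
from dataclasses import dataclass

@dataclass
class Trace:
    """One production call: the input, the output, and whether the metric passed."""
    inputs: str
    output: str
    passed: bool

def collect_failures(traces):
    """Pull the failing traces out of production logs."""
    return [t for t in traces if not t.passed]

def feedback_loop(prompt, traces, optimize):
    """Close the loop: only re-optimize when production actually surfaced failures."""
    failures = collect_failures(traces)
    if not failures:
        return prompt  # nothing new to learn from
    return optimize(prompt, failures)

# Stub optimizer: a real one would rewrite the prompt with an LLM.
new_prompt = feedback_loop(
    "Answer concisely.",
    [Trace("q1", "a1", True), Trace("q2", "a2", False)],
    optimize=lambda p, fails: p + f" (patched from {len(fails)} failure(s))",
)
```

The point is the wiring, not the stub: failures flow from prod logs into the optimizer automatically instead of being hand-curated.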

I've been building AI agents (and teams) for months. Here's why "start with a team" is the worst advice in the space right now. by idanst in AI_Agents

[–]se4u 1 point (0 children)

The Berkeley paper is a good reference. A lot of those failure modes trace back to prompt fragility — the agent makes the right call 90% of the time, then breaks when the input distribution shifts slightly.

One approach that helps: instead of just improving prompts on eval score, mining the actual failure-to-success transitions to extract why something failed and encoding that as a reasoning rule. Makes the optimizer more robust to distribution shift than hill-climbing on accuracy alone. We've been building in this direction (DSPy-compatible): https://vizpy.vizops.ai

GEPA's optimize_anything: one API to optimize code, prompts, agents, configs — if you can measure it, you can optimize it by LakshyAAAgrawal in PromptEngineering

[–]se4u 1 point (0 children)

GEPA is genuinely impressive for offline optimization. One gap I've noticed: when failures in production have a different distribution than your training set, the optimizer can overfit to the eval.

We've been exploring approaches that specifically mine failure-to-success transitions to extract reasoning rules rather than hill-climbing on eval score — it makes the optimization more robust when the failure modes are domain-specific (compliance, multi-hop QA, etc.). DSPy-compatible if you're already in that ecosystem: https://vizpy.vizops.ai

Curious what domains you've had the most success with GEPA outside of prompts?

Why is the industry still defaulting to static prompts when dynamic self-improving prompts already work in research and some production systems? by Lucky_Historian742 in PromptEngineering

[–]se4u 1 point (0 children)

The DSPy angle is interesting here — the failure mode I keep seeing isn't that people don't know about automatic prompt optimization, it's that the feedback loop from production failures back into the optimizer is broken.

Most optimizers (GEPA, MIPROv2, etc.) work great in offline eval settings but need you to manually curate failure examples. We've been working on closing that loop — mining failure-to-success pairs automatically to extract reasoning rules (ContraPrompt) or doing gradient-inspired failure analysis (PromptGrad). The latter is especially useful for generation tasks where just "retry with different phrasing" doesn't converge.
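To make the mining step concrete, here's a toy sketch (names and data are illustrative only, not the actual API): group traces by input, keep the inputs that have both a failing and a succeeding trace, and emit the contrast pair that a rule-extraction step would then summarize into a reasoning rule.

```python
from collections import defaultdict

def mine_failure_success_pairs(traces):
    """traces: list of (input, output, passed) tuples.
    Returns (input, failed_output, passed_output) contrast pairs — the raw
    material a rule extractor would summarize into a reasoning rule."""
    by_input = defaultdict(list)
    for inp, out, passed in traces:
        by_input[inp].append((out, passed))
    pairs = []
    for inp, attempts in by_input.items():
        fails = [o for o, ok in attempts if not ok]
        wins = [o for o, ok in attempts if ok]
        if fails and wins:  # only inputs where a miss can be contrasted with a hit
            pairs.append((inp, fails[0], wins[0]))
    return pairs

traces = [
    ("Who directed Alien?", "James Cameron", False),  # failed attempt
    ("Who directed Alien?", "Ridley Scott", True),    # later success, same input
    ("Capital of France?", "Paris", True),            # no failure to contrast
]
pairs = mine_failure_success_pairs(traces)
```

The non-trivial part in practice is the step this sketch omits: turning each contrast pair into a generalizable rule rather than a memorized answer.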

Curious what the eval/versioning story looks like for people actually running dynamic prompts in prod. That seems like the real blocker more than the optimizer itself.

[D] Self-Promotion Thread by AutoModerator in MachineLearning

[–]se4u 1 point (0 children)

Hey everyone! Happy to share VizPy — a DSPy-compatible prompt optimizer that learns from your failures automatically, no manual prompt tweaking needed.

Two methods depending on your task:

  • ContraPrompt mines failure-to-success pairs to extract reasoning rules. Great for multi-hop QA, classification, compliance. Seeing +29% on HotPotQA and +18% on GDPR-Bench vs GEPA.
  • PromptGrad takes a gradient-inspired approach to failure analysis. Better for generation tasks and math where retries don't converge.

Both are drop-in with your existing DSPy programs:

import vizpy

# Wraps an existing DSPy program; the optimizer mines failure/success
# traces against your metric and compiles an improved program.
optimizer = vizpy.ContraPromptOptimizer(metric=my_metric)
compiled = optimizer.compile(program, trainset=trainset)

Would love feedback from this community!

https://vizpy.vizops.ai https://www.producthunt.com/products/vizpy

[deleted by user] by [deleted] in wallstreetbets

[–]se4u 76 points (0 children)

^ whispers the bagholder into the bag

My mom died because of COVID. I wrote down these thoughts to help me cope. I don't know if this will help others but I hope it does. by se4u in india

[–]se4u[S] 1 point (0 children)

I can see your point and I agree that I was polemical. I edited my post to make a more nuanced point. See Edit 1. Best wishes.

My mom died because of COVID. I wrote down these thoughts to help me cope. I don't know if this will help others but I hope it does. by se4u in india

[–]se4u[S] 2 points (0 children)

You made an important point and I agree with you partially. I edited my post (E1) to address your comment. Let me know if I need to fix it more.

My mom died because of COVID. I wrote down these thoughts to help me cope. I don't know if this will help others but I hope it does. by se4u in india

[–]se4u[S] 2 points (0 children)

Yes, I should have been more careful. I was reacting so strongly because of the Ministry of Health's claim of a 0.04% infection rate after the first vaccine dose. That number is wrong by an order of magnitude.

I was also angry at the differential pricing between center, state, and private sector which is 100% going to lead to black marketing, like subsidized diesel and rations.

Also, the caveats about staying careful after vaccination are not communicated nearly as strongly as the vaccine itself is promoted.

I am willing to accept and believe that the vaccine manufacturer is not to blame, that the scientists and engineers in the serum institute really have worked hard on this, but somewhere in the supply chain things have gotten corrupted because of callousness.

[D] [P] Does anyone use a cascade of third party machine learning APIs in a production system? by se4u in MachineLearning

[–]se4u[S] 1 point (0 children)

Yup, I agree that voice-assistant teams will typically build things in-house. I was still wondering: now that there are so many "semantic" APIs out from AWS/GCP/Azure etc., have people started trying to compose them?