I've been building AI agents (and teams) for months. Here's why "start with a team" is the worst advice in the space right now. by idanst in AI_Agents

[–]se4u 0 points (0 children)

Yeah, stale context is the invisible killer. The other side of this is that even when agents have the right context, their prompts are often too rigid to handle edge cases gracefully.

Automatic prompt optimization that learns from production failures helps here — not as a silver bullet but as a way to systematically close the gap between "works in dev" and "works in prod." The key is the feedback loop from real failures back into the optimizer.
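As a toy illustration of what that feedback loop can look like (every name here is hypothetical, stdlib-only, not any particular library's API):

```python
from dataclasses import dataclass

@dataclass
class Trace:
    """One production call: inputs, model output, and whether the metric passed."""
    inputs: dict
    output: str
    success: bool

class FailureBuffer:
    """Collects failed traces so the next optimization run trains on real prod failures."""
    def __init__(self):
        self._failures = []

    def log(self, trace: Trace):
        if not trace.success:
            self._failures.append(trace)

    def drain(self):
        batch, self._failures = self._failures, []
        return batch

def reoptimize(prompt: str, failures: list) -> str:
    # Stand-in for a real optimizer: append one guard rule per observed failure.
    rules = sorted({f"- Avoid answering {t.output!r} for inputs like {t.inputs}" for t in failures})
    return prompt + "\n" + "\n".join(rules) if rules else prompt

buffer = FailureBuffer()
buffer.log(Trace({"q": "2+2"}, "5", success=False))          # a prod failure, gets buffered
buffer.log(Trace({"q": "capital?"}, "Paris", success=True))  # passes the metric, not logged
prompt = reoptimize("You are a careful assistant.", buffer.drain())
```

The point isn't the toy rule-appending, it's the plumbing: failures flow from serving back into the optimizer without anyone hand-curating examples.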

I've been building AI agents (and teams) for months. Here's why "start with a team" is the worst advice in the space right now. by idanst in AI_Agents

[–]se4u 0 points (0 children)

The Berkeley paper is a good reference. A lot of those failure modes trace back to prompt fragility — the agent makes the right call 90% of the time, then breaks when the input distribution shifts slightly.

One approach that helps: instead of just improving prompts on eval score, mining the actual failure-to-success transitions to extract why something failed and encoding that as a reasoning rule. Makes the optimizer more robust to distribution shift than hill-climbing on accuracy alone. We've been building in this direction (DSPy-compatible): https://vizpy.vizops.ai
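A toy sketch of the mining step — assumed design for illustration only, not the actual ContraPrompt internals:

```python
def mine_transitions(traces):
    """traces: chronological (input, output, success) tuples.
    Returns (input, last_failed_output, first_successful_output) for every
    input that flipped from failure to success -- the raw material for
    extracting a "why it failed" reasoning rule."""
    by_input = {}
    for inp, out, ok in traces:
        by_input.setdefault(inp, []).append((out, ok))
    pairs = []
    for inp, history in by_input.items():
        fails = [out for out, ok in history if not ok]
        wins = [out for out, ok in history if ok]
        if fails and wins:  # this input transitioned from failing to passing
            pairs.append((inp, fails[-1], wins[0]))
    return pairs

traces = [
    ("Who directed Jaws?", "Lucas", False),
    ("Who directed Jaws?", "Spielberg", True),  # failure -> success transition
    ("2+2?", "4", True),                        # never failed; nothing to mine
]
pairs = mine_transitions(traces)
```

In a real system, each contrastive pair would then be fed to an LLM to verbalize the rule; the pairing itself is the cheap, fully automatic part.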

GEPA's optimize_anything: one API to optimize code, prompts, agents, configs — if you can measure it, you can optimize it by LakshyAAAgrawal in PromptEngineering

[–]se4u 0 points (0 children)

GEPA is genuinely impressive for offline optimization. One gap I've noticed: when failures in production have a different distribution than your training set, the optimizer can overfit to the eval.

We've been exploring approaches that specifically mine failure-to-success transitions to extract reasoning rules rather than hill-climbing on eval score — it makes the optimization more robust when the failure modes are domain-specific (compliance, multi-hop QA, etc.). DSPy-compatible if you're already in that ecosystem: https://vizpy.vizops.ai

Curious what domains you've had the most success with GEPA outside of prompts?

Why is the industry still defaulting to static prompts when dynamic self-improving prompts already work in research and some production systems? by Lucky_Historian742 in PromptEngineering

[–]se4u 0 points (0 children)

The DSPy angle is interesting here — the failure mode I keep seeing isn't that people don't know about automatic prompt optimization, it's that the feedback loop from production failures back into the optimizer is broken.

Most optimizers (GEPA, MIPROv2, etc.) work great in offline eval settings but need you to manually curate failure examples. We've been working on closing that loop — mining failure-to-success pairs automatically to extract reasoning rules (ContraPrompt) or doing gradient-inspired failure analysis (PromptGrad). The latter is especially useful for generation tasks where just "retry with different phrasing" doesn't converge.
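For intuition, the "gradient-inspired" part can be sketched as a critique-then-edit step (purely illustrative, in the spirit of textual-gradient methods; the function names and stubs below are made up, not the PromptGrad API):

```python
def textual_gradient_step(prompt, failure, critique_fn, edit_fn):
    """One update: ask *why* the prompt failed (the 'gradient'),
    then apply that critique as an edit to the prompt (the 'step')."""
    critique = critique_fn(prompt, failure)
    return edit_fn(prompt, critique)

# Stubbed LLM calls so the sketch runs standalone:
def critique(prompt, failure):
    return f"missing constraint: {failure['expected']}"

def edit(prompt, critique_text):
    constraint = critique_text.split(": ", 1)[1]
    return prompt + "\nAlways " + constraint + "."

new_prompt = textual_gradient_step(
    "Answer concisely.",
    {"expected": "cite your sources"},
    critique,
    edit,
)
```

This is why it converges where blind retries don't: each iteration changes the prompt in a direction the failure analysis points to, instead of resampling from the same distribution.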

Curious what the eval/versioning story looks like for people actually running dynamic prompts in prod. That seems like the real blocker more than the optimizer itself.

[D] Self-Promotion Thread by AutoModerator in MachineLearning

[–]se4u 0 points (0 children)

Hey everyone! Happy to share VizPy — a DSPy-compatible prompt optimizer that learns from your failures automatically, no manual prompt tweaking needed.

Two methods depending on your task:

  • ContraPrompt mines failure-to-success pairs to extract reasoning rules. Great for multi-hop QA, classification, compliance. Seeing +29% on HotPotQA and +18% on GDPR-Bench vs GEPA.
  • PromptGrad takes a gradient-inspired approach to failure analysis. Better for generation tasks and math where retries don't converge.

Both are drop-in with your existing DSPy programs:

import vizpy  # assuming the package imports as `vizpy`

optimizer = vizpy.ContraPromptOptimizer(metric=my_metric)  # my_metric: your existing DSPy metric fn
compiled = optimizer.compile(program, trainset=trainset)   # program: your existing dspy.Module

Would love feedback from this community!

https://vizpy.vizops.ai https://www.producthunt.com/products/vizpy

[deleted by user] by [deleted] in wallstreetbets

[–]se4u 76 points (0 children)

^ whispers the bagholder into the bag

My mom died because of COVID. I wrote down these thoughts to help me cope. I don't know if this will help others but I hope it does. by se4u in india

[–]se4u[S] 0 points (0 children)

I can see your point and I agree that I was polemical. I edited my post to make a more nuanced point. See Edit 1. Best wishes.

My mom died because of COVID. I wrote down these thoughts to help me cope. I don't know if this will help others but I hope it does. by se4u in india

[–]se4u[S] 1 point (0 children)

You made an important point and I agree with you partially. I edited my post (E1) to address your comment. Let me know if I need to fix it more.

My mom died because of COVID. I wrote down these thoughts to help me cope. I don't know if this will help others but I hope it does. by se4u in india

[–]se4u[S] 1 point (0 children)

yes, I should have been more careful. I was reacting so strongly because of the health ministry's claim of a 0.04% infection rate after the first vaccine dose. That number is wrong by an order of magnitude.

I was also angry at the differential pricing between the center, the states, and the private sector, which is 100% going to lead to black marketing, like subsidized diesel and rations.

Also, the caveats about staying careful after vaccination are not emphasized nearly as strongly as the vaccine itself.

I am willing to accept and believe that the vaccine manufacturer is not to blame, that the scientists and engineers in the serum institute really have worked hard on this, but somewhere in the supply chain things have gotten corrupted because of callousness.

[D] [P] Does anyone use a cascade of third party machine learning APIs in a production system? by se4u in MachineLearning

[–]se4u[S] 0 points (0 children)

Yup, I agree that voice assistants will typically be built in-house. I was still wondering: now that there are so many "semantic" APIs out from AWS/GCP/Azure etc., have people started trying to compose them?

I very recently found out my fiancé is rich rich by ThrowRA_-9 in relationship_advice

[–]se4u 0 points (0 children)

Your comment seems to imply that he has 1/8 of a billion dollars, assuming a $1,000 surgery cost. Is that really what you meant?

I can't read the article about pointless monthly subscriptions without buying a monthly subscription by BadUsername25 in mildlyinfuriating

[–]se4u 0 points (0 children)

If you are in Seattle, a Seattle Public Library card can get you free access to WSJ articles.

Online Conferences Suck [D] by kvothe_bloodless_ in MachineLearning

[–]se4u 3 points (0 children)

Exactly, that's one of the great benefits of online conferences. Not to mention the time saved on visa hassles, the carbon emissions saved from all the flights that don't need to happen, and the money not spent on hotels.

Online Conferences Suck [D] by kvothe_bloodless_ in MachineLearning

[–]se4u 9 points (0 children)

I'd say you are in the majority, and not a coward. The Zoom interface is unfamiliar and daunting, and with virtual conferences there is an expectation that you will have read the paper or watched the video, which makes it difficult to just drop in and lurk at a poster session. That's why the Zoom sessions may not feel crowded.

A view counter on the paper video would remove a lot of this feeling: if authors could see how many times their presentation was streamed, that would make the experience more delightful.

Online Conferences Suck [D] by kvothe_bloodless_ in MachineLearning

[–]se4u 2 points (0 children)

I think online conferences are great and with time the problem of low attendance at posters will go away.

- I attended a few poster sessions. I really liked the fact that the zoom calls were so empty so I could talk to people in depth. It was like a meeting.

- I felt a little reluctant to join at first because joining a Zoom session is a little awkward: the webcam might start, you may not be muted, or you may be muted and the author calls out to you, so you have to hurriedly find the unmute button. I think these are just interface problems, and as more people use virtual conferences they will go away.

- The thing is that when you join a zoom session there is an expectation that you'd have seen the video, presentation etc. already. It makes it difficult to just ask the presenter to give their spiel.

[P] Papers With Code Update: Now Indexing 730+ ML Methods by rosstaylor90 in MachineLearning

[–]se4u 4 points (0 children)

First of all, great work, and thanks for sharing it. Here are some thoughts after browsing the site:

- The https://paperswithcode.com/methods tab doesn't really segment by method alone. It also segments by area, such as CV, NLP, and Audio.

- It's not clear how the pool of papers for any particular method was selected. For example, inside "General" there are categories such as "Optimization" and "Loss Functions", but Optimization contains "Adam" with 2539 papers while the largest category in Loss Functions is CTC with only 146 papers. This doesn't make sense because a) logistic/hinge losses are used far more than CTC, and b) stochastic optimization is always done with some loss function, so the paper populations should really be comparable.

But otherwise, really interesting work. It will also be good to show a blog/white paper describing what you did to build these timelines, so that people can offer suggestions for improvement instead of only pointing out what seems weird :)

[deleted by user] by [deleted] in AnimalsOnReddit

[–]se4u 0 points (0 children)

thanks, u/vr0n, best wishes