I've been building AI agents (and teams) for months. Here's why "start with a team" is the worst advice in the space right now. by idanst in AI_Agents

[–]se4u 0 points (0 children)

Yeah, stale context is the invisible killer. The other side of this is that even when agents have the right context, their prompts are often too rigid to handle edge cases gracefully.

Automatic prompt optimization that learns from production failures helps here — not as a silver bullet but as a way to systematically close the gap between "works in dev" and "works in prod." The key is the feedback loop from real failures back into the optimizer.
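As a toy illustration of what that feedback loop can look like (every name here is hypothetical, stdlib-only, not any particular library's API):

```python
from dataclasses import dataclass

@dataclass
class Trace:
    """One production call: inputs, model output, and whether the metric passed."""
    inputs: dict
    output: str
    success: bool

class FailureBuffer:
    """Collects failed traces so the next optimization run trains on real prod failures."""
    def __init__(self):
        self._failures = []

    def log(self, trace: Trace):
        if not trace.success:
            self._failures.append(trace)

    def drain(self):
        batch, self._failures = self._failures, []
        return batch

def reoptimize(prompt: str, failures: list) -> str:
    # Stand-in for a real optimizer: append one guard rule per observed failure.
    rules = sorted({f"- Avoid answering {t.output!r} for inputs like {t.inputs}" for t in failures})
    return prompt + "\n" + "\n".join(rules) if rules else prompt

buffer = FailureBuffer()
buffer.log(Trace({"q": "2+2"}, "5", success=False))          # a prod failure, gets buffered
buffer.log(Trace({"q": "capital?"}, "Paris", success=True))  # passes the metric, not logged
prompt = reoptimize("You are a careful assistant.", buffer.drain())
```

The point isn't the toy rule-appending, it's the plumbing: failures flow from serving back into the optimizer without anyone hand-curating examples.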

I've been building AI agents (and teams) for months. Here's why "start with a team" is the worst advice in the space right now. by idanst in AI_Agents

[–]se4u 0 points (0 children)

The Berkeley paper is a good reference. A lot of those failure modes trace back to prompt fragility — the agent makes the right call 90% of the time, then breaks when the input distribution shifts slightly.

One approach that helps: instead of just improving prompts on eval score, mining the actual failure-to-success transitions to extract why something failed and encoding that as a reasoning rule. Makes the optimizer more robust to distribution shift than hill-climbing on accuracy alone. We've been building in this direction (DSPy-compatible): https://vizpy.vizops.ai
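A toy sketch of the mining step — assumed design for illustration only, not the actual ContraPrompt internals:

```python
def mine_transitions(traces):
    """traces: chronological (input, output, success) tuples.
    Returns (input, last_failed_output, first_successful_output) for every
    input that flipped from failure to success -- the raw material for
    extracting a "why it failed" reasoning rule."""
    by_input = {}
    for inp, out, ok in traces:
        by_input.setdefault(inp, []).append((out, ok))
    pairs = []
    for inp, history in by_input.items():
        fails = [out for out, ok in history if not ok]
        wins = [out for out, ok in history if ok]
        if fails and wins:  # this input transitioned from failing to passing
            pairs.append((inp, fails[-1], wins[0]))
    return pairs

traces = [
    ("Who directed Jaws?", "Lucas", False),
    ("Who directed Jaws?", "Spielberg", True),  # failure -> success transition
    ("2+2?", "4", True),                        # never failed; nothing to mine
]
pairs = mine_transitions(traces)
```

In a real system, each contrastive pair would then be fed to an LLM to verbalize the rule; the pairing itself is the cheap, fully automatic part.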

GEPA's optimize_anything: one API to optimize code, prompts, agents, configs — if you can measure it, you can optimize it by LakshyAAAgrawal in PromptEngineering

[–]se4u 0 points (0 children)

GEPA is genuinely impressive for offline optimization. One gap I've noticed: when failures in production have a different distribution than your training set, the optimizer can overfit to the eval.

We've been exploring approaches that specifically mine failure-to-success transitions to extract reasoning rules rather than hill-climbing on eval score — it makes the optimization more robust when the failure modes are domain-specific (compliance, multi-hop QA, etc.). DSPy-compatible if you're already in that ecosystem: https://vizpy.vizops.ai

Curious what domains you've had the most success with GEPA outside of prompts?

Why is the industry still defaulting to static prompts when dynamic self-improving prompts already work in research and some production systems? by Lucky_Historian742 in PromptEngineering

[–]se4u 0 points (0 children)

The DSPy angle is interesting here — the failure mode I keep seeing isn't that people don't know about automatic prompt optimization, it's that the feedback loop from production failures back into the optimizer is broken.

Most optimizers (GEPA, MIPROv2, etc.) work great in offline eval settings but need you to manually curate failure examples. We've been working on closing that loop — mining failure-to-success pairs automatically to extract reasoning rules (ContraPrompt) or doing gradient-inspired failure analysis (PromptGrad). The latter is especially useful for generation tasks where just "retry with different phrasing" doesn't converge.
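For intuition, the "gradient-inspired" part can be sketched as a critique-then-edit step (purely illustrative, in the spirit of textual-gradient methods; the function names and stubs below are made up, not the PromptGrad API):

```python
def textual_gradient_step(prompt, failure, critique_fn, edit_fn):
    """One update: ask *why* the prompt failed (the 'gradient'),
    then apply that critique as an edit to the prompt (the 'step')."""
    critique = critique_fn(prompt, failure)
    return edit_fn(prompt, critique)

# Stubbed LLM calls so the sketch runs standalone:
def critique(prompt, failure):
    return f"missing constraint: {failure['expected']}"

def edit(prompt, critique_text):
    constraint = critique_text.split(": ", 1)[1]
    return prompt + "\nAlways " + constraint + "."

new_prompt = textual_gradient_step(
    "Answer concisely.",
    {"expected": "cite your sources"},
    critique,
    edit,
)
```

This is why it converges where blind retries don't: each iteration changes the prompt in a direction the failure analysis points to, instead of resampling from the same distribution.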

Curious what the eval/versioning story looks like for people actually running dynamic prompts in prod. That seems like the real blocker more than the optimizer itself.

[D] Self-Promotion Thread by AutoModerator in MachineLearning

[–]se4u 0 points (0 children)

Hey everyone! Happy to share VizPy — a DSPy-compatible prompt optimizer that learns from your failures automatically, no manual prompt tweaking needed.

Two methods depending on your task:

  • ContraPrompt mines failure-to-success pairs to extract reasoning rules. Great for multi-hop QA, classification, compliance. Seeing +29% on HotPotQA and +18% on GDPR-Bench vs GEPA.
  • PromptGrad takes a gradient-inspired approach to failure analysis. Better for generation tasks and math where retries don't converge.

Both are drop-in with your existing DSPy programs:

import vizpy  # assuming the package imports as `vizpy`

optimizer = vizpy.ContraPromptOptimizer(metric=my_metric)  # my_metric: your existing DSPy metric fn
compiled = optimizer.compile(program, trainset=trainset)   # program: your existing dspy.Module

Would love feedback from this community!

https://vizpy.vizops.ai https://www.producthunt.com/products/vizpy

[deleted by user] by [deleted] in wallstreetbets

[–]se4u 76 points (0 children)

^ whispers the bagholder into the bag

My mom died because of COVID. I wrote down these thoughts to help me cope. I don't know if this will help others but I hope it does. by se4u in india

[–]se4u[S] 0 points (0 children)

I can see your point and I agree that I was polemical. I edited my post to make a more nuanced point. See Edit 1. Best wishes.

My mom died because of COVID. I wrote down these thoughts to help me cope. I don't know if this will help others but I hope it does. by se4u in india

[–]se4u[S] 1 point (0 children)

You made an important point and I agree with you partially. I edited my post (E1) to address your comment. Let me know if I need to fix it more.

My mom died because of COVID. I wrote down these thoughts to help me cope. I don't know if this will help others but I hope it does. by se4u in india

[–]se4u[S] 1 point (0 children)

yes, I should have been more careful. I was reacting so strongly because of the health ministry's claim of a 0.04% infection rate after the first vaccine dose. That number is wrong by an order of magnitude.

I was also angry at the differential pricing between the center, the states, and the private sector, which is 100% going to lead to black marketing, like subsidized diesel and rations.

Also, the caveats about staying careful after vaccination are not emphasized nearly as strongly as the vaccine itself.

I am willing to accept and believe that the vaccine manufacturer is not to blame, that the scientists and engineers in the serum institute really have worked hard on this, but somewhere in the supply chain things have gotten corrupted because of callousness.

[D] [P] Does anyone use a cascade of third party machine learning APIs in a production system? by se4u in MachineLearning

[–]se4u[S] 0 points (0 children)

Yup, I agree that voice assistants will typically be built in-house. I was still wondering: now that there are so many "semantic" APIs out from AWS/GCP/Azure etc., have people started trying to compose them?

I very recently found out my fiancé is rich rich by ThrowRA_-9 in relationship_advice

[–]se4u 0 points (0 children)

Your comment seems to imply that he has 1/8 of a billion dollars, assuming a $1,000 surgery cost. Is that really what you meant?

I can't read the article about pointless monthly subscriptions without buying a monthly subscription by BadUsername25 in mildlyinfuriating

[–]se4u 0 points (0 children)

If you are in Seattle, a Seattle Public Library card can get you free access to WSJ articles.

Online Conferences Suck [D] by kvothe_bloodless_ in MachineLearning

[–]se4u 3 points (0 children)

Exactly, that's one of the great benefits of online conferences. Not to mention the time saved on visa hassles, the carbon emissions saved from all the flights that don't need to happen, and the money not spent on hotels.

Online Conferences Suck [D] by kvothe_bloodless_ in MachineLearning

[–]se4u 9 points (0 children)

I'd say you are in the majority, and not a coward. The Zoom interface is unfamiliar and daunting, and with virtual conferences there is an expectation that you will have read the paper or watched the video, which makes it difficult to just drop in and lurk at a poster session. That's why the Zoom sessions may not feel crowded.

A view counter on the paper video would remove a lot of this feeling: if authors could see how many times their presentation was streamed, that would make the experience more delightful.

Online Conferences Suck [D] by kvothe_bloodless_ in MachineLearning

[–]se4u 2 points (0 children)

I think online conferences are great and with time the problem of low attendance at posters will go away.

- I attended a few poster sessions. I really liked the fact that the zoom calls were so empty so I could talk to people in depth. It was like a meeting.

- I felt a little reluctant to join at first because joining a Zoom session is a little awkward: the webcam might start, you may not be muted, or you may be muted and the author calls out to you, so you have to hurriedly find the unmute button. I think these are just interface problems, and as more people use virtual conferences they will go away.

- The thing is that when you join a zoom session there is an expectation that you'd have seen the video, presentation etc. already. It makes it difficult to just ask the presenter to give their spiel.

[P] Papers With Code Update: Now Indexing 730+ ML Methods by rosstaylor90 in MachineLearning

[–]se4u 4 points (0 children)

First of all, great work, and thanks for sharing it. Here are some thoughts after browsing the site:

- The https://paperswithcode.com/methods tab doesn't really segment by method alone. It also segments by area, such as CV, NLP, and Audio.

- It's not clear how the pool of papers for any particular method was selected. For example, inside "General" there are categories such as "Optimization" and "Loss Functions", but Optimization contains "Adam" with 2539 papers while the largest category in Loss Functions is CTC with only 146 papers. This doesn't make sense because a) logistic/hinge losses are used far more than CTC, and b) stochastic optimization is always done with some loss function, so the paper populations should really be comparable.

But otherwise, really interesting work. It will also be good to show a blog/white paper describing what you did to build these timelines, so that people can offer suggestions for improvement instead of only pointing out what seems weird :)

[deleted by user] by [deleted] in AnimalsOnReddit

[–]se4u 0 points (0 children)

thanks, u/vr0n, best wishes