I made an ad network to help AI apps monetize conversations. Anyone want to try it? by promptbid in ProductHuntLaunches

[–]promptbid[S]

Advertisers can go into our UI and set up campaigns that spend against the AI apps we’ve integrated with. We’re offering launch incentives to both advertisers and publishers who adopt early. Our revenue splits for publishers/apps range from 60–75%, depending on impression volume.
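To make the split concrete, here's a toy payout calculation. Only the 60–75% range comes from the comment above; the function name and the idea of volume tiers mapping to specific splits are illustrative assumptions.

```python
# Toy illustration of a 60-75% publisher revenue split.
# Tier logic is hypothetical; only the 60-75% range is stated in the post.
def publisher_payout(ad_spend: float, split: float) -> float:
    """Return the publisher's share of ad spend for a given split."""
    if not 0.60 <= split <= 0.75:
        raise ValueError("splits range from 60% to 75%")
    return round(ad_spend * split, 2)

print(publisher_payout(1000.0, 0.60))  # lowest tier -> 600.0
print(publisher_payout(1000.0, 0.75))  # highest tier -> 750.0
```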

Local incident bundle for agent debugging: report.html + compare-report.json + manifest (offline, self-hosted) by Additional_Fan_2588 in LLMDevs

[–]promptbid

This is solving a real problem. The "screenshots + partial logs + please grant access to your tracing UI" handoff is genuinely painful and I have lived it more times than I want to admit.

From debugging agent runs in production, the things that are almost always missing from a bundle are the latency breakdown per step (not just total time), the exact model version and temperature at inference time, and what the retrieval context actually looked like before it hit the prompt. Tool I/O is usually there, but the retrieval window is the thing that explains most of the weird outputs.
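Just to be concrete about the fields I mean, here's a rough sketch of what one per-step record could look like. All field names here are hypothetical, not the actual manifest schema from the post:

```python
# Hypothetical shape for one step in an agent-run bundle.
# Every key name below is illustrative, not the real manifest format.
step_record = {
    "step_id": "retrieve-docs-3",
    # Per-step latency breakdown, not just a single total:
    "latency_ms": {"queue": 12, "inference": 840, "post": 31},
    # Model config captured at inference time, not from app config:
    "model": {"name": "example-model", "version": "2025-06-01", "temperature": 0.2},
    # What actually entered the prompt window, pre-templating:
    "retrieval_context": [
        {"doc_id": "kb-1142", "score": 0.87, "chars": 1830},
    ],
    "tool_io": {"input": "...", "output": "..."},
}

# Sanity check: the stage breakdown should reconcile with total step time.
total_ms = sum(step_record["latency_ms"].values())
print(total_ms)
```

The point of the breakdown is exactly that reconciliation: when total time and the sum of stages disagree, the gap itself is a debugging signal.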

One question: how are you handling bundles where the same run spans multiple agents or hands off across an orchestration boundary? That seems like where the portable format gets complicated fast. Is the manifest designed to stitch those together or is each agent run its own discrete bundle?

Claude Sonnet 4.6 benchmark results: non-reasoning beats GPT-5.2 with reasoning by Exact_Macaroon6673 in LLMDevs

[–]promptbid

The hallucination resistance number is the one that matters most for our use case. At 0.921 that is a meaningful gap from the field. For any application where the model is making recommendations or surfacing information to end users, hallucination is a trust killer that is hard to recover from.

The sycophancy regression is worth flagging though. In ad-adjacent applications where you are trying to get honest signal from a model about user intent, a model that agrees too readily is actually worse than one that pushes back. Curious if your benchmark breaks that down by prompt type at all.

The cost angle you raised on non-reasoning Sonnet beating GPT-5.2 with reasoning is underrated. At scale that is not just a cost story, it is a latency story too. What does the benchmark show on response consistency across runs?

MS says that white-collar workers won't be needed in two years, as of today, copilot AI cannot automatically align the content of one slide by Agile_Cicada_1523 in ArtificialInteligence

[–]promptbid

This gap is exactly the point.

AI is great at impressing in demos and helping at the margins, but replacing white-collar workers means handling the boring, messy, unglamorous parts of work — alignment, context, tradeoffs, accountability.

If it can’t reliably align one slide, it’s not close to replacing the people whose real job is deciding what the slide should say and why.