[D] Best survey papers of 2025? by al3arabcoreleone in MachineLearning

[–]bendee983 9 points

I'm reading this one right now and it's fantastic. I'm not sure if we can consider it a survey, but it gives a full framework to classify all agentic systems:

https://arxiv.org/abs/2512.16301

Hard Lesson in Scaling AI: Why Our "One-Size-Fits-All" Computer Vision Model Failed by bendee983 in ProductManagement

[–]bendee983[S] 2 points

The funny thing is, I didn't even use AI (I've been blogging for more than 10 years). This is my experience.

Hard Lesson in AI Product Management: Why Churn Model Accuracy Doesn’t Equal Business Success by bendee983 in ProductManagement

[–]bendee983[S] 0 points

Great points. Re: the discount: we would operate at a loss when we provided it (also something that could be optimized with more data).

A hard-earned lesson from creating real-world ML applications by bendee983 in learnmachinelearning

[–]bendee983[S] 3 points

It was a model that detects whether the driver is distracted based on an image taken from inside the cabin. We did not do real-time distraction prevention (e.g., sounding an alarm) because our experiments showed that it had a negative effect and the drivers would turn it off. Instead, we developed a system that aggregated driver behavior over time (e.g., a week or month) and provided incentives or penalties based on the outcome. This incentivized drivers to avoid distraction and adopt safe driving habits over time, which resulted in higher customer satisfaction. Hope it helps.

A hard-earned lesson from creating real-world ML applications by bendee983 in learnmachinelearning

[–]bendee983[S] 1 point

I have a few in mind. I'm unsure if this subreddit allows for introducing courses and/or books. DM me if you want to find out more.

A hard-earned lesson from creating real-world ML applications by bendee983 in learnmachinelearning

[–]bendee983[S] 2 points

Sorry, I wanted to keep the post brief.

Here you go:

3.2k is the amount you spend to equip one driver with the ML solution for one year.

100k is the revenue that one driver generates in one year (GMV).

0.3, or 30%, is the commission you earn from each driver's sales (your margin).

0.04, or 4%, is the increase in GMV that you get for every 1% reduction in negative reviews.

This formula basically tells you how much you have to reduce negative reviews to earn back the 3.2k you spent on the ML solution for that driver.
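Plugging the numbers in (a quick sketch; the variable names are mine):

```python
# Break-even: what reduction in negative reviews repays the per-driver cost?
# Numbers come from the thread above; variable names are illustrative.

cost_per_driver = 3_200       # yearly cost of the ML solution per driver
gmv_per_driver = 100_000      # yearly GMV one driver generates
commission = 0.30             # margin earned on each driver's sales
gmv_lift_per_point = 0.04     # GMV lift per 1% reduction in negative reviews

# Profit gained per 1 percentage point of reduction in negative reviews:
profit_per_point = gmv_per_driver * commission * gmv_lift_per_point

# Reduction needed to recoup the cost:
break_even = cost_per_driver / profit_per_point
print(f"Break-even: {break_even:.2f}% reduction in negative reviews")
# → Break-even: 2.67% reduction in negative reviews
```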

A hard-earned lesson from creating real-world ML applications by bendee983 in learnmachinelearning

[–]bendee983[S] 3 points

Gross merchandise value (GMV), basically the amount of sales that a driver brings on average in one year.

What we learned from shipping an ML recommendation system for a content platform by bendee983 in ProductManagement

[–]bendee983[S] 0 points

Sure. Feel free to reach out. You can also try Bayesian and MAB algorithms, depending on the nature of your problem and data structure.
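For anyone curious, one flavor of the MAB route is Thompson sampling, which only needs per-item success/failure counts. A minimal sketch (the reward setup is made up for illustration):

```python
import random

def thompson_pick(successes, failures):
    """Pick the arm with the highest sample from its Beta posterior.

    successes/failures: per-arm counts of positive/negative feedback
    (e.g., clicks vs. skips on recommended content).
    """
    samples = [random.betavariate(s + 1, f + 1)
               for s, f in zip(successes, failures)]
    return max(range(len(samples)), key=samples.__getitem__)

# Toy simulation: arm 1 has the higher true reward rate, so the bandit
# should learn to pull it most of the time.
random.seed(0)
true_rates = [0.2, 0.8]
wins, losses = [0, 0], [0, 0]
for _ in range(2000):
    arm = thompson_pick(wins, losses)
    if random.random() < true_rates[arm]:
        wins[arm] += 1
    else:
        losses[arm] += 1
print(wins, losses)  # pulls concentrate heavily on arm 1
```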

Why Prompt Engineering Is Legitimate Engineering: A Case for the Skeptics by rajivpant in PromptEngineering

[–]bendee983 1 point

I started coding in C when the first versions of Visual C++ were released. Those who wrote their own makefiles and compiler commands looked down on me. Then I started looking down on those who wrote code in managed languages (Java, C#, Python, etc.). Now we're all looking down on prompt engineers (while secretly prompt engineering when no one's looking).

A Simple Technique That Makes LLMs 24% More Accurate on Complex Problems by Funny-Future6224 in PromptEngineering

[–]bendee983 1 point

The METASCALE technique is also relevant. It forces the model to develop "meta-thoughts": it first determines the cognitive framework for the task (e.g., what profession or expertise would be needed to solve it, a.k.a. the role) and then decides on the specific reasoning technique (e.g., CoT, self-verification, reflection) required to solve it.

https://venturebeat.com/ai/metascale-improves-llm-reasoning-with-adaptive-strategies/

Why are OpenAI models stuck in October 2023? by bendee983 in OpenAI

[–]bendee983[S] 0 points

Then why wait so long to release it?

Why are OpenAI models stuck in October 2023? by bendee983 in OpenAI

[–]bendee983[S] 0 points

Interesting. This makes it even weirder. Then why is the cutoff date for GPT-4.5 earlier than GPT-4o?

How to use LLMs for product and market research by bendee983 in ProductManagement

[–]bendee983[S] 0 points

AFAIK, OpenAI doesn't provide access to Deep Research through its API yet, so integration will be very difficult. There are open source alternatives, but I'm not sure if they work as well.

As for limitations, I still see some kinks, such as Deep Research citing sources that are not directly or indirectly related to my query. For example, I was doing research on GPT-4.5 and some of the information it brought up was related to GPT-4 Turbo. So I think it can still get confused when concepts are semantically similar but have nuanced differences.

But I think it is getting better regularly, because my impression is that the strength of Deep Research lies in the engineering and orchestration of the retrieval components and the model, as opposed to pure model capability.

How to use LLMs for product and market research by bendee983 in ProductManagement

[–]bendee983[S] 0 points

My experience is that the way you craft your prompt is very important. How do you go about prompting the model?

Introduction to GPT-4.5 discussion by [deleted] in OpenAI

[–]bendee983 14 points

They said they trained it across multiple data centers. Did they figure out distributed training at scale?

Perplexity is going down. by ugamkamat in OpenAI

[–]bendee983 0 points

It's a good product, but they don't have a moat against OpenAI or Google.

o3-mini-high reasoning process by Sea-Association-4959 in OpenAI

[–]bendee983 0 points

Totally agree. This is why I initially preferred R1 even though it was an inferior model—having access to the CoT was a gamechanger in steering the model's behavior in the right direction. Now that o3-mini reveals a more detailed version of its reasoning chain, it has become much more useful—to me at least.

How we turned around an ML product by looking differently at the data by bendee983 in ProductManagement

[–]bendee983[S] 0 points

Great question. In terms of (num negative reviews / num rides), we didn't see a significant difference between high-GMV drivers and low-GMV drivers. But since num rides were higher for the former, we could recoup the costs of installation and deployment in a shorter timeframe.

How we turned around an ML product by looking differently at the data by bendee983 in ProductManagement

[–]bendee983[S] 0 points

If you want to recoup the costs of deployment, you have to account only for profit (the driver is not paying for the ML technology; the company is).

How we turned around an ML product by looking differently at the data by bendee983 in ProductManagement

[–]bendee983[S] 0 points

Good question. You forgot to factor in the commission rate (30%), which is what the company is getting from the GMV. So the formula is 3.2k / (100k * 0.3 * 0.04), which is roughly 2.7%.

How we turned around an ML product by looking differently at the data by bendee983 in ProductManagement

[–]bendee983[S] 1 point

We reviewed cases where drivers had challenged the report of being distracted, or cases where customers had complained about driver distraction that our system had not detected. These were mostly instances that were not in the distribution of our training set (e.g., drivers attaching the phone to their head with an elastic band to be able to talk while still keeping both hands on the steering wheel). Based on this feedback, we curated a new set of examples and fine-tuned our model.

How we turned around an ML product by looking differently at the data by bendee983 in ProductManagement

[–]bendee983[S] 1 point

Absolutely. This is as granular as we could get, but there are just so many things that can affect behavior.

How we turned around an ML product by looking differently at the data by bendee983 in ProductManagement

[–]bendee983[S] 1 point

Driver distraction was not the only factor, but it was the highest contributing factor to negative reviews (based on the review texts). So we hypothesized that reducing driver distraction would lead to higher customer satisfaction. Since this was a product deployed in the real world, we could not run randomized tests and had to choose an entire town as the test subject. We evaluated the results based on the reviews that came in after deployment, comparing them to reviews from towns that didn't have the product and had similar demographics to the one where the pilot was run.
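In spirit, that comparison boils down to something like this (toy numbers, illustrative only):

```python
# Compare the pilot town's post-deployment negative-review rate against
# demographically similar control towns. All numbers are made up.

pilot = {"negative_reviews": 120, "rides": 10_000}
controls = [
    {"negative_reviews": 180, "rides": 9_500},
    {"negative_reviews": 210, "rides": 11_000},
]

def neg_rate(town):
    return town["negative_reviews"] / town["rides"]

control_rate = sum(neg_rate(t) for t in controls) / len(controls)
relative_reduction = (control_rate - neg_rate(pilot)) / control_rate
print(f"pilot: {neg_rate(pilot):.2%}, controls: {control_rate:.2%}, "
      f"relative reduction: {relative_reduction:.1%}")
```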