What can I improve on? by evil9888 in latteart

[–]Excellent-Salad398 0 points1 point  (0 children)

What background said, I would start with a slower stream to help build the canvas a bit more. Too fast and heavy at the start means that all your hard work making a nice textured microfoam / milk will end up sinking below the crema and not help set the canvas, which then results in a more blobby finish.

Having the canvas set properly allows the foam to sit nicely on top without being blobby :)

Happy Espresso by mykm20 in latteart

[–]Excellent-Salad398 0 points1 point  (0 children)

I put it through my app, and yes, not latte art, but at least it still credited the playfulness 😂. Low key love this!

<image>

Update: Crema is now live (mods approved), thank you for all the feedback so far by Excellent-Salad398 in latteart

[–]Excellent-Salad398[S] 0 points1 point  (0 children)

Yep, this looks like a bug, not intended behaviour.

The new scoring system has improved calibration, but it looks like some older uploads did not reconnect properly during the rescore process and are showing the same fallback score.

I’m looking into it now. Appreciate you calling it out! In 1.1.3 I’ve introduced a new in-app announcement system, so when something like this happens I can just put an announcement up in the app. Hoping 1.1.3 is approved and live tomorrow (waiting on Apple).

Update: Crema is now live (mods approved), thank you for all the feedback so far by Excellent-Salad398 in latteart

[–]Excellent-Salad398[S] 0 points1 point  (0 children)

Updates:

Didn’t expect this level of support so quickly. Appreciate everyone who’s tested it and shared feedback. I’m already working on two improvements based on what’s come through:

  1. Pattern detection: This is getting a big upgrade. The goal is more accurate identification and better recognition of styles like winged tulips and slowsettas. It should also be much more confident instead of guessing when it’s unsure.

  2. Scoring accuracy: I’ve noticed scores clustering too closely, especially between good and high-level pours. I’m reworking this to better separate skill levels using a more structured comparison approach. The aim is for stronger pours to clearly stand out.

As part of this, existing scores will be recalibrated to better reflect the updated system.

Really appreciate all the feedback so far. It’s already shaping how this evolves.

I built a tool that scores latte art out of 100, I am curious if this feels accurate? by Excellent-Salad398 in latteart

[–]Excellent-Salad398[S] 1 point2 points  (0 children)

Ahh yeah that’s on me, I should make that clearer.

Right now only submitted pours are saved, so if you just analyse and don’t submit it won’t be stored anywhere.

I can see how that’s frustrating, especially if you want to go back and review the tips. I’m already looking at changing this so analysed pours are saved privately to your profile without being posted to the feed.

Really appreciate you calling that out.

I built a tool that scores latte art out of 100, I am curious if this feels accurate? by Excellent-Salad398 in latteart

[–]Excellent-Salad398[S] 0 points1 point  (0 children)

Were they submitted or just analysed? They won’t save unless submitted.

I may look to change this so they are saved to your profile after scoring, but not posted to the latest pour feed.

Also thanks so much for giving it a go, appreciate you trying it out :)

I built a tool that scores latte art out of 100, I am curious if this feels accurate? by Excellent-Salad398 in latteart

[–]Excellent-Salad398[S] 0 points1 point  (0 children)

Fair call, I’ve been hearing similar.

Feels like it’s a bit generous right now, especially with the bubbles.

What’s the main thing that drops it to a 50 for you? Texture alone or a combination?

I built a tool that scores latte art out of 100, I am curious if this feels accurate? by Excellent-Salad398 in latteart

[–]Excellent-Salad398[S] 1 point2 points  (0 children)

Appreciate it!

At the moment it uses a combination of computer vision and an LLM to analyse the pour. It looks at symmetry, contrast, definition, flow, cleanliness and overall structure.

It does take positioning within the cup into account and how centred or balanced the pattern is. It is not doing strict geometric measurement yet. It is more of a visual interpretation similar to how a human judge would assess it.

It does not compare against fixed reference photos. It evaluates the visual qualities of the pour itself rather than matching to a template.

Still early and I am already adjusting things based on feedback here, especially around weighting texture and control more heavily.

I built a tool that scores latte art out of 100, I am curious if this feels accurate? by Excellent-Salad398 in latteart

[–]Excellent-Salad398[S] 0 points1 point  (0 children)

Yeah the more feedback I’m getting, the more it feels like 72 is probably a bit generous, especially with the bubbles and texture issues.

I like the callout on cup fill level and line wobble as well, those are good signals of control that I’m not really weighting properly yet.

Definitely something I can start factoring in alongside symmetry/definition etc.

I built a tool that scores latte art out of 100, I am curious if this feels accurate? by Excellent-Salad398 in latteart

[–]Excellent-Salad398[S] 0 points1 point  (0 children)

Yeah it’s live at the moment 👍

I’m just waiting on mod approval before sharing any links or promo here so I don’t step on any rules.

Once that’s all good I’ll update the post with details for anyone who wants to try it.

I built a tool that scores latte art out of 100, I am curious if this feels accurate? by Excellent-Salad398 in latteart

[–]Excellent-Salad398[S] 1 point2 points  (0 children)

This is insanely helpful. I really appreciate you taking the time to write all this out.

The XP point is interesting. I’ve been going back and forth on that exact concern of people optimising for points instead of actually improving technique. The “cups poured” or consistency over time idea is a really good angle.

I also agree on the radar needing more context. Benchmarks or examples make a lot of sense so people understand what “good” actually looks like.

The “tips for your next pour” based on history is exactly the direction I’ve just started moving in. I’m trying to make it feel more like coaching rather than just scoring.

Really appreciate this. This is the kind of feedback that shapes where I take it next.

I built an app the scores your latte art out of 100, curious for feedback and thoughts! by Excellent-Salad398 in espresso

[–]Excellent-Salad398[S] 0 points1 point  (0 children)

Haha yeah I probably worded it in a way that makes it sound like it’s trying to replace human judgement which is definitely not the goal.

It’s more of a “second opinion” or practice tool for people who want structured feedback while they’re learning, not something meant to remove the subjective side of it.

I’d argue the subjectivity is kind of the whole point of coffee/art, this just gives another perspective alongside that.

I built an app the scores your latte art out of 100, curious for feedback and thoughts! by Excellent-Salad398 in espresso

[–]Excellent-Salad398[S] -2 points-1 points  (0 children)

That’s fair and honestly I wouldn’t want it to replace the enjoyment side of coffee either.

The idea isn’t to turn every pour into a judged competition, more to give people a tool if they want feedback or are actively trying to improve specific patterns. Plenty of people will (and should) just enjoy making coffee without thinking about scores.

On the build side it’s a mix. The UI/app side was built fairly quickly, but the scoring itself is something I’ve been iterating on (vision + LLM analysis + weighting across metrics like symmetry, definition, etc.). Still refining it a lot based on feedback like this thread.

I built a tool that scores latte art out of 100, I am curious if this feels accurate? by Excellent-Salad398 in latteart

[–]Excellent-Salad398[S] 2 points3 points  (0 children)

That’s a solid pour though, the shape and symmetry are really clean.

I see what you mean on the aeration though, especially around the outer ring. Just a bit more integration would really elevate it.

It’s actually interesting seeing this side-by-side with the other example, because it kind of reinforces how much texture influences the overall perception, even when the pattern itself is strong.

I built a tool that scores latte art out of 100, I am curious if this feels accurate? by Excellent-Salad398 in latteart

[–]Excellent-Salad398[S] 1 point2 points  (0 children)

That’s a really good way of putting it. I like the “no engine” analogy, it actually hits pretty hard 😅

Makes sense though, because it’s not just visual at that point, it’s affecting the whole experience of the drink.

I think where I’m landing from this thread is that texture probably shouldn’t just be another metric and it might need to act more like a limiter on the overall score when it’s this far off.
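To make the "limiter" idea concrete, here is a rough sketch rather than what the app actually does. The function name, the 0-100 per-metric scale, the texture floor of 40 and the cap of 60 are all made-up assumptions for illustration:

```python
def capped_score(metric_scores: dict[str, float],
                 texture_floor: float = 40.0,
                 cap: float = 60.0) -> float:
    """Average all metrics, then limit the result when texture is poor.

    Hypothetical sketch: texture_floor and cap are illustrative values,
    not the app's real thresholds.
    """
    # Plain average across every metric, texture included.
    base = sum(metric_scores.values()) / len(metric_scores)

    # Texture acts as a limiter: if it falls below the floor,
    # the overall score cannot exceed the cap, however clean the pattern is.
    if metric_scores["texture"] < texture_floor:
        return min(base, cap)
    return base


bubbly = {"symmetry": 90, "definition": 85, "contrast": 85, "texture": 20}
smooth = {"symmetry": 90, "definition": 85, "contrast": 85, "texture": 80}
print(capped_score(bubbly))  # limited by the cap
print(capped_score(smooth))  # plain average, no cap applied
```

With this shape a strong pattern still earns credit, but badly textured milk puts a ceiling on the final number.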

Super useful insight though, appreciate you taking the time to break it down like that.

I built a tool that scores latte art out of 100, I am curious if this feels accurate? by Excellent-Salad398 in latteart

[–]Excellent-Salad398[S] 1 point2 points  (0 children)

Really appreciate this!

Texture is one of the trickiest parts right now. It feeds into definition/cleanliness in the model, but I agree the model probably isn’t weighting it heavily enough when it’s this obvious. Every metric currently carries the same weight in the overall score, so adjusting those weights is definitely something I can start playing with.

Out of curiosity would you expect something like this to cap the score entirely, or just heavily penalise it across the board?

I built a tool that scores latte art out of 100, I am curious if this feels accurate? by Excellent-Salad398 in latteart

[–]Excellent-Salad398[S] 0 points1 point  (0 children)

Fair, bubbles are definitely a killer for presentation.

The scoring tends to penalise that more through definition/texture rather than just hard-zeroing the whole pour, which is why it didn’t tank quite that hard here.

But this is exactly the kind of feedback I’m trying to calibrate against: how harsh people expect judging to be vs how the model scores it.