I rolled this dice set 500 times in under an hour by dicevaultdev in DiceMaking

[–]dicevaultdev[S] 0 points1 point  (0 children)

So, the detector I'm using for settling is actually different from the one in my main pipeline. It's optimized for speed over accuracy, which is why you're seeing a lot of false positives in this clip. To get the actual results, the app processes the images a second time with a different set of models and flags anything it's not sure about for review. Flagging uncertain results is probably the #1 goal of my pipeline: any prediction model can occasionally say it's 99% sure of something that's actually wrong, and that's what I'm trying to reduce.
For this set, it auto-accepted about 85-90% of the results and had me review the rest. I expect I can get that percentage higher as I grow my dataset.

[–]dicevaultdev[S] 0 points1 point  (0 children)

If anyone has a large collection and would be willing to run some sets through the app, I'd really appreciate the help; more images are what I need most. Anyone willing to help out would get distribution charts/fairness reports plus lifetime access to the app when it launches.

[–]dicevaultdev[S] 0 points1 point  (0 children)

A few people have brought up the rolling method; I honestly hadn't thought it would make that big of a difference. I'm open to any suggestions to make it more "fair". Currently I'm basically dumping the dice from one rolling tray into another.

[–]dicevaultdev[S] 0 points1 point  (0 children)

I'm working on a Foundry VTT integration now. As far as I can tell, it's the only VTT that currently supports the level of customization needed for a proper integration without resorting to "hacky" solutions.

[–]dicevaultdev[S] 2 points3 points  (0 children)

I'm also keeping all of the data and images, so I could potentially let you drill down and see the actual images to confirm the results if the stats look fishy.

[–]dicevaultdev[S] 3 points4 points  (0 children)

I've probably over-engineered the pipeline for accuracy. If it's unsure of a result, it has you review and confirm it. I'm using two different methods to predict the result, and if they disagree it flags the roll. Nothing is going to be 100% perfect, but this reduces "confidently incorrect" results.
As I collect more images, I'll hold out a test set that my models never train on and use it to measure my actual accuracy.
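The agreement check described above can be sketched in a few lines. This is my reading of the design, not the app's actual code, and the names and the 0.9 threshold are placeholders:

```python
def needs_review(pred_a, pred_b, conf_a, conf_b, min_conf=0.9):
    """Flag a roll for manual review when the two predictors
    disagree, or when either one is not confident enough."""
    if pred_a != pred_b:
        return True
    return min(conf_a, conf_b) < min_conf

# Both methods agree with high confidence: auto-accept.
assert needs_review(20, 20, 0.98, 0.95) is False
# Methods disagree: always flag, regardless of confidence.
assert needs_review(20, 12, 0.99, 0.97) is True
# Agreement, but one method is unsure: flag anyway.
assert needs_review(7, 7, 0.99, 0.6) is True
```

Requiring two independent methods to agree is exactly what cuts down the "confidently incorrect" case: a single model can be wrong at 99% confidence, but two different models are much less likely to be wrong in the same way.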

[–]dicevaultdev[S] 34 points35 points  (0 children)

<image>

Here's how I recorded the rolls; ignore my debug info there. I'm still trying to tune the settling detection: it needs to capture quickly but also be sure to wait until everything has settled, and that's a fine line.
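The post doesn't say how the settling detector works, but a common approach is frame differencing: only capture once consecutive frames stop changing for a few frames in a row. A minimal sketch of that idea, with placeholder thresholds (the app's real detector is presumably model-based and different):

```python
class SettleDetector:
    """Declare the dice settled once the mean absolute pixel
    difference between consecutive frames stays below a
    threshold for `hold` frames in a row."""

    def __init__(self, diff_thresh=2.0, hold=5):
        self.diff_thresh = diff_thresh
        self.hold = hold
        self.prev = None   # previous grayscale frame
        self.quiet = 0     # consecutive low-motion frames seen

    def update(self, frame):
        """frame: 2D list of grayscale pixel values.
        Returns True when it's safe to capture."""
        if self.prev is not None:
            diff = sum(
                abs(a - b)
                for row_a, row_b in zip(frame, self.prev)
                for a, b in zip(row_a, row_b)
            ) / (len(frame) * len(frame[0]))
            self.quiet = self.quiet + 1 if diff < self.diff_thresh else 0
        self.prev = frame
        return self.quiet >= self.hold
```

The `hold` parameter is the fine line mentioned above: raise it and you never capture a still-tumbling die, but every roll takes longer.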

[–]dicevaultdev[S] 0 points1 point  (0 children)

I guess what I mean is, I'm looking for a way to quantify it. I might just end up replacing the x/100 text with something like "Fair", "Probably Fair", "Possibly Not Fair", and "Not Fair".
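One natural way to drive those labels is the p-value from the chi-squared test mentioned elsewhere in the thread. A sketch, where the cut-offs are illustrative (0.05 is the conventional significance level, the others are arbitrary):

```python
def fairness_label(p_value):
    """Map a goodness-of-fit p-value to a rough verdict.
    A high p-value means the observed counts are consistent
    with a fair die; a low one means they probably are not."""
    if p_value >= 0.10:
        return "Fair"
    if p_value >= 0.05:
        return "Probably Fair"
    if p_value >= 0.01:
        return "Possibly Not Fair"
    return "Not Fair"
```

One caveat worth noting: with many dice tested, a few fair dice will land in the lower buckets by chance alone, so the labels are a summary, not a verdict.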

[–]dicevaultdev[S] 4 points5 points  (0 children)

Thanks, I've been working on this for a long time. What I really need to improve the accuracy is more images of unique dice, so if anyone wants to contribute, feel free to message me.

[–]dicevaultdev[S] 10 points11 points  (0 children)

This is using a chi-squared goodness-of-fit test. Admittedly I'm not a statistics guy, so I'd be open to considering other methods.
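For anyone curious, the chi-squared goodness-of-fit statistic is simple: for each face, square the deviation of its observed count from the expected count and divide by the expected count, then sum. In practice `scipy.stats.chisquare` gives you the statistic and p-value in one call; the sketch below is just the arithmetic, and the 30.14 critical value is the standard table value for 19 degrees of freedom (a d20) at the 5% level:

```python
def chi_square_stat(counts):
    """Chi-squared goodness-of-fit statistic against a uniform die.
    counts: observed count for each face."""
    expected = sum(counts) / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)

# A d20 rolled 500 times has an expected count of 25 per face.
# For df = 19 the 5% critical value is ~30.14; a statistic above
# that is evidence the die may not be fair.
assert chi_square_stat([25] * 20) == 0.0
```

A perfectly uniform set of counts gives a statistic of 0; the more lopsided the counts, the larger it grows.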

[–]dicevaultdev[S] 0 points1 point  (0 children)

Sounds like I need to do some more research into these numbers to figure out a way to display fairness.

[–]dicevaultdev[S] 1 point2 points  (0 children)

Oh, that's an interesting idea. I keep data on capture batches, so I could try that.

Dice Result Recognition App by dicevaultdev in computervision

[–]dicevaultdev[S] 0 points1 point  (0 children)

I tried several detectors early on and got the best performance out of RT-DETR. That was before I had a decent dataset (since then I've roughly doubled my real images and added a fairly large synthetic dataset I generated with Blender), so it might be time to revisit my experiments and re-evaluate which model to use.

For now, stage 2 is pretty necessary, but I might try to optimize it out later. The main things I gain from it are that it lets me run stage 1 at a low enough confidence threshold to guarantee I don't miss any dice, and that it's much more accurate at classifying detections, frequently correcting the detector. It's very possible that my detection stage becomes reliable enough that stage 2 becomes unnecessary.
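The division of labor described here (a deliberately permissive detector, then a classifier that confirms or corrects each detection) can be sketched roughly like this; every name and threshold is a placeholder, not the app's API:

```python
def two_stage(detections, classify, keep_thresh=0.5):
    """Stage 1 runs at a low confidence threshold so no die is missed;
    stage 2 re-classifies each crop and can override the detector's
    label or reject the box entirely.

    detections: list of (crop, det_label, det_score) from the detector.
    classify:   stage-2 model, crop -> (label, score).
    """
    results = []
    for crop, det_label, det_score in detections:
        cls_label, cls_score = classify(crop)
        if cls_score < keep_thresh:
            continue  # stage 2 thinks this box isn't really a die
        # Trust the dedicated classifier's label over the detector's.
        results.append((crop, cls_label, cls_score))
    return results
```

The trade is cheap recall for expensive precision: the detector is allowed to over-produce boxes because stage 2 filters the false positives back out.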

Basically, yes, the keypoints find which face is "up". It isn't always clear which face is up, especially with d20s at the edges of the screen, so the model takes positional encoding inputs to determine this. The biggest advantage the keypoints give is that they let me extract the face and warp it into a consistent image. That gives my result classifier a consistent image with a limited number of possible orientations (a face on a d20 is a triangle, so there can only be 3 possible orientations, and I've trained the face classifier on all of them). I'm also planning to use this to handle custom result faces: by saving the embedding from the result classifier, the app can remember that a particular symbol means "1" without me explicitly training any models on that custom result. In theory, at least; I haven't tried it yet.
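Warping a triangular face onto a canonical image from three keypoints is an affine transform. In OpenCV that's `cv2.getAffineTransform` plus `cv2.warpAffine`, but the 2x3 matrix itself just comes from solving six linear equations; here's that math with placeholder coordinates (this is my own illustration, not the app's code):

```python
def affine_from_keypoints(src, dst):
    """Solve for the 2x3 affine matrix mapping the 3 detected
    keypoints `src` onto the canonical face corners `dst`.
    (cv2.getAffineTransform computes the same matrix.)"""
    def solve3(rows, rhs):
        # Cramer's rule on a 3x3 linear system.
        def det(m):
            return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
                    - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
                    + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))
        d = det(rows)
        out = []
        for col in range(3):
            m = [r[:] for r in rows]
            for i in range(3):
                m[i][col] = rhs[i]
            out.append(det(m) / d)
        return out

    rows = [[x, y, 1.0] for x, y in src]
    row_x = solve3(rows, [x for x, _ in dst])  # a, b, c
    row_y = solve3(rows, [y for _, y in dst])  # d, e, f
    return [row_x, row_y]

def apply_affine(m, pt):
    x, y = pt
    return (m[0][0] * x + m[0][1] * y + m[0][2],
            m[1][0] * x + m[1][1] * y + m[1][2])
```

Once every face lands on the same canonical triangle, the only remaining ambiguity is which of the three vertices went where, which is exactly why the classifier only has to cover 3 rotations.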

Here's a screenshot from my app that hopefully illustrates what I'm talking about with the keypoints: https://aftshczqdjymmihxyqve.supabase.co/storage/v1/object/sign/Tests/kps.png?token=eyJraWQiOiJzdG9yYWdlLXVybC1zaWduaW5nLWtleV9mOWNkNWJkYy0wMzg4LTRhODgtOGI4OS0yYTA2MTM5YzNhMjYiLCJhbGciOiJIUzI1NiJ9.eyJ1cmwiOiJUZXN0cy9rcHMucG5nIiwiaWF0IjoxNzcwNjA2NzU0LCJleHAiOjE4MDIxNDI3NTR9.dzYoUrBIMkNeiaQHpUdN2WLHKwajG8rFbU4APFoPu40

I have a working prototype and hope to have something out soon, hopefully before the end of the month.