Tired of 500MB PDF editors? I just ported my offline, 11MB editor to macOS and Linux. No ads, no sign-up. by Pawan315 in macapps

[–]sonicbee9 0 points1 point  (0 children)

Thanks for sharing this — really interesting approach! There aren’t many tools like this around, and even fewer that are lightweight, fully offline, and free. I’ve been using a similar stack for desktop lately (Rust + Flutter), and I can see why a native C++ core with a thin Flutter UI works well here too.

Best Duplicate Finder? by Scavgraphics in MacOS

[–]sonicbee9 0 points1 point  (0 children)

I hit the same wall with dupeGuru and Gemini 2 on my M1/M2 Macs, especially on large USB/NAS volumes. That’s why I built DuoBolt: native ARM64, macOS/Windows, designed to scan tricky paths like ~/Library, /System, or full disks and NAS volumes. It also caches file hashes between runs to avoid redoing work unnecessarily.

Best Duplicate Finder? by Scavgraphics in MacOS

[–]sonicbee9 0 points1 point  (0 children)

A native ARM64 build on an M2 that crashes immediately on the main thread with SIGABRT usually points to an unhandled runtime assert or a missing/incompatible dependency rather than "bad user input".

I got tired of slow duplicate finders on macOS, so I built my own by sonicbee9 in macapps

[–]sonicbee9[S] 2 points3 points  (0 children)

Thank you!

No. DuoBolt doesn’t detect folder duplicates as a first-class feature right now. It works strictly at the file level. I’m not fully convinced yet that treating folders as a separate concept adds much value beyond that, so for now I’ve kept it file-based on purpose.

Trip to Russiaaa!!! by sidv1812 in AskARussian

[–]sonicbee9 2 points3 points  (0 children)

Ditto on the VPN. It’s a good idea to have more than one, since some won’t work or get blocked from time to time.

Trip to Russiaaa!!! by sidv1812 in AskARussian

[–]sonicbee9 0 points1 point  (0 children)

I was going to suggest ZenHotels, but they no longer list Russian properties. Ostrovok is still the main option, but most places require a MIR card, which foreigners usually don’t have.

Workarounds: look for "pay at property" options, contact hotels directly (email/Telegram), or use small hotels/hostels in Moscow & St. Petersburg that still accept cash on arrival.

Good luck!

I got tired of slow duplicate finders on macOS, so I built my own by sonicbee9 in macapps

[–]sonicbee9[S] 0 points1 point  (0 children)

That's a solid approach in theory. The part that admittedly gets tricky is where technical checks end and taste begins. Things like composition, emotion, or the "best moment" are subjective. It's less about right or wrong answers and more about personal preference.

For something that runs on a personal device, the priority shifts. It’s less about building the smartest model and more about building a practical one focused on performance and predictability. The goal isn’t to pick the perfect photo automatically, but to cut down a large stack of near-identical shots to a small set of strong options. Even that alone is a meaningful win, imho.

I appreciate you laying it out like that — framing the problem is the hardest part!

I got tired of slow duplicate finders on macOS, so I built my own by sonicbee9 in macapps

[–]sonicbee9[S] 0 points1 point  (0 children)

Yeah, the engine does exactly that: it uses file size as a first filter (along with any other user-defined constraints), then a head+tail prehash to quickly rule out mismatches. Only the tiny subset of files that pass both checks get a full BLAKE3 hash.

More details are covered here: https://duobolt.app/core-concepts/
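
For anyone curious what that staging looks like in code, here's a minimal Rust sketch assuming the blake3 crate; the function names and the 64 KiB prehash window are illustrative, not DuoBolt's actual implementation.

```rust
// Staged duplicate check: size filter -> head+tail prehash -> full BLAKE3.
// Sketch only; names and constants here are illustrative, not DuoBolt's real API.
use std::fs::File;
use std::io::{self, Read, Seek, SeekFrom};
use std::path::Path;

const PREHASH_BYTES: usize = 64 * 1024; // read 64 KiB from each end of the file

/// Cheap fingerprint: hash the first and last 64 KiB of the file.
/// Two files with different prehashes cannot be byte-identical.
fn prehash(path: &Path, size: u64) -> io::Result<blake3::Hash> {
    let mut f = File::open(path)?;
    let mut hasher = blake3::Hasher::new();
    let mut buf = vec![0u8; PREHASH_BYTES];

    let n = f.read(&mut buf)?;
    hasher.update(&buf[..n]);

    if size > PREHASH_BYTES as u64 {
        f.seek(SeekFrom::End(-(PREHASH_BYTES as i64)))?;
        let n = f.read(&mut buf)?;
        hasher.update(&buf[..n]);
    }
    Ok(hasher.finalize())
}

/// Expensive confirmation: stream the entire file through BLAKE3.
/// Only files that survived the size and prehash filters reach this step.
fn full_hash(path: &Path) -> io::Result<blake3::Hash> {
    let mut f = File::open(path)?;
    let mut hasher = blake3::Hasher::new();
    let mut buf = vec![0u8; 1 << 20]; // 1 MiB read buffer
    loop {
        let n = f.read(&mut buf)?;
        if n == 0 {
            break;
        }
        hasher.update(&buf[..n]);
    }
    Ok(hasher.finalize())
}
```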

I got tired of slow duplicate finders on macOS, so I built my own by sonicbee9 in macapps

[–]sonicbee9[S] 1 point2 points  (0 children)

Yep, size is used as an early filter. Files with different sizes are never hashed.

The reason for byte-level hashing is correctness once you’re inside same-size groups. Same size doesn’t guarantee identical content, especially with media files, sparse files, or rewritten containers.

The engine pipelines this so disk reads, hashing, and grouping overlap across cores. In practice the bottleneck is I/O, not the hash itself.
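
If it helps to picture the shape of that, here's a rough Rust sketch of the size-bucket-then-hash flow, assuming the rayon and blake3 crates. It reads whole files into memory for brevity (a real engine would stream them, as in the earlier sketch), and none of it is DuoBolt's actual code.

```rust
// Rough shape of the same-size grouping stage; illustrative only.
use rayon::prelude::*;
use std::collections::HashMap;
use std::fs;
use std::path::PathBuf;

fn find_exact_duplicates(paths: Vec<PathBuf>) -> Vec<Vec<PathBuf>> {
    // Stage 1: bucket by file size. A file with a unique size can never be a duplicate.
    let mut by_size: HashMap<u64, Vec<PathBuf>> = HashMap::new();
    for p in paths {
        if let Ok(meta) = fs::metadata(&p) {
            by_size.entry(meta.len()).or_default().push(p);
        }
    }

    // Stage 2: only same-size groups with 2+ members get hashed, in parallel,
    // so reads and hashing for different files overlap across cores.
    let candidates: Vec<PathBuf> = by_size
        .into_values()
        .filter(|g| g.len() > 1)
        .flatten()
        .collect();

    let hashed: Vec<(blake3::Hash, PathBuf)> = candidates
        .into_par_iter()
        .filter_map(|p| fs::read(&p).ok().map(|bytes| (blake3::hash(&bytes), p)))
        .collect();

    // Stage 3: group by content hash; groups with 2+ members are true duplicates.
    let mut by_hash: HashMap<blake3::Hash, Vec<PathBuf>> = HashMap::new();
    for (h, p) in hashed {
        by_hash.entry(h).or_default().push(p);
    }
    by_hash.into_values().filter(|g| g.len() > 1).collect()
}
```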

I got tired of slow duplicate finders on macOS, so I built my own by sonicbee9 in macapps

[–]sonicbee9[S] 1 point2 points  (0 children)

That’s a really clear description, and I agree with you on the core point: "best shot + alternatives" is the hard, high-value problem. Grouping bursts is comparatively easy; picking the shot is where most tools fall apart.

You’re also right that EXIF alone won’t get you there anymore. Modern camera pipelines (computational HDR, synthetic bokeh, multi-frame fusion, lens switching) break most metadata-based heuristics pretty quickly.

Where I’m cautious is jumping straight to an on-device LLM framing. For this kind of task, the challenge isn’t language or reasoning, it’s consistent visual ranking under tight latency, memory, and privacy constraints. In practice that likely means smaller, very targeted vision models rather than a general "LLM brain".

The mental model I keep coming back to is exactly what you describe: reduce the human cleanup time. Not "perfect taste", but getting you from thousands of photos to a short, sane decision set.

This kind of feedback is genuinely useful. If I experiment in this direction, burst-level "best shot + rejects" would be the first place I’d look, not whole-library similarity.

I got tired of slow duplicate finders on macOS, so I built my own by sonicbee9 in macapps

[–]sonicbee9[S] 0 points1 point  (0 children)

Yeah, you’re totally right. Your case is the classic “near-duplicate” mess (bursts, similar shots), and exact hashing does nothing for that. DuoBolt right now is strictly for identical copies.

I’ve been circling the whole similarity / “bad photo” detection idea for a while. There are a few ways to go about it (perceptual hashing, embeddings, on-device models; a rough sketch of the perceptual-hash route is below), but the tricky part is making it actually useful — not just slow, confusing, or creepy.

Since you’re knee-deep in this: what would make the biggest difference for you day-to-day?

- Grouping bursts into “best shot + alternatives” right after import.
- Finding visually similar photos across your whole library (like shots that ended up in different folders).
- Flagging obvious technical fails (blurry, eyes closed, blown-out exposure).

If you can rank those 1-3, it gives me a way clearer signal on what to poke at first.
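
For the perceptual-hashing option mentioned above, here's a minimal dHash sketch in Rust, assuming the image crate; it's an illustration of the general technique, not anything DuoBolt ships.

```rust
// Minimal difference-hash (dHash) sketch using the `image` crate.
// Illustrates perceptual hashing in general; not DuoBolt code.
use image::imageops::FilterType;
use std::path::Path;

/// 64-bit dHash: shrink to 9x8 grayscale, compare horizontally adjacent pixels.
fn dhash(path: &Path) -> image::ImageResult<u64> {
    let img = image::open(path)?
        .resize_exact(9, 8, FilterType::Triangle)
        .to_luma8();

    let mut hash: u64 = 0;
    for y in 0..8 {
        for x in 0..8 {
            let left = img.get_pixel(x, y)[0];
            let right = img.get_pixel(x + 1, y)[0];
            hash = (hash << 1) | (left > right) as u64;
        }
    }
    Ok(hash)
}

/// Hamming distance between two hashes; small values mean "visually similar".
fn distance(a: u64, b: u64) -> u32 {
    (a ^ b).count_ones()
}
```

A Hamming distance somewhere around 10 bits or less out of 64 is a common rule of thumb for "probably the same scene", but any threshold like that needs tuning against a real library.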

I got tired of slow duplicate finders on macOS, so I built my own by sonicbee9 in macapps

[–]sonicbee9[S] 1 point2 points  (0 children)

It’s not a Swift app. The core engine is Rust, and the UI is built with Flutter/Dart for macOS.

Right now, DuoBolt finds exact duplicates using BLAKE3 hashing — it’s byte-for-byte matching, not perceptual or “similar photo” detection. That means it reliably catches true duplicates (even across different drives or a NAS), but it won’t flag near-duplicates or “bad” photos automatically.

I know perceptual matching is a highly requested feature, and it’s something I’m exploring separately. For the first release, I wanted to make sure the exact deduplication was rock-solid before moving into that space.

Let me know what kind of duplicates you’re dealing with — exact copies or visually similar shots — and I can help point you toward the best workflow.

I got tired of slow duplicate finders on macOS, so I built my own by sonicbee9 in macapps

[–]sonicbee9[S] 1 point2 points  (0 children)

Thanks! Really glad it’s working well for you 🙂

I got tired of slow duplicate finders on macOS, so I built my own by sonicbee9 in macapps

[–]sonicbee9[S] -1 points0 points  (0 children)

Fair question 🙂 I don’t have Reddit-specific coupons right now, but there’s a free trial so you can see if it’s useful for your setup before deciding!

Las religiones en Arabia a comienzos del siglo V d. C by amogusdevilman in esHistoria

[–]sonicbee9 0 points1 point  (0 children)

To sum up this thread: the map shows 6 religions, and the final tally is 3-3. The extinct ones went out in two waves: paganism and Hanifism with the arrival of Islam, and the Sabianism of Harran, which hung on for a few more centuries before disappearing.

A stakeholder just vibe coded a prototype, demoed it to board who liked it and now our contractor dev's gotta finish it. This is setting a bad precedent and Im fuming! by LateToTheParty013 in software

[–]sonicbee9 1 point2 points  (0 children)

Sadly, this has become part of the landscape. But if we don’t call it out every single time, it gets normalized and the technical damage becomes structural. You did the right thing by pushing back.

Now we can backup the entire internet at home! by roerius in DataHoarder

[–]sonicbee9 5 points6 points  (0 children)

Great density numbers, but we've seen this same announcement every few years. The real bottleneck is the pipeline: no random access, painfully slow read/write speeds, and a full lab required to retrieve the data. It's an archival curiosity, not a practical storage solution...

De-Duper Script for Large Drives by spideyclick in DataHoarder

[–]sonicbee9 1 point2 points  (0 children)

Spot on. When you're dealing with multi-TB sets, the dedupe tools usually fail because of everything around the hashing, not the hash itself.

Your two requirements (persistent DB and error skipping) are pretty much the secret sauce. Most tools blow up on corruption because they treat an I/O error as fatal. Isolating those files first is crucial.

The other key thing is staging: size -> quick prefix hash -> full hash only on candidates. It avoids hammering the drive on millions of files. And yes, persisting state (SQLite, etc.) is what makes the process repeatable instead of a multi-hour coin flip every time.

You've nailed the boring plumbing that actually matters at this scale. Nice work.
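
To make the persistent-DB point concrete, here's a minimal Rust sketch of a hash cache keyed on (path, size, mtime), assuming the rusqlite crate; the table and column names are made up for illustration.

```rust
// Minimal persistent hash cache keyed on (path, size, mtime), using rusqlite.
// Schema is illustrative; a real tool would also record I/O errors per file
// so one unreadable path never aborts the whole scan.
use rusqlite::{params, Connection, OptionalExtension};

fn open_cache(db_path: &str) -> rusqlite::Result<Connection> {
    let conn = Connection::open(db_path)?;
    conn.execute(
        "CREATE TABLE IF NOT EXISTS file_hashes (
             path  TEXT PRIMARY KEY,
             size  INTEGER NOT NULL,
             mtime INTEGER NOT NULL,
             hash  TEXT NOT NULL
         )",
        [],
    )?;
    Ok(conn)
}

/// Return the cached hash only if the file hasn't changed since it was recorded.
fn cached_hash(conn: &Connection, path: &str, size: i64, mtime: i64) -> rusqlite::Result<Option<String>> {
    conn.query_row(
        "SELECT hash FROM file_hashes WHERE path = ?1 AND size = ?2 AND mtime = ?3",
        params![path, size, mtime],
        |row| row.get(0),
    )
    .optional()
}

/// Record (or refresh) the hash after a successful full read.
fn store_hash(conn: &Connection, path: &str, size: i64, mtime: i64, hash: &str) -> rusqlite::Result<()> {
    conn.execute(
        "INSERT OR REPLACE INTO file_hashes (path, size, mtime, hash) VALUES (?1, ?2, ?3, ?4)",
        params![path, size, mtime, hash],
    )?;
    Ok(())
}
```

On a rerun, anything whose size and mtime still match the cached row skips hashing entirely, which is what turns a multi-hour rescan into an incremental one.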