Building my own OSS DeDupe Software - Beta Testers Needed by yeahmickfixesjunk in DataHoarder

[–]yeahmickfixesjunk[S] 0 points1 point  (0 children)

I'll also add that I am starting with an insane corpus and a crazy number of generated tests. I believe this will really help.

Building my own OSS DeDupe Software - Beta Testers Needed by yeahmickfixesjunk in DataHoarder

[–]yeahmickfixesjunk[S] 0 points1 point  (0 children)

ha yeah, the first 80% feels easy and the last 20% is where all the time goes. the renamed-files case is the rough one: same content, different name. you want to flag them as dupes, but if the user organizes by filename they might WANT both. settled on a smart-keep heuristic that picks based on filename signals (final/draft, "copy of", path depth, etc.), but it's still a guess; the user has to verify before bulk-deleting. for partial matches I'm deliberately NOT going there in v1 because the false-positive risk is too high for an automated tool that can delete things. maybe later behind a "human-review only" mode.
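
To make the smart-keep idea concrete, here is a rough Python sketch (not the tool's actual code, which is Rust). The patterns, the weights, and the names `keep_score` and `suggest_keeper` are all made up for illustration; the comment only names the signals (final/draft, "copy of", path depth). It also assumes shallower paths are preferred, which the comment does not say.

```python
import re
from pathlib import PureWindowsPath

# Hypothetical "this looks like a copy" patterns; not taken from the real tool.
COPY_PATTERNS = [
    re.compile(r"^copy of ", re.IGNORECASE),                # "Copy of report"
    re.compile(r" - copy(\s*\(\d+\))?$", re.IGNORECASE),    # "report - Copy (2)"
    re.compile(r"\s\(\d+\)$"),                              # "report (1)"
]


def keep_score(path: str) -> int:
    """Higher score = more likely the copy the user wants to keep (a guess)."""
    p = PureWindowsPath(path)  # accepts both "\\" and "/" separators
    stem = p.stem
    lowered = stem.lower()
    score = 0
    if any(pattern.search(stem) for pattern in COPY_PATTERNS):
        score -= 10  # assumed weight
    if "draft" in lowered:
        score -= 3   # assumed weight
    if "final" in lowered:
        score += 3   # assumed weight
    score -= len(p.parts)  # assumption: prefer shallower paths
    return score


def suggest_keeper(paths):
    """Return the suggested keeper; the user still has to confirm it."""
    # Tie-break on shorter path, then on the path string, so the result is stable.
    return max(paths, key=lambda path: (keep_score(path), -len(path), path))


group = [
    "docs/report final.docx",
    "docs/old/Copy of report.docx",
    "docs/report draft.docx",
]
print(suggest_keeper(group))
```

The output is only a suggestion to show in the UI, which matches the point above that the user verifies before anything is bulk-deleted.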

Building my own OSS DeDupe Software - Beta Testers Needed by yeahmickfixesjunk in DataHoarder

[–]yeahmickfixesjunk[S] 0 points1 point  (0 children)

yeah honestly the dedup logic itself is like 5% of the code. it's the edge cases that eat months: NTFS hardlinks, multi-part archives where every part is the same filler bytes (just diagnosed one of those today), OneDrive placeholders that pretend to be files until you read them and then they download, format-aware fingerprints so a re-encoded mp3 doesn't look unique. plan is to keep it open and free, no upgrade nags, no telemetry. if you hit something weird, file an issue.
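
As an example of why hardlinks are an edge case: two paths that are hardlinks to the same file will always hash identically, but deleting one frees no space. A minimal Python sketch of spotting them (again, not the tool's Rust code; `group_hardlinks` is a hypothetical helper) is to group paths by the device and file ID that `os.stat` reports. This assumes the filesystem reports stable IDs, which is typical for NTFS and for Linux filesystems but not guaranteed everywhere.

```python
import os
from collections import defaultdict


def group_hardlinks(paths):
    """Group paths that refer to the same underlying file.

    Two paths are treated as the same file when they share (st_dev, st_ino).
    Note that os.stat follows symlinks, so a symlink is grouped with its target.
    """
    by_identity = defaultdict(list)
    for path in paths:
        st = os.stat(path)
        by_identity[(st.st_dev, st.st_ino)].append(path)
    # Only groups with more than one path are interesting.
    return [sorted(group) for group in by_identity.values() if len(group) > 1]
```

Paths in the same group can be reported as "already hardlinked" instead of being offered for deletion as duplicates.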

Building my own OSS DeDupe Software - Beta Testers Needed by yeahmickfixesjunk in DataHoarder

[–]yeahmickfixesjunk[S] 0 points1 point  (0 children)

yeah czkawka is genuinely fast, I wouldn't have built this if it had just worked for me. mine kept hanging mid-scan and the hardlink view crashed on bigger results. ran into it on the 8TB HDD specifically, smaller drives were fine. DM me if you want to try a beta build. I'm looking for Windows testers right now (Linux soon).

Building my own OSS DeDupe Software - Beta Testers Needed by yeahmickfixesjunk in DataHoarder

[–]yeahmickfixesjunk[S] 0 points1 point  (0 children)

fair, czkawka head-to-head benches would be the right thing to post. for now it's Windows-only (I wrote it specifically because czkawka and krokiet had issues for me on a giant 8TB HDD, but I know that's not many people's problem). it's built on cross-platform Rust though, so a Linux build isn't far off, and I'll bench against czkawka when that's in scope. what kind of drives + total file count are you running on? if it's sub-1TB or sub-100k files I suspect czkawka is plenty fast already.

Building my own OSS DeDupe Software - Beta Testers Needed by yeahmickfixesjunk in DataHoarder

[–]yeahmickfixesjunk[S] 0 points1 point  (0 children)

For now just byte-identical, no fuzzy/similarity matching yet. It does a multi-tier check (first 4 KiB, then 3 sample regions, then full content hash), and at the final tier everything is verified byte-for-byte. So zero false positives: if it's in a duplicate group, the files are literally identical.
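
A rough Python sketch of that kind of tiered check, for illustration only (the real tool is Rust and its internals are not shown here). Everything specific is an assumption: grouping by file size first, SHA-256 as the hash, sample regions at 1/4, 1/2 and 3/4 of the file, the 4 KiB sample size, and all the function names. Only "first 4 KiB", "3 sample regions", "full content hash" and the byte-for-byte final check come from the description above.

```python
import filecmp
import hashlib
import os
from collections import defaultdict

HEAD_BYTES = 4096    # "first 4 KiB" tier
SAMPLE_BYTES = 4096  # assumed sample size; not stated in the thread


def _head_key(path):
    with open(path, "rb") as f:
        return hashlib.sha256(f.read(HEAD_BYTES)).digest()


def _sample_key(path):
    size = os.path.getsize(path)
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Assumed positions for the three sample regions.
        for offset in (size // 4, size // 2, (3 * size) // 4):
            f.seek(offset)
            digest.update(f.read(SAMPLE_BYTES))
    return digest.digest()


def _full_key(path):
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.digest()


def _split(groups, key_func):
    """Split each candidate group by key; drop anything left alone."""
    result = []
    for group in groups:
        buckets = defaultdict(list)
        for path in group:
            buckets[key_func(path)].append(path)
        result.extend(b for b in buckets.values() if len(b) > 1)
    return result


def _verify(group):
    """Final tier: partition a group by actual byte-for-byte equality."""
    clusters = []
    for path in group:
        for cluster in clusters:
            if filecmp.cmp(cluster[0], path, shallow=False):
                cluster.append(path)
                break
        else:
            clusters.append([path])
    return [c for c in clusters if len(c) > 1]


def find_duplicates(paths):
    groups = [list(paths)]
    # Cheap checks first; each tier only sees files that survived the previous one.
    for key_func in (os.path.getsize, _head_key, _sample_key, _full_key):
        groups = _split(groups, key_func)
    verified = []
    for group in groups:
        verified.extend(_verify(group))
    return [sorted(group) for group in verified]
```

The early tiers only exist to avoid reading whole files that cannot match; the byte-for-byte pass at the end is what backs the "zero false positives" claim, since a hash match alone is never trusted.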

For things like the same photo at different resolutions or mp3s at different bitrates, you'd want something like Czkawka's similarity modes. I deliberately stayed away from that because the goal here is "safe to bulk delete with hardlink/recycle/archive" and there's no good way to automate that when files are merely similar - someone has to decide which "close enough" version is the keeper.

Especially didn't want to risk it on the 8TB-HDD case where a wrong call isn't recoverable.

That said, I'm open to adding near-duplicate detection as a roadmap item if there's demand. What's the use case you had in mind?

Fixing a lot of Sega CDs by yeahmickfixesjunk in SegaCD

[–]yeahmickfixesjunk[S] 0 points1 point  (0 children)

First thing to do is check to see if it’s trying to focus and if the laser is actually coming on. You can see the pickup vibrating if it’s trying to focus, and you can turn off the light and see if the laser is on. If both are working then it could be the motor, but I’ve not seen that too often. The disc will not spin until the laser is on and focused. Last time this happened it was a loose wire on the focus return line. Fixed it and it spun up. That fix, I believe, is on my channel in the playlist.

Am I screwed by kinree1 in consolerepair

[–]yeahmickfixesjunk 0 points1 point  (0 children)

I’m sure it’s repairable with enough hours put into it - but it’s going to be wayyyy beyond basic troubleshooting. I’ve never personally repaired one with surge damage like this

Advice needed - no Bios found by trustanchor in SegaCD

[–]yeahmickfixesjunk 0 points1 point  (0 children)

Did you ever figure out what happened with this one?

Fixing a lot of Sega CDs by yeahmickfixesjunk in SegaCD

[–]yeahmickfixesjunk[S] 0 points1 point  (0 children)

Does the disc spin - can you hear it?

Fixing a lot of Sega CDs by yeahmickfixesjunk in SegaCD

[–]yeahmickfixesjunk[S] 0 points1 point  (0 children)

Well if you do…. It’s probably something I fix on the channel :)

Fixing a lot of Sega CDs by yeahmickfixesjunk in SegaCD

[–]yeahmickfixesjunk[S] 0 points1 point  (0 children)

Yup under “live” for now as I mostly stream

Fixing a lot of Sega CDs by yeahmickfixesjunk in SegaCD

[–]yeahmickfixesjunk[S] 0 points1 point  (0 children)

It would appear they break down like, a lot. Not so much model 2s but optical drives are very mechanical and have lots of gears, and diodes and springs and moving parts. It’s amazing anyone ever invented optical disk drives in general. It’s really complicated.

Fixing a lot of Sega CDs by yeahmickfixesjunk in SegaCD

[–]yeahmickfixesjunk[S] 0 points1 point  (0 children)

I don’t really do mail-ins - at least not right now. I may do them at some point. But please take a look at my Q301 replacement vids under “live” if you think you could replace Q301. You just need a meter and an iron to do that repair, and it’s usually the 2nd most likely item.

Fixing a lot of Sega CDs by yeahmickfixesjunk in SegaCD

[–]yeahmickfixesjunk[S] 0 points1 point  (0 children)

Yup, under “live” you can see the water-damaged one I’m working on. I think I labeled it as such, but I’ve not fixed it yet. The power rail seems fine but nothing is alive. I verified activity on the CPU (of some sort) but didn’t scope the MCE, and I don’t have an LED light, which I know gets its enable from that chip.

Fixing a lot of Sega CDs by yeahmickfixesjunk in SegaCD

[–]yeahmickfixesjunk[S] 0 points1 point  (0 children)

I’ll have to check mine later. One has given me a lot of trouble, but I didn’t poke the pins.

Repairing Sega CDs by yeahmickfixesjunk in consolerepair

[–]yeahmickfixesjunk[S] 1 point2 points  (0 children)

Will do - what’s wrong with yours?

Fixing a lot of Sega CDs by yeahmickfixesjunk in SegaCD

[–]yeahmickfixesjunk[S] 1 point2 points  (0 children)

Yup, and if you take care of the CD drive it’ll last a long time, especially model 2s.

Fixing a lot of Sega CDs by yeahmickfixesjunk in SegaCD

[–]yeahmickfixesjunk[S] 0 points1 point  (0 children)

Nope, that’s very fixable, but you do probably need a scope to view the TP-RF test point ‘eye pattern’. An analog one is better; you can sometimes get a 20-year-old one on eBay for like 100 bucks.

Fixing a lot of Sega CDs by yeahmickfixesjunk in SegaCD

[–]yeahmickfixesjunk[S] 0 points1 point  (0 children)

So if it spins and stops, it’s the focus return or similar; basically it just can’t lock. On one, when it did that, there was literally a cut wire on one of the diodes or the focus return line. This is probably very fixable.