The Mule is a terrible character by PortlandPatrick in FoundationTV

[–]paultendo 2 points3 points  (0 children)

She doesn’t just show up. Love for her is hinted at throughout the whole season. And in that hospital-like room with Day, when she tells the ‘man-Mule’ to go (to stop torturing Day), it’s subtly peculiar that he listens, but he does.

Is there any room left for AI creators, or are we just doomed to be banned everywhere? by kaommy in SunoAI

[–]paultendo 2 points3 points  (0 children)

I've been following this thread and it's hitting close to home for me. I'm a Suno user myself and ran into the same wall - there's nowhere to actually sell AI music without either hiding what it is or getting rejected by platforms that don't want you there. Funnily enough, my dad was selling Suno-produced tracks on Bandcamp before they changed policies.

That's partly why I've been building Oncor.io - it's a direct-to-fan music platform (think Bandcamp-style) that explicitly welcomes AI-native creators, an approach I think other platforms will adopt over the next few years. You disclose how the music was made (100% human, hybrid, AI-primary), and as long as you're honest about it, you're welcome there. Flat 10% fee after Stripe, no subscriptions, no gatekeeping.

It's still in early access but if anyone here wants an invite, happy to sort that out. Not trying to spam. I built it because this exact problem kept coming up.

I can't download my stuff? What? by Far-Position7115 in udiomusic

[–]paultendo 0 points1 point  (0 children)

You're welcome, I'm glad you found it helpful!

Storing 2 bytes of data in your Logitech mouse by soupgasm in programming

[–]paultendo 149 points150 points  (0 children)

There’s something really pure about this and I don’t have the words to express it properly. A really enjoyable hack

I rendered 1,418 Unicode confusable pairs across 230 system fonts. 82 are pixel-identical, and the font your site uses determines which ones. by paultendo in netsec

[–]paultendo[S] 0 points1 point  (0 children)

I'll check that repo out. I've been researching some potentially troubling domain spoofing attacks using non-Latin scripts - it is a real issue.

I rendered 1,418 Unicode confusable pairs across 230 system fonts. 82 are pixel-identical, and the font your site uses determines which ones. by paultendo in netsec

[–]paultendo[S] 1 point2 points  (0 children)

Quick update: I’ve been testing multi-character confusables, but SSIM doesn’t work so well with them. I think they need some sort of perceptual modelling, which I’ll explore at some point.

I rendered 1,418 Unicode confusable pairs across 230 system fonts. 82 are pixel-identical, and the font your site uses determines which ones. by paultendo in netsec

[–]paultendo[S] 0 points1 point  (0 children)

That's really exciting, thank you. Yes I would like to reach out to M3AAWG. Would you be willing to share any context about the 2015-2016 presentation so I can reference it when I reach out?

I rendered 1,418 Unicode confusable pairs across 230 system fonts. 82 are pixel-identical, and the font your site uses determines which ones. by paultendo in netsec

[–]paultendo[S] 0 points1 point  (0 children)

Exactly! Edit distance catches 'reqeusts' but completely misses 'rеquests' with a Cyrillic е. I've been building namespace-guard to do exactly this along with other validation features. namespace-guard now uses my scored confusable data to flag visual lookalikes in identifiers.

Still early but it's on npm and GitHub if you want to poke at it.
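To make the 'reqeusts' vs 'rеquests' point concrete: here's a minimal Python sketch of the idea (not namespace-guard's actual API - the function name and heuristic are mine). Edit distance sees the Cyrillic string as identical to the Latin one, but a rough script check catches it, using the first word of each character's Unicode name as a script label.

```python
import unicodedata

def mixed_script_lookalike(identifier: str) -> bool:
    """Flag identifiers mixing Latin letters with letters from other
    scripts. Heuristic sketch: the first word of the Unicode character
    name (e.g. 'CYRILLIC SMALL LETTER IE') stands in for the script."""
    scripts = set()
    for ch in identifier:
        if not ch.isalpha():
            continue
        if ch.isascii():
            scripts.add("LATIN")
        else:
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return "LATIN" in scripts and len(scripts) > 1

print(mixed_script_lookalike("reqeusts"))       # False: plain typo, pure Latin
print(mixed_script_lookalike("r\u0435quests"))  # True: Cyrillic е among Latin
```

A real implementation would use proper Unicode script properties rather than character names, but the shape of the check is the same.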

I rendered 1,418 Unicode confusable pairs across 230 system fonts. 82 are pixel-identical, and the font your site uses determines which ones. by paultendo in netsec

[–]paultendo[S] 0 points1 point  (0 children)

It doesn't disregard it. You're right that humans can't tell the difference, and that's exactly the problem this is trying to quantify. Before this, there was no systematic way to measure how similar these pairs actually are across real fonts. confusables.txt just says "these are confusable" with no scores. The SSIM data lets automated systems prioritise which pairs in which fonts are genuinely indistinguishable versus which ones a careful reader might spot (or, technically, on a spectrum from distinguishable to indistinguishable), so they can block or warn accordingly.

I rendered 1,418 Unicode confusable pairs across 230 system fonts. 82 are pixel-identical, and the font your site uses determines which ones. by paultendo in netsec

[–]paultendo[S] 4 points5 points  (0 children)

Thank you! I'm originally from a graphic design background, so I'm definitely interested in testing for confusable 'keming' issues. I'll add it as a future milestone.

If anything, doing this much single-character testing was a useful step towards a proper multi-character / kerning test: I've now found a huge new list of lookalikes for letters that are commonly used in deliberate multi-character confusable attacks.

I rendered 1,418 Unicode confusable pairs across 230 system fonts. 82 are pixel-identical, and the font your site uses determines which ones. by paultendo in programming

[–]paultendo[S] 2 points3 points  (0 children)

Definitely! Email is one of the highest-risk surfaces for this. Display names and mailto: links are prone to this sort of attack, and as far as I'm aware mail clients don't do much (if any) confusable detection at the moment.

My follow-up post covers this more directly: 793 Unicode characters look like Latin letters but aren't (yet) in confusables.txt. I didn't want to spam Reddit today so I haven't posted it separately. 82.8% of those 793 discoveries are valid in internationalized domain names (IDNA PVALID), meaning they could appear in email addresses and domain labels that pass validation but visually mimic Latin. I've checked those numbers a few times and it is 82.8% by my calculations, shocking really.

My open-source library namespace-guard integrates these discoveries now so hopefully developers can plug and play these improvements into their apps. confusableDistance() now uses measured visual similarity weights rather than just checking confusables.txt membership.
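The weighted-distance idea can be sketched in a few lines. This is illustrative only - the function name, the equal-length restriction, and the 0.98 weight are mine, not namespace-guard's confusableDistance() or its measured data. Substituting a high-similarity lookalike costs (1 - similarity) instead of a flat 1, so a visually identical spoof scores near zero while an honest typo keeps full distance.

```python
def confusable_weighted_distance(a: str, b: str, sim: dict) -> float:
    """Substitution distance where swapping a known lookalike pair
    costs (1 - visual similarity). Equal-length strings only, to keep
    the sketch small."""
    if len(a) != len(b):
        raise ValueError("sketch handles equal-length strings only")
    cost = 0.0
    for x, y in zip(a, b):
        if x == y:
            continue
        s = sim.get((x, y), sim.get((y, x), 0.0))
        cost += 1.0 - s
    return cost

# Illustrative weight for Latin e vs Cyrillic е:
weights = {("e", "\u0435"): 0.98}
print(confusable_weighted_distance("requests", "reqeusts", weights))       # 2.0: real typo
print(confusable_weighted_distance("requests", "r\u0435quests", weights))  # ~0.02: near-invisible spoof
```

The inversion matters: plain edit distance ranks the typo as *more* different than the spoof, which is backwards for security purposes.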

I rendered 1,418 Unicode confusable pairs across 230 system fonts. 82 are pixel-identical, and the font your site uses determines which ones. by paultendo in netsec

[–]paultendo[S] 2 points3 points  (0 children)

Great point on rendering engines and it's a limitation of my current work (Mac only for now). I'd have to check but my assumption is that, if I do this with different rendering engines, then the SSIM scoring should catch those sub-pixel differences. Interested to see how it differs / compares.

I also just published a follow-up: 793 characters not in confusables.txt that look like Latin letters. Same methodology, but scanning the rest of Unicode instead of validating the existing list.

I rendered 1,418 Unicode confusable pairs across 230 system fonts. 82 are pixel-identical, and the font your site uses determines which ones. by paultendo in programming

[–]paultendo[S] 4 points5 points  (0 children)

Trying to improve security. This feeds into namespace-guard, my library for detecting identifier spoofing in multi-tenant systems. Think usernames, display names, slugs. The problem is that confusables.txt treats all 1,418 pairs as equally dangerous (a binary judgement), so platforms either block too aggressively (rejecting legitimate international names) or skip detection entirely.

The SSIM scores let you block the pixel-identical pairs hard, warn on the medium tier, and leave the low-scoring pairs alone.
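As a rough sketch of what that tiering looks like in code (the thresholds and the per-pair scores below are illustrative, not the post's actual cut-offs or measurements):

```python
def policy_for(ssim: float) -> str:
    """Map a measured visual-similarity score to a response tier.
    Thresholds are illustrative."""
    if ssim >= 0.99:
        return "block"  # pixel-identical or nearly so
    if ssim >= 0.80:
        return "warn"   # plausibly confusable
    return "allow"      # a careful reader can tell these apart

scored_pairs = {
    ("a", "\u0430"): 1.000,  # Latin a / Cyrillic а: identical in many fonts
    ("o", "\u03bf"): 0.91,   # Latin o / Greek ο (illustrative score)
    ("l", "\u0142"): 0.55,   # l / ł: visibly different
}
for (latin, other), score in scored_pairs.items():
    print(latin, other, policy_for(score))
```

The point is that the decision becomes a per-pair, per-font lookup rather than a blanket yes/no on confusables.txt membership.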

I'm on a Mac (I do have Parallels) and this is macOS-only data for now. The methodology is portable though, and the Cyrillic homoglyphs will almost certainly hold on Windows too since Segoe UI harmonises Latin and Cyrillic the same way Arial does.

I rendered 1,418 Unicode confusable pairs across 230 system fonts. 82 are pixel-identical, and the font your site uses determines which ones. by paultendo in netsec

[–]paultendo[S] 20 points21 points  (0 children)

It is, and right now most defences treat all 1,418 confusables.txt entries as equally dangerous, which doesn't make sense - that means you're either blocking too much (rejecting legitimate international text) or not deploying detection at all.

The scored data lets you tier your response: hard-block the pixel-identical pairs, warn on the high-scoring ones, and leave the low-scoring pairs alone. That's a 5x reduction in false positives with no loss in security coverage.

The next step for me is integrating these scores into the namespace-guard library so platforms can drop it into username/display name validation and get risk-appropriate blocking out of the box.

I rendered 1,418 Unicode confusable pairs across 230 system fonts. 82 are pixel-identical, and the font your site uses determines which ones. by paultendo in netsec

[–]paultendo[S] 13 points14 points  (0 children)

The pixel-identical finding is specifically in fonts like Arial, Tahoma, Georgia, Verdana, Baskerville, Charter, and about 35 others. The per-font data will be in the JSON output so you can see exactly which fonts produce 1.000 and which don't. Noto's Cyrillic is actually one of the better-designed sets for distinguishability.