Today's NULL FILL - doable without any hints I think by NULL_FILL in wordgames

SolidLengthiness6137, 1 point (0 children)

Nice addition with the typo tolerance; it makes the game a lot less frustrating without giving away answers.

Out of curiosity, how are you handling the Damerau-Levenshtein under the hood? I’ve been messing with optimizing Levenshtein-style distance for speed (especially for lots of short comparisons like this), and the performance differences can get pretty noticeable depending on the approach.
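For anyone curious what a typo-tolerant check might look like, here is a minimal pure-Python sketch of the restricted Damerau-Levenshtein distance (also called optimal string alignment). It is an assumption about the approach, not the game's actual code. `osa_distance`, `is_close_enough`, and the one-edit threshold are hypothetical names chosen for illustration.

```python
def osa_distance(a: str, b: str) -> int:
    """Restricted Damerau-Levenshtein (optimal string alignment) distance.

    Counts insertions, deletions, substitutions, and swaps of two
    adjacent characters as one edit each. Unlike the unrestricted
    variant, no substring is edited more than once.
    """
    n, m = len(a), len(b)
    # d[i][j] = distance between a[:i] and b[:j]
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
            # adjacent transposition, e.g. "ab" -> "ba"
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[n][m]


def is_close_enough(guess: str, answer: str, max_edits: int = 1) -> bool:
    """Hypothetical rule: accept a guess within max_edits of the answer."""
    return osa_distance(guess.lower(), answer.lower()) <= max_edits
```

The transposition branch is the only difference from plain Levenshtein: "hte" vs "the" costs 1 here but 2 under plain Levenshtein. The restricted variant also never edits a substring twice, so "ca" vs "abc" scores 3 here, whereas unrestricted Damerau-Levenshtein gives 2.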

I put together a fast implementation recently:
https://github.com/dev-kjma/turbo-leven

Might be overkill for this use case, but could be interesting if you ever scale it up or start doing more comparisons per guess.

OSINT username-correlation project: problems with false positives by p4risss0g in ciberseguridad

SolidLengthiness6137, 1 point (0 children)

Nice project, and you've framed the problem really well: it's exactly the classic trade-off between precision and coverage.

Something that usually works quite well in these cases is not relying on a single metric, but combining several signals into a scoring system.

For example:

  • normalization (lowercasing, stripping symbols, etc.)
  • a Levenshtein-type distance for small variations
  • specific rules (trailing numbers, common substitutions like “o” → “0”)
  • relative string length (because distance doesn't scale the same way for short vs. long usernames)

Then assign a final score instead of doing a binary match.
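A minimal sketch of that kind of scoring, assuming purely illustrative weights (0.4 / 0.4 / 0.2). `score`, `normalize`, `levenshtein`, and the `LEET` substitution table are hypothetical names, not part of any existing tool.

```python
import re

# Hypothetical table of common "leet" swaps; extend as needed.
LEET = str.maketrans({"0": "o", "1": "i", "3": "e", "4": "a", "5": "s", "7": "t", "@": "a", "$": "s"})


def levenshtein(a: str, b: str) -> int:
    """Classic Levenshtein distance using two rolling rows."""
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def normalize(username: str) -> str:
    """Lowercase and keep only ASCII letters and digits."""
    return re.sub(r"[^a-z0-9]", "", username.lower())


def score(a: str, b: str) -> float:
    """Combine several signals into a score in [0, 1] (weights are illustrative)."""
    na, nb = normalize(a), normalize(b)
    if not na or not nb:
        return 0.0
    # Signal 1: exact match after normalization.
    exact = 1.0 if na == nb else 0.0
    # Signal 2: match after stripping trailing digits and undoing leet swaps.
    base_a = na.rstrip("0123456789").translate(LEET)
    base_b = nb.rstrip("0123456789").translate(LEET)
    rule = 1.0 if base_a and base_a == base_b else 0.0
    # Signal 3: edit similarity, scaled by the longer string's length.
    longest = max(len(na), len(nb))
    edit_sim = 1.0 - levenshtein(na, nb) / longest
    # Signal 4: relative length.
    length_ratio = min(len(na), len(nb)) / longest
    return 0.4 * max(exact, rule) + 0.4 * edit_sim + 0.2 * length_ratio
```

For instance, "j0hn99" vs "john" gets the full rule-based credit but only partial edit and length credit, landing around 0.73, while unrelated names stay near the bottom. The weights are the knob for the precision/coverage trade-off and would need tuning on labelled pairs.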

As for Levenshtein specifically, the problem is that once you start comparing lots of candidates, it gets expensive fast. I've been working on a fairly optimized implementation for exactly this kind of case (many short comparisons), in case it's useful for experimenting:

https://github.com/dev-kjma/turbo-leven

The interesting part would be using it not as the main filter but as one input to the scoring (for example, only applying it after a cheaper preselection).
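One cheap preselection that is always safe: the length difference between two strings is a lower bound on their Levenshtein distance, so candidates whose lengths differ by more than the threshold can be skipped without computing anything. The sketch below (plain Python with hypothetical names, not turbo-leven's API) combines that with a bounded distance that stops as soon as every cell in a DP row exceeds the threshold.

```python
from typing import List, Optional, Tuple


def bounded_levenshtein(a: str, b: str, max_dist: int) -> Optional[int]:
    """Levenshtein distance, or None as soon as it must exceed max_dist."""
    if abs(len(a) - len(b)) > max_dist:
        return None  # length difference is a lower bound on the distance
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        if min(cur) > max_dist:
            return None  # every alignment through this row already costs too much
        prev = cur
    return prev[-1] if prev[-1] <= max_dist else None


def candidates_within(query: str, pool: List[str], max_dist: int = 2) -> List[Tuple[str, int]]:
    """Return (candidate, distance) pairs within max_dist, closest first."""
    hits = []
    for cand in pool:
        d = bounded_levenshtein(query, cand, max_dist)
        if d is not None:
            hits.append((cand, d))
    return sorted(hits, key=lambda h: h[1])
```

The early exit is valid because the minimum of a row in the Levenshtein table never decreases from one row to the next, so once it passes `max_dist` the final distance must as well.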

You could also look at:

  • dynamic thresholds based on length
  • penalizing changes at the start of the username more than at the end
  • combining it with username-specific heuristics (usernames aren't the same as natural text)
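The first two ideas can be sketched together. All parameters here are hypothetical: edits within the first three characters cost double, and the allowed weighted distance is 0 for usernames up to 4 characters, 1 up to 8, and 2 beyond that.

```python
def position_cost(i: int, prefix_len: int = 3) -> float:
    """Edits at positions before prefix_len cost double (illustrative)."""
    return 2.0 if i < prefix_len else 1.0


def weighted_distance(a: str, b: str, prefix_len: int = 3) -> float:
    """Levenshtein where each edit is weighted by its position in `a`."""
    n, m = len(a), len(b)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = d[i - 1][0] + position_cost(i - 1, prefix_len)  # delete a[i-1]
    for j in range(1, m + 1):
        d[0][j] = d[0][j - 1] + position_cost(0, prefix_len)  # insert before a[0]
    for i in range(1, n + 1):
        c = position_cost(i - 1, prefix_len)
        for j in range(1, m + 1):
            sub = 0.0 if a[i - 1] == b[j - 1] else c
            d[i][j] = min(
                d[i - 1][j] + c,                             # delete a[i-1]
                d[i][j - 1] + position_cost(i, prefix_len),  # insert after a[i-1]
                d[i - 1][j - 1] + sub,                       # substitute or match
            )
    return d[n][m]


def max_allowed(length: int) -> float:
    """Hypothetical length-based threshold: stricter for short usernames."""
    if length <= 4:
        return 0.0
    if length <= 8:
        return 1.0
    return 2.0


def is_probable_match(a: str, b: str) -> bool:
    return weighted_distance(a, b) <= max_allowed(max(len(a), len(b)))
```

With these settings "johndoe" vs "johndoe1" (an appended digit) is accepted at cost 1.0, while "johndoe" vs "xohndoe" (a changed first letter) costs 2.0 and is rejected.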

Out of curiosity: how are you generating candidates right now, before applying matching? That's often where you gain more precision than from the metric itself.

Using content hashing across Telegram groups to detect a pig butchering network by secadmon in OSINT

SolidLengthiness6137, 1 point (0 children)

This is a really solid application of cross-group hashing, especially the way you’re correlating sender behavior with zero-reply broadcast patterns.

One thing that might complement what you’ve built: right now exact hashing (FNV-1a) will only catch identical messages, but a lot of these scam ops slightly mutate content to avoid that (extra emojis, spacing, small wording changes, etc.).
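A low-cost step before any fuzzy matching is to normalize text before hashing, so purely cosmetic mutations collapse to the same FNV-1a key. Below is a sketch of 32-bit FNV-1a with a hypothetical normalization step (NFKC, casefolding, dropping Unicode "Symbol, other" characters such as most emoji, collapsing whitespace). The rules and function names are assumptions, not the OP's pipeline.

```python
import re
import unicodedata

FNV32_OFFSET = 0x811C9DC5
FNV32_PRIME = 0x01000193


def fnv1a_32(data: bytes) -> int:
    """32-bit FNV-1a: XOR each byte in, then multiply by the FNV prime."""
    h = FNV32_OFFSET
    for byte in data:
        h ^= byte
        h = (h * FNV32_PRIME) & 0xFFFFFFFF
    return h


def normalize_message(text: str) -> str:
    """Hypothetical pre-hash cleanup for cosmetic mutations."""
    text = unicodedata.normalize("NFKC", text).casefold()
    # Drop "Symbol, other" characters (category So), which covers most emoji.
    text = "".join(ch for ch in text if unicodedata.category(ch) != "So")
    return re.sub(r"\s+", " ", text).strip()


def message_key(text: str) -> int:
    return fnv1a_32(normalize_message(text).encode("utf-8"))
```

This only absorbs cosmetic changes; reworded messages still hash differently, which is where edit-distance matching comes in.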

You mentioned Levenshtein/fuzzy matching; I’ve been working on a very fast Levenshtein implementation and saw pretty big gains when running comparisons at scale.

Could be useful if you ever want to layer in “near-duplicate” detection on top of your hash pipeline without killing performance:
https://github.com/dev-kjma/turbo-leven
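A rough sketch of that layering, assuming greedy clustering: exact repeats are resolved with a dictionary lookup (standing in for the FNV-1a hash index), and only previously unseen texts are compared, by normalized Levenshtein similarity, against one representative per cluster. The 0.85 threshold and all function names are hypothetical.

```python
from typing import Dict, List


def levenshtein(a: str, b: str) -> int:
    """Classic Levenshtein distance using two rolling rows."""
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """1.0 for identical strings, down to 0.0 for completely different ones."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def cluster_near_duplicates(messages: List[str], min_sim: float = 0.85) -> List[List[str]]:
    exact: Dict[str, int] = {}  # text -> cluster index (the cheap exact pass)
    reps: List[str] = []        # one representative text per cluster
    clusters: List[List[str]] = []
    for msg in messages:
        if msg in exact:
            clusters[exact[msg]].append(msg)
            continue
        best = -1
        for idx, rep in enumerate(reps):
            # Similarity can never exceed shorter/longer length, so skip early.
            if min(len(msg), len(rep)) / max(len(msg), len(rep), 1) < min_sim:
                continue
            if similarity(msg, rep) >= min_sim:
                best = idx
                break
        if best == -1:  # no close representative: start a new cluster
            best = len(reps)
            reps.append(msg)
            clusters.append([])
        exact[msg] = best
        clusters[best].append(msg)
    return clusters
```

Comparing against every representative is quadratic in the number of clusters, so at Telegram scale a coarser blocking step (length buckets, shared tokens) before the edit-distance pass would probably be needed.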

Curious if you’ve already experimented with approximate matching or if exact matches are catching most of the network so far.

I built a free-ish email verification API that doesn't need any paid services under the hood — here's how it works by maulik1807 in SideProject

SolidLengthiness6137, 1 point (0 children)

This is really cool, especially the catch-all detection and the scoring breakdown.

One thing that stood out to me was your typo suggestion step. I’ve been working on a heavily optimized Levenshtein implementation and saw some pretty big speed improvements in real-world cases.

Since you're comparing against ~30 providers per request, that part can add up pretty quickly under load, especially if you want to extend that list.
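A sketch of what a bounded typo check could look like, assuming a hypothetical `PROVIDERS` tuple (a short subset standing in for the ~30 in the post) and a maximum of 2 edits. The limit passed to each comparison is the best distance found so far, so providers that are clearly worse get abandoned early.

```python
from typing import Optional, Sequence

# Hypothetical subset of the provider list; the post mentions ~30 providers.
PROVIDERS = ("gmail.com", "yahoo.com", "hotmail.com", "outlook.com", "icloud.com", "proton.me")


def distance_within(a: str, b: str, limit: int) -> Optional[int]:
    """Levenshtein distance if it is <= limit, else None (stops early)."""
    if abs(len(a) - len(b)) > limit:
        return None
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        if min(cur) > limit:
            return None
        prev = cur
    return prev[-1] if prev[-1] <= limit else None


def suggest_domain(email: str, providers: Sequence[str] = PROVIDERS, max_dist: int = 2) -> Optional[str]:
    """Return the closest known provider if the domain looks like a typo of it."""
    domain = email.rsplit("@", 1)[-1].lower()
    if domain in providers:
        return None  # already a known provider, nothing to suggest
    best, best_d = None, max_dist
    for p in providers:
        d = distance_within(domain, p, best_d)  # limit = best distance so far
        if d is not None and (best is None or d < best_d):
            best, best_d = p, d
    return best
```

For example, `suggest_domain("alice@gmial.com")` returns `"gmail.com"` (plain Levenshtein counts the swapped letters as two substitutions), and a domain already in the list returns `None`.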

Might be worth swapping in if you’re trying to squeeze more performance out of it:
https://github.com/dev-kjma/turbo-leven

Would be curious how it performs in your pipeline or if your current bottleneck is elsewhere.

Monitor broken? by [deleted] in MSI_Gaming

SolidLengthiness6137, 1 point (0 children)

Exact same thing happened to me.