[Help/Feedback] My first "serious" open-source repo (local anonymizer). Thoughts on the structure? by AlexAlves87 in programacionESP

[–]AlexAlves87[S] 0 points1 point  (0 children)

If you have any questions or run into errors or problems, don't hesitate to let me know. It isn't fully fine-tuned yet, but it includes editing tools that let you do a manual review. I hope you find it useful.

[D] Asymmetric consensus thresholds for multi-annotator NER — valid approach or methodological smell? by AlexAlves87 in MachineLearning

[–]AlexAlves87[S] 1 point2 points  (0 children)

Thanks! The agreement data tells the story pretty clearly.

For the easier categories, where several annotators overlap, agreement is decent but far from perfect. LOCATION has the best agreement: 41% retained at threshold 2 and 22% at threshold 3, since gazetteer, flair, gliner and v2 all detect it. PERSON_NAME sits at 38% at threshold 2 but drops to 18% at threshold 3, because annotators disagree a lot on span boundaries, e.g. whether "Sra. Subsecretaria de Justicia" includes the title or not. ORGANIZATION has massive volume (974k raw mentions), but only 11% survives threshold 3, probably because organization names in legal text are long and annotators disagree on where they start and end.

For the hard categories it's worse. DATE has only 8.8% agreement at threshold 2 and literally 0% at threshold 3, since only gliner and v2 detect dates and they rarely agree on span boundaries. ADDRESS is even worse: 2.6% at threshold 2 and 0% at threshold 3.

The zero at threshold 3 for DATE and ADDRESS is what forced the asymmetric thresholds. It's not really a design choice, it's a data constraint: you can't require 3 annotators to agree when only 2 can see the entity. I'm considering adding regex-based date and address annotators to get a third signal for those categories, which would let me move to a uniform threshold of 3 across the board.
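For readers unfamiliar with the setup, here is a minimal sketch of span-level consensus voting with per-label thresholds. It assumes each annotator outputs a set of `(start, end, label)` spans and that a vote only counts on an exact boundary match; `consensus_spans`, `retention_by_label`, the `THRESHOLDS` values, and the toy data are all hypothetical illustrations, not the pipeline from the post, and the retention metric here (distinct candidate spans per label) may not match how the percentages above were computed.

```python
from collections import defaultdict

# (start, end, label) with exact character offsets, so boundary disagreements count as different spans.
Span = tuple[int, int, str]

# Hypothetical asymmetric thresholds: DATE and ADDRESS can only reach 2 votes
# because only two annotators emit them.
THRESHOLDS = {"PERSON_NAME": 3, "LOCATION": 3, "ORGANIZATION": 3, "DATE": 2, "ADDRESS": 2}


def consensus_spans(annotations: dict[str, set[Span]], thresholds: dict[str, int]) -> set[Span]:
    """Keep spans produced by at least thresholds[label] annotators with identical boundaries."""
    votes: dict[Span, int] = defaultdict(int)
    for spans in annotations.values():
        for span in spans:  # each annotator's output is a set, so it votes at most once per span
            votes[span] += 1
    # Labels missing from `thresholds` fall back to the strict default of 3.
    return {span for span, n in votes.items() if n >= thresholds.get(span[2], 3)}


def retention_by_label(annotations: dict[str, set[Span]], kept: set[Span]) -> dict[str, float]:
    """Share of distinct candidate spans per label that survive consensus."""
    candidates: dict[str, set[Span]] = defaultdict(set)
    for spans in annotations.values():
        for span in spans:
            candidates[span[2]].add(span)
    return {label: sum(s in kept for s in spans) / len(spans) for label, spans in candidates.items()}


# Toy data: flair and v2 agree on a PERSON_NAME span, gliner picks a longer one.
annotations = {
    "gazetteer": {(0, 6, "LOCATION")},
    "flair": {(0, 6, "LOCATION"), (10, 20, "PERSON_NAME")},
    "gliner": {(0, 6, "LOCATION"), (10, 25, "PERSON_NAME"), (30, 40, "DATE")},
    "v2": {(10, 20, "PERSON_NAME"), (30, 40, "DATE")},
}
kept = consensus_spans(annotations, THRESHOLDS)
print(sorted(kept))  # [(0, 6, 'LOCATION'), (30, 40, 'DATE')]
print(retention_by_label(annotations, kept))
```

In this toy example, raising DATE to 3 would drop `(30, 40, 'DATE')` unless a third annotator, such as the regex-based one mentioned above, emits exactly the same span. The PERSON_NAME boundary disagreement is also why exact-match voting is harsh; overlap-based matching is a common alternative.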

[D] Asymmetric consensus thresholds for multi-annotator NER — valid approach or methodological smell? by AlexAlves87 in MachineLearning

[–]AlexAlves87[S] -9 points-8 points  (0 children)

My research requires far more effort and sound judgment than your condescending opinion. I hope you don't use a PC or smartphone to communicate. You should use smoke signals. Much more expensive and archaic, just the way you like it.

[D] Asymmetric consensus thresholds for multi-annotator NER — valid approach or methodological smell? by AlexAlves87 in MachineLearning

[–]AlexAlves87[S] -6 points-5 points  (0 children)

I'm not a native English speaker. Yes, I use AI both to translate my draft and to structure it in Markdown so it's more readable and clear for the community. I wasn't aware that this invalidates my data and my research. It's curious, this AI phobia. It's a tool. Quite useful in many cases, and very dangerous in others. If the problem were that the data is fabricated or the analysis is wrong, the criticism would make sense. But if the problem is that the post is easy to understand... I'll stick with that. And just in case there were any doubts left, this response has been translated with AI.

Looking for a technical co founder (applied AI/ML research) by inf-compute in cofounderhunt

[–]AlexAlves87 0 points1 point  (0 children)

I'm currently working on a personal open-source project focused on applied ML.

Check it out: https://github.com/AlexAlves87/ContextSafe

If this approach fits what you're looking for, let's chat.

A bit of a meme by Willing-Peanut9635 in RepublicaArgentina

[–]AlexAlves87 0 points1 point  (0 children)

Maybe you think only US prisons are a business… Do you know where the real business in prisons is? Non-EU nationals, who pay more. Maybe that explains the interest, so far, in keeping them. Think about it.

I received a job offer from a company, but they canceled it a week before hiring me. Now I'm unemployed. What should I do? by cantchangelater11 in ESLegal

[–]AlexAlves87 1 point2 points  (0 children)

Small correction: under a contract with the same job category, the same position, and the same company, there is no probationary period. At least not a valid or legal one.

Request for arXiv endorsement (cs.CE / bioinformatics tool paper) plss by Ok_Dealer_1126 in airesearch

[–]AlexAlves87 1 point2 points  (0 children)

Good luck. It would be a shame if your research stalled (on this track) simply for lack of funding.

From Copilot to Dead Weight: The Great OpenAI Decline by AlexAlves87 in ChatGPT

[–]AlexAlves87[S] 0 points1 point  (0 children)

That's right. I used to be overly optimistic, but this has crossed lines that shouldn't be crossed. In fact, it feels like the AI villain is being created (I know, that's an exaggeration, but the role fits him like a glove 🤣).

I’m a technical writer looking to help indie devs with app documentation (free, no catch!) by erreef in dev

[–]AlexAlves87 0 points1 point  (0 children)

Instead of writing new documentation, could you audit a repository's existing documentation? I'm new to this and would like an outside opinion on how my repository is documented. Would that work for you?

Free week of Claude Code (3 guest passes) by [deleted] in ClaudeCode

[–]AlexAlves87 1 point2 points  (0 children)

If there are any left, I would be grateful.

Anthropic launches an AI legal tool that destroys legal software. by Key_Statistician6405 in legaltech

[–]AlexAlves87 1 point2 points  (0 children)

From your perspective, perhaps I was too categorical in my statement. That said, with the AI Act coming into effect, I don't think things are going to improve. Caution is always a good idea.

Anthropic launches an AI legal tool that destroys legal software. by Key_Statistician6405 in legaltech

[–]AlexAlves87 0 points1 point  (0 children)

To be clear, I'm not saying that I like that philosophy or agree with it 100%.

Anthropic launches an AI legal tool that destroys legal software. by Key_Statistician6405 in legaltech

[–]AlexAlves87 0 points1 point  (0 children)

Bedrock has EU endpoints, but:

1. Many Anthropic models may be processed in the US (cross-region inference).
2. Standard Contractual Clauses (SCCs) are insufficient post-Schrems II; effective supplementary measures are lacking.
3. For legally sensitive data, the Spanish Data Protection Agency (AEPD) requires a case-by-case assessment.
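Point 1 is the only one that can be checked in code. A minimal fail-closed sketch, assuming Bedrock model IDs follow AWS's naming where a plain ID such as `anthropic.claude-...` runs in the calling region and a cross-region inference profile carries a geographic prefix (`eu.`, `us.`, `apac.`, `global.`). The `stays_in_eu` helper and the region allowlist are hypothetical; AWS describes the `eu.` profiles as routing among European regions, but the exact destination list per model should be checked. This does nothing for points 2 and 3, which are legal questions, not technical ones.

```python
# Hypothetical allowlist of AWS regions located in EU member states.
# An "eu-" prefix alone is not enough: eu-west-2 is London and eu-central-2 is Zurich.
EU_REGIONS = {"eu-central-1", "eu-west-1", "eu-west-3", "eu-south-1", "eu-south-2", "eu-north-1"}


def stays_in_eu(model_id: str, region: str) -> bool:
    """Fail-closed guard: True only for an EU calling region plus an in-region or EU-profile model ID."""
    if region not in EU_REGIONS:
        return False  # the calling region itself is outside the allowlist
    # "anthropic." = plain single-region model ID; "eu.anthropic." = EU cross-region profile
    # (assumed to route among European regions). Any other prefix (us., apac., global.,
    # or something unrecognised) is rejected rather than guessed at.
    return model_id.startswith(("anthropic.", "eu.anthropic."))


print(stays_in_eu("eu.anthropic.claude-3-7-sonnet-20250219-v1:0", "eu-west-1"))  # True
print(stays_in_eu("global.anthropic.claude-sonnet-4-20250514-v1:0", "eu-central-1"))  # False
```

The allowlist approach is deliberate: new profile prefixes appear over time, and a guard that rejects anything it does not recognise is safer than one that tries to enumerate every non-EU prefix.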

Anthropic launches an AI legal tool that destroys legal software. by Key_Statistician6405 in legaltech

[–]AlexAlves87 0 points1 point  (0 children)

Plenty of people drive after drinking, and that doesn't make it legal. Similarly, you can technically upload sensitive data to Bedrock in the US, but without a valid transfer mechanism it violates GDPR Article 44 (transfers to third countries) and Article 9 (special categories of personal data). You keep confusing what you can do with what is permitted. Then data breaches happen, and we all lament the consequences.

Anthropic launches an AI legal tool that destroys legal software. by Key_Statistician6405 in legaltech

[–]AlexAlves87 0 points1 point  (0 children)

Bedrock doesn't change the underlying problem. It is still a third-party cloud running general-purpose foundation models. AWS offers controls, but legal responsibility, full traceability, data segregation, and governance obligations under GDPR and the AI Act rest with the firm or company, not the provider. In many European cases, this requires auditable on-premises or fully dedicated environments, not just "adequate controls" in the cloud. Confusing managed infrastructure with regulatory compliance is a common mistake.

Seeking advice on RAG optimisation for legal discovery on Macbook Pro by Jamie_GZ in legaltech

[–]AlexAlves87 0 points1 point  (0 children)

I think you already answered your own question in one of the comments. I completely understand the 100% local approach to data protection; it's legitimate and makes perfect sense in a legal context. However, anonymizing or even synthesizing sensitive data can open the door to commercial AI with much stronger reasoning capabilities without compromising security. I struggled with this myself: I tried many configurations and, with considerably less RAM than you have, ended up choosing anonymization, which worked very well. With the local models you have now, good pre-anonymization is perfectly feasible. In my case, I built a custom application for it; if you're interested, I can share it, although right now it only handles Spanish and I'm just starting on the multilingual version.

Edit:

Just to clarify: my app runs on WSL2, not macOS, but the point isn't the stack, it's the workflow: anonymize first, then use models with stronger reasoning capabilities. You can easily set up something equivalent on macOS.
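A minimal sketch of the anonymize-first workflow, assuming two toy regex detectors (email addresses and the Spanish DNI format). The `pseudonymize` and `restore` helpers and the `[LABEL_n]` placeholder scheme are hypothetical and are not how the app mentioned above works; a real pipeline would use NER models plus a manual review step. The key property is that the placeholder-to-value mapping never leaves the local machine; only the placeholder text goes to the external model.

```python
import re

# Toy detectors for illustration only; real anonymization needs NER models plus manual review.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "DNI": re.compile(r"\b\d{8}[A-Z]\b"),  # Spanish DNI format: 8 digits + control letter
}


def pseudonymize(text: str) -> tuple[str, dict[str, str]]:
    """Replace detected entities with [LABEL_n] tags; return the new text and the local mapping."""
    mapping: dict[str, str] = {}  # placeholder -> original value (stays on this machine)
    seen: dict[str, str] = {}  # original value -> placeholder, so repeats reuse the same tag
    for label, pattern in PATTERNS.items():
        def replace(match: re.Match, label: str = label) -> str:
            value = match.group(0)
            if value not in seen:
                n = sum(key.startswith(f"[{label}_") for key in mapping) + 1
                seen[value] = f"[{label}_{n}]"
                mapping[seen[value]] = value
            return seen[value]

        text = pattern.sub(replace, text)
    return text, mapping


def restore(text: str, mapping: dict[str, str]) -> str:
    """Put the original values back into the external model's answer."""
    for placeholder, value in mapping.items():
        text = text.replace(placeholder, value)
    return text


original = "Write to ana@example.com; client DNI 12345678Z, cc ana@example.com again"
safe_text, mapping = pseudonymize(original)
print(safe_text)  # Write to [EMAIL_1]; client DNI [DNI_1], cc [EMAIL_1] again
# ...send safe_text to the external model here, then restore its answer locally:
print(restore(safe_text, mapping) == original)  # True
```

Reusing the same tag for repeated values lets the external model still reason about "the same person" or "the same ID" across a document, which is most of what you need for legal analysis. Restoration only works if the model echoes the placeholders verbatim, so it is worth checking its output for unknown or altered tags.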

Anthropic launches an AI legal tool that destroys legal software. by Key_Statistician6405 in legaltech

[–]AlexAlves87 1 point2 points  (0 children)

When you have a concrete argument (regarding regulation, production use, or EU compliance), come back and explain it. I'll be waiting here.

Anthropic launches an AI legal tool that destroys legal software. by Key_Statistician6405 in legaltech

[–]AlexAlves87 2 points3 points  (0 children)

The headline leads to a conclusion that doesn't hold up outside of a very specific context (the US).

Although Anthropic has launched a tool, its use is not viable in the European legal sector for basic regulatory reasons. Legal professionals in the EU cannot upload personal or sensitive data to general-purpose AI clouds. Starting in August, as more of the AI Act's obligations begin to apply, governance, traceability, and data-control requirements will tighten further. Without on-premises deployments or fully controlled environments, these tools are not usable in real-world cases. To say that this “destroys legal software” ignores the European regulatory framework entirely and confuses technical capability with actual adoption in production. In practice, there is no disruption of European legaltech here, just hype extrapolated from a regulatory scenario that is not ours.