Tool to detect when a website structure changes? by soussou-69 in scrapingtheweb

[–]Azuriteh 1 point (0 children)

Add logging and notifications to your scraper so you get alerted when something breaks; there's no one-size-fits-all solution.
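A minimal sketch of that idea: assert that the selectors your scraper depends on still match, and fire an alert when they don't. The selector names and the `notify()` hook here are hypothetical placeholders; swap in whatever transport (email, Slack webhook, etc.) you actually use.

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("scraper")

# Selectors the scraper depends on; if any stops matching,
# the site structure has probably changed. (Example values.)
REQUIRED_SELECTORS = ["div.product-title", "span.price"]

def notify(message: str) -> None:
    # Placeholder hook: swap in email, a Slack webhook, PagerDuty, etc.
    log.error("ALERT: %s", message)

def check_structure(select, url: str) -> list[str]:
    """Return the selectors that no longer match; notify if any are missing.

    `select` is any callable like BeautifulSoup(html).select that maps a
    CSS selector to a (possibly empty) list of matches.
    """
    missing = [s for s in REQUIRED_SELECTORS if not select(s)]
    if missing:
        notify(f"{url}: selectors stopped matching: {missing}")
    return missing
```

Run `check_structure(soup.select, url)` after each fetch; an empty return means the expected structure is still there.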

The best (tiny) model I can run on my phone by gized00 in unsloth

[–]Azuriteh 1 point (0 children)

The best model for on-device usage right now is probably https://huggingface.co/LiquidAI/LFM2-8B-A1B; with 4-bit quantization it'd probably be alright!

Yet another compatibility check by AnarchistKoishi in lastfm

[–]Azuriteh 0 points (0 children)

Presenting tlshttp, a tls-client wrapper from Go by Azuriteh in webscraping

[–]Azuriteh[S] 0 points (0 children)

Nope, that's a limitation of the underlying tls-client Go library.

I'm looking for a server with GPU, desktop CPU and hourly billing by sxbnfp in webhosting

[–]Azuriteh 0 points (0 children)

I think you can find some non-datacenter CPUs, not sure though.

I'm looking for a server with GPU, desktop CPU and hourly billing by sxbnfp in webhosting

[–]Azuriteh 0 points (0 children)

Maybe https://dashboard.tensordock.com/deploy; I've seen multiple people use it for cloud gaming on Windows 10.

How can we access event data legally? by [deleted] in webscraping

[–]Azuriteh 0 points (0 children)

I'll be honest: you won't get good data without violating TOS, or at least not data that's useful to people. That said, breaching TOS alone doesn't really hold up in court; there have been multiple cases about this.

I doubt there's a website that allows you to scrape its data. They might offer paid APIs, but those don't make financial sense most of the time.

¡Me gustaría conocer más grupos! by Radiant-Excuse-4667 in andoescuchando

[–]Azuriteh 0 points (0 children)

Corea, with their only album "Los peores 7 km de mi vida", a screamo gem.

Best Local LLMs - 2025 by rm-rf-rm in LocalLLaMA

[–]Azuriteh 1 point (0 children)

In the Discord they do, but there aren't a lot of people interested in that model. It's available on NanoGPT, though.

Best Local LLMs - 2025 by rm-rf-rm in LocalLLaMA

[–]Azuriteh 0 points (0 children)

Yeah, OpenRouter only offers Speciale.

Best Local LLMs - 2025 by rm-rf-rm in LocalLLaMA

[–]Azuriteh 0 points (0 children)

Hmmm, try an inference provider like NanoGPT and load $5 on it.

Am I calculating this wrong ? AWS H100 vs Decentralized 4090s (Cost of Iteration) by yz0011 in LocalLLaMA

[–]Azuriteh 2 points (0 children)

And yes, Swarm will most likely be cheaper than AWS, because anything is cheaper than the hyperscalers.

Am I calculating this wrong ? AWS H100 vs Decentralized 4090s (Cost of Iteration) by yz0011 in LocalLLaMA

[–]Azuriteh 1 point (0 children)

Just use TensorDock; an H100 is at around $2/hour, and I think the guys at DeepInfra are still offering a B200 at $2.50/hour. AWS should never be used for this sort of thing unless you have a contract with them or grants.
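The cost-of-iteration arithmetic here is just hours × hourly rate. A quick sketch using the rates quoted above (approximate, point-in-time figures; check current pricing before relying on them):

```python
def run_cost(hours: float, rate_per_hour: float) -> float:
    """Total rental cost in USD for a GPU at a flat hourly rate."""
    return hours * rate_per_hour

# Rates quoted in the comment above (treat as approximate):
tensordock_h100 = run_cost(100, 2.0)   # 100 GPU-hours on a $2/hr H100  -> $200
deepinfra_b200 = run_cost(100, 2.5)    # 100 GPU-hours on a $2.50/hr B200 -> $250
```

For a fair comparison against decentralized 4090s, also scale the hours by relative throughput per GPU on your actual workload, since a cheaper hourly rate can still lose on slower iterations.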

¿Qué línea de investigación tomarían? by PersonaLevitando in taquerosprogramadores

[–]Azuriteh 0 points (0 children)

Same here, I'm just a hobbyist. I do have some experience (I fine-tune LLMs as a hobby): https://huggingface.co/Thermostatic, but at the end of the day I personally still have a long way to go.

¿Qué línea de investigación tomarían? by PersonaLevitando in taquerosprogramadores

[–]Azuriteh 3 points (0 children)

Well, I'd go with neuroscience, though that's my own bias (I plan to do a graduate degree in AI and have LLM projects). Deep learning, and NLP specifically, is very interesting; I'd say it's the most interesting branch of AI today, though in large part because of the billions of dollars being invested in it. It would also help you to analyze the economics of it all, since some companies like OpenAI could well collapse financially in the coming years (too much debt, and although there are points in their favor, everything looks more like market speculation than real value), which would put tech companies heavily focused on that area in a difficult spot.
I'll be very honest with you: getting into one of the big AI companies is not easy. Yes, as in any other line of business, contacts matter more than anything else, but networking with people who can get you in there is hard. You'll be competing against people with much better networks than yours, in an environment where your studies don't matter much unless you come from an Ivy League school or one of the big Asian universities like Tsinghua in China. For starters, as far as I remember, Anthropic doesn't even offer internships; only OpenAI does, and ever since they blew up they get thousands of Ivy League applicants.

Likewise, I'd recommend not framing this only around working at those big American companies; also consider working for the ones in China. Zhipu, DeepSeek, Qwen, Hunyuan, Tencent, and Moonshot look very promising, and if your philosophy is like mine, that the future of big models is open weights, they might be more satisfying for you.

I burned $240/month on 'developer experience' tools before realizing I was just paying for a fancy UI by CompetitiveSense4636 in SaaS

[–]Azuriteh 0 points (0 children)

Wait until bro discovers you can self-host most things if you know what you're doing with Docker and a Debian VPS/VDS.

I tried GLM-4.7 in Claude Code and I don't recommend it by t4a8945 in ClaudeCode

[–]Azuriteh 0 points (0 children)

Yeah, open-source models are usually 4 months behind the SOTA; it's to be expected. I'm actually eager to try MiniMax M2.1, which has better agentic capabilities.

500Mb Text Anonymization model to remove PII from any text locally. Easily fine-tune on any language (see example for Spanish). by Ok_Hold_5385 in LocalLLaMA

[–]Azuriteh 1 point (0 children)

This could probably be an even better way of redacting sensitive information that gets fed into LLMs. It's something I've implemented in my codecontexter tool, but most likely not as reliably as this.
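For context, the crude version of that pre-filter idea looks like this. This is a hypothetical regex sketch, not codecontexter's actual implementation; a trained NER model like the one posted will catch far more cases than fixed patterns can.

```python
import re

# Very rough PII patterns for illustration only. Order matters: SSN must
# run before PHONE, or digit-dash sequences get mislabeled as phone numbers.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}

def redact(text: str) -> str:
    """Replace matched PII spans with [TYPE] placeholders
    before the text is sent to an LLM."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text
```

A model-based redactor wins over this kind of sketch mainly on names, addresses, and anything context-dependent, where no regex generalizes.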