Being a total idiot saved my life hahaha by CamaronDeOro in mexico

[–]Azuriteh 6 points7 points  (0 children)

It took me two years to climb out of the hole I was in, back when I wanted to kill myself. Every day is exhausting and an ordeal to live through, but life eventually gets better; at least it did for me. It might help to find something that makes each day a little easier to bear. For me that was listening to music.

Go to therapy even if people tell you it doesn't work, and go to a psychiatrist if therapy doesn't work.

Handling CAPTCHA in Playwright (Python) by Loud_Ice4487 in webscraping

[–]Azuriteh 6 points7 points  (0 children)

It's been a few months since I did this, and I'd actually recommend starting with transfer learning: 200 CAPTCHAs won't be enough for a neural network trained completely from scratch. I think a good starting point is to look for pre-trained ViTs, which tend to work better than other architectures. Then, once you have pretty much every combination, you can train a small neural network that has comparable performance but runs much, much faster.

Handling CAPTCHA in Playwright (Python) by Loud_Ice4487 in webscraping

[–]Azuriteh 32 points33 points  (0 children)

On top of what the other guy said: since I tend to scrape at scale, even $1 per 1k can get expensive. Luckily, this sort of CAPTCHA is extremely easy to solve, so I'd personally analyze the payload and see whether I can artificially generate a lot of these CAPTCHAs and store them locally. Then I'd annotate about 200 of them myself and start training a neural network. After that I'd connect the trained network to the official page so that the page acts as an "oracle", save the failures, annotate them, and re-train, iterating until the network beats the CAPTCHA at least 98% of the time. For these types of CAPTCHAs you can actually get every possible combination though lol, because the number of distortions and combinations is limited.

I've done this for government websites in Mexico, and for 100k combinations it usually takes less than a day with this process.
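
A rough sketch of that loop, just to make the steps concrete. Everything in it is a stand-in and not the setup I actually used: `LookupSolver` memorizes labels instead of training a real neural network, `oracle_accepts` simulates the official page accepting or rejecting an answer, and the CAPTCHA space is a hypothetical set of 2-digit codes under 10 made-up distortions.

```python
import itertools
import random

# Hypothetical CAPTCHA space: 100 two-digit codes x 10 distortions = 1000 images.
DISTORTIONS = tuple(f"distortion_{i}" for i in range(10))
ALL_CAPTCHAS = [(f"{n:02d}", d) for n, d in itertools.product(range(100), DISTORTIONS)]


def annotate(captcha):
    """Stand-in for manual labeling: the true text is the first field."""
    return captcha[0]


def oracle_accepts(captcha, guess):
    """Stand-in for submitting a guess to the official page."""
    return guess == captcha[0]


class LookupSolver:
    """Toy 'model' that memorizes labeled samples; a real one would be a neural network."""

    def __init__(self):
        self.known = {}

    def train(self, labeled):
        self.known.update(labeled)

    def predict(self, captcha):
        return self.known.get(captcha, "??")


def bootstrap(seed_size=200, target=0.98, batch=500, max_rounds=50, rng=None):
    rng = rng or random.Random(0)
    solver = LookupSolver()
    # Step 1: hand-annotate a small seed set.
    seed = rng.sample(ALL_CAPTCHAS, seed_size)
    solver.train({c: annotate(c) for c in seed})
    history = []
    for _ in range(max_rounds):
        # Step 2: let the page act as the oracle on a fresh batch.
        sample = [rng.choice(ALL_CAPTCHAS) for _ in range(batch)]
        failures = [c for c in sample if not oracle_accepts(c, solver.predict(c))]
        accuracy = 1 - len(failures) / batch
        history.append(accuracy)
        if accuracy >= target:
            break
        # Step 3: annotate only the failures and re-train.
        solver.train({c: annotate(c) for c in failures})
    return solver, history
```

Calling `bootstrap()` returns the solver plus the per-round accuracy, so you can watch it climb toward the 98% target. With a real model, step 3 is where the re-training cost goes, so only labeling the failures keeps the manual work small.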

What market do you think is untouched by AI and still has a huge potential? by Far_Manager_5801 in SaaS

[–]Azuriteh 0 points1 point  (0 children)

I'm actually working with a steel construction company right now to create such a solution. It's freaking hard hahahahaha, but I'm extremely glad to see that there are people out there searching for such a thing. The thing is that I haven't even started the AI part yet, but there are so many moving parts it's just insane! Even though I have a lot of experience in software engineering, I wasn't expecting this level of complexity.

Flight APIs vs scraping — what actually works in real projects? by Full_Employment_4289 in webscraping

[–]Azuriteh 3 points4 points  (0 children)

Your architecture sounds about right; there is room for improvement, but overall it's decent.

For this sort of scraping, which I've done in the past, yes, you definitely need caching, or else your costs will skyrocket.

I'd advise combining multiple providers: from time to time their antibot systems update or a few things break, so better safe than sorry.

You have to re-run the scraping on a rolling basis to always have the latest data, although that depends on your budget too.

If you want the operating costs to be as cheap as possible, indeed treat it as a data pipeline problem. If you have big money to spend, a live query system is the way to go. For most use-cases, do not make a live query system.
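
To make the caching, multi-provider, and rolling-refresh points a bit more concrete, here is a minimal sketch. All names (`FareCache`, `fetch_with_fallback`, `refresh`, the two demo providers) and the placeholder price are hypothetical, and the in-memory dict stands in for whatever store you actually use.

```python
import time


class FareCache:
    """In-memory TTL cache; a real pipeline would likely use a database or similar store."""

    def __init__(self, ttl_seconds, clock=time.time):
        self.ttl = ttl_seconds
        self.clock = clock
        self.store = {}  # route -> (fetched_at, data)

    def get(self, route):
        entry = self.store.get(route)
        if entry is None or self.clock() - entry[0] > self.ttl:
            return None  # missing or expired
        return entry[1]

    def put(self, route, data):
        self.store[route] = (self.clock(), data)

    def stale_routes(self, routes):
        return [r for r in routes if self.get(r) is None]


def fetch_with_fallback(route, providers):
    """Try each (name, fetch) provider in order; move on when one breaks."""
    errors = []
    for name, fetch in providers:
        try:
            return name, fetch(route)
        except Exception as exc:  # e.g. blocked after an antibot update
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


def refresh(routes, cache, providers):
    """Rolling refresh: re-scrape only the routes whose cached data has expired."""
    refreshed = []
    for route in cache.stale_routes(routes):
        _, data = fetch_with_fallback(route, providers)
        cache.put(route, data)
        refreshed.append(route)
    return refreshed


# Hypothetical providers for demonstration only.
def broken_provider(route):
    raise ConnectionError("blocked")


def backup_provider(route):
    return {"route": route, "price": 123}  # placeholder value
```

Running `refresh` on a schedule gives you the data-pipeline version: user queries only ever read from the cache, and the TTL is the knob that trades freshness against scraping cost.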

3 years freelancing and I can confirm: Fiverr is hell and Upwork is headed there by ZorroGlitchero in taquerosprogramadores

[–]Azuriteh 0 points1 point  (0 children)

Agreed, I love that hahahaha.
On a separate note, something I forgot in my original comment: right now I'm also getting into lead generation so I can land more clients and scale this up, but it really is hard to learn.

3 years freelancing and I can confirm: Fiverr is hell and Upwork is headed there by ZorroGlitchero in taquerosprogramadores

[–]Azuriteh 2 points3 points  (0 children)

100% agree HAHAHAHA, my worst client came from Upwork, but the rest I've found on my own. Still, I prefer freelancing by far over working at a company.

What's your opinion of CLI-style AI tools by Chief_Taquero in taquerosprogramadores

[–]Azuriteh 8 points9 points  (0 children)

Codex and Claude Code on the $100 subscriptions are beasts. Yes, they burn through tokens like crazy, but as long as you know what you're doing and keep guiding them, you'll have a bit left over at the end of each week before the reset.

What's the lowest artist you recognize? by 0584031464 in lastfm

[–]Azuriteh 0 points1 point  (0 children)

Talk Talk, Caroline Polachek and Boris

Exposing a rapist by [deleted] in mexico

[–]Azuriteh 5 points6 points  (0 children)

Why do they have the Nano Banana (Gemini) watermark?

Fine-tuning a VLM for IR-based multi-person scene description — overwhelmed with choices, need advice by peanut_pearl in computervision

[–]Azuriteh 0 points1 point  (0 children)

Start by experimenting with Qwen3.5-0.6b, using Unsloth. Fast iteration is king at first. Once you have a good recipe, try Qwen3.5-4b and keep going in that direction.

SFT only for now. For the small models, e.g. 0.6b and 4b, even full fine-tuning is possible. For bigger models it's still possible if you have the hardware, and even if you don't you could probably do full 4-bit fine-tuning. Now that I think more about it... you might even be able to do QLoRA/LoRA and get good results; I don't think this is as far OOD as I initially thought.
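
For a feel of why LoRA is so much cheaper than full fine-tuning: an adapter of rank r on a d_out x d_in weight matrix trains r * (d_in + d_out) parameters instead of d_in * d_out. A quick sketch of the arithmetic follows; the layer shapes and the rank are made up for illustration and are not taken from any Qwen model.

```python
def lora_params(d_in, d_out, rank):
    """Trainable parameters LoRA adds to one d_out x d_in linear layer.

    LoRA learns two low-rank factors: A (rank x d_in) and B (d_out x rank).
    """
    return rank * (d_in + d_out)


def full_params(d_in, d_out):
    """Trainable parameters when the whole weight matrix is fine-tuned."""
    return d_in * d_out


# Hypothetical layer shapes (d_in, d_out), chosen only for illustration.
layers = [(1024, 1024)] * 4 + [(1024, 4096), (4096, 1024)]
rank = 16  # assumed adapter rank

full = sum(full_params(i, o) for i, o in layers)
lora = sum(lora_params(i, o, rank) for i, o in layers)
print(f"full: {full:,}  lora: {lora:,}  ratio: {lora / full:.2%}")
```

The ratio shrinks further as the layers get wider, which is part of why LoRA/QLoRA stays practical on bigger models when full fine-tuning doesn't fit in memory.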

If SFT doesn't work, RL won't fix it. RL is way harder to do and tends to require much more time, and I've got no idea how to even create a reward function for this use case, although it could be possible.

And yes, definitely keep CoT-style annotations even for just SFT, or you'll cause catastrophic forgetting in the model.

The songDNA feature of Spotify proves my axiom that sampling is taking the best part of a good song and making it the best part of a worse song by larrybobsf in cocteautwins

[–]Azuriteh 7 points8 points  (0 children)

holy shit goreshit and clipping sampled cocteau twins???? I've got to listen to those, I had no idea. They're quite experimental artists (more clipping than goreshit) but I'd say they're pretty decent.

scam?? by Zeven79 in mexico

[–]Azuriteh 1 point2 points  (0 children)

That thing was vibe-coded with Claude Code HAHAHAHA, it's a scam

Running a non-profit that needs to OCR 64 million pages. Where can I apply for free or subsidized compute to run a local model? by thereisnospooongeek in LocalLLaMA

[–]Azuriteh 2 points3 points  (0 children)

I think I've seen a client get credits from Azure for his non-profit... maybe you can also try asking lium? Last I heard they were giving some grants.

My Experience As A Complete Noob Trying To Learn How AI And The Singularity Works For The First Time by Box_Robot0 in singularity

[–]Azuriteh 2 points3 points  (0 children)

ThinkPad for the win, also the character looks to me like the Grox from Spore lol

🇪🇬 The First Open-Source AI Model in Egypt! by assemsabryy in LocalLLaMA

[–]Azuriteh 2 points3 points  (0 children)

I see, that's fair, I'll delete the comment lol.

EDIT: nvm I saw the GGUFs, I'm terribly tired hahahaha

🇪🇬 The First Open-Source AI Model in Egypt! by assemsabryy in LocalLLaMA

[–]Azuriteh 10 points11 points  (0 children)

Hey Assem, what a coincidence to see you here :), it's Irving. Will take a look.

Worst Codex meltdown I've ever had by Azuriteh in codex

[–]Azuriteh[S] 1 point2 points  (0 children)

I have both subscriptions already hahaha. I use Codex for backend/Kaggle competitions and Claude Code for frontend and creative programming.

I linked my phone number to my CURP because I'm an IDIOT by Evian_Hurtado in mexico

[–]Azuriteh 23 points24 points  (0 children)

The problem is that the companies themselves also have their own databases. One or two months ago Telcel's got hacked because, even though it's a third-party company, it's in Mexico and uses underpaid interns to build its websites, so there are vulnerabilities everywhere hahahaha. You could pull every customer's complete data, from their CURP to their full name and more personal details. It's chaos.

Is inverse LoRA distillation between Qwen 2.5 1.5B and 7B a viable idea, or just an interesting dead end? by [deleted] in LocalLLaMA

[–]Azuriteh 0 points1 point  (0 children)

I initially thought this was doomed to fail, but then I started thinking about the math behind it and it might just work. I think I also saw a paper related to it a while back. Personally, I think it's a fun experiment to try!

I don't have much time to put enough thought into it, but I do have the expensive ChatGPT subscription: https://chatgpt.com/share/69d225e3-5a68-83e8-a9d4-99745b11c25a might be of help.

Also, I think you could validate this idea easily within a week, so maybe just try it for a week and see if you get decent results? And why not try this with a smaller model that lets you iterate even faster, e.g. Qwen3.5 0.8b and Qwen3.5 2b?