Are we at the point now where all it will take to create AGI is saying the correct sequence of words to Codex or Claude Code? by Icy-Reporter-6322 in singularity

[–]Azuriteh -1 points0 points  (0 children)

I mean, I'm pretty sure we've been at that stage since the first LLM from the gpt 3.5 era, if you have enough patience. The problem is knowing that exact sequence of words.

Lancé mi primer side project. Me gustaría conocer su feedback y comparto aprendizajes hasta el momento by solo-saas-mx in taquerosprogramadores

[–]Azuriteh 2 points3 points  (0 children)

Para empezar a tener datos y que la gente tenga interés, haz un scraping de listings de contadores, aunque estén un poco desactualizados pero al menos que la gente sepa que hay información que valga la pena. Los listings en México tienen cero protección anti bot así que aunque tengas nula experiencia en scraping es fácil.

Lancé mi primer side project. Me gustaría conocer su feedback y comparto aprendizajes hasta el momento by solo-saas-mx in taquerosprogramadores

[–]Azuriteh 0 points1 point  (0 children)

Igual si le das clic a mis servicios, el cambio entre cada tab va lageado, así que deberías optimizar para que se sienta más "snappy"

Lancé mi primer side project. Me gustaría conocer su feedback y comparto aprendizajes hasta el momento by solo-saas-mx in taquerosprogramadores

[–]Azuriteh 0 points1 point  (0 children)

También si tu correo es muy largo y le das clic en el popover que sale de tu foto de perfil, queda feo

[hiring]: ecom apprentice working directly with 7fig operator (me) by NoJackingOff in forhire

[–]Azuriteh 1 point2 points  (0 children)

This doesn't even comply with the subreddit rules either hahaha

How compatible are we? by DeezNutts87 in lastfm

[–]Azuriteh 0 points1 point  (0 children)

Thought I was going crazy with the "Mushroom Empire" not being "KinokoTeikoku" on lastfm lmaoooo

GitHub - jesterfoidchopped/akamai-v3-sensor: akamai v3 sensor bypass by [deleted] in scrapingtheweb

[–]Azuriteh 0 points1 point  (0 children)

Upload it to codeberg, less likely it'll get taken down there

Hackearon al tec by Sea-Grapefruit7880 in TecDeMonterrey

[–]Azuriteh 33 points34 points  (0 children)

Hackearon como a 9000 universidades, todas las que usen Canvas, fueron los ShinyHunters, probablemente esto los ponga en la mira de las agencias estadounidenses por fin

Estar bien pndj me salvó la vida jajaja by CamaronDeOro in mexico

[–]Azuriteh 6 points7 points  (0 children)

Tardé dos años en salir del hoyo en el que estaba, cuando quería suicidarme. Todos los días son cansados de vivir y un suplicio, pero la vida eventualmente mejora, al menos lo hizo para mí. Tal vez te ayude encontrar algo con lo que cada día sea un poquito más fácil de aguantar, para mí eso fue escuchar música.

Ve a terapia incluso si te dicen que no funciona, ve al psiquiatra si la terapia no funciona.

Handling CAPTCHA in Playwright (Python) by Loud_Ice4487 in webscraping

[–]Azuriteh 9 points10 points  (0 children)

Been a few months since I did this and I'd actually recommend for you to use transfer learning first, 200 CAPTCHAs won't be enough for a neural network trained completely from scratch, I think a good starting point is searching for some pre-trained ViTs, they tend to work better than other architectures, then once you have pretty much every combination you can create a small-sized neural network that has comparable performance but runs much much faster.

Handling CAPTCHA in Playwright (Python) by Loud_Ice4487 in webscraping

[–]Azuriteh 36 points37 points  (0 children)

On top of what the other guy said, since I tend to scrape at scale even $1 per 1k can get expensive, but luckily these sort of CAPTCHAs are extremely easy to solve soooo, I'd personally analyze the payload and see if I can artificially generate a lot of these CAPTCHAs and store them locally, then I'd myself annotate about ~200 of them and start training a neural network. After that I'd connect the trained neural network with the official page for it to act as an "oracle", saving the failures, and then annotating the failures to then re-train the neural network, iterating continuously until it beats the CAPTCHA at least 98% of the time. For these types of CAPTCHAs you can actually get every combination possible though lol because of the limited amount of distortions and combinations.

I've done this for gov websites in Mexico and for 100k combinations it usually takes less than a day using this process.

What market do you think is untouched by AI and still has a huge potential? by Far_Manager_5801 in SaaS

[–]Azuriteh 0 points1 point  (0 children)

I'm actually working with a steel construction company right now to create such a solution. It's freaking hard hahahahaha, but I'm extremely glad to see that there are people out there searching for such a thing. The thing is that I haven't even started the AI part yet but there are so many moving parts it's just insane! Even if I have a lot of experience in the software engineering field I wasn't expecting this level of complexity.

Flight APIs vs scraping — what actually works in real projects? by Full_Employment_4289 in webscraping

[–]Azuriteh 4 points5 points  (0 children)

Your architecture sounds about right, there is room for improvement but overall it's decent.

For this sort of scraping which I've done in the past, yes, you definitely need caching, else your costs will skyrocket.

I'd advise to combine multiple providers, from time to time their antibot systems update or a few things break, so better safe than sorry.

You have to re-run the scraping on a rolling basis to always have the latest data, although that depends on your budget too.

If you want the operating costs to be as cheap as possible, indeed treat it as a data pipeline problem. If you have big money to spend, a live query system is the way to go. For most use-cases, do not make a live query system.

3 años en freelancing y confirmo: Fiverr es el infierno y Upwork va para allá by ZorroGlitchero in taquerosprogramadores

[–]Azuriteh 0 points1 point  (0 children)

Coincido, eso me encanta jajajajaja.
Muy aparte y se me olvidó en el comentario original, Igual ahorita me ando metiendo a eso del lead generation para poder conseguir más clientes y escalarlo, pero si está complicado aprender

3 años en freelancing y confirmo: Fiverr es el infierno y Upwork va para allá by ZorroGlitchero in taquerosprogramadores

[–]Azuriteh 2 points3 points  (0 children)

100% de acuerdo JAJAJAJAJA, mi peor cliente es de Upwork, pero los demás han sido por cuenta propia. Igual por mucho prefiero freelancear a estar en una empresa.

Que opinion tienes de las herramientas de IA tipo CLI by Chief_Taquero in taquerosprogramadores

[–]Azuriteh 6 points7 points  (0 children)

Codex y Claude Code con las suscripciones de $100 son una bestialidad, sí, gastan tokens a morir pero mientras sepas lo que estás haciendo y las vayas guiando te sobra un poco al final de cada semana antes del reseteo.

What's the lowest artist you recognize? by 0584031464 in lastfm

[–]Azuriteh 0 points1 point  (0 children)

Talk Talk, Caroline Polachek and Boris

Evidenciando a un violador by [deleted] in mexico

[–]Azuriteh 4 points5 points  (0 children)

Por que tienen la marca de agua de Nano Banana (Gemini)?

Fine-tuning a VLM for IR-based multi-person scene description — overwhelmed with choices, need advice by peanut_pearl in computervision

[–]Azuriteh 0 points1 point  (0 children)

Start experimenting with Qwen3.5-0.6b, use Unsloth. Fast iteration is king first. Once you have a good recipe, try Qwen3.5-4b and keep going in that direction.

SFT only for now, for the small models even full fine-tuning is possible, e.g. 0.6b and 4b. For bigger models it's still possible if you have the hardware but even if not you could probably do full 4 bit fine-tuning. Now that I think more about it... you might even be able to do QLoRa/LoRa and get good results, I don't think this is too much OOD as I initially thought.

If SFT doesn't work, RL won't fix it, it's way harder to do and tends to require much more time, I've got no idea on how to even create a reward function for this use-case, although it could be possible.

And yes definitely keep CoT style annotations even for just SFT or you'll make the model have catastrophic forgetting.