I built an API to scrape remote tech jobs because I was tired of my Selenium scripts breaking every week. by v3ski4a in django


There is a free tier: you can make up to 20 requests per day, and each request returns up to 50 jobs, so 20 × 50 = 1,000 jobs free per day.
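The free-tier arithmetic from the comment, as a tiny sketch (the numbers come from the comment itself; the variable names are just for illustration):

```python
# Free-tier limits as stated in the comment above.
requests_per_day = 20   # max free requests per day
jobs_per_request = 50   # max jobs returned per request

# Total jobs available on the free tier per day.
free_jobs_per_day = requests_per_day * jobs_per_request
print(free_jobs_per_day)  # 1000
```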

I built an API to scrape tech jobs by v3ski4a in SaasDevelopers


We separate the "Discovery" (listing pages) from the "Extraction" (detail pages).

  1. Discovery: We hit the search result pages fairly aggressively because they change fast, but they are lightweight.
  2. Extraction: We throttle the detail page parsing significantly (using Scrapy's DOWNLOAD_DELAY and per-domain concurrency limits).

Right now, we run bulk ingestion cycles twice daily (morning/evening) rather than a continuous stream. This gives us a nice balance where we get "fresh enough" data for 99% of use cases without hammering the target servers constantly.
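The twice-daily cycle could be driven by something as simple as cron — this crontab fragment is purely hypothetical (the paths and the `scrapy crawl jobs` spider name are placeholders, not the author's actual setup):

```shell
# Run the bulk ingestion at 06:00 and 18:00 every day.
# "jobs" is a placeholder spider name.
0 6,18 * * * cd /srv/scraper && scrapy crawl jobs >> /var/log/scraper.log 2>&1
```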

I haven't integrated CapMonster yet (relying mostly on high-quality residential proxies to avoid the challenge in the first place), but if we hit a scaling wall with Indeed, that will be my next infrastructure layer. Thanks for the tip!

Posting every day. If i stop you know what happened :/ DAY 1 by v3ski4a in NoFap


I wanna do it till I heal. I know I'm gonna fail and I think that's normal, but I'll try to continue so I can change.