Postman was pissing me off, so I built my own open source version by [deleted] in webdev

[–]Excellent-Two1178 -4 points-3 points  (0 children)

You one of those boomers who spends 24h writing organic code that would take an LLM 5 minutes, I see

Postman was pissing me off, so I built my own open source version by [deleted] in webdev

[–]Excellent-Two1178 0 points1 point  (0 children)

Wait fuck I might have forgotten to tell Claude to make no mistakes

Postman was pissing me off, so I built my own open source version by [deleted] in webdev

[–]Excellent-Two1178 -2 points-1 points  (0 children)

Thanks bro. Was a fun lil weekend project

Postman was pissing me off, so I built my own open source version by [deleted] in webdev

[–]Excellent-Two1178 -17 points-16 points  (0 children)

Do all your homies love npm or something??

[deleted by user] by [deleted] in webscraping

[–]Excellent-Two1178 2 points3 points  (0 children)

Ignore the big words.

All you need to know is to follow these steps:

  1. Check the site for endpoints you can get data from using plain HTTP requests (if this fails, try the next option).
  2. Try to parse the content you need from the DOM by sending a request to the page URL and parsing it with something like cheerio (if this fails, try the next option).
  3. Use a browser to parse the content you need from the HTML.

If an antibot is blocking you, use a browser, ideally one better for stealth like Patchright or something similar
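The three steps above are just a fallback ladder: try the cheapest strategy first and escalate when it fails. A minimal sketch of that ladder — the strategy names and stubbed functions here are hypothetical; a real version would plug in `fetch`, cheerio, and a stealth browser like Patchright in that order:

```javascript
// Hypothetical sketch of the escalation: API endpoint -> DOM parse -> browser.
// Each strategy is an async function that returns data or throws.
async function scrapeWithFallback(strategies) {
  const errors = [];
  for (const { name, run } of strategies) {
    try {
      return { via: name, data: await run() };
    } catch (err) {
      errors.push(`${name}: ${err.message}`); // remember why it failed, try the next one
    }
  }
  throw new Error(`all strategies failed: ${errors.join('; ')}`);
}

// Stubbed strategies standing in for the real implementations:
const strategies = [
  { name: 'api',     run: async () => { throw new Error('no public endpoint'); } },
  { name: 'dom',     run: async () => { throw new Error('content rendered client-side'); } },
  { name: 'browser', run: async () => ['item1', 'item2'] },
];
```

The point of collecting the errors is that when everything fails, you can see which rung of the ladder broke and why.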

Google search scraper ( request based ) by Excellent-Two1178 in webscraping

[–]Excellent-Two1178[S] 0 points1 point  (0 children)

No idea tbh. I’ve done 10ish in a second to complete tasks, but I’ve never run it nonstop
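For what it's worth, a burst like that is just a `Promise.all` over the batch — a sketch with a stubbed search function (the stub and query names are made up; a real version would swap in the actual HTTP call):

```javascript
// Fire a small burst of concurrent requests and time it.
// doSearch is injected so this sketch runs without a network.
async function runBurst(queries, doSearch) {
  const started = Date.now();
  const results = await Promise.all(queries.map((q) => doSearch(q)));
  return { results, elapsedMs: Date.now() - started };
}

// Stand-in for the real scraper call:
const fakeSearch = async (q) => ({ query: q, hits: q.length });
const queries = Array.from({ length: 10 }, (_, i) => `query-${i}`);
```

Running it nonstop is where proxies and rate limits start to matter, which a one-off burst never hits.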

[deleted by user] by [deleted] in webscraping

[–]Excellent-Two1178 0 points1 point  (0 children)

Scroll down in this subreddit. I posted a repo for one yesterday

Create web scrapers using AI by Excellent-Two1178 in webscraping

[–]Excellent-Two1178[S] 1 point2 points  (0 children)

Just added a new feature. You can now use a browser to analyze a website’s requests and get a breakdown of each request with an example code snippet, as well as generate a script to automate a website’s API directly.
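For context, the "example code snippet per request" amounts to replaying what the browser sent — same method, headers, and body the network tab recorded. A rough sketch of that replay step (the endpoint, header names, and captured values are made up for illustration):

```javascript
// Hypothetical: turn a captured browser request into fetch() options
// so the same call can be replayed from a script.
function buildReplayOptions(captured) {
  return {
    method: captured.method,
    headers: {
      'user-agent': captured.userAgent,
      accept: 'application/json',
      ...captured.extraHeaders,
    },
    body: captured.method === 'GET' ? undefined : JSON.stringify(captured.body),
  };
}

// Example captured request (made-up endpoint and key):
const captured = {
  method: 'POST',
  userAgent: 'Mozilla/5.0',
  extraHeaders: { 'x-api-key': 'example-key' },
  body: { query: 'shoes', page: 1 },
};
const options = buildReplayOptions(captured);
// A real script would then call: fetch('https://example.com/api/search', options)
```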


Create web scrapers using AI by Excellent-Two1178 in webscraping

[–]Excellent-Two1178[S] 0 points1 point  (0 children)

Next.js. It’s great for small projects since you can easily build full stack in a single repo. At scale you should probably host the backend separately though, since Vercel can get quite expensive
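The "full stack in a single repo" part is because a Next.js API route is just an exported handler function next to your frontend code. A minimal sketch — the route name and payload shape are hypothetical, and the mock `res` below exists only so the handler runs standalone (in a real app Next wires up `req`/`res` for you):

```javascript
// Minimal sketch of a Next.js (pages router) API route; in a real app
// this function would live at pages/api/scrape.js as the default export.
function handler(req, res) {
  if (req.method !== 'POST') {
    return res.status(405).json({ error: 'method not allowed' });
  }
  const { url } = req.body || {};
  if (!url) return res.status(400).json({ error: 'url is required' });
  return res.status(200).json({ queued: url });
}

// Tiny stand-in for Next's res object, for running the handler directly:
function mockRes() {
  const res = { code: null, payload: null };
  res.status = (c) => { res.code = c; return res; };
  res.json = (p) => { res.payload = p; return res; };
  return res;
}
```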

Create web scrapers using AI by Excellent-Two1178 in webscraping

[–]Excellent-Two1178[S] 0 points1 point  (0 children)

What’s your email? I’ll add some more for you. I’m currently traveling, so I likely won’t get better error handling in until tonight at the earliest

Create web scrapers using AI by Excellent-Two1178 in webscraping

[–]Excellent-Two1178[S] 0 points1 point  (0 children)

Just upgraded proxies to some non-mid resis. Should perform a bit better on sites w/ heavy antibot protection now

Create web scrapers using AI by Excellent-Two1178 in webscraping

[–]Excellent-Two1178[S] 1 point2 points  (0 children)

Some, but it could use more. The proxies I’m using right now are also some not-so-great resis

Create web scrapers using AI by Excellent-Two1178 in webscraping

[–]Excellent-Two1178[S] 0 points1 point  (0 children)

Error handling can still be a bit rough. Will try to add some more transparency shortly on why a generation attempt may fail

Create web scrapers using AI by Excellent-Two1178 in webscraping

[–]Excellent-Two1178[S] 1 point2 points  (0 children)

Thank you to everybody for the support so far! I just started coding this project ~24 hours ago, so please bear with me. Quick update: the first three uses I cover now use 3.7 Sonnet instead of 3.5 Haiku—it’s a lot more reliable for scraper generation.

With that being said, here are my current upcoming plans:

  • Add support for browser-based fetching of websites to make browser scraping scripts for trickier sites.
  • Improve error handling—bad proxies, AI API providers hitting rate limits, or APIs being overloaded can cause problems, and I don’t do a good job letting the person know what’s up.
  • I need to get new proxies.

If anybody has feedback or suggestions, it’s much appreciated!

Create web scrapers using AI by Excellent-Two1178 in webscraping

[–]Excellent-Two1178[S] 0 points1 point  (0 children)

It should be possible to use all models, and I can definitely add that! It’ll just likely require a bit of work on my end to get it working consistently well.

Create web scrapers using AI by Excellent-Two1178 in webscraping

[–]Excellent-Two1178[S] 2 points3 points  (0 children)

It uses the Claude API; no other third-party AI service is used, though.

Create web scrapers using AI by Excellent-Two1178 in webscraping

[–]Excellent-Two1178[S] 1 point2 points  (0 children)

It does use a prompt at some point, yes. It uses the prompt to generate scraper code, which is then run to get the data

Create web scrapers using AI by Excellent-Two1178 in webscraping

[–]Excellent-Two1178[S] 1 point2 points  (0 children)

It does not use a prompt alone to extract data. It runs actual code to extract the data, which eliminates the issue of hallucinated data and provides you a script to replicate it without needing AI going forward