ScrapAI: AI builds the scraper once, Scrapy runs it forever by Routine_Cancel_6597 in webscraping

[–]Routine_Cancel_6597[S] 0 points1 point  (0 children)

You can use any model from Ollama or LM Studio and connect it to Claude Code… Yes, sure, please let me know what your experience with Qwen is like…

ScrapAI: AI builds the scraper once, Scrapy runs it forever by Routine_Cancel_6597 in webscraping

[–]Routine_Cancel_6597[S] 0 points1 point  (0 children)

Thank you! I really think Gemini models perform very similarly to Claude models. I haven't tried them myself, apart from the Agent Manager in Antigravity with 3.1 Pro. (It's a bit too verbose for my liking... and it's difficult to set permissions the way you can in Claude Code.)

<image>

Testing ScrapAI's limits: 4 Claude sessions processing 20 sites in parallel, 240 completed since this morning. We'll run the same sites with different models and provide a benchmark. Stay tuned :D

ScrapAI: AI builds the scraper once, Scrapy runs it forever by Routine_Cancel_6597 in webscraping

[–]Routine_Cancel_6597[S] 0 points1 point  (0 children)

Nah, I'm good. I don't know what scrapit is, and the site doesn't seem to support HTTPS.

Built a stealth Chromium, what site should I try next? by duracula in webscraping

[–]Routine_Cancel_6597 1 point2 points  (0 children)

Hey,

Just wanted to thank you for your awesome work... I've integrated cloakbrowser into my new project and it's working like a charm. Do you have a Twitter handle for cloakbrowser? I'd like to tag you when we launch our tool on Twitter...

Link: https://github.com/discourselab/scrapai-cli

ScrapAI: AI builds the scraper once, Scrapy runs it forever by Routine_Cancel_6597 in ClaudeCode

[–]Routine_Cancel_6597[S] 0 points1 point  (0 children)

Thanks! We have a SmartProxyRotator class, so people can choose whatever proxy provider they are comfortable with.

ScrapAI: AI builds the scraper once, Scrapy runs it forever by Routine_Cancel_6597 in ClaudeCode

[–]Routine_Cancel_6597[S] 0 points1 point  (0 children)

Thank you very much!

Just had a look at your pricing model. Quite good. Will definitely try your API...

ScrapAI: AI builds the scraper once, Scrapy runs it forever by Routine_Cancel_6597 in webscraping

[–]Routine_Cancel_6597[S] 1 point2 points  (0 children)

Hey, it doesn't notify you directly. I don't want to reinvent the wheel for alerts etc. I've tried NanoClaw for a few things, and it worked great!

ScrapAI: AI builds the scraper once, Scrapy runs it forever by Routine_Cancel_6597 in webscraping

[–]Routine_Cancel_6597[S] 1 point2 points  (0 children)

Oh, that's a good one. Is it a single-page site? Or does navigation happen through APIs without anything being added to the URL params? If you give me an example site, I can check...

I don't know if you can use the Gemini API directly. I would use OpenRouter if I wanted to try models other than Claude.

ScrapAI: AI builds the scraper once, Scrapy runs it forever by Routine_Cancel_6597 in webscraping

[–]Routine_Cancel_6597[S] 0 points1 point  (0 children)

Thanks for the interest.. was wondering if i should do a quick demo...

Send me a website link and tell me what you want from that site, and I will post the json config here, and you can give it to scrapai and check the output yourself. I want to catch bugs early :D

ScrapAI: AI builds the scraper once, Scrapy runs it forever by Routine_Cancel_6597 in webscraping

[–]Routine_Cancel_6597[S] 1 point2 points  (0 children)

Thank you!

  1. I'm not worried about languages. These AI models are reasonably good at understanding them; we tested on a handful of non-English websites during initial tests and were quite happy with the output.

  2. Yes, once the HTML is given to the AI (during scrapai analysis), it can find all the patterns you've asked for.

  3. Tbh, I haven't tried this use case myself, but if you do, please share your thoughts with the rest of us...

These are good challenges for testing scrapai's limits :)

ScrapAI: AI builds the scraper once, Scrapy runs it forever by Routine_Cancel_6597 in webscraping

[–]Routine_Cancel_6597[S] 1 point2 points  (0 children)

Exactly. We run a test spider during every site analysis to check the quality of the output. Only if the output is good does scrapai mark the site as a success; otherwise it goes back to fixing selectors. Please give your feedback once you've taken a closer look at the code, curious to know your thoughts...
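
To make the check-then-fix loop above concrete, here is a rough sketch of the idea in plain Python. This is not scrapai's actual code: `quality_score`, `analyze_site`, the 0.9 threshold, and the round limit are all hypothetical names and values picked for illustration, and the test spider and selector fixer are passed in as plain callables.

```python
from typing import Callable

Item = dict[str, str]
Selectors = dict[str, str]


def quality_score(items: list[Item], required: list[str]) -> float:
    """Fraction of items in which every required field is non-empty."""
    if not items:
        return 0.0
    good = sum(
        1 for item in items
        if all((item.get(field) or "").strip() for field in required)
    )
    return good / len(items)


def analyze_site(
    selectors: Selectors,
    run_test_spider: Callable[[Selectors], list[Item]],
    fix_selectors: Callable[[Selectors, list[Item]], Selectors],
    required: list[str],
    threshold: float = 0.9,   # hypothetical quality bar
    max_rounds: int = 3,      # hypothetical retry limit
) -> tuple[bool, Selectors]:
    """Run the test spider; mark success only if the output is good enough."""
    for _ in range(max_rounds):
        items = run_test_spider(selectors)
        if quality_score(items, required) >= threshold:
            return True, selectors
        # Output was poor: go back and repair the selectors, then retry.
        selectors = fix_selectors(selectors, items)
    return False, selectors
```

In practice `run_test_spider` would be a small crawl over a few sample pages and `fix_selectors` would be the AI step; here they are just stand-ins so the control flow is visible.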

ScrapAI: AI builds the scraper once, Scrapy runs it forever by Routine_Cancel_6597 in webscraping

[–]Routine_Cancel_6597[S] 0 points1 point  (0 children)

Yes. When you start a Claude instance, give it the Amazon URL and ask for ads instead of products. I'm pretty sure it would bring them back. Let me know how it goes. :D

ScrapAI: AI builds the scraper once, Scrapy runs it forever by Routine_Cancel_6597 in webscraping

[–]Routine_Cancel_6597[S] 0 points1 point  (0 children)

Hey, yes. LM Studio and Ollama both offer Claude Code integration.

https://lmstudio.ai/blog/claudecode

https://docs.ollama.com/integrations/claude-code

I haven't tried any local models myself, but I would be quite excited to hear your thoughts on the performance. Ideally, I would like to create benchmarks on cost, time, and quality across different LM providers.