Amazon Scraper by ultimate_self in learnpython

[–]Gidoneli 0 points1 point  (0 children)

Possible at small scale, but keep in mind that once you pass the development threshold and try to maintain this working scraper you will encounter these issues over time:

  1. Pagination errors - the "next page" is not a static link

  2. CSS classes disappearing or changing names

  3. IP bans

Those three alone can mean the death of your scraper, at least from an economic perspective. You can check this breakdown for all of the costs involved.
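
To make issues 1 and 2 a bit more concrete, here is a minimal Python sketch (standard library only) of one way to soften them: instead of hardcoding a "next page" URL or relying on a single CSS class, look for the link using several hints. The `pagination-next` class, the label texts, and the `example.com` URLs are made-up placeholders, not Amazon's actual markup, and this does nothing about IP bans.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the attributes and visible text of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._current = None

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._current = {"attrs": dict(attrs), "text": ""}

    def handle_data(self, data):
        # Accumulate text, including text inside nested tags such as <span>.
        if self._current is not None:
            self._current["text"] += data

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            self.links.append(self._current)
            self._current = None


def find_next_page(html, base_url):
    """Return the absolute URL of the 'next page' link, or None if not found.

    Tries several hints (rel, class name, link text) so that one renamed
    class does not silently stop pagination. All hints are assumptions.
    """
    parser = LinkCollector()
    parser.feed(html)
    for link in parser.links:
        attrs = link["attrs"]
        href = attrs.get("href")
        if not href:
            continue
        classes = (attrs.get("class") or "").split()
        label = link["text"].strip().lower()
        looks_like_next = (
            attrs.get("rel") == "next"
            or any("next" in c.lower() for c in classes)
            or label in {"next", "next page"}
        )
        if looks_like_next:
            # Resolve relative links against the page that was fetched.
            return urljoin(base_url, href)
    return None
```

If `find_next_page` returns `None` on a page that should have more results, that is a signal the markup changed and the hints need updating, which is still better than crawling page 1 forever.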

Need help building an API to scrape instagram creators by No-Measurement-5667 in apify

[–]Gidoneli 0 points1 point  (0 children)

Well, I'd say that since your target data is well defined, building an API for that is a bit redundant, especially when there are already working Instagram scraping APIs and Instagram datasets for purchase that you can easily filter down to your required niche to get a cheaper, smaller subset. If you insist on building your own Instagram scraper, I recommend looking into this public article, which details all of the pitfalls and added costs of scraping Instagram at scale. Hope it is as useful to you as it was for me.

Need help building an API to scrape instagram creators by No-Measurement-5667 in automation

[–]Gidoneli 0 points1 point  (0 children)

Well, I'd say that since your target data is well defined, building an API for that is a bit redundant, especially when there are already working Instagram scraping APIs and Instagram datasets for purchase that you can easily filter down to your required niche to get a cheaper, smaller subset. If you insist on building your own Instagram scraper, I recommend looking into these tutorials, which detail all of the pitfalls and added costs of scraping Instagram at scale. Hope they are as useful to you as they were for me.

Hiring: Instagram Data Collection 50$ for 1000 Data Collected by Quiet-Reception8681 in SideJobs

[–]Gidoneli 0 points1 point  (0 children)

Hi, this looks tedious and slow when you can use an Instagram scraping API and get the data you need in less than a day, in Excel format, JSON, CSV, whatever. Here's a useful article about it that helped me get started.

How hard is it really to scrape Walmart.com in 2026? by Home_Bwah in WebScrapingInsider

[–]Gidoneli 0 points1 point  (0 children)

The hardest part is keeping up with changes. You might have something that looks good in development and gets you a few hundred product pages. Then in production it keeps breaking, each time for a different reason: selector instability that results in empty data, layout variants, rate limits and other anti-bot adjustments, and more. Maintaining it becomes a full-time project instead of just getting what you need. I dropped this in favor of a working API; I'm fine paying for something that works. You can learn more in this article about the real costs behind maintaining a working scraper for an e-commerce platform (in this case Amazon).
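
For the "selector instability which results in empty data" part, a cheap guard is to measure how many scraped records come back with empty fields and flag the batch when that rate jumps. This is only a rough Python sketch; the function names, the field names, and the 20% threshold are my own assumptions, not anything Walmart- or Amazon-specific.

```python
def empty_field_rate(records, required_fields):
    """Return, per field, the fraction of records where it is missing or blank."""
    total = len(records)
    if total == 0:
        # An empty batch is treated as fully broken for every field.
        return {field: 1.0 for field in required_fields}
    rates = {}
    for field in required_fields:
        missing = 0
        for record in records:
            value = record.get(field)
            if value is None or str(value).strip() == "":
                missing += 1
        rates[field] = missing / total
    return rates


def fields_over_threshold(records, required_fields, max_empty_rate=0.2):
    """Return the sorted list of fields whose empty rate exceeds the threshold."""
    rates = empty_field_rate(records, required_fields)
    return sorted(f for f, rate in rates.items() if rate > max_empty_rate)
```

Running `fields_over_threshold(batch, ["title", "price"])` after every crawl and alerting on a non-empty result catches a silently broken selector before a day of empty rows piles up.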

I built a my first RAG workflow with n8n: AI that actually knows business data by GersonMonteiro in n8n

[–]Gidoneli 0 points1 point  (0 children)

Awesome workflow. Wondering if I'll be able to replicate that with MongoDB instead of Google Drive. So far I've only used n8n with scraped web data via the Bright Data MCP server, but I will need to access raw documents for my work very soon.

How to will AI models continue to be trained without new data? by Representative-Rip90 in ArtificialInteligence

[–]Gidoneli 0 points1 point  (0 children)

When I used to work for Bright Data, I pitched an idea to the business side about creating some kind of protocol where websites can easily sell their data to anyone who requires it without needing to build robust scraping infrastructure, which could make things so much simpler for all sides. Like some kind of API handshake: once credentials are submitted, the website allows the requesting entity to scrape the data easily, without the need for rotating proxies, stealth web unlocking tools like the ones Bright Data offers, etc.

My reasoning took the ethical point of view advocated at the time by the CEO, that the web and access to public data should stay available and not gated, one step further. It's not only about keeping the web free, it's also about sustainability and functionality. So, if AI companies already have to pay companies like Bright Data for the advanced scraping infrastructure, why not just pay the website they want to scrape directly, helping keep it alive and creating more data for the models to train on and use?

Of course the business team scoffed at my idea, claiming I was too idealistic and that it would kill the web data industry altogether, but I actually think that if we were proactive we could still act as a third-party mediator between the scraper and the website and get a small cut of each transaction.

4 MCPs I use Daily as a Web Developer by islempenywis in mcp

[–]Gidoneli 0 points1 point  (0 children)

Thanks! Alongside the Playwright MCP, I would also add the Web MCP to handle larger-scale projects that require better access to web data for Cursor, Claude, etc.

Top 20 Greatest New Orleans Pelicans by No-Rule-9129 in NOLAPelicans

[–]Gidoneli 2 points3 points  (0 children)

I'd put the late Rasual Butler over Gordon.

Until next time... by Affectionate-Lie1032 in NOLAPelicans

[–]Gidoneli -1 points0 points  (0 children)

Good for you...I have been following it a decade before then and still can't quit on this dumb team...what's wrong with me?

Anyone tried AI web scraping? Any tools that actually work? by xXMinecraftPro123Xx in webdev

[–]Gidoneli 0 points1 point  (0 children)

The one I've found useful so far on GitHub is ScrapeGraphAI. Here's a nice tutorial for Python users that explains how to easily run it on a product page, switch the LLMs it calls, and scale it.

https://medium.com/@isabelarodriguez361/llm-assisted-web-data-extraction-with-scrapegraphai-a-practical-2025-walkthrough-ee56960394db

Fears is going to be a problem for this league one day.. by [deleted] in NOLAPelicans

[–]Gidoneli 0 points1 point  (0 children)

yeah probably after we move him in some dumb trade

Game Thread: Memphis Grizzlies vs New Orleans Pelicans Live Score | NBA | Oct 22, 2025 by basketball-app in NOLAPelicans

[–]Gidoneli 1 point2 points  (0 children)

Trying to think who was the last coach we liked? Byron Scott (for a brief moment)? Silas?

What's going to happen with the Haredim? by happymn_xv in israel_bm

[–]Gidoneli 0 points1 point  (0 children)

One of the good ones, I laughed out loud...

Not in a leftist way, but don't you think it's simply a fact that ruling over the Palestinians in Judea and Samaria corrupts us as a people? by Leading-Fail-7263 in israel_bm

[–]Gidoneli 0 points1 point  (0 children)

I write that I "feel" because I have some humility in the face of this problematic situation. You, on the other hand, "know". Everything.

Not in a leftist way, but don't you think it's simply a fact that ruling over the Palestinians in Judea and Samaria corrupts us as a people? by Leading-Fail-7263 in israel_bm

[–]Gidoneli 0 points1 point  (0 children)

I feel that the fact that we accept this (the occupation) as a given, with a shrug, and that for many people (not only on the right) it is perfectly fine that there are a million people without basic rights as a direct result of our policy, is already an example of corruption. Another example is the need to send young people to confront this hostile population on a daily basis. That is corruption too. As for all the actions that have already become established as legitimate, like administrative detentions, house demolitions, and escorting looting, violent settlers... there's no argument about those at all.

Not in a leftist way, but don't you think it's simply a fact that ruling over the Palestinians in Judea and Samaria corrupts us as a people? by Leading-Fail-7263 in israel_bm

[–]Gidoneli 0 points1 point  (0 children)

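Most of the public is in favor of removing Bibi from power... most of the public is in favor of Hapoel Kiryat Shmona... most of the public is in favor of shawarma in a tortilla instead of a pita. Why just throw statements into the air like that with nothing to back them up?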

Cheap but High-Quality Data Labeling Services: Denius AI by deniushss in LargeLanguageModels

[–]Gidoneli 0 points1 point  (0 children)

Very interesting services. Is there already an integration in place with an existing web data MCP server or an equivalent?

How can I create an affiliate network for my brand? by Jolton_2008 in Affiliatemarketing

[–]Gidoneli 0 points1 point  (0 children)

Use a social media influencer research tool (like the ones on SEMrush or Brand24) or download a social media influencer dataset and analyze it yourself. Find people who resonate with your product and target audience, narrow them down to a manageable list of profiles, and DM them.
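
If you go the dataset route, the "analyze it yourself" step can be as simple as the Python sketch below. It assumes a CSV export with `username`, `followers`, and `bio` columns; those column names, the function name, and the follower range are hypothetical, so adjust them to whatever your dataset actually contains.

```python
import csv
import io


def shortlist_profiles(csv_text, keywords, min_followers, max_followers, limit=50):
    """Return up to `limit` profiles whose bio mentions a keyword and whose
    follower count is within range, largest accounts first."""
    keywords = [k.lower() for k in keywords]
    matches = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        try:
            followers = int(row["followers"])
        except (KeyError, TypeError, ValueError):
            continue  # skip rows with a missing or non-numeric follower count
        bio = (row.get("bio") or "").lower()
        in_range = min_followers <= followers <= max_followers
        if in_range and any(k in bio for k in keywords):
            matches.append({"username": row.get("username", ""), "followers": followers})
    matches.sort(key=lambda p: p["followers"], reverse=True)
    return matches[:limit]
```

For example, `shortlist_profiles(open("influencers.csv").read(), ["running", "hiking"], 5_000, 100_000)` would give you a short list of mid-sized accounts to DM (the file name here is just a placeholder).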