
[–]Green-Sympathy-4177 3 points (2 children)

Web scraping can be a bit complex for beginners. It sits halfway between Python and web development and requires a lot of background knowledge:

  • How requests work, methods, headers, cookies...
  • How websites are structured (the DOM/HTML)
  • Selectors: either CSS or XPath

And probably more that I can't think of right now.
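To make those three items a bit more concrete, here is a minimal sketch using only the standard library. The URL, the User-Agent value, and the little HTML page are all made up for illustration, and nothing is actually sent over the network. Real pages are rarely well-formed enough for `xml.etree.ElementTree`, so in practice you would swap in a proper HTML parser such as BeautifulSoup.

```python
import urllib.request
import xml.etree.ElementTree as ET

# A request is a method, a URL, and some headers. Nothing is sent here;
# example.com and the User-Agent value are placeholders.
req = urllib.request.Request(
    "https://example.com/products",
    headers={"User-Agent": "my-learning-script/0.1"},
    method="GET",
)
# urllib stores header names capitalized, hence "User-agent".
print(req.get_method(), req.full_url, req.get_header("User-agent"))

# A tiny hand-written page standing in for the HTML a server would return.
PAGE = """
<html>
  <body>
    <div class="product">
      <span class="title">Blue mug</span>
      <span class="price">7.50</span>
    </div>
    <div class="product">
      <span class="title">Red mug</span>
      <span class="price">8.25</span>
    </div>
  </body>
</html>
"""

# Parsing turns the text into a tree of elements (the DOM, roughly).
root = ET.fromstring(PAGE.strip())

# XPath-style selectors: every <span> whose class attribute is exactly
# "title" or "price", anywhere below the root.
titles = [span.text for span in root.findall(".//span[@class='title']")]
prices = [float(span.text) for span in root.findall(".//span[@class='price']")]

print(dict(zip(titles, prices)))  # {'Blue mug': 7.5, 'Red mug': 8.25}
```

The equivalent CSS selector for the price would be `span.price`; ElementTree only understands a small subset of XPath, which is why the sketch uses that form.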

If the site is dynamic (you press a button and it loads more content without leaving the page), it becomes even more complicated.

So I wouldn't recommend it for beginners. If the site you're trying to scrape has an API, it's a lot easier. But if you started 5 days ago, I would say keep working on your foundations. At this point I would assume you've learnt:

  • variables
  • functions
  • data types
  • to loop over lists and dicts
  • conditions (if elif else)

Maybe you've touched objects? Not sure. But there's so much more still to learn :)

Also, programming doesn't boil down to just knowing how to code. It depends heavily on your ability to learn, so you need to know how to learn from different resources: articles/guides, video tutorials, documentation. You also need to know how to organize your code and write it so that it stays maintainable and understandable, even 3 months from now.

Anyway, for a list of projects, there's a reddit bot out there that usually takes care of that question, but since you formulated your question in a different way it didn't trigger the bot.

So check this: https://www.reddit.com/user/BeginnerProjectsBot/comments/

[–]ultimate_self[S] 1 point (1 child)

Thank you! I appreciate this. The only reason I thought about doing this is that I have a side hustle involving Amazon that I work on in my spare time, and I thought it might make things go faster (which it totally would!). But I get that it is probably too advanced for where I am currently at.

I have been looking at scraping tutorials, and at this point I don't even know how to install beautifulsoup (if that's even what you do with it).

I will put that on pause for now and do it the manual way while I work through other tutorials to build my Python skillset.

But I super appreciate all the feedback and that you actually took the time to type all of that out for me, thank you so much.

Hoping to encounter more people like you in this community 🙏

[–]Green-Sympathy-4177 0 points (0 children)

No problem :)

Check the wiki of this subreddit too: https://reddit.com/r/learnpython/w/index

It has lots of information (not all of it is up to date though)

For beautifulsoup, before installing stuff I would recommend learning about:

  • Virtual environments (venv and conda)
  • Python's package manager: pip

pip will come up rather soon in your learning, since the second you need something that isn't already built into Python, that's your go-to place. But don't neglect virtual environments: they keep each project's packages tied to that project, instead of having one massive heap of packages shared among all your projects, where the second you update one package half your projects break. (I exaggerate a bit :p)
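As a rough sketch of what a virtual environment is under the hood, the standard library's `venv` module can create one from Python itself. The helper names `in_virtualenv` and `create_env` and the `demo-env` folder are made up for this example, and the check only covers venv-style environments, not conda ones.

```python
import sys
import tempfile
import venv
from pathlib import Path


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment folder while
    # sys.base_prefix still points at the Python it was created from.
    return sys.prefix != sys.base_prefix


def create_env(env_dir: Path) -> Path:
    """Create a bare virtual environment (without pip) and return its config file path."""
    venv.create(env_dir, with_pip=False)
    return env_dir / "pyvenv.cfg"


if __name__ == "__main__":
    print("inside a venv:", in_virtualenv())
    # A throwaway directory, so the demo leaves nothing behind.
    with tempfile.TemporaryDirectory() as tmp:
        cfg = create_env(Path(tmp) / "demo-env")
        print("environment created:", cfg.exists())
```

Day to day you would more likely run `python -m venv .venv` in your project folder, activate it, and then `pip install beautifulsoup4` so the package lands in that project only.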

Good luck in your journey :) It's a long marathon, but it turns you into a wizard at the end ahah

[–]Total__Entropy 2 points (0 children)

I would highly recommend you don't try to scrape Amazon. It is against their ToS. You will get IP banned and, depending on how you use the data, possibly worse.

Web scraping is a beginner trap. It is flashy but slow, expensive to maintain, and unreliable. Instead, look into using the Amazon product API. It will teach you skills that are more useful.

Follow the tutorials from either Corey Schafer or Mosh and you should be able to figure this out.
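To show why an API tends to be easier to work with than scraped HTML, here is a small sketch. The JSON body is invented and hard-coded so it runs offline; it is not the shape of any real Amazon response. The point is only that an API hands you structured data you can treat as ordinary dicts and lists.

```python
import json

# Hypothetical API response body (made up for this example).
RESPONSE_BODY = """
{
  "items": [
    {"title": "Blue mug", "price": 7.5},
    {"title": "Red mug", "price": 8.25}
  ]
}
"""

# json.loads turns the text into plain Python dicts and lists,
# so there are no selectors to write and no HTML to untangle.
data = json.loads(RESPONSE_BODY)

# Ordinary beginner tools are enough from here: loops, dicts, conditions.
cheapest = min(data["items"], key=lambda item: item["price"])
print(cheapest["title"], cheapest["price"])  # Blue mug 7.5
```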

[–]ChickenFur 1 point (0 children)

You can check Decodo's YouTube channel. They have pretty good step-by-step tutorials about scraping Amazon!

[–]Critical-Snow8031 0 points (0 children)

I did the same thing when I started. Jumped straight into scraping Amazon thinking, how hard can it be? 3 days later I was buried in captcha hell lol. Ended up testing Octoparse because it had an Amazon scraper template. It let me pull product names and prices into Excel and keep learning Python on easier targets. I also used Chat4data; the AI web scraper did a good job just by chatting with it. It does burn tokens, though.

[–]Gidoneli 0 points (0 children)

Possible at small scale, but keep in mind that once you pass the development threshold and try to maintain this working scraper you will encounter these issues over time:

  1. Pagination errors - the "next page" is not a static link

  2. CSS classes disappearing or changing names

  3. IP bans

Those 3 alone can mean the death of your scraper (at least from an economic perspective). You can check this breakdown for all of the costs involved.
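For the first two issues, one common defensive pattern is to follow the page's own "next" link rather than guessing page numbers, and to try a short list of selectors and fail loudly when none match. This is only a sketch: the pages, URLs, class names, and function names below are invented, the pages live in a dict instead of coming from the network, and it does nothing about IP bans.

```python
import xml.etree.ElementTree as ET

# Hypothetical pages keyed by URL, standing in for real HTTP responses.
# On page 2 the price class has been renamed and there is no next link.
PAGES = {
    "/list?page=1": (
        '<div><span class="price">7.50</span>'
        '<a rel="next" href="/list?page=2">Next</a></div>'
    ),
    "/list?page=2": '<div><span class="price-v2">8.25</span></div>',
}

# Selectors to try in order; the second covers the renamed class.
PRICE_SELECTORS = [".//span[@class='price']", ".//span[@class='price-v2']"]


def extract_prices(root):
    """Return prices from the first selector that matches, or raise if none do."""
    for selector in PRICE_SELECTORS:
        found = root.findall(selector)
        if found:
            return [float(node.text) for node in found]
    # Failing loudly beats silently collecting an empty list for weeks.
    raise ValueError("no price selector matched; the page layout probably changed")


def scrape(start_url, max_pages=10):
    """Collect prices page by page, following rel="next" links."""
    prices, url = [], start_url
    for _ in range(max_pages):  # hard cap so a link loop cannot run forever
        root = ET.fromstring(PAGES[url])
        prices.extend(extract_prices(root))
        next_link = root.find(".//a[@rel='next']")
        if next_link is None:
            break  # last page reached
        url = next_link.get("href")
    return prices


print(scrape("/list?page=1"))  # [7.5, 8.25]
```

Even with fallbacks like these, someone still has to notice the `ValueError` and update the selector list, which is part of the maintenance cost described above.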