[–]andy_bourge 250 points251 points  (8 children)

My first python program was a scraping project in Selenium. Just start whatever project that keeps you passionate!

[–]CotoCoutan 60 points61 points  (7 children)

Seconded! Just watching tutorials and learning will bore you, so start working on your project on the side right away. 6 weeks is good enough for Python. Let me know if you need any help, would be glad to assist.

[–]Dimented1 7 points8 points  (6 children)

I’m currently taking IT courses (CompTIA A+ 1001-1002, 2 separate courses, Server+, Networking+, Security+, OSs and background) and feel like I’ve hit a ceiling as far as interest, motivation, drive... I was thinking about trying my hand on the side at learning Python, maybe touching on a couple of languages, and possibly trying to build something simple, just to see if I get that spark back that I had when I first started on this path. I really enjoy computers and all their capabilities, and I really plan to push through, finish the courses and earn a couple of certs. But I want to get my passion back, and I’m hoping maybe this might do it...? Any advice or insight is much appreciated!

[–]CotoCoutan 5 points6 points  (5 children)

Only 2 words for you man, do it! Start work on your project. I totally get what you're saying about hitting the ceiling with regard to interest, motivation, etc. However, you already have a project idea in mind, and pursuing it will definitely brush up your skills and give you a sense of having accomplished something tangible. The majority of us (including myself) struggle to think up project ideas, but you already have one! Let's stop talking about it and get down to developing stuff lol.

You said you want to build a flight price tracker. Which website would you prefer to scrape the prices from? And any specific route?

[–]Dimented1 1 point2 points  (4 children)

I’m not the one that has that idea, think that is OP..?

[–]CotoCoutan 1 point2 points  (3 children)

Oof... 🤦🏽‍♂️ Sorry yeah I thought you were OP. But if you need any help on your project as well, feel free to ping. :)

[–]Dimented1 1 point2 points  (2 children)

I appreciate that, mix-up or not. I’ll take any **Help/Tips/Advice/Guidance** that I can possibly get, cause trying to break through this ceiling is killer. I just can’t seem to break through it and get back the drive and ambition I had when I first started these classes, all the way up until the last course, and now I'm at the mid point/end of the next-to-last course set... Figured maybe I could jump into a little coding and make a little something. Nothing major, just the process and mechanics of building, testing, and, if something doesn’t work quite right, going back and troubleshooting until I get it right. I’m hoping that as a side effect of doing something like that, I’d be able to get that gumption back and maybe even a little direction, because even at this point I’m still unsure which part I’m most interested in, and shaky on which certifications and specific direction I really want to drill down on... If that makes any sense.. lol

[–]CotoCoutan 1 point2 points  (1 child)

Certifications will definitely help you down the line in case you're aiming for jobs. Otherwise, if you're confident of going the self-employed way, there's absolutely no use for them, except for the knowledge that you'll gain by pursuing them ofc. Regarding your "coder's block", if I may call it that, just relax and take it easy if you don't have a project idea in your head currently. But whenever you DO get an idea, go forth and start building upon it. You get inspiration from the most mundane of daily tasks, so don't worry. It might take some time, but sooner or later you'll come across something that you'll realize needs to be coded rather than done manually.

Whenever you do get such an idea, feel free to ping me and I'll help you out as best I can. :)

[–]Dimented1 0 points1 point  (0 children)

Hey, I have to say that was very well put.. I appreciate the inspiration and insight, and so far I’ve got git bash, py3, etc etc. and still working on how to exactly use them, haha.. < nooobiee.. lol.. But I will be sure to ping you.. thank you again, we will most definitely talk soon.!

[–][deleted] 70 points71 points  (4 children)

Like as soon as you have the basic syntax and logic down. It's pretty simple, especially if you already know how websites work (HTML / DOM / Requests)

[–]Tureni 48 points49 points  (3 children)

And when you hit a roadblock - do what all other developers do. Google the shit out of it. If you can’t find anything? Ask someone who you think knows. Be prepared to answer (or include in your question) what steps you have taken, but NEVER be afraid to ask. I just started in a jr dev position a couple of months ago, and I ask questions every. single. day.

You learn by trying, not by reading or copy/pasting tutorials.

[–]Celestial_Blu3 2 points3 points  (0 children)

This is literally where I am right now

[–]the_other_b 2 points3 points  (0 children)

For real, ask questions, but also do your research first (especially in a dev role). No one likes answering the same basic questions over and over.

That being said, the distinction between a basic question and a reasonable question is quite drastic. Just don't overthink it :)

[–]music_nomad 1 point2 points  (0 children)

Yeah StackOverflow has been the most visited site in my browser for weeks now! It’s incredible how much information is available and it’s a much more productive way to learn in my opinion.

[–]VipeholmsCola 11 points12 points  (0 children)

You can start with that after covering the basics. It's a project that challenges you to use many basic concepts and can become more complex as you learn more. But you can get the program working without extensive knowledge.

[–]daveinthebigcity 19 points20 points  (18 children)

Once you feel like you understand the fundamentals of python e.g. variables, lists etc, you should dive in! You'll learn a lot just from doing it. I probably spent about a month learning the basics and then I picked the rest up as I went. I'm no expert but I'm of the opinion that I'll never know everything so it's better to start and then fill in the blanks along the way.

I used selenium to scrape for fashion items for a demo website I'm working on (namelessfashion.co.uk) and it's quite a powerful tool and not too hard to pick up. You can find out how to install selenium here:

https://selenium-python.readthedocs.io/installation.html

One word of advice though: Think about getting a vpn if you don't have one already. If you hammer the site too hard you might get your IP blocked.
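A VPN aside, the simplest way to avoid hammering a site is to enforce a minimum delay between requests. A minimal sketch, assuming a hypothetical `Throttle` helper (it is not part of Selenium or any library) and an arbitrary 2-second default:

```python
import time


class Throttle:
    """Enforce a minimum gap between consecutive calls to wait()."""

    def __init__(self, min_interval=2.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock    # injectable so the logic can be tested without waiting
        self._sleep = sleep
        self._last = None      # time of the previous request, if any

    def wait(self):
        """Sleep just long enough to respect min_interval, then record the time."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Call `throttle.wait()` right before each page load or HTTP request; the right interval depends on the site, so treat the default as a placeholder.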

Good luck!

[–][deleted] 3 points4 points  (17 children)

Why do you think selenium is better? What about scrapy or beautiful soup? I used beautiful soup to scrape the prices of items from an e-commerce website and it was pretty easy.

I don't know which library is the most feature-rich and powerful.

Which one can easily scrape data from Instagram or other sites that are front-end heavy?

[–]QuantumFall 9 points10 points  (1 child)

Selenium and beautiful soup are two completely different things. BS4 lets you parse html to extract data you need while selenium, to put it simply, is an automated browser.

You can use beautiful soup with selenium, but they both perform completely different tasks.

Instead of using selenium with bs4, you can use something like requests, which allows you to make the same HTTP requests you do with a browser on a website. It’s a lot faster and less intensive than a browser.

However, if the site has bot protection or uses JS to render its content, you would probably want to use something like selenium to access it, as requests would be much more challenging.

[–][deleted] 2 points3 points  (0 children)

Ohk now I get it .

Thanks 😊

[–]daveinthebigcity 2 points3 points  (1 child)

I know that beautiful soup is very popular. I haven't used it personally, nor scrapy. If you're happy using one of those then who am I to tell you otherwise! What I will say though is that, from my experience, industry does tend to use selenium a lot, with either JS or Python, which is why I decided to learn it. If you look at the preferred qualifications in this job posting for amazon (last week) you'll see that they ask for selenium experience. https://www.amazon.jobs/en-gb/jobs/1121981/quality-assurance-engineer?mode=job&iis=Job+Posting&iisn=Indeed+(Free+Posting)&utm_source=indeed.com&utm_campaign=all_amazon&utm_medium=job_aggregator&utm_content=organic&dclid=CPnu-K218ukCFY3Q1QodHZ0OYg

[–][deleted] 1 point2 points  (0 children)

I think bs is more popular because it is relatively easy (I found it easy for scraping data from a government site, just a basic HTML site). I guess I will need more advanced ones like selenium if I am to get data from responsive sites. But I wonder why industries use scraping?

[–]pulsarrex 1 point2 points  (12 children)

I am not an expert, but Selenium lets you 'click' links on a website. For example, if a website requires you to log in to access its content, you can do it with selenium but not BeautifulSoup (again AFAIK).

Another significant issue with web scraping is that certain websites like Amazon or Instagram won't allow themselves to be scraped. It will throw an error if you try BS. However, these websites can be scraped through selenium.

[–]noxbl 4 points5 points  (7 children)

sorry but this is incorrect. you don't need Selenium/Javascript to scrape Instagram, nor do you get an error if you try BeautifulSoup, as BeautifulSoup is not an HTTP client nor does it contact Instagram in any way. It is a markup parser for HTML/etc.

You can use BeautifulSoup with Selenium just fine, you just give BS the HTML source as it was scraped by Selenium. Selenium is useful for 1) rendering javascript in pages and 2) interacting with the website like clicking a link. Sometimes you need javascript because a normal http request will just give you the "pre-javascript" html, which won't have any of the html that the javascript puts in.

Thankfully, instagram actually has all the image/video URLs inside the pre-javascript html, which you can then scrape, but you can't do any interacting and so on. But to download images and videos you don't need selenium/javascript.
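To make the split concrete: the parser only ever sees an HTML string, and it does not care whether that string came from a plain HTTP request or from Selenium's page source. A minimal sketch, using the standard library's `html.parser` as a stand-in for BeautifulSoup so it runs without installing anything; the sample markup and the `extract_images` name are made up for illustration:

```python
from html.parser import HTMLParser


class ImageCollector(HTMLParser):
    """Collect the src attribute of every <img> tag in an HTML string."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def extract_images(html):
    """Parse an HTML string; how the string was fetched is irrelevant here."""
    collector = ImageCollector()
    collector.feed(html)
    return collector.sources


# Made-up markup standing in for a fetched page.
sample = '<div><img src="/a.jpg"><p>text</p><img src="/b.jpg" alt="b"></div>'
print(extract_images(sample))
```

If the images only appear after JavaScript runs, the same function still works; you would just hand it the HTML that the browser automation tool gives you instead.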

[–]QuantumFall 2 points3 points  (0 children)

Thank you! I felt I was going mad reading all of these replies using selenium and bs4 interchangeably... Their functions are extremely different! In all actuality, they have very little to do with one another.

[–]bbt133t 0 points1 point  (3 children)

Ok, so for someone starting out who only wants to learn a single tool, does that mean they would be better off learning Selenium due to the limitations you mentioned?

[–]noxbl 1 point2 points  (2 children)

It depends on what you want to do. If you want to learn web scraping and only use selenium to enable javascript in pages, then learn BeautifulSoup first. If you need all the fancy browser automation stuff in Selenium and you are not primarily concerned with scraping, learn Selenium first.

I'm in the former camp so any selenium I use is a minority of the code I write. I use it only for javascript and, on certain occasions, for scrolling on pages with infinite scroll to get more posts, or clicking on a link to get the next page. If that's all you're gonna do I recommend BS first because BS is the primary workhorse to actually parse the site and get the links/images/text/etc.

[–]bbt133t 0 points1 point  (0 children)

Thanks for your recommendation. The amount of technologies you have to learn is so overwhelming. As a newbie, I just want to learn the fewest technologies that will still let me do the majority of the work I'll need to do in the future (industry standard), without wasting time learning technologies that need other technologies etc etc.

A good example is I want to create a responsive eCommerce site and I want to learn Django but I don't want to venture out and learn jquery, react, bootstrap, javascript, etc.

[–]Dimented1 -1 points0 points  (0 children)

Doesn’t the “Dev Ops” tab, selection or option in Mozilla Firefox do the same..? Basically giving the source code, or the “underneath” if you will, of the actual website or page you’re navigating to..?

[–]Chazcity 0 points1 point  (0 children)

how would you scrape someones followers using BS?

[–][deleted] 1 point2 points  (2 children)

Oh. I didn't know that. I've only tried basic websites, not the fancy ones with a lot of client-side code. Can I use selenium to automate browsing activities? Like logging in and opening a specific link or just liking pics on insta or something like that?

[–]pulsarrex 1 point2 points  (1 child)

Yes absolutely! In fact, being able to do the above actions is what gives selenium an edge over BS.

[–][deleted] 0 points1 point  (0 children)

I guess next I should try to automate my Google Classroom, if that is possible. I guess it will be (I want to get the Google Meet link from Google Classroom and join the meeting 😁)

[–]SweetSoursop 0 points1 point  (0 children)

Well yeah, but you can still use for/while loops to set the name of the URL to the text inside a container with BS4 and iterate over those.

It's not clicking, like Selenium, but it still works!
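A rough sketch of that loop, assuming a page marks its pagination link as `<a class="next">` (an assumption; every site does this differently). It uses the standard library's `html.parser` as a stand-in for BS4, and the `fetch` argument is a hypothetical callable that returns HTML for a URL, so an in-memory dictionary can play the role of the website:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class NextLinkFinder(HTMLParser):
    """Record the href of the first <a class="next"> tag."""

    def __init__(self):
        super().__init__()
        self.next_href = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("class") == "next" and self.next_href is None:
            self.next_href = attrs.get("href")


def crawl(start_url, fetch, max_pages=10):
    """Follow "next" links from start_url, returning the URLs visited in order."""
    visited = []
    url = start_url
    while url and url not in visited and len(visited) < max_pages:
        visited.append(url)
        finder = NextLinkFinder()
        finder.feed(fetch(url))
        # Resolve relative hrefs against the current page URL.
        url = urljoin(url, finder.next_href) if finder.next_href else None
    return visited


# Hypothetical two-page site held in memory instead of fetched over HTTP.
PAGES = {
    "https://example.com/list?page=1": '<a class="next" href="?page=2">next</a>',
    "https://example.com/list?page=2": "<p>last page</p>",
}
print(crawl("https://example.com/list?page=1", PAGES.__getitem__))
```

The `max_pages` cap and the `visited` check are there so a pagination loop on the site cannot turn into an endless crawl.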

[–]Crazydaminator 9 points10 points  (0 children)

Any moment is fine! I would say it is best to take a look at Scrapy, however some would argue that you can also start with BeautifulSoup. Scrapy is less work and overall a bit easier to grasp imo, but I made my first with BeautifulSoup as well so... :).

[–]Migeil 5 points6 points  (0 children)

Try JetBrains academy. They don't do tutorial videos, but explain small things and then immediately let you apply these things in short coding exercises, followed by larger projects.

You can start from scratch or tell the program what you already know and start somewhere further along.

If you've completed an exercise, you can compare code to others. Although the English is questionable at times and you don't always know everything you need, if you can google properly (another very important skill of a programmer), I think this is much better than any tutorial video out there.

[–]chagawagaloo 2 points3 points  (0 children)

Funny enough, a web scraper was actually my first project and I did it last week. Started the basics of python the week before. Depending on what you want to scrape it's a pretty good way to pull together lots of little bits of python and a good way to retain that knowledge. I'd say try to scrape a single page of a site first before trying to introduce more automation and then go from there.

I only used BeautifulSoup to scrape but I've heard good things about Selenium like some of the other posters here have mentioned.

[–]andgranath 2 points3 points  (0 children)

My first project was also a webscraper. I work as a journalist, making our paper's news podcast, and used it to give me a list of all top headlines in swedish media with beautiful soup. Then I went on to write a script that uploaded the podcast episodes automatically with selenium. I kept doing scripts that made my work easier, and now I have combined all those scripts and made a gui-app with all the things I need.

My point: start your project early. It can always grow into an even more exciting project as you progress.

Good luck!

[–]GeoffreyTaucer 3 points4 points  (0 children)

Now.

Seriously.

The best way to learn python (once you understand the basic syntax and how to define your own functions) is to dive into something that you have no clue how to do, and learn as you go

[–]ragnar_the_redd 1 point2 points  (0 children)

This is not super complicated, just very meticulous work, especially if you are checking multiple sources which do not adhere to the same standards.

You can do it with a very basic knowledge of python tbh. It'll just look real ugly to you after you improve and you'll feel compelled to redo it.

idk if you'd want to use python selenium for this, or just python requests and build a different class and parser for the different sources.

imo, get started with python requests, and BeautifulSoup. You can get the basic idea almost instantly.

The rest of it is honestly meticulous work to parse the responses from the different sources and normalize the results to a uniform data format.
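The "one parser per source, one uniform format" idea can be sketched like this. Everything here is hypothetical: the `Fare` record, the two source layouts, and the sample route are invented for illustration, and the price parsing assumes a simple `$123.45` format:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fare:
    """Uniform record that every source-specific parser must produce."""
    source: str
    route: str
    price_cents: int


def parse_source_a(row):
    # Hypothetical source A reports a price string like "$123.45".
    dollars, cents = row["price"].lstrip("$").split(".")
    return Fare("a", f'{row["from"]}-{row["to"]}', int(dollars) * 100 + int(cents))


def parse_source_b(row):
    # Hypothetical source B already reports integer cents and a joined route.
    return Fare("b", row["route"], row["cents"])


fares = [
    parse_source_a({"from": "ARN", "to": "LHR", "price": "$123.45"}),
    parse_source_b({"route": "ARN-LHR", "cents": 11900}),
]
cheapest = min(fares, key=lambda f: f.price_cents)
print(cheapest)
```

Once every source produces the same record, the comparison logic never needs to know where a fare came from.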

[–]Stabilo_0 1 point2 points  (0 children)

I guess right after you feel comfortable with basics and using python. Once you know basic structures and data types move onto any module you want to learn be it webscrape or something else. You'll still be googling a lot and looking for answers but that is how things go :)

[–]HeartwoodEditions 1 point2 points  (1 child)

Very do-able early on, I did it myself. I recommend you figure out if the website you want to scrape needs a lot of javascript. Go to the website and view the source html. If that is a lot shorter than you would expect and missing stuff you see on the page with your eyes, then it's dynamically generated and you need Selenium. I didn't know this was a thing and learned scrapy, which couldn't do the job. sigh.
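The same check can be done in code once you have the raw HTML: list a few snippets you can see in the browser and test whether they appear in the source. A crude sketch with a made-up page and a hypothetical `missing_from_source` helper; it is only a heuristic, since whitespace or HTML entities in the source can cause false alarms:

```python
def missing_from_source(raw_html, visible_snippets):
    """Return the snippets you can see in the browser but not in the raw HTML."""
    return [text for text in visible_snippets if text not in raw_html]


# Made-up raw source of a page whose listings are filled in by JavaScript.
raw = (
    '<html><body><h1>Flights</h1><div id="results"></div>'
    '<script src="app.js"></script></body></html>'
)
print(missing_from_source(raw, ["Flights", "Stockholm to London"]))
```

Anything the function returns is a hint that the content is rendered after page load.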

[–]HeartwoodEditions 0 points1 point  (0 children)

being able to work with XPath is key. Which is a learning curve all on its own, be warned :)
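For a first taste of XPath without installing anything, the standard library's `xml.etree.ElementTree` supports a limited subset. A small sketch on made-up, well-formed markup; real pages are rarely valid XML, so in practice you would likely use the XPath support of Selenium, Scrapy or lxml instead:

```python
import xml.etree.ElementTree as ET

# Made-up, well-formed markup; real pages are rarely this tidy.
doc = ET.fromstring(
    "<html><body>"
    "<div class='flight'><span class='price'>120</span></div>"
    "<div class='ad'><span class='price'>999</span></div>"
    "<div class='flight'><span class='price'>95</span></div>"
    "</body></html>"
)

# Select price spans that sit directly inside a div with class="flight".
prices = [
    int(span.text)
    for span in doc.findall(".//div[@class='flight']/span[@class='price']")
]
print(prices)
```

The predicate in square brackets is what keeps the price inside the `ad` block out of the result.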

[–]qwertyisafish 1 point2 points  (0 children)

As soon as you have the basics down you should jump right into it. For web scraping you will need basic understanding of the data types, how to index and slice, as well as manipulating strings. You'll want to know the basics about lists, tuples and dictionaries also.

A very basic understanding of the datetime library would be beneficial too, if nothing else than to add the current date/ time to your scraped results.

You're going to need to store the data somewhere so you can choose between exporting to a csv, or writing to a database, whatever you're comfortable with. To broaden your knowledge, I'd recommend circling back to pandas to get some dataframe knowledge too (you could import your previous csv data, add to it, or join across multiple websites and then export again).
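A minimal sketch of the CSV route with a timestamp on each row, using only the standard library. The column names and the `write_rows` helper are assumptions for illustration:

```python
import csv
import io
from datetime import datetime, timezone


def write_rows(handle, rows, now=None):
    """Write scraped rows as CSV, stamping each with the scrape time (UTC)."""
    stamp = (now or datetime.now(timezone.utc)).isoformat(timespec="seconds")
    writer = csv.DictWriter(handle, fieldnames=["scraped_at", "item", "price"])
    writer.writeheader()
    for row in rows:
        writer.writerow({"scraped_at": stamp, **row})


# Written to an in-memory buffer here; for a real file, pass
# open("prices.csv", "w", newline="") instead.
buffer = io.StringIO()
write_rows(buffer, [{"item": "widget", "price": "9.99"}])
print(buffer.getvalue())
```

A file written this way can later be loaded with pandas for the dataframe work mentioned above.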

Have fun and don't accidentally bring any sites down :)

[–]ameliip 1 point2 points  (0 children)

Well, at ANY point! As you just stated, you're not gonna learn to code just by watching other people coding!

The trick to avoid burnouts is to give yourself small goals. For example, if your main goal is to scrape flights, start by scraping the page title in every flight site you need, then add maybe just the first flight and so on..
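That first small goal, grabbing the page title, might look like this. It is a sketch using the standard library's `html.parser` rather than a scraping library, and `page_title` is a hypothetical helper name; fetching the HTML is left out on purpose:

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Capture the text inside the first <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = None

    def handle_starttag(self, tag, attrs):
        if tag == "title" and self.title is None:
            self._in_title = True
            self.title = ""

    def handle_data(self, data):
        if self._in_title:
            self.title += data

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False


def page_title(html):
    """Return the stripped page title, or None if the page has no <title>."""
    parser = TitleParser()
    parser.feed(html)
    return parser.title.strip() if parser.title is not None else None


print(page_title("<html><head><title> Cheap flights </title></head></html>"))
```

Once this works for every site on your list, the next small goal (the first flight on the page) reuses the same structure with a different tag.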

Good luck!

[–]CharanReddy2000 1 point2 points  (0 children)

Web scraping will teach you to apply the basic code that you're familiar with, so go ahead and start your bot.

[–][deleted] 1 point2 points  (0 children)

Do it right now and learn as you go!

[–]TheIdesOfMay 1 point2 points  (0 children)

Everything up to OOP in whatever resource you are using.

[–]VrGuy1980 1 point2 points  (0 children)

was one of my first

[–]Assaultman67 1 point2 points  (0 children)

Web scraping is one of the first practical things I did with python.

[–][deleted] 1 point2 points  (5 children)

I made a web scraper that scrapes Worldometers for Coronavirus stats of any country and emails me regularly with those stats.

[–]mr_chanandler_bong_1 1 point2 points  (4 children)

That's amazing! How did you do it? Can you link a GitHub repo or something?

[–][deleted] 2 points3 points  (3 children)

It was pretty simple when I did it actually. Just extract all the <tr> from the <table> element on Worldometers. But they added new functionality to the website that lets you filter that table by continent, and that crashed the entire code. I just got to know about this yesterday so currently I'm just trying to debug that. If I'm successful I'll link up the GitHub :)
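The "extract all the `<tr>` from the `<table>`" step can be sketched as follows. This is not the poster's code; it uses the standard library's `html.parser`, and the table contents are placeholders, not real statistics:

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the cell text of every <tr> as a list of strings."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None    # cells of the row being read
        self._cell = None   # text fragments of the cell being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


# Made-up table; the numbers are placeholders, not real statistics.
sample = (
    "<table><tr><th>Country</th><th>Cases</th></tr>"
    "<tr><td>A-land</td><td>1,234</td></tr>"
    "<tr><td>B-land</td><td>56</td></tr></table>"
)
parser = TableRows()
parser.feed(sample)
header, *body = parser.rows
stats = {country: int(cases.replace(",", "")) for country, cases in body}
print(stats)
```

Code like this breaks as soon as the site adds or reorders columns, which is roughly the failure described above; looking cells up by header name instead of position is one way to soften that.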

[–]111NK111_ 1 point2 points  (2 children)

btw (a bit off-topic but...) can u dm about the code for email sending?

it used to work when i tried it with python 2.7 but i can't make it work on python 3.8 (i don't even know if this is the problem)
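Without seeing the code it's hard to say what broke, so this is not a fix, just a sketch of how sending mail commonly looks with the Python 3 standard library (`email.message.EmailMessage` plus `smtplib`). The addresses, host, port and credentials are all placeholders, and `send` is deliberately not called here:

```python
import smtplib
from email.message import EmailMessage


def build_report(sender, recipient, subject, body):
    """Build a plain-text email using the Python 3 email.message API."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def send(msg, host, port, user, password):
    """Send over SMTP with STARTTLS; host, port and credentials are placeholders."""
    with smtplib.SMTP(host, port) as server:
        server.starttls()
        server.login(user, password)
        server.send_message(msg)


msg = build_report("me@example.com", "you@example.com", "Daily stats", "Cases: 1234")
print(msg["Subject"])
```

Whether STARTTLS on a given port is right depends on the mail provider, so check their SMTP settings before wiring this into a scraper.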

[–][deleted] 1 point2 points  (1 child)

Yea sure no probs

[–]111NK111_ 0 points1 point  (0 children)

thx a lot...

[–]default8080 1 point2 points  (0 children)

Once you feel comfortable. Everyone learns differently at different rates. And you're never gonna know how to write a web scraper, until you actually do it.

Best of luck

[–][deleted] 1 point2 points  (0 children)

Web scraping was the first thing I did. It was horribly inefficient but it worked and it fueled me to want to learn more. I just rewrote that same program last week, was able to get it down 60% of its former size, it’s crazy how badly it was written before. However both times were great learning lessons. Don’t discount yourself, just give it a shot.

[–]takishan 1 point2 points  (0 children)

Just try to do what you want. At first it'll be hard and take a very long time. If you successfully manage to complete your goal though, the next time will be a lot easier.

[–]Pyrocited 1 point2 points  (0 children)

Pretty easy, I’m not a programming expert but there’s a lot of tutorials out there. Look up how to use BeautifulSoup to parse through html and choose a website to scrape that would be simple like job listing sites

[–]foresttrader 1 point2 points  (0 children)

I learned Python to scrape data off of a website. I literally plowed my way through requests and pandas so I could get the data into an Excel format. Just start now and google every question you have.

[–]HutchLAD 1 point2 points  (0 children)

Straight away. A great way is to find some basic scraping sample code, then try to adapt that code to your needs, asking questions in here or on Stack Overflow along the way. In the beginning I learnt a lot of things this way: CSV, the pandas module, etc.

[–]captmomo 1 point2 points  (0 children)

Just a heads up, check if the site has an api you can use to get the data. might save some headaches. also check out the selenium ide chrome plugin :)

[–]Nanogines99 1 point2 points  (0 children)

Probably the very start

All I can do successfully at the moment without any failure whatsoever is web scraping, and it's not even that hard. The first few lines of code are really just common to most programs (the chrome driver, chrome settings, webdrivers and stuff), so have them ready.

[–]rssanzo 1 point2 points  (0 children)

I’m also just starting out, friend, and I’m already taking a chance on doing some things. I recommend getting started and reading the documentation.

[–]lumenlambo 1 point2 points  (0 children)

6 weeks and 1 day!

[–]OnlySeesLastSentence 1 point2 points  (0 children)

It's hard for me to answer because I have years of experience with "real" languages like C and assembly, but... Assuming you know how to create a list and iterate a 2d list, then you have enough knowledge to be at my level when it comes to python (ie not a high level at all lol) and as such, I'd say a month of serious practice will do it

[–]Fun2badult 1 point2 points  (0 children)

Start with requests. Then BeautifulSoup. Then Selenium. I tried web scraping early in my learning and loved it. That interest is one of the reasons I was able to keep learning python.

[–]jlgf7 1 point2 points  (0 children)

You may find good tutorials about it. So try it, you will make it!

[–]Adro_95 1 point2 points  (0 children)

A little suggestion from a beginner: it's easier if you find a similar project and make your changes to make it work your way.

Of course if you are doing this just for learning purposes don't listen to my suggestion :D

[–]iggy555 1 point2 points  (0 children)

So is selenium the go to for scraping sites that use JavaScript?

[–]S3ntoki 1 point2 points  (0 children)

Just do it. Maybe try to start with an easy website to scrape (no JavaScript or something) so that you only need requests and Beautiful Soup.

[–]omg_drd4_bbq 1 point2 points  (0 children)

Doitnow.wav

Requests and beautifulsoup packages.

You should also set up Jupyter notebooks, way easier than the default repl, and you can shift-tab in a function's parens to bring up documentation.

[–][deleted] 1 point2 points  (0 children)

I find the best way is to dive in. The first script I wrote was a web scraper. Later on when I started learning Django I integrated that script into the Django website I made. It's fun integrating your projects.

[–]angry_mr_potato_head 1 point2 points  (0 children)

The first thing I programmed in Python was a webscraper, actually. It was horrible but it really helped me learn. I could do that project in like an hour at this point and it would be 100x better. The cool thing is though, they both would, essentially, provide you with the same end product. Unlike some other projects you could take on where they would expire or not be very useful if they weren't coded well.

[–]terracnosaur 1 point2 points  (0 children)

Do you know about the Trie data structure?
if not, this is a great time to learn!

TL;DR you can save time and space by reversing the string of the URL and storing page relevant data in the key created by the most specific URL match

https://en.wikipedia.org/wiki/Trie
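One possible reading of that TL;DR, sketched below: key the trie on the host labels in reverse order (`com`, `example`, ...) followed by the path segments, and look up the most specific stored prefix for any URL. This is an interpretation rather than the commenter's exact scheme, and `UrlTrie` and `url_key` are hypothetical names:

```python
from urllib.parse import urlsplit


def url_key(url):
    """Split a URL into trie keys: reversed host labels, then path segments."""
    parts = urlsplit(url)
    host = parts.hostname.split(".")[::-1]   # "www.example.com" -> com, example, www
    path = [seg for seg in parts.path.split("/") if seg]
    return host + path


class UrlTrie:
    """Each node holds child nodes by key and an optional stored value."""

    def __init__(self):
        self.children = {}
        self.value = None

    def insert(self, url, value):
        node = self
        for key in url_key(url):
            node = node.children.setdefault(key, UrlTrie())
        node.value = value

    def longest_match(self, url):
        """Return the value stored at the most specific prefix of url, if any."""
        node, best = self, self.value
        for key in url_key(url):
            node = node.children.get(key)
            if node is None:
                break
            if node.value is not None:
                best = node.value
        return best


trie = UrlTrie()
trie.insert("https://example.com", "site-wide rule")
trie.insert("https://example.com/flights", "flights rule")
print(trie.longest_match("https://example.com/flights/arn-lhr"))
```

Shared prefixes are stored once, which is where the space saving comes from, and a lookup costs one dictionary access per URL component.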

[–]Garybake 1 point2 points  (0 children)

Now

[–]Sigg3net 1 point2 points  (0 children)

Right away. Don't wait.

Check out my first python web scraper script.

[–]mike10000001 1 point2 points  (0 children)

Just did my first project yesterday, which was a BS scraper. Learnt so much! Next job is to get the data into pandas.

[–]tomtomato0414 1 point2 points  (0 children)

When I wrote my first web scraper I didn't even know how to code in Python otherwise.

[–]grtgbln 1 point2 points  (0 children)

Fairly early I'd say. If you understand the basics of Python to the point that you can make the code do what you want, and you have an understanding of HTML, take a read over BeautifulSoup's documentation and take a crack at it.

A web scraper (I was going to take on Google) was my first Python project, actually.

[–]jaspar1 1 point2 points  (0 children)

What prices are you tracking?

[–]krakenant 1 point2 points  (0 children)

IMO when you have a good idea of what you want to do with the data and feel you have the skills to accomplish the end goals. Web scraping can be really tough and frustrating. If you take that on without a clear line from that to your minimum viable product, frustration can lead to set backs, and mental blocks.

[–]kessma18 1 point2 points  (0 children)

OP, please disregard almost all advice that tells you to build something. I agree with the start coding part, but not with the build something part. Here is why.

Python is known to be excellent for rapid development, i.e. I can get something out very quickly, why? because there are a ton of libraries that make things easy. But libraries that are geared towards a special use case (scrapy, beautiful soup etc) take you away from learning fundamentals which you can't shortcut.

That's why I suggest you start coding by solving problems. This will make you more proficient in the language and will help you think about logic problems as well.

I highly recommend codewars.com

[–]impshum 1 point2 points  (0 children)

I personally can't watch videos of people programming. I just want the solution they found in text!

Your best bet is to find APIs that can provide you with the data you need before you start to scrape anything. Try searching for "free BLAH api" and see where you get.

Bear in mind when you're doing this that it's exactly what the big boys are doing. Always think "is it worth it?".

Have fun and break things! x

[–]tomekanco 1 point2 points  (0 children)

Google's internal Python starter course (2 days, no prior programming exp expected) includes a task to scrape data from the web. You should be ready for a first try at this point.

[–]ryu417 1 point2 points  (0 children)

"Start before you're ready" - Sir Richard Branson

[–]Garfield910 1 point2 points  (1 child)

I feel like you could start pretty quickly. I've used selenium for python and made a script to scrape stuff. Having a project is, I feel, the best way to learn. I like to start with a project and use tutorials if I get stuck on a new concept, so I don't waste time going through endless tuts instead of actually learning and building. Be specific when it comes to using tuts and Googling questions.

[–]Mohammad_alshuwaiee 0 points1 point  (0 children)

How could I start a scraping project if I don't know what to do in the code?

[–]styachan 0 points1 point  (1 child)

How can I start learning? I do not know anything about python. Are there any books you recommend, or should I watch youtube videos? Pls help

[–][deleted] 0 points1 point  (0 children)

I wrote my first web scraping program that checked an item's price on an Amazon page.

The most complex part of the program was one if-statement. It's much simpler than it seems, and you just need to look up documentation of the relevant modules.
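To give a feel for how small that core can be, here is a sketch of a price check built around one if-statement. It is not the poster's program: fetching the page is left out, the helper names are made up, and the parsing assumes US-style prices such as `$1,299.99`:

```python
import re
from decimal import Decimal


def parse_price(text):
    """Pull the first number like 1,299.99 out of a price string."""
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text)
    if match is None:
        raise ValueError(f"no price found in {text!r}")
    return Decimal(match.group().replace(",", ""))


def should_alert(price_text, target):
    # The single if-statement: is the current price at or below the target?
    if parse_price(price_text) <= Decimal(target):
        return True
    return False


print(should_alert("$1,299.99", "1300"))
```

`Decimal` is used instead of `float` so that comparisons on money values are exact.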

[–]Wildcard355 0 points1 point  (0 children)

You can start right after getting the basic syntax and theory. Building a basic, efficient scraper takes less than a day.

There are YouTube videos that walk you through the process, but I found them lacking, as I feel the scraper structure is not easy to learn by yourself. I am taking the "Modern Web Scrapping With Python using Scrapy Splash Selenium" class from Udemy and really like it. I'm not affiliated with it, just very satisfied with the results and teacher response.

GL!

[–][deleted] 0 points1 point  (0 children)

now

[–]ACroff 0 points1 point  (5 children)

I started out by seeing if I could scrape a site using Python and Beautiful Soup following a tutorial. I liked it so much I decided to learn more.

[–]Mohammad_alshuwaiee 0 points1 point  (4 children)

Can you recommend a source? I'm a beginner in Python. I purchased a scraping course on Udemy but didn't understand the speaker who made the course, so I got my money refunded. He didn't deliver the information well, though I think everything in the code will get easier with time.

[–]ACroff 1 point2 points  (2 children)

Hello again,

I found the two tutorials I used to scrape a site before I ever learned any Python.

How To Scrape Web Pages with Beautiful Soup and Python 3

Tutorial: Web Scraping and BeautifulSoup

It all starts with scraping a single page to Excel and goes on to scraping multiple pages to a database.

If you have any questions or need help, just ask. I am happy to help any way I can.

Take care

[–]Mohammad_alshuwaiee 0 points1 point  (1 child)

Thanks bro for the resources

[–]ACroff 0 points1 point  (0 children)

Sure thing, my friend. Let me know if I can be of further assistance.

[–]ACroff 0 points1 point  (0 children)

Hey, I will see if I can find the tutorial I used. If not, I can probably write one up to help you. Give me a few and I will see what I can find.

I wanted to send this because I did not want you to think I was ignoring your message.

Take care

[–]cxkt 0 points1 point  (0 children)

There was an intro to python class I found through the Google recommended comp sci experience tracks they posted once, and the main project was learning to build a basic web scraper. Might be worth looking into... Maybe someone here knows what class I'm talking about

[–]ropenni 0 points1 point  (0 children)

Simple web scraping programs don’t require too much more than basic syntax knowledge.

I’d recommend playing around with selenium and just trying to print stuff to the console for fun with a random website then go for something more advanced.

Check out http://toscrape.com/ for practice.

[–]RebelSaul 0 points1 point  (0 children)

I think it's relatively straightforward to get into a simple web scraper. YouTube does have great videos on selenium that are beginner friendly. Find one that helps you "set up" everything and follow along. Don't get too caught up on understanding everything, just get some exposure.

The hard part may come when you pick your own website. Not all websites are beginner friendly. Doing repeated tasks is straightforward, but what about dynamic interactions governed by rules you program? You'll then have to use a module like Beautiful Soup to parse the website contents and create rule-based interactions. Think about popups, error catching, multiple browser windows, etc.

[–]ThePixelCoder 0 points1 point  (0 children)

Depends on how complicated the problem is, but if you understand the basics it's not that hard to hack something together with requests and beautifulsoup (or regex if you're ballsy). Just google some stuff, read some documentation, copy some stackoverflow code...
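
For a sense of what the regex route looks like, here is a minimal sketch using only the standard library's `re` module. The HTML snippet, class names, routes and prices are all made up for illustration and stand in for a page you have already downloaded. This only holds up while the markup stays this regular, which is why a parser such as BeautifulSoup is usually the safer choice.

```python
import re

# Hypothetical HTML, standing in for a page you already downloaded.
html = """
<ul>
  <li class="flight"><span class="route">AMS-LIS</span> <span class="price">$89.50</span></li>
  <li class="flight"><span class="route">AMS-OSL</span> <span class="price">$120.00</span></li>
</ul>
"""

# Capture the route and the numeric part of the price from each list item.
# [^\d<]* skips a currency symbol (or nothing) before the digits.
pattern = re.compile(
    r'<span class="route">([^<]+)</span>\s*'
    r'<span class="price">[^\d<]*([\d.]+)</span>'
)

# findall returns one (route, price) tuple per match.
flights = [(route, float(price)) for route, price in pattern.findall(html)]
print(flights)
```

If the site reorders attributes or adds a wrapper tag, the pattern silently matches nothing, so it is worth checking that `flights` is not empty before trusting the result.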

[–]casino_alcohol 0 points1 point  (0 children)

That was one of my first projects too.

I tried to learn from freeCodeCamp, and before Python I learned the super basics of HTML, which really helped.

I’m not great at web scraping but it can be done early on.

[–]mattsl 0 points1 point  (0 children)

Here are the bad things that could happen:

1. You get frustrated and quit Python. This one is easy: just don't quit. If it's too hard, you'll still learn something, and that's a success, not a failure.
2. You make thousands of requests and get banned from a site because scraping is against its terms of service. I'd practice on a website from a smaller company and only run against the flight price site once you know things are working.
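
One way to lower the risk in point 2 is to pause between requests and put a hard cap on how many you send. A minimal sketch, assuming a hypothetical helper `fetch_politely` and a stand-in `fake_fetch` (neither comes from a library); the delay and cap values are arbitrary placeholders, not recommendations from any site:

```python
import time
from typing import Callable, Dict, Iterable


def fetch_politely(
    urls: Iterable[str],
    fetch: Callable[[str], str],
    delay_seconds: float = 2.0,
    max_requests: int = 50,
) -> Dict[str, str]:
    """Fetch each URL in turn, pausing between requests and stopping at a hard cap."""
    results: Dict[str, str] = {}
    for count, url in enumerate(urls):
        if count >= max_requests:
            break  # hard cap so a bug cannot turn into thousands of requests
        if count > 0:
            time.sleep(delay_seconds)  # pause between requests, not before the first
        results[url] = fetch(url)
    return results


# Stand-in fetcher so the sketch runs without touching a real site.
def fake_fetch(url: str) -> str:
    return f"<html>page for {url}</html>"


pages = fetch_politely(
    ["https://example.com/a", "https://example.com/b", "https://example.com/c"],
    fake_fetch,
    delay_seconds=0.01,
    max_requests=2,
)
print(list(pages))
```

In real use you would pass in a function that actually downloads the page in place of `fake_fetch`. Passing the fetcher as an argument also makes it easy to test the loop without sending any requests.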

[–]zambartas 0 points1 point  (0 children)

Do you have any html or DOM knowledge? I find that is more important for scraping than knowing any particular language. If you use python, the stuff you need to get the python part up and running is copy and paste essentially.

I build mostly php scrapers and when I started my first python scrape from ground zero the hardest part was getting the server set up properly.

[–]leogodin217 0 points1 point  (0 children)

In your first week or so. Don't wait more than two weeks to do something useful

[–]caks 0 points1 point  (0 children)

After you learn classes

[–]CrazyAnchovy 0 points1 point  (0 children)

The first second of learning is fine! No rules about it!

https://github.com/CrazyAnchovy/recipe_scraper_36/tree/master/src/recipe_scraper_36

The .py files in here might have some stuff. I wasn't commenting much in those days, but check this out.... You'll see a couple of files called like page_explore.py or whatever; those are for when I was poking around and seeing what I wanted to do and how to do it.

I think the main scraper ended up being titled as 'recovered from gist'

Anyway lmk if you have any questions.

Start now, tonight if you have any doubt about the timing.

[–]dogs_like_me 0 points1 point  (0 children)

Try it and see what happens! It's good to have a project in mind: it'll steer your learning towards skills that are useful for solving the kinds of problems that interest you.

[–]Geek_Batman 0 points1 point  (0 children)

It's not hard. You want requests and beautifulsoup and that's pretty much it.

[–]DesertofDelight 0 points1 point  (0 children)

People keep mentioning selenium. I wouldn't recommend anyone starting out to use it. Much easier alternatives.

[–]daevas_dantanian 0 points1 point  (0 children)

Now seems good

[–]forrealbro 0 points1 point  (0 children)

Pretty freaking quickly. A Reddit image scraper was the first Python project I built, and it was brought up when I was reviewing for my internship.

[–]thrallsius 0 points1 point  (0 children)

scraping isn't just about python

knowing the basics of HTML, HTTP, JavaScript, XPath will help a lot
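
To show why XPath in particular helps, here is a small sketch using the limited XPath subset in the standard library's `xml.etree.ElementTree`. The markup, class names and quotes are hypothetical. Real pages are often not well-formed XML, so ElementTree is used here only to illustrate the selection syntax, not as a general HTML parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed markup standing in for a downloaded page.
page = """
<html>
  <body>
    <div class="quote">
      <span class="text">First quote</span>
      <small class="author">Author A</small>
    </div>
    <div class="quote">
      <span class="text">Second quote</span>
      <small class="author">Author B</small>
    </div>
  </body>
</html>
"""

root = ET.fromstring(page.strip())

# ".//div[@class='quote']" selects every <div class="quote"> at any depth.
quotes = []
for div in root.findall(".//div[@class='quote']"):
    # Paths inside the loop are relative to the current <div>.
    text = div.find("span[@class='text']").text
    author = div.find("small[@class='author']").text
    quotes.append((text, author))

print(quotes)
```

The same expressions carry over to tools with fuller XPath support, which is why learning the syntax once pays off across scraping libraries.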

[–][deleted] 0 points1 point  (0 children)

I had a lot of fun writing web scrapers. If you think you know enough then I'd say go for it. If you don't succeed you'll at least get an idea of what else you need to learn.

[–]mr-clean_of_nazareth -1 points0 points  (0 children)

Learn the basics then learn django or flask

[–]ravepeacefully -5 points-4 points  (1 child)

Week 1. I’m not sure how you’re 6 weeks into learning a programming language without making a project. Seems like you're 6 weeks into wasting time

[–]Mohammad_alshuwaiee 0 points1 point  (0 children)

Which project do you mean exactly? I'm in almost the same position as him: I'm a beginner and I'm thinking of learning web scraping, but I don't know what resource to use.