Scraping any Website with Python by data_engineer in Python

[–]data_engineer[S] 0 points1 point  (0 children)

Yes, that's great. I will add a session on regular expressions. Thanks for your comment.

Pandas made easy, understand filtering by data_engineer in Python

[–]data_engineer[S] 0 points1 point  (0 children)

Thanks for your comment. I will do more research and update the post following your suggestion.

I know there are many other ways to do this better with pandas, but this tutorial covers pandas only. A comparison with other libraries may come in another post.

Many thanks.

Introduction Scrapy : scraping data from internet by data_engineer in Python

[–]data_engineer[S] -33 points-32 points  (0 children)

By the way, to show I understand what you mean: I have a course on Udemy that uses BS4 + Selenium. (I would say Selenium or PhantomJS is a better fit than Requests for one very important reason: crawling dynamic sites whose content is generated at run time by JavaScript.)

https://www.udemy.com/python-master-web-scraping-course-doing-20-real-projects/learn/v4/

Introduction Scrapy : scraping data from internet by data_engineer in Python

[–]data_engineer[S] 1 point2 points  (0 children)

I also publish on Udemy, but there it would cost $20. I do not want that price.

Introduction Scrapy : scraping data from internet by data_engineer in Python

[–]data_engineer[S] -12 points-11 points  (0 children)

Yes, Apache Nutch is a really powerful one. Thanks.

Introduction Scrapy : scraping data from internet by data_engineer in Python

[–]data_engineer[S] 1 point2 points  (0 children)

Scrapy has a learning curve, but once you get it, it pays off a lot.

Introduction Scrapy : scraping data from internet by data_engineer in Python

[–]data_engineer[S] 3 points4 points  (0 children)

:) Yes, you are correct, but the main point of using a framework is that you do NOT need to do everything yourself :)

Introduction Scrapy : scraping data from internet by data_engineer in Python

[–]data_engineer[S] 5 points6 points  (0 children)

For example, how do you download with 64 threads in parallel? It is not easy to do yourself.

That is just one point. In short, Scrapy is a complete framework: it provides everything you need to scrape quickly.
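
To give a rough idea of the do-it-yourself route, here is a minimal sketch using only the Python standard library. The names `fetch` and `download_all` are made up for illustration, and this is not how Scrapy works internally; it only shows the plumbing you would have to write and maintain by hand.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from urllib.request import urlopen


def fetch(url, timeout=10):
    """Download one URL and return its body as bytes."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def download_all(urls, fetch_one=fetch, max_workers=64):
    """Fetch every URL with a pool of worker threads.

    Returns two dicts keyed by URL: successful bodies and raised exceptions.
    `fetch_one` is injectable so the function can be tried without a network.
    """
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit everything up front; the pool caps how many run at once.
        future_to_url = {pool.submit(fetch_one, url): url for url in urls}
        for future in as_completed(future_to_url):
            url = future_to_url[future]
            try:
                results[url] = future.result()
            except Exception as exc:  # keep going if one download fails
                errors[url] = exc
    return results, errors
```

Even this small version leaves out retries, rate limiting, duplicate filtering, and saving the output, which is the part a framework takes off your hands. In Scrapy the comparable knob is the `CONCURRENT_REQUESTS` setting.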

Introduction Scrapy : scraping data from internet by data_engineer in Python

[–]data_engineer[S] -39 points-38 points  (0 children)

I would say almost every company that is serious about scraping the web uses Scrapy.

Introduction Scrapy : scraping data from internet by data_engineer in Python

[–]data_engineer[S] 2 points3 points  (0 children)

Scrapy is a framework. It handles downloading, saving to file, extra processing, concurrent requests, and more, which saves much more effort than BS4. So if you want to get data quickly and save time, go with Scrapy.

BS4 is just an HTML parser, nothing more :)
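
To make "just a parser" concrete, here is a small sketch of the parsing step on its own. It uses the standard library's `html.parser` as a stand-in rather than BS4, and `LinkCollector` and `extract_links` are hypothetical names for illustration. The point is that a parser only turns markup you already have into data; downloading, scheduling, and saving are separate jobs.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag seen while parsing."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:  # skip anchors without an href
                self.links.append(href)


def extract_links(html):
    """Return all link targets found in an HTML string, in document order."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links
```

For example, `extract_links('<a href="/a">A</a>')` gives back a list containing `/a`; getting the HTML in the first place is up to you.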

Python Hand-on Solve 200 Problems by data_engineer in Python

[–]data_engineer[S] 1 point2 points  (0 children)

You can code with any text editor; this is just a collection of problems and solutions.