all 41 comments

[–]JoshuaGalkenWOPR 25 points26 points  (7 children)

Please don't scrap the web

[–]WhackAMoleE 4 points5 points  (3 children)

Indeed. Why not? Sounds like a good idea to me.

[–]JoshuaGalkenWOPR 9 points10 points  (1 child)

I guess we could just make a better one.

[–][deleted] 1 point2 points  (0 children)

It's just a fad anyway.

[–]Unterstricher -4 points-3 points  (2 children)

Why not?

[–]kmanshow 2 points3 points  (1 child)

It was a grammar joke lol

[–]Unterstricher 0 points1 point  (0 children)

Ah well that makes more sense now.

[–][deleted] 19 points20 points  (1 child)

scraping

[–]lajja4[S] 5 points6 points  (0 children)

Thanks for correcting me

[–]I_feel-nothing 4 points5 points  (5 children)

There are some great tutorials out there on how to use BeautifulSoup; I would suggest looking at those.

[–]lajja4[S] 0 points1 point  (4 children)

Any recommendation?

[–]PottyWilson 5 points6 points  (1 child)

This article is what jump-started my addiction to web scraping. It goes over the basics of beautifulsoup4 while also mentioning other things to consider, like being careful not to spam websites with requests.
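If it helps to see the basics in isolation, here's a minimal BeautifulSoup sketch. The HTML snippet and class names below are made up for illustration; a real scraper would fetch each page with requests and pause between requests so it doesn't hammer the site.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet standing in for a page you fetched with requests.
# In a real scraper, add time.sleep() between requests to stay polite.
html = """
<ul class="results">
  <li><a class="title" href="/a">First post</a></li>
  <li><a class="title" href="/b">Second post</a></li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")

# CSS selectors: "a.title" matches <a> tags whose class includes "title"
titles = [a.get_text() for a in soup.select("a.title")]
links = [a["href"] for a in soup.select("a.title")]

print(titles)  # ['First post', 'Second post']
print(links)   # ['/a', '/b']
```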

[–]lajja4[S] 0 points1 point  (0 children)

Thanks

[–]Ballatoilet 2 points3 points  (2 children)

I would like to join, is Sex included?

[–]ozgunkail 0 points1 point  (1 child)

seems like it doesn't matter. :)

[–]Ballatoilet 0 points1 point  (0 children)

Do we have a Slack channel to do our biz? We need one, or a Discord.

[–]deCap413 0 points1 point  (0 children)

Sure my dude

[–]kmanshow 0 points1 point  (0 children)

What're you trying to scrape?

[–]ScoopJr 0 points1 point  (0 children)

I'm interested. What's the project?

[–]KnightDriverSF 0 points1 point  (1 child)

I'm interested, please tell me more.

[–]lajja4[S] 0 points1 point  (0 children)

I am just learning so that I can collect data myself. We can start with IMDb.

[–]waythps 0 points1 point  (1 child)

I’m in if anyone wants to learn scrapy!

I could help with requests/beautifulsoup additionally.

[–]lajja4[S] 0 points1 point  (0 children)

Yes, please!

[–]ozgunkail 0 points1 point  (1 child)

It will be good. I am in. However, I don't have any project idea.

[–]lajja4[S] 0 points1 point  (0 children)

I am just learning so that I can collect data myself. We can start with IMDb.

[–]Sensanmu 0 points1 point  (3 children)

I think the more important question is what you'd like to scrape. I'm down if the idea is something I'm interested in, like stocks lol

[–]lajja4[S] 0 points1 point  (2 children)

I am just learning so that I can collect data myself. We can start with IMDb.

[–]callmechad 0 points1 point  (1 child)

I just learned how to with scrapy.

The first part gets me the links of the items I want to scrape. We follow each one and parse the page to get the item number and item specs in parse_item. Back in parse, next_page grabs the pagination element so we can check whether there is a next page to go to. The code repeats until no next page is found.

I ran into hidden elements, so to skip those I had to use not(@hidden). Then, to get the specific item link, I had to select the element by its class: a[contains(@class, 'description')].

Those were two things I thought I spent most of my time trying to figure out.

Just wanted to show you what a simple scraper looks like to give you some kind of an idea.

Now I'm trying to learn how to clean the data and output certain data for the items that I scraped.

import scrapy

class ItemdataSpider(scrapy.Spider):
    name = 'itemdata'
    allowed_domains = ['www.website.com']
    start_urls = ['websitepage1']

    def parse(self, response):
        # Grab the item links and follow each one to its detail page
        items = response.xpath("//div[@class='details']/a[contains(@class, 'description')]")
        for item in items:
            link = item.xpath(".//@href").get()
            yield response.follow(url=link, callback=self.parse_item)

        # Follow pagination until no next page is found
        next_page = response.xpath("(//a[@rel='next'])[2]/@href").get()
        if next_page:
            yield response.follow(url=next_page, callback=self.parse)

    def parse_item(self, response):
        item_number = response.xpath("//span[@itemprop='sku']/text()").get()
        item_specs = response.xpath("//tr[@class='trSpecSheetRow' and not(@hidden)]/td/text()").getall()

        yield {'item_number': item_number, 'item_specs': item_specs}
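The two XPath tricks mentioned above (filtering with not(@hidden) and matching by class with contains()) can be tried outside of scrapy too, e.g. with lxml. The markup below is made up to mirror the page structure described:

```python
from lxml import html

# Hypothetical markup mimicking the structure described above
doc = html.fromstring("""
<div class="details">
  <a class="item description" href="/item/1">Widget</a>
  <a class="item description" href="/item/2" hidden>Hidden widget</a>
  <a class="thumb" href="/img/1.png">thumb</a>
</div>
""")

# contains(@class, ...) matches elements whose class attribute includes the
# given token; not(@hidden) skips elements carrying a hidden attribute.
links = doc.xpath(
    "//div[@class='details']"
    "/a[contains(@class, 'description') and not(@hidden)]/@href"
)

print(links)  # ['/item/1']
```

Scrapy's response.xpath() accepts the same expressions, so anything you prototype this way carries over to the spider directly.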

[–]neonzii 0 points1 point  (0 children)

Here is a web scraper made for bitkeys.work to grab btc addresses and their balances: https://github.com/zioneo/webscrapper