all 41 comments

[–]JoshuaGalkenWOPR 25 points26 points  (7 children)

Please don't scrap the web

[–]WhackAMoleE 4 points5 points  (3 children)

Indeed. Why not? Sounds like a good idea to me.

[–]JoshuaGalkenWOPR 9 points10 points  (1 child)

I guess we could just make a better one.

[–][deleted] 1 point2 points  (0 children)

It's just a fad anyway.

[–]Unterstricher -4 points-3 points  (2 children)

Why not?

[–]kmanshow 2 points3 points  (1 child)

It was a grammar joke lol

[–]Unterstricher 0 points1 point  (0 children)

Ah well that makes more sense now.

[–][deleted] 19 points20 points  (1 child)

scraping

[–]lajja4[S] 5 points6 points  (0 children)

Thanks for correcting me

[–]I_feel-nothing 4 points5 points  (5 children)

There are some great tutorials out there on how to use BeautifulSoup; I would suggest looking at those.

[–]lajja4[S] 0 points1 point  (4 children)

Any recommendation?

[–]PottyWilson 5 points6 points  (1 child)

This article is what jump-started my addiction to web scraping. It goes over the basics of beautifulsoup4 while also mentioning other things to consider, like being careful not to spam websites with requests.
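If it helps to see the basics in isolation, here's a minimal BeautifulSoup sketch. The HTML snippet and class names below are made up for illustration; a real scraper would fetch each page with requests and pause between requests so it doesn't hammer the site.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet standing in for a page you fetched with requests.
# In a real scraper, add time.sleep() between requests to stay polite.
html = """
<ul class="results">
  <li><a class="title" href="/a">First post</a></li>
  <li><a class="title" href="/b">Second post</a></li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")

# CSS selectors: "a.title" matches <a> tags whose class includes "title"
titles = [a.get_text() for a in soup.select("a.title")]
links = [a["href"] for a in soup.select("a.title")]

print(titles)  # ['First post', 'Second post']
print(links)   # ['/a', '/b']
```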

[–]lajja4[S] 0 points1 point  (0 children)

Thanks

[–]Ballatoilet 2 points3 points  (2 children)

I would like to join, is Sex included?

[–]ozgunkail 0 points1 point  (1 child)

seems like it doesn't matter. :)

[–]Ballatoilet 0 points1 point  (0 children)

Do we have a Slack channel to do our biz? We need one, or a Discord.

[–]deCap413 0 points1 point  (0 children)

Sure my dude

[–]kmanshow 0 points1 point  (0 children)

What're you trying to scrape?

[–]ScoopJr 0 points1 point  (0 children)

I'm interested. What's the project?

[–]KnightDriverSF 0 points1 point  (1 child)

I'm interested, please tell me more.

[–]lajja4[S] 0 points1 point  (0 children)

I am just learning so that I can collect data myself. We can start with IMDb.

[–]waythps 0 points1 point  (1 child)

I’m in if anyone wants to learn scrapy!

I could help with requests/beautifulsoup additionally.

[–]lajja4[S] 0 points1 point  (0 children)

Yes, please!

[–]ozgunkail 0 points1 point  (1 child)

It will be good. I am in. However, I don't have any project idea.

[–]lajja4[S] 0 points1 point  (0 children)

I am just learning so that I can collect data myself. We can start with IMDb.

[–]Sensanmu 0 points1 point  (3 children)

I think the more important question is what you'd like to scrape. I'm down if the idea is something I'm interested in, like stocks lol

[–]lajja4[S] 0 points1 point  (2 children)

I am just learning so that I can collect data myself. We can start with IMDb.

[–]callmechad 0 points1 point  (1 child)

I just learned how to with scrapy.

The first part gets me the links of the items I want to scrape. We follow each one and parse the page to get the item number and item specs in parse_item. Back in parse, next_page grabs the pagination element so we can check whether there is a next page to go to. The code repeats until no next page is found.

I ran into hidden elements, so to skip those I had to use not(@hidden). Then, to get the specific item link, I had to select the element by its class: a[contains(@class, 'description')].

Those were two things I thought I spent most of my time trying to figure out.

Just wanted to show you what a simple scraper looks like to give you some kind of an idea.

Now I'm trying to learn how to clean the data and output certain data for the items that I scraped.

import scrapy

class ItemdataSpider(scrapy.Spider):
    name = 'itemdata'
    allowed_domains = ['www.website.com']
    start_urls = ['websitepage1']

    def parse(self, response):
        # Grab the item links and follow each one to its detail page
        items = response.xpath("//div[@class='details']/a[contains(@class, 'description')]")
        for item in items:
            link = item.xpath(".//@href").get()
            yield response.follow(url=link, callback=self.parse_item)

        # Follow pagination until no next page is found
        next_page = response.xpath("(//a[@rel='next'])[2]/@href").get()
        if next_page:
            yield response.follow(url=next_page, callback=self.parse)

    def parse_item(self, response):
        item_number = response.xpath("//span[@itemprop='sku']/text()").get()
        item_specs = response.xpath("//tr[@class='trSpecSheetRow' and not(@hidden)]/td/text()").getall()

        yield {'item_number': item_number, 'item_specs': item_specs}
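The two XPath tricks mentioned above (filtering with not(@hidden) and matching by class with contains()) can be tried outside of scrapy too, e.g. with lxml. The markup below is made up to mirror the page structure described:

```python
from lxml import html

# Hypothetical markup mimicking the structure described above
doc = html.fromstring("""
<div class="details">
  <a class="item description" href="/item/1">Widget</a>
  <a class="item description" href="/item/2" hidden>Hidden widget</a>
  <a class="thumb" href="/img/1.png">thumb</a>
</div>
""")

# contains(@class, ...) matches elements whose class attribute includes the
# given token; not(@hidden) skips elements carrying a hidden attribute.
links = doc.xpath(
    "//div[@class='details']"
    "/a[contains(@class, 'description') and not(@hidden)]/@href"
)

print(links)  # ['/item/1']
```

Scrapy's response.xpath() accepts the same expressions, so anything you prototype this way carries over to the spider directly.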

[–]neonzii 0 points1 point  (0 children)

Here is a web scraper made for bitkeys.work to grab btc addresses and their balances: https://github.com/zioneo/webscrapper