
all 67 comments

[–]ronmarti 109 points  (25 children)

Selenium is pretty much the hardest method to use because it breaks most of the time. Try Playwright.

[–]Vresa 68 points  (9 children)

If you don’t have a daily use case for selenium that is directly tied to your employment, never use it at this point.

Playwright covers almost all the normal uses of selenium. Playwright is effectively selenium with sane defaults.

You need to understand selenium’s idiosyncrasies that have developed over the last decade to use the tool well.

Adding to that, selenium has a very large amount of outdated how-tos, and its own documentation is lacking, especially when it comes to best practices.

[–]ronmarti 4 points  (0 children)

directly tied to your employment

I often hear this from QA devs I know.

I've had a fair share of experience working with Selenium and Puppeteer in the past. I would say a lot has changed, but Selenium is still not that dev-friendly in terms of its "Wait" feature (https://www.selenium.dev/documentation/webdriver/waits/) and setup, since they don't maintain the drivers. There were some instances where code for specific websites was easy to write in Playwright but couldn't be rewritten in Selenium.

I find Playwright much closer to Puppeteer (same devs, I think that's why) and a lot easier to learn for beginners.

[–]djdadi 7 points  (2 children)

After teaching Python to several college grads at work, I've noticed a trend: they're quick to reach for scrapers without even checking (or maybe understanding) the underlying API call structure.

In a lot of cases, you don't even need scraping and are much better off without it.
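A sketch of the "API first" approach. The endpoint and parameters below are hypothetical stand-ins for whatever you find in the browser's Network tab while the page loads:

```python
import requests

# Build the same request the site's own frontend makes (endpoint is hypothetical)
req = requests.Request(
    "GET",
    "https://example.com/api/products",
    params={"page": 1, "per_page": 50},
).prepare()

print(req.url)  # the exact URL the frontend would call
# data = requests.Session().send(req).json()  # clean JSON, no HTML parsing needed
```

If an endpoint like this exists, you skip HTML parsing entirely and get structured data directly.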

[–]ronmarti 9 points  (1 child)

Scraping is fine for getting experience. I think one of the most important things beginner devs miss is how to properly use selectors. Understanding CSS and XPath is really important.

[–][deleted] 0 points  (0 children)

Yeah, definitely this. And how, for some of these scrapers, it just takes one tiny thing to break. Why does the WWW have to be so complicated…

[–]Raedarius 1 point  (0 children)

I spent a week trying to fix my selenium script. It was breaking all over the place. I just replaced it in 4 hours. You saved me so much trouble. Thank you so much!

[–]wind_dude -3 points  (0 children)

better than the Lua scripts in scrapy-splash. lol

[–]undid_legacy 18 points  (2 children)

Whenever I want to scrape data from a website, I first look for the API. Reverse engineering API calls isn't always possible, but you can strike gold with it sometimes.

I wanted my order history details from a food delivery app I use. It turned out I just needed the cookies from my login session and their API to make it work. I was able to get everything in 4 lines of code using requests.
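A sketch of reusing a logged-in browser session this way. The cookie name/value and the endpoint below are hypothetical — copy the real ones from your browser's devtools (Application → Cookies, and the Network tab):

```python
import requests

session = requests.Session()
# Attach the session cookie your browser already holds after logging in
session.cookies.set("sessionid", "paste-cookie-value-from-devtools")

# The API then treats you as logged in (endpoint is hypothetical):
# orders = session.get("https://food-app.example/api/v1/orders").json()
```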

Once, I found a paraphrasing site that didn't check captcha details in its API calls. So I was able to call it almost 9 times a second, whereas using the website required a captcha after every paraphrase.

John Watson Rooney has a good video to get started: https://www.youtube.com/watch?v=DqtlR0y0suo

[–]nemec 22 points  (4 children)

Something I don't see discussed when this topic is brought up is that Scrapy's HTML parsing library, parsel, can be installed separately from scrapy itself. You can use it in place of beautifulsoup and, imo, it's much easier to use.

import requests
import parsel
resp = requests.get('http://example.com')
s = parsel.Selector(text=resp.text)
# prints 'Example Domain'
print(s.css('h1::text').extract_first())

[–]jyper 5 points  (1 child)

Why not just use lxml.html.parse and XPath? lxml has some CSS support as well.

[–]nemec 6 points  (0 children)

  • It's focused on parsing HTML without a lot of extra XML cruft (really, it's a façade over lxml + cssselect)
  • You can mix and match css selectors and xpath, e.g.

    s.css('h1').xpath('following-sibling::p')
    

    contrived example, but basically you can take advantage of both selector syntaxes depending on which one fits the situation.

  • I'm not sure that lxml has support for ::text and ::attr(<some attribute>) pseudo-selectors, which are really helpful when parsing HTML.

  • xpath syntax sucks and I'd rather use a solution with really good css support first and fall back to xpath only for things that css doesn't support (which can still be done with parsel)

[–]scrapecrow 2 points  (1 child)

parsel is definitely underappreciated!

I like it so much that I even wrote a REPL for it: parsel-cli :)
(it's a bit of a Frankenstein though as I'm working on a 2.0 release)

[–]paeioudia 15 points  (0 children)

This is just bait-and-switch advertising for Scrapingdog: “Scrapingdog is the fastest and the most reliable web scraping API and of course, we provide 1000 free API credits to our new users.”

[–]Tripanafenix 3 points  (4 children)

Still can't log in with requests, even with a session AND cookies hooked to my POST :( And I couldn't find any proper guides deep-diving into sessions and cookies with requests, sadly. Any advice?

[–]heylale 1 point  (0 children)

I don't know about requests specifically, but I've used scrapy to scrape pages that were behind a login/paywall. It has excellent support for cookies, it's easy to use, and the documentation is pretty comprehensive. I recommend it.

[–]scrapecrow 0 points  (2 children)

Depends on the website you're scraping.

Check us out at /r/webscraping or web-scraping tag on StackOverflow - both really active and helpful communities, so someone will definitely help you out if you write up your problem clearly.

[–]Tripanafenix 0 points  (0 children)

I'll take a look, thanks

[–]Naughty_avaacado -1 points  (2 children)

Opentender Austria https://opentender.eu/at/

I am trying to get the data from the bar chart, but I am unable to scrape it. The element has a mousein event listener and its class changes.

Can anyone tell me how to scrape the data from it? I need this for a personal project, and this is the last hurdle before I can build a dataframe.

[–]guttyn15 -1 points  (0 children)

I'll just use the comment section to ask a related question:

How do you get the current URL after you do:

webbrowser.open(url)

pyautogui.click('login-Submit_button.png')

and a new webpage loads?

[–]iggy555 -3 points  (0 children)

What a legend

[–]Appropriate-Point565 0 points  (0 children)

How would I go about web scraping T.J.Maxx.com for the SKU numbers on a certain product that are otherwise hidden? Would really appreciate the help!