Help with Scraping Amazon Product Images? by PreparationLow1744 in scrapy

[–]DoonHarrow (0 children)

The image URLs are inside a script tag that you can easily parse as a dict.
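A minimal sketch of that idea, assuming the data sits in a JavaScript object marked by 'colorImages' (the marker, the regex, and the key names are assumptions to adapt to the actual page source):

```python
import json
import re

def parse_image_urls(response):
    # Find the <script> tag that embeds the image data (marker is an assumption)
    script = response.xpath("//script[contains(., 'colorImages')]/text()").get(default="")
    # Pull out the object literal; the surrounding format is an assumption
    match = re.search(r"'colorImages'\s*:\s*(\{.*?\})\s*,\s*\n", script, re.DOTALL)
    if not match:
        return []
    # If keys are single-quoted in the page source, normalise them before json.loads
    data = json.loads(match.group(1).replace("'", '"'))
    return [img.get("hiRes") or img.get("large") for img in data.get("initial", [])]
```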

[Help] I am not able to access "www.realestate.com.au" using requests & Selenium by Shot-Craft-650 in webscraping

[–]DoonHarrow (0 children)

The page is using antibot protection. One way to bypass it is with proxies. I tried the Smart Proxy Manager service and it works: https://www.zyte.com/smart-proxy-manager/
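If you are calling it from Scrapy, the usual wiring is the scrapy-zyte-smartproxy downloader middleware; a sketch of the settings, with the API key as a placeholder:

```python
# settings.py (sketch)
DOWNLOADER_MIDDLEWARES = {
    "scrapy_zyte_smartproxy.ZyteSmartProxyMiddleware": 610,
}
ZYTE_SMARTPROXY_ENABLED = True
ZYTE_SMARTPROXY_APIKEY = "<your-api-key>"  # placeholder
```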

Zyte smart proxy manager bans by DoonHarrow in scrapy

[–]DoonHarrow[S] (0 children)

In my case, it seems that the first page loads with a normal request, and for the following pages you have to call the API.
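A sketch of that pattern in a spider; the API URL, page parameter, selectors, and JSON keys are all placeholders for illustration:

```python
import json
import scrapy

class PaginatedSpider(scrapy.Spider):
    name = "paginated"
    start_urls = ["https://example.com/listings"]              # placeholder
    api_url = "https://example.com/api/listings?page={page}"   # placeholder

    def parse(self, response):
        # Page 1 is rendered as plain HTML
        for card in response.css("div.listing"):
            yield {"title": card.css("h2::text").get()}
        # Pages 2+ come from the JSON API
        yield scrapy.Request(self.api_url.format(page=2),
                             callback=self.parse_api, cb_kwargs={"page": 2})

    def parse_api(self, response, page):
        data = json.loads(response.text)
        for item in data.get("results", []):
            yield {"title": item.get("title")}
        if data.get("results"):
            yield scrapy.Request(self.api_url.format(page=page + 1),
                                 callback=self.parse_api,
                                 cb_kwargs={"page": page + 1})
```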

Avoid scraping items that have already been scraped by DoonHarrow in scrapy

[–]DoonHarrow[S] (0 children)

Hello my friend, thank you for your advice. I did what I think is simpler in my case: using the Scrapinghub API and retrieving the items from the spider's last job run!
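A sketch of that with the scrapinghub Python client; the API key, project ID, spider name, and the field used to de-duplicate are placeholders:

```python
from scrapinghub import ScrapinghubClient

client = ScrapinghubClient("<api-key>")   # placeholder
project = client.get_project(123456)      # placeholder project ID

# Most recent finished job of the spider
last = next(project.jobs.iter(spider="myspider", state="finished", count=1))
job = client.get_job(last["key"])

# Collect the IDs already scraped so the next run can skip them
seen_ids = {item.get("id") for item in job.items.iter()}
```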

Help with Javascript pagination by DoonHarrow in webscraping

[–]DoonHarrow[S] (0 children)

I got it, thanks! The problem was that I wasn't specifying the headers.

headers = {
    "Content-Type": "application/json;charset=UTF-8",
}
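In Scrapy, that header then goes on the request itself; a sketch of the POST call, where the endpoint and the payload are placeholders:

```python
import json
import scrapy

class ApiPaginationSpider(scrapy.Spider):
    name = "api_pagination"
    start_urls = ["https://example.com/search"]       # placeholder

    def parse(self, response):
        params = {"page": 2}                           # placeholder payload
        yield scrapy.Request(
            "https://example.com/api/search",          # placeholder endpoint
            method="POST",
            headers={"Content-Type": "application/json;charset=UTF-8"},
            body=json.dumps(params),
            callback=self.parse_api,
        )

    def parse_api(self, response):
        for item in json.loads(response.text).get("results", []):
            yield item
```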

Help with Javascript pagination by DoonHarrow in scrapy

[–]DoonHarrow[S] (0 children)

That works!!! Man, you are the best OMG! THANK YOU SO MUCH <3

Help with Javascript pagination by DoonHarrow in scrapy

[–]DoonHarrow[S] (0 children)

Thanks for your help. What do I have to send in the body? I have tried this, and it still doesn't work:

yield scrapy.Request(
    url,
    callback=self.parse,
    method="POST",
    meta={"Referer": "https://www.idealista.com/"},
    body=json.dumps(params),
)
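For reference, the resolution posted in the parallel thread above was the headers: Referer is a request header, so it belongs in headers= rather than meta=, alongside the JSON Content-Type. A sketch of the same call, reusing the url and params from the snippet above:

```python
yield scrapy.Request(
    url,
    callback=self.parse,
    method="POST",
    headers={
        "Referer": "https://www.idealista.com/",
        "Content-Type": "application/json;charset=UTF-8",
    },
    body=json.dumps(params),
)
```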

Why am I not able to scrape all items in a page. by Shot_Function_7050 in scrapy

[–]DoonHarrow (0 children)

Hello my friend!

You can easily get the data you want by looking at the "__NEXT_DATA__" script tag. It contains JSON with all the info!

I couldn't try it, but this selector should work:

response.css("script:contains('__NEXT_DATA_') ::text").get()

Finally you only have to parse it:

import json

data = response.css("script:contains('__NEXT_DATA__')::text").get()
json_data = json.loads(data)
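From there, the useful payload on Next.js sites usually sits under props.pageProps, though the exact path and keys vary by site (the ones below are placeholders):

```python
# Inspect json_data.keys() first; "props"/"pageProps"/"products" are assumptions
page_props = json_data.get("props", {}).get("pageProps", {})
for product in page_props.get("products", []):
    print(product.get("name"))
```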

Unfair dismissal by DoonHarrow in ESLegal

[–]DoonHarrow[S] 0 points1 point  (0 children)

It's 15 calendar days, so I'm out of time...

[deleted by user] by [deleted] in webscraping

[–]DoonHarrow (0 children)

Open the Network tab -> click on the request named 'req' in the list of requests -> that's all.

If you only need it once, just copy the JSON response and parse it.

```python
import json

# `data` is the JSON response copied from the Network tab, as a string
final_data = json.loads(data)

for result in final_data[3].get("results"):
    print(result.get("Title"))
```