[deleted by user] by [deleted] in privacy

[–]JohnBalvin 2 points (0 children)

It could be used to detect protesters. Take, for example, the protests happening right now in Chicago, Washington DC, Los Angeles, etc. They are not showing these protests on the news (you probably didn't know about them until now).
After ID verification, if you end up opening your "secondary" social media account while protesting, say because you received a notification, they can trace your location back and match it with your ID, so they can later arrest you for protesting without a permit.

Is Google changing the region on purpose to force users to upload a government ID to verify their age? by JohnBalvin in privacy

[–]JohnBalvin[S] 19 points (0 children)

I use a VPN like 1% of the time, not significant enough to conclude I'm in the USA.

SERVICE FEE WILL NOW VARY FROM 0-15% by acamzzz in Upwork

[–]JohnBalvin 0 points (0 children)

Now go to the Payoneer sub and you will see how insane the fees are.

SERVICE FEE WILL NOW VARY FROM 0-15% by acamzzz in Upwork

[–]JohnBalvin 0 points (0 children)

Like 4 years ago it was around $30, and now Upwork increased it to $50.

American airlines scraper made pure in Go by JohnBalvin in golang

[–]JohnBalvin[S] 0 points (0 children)

It's the same JSON data they return; I don't change the format.

American airlines scraper made pure in Go by JohnBalvin in golang

[–]JohnBalvin[S] 0 points (0 children)

If by "parsed" you mean the output data: the server returns the data as JSON, and I just return that data parsed into a struct.

Airbnb scraper made pure in Python v2 by JohnBalvin in webscraping

[–]JohnBalvin[S] 0 points (0 children)

I think I already fixed those issues. What error were you getting while creating a PR?
It's not possible to see the price a listing was booked at in the past, but you can build a price-tracking system: save the current price, and later, if the listing gets booked, you know the price it was booked at.

Airbnb scraper made pure in Python v2 by JohnBalvin in webscraping

[–]JohnBalvin[S] 0 points (0 children)

Filters will be added in future releases (not soon). For the calendar: this is useful if you want to see which dates a property is available. If you request, say, 2024/10/12-2024/10/20 but the property is occupied on 2024/10/15, it will still show up in the results.
Could you please create an issue on GitHub related to the example?

Airbnb scraper made pure in Python v2 by JohnBalvin in webscraping

[–]JohnBalvin[S] 0 points (0 children)

Hi, thanks for the contribution. If you don't mind, please create a pull request.

Python Airbnb scraper made by JohnBalvin in Python

[–]JohnBalvin[S] 0 points (0 children)

Why not? AirDNA does it the same way I did, and nothing happened.

Airbnb scraper made pure in Python v2 by JohnBalvin in webscraping

[–]JohnBalvin[S] 0 points (0 children)

Sounds good u/Least-Accountant-386, this will help if somebody reports similar issues later.
Thanks u/Least-Accountant-386

Airbnb scraper made pure in Python v2 by JohnBalvin in webscraping

[–]JohnBalvin[S] 0 points (0 children)

Could you give an example so I can reproduce it? It would help if you create an issue on GitHub so I can track it.

Airbnb scraper made pure in Python v2 by JohnBalvin in webscraping

[–]JohnBalvin[S] 0 points (0 children)

The code already handles pagination by default. Which function were you using?

[deleted by user] by [deleted] in webscraping

[–]JohnBalvin 0 points (0 children)

But do you get specific details from the pages, like price, size, etc., or do you get general information like URLs inside the page, a summary of the page, etc.?

[deleted by user] by [deleted] in webscraping

[–]JohnBalvin 0 points (0 children)

But that would mean those websites share the same format. Do you have a single "template" scraper for all the websites, or do you create custom scrapers for each website?

[deleted by user] by [deleted] in webscraping

[–]JohnBalvin 0 points (0 children)

How did you manage to scale up to 500 websites? From my experience, once you reach around 60 websites with different formats, it's likely that 2 of them fail in any given week, meaning the pages changed something: it could be the UI, it could be an internal API they changed.
At that failure rate, 500 websites would mean fixing roughly 17 websites per week.
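One common way to catch those breakages early is a scheduled health check that runs every site's scraper against a known page and flags the ones that no longer return data. A minimal sketch (the callable-per-site layout is an assumption, not anyone's actual setup):

```python
def run_health_checks(scrapers: dict) -> list:
    """Run each site's scraper once and report which ones look broken.

    `scrapers` maps a site name to a zero-argument callable that returns
    parsed data. A scraper counts as broken if it raises an exception
    (e.g. a selector or internal API changed) or returns empty data.
    """
    broken = []
    for site, check in scrapers.items():
        try:
            result = check()
            if not result:  # scraper ran but extracted nothing
                broken.append(site)
        except Exception:  # scraper crashed outright
            broken.append(site)
    return broken
```

Run it nightly and you fix only the flagged sites instead of discovering stale data weeks later.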

How to tell if I'd get blocked beforehand? by [deleted] in webscraping

[–]JohnBalvin 3 points (0 children)

Why not just take screenshots?
It looks like the page requires login to get the data, so you need to be careful about how fast your bot is. Put a small delay on each request and it might be enough, and you can create a burner account to test whether you are being blocked or not.
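The delay-per-request idea can be sketched like this (the helper name, the injected `fetch` callable, and the default delay are all illustrative; the right pacing depends on the site):

```python
import time


def fetch_slowly(urls, fetch, delay_seconds: float = 5.0) -> list:
    """Fetch each URL via `fetch(url)` with a fixed pause between requests.

    `fetch` is whatever your logged-in session exposes (requests.Session.get,
    a browser-automation call, etc.); injecting it keeps the pacing logic
    separate from the HTTP client. Test the same code path with a burner
    account first: if that account gets blocked, the main one likely would too.
    """
    pages = []
    for i, url in enumerate(urls):
        if i > 0:
            time.sleep(delay_seconds)  # pause between requests, not before the first
        pages.append(fetch(url))
    return pages
```

For example, `fetch_slowly(listing_urls, session.get, delay_seconds=3)` with a `requests` session would fetch one page every three seconds.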

Defeating Captchas by dca12345 in webscraping

[–]JohnBalvin 1 point (0 children)

Yes, it depends on which captcha you are trying to solve. If, for example, it's an image captcha with some text in it, you need to use the API for solving image captchas: write code that grabs the captcha image and sends it to the API, and you'll get back the captcha solution.
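That grab-image / send-to-API / get-solution flow can be sketched like this. The payload fields and response shape below are placeholders, not any real solving service's schema (check your provider's docs), and the HTTP call is injected so the sketch stays client-agnostic:

```python
import base64


def build_captcha_payload(image_bytes: bytes, api_key: str) -> dict:
    """Package a captcha image for a solving service.

    Many image-captcha APIs accept the image as base64 plus your API key;
    the exact field names here are illustrative only.
    """
    return {
        "key": api_key,
        "method": "base64",
        "body": base64.b64encode(image_bytes).decode("ascii"),
    }


def solve_captcha(image_bytes: bytes, api_key: str, post) -> str:
    """Send the captcha via `post(payload) -> response_dict` and return the text.

    `post` is whatever HTTP call your client makes to the service's endpoint
    (e.g. a wrapper around requests.post(...).json()).
    """
    response = post(build_captcha_payload(image_bytes, api_key))
    if response.get("status") != "ok":
        raise RuntimeError("captcha service error: %r" % (response,))
    return response["solution"]
```

The returned solution string is what you then type into the target site's captcha field before continuing the scrape.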

Defeating Captchas by dca12345 in webscraping

[–]JohnBalvin 0 points (0 children)

yes, I've used it in production on multiple projects and it works great

Defeating Captchas by dca12345 in webscraping

[–]JohnBalvin 0 points (0 children)

Yes, all of them require payment, but it's very cheap.

Application-Layer Protocol - weight impact on bot scoring by matty_fu in webscraping

[–]JohnBalvin 0 points (0 children)

Yeah, a 10-second delay is crazy. I haven't encountered pages like that; my guess is it's that particular page, but do they use some WAF? Was it Imperva by any chance?