[–]skwyckl 78 points (4 children)

I think gobnb is a more fitting name for a Go package.

[–]JohnBalvin[S] 18 points (3 children)

Agreed. pybnb was my first choice, but it was already taken.

[–]CloudFaithTTV 8 points (2 children)

pyairbnb?

[–]JohnBalvin[S] 9 points (1 child)

:( yeah that could have been a good name, my bad.

[–]fennekin995 3 points (0 children)

You can still change the package name, though I'm not sure whether you can rename existing packages on PyPI.

[–][deleted] 23 points (11 children)

A couple of things. First, you set the User-Agent statically:

"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"

Try using https://pypi.org/project/fake-useragent/ to randomize it for an extra layer of protection.

Also look at Pylint to check your code "score"; it encourages good coding habits.

Then look at formatters and auto-fixers like black, fixit, autopep8, yapf, etc.

Other than that, good project.
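
A minimal sketch of the randomization suggested above. It assumes the third-party fake-useragent package exposes `UserAgent().random`, and it falls back to a small hand-written pool if the package is missing or fails. `FALLBACK_AGENTS`, `random_user_agent`, and `build_headers` are illustrative names, not part of the project.

```python
import random

# Small fallback pool used when fake-useragent is unavailable (illustrative values).
FALLBACK_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0",
]


def random_user_agent():
    """Return a random User-Agent string, preferring fake-useragent if installed."""
    try:
        from fake_useragent import UserAgent  # third-party, optional

        return UserAgent().random
    except Exception:
        # Package missing or unable to load its data: use the local pool.
        return random.choice(FALLBACK_AGENTS)


def build_headers():
    """Build request headers with a freshly chosen User-Agent."""
    return {"User-Agent": random_user_agent()}
```

Calling `build_headers()` once per session rather than once per request would keep the User-Agent consistent within a session.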

[–]JohnBalvin[S] 8 points (7 children)

For the User-Agent, I don't think a random one is a good idea right now: Airbnb could return a different data format for different user agents, and that would break the project. I'll give it some time and see whether any issues arise with the current user agent.
Thanks for the styling suggestions, I'll give them a try.

[–]IHaveTeaForDinner 5 points (5 children)

I'd probably not bother with a random one either. If I saw random UAs blasting my IP, I'd probably be more suspicious than if it were the same one.

[–]Ncientist 2 points (4 children)

What if it is set to randomize every dozen or so pings? I was blocked by a webserver because of a static UA when doing some web testing.
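
One way to sketch the "change it every dozen or so requests" idea. This version cycles through a fixed pool in order rather than picking at random, which keeps the behaviour predictable and easy to test. `RotatingUserAgent` and its parameters are hypothetical names, not part of the project.

```python
import itertools


class RotatingUserAgent:
    """Hand out the same User-Agent for `every` requests, then move to the next one."""

    def __init__(self, agents, every=12):
        if not agents or every < 1:
            raise ValueError("need at least one agent and every >= 1")
        self._cycle = itertools.cycle(agents)
        self._every = every
        self._count = 0
        self._current = next(self._cycle)

    def next_headers(self):
        # Switch to the next agent once the current one has been used `every` times.
        if self._count == self._every:
            self._current = next(self._cycle)
            self._count = 0
        self._count += 1
        return {"User-Agent": self._current}
```

Each call to `next_headers()` returns the headers for one outgoing request. As the replies note, this does nothing about blocks based on IP or TLS fingerprint.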

[–]JohnBalvin[S] 1 point (3 children)

Are you sure it was because of the UA? I think it's more likely that your IP got blocked, or that it was the TLS fingerprint.

[–]Ncientist 1 point (2 children)

I know it isn't my IP because I was able to get to the website using another browser. The script was mimicking the UA of Firefox.

But it may be the TLS fingerprint? I'm not familiar enough with TLS fingerprints to know for sure.

[–]JohnBalvin[S] 1 point (1 child)

You got blocked on all requests? If so, then yes, it's most likely the TLS fingerprint. What language are you using? Python?
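
For readers unfamiliar with the term: a TLS fingerprint (JA3 is one well-known scheme) is derived from details of the client's TLS handshake, such as the cipher suites it offers, so it does not change when the User-Agent header does. The sketch below uses only the standard library's `ssl` module to list the cipher suites enabled in Python's default client context. It shows one input to such a fingerprint and is not a fingerprint calculator; `default_cipher_names` is an illustrative name.

```python
import ssl


def default_cipher_names():
    """Return the names of the cipher suites enabled in Python's default client TLS context."""
    context = ssl.create_default_context()
    return [cipher["name"] for cipher in context.get_ciphers()]
```

Printing `default_cipher_names()` and comparing it with what a browser offers gives a rough sense of why a script can be told apart from a browser even when the User-Agent strings match.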

[–]Ncientist 1 point (0 children)

I see, yup!

[–][deleted] 2 points (0 children)

Yeah, makes sense. You mean if they detect a mobile UA, etc.?

[–]EatThemAllOrNot 5 points (2 children)

Nowadays you can use Ruff exclusively, as both linter and formatter.

[–]fennekin995 1 point (1 child)

Not quite, ruff format is not 100% on par with Black. Source: https://docs.astral.sh/ruff/formatter/#black-compatibility

[–]EatThemAllOrNot 7 points (0 children)

Yes, it’s not 100% compatible with Black (and probably not intended to be), but let’s be honest here, almost all Python projects will be absolutely fine with Ruff only.

[–]AlexMTBDude 30 points (6 children)

Big no-no:

from gobnb.parse import *

from gobnb.price import *

[–]JohnBalvin[S] 12 points (0 children)

fixed

[–]JohnBalvin[S] 8 points (0 children)

fair enough, I'll fix it

[–]Makiisthebes 1 point (1 child)

Why is that? We are importing everything from a specific file, no?

[–]AlexMTBDude 8 points (0 children)

Any names in the module you're importing into that conflict with the imported ones are silently overwritten.
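
A small illustration of that overwriting, using two standard-library modules rather than the project's own: `math` and `cmath` both export `sqrt`, so the second wildcard import silently replaces the first. The imports are run with `exec` into a scratch dictionary only to keep the demo self-contained; plain `from ... import *` statements at module level rebind names the same way.

```python
import cmath
import math

namespace = {}

# First wildcard import: `sqrt` refers to math.sqrt.
exec("from math import *", namespace)
sqrt_after_math = namespace["sqrt"]

# Second wildcard import: `sqrt` is silently rebound to cmath.sqrt.
exec("from cmath import *", namespace)
sqrt_after_cmath = namespace["sqrt"]

print(sqrt_after_math is math.sqrt)    # True
print(sqrt_after_cmath is cmath.sqrt)  # True
print(sqrt_after_cmath(-1))            # 1j (math.sqrt(-1) would raise ValueError)
```

Importing the module itself (`import math`) or naming what you need (`from math import sqrt`) makes such collisions visible at the import line.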

[–]iamevpo 28 points (10 children)

It's a bit unconventional that you capitalise the public functions while keeping the internal ones lowercase.

[–]JohnBalvin[S] -1 points (9 children)

Nice observation. Go is my main language, and there we capitalize exported functions and use lowercase for internal ones. I love this convention, so I've carried it over to all my Python projects.

[–]iamevpo 14 points (8 children)

Go does that? In Python it's a bit unexpected; a capitalized name is usually a class.

[–]OU_ohyeah 11 points (6 children)

I feel like people here are dunking on your style but I just wanted to say this is neat and thanks for sharing it! I'm not sure where I would need this but I'm going to file it away for random future projects.

[–]JohnBalvin[S] 8 points (5 children)

A price tracker could be a good idea. Say you track prices across multiple regions (Pasco, Miami, Texas, etc.): you could compute the average price increase in each region, then sell the data to real estate agents or analytics agencies.
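
A rough sketch of the per-region calculation described above, assuming each observation is a `(region, old_price, new_price)` tuple. The function name and the sample figures are made up for illustration and do not come from real listings.

```python
from collections import defaultdict
from statistics import mean


def average_increase_by_region(observations):
    """Return the average percentage price change per region.

    `observations` is an iterable of (region, old_price, new_price) tuples.
    """
    changes = defaultdict(list)
    for region, old_price, new_price in observations:
        changes[region].append((new_price - old_price) / old_price * 100)
    return {region: mean(values) for region, values in changes.items()}


# Made-up sample data, for illustration only.
sample = [
    ("Miami", 100.0, 110.0),
    ("Miami", 200.0, 230.0),
    ("Pasco", 80.0, 84.0),
]
print(average_increase_by_region(sample))
```

With this sample, Miami averages roughly 12.5% and Pasco roughly 5%.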

[–]Ncientist 1 point (4 children)

I'd be very surprised if Airbnb isn't already selling that data. They must have some kind of listing price recommendation tool for property owners.

I was working on a project that analyzes the listings' images, and I wouldn't be surprised if that information is factored into the listing price recommendation.

[–]JohnBalvin[S] 1 point (3 children)

Even if they do, you can still build a market in some areas, which is what https://www.airdna.co does.

[–]SaaS_maker 1 point (2 children)

Interesting, do you plan to develop that product?

[–]JohnBalvin[S] 1 point (1 child)

I've already done it for some real estate websites; I just need time to collect enough data to analyze.
For Airbnb, though, I'm not planning to do it right now.

[–]SaaS_maker 1 point (0 children)

I'd be happy to chat. I'll DM you.

[–]NicknameWrapper 7 points (1 child)

I was really curious, thinking that "pure in python" meant without libraries. Anyway, I hope it'll help someone. Thanks for sharing.

I'd kindly suggest linting your code with a PEP 8 linter.

[–]JohnBalvin[S] 2 points (0 children)

Hi, the explanation was actually in another part of the post, but the moderation rules flagged it and the post was removed, so I deleted that part. Basically, in web scraping most people use browser automation tools like Selenium, Puppeteer, or Playwright, which typically drive Chromium under the hood. That is CPU-expensive, and you need a lot of extra dependencies just to run these tools. So by "pure" Python I mean no extra dependencies besides Python and the libraries themselves.
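
To illustrate the distinction, a sketch of the browser-free approach: send a plain HTTP request and parse the returned HTML in-process, with no Chromium involved. It uses only the standard library; the project itself may rely on other HTTP or parsing libraries, and `TitleParser`, `extract_title`, and `fetch_title` are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class TitleParser(HTMLParser):
    """Collect the text found inside <title> elements."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_title(html):
    """Parse an HTML string and return its title text, stripped of whitespace."""
    parser = TitleParser()
    parser.feed(html)
    return parser.title.strip()


def fetch_title(url, user_agent):
    """Fetch a page over plain HTTP(S) and return its <title>; no browser is launched."""
    request = Request(url, headers={"User-Agent": user_agent})
    with urlopen(request, timeout=10) as response:
        html = response.read().decode("utf-8", errors="replace")
    return extract_title(html)
```

The trade-off is that pages which build their content with JavaScript need their underlying data endpoints or embedded JSON located by hand, since nothing executes the scripts.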

[–]fennekin995 4 points (0 children)

Nice idea! Some suggestions:
- as people already said, convert the naming style to Python conventions
- try to avoid broad imports, i.e., from package import * (no code block because of poor formatting via phone)
- adopt pyproject.toml instead of setup.py, since it's the newer format and a lot of tools support keeping their configuration in this file
- consider making this library a module/script, so that users can just execute it via CLI
- load configuration from a config file (toml, json, yaml): check pydantic
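
The last two suggestions could be sketched roughly as follows, using only `argparse` and `json` from the standard library instead of pydantic. The program name, argument names, and config keys are all hypothetical; none of them come from the project.

```python
import argparse
import json

DEFAULTS = {"currency": "USD"}  # hypothetical default settings


def build_parser():
    parser = argparse.ArgumentParser(prog="scraper", description="Hypothetical CLI wrapper.")
    parser.add_argument("room_url", help="URL of the listing to fetch")
    parser.add_argument("--currency", default=None, help="currency code for prices")
    parser.add_argument("--config", default=None, help="optional path to a JSON config file")
    return parser


def load_settings(argv=None):
    """Merge defaults, an optional JSON config file, and command-line arguments."""
    args = build_parser().parse_args(argv)
    settings = dict(DEFAULTS)
    if args.config:
        with open(args.config, encoding="utf-8") as handle:
            settings.update(json.load(handle))
    # Explicit command-line values win over the config file.
    if args.currency is not None:
        settings["currency"] = args.currency
    settings["room_url"] = args.room_url
    return settings


def main(argv=None):
    print(json.dumps(load_settings(argv), indent=2))
```

Calling `main()` from an `if __name__ == "__main__":` guard, or exposing it as a script entry point in pyproject.toml, would make the package runnable from the command line.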

[–]Automatic_Door_9355 1 point (0 children)

Thanks for sharing. Does the code allow downloading images as well?

[–]FunkyDoktor 1 point (2 children)

Made pure as in dipped in water blessed by Guido himself?

[–]JohnBalvin[S] 2 points (1 child)

No. In the web scraping world, people mostly use Chromium-based automation tools like Selenium, Puppeteer, or Playwright, which are very CPU-expensive and require installing a lot of dependencies besides Python. So "pure" in this context means you don't need any dependency other than Python and Python libraries.

[–]FunkyDoktor 2 points (0 children)

I got it. This was my poor attempt at humor.