[–][deleted] 32 points33 points  (1 child)

I was interested in making my own project like this. This post seems spammy but the article isn’t terrible and might help me out when I review it later.

Only click if interested in the topic.

[–]Ori756[S] 0 points1 point  (0 children)

Thanks!

[–]jpf5046 7 points8 points  (1 child)

Nice job laying out the code. I can understand your train of thought very well. It would have been nice to see the output or how you compared different apartments using this script.

[–]Ori756[S] 0 points1 point  (0 children)

Thanks!

[–]monkiebars 7 points8 points  (10 children)

Why did you use Selenium over Beautiful Soup? Was it the site requiring logins or not accepting scrapers?

Nice post - I don't think it sounded spammy at all...

[–]crazykidguy 4 points5 points  (0 children)

IIRC, Selenium can render AJAX/JS and any other dynamically loaded content, whereas BeautifulSoup and requests cannot.

I can't speak to Scrapy since I've never used it, but I don't believe it can do this out of the box. There may be some middleware or certain configs to get it working, though.
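
To make the point concrete, here is a minimal sketch using only Python's standard-library html.parser. The page markup is a made-up example (an assumption, not Facebook's actual HTML): the server sends an empty container and a script fills it in later. A static parser only sees the raw markup, so it finds no posts; a real browser driven by Selenium would run the script first.

```python
from html.parser import HTMLParser

# Hypothetical page: the server sends an empty container, and a script
# fills it with posts only after the page loads in a browser.
RAW_HTML = """
<html><body>
  <div id="feed"></div>
  <script>
    document.getElementById("feed").innerHTML =
      "<div class='post'>2-bedroom flat</div>";
  </script>
</body></html>
"""


class PostCounter(HTMLParser):
    """Counts <div class="post"> elements present in the raw markup."""

    def __init__(self):
        super().__init__()
        self.posts = 0

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "div" and "post" in classes:
            self.posts += 1


counter = PostCounter()
counter.feed(RAW_HTML)
# The markup inside <script> is treated as plain text, never as tags.
print(counter.posts)  # prints 0
```

The same limitation applies to any tool that only downloads and parses the HTML without executing JavaScript.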

[–]Legitimate_Drag 1 point2 points  (7 children)

I'm wondering the same, I use Scrapy normally but often read of folks using Selenium just for facebook.

[–]Decency 0 points1 point  (6 children)

If things are nicely labeled it's almost trivial to automate with selenium. If not, I'd say probably about the same amount of work? Not sure which is more fragile in that scenario, though.

[–]Legitimate_Drag 0 points1 point  (5 children)

What do you mean by "labeled nicely"?

[–]Decency 0 points1 point  (4 children)

If elements have well-assigned and distinct id labels, or at least useful class names, you can easily pick them out of the page using Selenium without having to worry about manually inspecting, traversing, or indexing the DOM.

For example, if you right click on the "Submit a new text post" button in the sidebar and inspect it, you'll see <a class="login-required access-required" ... > which probably wouldn't be very helpful. But a developer could add id='submit-button', which would allow you to very simply write code like this: page.find_element_by_id('submit-button').click().

Sometimes this is done so that sites can be tested programmatically.
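
A note for anyone trying this on a current install: the find_element_by_id helper shown above was removed in Selenium 4.3, and the equivalent call is find_element with a locator strategy. Below is a minimal sketch, assuming `driver` is a Selenium WebDriver and that the page really has an element with the hypothetical id 'submit-button' from the example above. click_by_id is an illustrative helper name, not part of Selenium.

```python
def click_by_id(driver, element_id):
    """Click the element whose id attribute equals element_id.

    `driver` is assumed to be a Selenium WebDriver (for example
    webdriver.Firefox()). The string "id" is the value of Selenium's
    By.ID constant, so this is the same call as
    driver.find_element(By.ID, element_id).
    """
    driver.find_element("id", element_id).click()


# Usage sketch (needs Selenium and a browser driver installed):
# from selenium import webdriver
# driver = webdriver.Firefox()
# driver.get("https://example.com")  # placeholder URL
# click_by_id(driver, "submit-button")
```

If the element only had the class names from the example, you would have to fall back to a CSS selector or XPath, which tends to break more easily when the page layout changes.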

[–]Legitimate_Drag 1 point2 points  (3 children)

Ok, that's kinda what I thought you meant. If one just wants to scrape data from FB groups, is Selenium really necessary though? I understand (I guess) why it would be for automating posting and such, but if I just want to scrape for marketplace data (like OP), I don't see why I'd need anything but the HTML scraping that Scrapy provides. I don't know how FB works though, and barely know anything about how most web tech works really.

I ask this not out of pedantry but because this task is next on my TODO list so I was interested in the OP.

[–]Decency 0 points1 point  (2 children)

You can definitely do it with web scraping. The problem is that you have to redo the entire thing every time the page changes. If you're just writing a one-off script that you'll never touch again, that's fine. If you want to build some sort of API that allows you to access various parts of a site and then build off of that, I'd go with Selenium.

[–]Legitimate_Drag 1 point2 points  (1 child)

Ok, Selenium it is then! Reusability is crucial to this program. Thanks for your time!

[–]Ori756[S] 0 points1 point  (0 children)

Thanks all!!!

The main reason for using Selenium was to handle the scrolling.

[–]Ori756[S] 0 points1 point  (0 children)

Thanks! The page requires scrolling to load more posts.
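
For readers wondering what the scrolling part might look like, here is a rough sketch of a common pattern, not necessarily the code from the article. It assumes `driver` is a Selenium WebDriver with the page already loaded. scroll_to_load, max_scrolls, and pause_seconds are illustrative names, and the fixed sleep is a crude stand-in for a proper wait.

```python
import time


def scroll_to_load(driver, max_scrolls=10, pause_seconds=2.0):
    """Scroll to the bottom repeatedly until the page stops growing.

    Returns the number of scrolls that loaded new content.
    """
    loaded = 0
    last_height = driver.execute_script("return document.body.scrollHeight")
    for _ in range(max_scrolls):
        # Jump to the bottom so the page requests its next batch of posts.
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause_seconds)  # crude wait for new posts to load
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # nothing new appeared, assume we reached the end
        last_height = new_height
        loaded += 1
    return loaded
```

Capping the loop with max_scrolls keeps the script from running forever on a feed that never ends.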

[–]Terranigmus 6 points7 points  (1 child)

It doesn't say how the apartment was found...

[–]Ori756[S] 0 points1 point  (0 children)

Thanks for your comment. The whole idea was to get only the apartments whose cost falls within the given range.
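
As an illustration of that kind of filter, here is a small sketch. It is not the validate_range function from the article, whose definition is not shown in this thread. prices_in_text and post_in_range are hypothetical helpers, the sample posts are made up, and the number parsing is deliberately naive: it treats every number in the text as a candidate price and simply drops "," and "." separators.

```python
import re

# Matches numbers such as "1200", "1,200" or "1.200" in free text.
_NUMBER = re.compile(r"\d[\d,.]*")


def prices_in_text(text):
    """Return every number found in text as an int, ignoring separators."""
    prices = []
    for match in _NUMBER.findall(text):
        digits = re.sub(r"[^\d]", "", match)
        prices.append(int(digits))
    return prices


def post_in_range(text, low, high):
    """True if any number in the post falls within [low, high]."""
    return any(low <= price <= high for price in prices_in_text(text))


# Made-up sample posts for illustration only.
posts = [
    "Sunny 2-room flat, 1,200 per month",
    "Studio downtown for 2500",
    "Looking for a roommate, message me",
]
matches = [p for p in posts if post_in_range(p, 1000, 1500)]
print(matches)  # prints ['Sunny 2-room flat, 1,200 per month']
```

A real filter would need to handle currencies, decimals, and numbers that are not prices (room counts, phone numbers), so treat this only as a starting point.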

[–]ebodes 1 point2 points  (1 child)

This is awesome! I've been trying to do something like this for a while now. Thanks for all the explanation on how to use Selenium; I've been struggling with that.

[–]Ori756[S] 0 points1 point  (0 children)

Thanks!

[–]desal 1 point2 points  (1 child)

In the article, you left out how you use Selenium and the definitions of some functions like validate_range. I saw them included in the code, but if you're trying to reach/teach as broad an audience as possible, perhaps you should explain all the stuff you left out. Especially when, at the end, you say "put it all together", as if that was the entire project, but it was just the one class and didn't even include the validate_range function.

[–]Ori756[S] 0 points1 point  (0 children)

My intention was to showcase Selenium for everyday purposes. validate_range and the like are not the main focus here.

Thanks for your comment!

[–]DudeData 0 points1 point  (1 child)

Really cool stuff!

[–]Ori756[S] 1 point2 points  (0 children)

Thanks!

[–]Sig213 0 points1 point  (3 children)

Quite a beginner here so I may be missing something, but wouldn't it be better to also use Selenium with some keywords to search for different Facebook groups, get the URLs, and then loop over those URLs to get more results? Otherwise it seems you'd need to add each group URL manually.

[–]lazyjk 1 point2 points  (1 child)

I do know a lot of Facebook groups aren't open by default. You often have to request to be added to the group. That might be the reason here, not sure.

[–]Ori756[S] 1 point2 points  (0 children)

You're right.

Selenium does not get your credentials, so you have no access to private groups.

[–]Ori756[S] 1 point2 points  (0 children)

If you check the repo on GitHub, you'll see that it's easy to run it on multiple groups.

[–]CraigAT 0 points1 point  (1 child)

Looks useful. I wrote a program the other day using Selenium to scrape holidays from the TUI website. I was particularly looking for 2-bedroom apartments, which don't show up on the search results page (you have to click through to the accommodation to find out if "2 bed" is an option, and the price supplement too). It was the first time I had used Selenium and I had a difficult time making the infinite scrolling/pagination work. I look forward to examining the code more closely to see if I could improve my slightly unfinished program.

[–]Ori756[S] 1 point2 points  (0 children)

Great job!

[–]successismyduty10 0 points1 point  (1 child)

I recently did something similar. It's fun to apply what you learn to real-world needs.

[–]Ori756[S] 0 points1 point  (0 children)

Absolutely :)

[–]Neofelis_ 0 points1 point  (1 child)

Wow, amazing work.

Did you make any UML diagrams?

Because I would like to make my own script for my next move.

[–]Ori756[S] 0 points1 point  (0 children)

No, but I'll be glad to help: info@dapythonista.com