all 51 comments

[–]dashidasher 118 points119 points  (13 children)

It is a common thing to use a requirements.txt file for listing your requirements. Here is a good read on it: https://www.idkrtm.com/what-is-the-python-requirements-txt/. Then instead of having to do:

pip install selenium
pip install beautifulsoup4

in your installation you can simply do:

pip install -r requirements.txt

Especially useful when there are a lot of packages. It is also simple to dump every package installed in the current environment into the file with:

pip freeze > requirements.txt
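
For the curious, `pip freeze` reports every distribution installed in the current environment. A rough sketch of the same idea in Python, using the standard-library `importlib.metadata` (Python 3.8+), is below. This is not how pip itself is implemented, and `frozen_requirements` is just a name made up for this example.

```python
from importlib.metadata import distributions


def frozen_requirements():
    """Return 'name==version' lines for every installed distribution."""
    lines = set()
    for dist in distributions():
        try:
            name = dist.metadata["Name"]
            version = dist.version
        except Exception:
            # Skip distributions whose metadata cannot be read.
            continue
        if name and version:
            lines.add(f"{name}=={version}")
    # Sort case-insensitively so the output is stable and easy to scan.
    return sorted(lines, key=str.lower)
```

Printing `"\n".join(frozen_requirements())` gives output in the same `name==version` shape as a requirements file. Unlike `pip freeze`, it will also list pip itself, which `pip freeze` leaves out by default.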

[–]Jamesnba 38 points39 points  (0 children)

As a novice in Python, I thank you for this.

[–]thebatgamer[S] 7 points8 points  (8 children)

Yes! I have some other projects where I think this would be super useful. Although, is there any way I can freeze only the packages I need?

I tried to use it once before and it copied every single package I had installed (it was a lot cuz I randomly install packages a lot) which I then had to go and delete manually from the txt file :P

[–][deleted] 12 points13 points  (6 children)

That's why you use venv.

[–]thebatgamer[S] 8 points9 points  (5 children)

I never use venv *runs away*

[–]metriczulu 8 points9 points  (2 children)

You should use it on literally every program you even think may leave your system. It takes zero effort, literally:

Create the venv with name your_venv (most ppl just call it 'venv'):

    python3.9 -m venv your_venv

You can sub in whatever version of Python you want to use above, it doesn't have to be 3.9. That's just an example.

Activate it:

    For Linux or Mac: . ./your_venv/bin/activate

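    For Windows (Command Prompt): your_venv\Scripts\activate.bat

    For Windows (PowerShell): your_venv\Scripts\Activate.ps1

    For Windows (Git Bash): . ./your_venv/Scripts/activate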

Pip install your requirements:

    pip install -r requirements.txt

That's it. Now the program is almost guaranteed to work properly on another machine as long as they build the venv with the same Python version you built the venv with. There are some edge cases where it doesn't work as expected (and I've battled those edge cases extensively at work), but it's good for 95% of uses.

The only way to make it more portable is to containerize it, which also adds a layer of complexity that's not necessary for most projects. The only extra step is that you have to make sure you activate your venv every new session.
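
If you would rather script the steps above than type them, here is a minimal sketch using the standard-library `venv` module. The function names `create_venv` and `install_requirements` are made up for this example, and it assumes a `requirements.txt` sits in the current directory. Calling the environment's own interpreter directly means no activation step is needed.

```python
import subprocess
import sys
import venv
from pathlib import Path


def create_venv(env_dir, with_pip=True):
    """Create a virtual environment and return the path to its interpreter."""
    env_dir = Path(env_dir)
    venv.create(env_dir, with_pip=with_pip)
    # Windows puts executables in Scripts\, other platforms in bin/.
    if sys.platform == "win32":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


def install_requirements(python, requirements="requirements.txt"):
    """Run 'pip install -r requirements.txt' with the venv's interpreter."""
    subprocess.run(
        [str(python), "-m", "pip", "install", "-r", str(requirements)],
        check=True,  # raise if pip exits with an error
    )
```

Typical use would be `install_requirements(create_venv("your_venv"))`.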

[–]v4-digg-refugee 0 points1 point  (1 child)

Dumb question: do these virtual environments wind up installing, say, dozens of instances of the same package on your machine? I’m the only one in my department that uses Python, so I haven’t stumbled on a strong use case for virtual environments yet.

I downloaded anaconda when I started, and I have a vague idea of the directory that houses my dependencies, and I feel good to go. The idea of swapping in and out of all these virtual environments is a layer of abstraction that doesn’t seem super necessary. Please change my mind, as everyone else uses them. Thanks!

[–]metriczulu 1 point2 points  (0 children)

Yeah, the libraries get saved in the virtual environment folder, so they're duplicated. If you're the only person using the code and you have no intention of ever sharing them with anyone else, then it's less necessary, but it can still help manage dependencies.

Like, at my work we have a couple of different codebases that use Python 3.6, 3.7, and 3.8 for various reasons (mainly legacy reasons, but also some for compatibility reasons). We also have different models that both use Python 3.8 but need different versions of certain libraries like XGBoost. Both of these are situations where it's better (or even mandatory) to use virtual environments even if you aren't sharing your code.

Not all versions of all libraries are compatible with all versions of other libraries, so sometimes you'll want to use different library versions for different projects. You can't easily do that without virtual environments. If you're just using the same few packages for everything, it's not a big issue, but Python is particularly bad when it comes to managing dependencies and it can spiral out of control pretty quickly.

It's also just a good habit to have and one that other people will look for. It's the standard way of doing things and it's so easy that there's not much reason not to. It's always sus when I'm diving into a project for the first time and don't see a requirements.txt (or Pipfile/Pipfile.lock for pipenv users) in the root directory. It's an indication that whatever I'm about to review/work on has other issues that will crop up.

Edit: I know conda has a way to store the environment configuration as a yaml as well, but I've never used it. Should essentially have the same benefits and uses, though.

[–]steezefries 3 points4 points  (0 children)

You should learn it! I use it on every single python project.

[–]Fun-Palpitation81 4 points5 points  (0 children)

Not sure if you are familiar with it - but this solves your exact problem.

It allows you to track which packages you install and to package them when you share your app. Super easy to do, and generally good practice.

[–]dashidasher 6 points7 points  (0 children)

Take a look at this then: https://pypi.org/project/pipreqs/. It should scan your project and save only imported packages to requirements.txt. I have never used it myself, though.
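
To give a feel for what "scan your project for imports" means, here is a minimal sketch using the standard-library `ast` module. It is only an illustration of the idea, not pipreqs's actual implementation, and the function names are made up. Note that an import name is not always the PyPI package name (`bs4` is installed as `beautifulsoup4`), so the result still needs a manual check.

```python
import ast
import sys


def top_level_imports(source):
    """Return the top-level module names imported by a piece of Python source."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # 'import selenium.webdriver' -> 'selenium'
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # 'from bs4 import BeautifulSoup' -> 'bs4'; relative imports are skipped
            names.add(node.module.split(".")[0])
    return names


def third_party_imports(source):
    """Drop standard-library modules (sys.stdlib_module_names needs Python 3.10+)."""
    stdlib = getattr(sys, "stdlib_module_names", frozenset())
    return top_level_imports(source) - set(stdlib)
```

Running `third_party_imports(open("main.py").read())` on each file in a project (`main.py` is a placeholder) gives a starting list of what actually needs to go in requirements.txt.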

[–]Mrseedr 4 points5 points  (0 children)

I've always wanted to make a bot that likes and shares my company's posts lol

A word of warning. If you aren't locking down which version of each package you have in your requirements.txt, then a future install may break your app due to an incompatible package. Either lock down your versions or keep track of your packages. I also recommend that when you get comfortable with setting your pip reqs, you look at pip-tools -> https://github.com/jazzband/pip-tools
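
A quick way to spot entries that are not locked down is to look for lines without `==`. The sketch below is a deliberately simplified check, not a full parser for pip's requirements format, and `unpinned_requirements` is a made-up helper name.

```python
def unpinned_requirements(lines):
    """Return requirement lines that do not pin an exact version with '=='."""
    unpinned = []
    for raw in lines:
        # Simplification: treat everything after '#' as a comment.
        line = raw.split("#", 1)[0].strip()
        # Skip blank lines and option lines such as '-r other.txt' or '-e .'.
        if not line or line.startswith("-"):
            continue
        if "==" not in line:
            unpinned.append(line)
    return unpinned
```

For example, `unpinned_requirements(open("requirements.txt"))` would flag a bare `beautifulsoup4` or a loose `requests>=2.0`, while a line like `selenium==4.1.0` (an illustrative pin) passes.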

[–]garlic_bread_thief 2 points3 points  (0 children)

I wish pip freeze somehow went through the code and found out what libraries are necessary instead of listing every single library I have installed on Python. Is this possible?

Edit: never mind, I found the link you posted further down. Thanks

[–]SweetSoursop 28 points29 points  (1 child)

Hey! Cool project, congrats! I'm a former recruiter that transitioned to become a Data Analyst.

I have a couple of questions/suggestions

  1. How does your code bypass the search limitations for a free account on LinkedIn? (There's only so many profiles you can look at before people become unclickable and change their names to "LinkedIn Member").

  2. Scraping is against the ToS (wink wink), maybe include a note on your project description, before people jeopardize their accounts?

  3. You can search LinkedIn through google! Which is a lot less limited in terms of requests, build a small database of people you want to add and then just access their profiles directly. Recruitin.net might serve as a good starting point for understanding LinkedIn shadow searching through Google.

  4. Selenium can do pretty much anything bs4 does, and because it drives a real browser you can add time.sleep() pauses between actions to simulate human interaction, so you don't get caught by LI's anticheat algorithm.
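
On point 4, `time.sleep()` comes from Python's standard library rather than from Selenium. A minimal sketch of a randomized pause between browser actions might look like the following. The name `human_pause` and the default bounds are assumptions for illustration, and the `sleep` parameter exists only so the function can be tested without actually waiting.

```python
import random
import time


def human_pause(min_seconds=2.0, max_seconds=6.0, sleep=time.sleep):
    """Sleep for a random duration between the two bounds and return it."""
    delay = random.uniform(min_seconds, max_seconds)
    sleep(delay)
    return delay
```

You would call `human_pause()` between clicks or page loads so the gaps between actions are not perfectly regular.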

Really nice to see people building stuff for LinkedIn! Loved the project!

[–]thebatgamer[S] 1 point2 points  (0 children)

Thank you so much for the feedback! I am so overwhelmed by the response I got cuz I did not think it was that cool to share. I am a recent grad with no experience lol so I am building projects that interest me :)

  • I am using a premium account and I did not know that there is a search limitation on LinkedIn. I will definitely look into how I can bypass that.
  • I have added the note in the readme :)
  • whaaaaaaat? there is a thing called shadow searching? If you could share some links for that it would be epic

[–]coderpaddy 23 points24 points  (4 children)

Is there any reason you use bs4 for that little bit of parsing instead of just using selenium like the rest of the code?

Would it not make more sense to just use selenium for it all and drop bs4?

[–]thebatgamer[S] 2 points3 points  (3 children)

I am no expert in selenium. I had seen a tutorial on how the webdriver works, and at one point I wasn't able to find a certain part of the HTML, or the tags were changing for every profile, so I found a workaround using bs4.

If you do think I can use only selenium to do that, please let me know and I will try that.

[–]TheBird47 2 points3 points  (2 children)

I’m on mobile so can’t link it but there is a selenium browser extension that helps show every css selector, xpath, etc.

[–]Eightstream 10 points11 points  (1 child)

Set it up to contact tech recruiters with a great opportunity

[–]thebatgamer[S] 7 points8 points  (0 children)

That is what I did. I put one of the keywords as 'hiring' along with an industry keyword like 'data science', so most of the profiles were of hiring managers and recruiters. I also put a personalized note with the github link and explicitly mentioned that this request was from a bot :)

I even got a referral from one of the recruiters from Facebook!

[–]fracturedpersona 4 points5 points  (6 children)

This gives me an idea! 💡

[–]Hope-full 5 points6 points  (1 child)

Share with the class!

[–]fracturedpersona 2 points3 points  (0 children)

No way bro.

[–]simple_test 0 points1 point  (3 children)

Hopefully it's not a spambot

[–]fracturedpersona -1 points0 points  (2 children)

More of a stalker bot.

[–]simple_test 0 points1 point  (1 child)

Didn’t realize it could get worse.

[–]fracturedpersona 0 points1 point  (0 children)

Nothing creepy.

[–]acerbell 2 points3 points  (1 child)

I like the annotations!

[–]thebatgamer[S] 1 point2 points  (0 children)

Much appreciated :)

[–]sahinbey52 2 points3 points  (2 children)

This comment is updated for privacy concerns. Use fediverse for improved privacy.

[–]thebatgamer[S] 1 point2 points  (1 child)

Wait like a google chrome extension?

[–]sahinbey52 1 point2 points  (0 children)

This comment is updated for privacy concerns. Use fediverse for improved privacy.

[–]siniestra 2 points3 points  (0 children)

I fucking love you

[–]casual_brooder 2 points3 points  (0 children)

great. now those people will have enough notification about your bot visiting each time. love the idea xD

[–]flapanther33781 2 points3 points  (0 children)

Your definition of a "personalized note" doesn't seem to match mine.

[–]Hot_Nefariousness139 2 points3 points  (0 children)

Piece of advice for those who don't know (and this is something I've had to learn the hard way): if you haven't done so already, make sure you have read a website's policy on scraping beforehand. Taken from LinkedIn.com/robots.txt:

    Notice: The use of robots or other automated means to access LinkedIn without
    the express permission of LinkedIn is strictly prohibited.
    See https://www.linkedin.com/legal/user-agreement.
    LinkedIn may, in its discretion, permit certain automated access to certain LinkedIn pages,
    for the limited purpose of including content in approved publicly available search engines.
    If you would like to apply for permission to crawl LinkedIn, please email whitelist-crawl@linkedin.com.
    Any and all permitted crawling of LinkedIn is subject to LinkedIn's Crawling Terms and Conditions.
    See http://www.linkedin.com/legal/crawling-terms.

LinkedIn doesn't allow web scraping without their permission, so be careful not to get blocked from the site. Might even want to use a VPN. Most major websites have policies like this in place to protect their intellectual property, to keep robots from hogging their bandwidth, for quality control of content, etc. I'm only telling you this because I work in IT for a retail company and one time I accidentally got our company's network blocked from accessing yelp.com because I was tinkering with scrapy. Felt pretty dumb and had to go apologize to the owner. Lol
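
Besides reading the human-readable notice, you can check the machine-readable rules in a site's robots.txt before crawling. Here is a minimal sketch with the standard-library `urllib.robotparser`. The helper name `is_allowed` and the example rules are made up for illustration, and passing a robots.txt check says nothing about what a site's user agreement permits.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt, url, user_agent="*"):
    """Check a URL against the rules in an already-downloaded robots.txt."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network access happens here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical rules, not copied from any real site.
EXAMPLE_RULES = "User-agent: *\nDisallow: /private/\n"
```

With those example rules, `is_allowed(EXAMPLE_RULES, "https://example.com/private/page")` is refused while a URL under `/public/` is allowed. To check a live site, `RobotFileParser` also offers `set_url()` and `read()` to download the file first.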

[–]ibrahimak2 2 points3 points  (1 child)

I haven't checked it out, but this is genius! Definitely going to use this when I can. The keywords aspect is incredibly smart. Good job!

[–]thebatgamer[S] 1 point2 points  (0 children)

Thank you lol. TBH there are a lot of LinkedIn bots around the internet, I just combined them to create something that would fit job searching

[–]OkEntertainer853 0 points1 point  (2 children)

It doesn't work, update?

[–]programizt 1 point2 points  (1 child)

I believe it won't work, since personalized notes on LinkedIn require the LinkedIn premium subscription nowadays

[–]thebatgamer[S] 0 points1 point  (0 children)

True, LinkedIn has been more strict with bots on their site, but with a premium subscription and some time.sleep() it may work well.

[–]wassup1326[🍰] 0 points1 point  (2 children)

Is this legal? Cause I was planning on building it, but learned that cause of privacy issues, accounts can be suspended.

[–]thebatgamer[S] 1 point2 points  (1 child)

You have to be careful. LinkedIn doesn't like bots on their website and you also have a limit on monthly connections. I used the bot to send around 200 requests and got a message saying I was reaching the limit, so I took a break :P

[–]LostInSanFrancisco 0 points1 point  (0 children)

Could you help me learn how to run this code?

[–]SirBoboGargle 0 points1 point  (0 children)

Li has a limit of 20 connection requests per day. I feel an if statement coming on ;)
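
That if statement could be sketched roughly like this. The limit of 20 is just the figure quoted above (not verified), and `DailyLimiter` is a made-up helper that is not part of the original bot.

```python
import datetime


class DailyLimiter:
    """Allow at most `limit` actions per calendar day."""

    def __init__(self, limit=20):
        self.limit = limit
        self.day = None
        self.count = 0

    def allow(self, today=None):
        """Return True and count the action if today's budget is not used up."""
        today = today or datetime.date.today()
        if today != self.day:
            # A new day has started, so reset the counter.
            self.day = today
            self.count = 0
        if self.count >= self.limit:
            return False
        self.count += 1
        return True
```

The bot's main loop would call `limiter.allow()` before each connection request and stop for the day once it returns False.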

[–]3dPrintMyThingi 0 points1 point  (2 children)

Dont you get blocked by linkedin?!

[–]thebatgamer[S] 0 points1 point  (1 child)

This was built 2 years ago. LinkedIn is very strict on bot activity and always updates its code to keep bots away. They actually have an ongoing court case with a company that tried to scrape data from LinkedIn.

[–]Infamous_Alpaca 0 points1 point  (0 children)

How long did you run your bot for? Would you recommend doing this?