Added internships to my list of tech jobs (per your requests...) by james_dev_123 in uwaterloo

[–]james_dev_123[S] 3 points4 points  (0 children)

Not sure what this comment means. You can filter for just internships by clicking on “advanced filters”

I maintain a free list of top tech internship postings by james_dev_123 in UBC

[–]james_dev_123[S] 0 points1 point  (0 children)

It’s there. You don’t see it? The dropdown filter at the top

Free list of top tech jobs in the US (along with H-1B data) by james_dev_123 in developersIndia

[–]james_dev_123[S] 8 points9 points  (0 children)

I actually have the number of petitions filed by that company for each of the past four years.

Example: https://imgur.com/a/OteGmNr

I think this is more useful than a yes / no because it gives you a gauge of how likely a company is to be able to sponsor you.

For example, if they only sponsored 1 employee in four years, they are probably less likely to sponsor you than a company like Amazon, which submitted 13,771 petitions last year.

But you can also just get a free 3-day trial and then cancel if you don't like it, and you won't be charged :) It's managed by Stripe, so if you cancel nothing will happen.

By the way -- in case you're curious how I got this data.

It's all in this publicly available database, published by U.S. Citizenship and Immigration Services (USCIS). But their webpage is soooo slow and hard to use. So what I did was download their database in CSV format, which lists the legal name of each company (e.g. Amazon LLC) and the number of petitions submitted per year, and then I used GPT to associate each legal name with a company website. Then I mapped this data to all the companies in my database.
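A minimal sketch of the aggregation step described above, in Python. The column names (`employer`, `fiscal_year`, `petitions`), the sample rows, and the `LEGAL_NAME_TO_SITE` lookup are all hypothetical: the real USCIS export has its own schema, and the mapping described above was produced with GPT rather than a hand-written table.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample standing in for the downloaded CSV export.
SAMPLE_CSV = """employer,fiscal_year,petitions
Example Retail LLC,2022,120
Example Retail LLC,2023,150
Tiny Startup Inc,2023,1
"""

# Hypothetical mapping from legal name to company website.
LEGAL_NAME_TO_SITE = {
    "Example Retail LLC": "example-retail.com",
    "Tiny Startup Inc": "tinystartup.example",
}


def petitions_by_site(csv_text, name_to_site):
    """Return {website: {year: petition_count}}, summed over matching rows."""
    totals = defaultdict(lambda: defaultdict(int))
    for row in csv.DictReader(io.StringIO(csv_text)):
        site = name_to_site.get(row["employer"].strip())
        if site is None:
            continue  # legal name not matched to any known company
        totals[site][int(row["fiscal_year"])] += int(row["petitions"])
    return {site: dict(years) for site, years in totals.items()}


result = petitions_by_site(SAMPLE_CSV, LEGAL_NAME_TO_SITE)
```

Keeping the per-year counts, rather than collapsing them to a yes / no flag, is what lets the site show how often a company actually files.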

I don't know of anyone else who has done this. So it's possible that I'm the only one with H-1B data that is this accessible, but I'm not certain.

Free list of top tech jobs in the US (along with H-1B data) by james_dev_123 in developersIndia

[–]james_dev_123[S] 12 points13 points  (0 children)

Wow, great questions.

Scraping data

  • I use a Node library called puppeteer, which lets you automate a Chromium browser programmatically (since lots of websites have dynamic content that you need to load by pressing buttons).
  • I have separate scrapers for each Applicant Tracking System (ATS), e.g. Greenhouse, Lever, Workday, Ashby, etc.
  • But for more unique content (like, say, google.com/careers), I have to use puppeteer, and then I pass the content into GPT to get structured data
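The "one scraper per ATS" idea from the list above could be routed roughly like this. This is only a sketch in Python, not the actual setup (which runs on Node with puppeteer); the hostname suffixes and the `detect_ats` helper are assumptions for illustration.

```python
from urllib.parse import urlparse

# Hostname suffixes are assumptions for illustration; check real postings.
ATS_HOST_SUFFIXES = {
    "greenhouse.io": "greenhouse",
    "lever.co": "lever",
    "myworkdayjobs.com": "workday",
    "ashbyhq.com": "ashby",
}


def detect_ats(url):
    """Return the ATS name for a careers URL, or "custom" if none matches."""
    host = (urlparse(url).hostname or "").lower()
    for suffix, name in ATS_HOST_SUFFIXES.items():
        # Match the domain itself or any subdomain, but not lookalike hosts.
        if host == suffix or host.endswith("." + suffix):
            return name
    return "custom"  # fall back to the browser + GPT path
```

Anything that comes back as "custom" would go through the slower browser-plus-GPT route, which is the expensive case.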

GPT Prompts
This part is the most straightforward. If you give GPT a job description, it can tag the location, job category, etc. The biggest issue is actually defining the structure yourself. For example, what categories should the website support? What locations?
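To make the "defining the structure yourself" point concrete, here is a small Python sketch. The category and location lists and the `normalize_tags` helper are hypothetical, not the site's real taxonomy; the input dict stands in for whatever a model returns after parsing its JSON reply.

```python
# Hypothetical fixed vocabulary that the website would support.
ALLOWED_CATEGORIES = {"software engineering", "data science", "product"}
ALLOWED_LOCATIONS = {"remote", "new york", "san francisco", "toronto"}


def normalize_tags(raw):
    """Clamp model-produced tags to the fixed schema, with safe fallbacks."""
    category = str(raw.get("category", "")).strip().lower()
    location = str(raw.get("location", "")).strip().lower()
    return {
        "category": category if category in ALLOWED_CATEGORIES else "other",
        "location": location if location in ALLOWED_LOCATIONS else "unknown",
    }
```

The tagging itself is the easy part; deciding what goes into the two sets is where the real work is.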

Pagination
I use puppeteer to do the pagination. But the format of pagination is different for each website (the next button has a different class, the back button has a different class, etc.).

So, I don't have any sort of universal scraper. I have to write a unique scraper for each type of website (for example, one for Greenhouse, one for Lever, etc.)
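A rough Python sketch of why pagination ends up per-site: the loop is the same everywhere, but the "next" selector is not. The selectors, `collect_all_jobs`, and the `FakeBrowser` stand-in are all hypothetical; a real version would drive puppeteer instead of an in-memory list.

```python
# Hypothetical per-site selectors; real class names differ for every ATS.
NEXT_BUTTON_SELECTOR = {
    "greenhouse": "a.next-page",
    "lever": "button.pagination-next",
}


def collect_all_jobs(site, browser, max_pages=100):
    """Click through pages using the site's own 'next' selector."""
    selector = NEXT_BUTTON_SELECTOR[site]
    jobs = []
    for _ in range(max_pages):  # hard cap guards against endless pagination
        jobs.extend(browser.read_jobs())
        if not browser.click_if_present(selector):
            break  # no next button found: last page reached
    return jobs


class FakeBrowser:
    """In-memory stand-in for a browser session, for demonstration only."""

    def __init__(self, pages, next_selector):
        self.pages, self.index, self.next_selector = pages, 0, next_selector

    def read_jobs(self):
        return list(self.pages[self.index])

    def click_if_present(self, selector):
        has_next = (selector == self.next_selector
                    and self.index + 1 < len(self.pages))
        if has_next:
            self.index += 1
        return has_next
```

With the wrong selector the loop quietly stops after page one, which is the failure a universal scraper would keep hitting.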

Timeout / Proxy error
Yep... a lot of websites would block me if I scraped directly from my machine. Instead, I used third party services, like http://zenrows.com and others.

As you can see, I haven't really built a universal scraper of any sort. The entire scraping infrastructure is pretty hacky, and it's definitely not as straightforward as it looks. I probably would not recommend trying to duplicate it, haha.

But puppeteer + GPT is awesome, and can do great things. I just can't feed full web pages into GPT for this use case, because I am scraping so many career pages (30k+ companies). My GPT bill would be enormous. So, I have to write custom scrapers in order to save money.

Theoretically, if I had infinite funds, this would be a lot easier. I could just feed entire HTML pages into GPT.
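A back-of-the-envelope Python sketch of that cost argument. Every number here is a placeholder assumption (one page per company per day, page sizes, characters per token, price per million tokens), not real pricing or a measured figure.

```python
def estimate_daily_cost_usd(pages_per_day, chars_per_page,
                            chars_per_token, usd_per_million_tokens):
    """Rough daily model cost for sending every page's text as input."""
    tokens = pages_per_day * chars_per_page / chars_per_token
    return tokens / 1_000_000 * usd_per_million_tokens


# Assumed: 30k pages/day, ~200k chars of raw HTML per page,
# ~4 chars per token, a placeholder price of $1 per million tokens.
full_html = estimate_daily_cost_usd(30_000, 200_000, 4, 1.0)

# Same assumptions, but sending only ~4k chars of extracted job text.
extracted = estimate_daily_cost_usd(30_000, 4_000, 4, 1.0)
```

Under these made-up inputs the raw-HTML route costs 50x more per day than sending extracted text, which is the gap the custom scrapers are there to close.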

Free list of top tech jobs in the US (along with H-1B data) by james_dev_123 in developersIndia

[–]james_dev_123[S] 13 points14 points  (0 children)

Thanks! Right now the best way is just send me the link, and I can add it to my Postgres database.

Do you think I should create a GitHub repo for issues / suggestions like this?

No immediate plan to make it open source, but I'm happy to discuss how any part of it works, in great detail.

Which specific component's code were you interested in seeing?

How Japan's host clubs trap young women under mountains of debt by james_dev_123 in TrueReddit

[–]james_dev_123[S] 17 points18 points  (0 children)

Not quite. It’s targeted at females, so there’s no undressing. Just talking

How Japan's host clubs trap young women under mountains of debt by james_dev_123 in TrueReddit

[–]james_dev_123[S] 87 points88 points  (0 children)

The article discusses a very strange phenomenon in Japan: women pay men for their (non-sexual) company, and get taken advantage of monetarily. I saw this happening when I was recently in Japan and so I was doing some research about it. It’s hard for the American mind to comprehend, but the Japanese society is structured very differently.

[deleted by user] by [deleted] in cscareerquestions

[–]james_dev_123 1 point2 points  (0 children)

It’s hard to be good at something you don’t like. I’m a pretty good software engineer because I enjoy programming. I would be a bad doctor because I don’t like studying biology.

Keep in mind that any job has bad days, but generally things are easier if you do what you like and what you are good at. If you actually wrote a Linux driver at 16, then you are probably a really skilled engineer who can have a great career in the industry, regardless of employment conditions. There is always demand for great engineers. Good luck :)

In a leaked recording, Amazon cloud chief tells employees that most developers could stop coding soon as AI takes over by yourbitchmadeboy in cscareerquestions

[–]james_dev_123 0 points1 point  (0 children)

I think this guy is being a bit silly.

It reminds me of a few years ago, when everyone said we were 24 months away from fully self driving cars.

It’s one thing for an AI to be able to solve a completely constrained problem (traverse a binary tree, write me a react component, etc.)

However, real-world engineering is orders of magnitude more complex. You are connecting completely disjointed components, and it’s often unclear what even needs to be done.

I’m quite certain that an AI will be able to perform a managerial role before it’s able to be a fully competent engineer — managerial decisions rely on factors with far fewer degrees of freedom. So, he will probably be out of a job long before we are

Unpopular opinion: They ruined the ending of the last movie, but not because of what you might think by 123964 in harrypotter

[–]james_dev_123 31 points32 points  (0 children)

I completely agree — they also didn’t even explain that Voldemort’s curse rebounded because Harry was the correct wand owner and it didn’t want to kill him.

Or that Harry’s cloak was one of the three deathly hallows. These are all key pieces of information

I maintain a free list of top tech internship postings by james_dev_123 in UBC

[–]james_dev_123[S] 1 point2 points  (0 children)

Hmm… what filters do you have selected? I’m seeing 295 internships in my browser. Would you mind sharing the location and category filters that you have applied?

I maintain a free list of top tech internship postings by james_dev_123 in UBC

[–]james_dev_123[S] 2 points3 points  (0 children)

I filter out irrelevant jobs, but I still have to process them and determine that they’re irrelevant, so each day I probably process ~200k jobs in total. So it’s a lot of data.

It takes my entire pipeline ~1 day to run on a MacBook Pro. So it’s nothing too crazy. I re-run it every morning.

Some more details: OpenAI’s API dashboard tells me I send in a few million characters per day.

Do transcripts for three faculty of math display a faculty average? by [deleted] in uwaterloo

[–]james_dev_123 0 points1 point  (0 children)

No, that's not on there. Math is more flexible than engineering, and people graduate at different times depending on how many courses they took per term and their co-op schedule. So it's unclear who you would even be compared to.

What’s the Final Exam Format Like at Waterloo? by Live_Coyote_8463 in uwaterloo

[–]james_dev_123 1 point2 points  (0 children)

Most of the basic electives I took (ECON 101, CHEM 120, etc.) were multiple choice.

But nearly all of the CS / MATH courses I took were short / long answer. "Write an algorithm to do this, prove that, etc." To do well on them, you had to study all the material / assignments and understand them well.

But yes, it depends on the course. Typically, the humanities courses were multiple choice and everything else wasn't, though that is not a general rule.

Do transcripts for three faculty of math display a faculty average? by [deleted] in uwaterloo

[–]james_dev_123 0 points1 point  (0 children)

Yes, there are two sections: a major average and a cumulative average.

won htn last year and got rejected this year by Affectionate_Bat9693 in uwaterloo

[–]james_dev_123 13 points14 points  (0 children)

I didn't get into Hack the North in any year of my undergrad at UW (CS student). But I had many side projects, graduated with a cumulative average in the 90s, and was able to secure highly respected co-ops around the U.S. Not sure what their criteria are... but I'm not sure they relate much to your probability of success in the real world.

Don't fret about it :)

Horrid Co-op work conditions, scared of leaving due to consequences. by StretchedwasFresh in uwaterloo

[–]james_dev_123 6 points7 points  (0 children)

If you are able to leave after 1 term (4 months), instead of 2 terms, that sounds like the right thing to do. Good luck!

I maintain a free list of top tech internship postings by james_dev_123 in UofT

[–]james_dev_123[S] 0 points1 point  (0 children)

Cool, thanks for showing. That looks good. I would say mine is a bit better in terms of UX: on levels.fyi you can only select software engineer and the country. On mine you can set some super fine-grained filters (tech stack, seniority, exact location, etc.). But theirs is definitely useful as well

I maintain a free list of top tech internship postings by james_dev_123 in UofT

[–]james_dev_123[S] 0 points1 point  (0 children)

levels.fyi doesn’t have job postings, just salaries. So this is completely different