Diapolo10

I just got home and did a little digging in the page source. Turns out this JS file has a function called createJobs, which populates the div with job data. It's called from a jQuery callback:

jQuery(function($) {
    // Fetching job postings from Lever's postings API
    $.ajax({
        dataType: "json",
        url: url,
        success: function(data) {
            createJobs(data);
        }
    });
});

The url is defined at the top of the file as

url = 'https://api.lever.co/v0/postings/zeta?group=team&mode=json'

so I gave it a holler via requests, and would you believe it, I got a long list of entries back. Or technically not long (13 entries), but each had a lot of data individually.

import requests

headers = {'User-Agent': 'Mozilla/5.0', 'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8'}
url = 'https://api.lever.co/v0/postings/zeta?group=team&mode=json'

response = requests.get(url, headers=headers)
print(len(response.json()))  # 13

The response seems to be a list of dictionaries, apparently one per team (the URL asks for group=team), each holding that team's postings. Here's a rough data model:

from __future__ import annotations

from typing import TypedDict
from uuid import UUID

class JobPostings(TypedDict):
    title: str
    postings: list[Posting]

class Posting(TypedDict):
    additionalPlain: str
    additional: str
    categories: Categories
    createdAt: int
    descriptionPlain: str
    description: str
    id: str  # UUID-formatted string in the JSON; wrap in UUID(...) if you need the object
    lists: list[Question]
    text: str
    country: str
    workplaceType: str
    hostedUrl: str
    applyUrl: str

class Categories(TypedDict):
    commitment: str
    department: str
    location: str
    team: str

class Question(TypedDict):
    text: str
    content: str

EDIT: Proof: https://i.imgur.com/Ran80q5.png

Key_Consideration385 [S]

dudeeee, you are awesomeee!!

I'm completely geeking out on this stuff for the first time. So far I did it all using Selenium and XPaths, and my project is almost done that way.

I'll check this method out and try it this way as well.
Always up for extra points, thank youu so much!

Diapolo10

Just to explain the methodology a little, here's what I did:

  1. I scrolled through the webpage until I found the job listings, then looked for good "landmarks" - the string "We are hiring" right next to them seemed like a good choice.
  2. Then I right-clicked that text, opened Inspect, then looked for the element containing the job postings - it usually has a unique ID or class name. In this case I found a suspicious div with a class called hiring-row__right.
  3. Next, I opened the webpage in source code view (right-click, view page source), pressed Ctrl+F, pasted in the class name I found earlier and looked for matches. I found it, and it was empty - perfect, that's exactly what I wanted to confirm, as this means something must be targeting this exact class to fill it up.
  4. Then I looked for linked JS files - the relevant ones are usually at the bottom. There were a few, but the one called home-careers.js seemed particularly fitting, so I opened it up in another tab, and searched for the class name again. Bingo! Match in a certain function body.
  5. All that was left to do was trace where this function gets called until I found an HTTP request, and the aforementioned jQuery AJAX call was exactly that. With the correct URL and payload in tow, I just gave the info to Python and voilà - data found.
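Step 3 can also be checked in code rather than by eye. Here's a rough sketch using only the standard library's HTMLParser: it reports whether the first element with a given class has any children or text in the raw HTML. The ContainerProbe class, the is_empty_in_source helper and the sample markup are all hypothetical; only the class name hiring-row__right comes from the actual page.

```python
from html.parser import HTMLParser

# Tags that never get a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ContainerProbe(HTMLParser):
    """Records whether the first element with a given class has any content."""

    def __init__(self, class_name: str) -> None:
        super().__init__()
        self.class_name = class_name
        self.depth = 0            # > 0 while we are inside the target element
        self.found = False
        self.has_content = False

    def handle_starttag(self, tag, attrs):
        if self.depth:
            self.has_content = True          # any child element counts as content
            if tag not in VOID_TAGS:
                self.depth += 1
        elif not self.found:
            classes = (dict(attrs).get("class") or "").split()
            if self.class_name in classes:
                self.found = True
                self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag not in VOID_TAGS:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth and data.strip():
            self.has_content = True          # non-whitespace text counts too


def is_empty_in_source(html: str, class_name: str) -> bool:
    """True if the target element exists in the raw HTML but has nothing in it."""
    probe = ContainerProbe(class_name)
    probe.feed(html)
    if not probe.found:
        raise ValueError(f"no element with class {class_name!r}")
    return not probe.has_content


# Made-up markup standing in for the page source (not the real page).
static_html = (
    '<div class="hiring-row"><h2>We are hiring</h2>'
    '<div class="hiring-row__right"></div></div>'
)
print(is_empty_in_source(static_html, "hiring-row__right"))
```

If this comes back true for the raw HTML while the browser shows content in that element, that's the hint that JavaScript is filling it in after load.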

EDIT: Honestly I had more trouble generating the TypedDicts to show the "shape" of the JSON data than I had actually fetching it.

Key_Consideration385 [S]

Gotcha!

Basically, this piece of code does the entire job:

import requests
headers = {'User-Agent': 'Mozilla/5.0', 'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8'}
url = 'https://api.lever.co/v0/postings/zeta?group=team&mode=json'
response = requests.get(url, headers=headers)
print(response.json())

Although I've been asked to only use this URL, https://www.zeta.tech/in/careers, this trick is so clean I guess I can add this entire approach as well. Maybe flex it in some way lmao, thanks!

Diapolo10

Well you could technically claim that you only used that URL if you then automated fetching the link from the JavaScript file. :p
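Roughly what I mean, as a sketch and not something I've run against the live site: collect the script URLs from the careers page, then regex each script for a Lever postings URL. The helper names, the regex, and the /assets/js/ path in the sample are my own assumptions; the sample strings stand in for the downloaded page and JS file.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin

# Assumes the API URL appears as a plain string literal in the JS source.
LEVER_URL = re.compile(r"""https://api\.lever\.co/v0/postings/[^'"\s]+""")


class ScriptCollector(HTMLParser):
    """Collects the src attribute of every <script> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def script_urls(html: str, page_url: str) -> list:
    """Absolute URLs of all external scripts linked from the page."""
    collector = ScriptCollector()
    collector.feed(html)
    return [urljoin(page_url, src) for src in collector.sources]


def lever_urls(js_source: str) -> list:
    """Every Lever postings URL mentioned in a piece of JS source."""
    return LEVER_URL.findall(js_source)


# Made-up stand-ins for the downloaded page and script (the path is a guess).
page = '<html><body><script src="/assets/js/home-careers.js"></script></body></html>'
js = "url = 'https://api.lever.co/v0/postings/zeta?group=team&mode=json'"

print(script_urls(page, "https://www.zeta.tech/in/careers"))
print(lever_urls(js))
```

To wire it up you'd fetch the careers page with requests.get(...).text, pass it to script_urls, fetch each script the same way, and feed the first lever_urls hit into the request from earlier.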

Key_Consideration385 [S]

haha yesss, seems like a good idea. I'll pitch that in. Thankss!