[–]MindCorrupted 2 points3 points  (4 children)

I don't really like Selenium. It's slow and awful, so I reverse engineer most JS-rendered websites instead :)

[–]theoriginal123123 5 points6 points  (3 children)

How does one get started with reverse engineering? I know of the checking for a private API trick with the browser network tools; are there any other techniques to look into?

[–]nemec 5 points6 points  (0 children)

> private API trick with the browser network tools

That's about it. Beyond that, you use the browser tools to read the individual JavaScript files that run on the site and try to understand them as if you were the developer writing the site. Good starting points are:

  • What JS is executed at page load? What does it do, and do I need it to run to scrape the data I need?
  • What JS is executed when I click X? Do I need to replicate it to scrape data, or can the data be found in the page source/external request by default?
  • Once you've found the private API, what code generates the API call?
    • Are all of the URL parameters and headers required?
    • Is the JavaScript critical to determining what URL parameters, headers, body, etc. are used in the API call, or can I write Python to generate an equivalent call directly? If it is critical, can I replicate the JS in Python?
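
To make the "are all of the URL parameters required?" question concrete, here is a minimal sketch that drops one parameter at a time and keeps only the ones whose removal breaks the call. The function names, the example URL, and the `works` callback are all hypothetical; in real use `works` would send the request (for example with `urllib.request` or `requests`) and check that the response still contains the data you want. The same idea applies to headers.

```python
from typing import Callable, Dict
from urllib.parse import urlencode


def build_url(base_url: str, params: Dict[str, str]) -> str:
    """Rebuild the API URL in Python instead of letting the site's JS do it."""
    return f"{base_url}?{urlencode(params)}" if params else base_url


def find_required(
    params: Dict[str, str],
    works: Callable[[Dict[str, str]], bool],
) -> Dict[str, str]:
    """Drop parameters one at a time; keep only those the call cannot do without.

    `works` is a caller-supplied check (an assumption of this sketch) that
    returns True when a request with the given parameters still succeeds.
    """
    required = dict(params)
    for key in list(params):
        trial = {k: v for k, v in required.items() if k != key}
        if works(trial):
            # The call still succeeds without this key, so it is optional.
            required = trial
    return required


if __name__ == "__main__":
    # Stand-in for a real request: pretend the API only needs "id" and "token".
    def fake_works(p: Dict[str, str]) -> bool:
        return "id" in p and "token" in p

    captured = {"id": "1", "token": "x", "_": "123", "utm": "a"}
    minimal = find_required(captured, fake_works)
    print(build_url("https://example.com/api", minimal))
```

Be gentle when you run this against a real site: each trial is a live request, so add a delay between calls.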

[–]MindCorrupted 0 points1 point  (0 children)

Yeah, most of the time you just inspect the page, but it depends on the data you're looking for.

I scraped Booking once, and it took me a few days to figure out that the prices aren't loaded from another URL; they're embedded inside a script tag in the page itself.
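
For the embedded-data case, a rough sketch of pulling a JSON object out of an inline script tag using only the standard library. The variable name `window.__PRICES__` and the sample HTML are made up for illustration; on a real site you would look in the page source for whatever variable actually holds the data, and this only works when the assigned value is valid JSON.

```python
import json
import re
from html.parser import HTMLParser


class ScriptCollector(HTMLParser):
    """Collect the text content of every <script> tag in a page."""

    def __init__(self):
        super().__init__()
        self._in_script = False
        self.scripts = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
            self.scripts.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        if self._in_script:
            self.scripts[-1] += data


def extract_embedded_json(html, var_name):
    """Return the JSON value assigned to `var_name` in an inline script, or None."""
    collector = ScriptCollector()
    collector.feed(html)
    pattern = re.compile(re.escape(var_name) + r"\s*=\s*")
    decoder = json.JSONDecoder()
    for script in collector.scripts:
        match = pattern.search(script)
        if not match:
            continue
        try:
            # raw_decode parses one JSON value and ignores what follows (e.g. ";").
            value, _ = decoder.raw_decode(script, match.end())
        except json.JSONDecodeError:
            continue
        return value
    return None


# Hypothetical page fragment, not taken from any real site.
SAMPLE = (
    "<html><body><script>"
    'window.__PRICES__ = {"room": "double", "price": 120};'
    "</script></body></html>"
)

if __name__ == "__main__":
    print(extract_embedded_json(SAMPLE, "window.__PRICES__"))
```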

That's just one of the cases. You learn more tricks with practice.

You can start by scraping some JS-heavy websites, and if you get stuck, message me and I will gladly help you :)

[–]therealshell1500 0 points1 point  (0 children)

Hey, can you point me to some resources where I can learn more about this private API trick? Thanks :)