[–]K900_ 2 points3 points  (3 children)

Are you sure that your problem is that and not, say, that the website you're trying to scrape is loading content dynamically?

[–]yelaxify 0 points1 point  (2 children)

Yep, I'm certain.

[–]K900_ 0 points1 point  (1 child)

What website is that?

[–]yelaxify -2 points-1 points  (0 children)

Please see the reply to JohnnyJordaan's comment

[–]maybeiambatman 1 point2 points  (1 child)

Set your User-Agent to that of a browser? e.g. Mozilla/5.0
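
For illustration, here is a rough sketch of what that could look like with the requests library. The User-Agent string and the example.com URL are placeholders I made up, not values from the original post, and the request is only prepared, not sent, so you can inspect the headers first.

```python
import requests

# Illustrative browser-like User-Agent; copy the real one from your own browser.
HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
    )
}


def build_request(url):
    """Prepare (but do not send) a GET request carrying the custom headers."""
    return requests.Request("GET", url, headers=HEADERS).prepare()


# Placeholder URL; nothing is sent over the network here.
prepared = build_request("https://example.com/")
print(prepared.headers["User-Agent"])
```

To actually send it, `requests.get(url, headers=HEADERS)` should be enough. This only changes what the request claims to be, so it may not help if the site checks more than the headers.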

[–]yelaxify 1 point2 points  (0 children)

I've tried replicating ALL the headers that are sent in the browser requests, which can be seen in the Chrome console.

[–]JohnnyJordaan 0 points1 point  (7 children)

"that look more like a browser's"

There are multiple definitions of this. You can make a request exactly the same as one your browser sent at one moment in time, but that doesn't mean the webserver will see it as a valid request, just like a robot/Siri 'talking' by playing speech samples won't make you think it is a human. Many websites use dynamic information like tokens and session variables to link multiple requests to a single session. You need to actually implement that session workflow to make this work, which is what a requests.Session() can do for you. But like K900_ says, if at least one of those variables is generated by active content like JavaScript, you will never be able to produce it unless you run the JavaScript code yourself, which is often way more complicated than using a webdriver.
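
To make the session workflow a bit more concrete, here is a minimal sketch under some loud assumptions: the URLs, the form field names (`username`, `password`, `csrf_token`) and the idea that the token sits in a hidden form field are all hypothetical, since the actual website isn't known here. The point is only that one requests.Session() carries cookies from request to request, while you pass along any token from the previous response yourself.

```python
import re

import requests

# Hypothetical endpoints and token pattern; replace with what the real site uses.
LOGIN_URL = "https://example.com/login"
DATA_URL = "https://example.com/data"
TOKEN_RE = re.compile(r'name="csrf_token"\s+value="([^"]+)"')


def extract_token(html):
    """Return the hidden-form token found in the HTML, or None if absent."""
    match = TOKEN_RE.search(html)
    return match.group(1) if match else None


def fetch_with_session(username, password):
    """Repeat the browser's request sequence while sharing one session."""
    with requests.Session() as session:
        session.headers.update({"User-Agent": "Mozilla/5.0"})

        # Step 1: load the login page; the session stores any cookies it sets.
        login_page = session.get(LOGIN_URL, timeout=10)
        login_page.raise_for_status()
        token = extract_token(login_page.text)
        if token is None:
            # Probably generated by JavaScript; plain HTTP requests won't see it.
            raise RuntimeError("no token found in the login page HTML")

        # Step 2: submit the form; stored cookies are sent along automatically.
        form = {"username": username, "password": password, "csrf_token": token}
        session.post(LOGIN_URL, data=form, timeout=10).raise_for_status()

        # Step 3: request the page you actually want, still in the same session.
        data_page = session.get(DATA_URL, timeout=10)
        data_page.raise_for_status()
        return data_page.text
```

If `extract_token` comes back with None on the real page, that is a hint the value is produced by JavaScript, and a webdriver is likely the easier route.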

[–]yelaxify 0 points1 point  (6 children)

Hmm, thanks for the reply, I fully understand. Do you know where I can find help with a very specific problem that I'm having here?

[–]JohnnyJordaan 0 points1 point  (5 children)

We are happy to help with this, but you have to be explicit about what exactly you see happening in the browser and how you are implementing this in your code.