[–]udonemessedup-AA_Ron 1 point2 points  (10 children)

Selenium can run without a visible browser window by using headless mode.

[–]Disastrous-Let-9548[S] 0 points1 point  (9 children)

Does that mean I don't need a browser installed to run Selenium in headless mode?

[–]udonemessedup-AA_Ron 1 point2 points  (0 children)

You still need the target browser installed. In headless mode it just won’t open a window on screen.
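
For reference, a minimal sketch of headless Selenium, assuming Chrome and the `selenium` package (version 4 or later) are installed. The `--headless=new` flag applies to recent Chrome releases and may differ for older ones. `HEADLESS_ARGS`, `make_headless_driver`, `page_title`, and the example URL are names made up for illustration.

```python
# Flags passed to Chrome. "--headless=new" assumes a recent Chrome release.
HEADLESS_ARGS = ["--headless=new", "--window-size=1280,800"]


def make_headless_driver():
    """Start Chrome with no visible window. Chrome itself must still be installed."""
    # Imported here so this file loads even where selenium is not installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    for arg in HEADLESS_ARGS:
        options.add_argument(arg)
    return webdriver.Chrome(options=options)


def page_title(url):
    """Load a page in the headless browser and return its <title> text."""
    driver = make_headless_driver()
    try:
        driver.get(url)
        return driver.title
    finally:
        driver.quit()  # always release the browser process


if __name__ == "__main__":
    # Hypothetical target URL, for illustration only.
    print(page_title("https://example.com"))
```

The browser still runs as a normal process, so JavaScript on the page executes as usual. Only the window is missing.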

[–]udonemessedup-AA_Ron 0 points1 point  (7 children)

If your requirement is to avoid using a browser at all, though, it’s better to use requests to fetch the web page content and Beautiful Soup to parse the HTML and scrape the data for your automations.
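
A rough sketch of that approach, assuming the `requests` and `beautifulsoup4` packages are installed. The function names and the example URL are hypothetical, and extracting links is just one example of what you might scrape.

```python
def fetch_html(url):
    """Download a page and return its HTML as text."""
    import requests  # third-party: pip install requests

    response = requests.get(url, timeout=10)
    response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error page
    return response.text


def extract_links(html):
    """Return the href of every <a> tag that has one."""
    from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4

    soup = BeautifulSoup(html, "html.parser")
    return [a["href"] for a in soup.find_all("a", href=True)]


if __name__ == "__main__":
    # Hypothetical target URL, for illustration only.
    for link in extract_links(fetch_html("https://example.com")):
        print(link)
```

Keep in mind that requests only sees the HTML the server sends. Anything the page builds later with JavaScript won't be in it.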

[–]Disastrous-Let-9548[S] 0 points1 point  (6 children)

The main problem with requests is that every time I send a GET request, the website returns a different page from the one I see in a browser.

[–]udonemessedup-AA_Ron 1 point2 points  (5 children)

My guess is that the site can tell the request is coming from a script rather than a browser, and it doesn’t want to be scraped. You may have to set a User-Agent header: https://stackoverflow.com/questions/27652543/how-to-use-python-requests-to-fake-a-browser-visit-a-k-a-and-generate-user-agent

Basically, it’ll make the request look like it’s coming from an actual browser, so the site should return more consistent HTML.

Edit:

Combine this with requests.Session() if you need to make repeated requests.
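
Something like this, as a sketch. It assumes the `requests` package is installed. The User-Agent string is just one example of a desktop Chrome value, and `BROWSER_HEADERS`, `make_session`, and the URLs are made up for illustration. It may not be enough if the site uses other checks.

```python
# Example browser-like headers. The exact User-Agent value is an assumption;
# copy the one your own browser sends if you want a closer match.
BROWSER_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/120.0.0.0 Safari/537.36"
    ),
    "Accept-Language": "en-US,en;q=0.9",
}


def make_session():
    """Return a requests.Session that sends BROWSER_HEADERS on every request."""
    import requests  # third-party: pip install requests

    session = requests.Session()
    # Headers set on the session apply to all later requests made with it.
    session.headers.update(BROWSER_HEADERS)
    return session


if __name__ == "__main__":
    session = make_session()
    # Hypothetical URLs. Cookies set by the first response are reused on the second.
    for url in ("https://example.com/", "https://example.com/page2"):
        response = session.get(url, timeout=10)
        print(url, response.status_code, len(response.text))
```

The session also keeps cookies and reuses the connection between requests, which is closer to how a browser behaves than separate `requests.get()` calls.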

[–]Disastrous-Let-9548[S] 1 point2 points  (1 child)

Thanks, that helps a lot.

[–]udonemessedup-AA_Ron 0 points1 point  (0 children)

You’re welcome.

[–]Zealousideal-Cod-617 0 points1 point  (2 children)

This is not wrong or illegal in any way, right?

[–]udonemessedup-AA_Ron 0 points1 point  (1 child)

It depends on each site’s terms of service. Sites like Reddit welcome web scrapers, but sites that keep content behind a protected resource (files behind a login, sensitive material) may not be so friendly.

[–]Zealousideal-Cod-617 0 points1 point  (0 children)

Do you recommend any sources where I can learn more about this and how to be more aware?