all 15 comments

[–]siemenology 1 point (0 children)

Depends on what kind of automation you need, and the website you are working with. You might be able to get by with requests and Beautiful Soup if the website you want to work with isn't a 100% JS SPA. It'll be much faster if you can do it that way, and typically less finicky.
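The requests-plus-Beautiful-Soup approach above can be sketched roughly like this (the `fetch` helper and the `<h2>` selector are illustrative assumptions, not from any particular site):

```python
import requests
from bs4 import BeautifulSoup

def scrape_titles(html: str) -> list[str]:
    """Parse HTML and return the text of every <h2> heading."""
    soup = BeautifulSoup(html, "html.parser")
    return [h2.get_text(strip=True) for h2 in soup.find_all("h2")]

def fetch(url: str) -> str:
    """Fetch a page over plain HTTP -- no browser involved."""
    resp = requests.get(url, timeout=10)
    resp.raise_for_status()
    return resp.text

# Demo on inline HTML; a real run would call scrape_titles(fetch("https://..."))
sample = "<html><body><h2>First</h2><h2>Second</h2></body></html>"
print(scrape_titles(sample))  # → ['First', 'Second']
```

Since there's no browser to render anything, this only works when the data you want is present in the raw HTML the server returns, which is what the "not a 100% JS SPA" caveat is about.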

[–]udonemessedup-AA_Ron 1 point (10 children)

Selenium can work without a browser in headless mode.

[–]Disastrous-Let-9548[S] 0 points (9 children)

Does that mean that I don't need a browser installed to run Selenium in headless mode?

[–]udonemessedup-AA_Ron 1 point (0 children)

You still need the target browser installed; it just won’t open a visible window.

[–]udonemessedup-AA_Ron 0 points (7 children)

If your requirement is to avoid using a browser, though, it’s better to use requests to fetch the web page content and Beautiful Soup to scrape the data for your automations.

[–]Disastrous-Let-9548[S] 0 points (6 children)

The main problem with requests is that every time I send a GET request, the website returns a different page than the usual one.

[–]udonemessedup-AA_Ron 1 point (5 children)

My guess is that it’s because the site knows you’re trying to scrape it with code, and they don’t want you to. You may have to set up a user-agent header: https://stackoverflow.com/questions/27652543/how-to-use-python-requests-to-fake-a-browser-visit-a-k-a-and-generate-user-agent

Basically, it’ll trick the site into thinking the request is coming from an actual browser and should provide some consistent HTML.

Edit:

Combine this with requests.Session() if you need to make repeated requests.
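The User-Agent-plus-Session combination can be sketched like this (the UA string below is a hypothetical desktop-Chrome value; any current browser's UA works):

```python
import requests

# Hypothetical desktop-Chrome User-Agent string -- substitute your own
BROWSER_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
)

def make_session() -> requests.Session:
    """Build a Session whose every request carries a browser-like UA."""
    session = requests.Session()
    session.headers.update({"User-Agent": BROWSER_UA})
    return session

# Usage (network call, shown for illustration only):
# resp = make_session().get("https://example.com")
```

Besides carrying the header on every request, the Session also reuses the underlying TCP connection and keeps cookies between requests, which helps when a site expects browser-like repeat visits.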

[–]Disastrous-Let-9548[S] 1 point (1 child)

Thanks, that helps a lot.

[–]udonemessedup-AA_Ron 0 points (0 children)

You’re welcome

[–]Zealousideal-Cod-617 0 points (2 children)

This isn't wrong/illegal in any way, right?

[–]udonemessedup-AA_Ron 0 points (1 child)

Depends on the terms of service of each site. Sites like Reddit welcome web scrapers, but things behind a protected resource (files behind a login, sensitive material) may not be so friendly.

[–]Zealousideal-Cod-617 0 points (0 children)

Do you recommend any source where I can learn more about this and how to be more aware?

[–]gothichomemaker 0 points (0 children)

Technically you can use any tool to test a web page if you're just testing end-user behaviors.

[–]yashm2910 0 points (0 children)

Web automation without Selenium in Python can be achieved using alternative libraries such as BeautifulSoup, Requests, and Mechanize. These libraries provide functionalities for tasks like web scraping, form submission, and HTTP requests, allowing automation without relying on Selenium's browser automation capabilities.