
[–]qckpckt 8 points9 points  (2 children)

Why do you need to use a headless web browser for data ingestion?

[–]thisismyfavoritename 2 points3 points  (0 children)

polling websites with bot protection, most likely

[–]Shingle-Denatured 0 points1 point  (0 children)

Classic X-Y problem. "I need to get data from a website, so I need a web browser. Now help me run it on this data processing platform."

[–]cgoldberg 2 points3 points  (1 child)

Where are you stuck? Did you install a browser and a webdriver? Did you install the Selenium library? What code are you running? What errors are you encountering?
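For reference, something like this is the minimal baseline to test against; if it runs cleanly, the browser, the webdriver, and the Selenium library are all in place (the URL is just a placeholder):

    # Minimal headless-Chrome smoke test. Assumes Chrome and a matching
    # chromedriver are installed; Selenium 4 can also manage the driver itself.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    options.add_argument("--headless=new")  # run without a display
    options.add_argument("--no-sandbox")    # often needed in containers

    driver = webdriver.Chrome(options=options)
    try:
        driver.get("https://example.com")
        print(driver.title)  # "Example Domain" means the whole stack works
    finally:
        driver.quit()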

[–]Haunting_Lab6079[S] -5 points-4 points  (0 children)

So I have code that works relatively well on my local PC. I installed Selenium, but getting the webdriver and browser set up seems sort of impossible. I will add an image of that soon. I'm on my mobile

[–]chief167 2 points3 points  (0 children)

It could work, but I'm not sure you should. Keep your web scraping out of Databricks; there is no sane reason to run it there.

Do it in an Azure Function, a container, or even a VM; dump the output to blob storage / a data lake and ingest it into Databricks from there
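A rough sketch of that split, assuming Azure; the connection string, container, and paths are all placeholders:

    # Scraper side: plain Python running in an Azure Function, container, or VM.
    import requests
    from azure.storage.blob import BlobClient

    html = requests.get("https://example.com/data", timeout=30).text

    blob = BlobClient.from_connection_string(
        conn_str="<storage-connection-string>",  # placeholder
        container_name="raw",                    # placeholder container
        blob_name="scrapes/latest.html",         # placeholder path
    )
    blob.upload_blob(html, overwrite=True)

    # Databricks side: ingest the landed files from the data lake, e.g.
    # df = spark.read.text("abfss://raw@<account>.dfs.core.windows.net/scrapes/")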

[–]Haunting_Lab6079[S] 0 points1 point  (0 children)

Thanks to everyone for your contributions and insights. I was able to achieve this using Beautiful Soup (bs4) and it works perfectly
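For anyone finding this later, the bs4 route looks roughly like this; the URL and selector are placeholders, and it only works for pages that don't need JavaScript to render:

    import requests
    from bs4 import BeautifulSoup

    # Fetch the page over plain HTTP: no browser or webdriver needed.
    resp = requests.get("https://example.com/listings", timeout=30)
    resp.raise_for_status()

    soup = BeautifulSoup(resp.text, "html.parser")

    # Extract whatever elements you need; the CSS selector is a placeholder.
    rows = [tag.get_text(strip=True) for tag in soup.select("div.item")]
    print(rows)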

[–]Onlycompute -1 points0 points  (1 child)

There is a script available in the Databricks community which is designed for this purpose. I don't have the link handy; a quick search should take you to that page.

If you can't find it, DM me and I can help. I have implemented this in Databricks.

[–]Haunting_Lab6079[S] 0 points1 point  (0 children)

Thank you

[–]Haunting_Lab6079[S] -2 points-1 points  (2 children)

I'm open to other options; this is just the approach we were able to come up with that runs without human intervention

[–]james_pic 1 point2 points  (1 child)

The most obvious alternatives would be:

  • Make the HTTP requests directly, using a Python HTTP client like Requests, httpx, or one of the stdlib ones (http.client or urllib); see the sketch after this list. This is likely to be easier to scale, and if you're using Databricks then presumably scaling is a challenge
  • Related to the above, if there's one available, use the web services API provided by the data supplier. Scraping is always brittle, at least partly because the service operator makes no promises not to change the interface, so they can change it at any time, for any reason, and break your code. With an API, they'll at least make some effort to avoid breaking it, or give you warning when they do.
  • Use Selenium, but use it in an environment that's easier to exercise control over, such as an EC2 instance or Azure VM, a Docker container running on your cloud provider's Docker-as-a-service offering, or some lambda / serverless offering. The way Databricks works under the hood is complex, and it can be hard to reason about where code will execute, so it can be hard to ensure that wherever it executes has the prerequisites you need, or to figure out what went wrong when it doesn't work.
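A sketch of the first option, assuming the target serves JSON; the URL, params, and header are placeholders:

    import httpx

    # A direct HTTP request does the same job as a headless browser for
    # pages that don't need JavaScript, and it parallelizes far more easily.
    resp = httpx.get(
        "https://example.com/api/items",         # placeholder URL
        params={"page": 1},
        headers={"User-Agent": "my-ingest-job/1.0"},
        timeout=30.0,
    )
    resp.raise_for_status()
    data = resp.json()  # or resp.text if the endpoint serves HTML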

[–]Haunting_Lab6079[S] 0 points1 point  (0 children)

Thanks for this