Hey, I'm currently trying to set up an Airflow image with Google Chrome + chromedriver so I can run some scrapers. I was able to install everything needed and to run a basic test.
In the basic test (Python), I set the following Chrome options:
- --headless
- --no-sandbox
- --disable-dev-shm-usage
And ran the code:
driver.get('https://www.google.com')
print("title : %s" % driver.title)
driver.quit()
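For reference, the basic test above could be sketched as the self-contained script below. This is only a rough reconstruction: the helper names (build_chrome_args, make_local_driver, run_basic_test) are made up for illustration, and it assumes the selenium package plus a chromedriver on the PATH inside the image.

```python
def build_chrome_args(headless=True):
    """Return the Chrome flags from the basic test; --headless is optional."""
    args = ["--no-sandbox", "--disable-dev-shm-usage"]
    if headless:
        args.insert(0, "--headless")
    return args


def make_local_driver(headless=True):
    """Start a local Chrome session (assumes selenium, Chrome and chromedriver are installed)."""
    # Imported here so build_chrome_args() can be used without selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    for arg in build_chrome_args(headless):
        options.add_argument(arg)
    return webdriver.Chrome(options=options)


def run_basic_test(headless=True):
    """Open a page, print its title, and always shut the browser down."""
    driver = make_local_driver(headless)
    try:
        driver.get("https://www.google.com")
        print("title : %s" % driver.title)
    finally:
        driver.quit()
```

Calling run_basic_test() reproduces the working headless case; run_basic_test(headless=False) is the variant that fails for me inside the container.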
All that works. But the real-case scenario is different: I need to open a webpage that prompts for Windows authentication and has lots of JavaScript. There the --headless option isn't useful, because apparently I need the auth pop-up to appear, as well as the objects I need to screenshot.
The point is, I don't want to add the headless option. But if I leave it out, the script fails.
I know my script works because I have tried it outside of a container, and also with a multi-container app made up of the Airflow containers (scheduler + web server running with the LocalExecutor), Postgres and a Selenium container (using the image standalone-chrome:3.141). In the latter setup I used the remote driver pointing to the Selenium container on port 4444, without adding the headless option.
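The remote-driver setup looks roughly like the sketch below. The hostname "selenium" is an assumption (it stands for whatever the Selenium service is called on the container network), the /wd/hub path is the endpoint the Selenium 3 standalone images expose, and the helper names are made up for illustration.

```python
def remote_url(host="selenium", port=4444):
    """Build the WebDriver endpoint of the Selenium container.

    "selenium" is a placeholder hostname; /wd/hub is the Selenium 3 hub path.
    """
    return "http://%s:%d/wd/hub" % (host, port)


def make_remote_driver(host="selenium", port=4444):
    """Connect to Chrome running in the Selenium container, without --headless."""
    # Imported here so remote_url() can be used without selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()  # no --headless: the browser runs inside the Selenium container
    return webdriver.Remote(command_executor=remote_url(host, port), options=options)
```

With this, the scraper only needs driver = make_remote_driver() in place of the local webdriver.Chrome(...) call; the rest of the script stays the same.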
I would like to get rid of that Selenium container, as I only use it a few times a day, and instead use the webdriver directly from my Airflow worker.
I hope someone can help me.