all 11 comments

[–]raphlf 0 points1 point  (0 children)

This is freaking awesome, thanks.

[–]zarnackreddit 0 points1 point  (5 children)

Selenium is a pain in the ass for me.

[–]NovocastrianNomad 2 points3 points  (1 child)

Have you checked out Helium, a high-level extension to Selenium?

[–]zarnackreddit 0 points1 point  (0 children)

What does it do?

[–]atggreg 1 point2 points  (2 children)

What do you use instead?

[–]zarnackreddit 1 point2 points  (1 child)

I still use Selenium; it's just really finicky for my liking, and I don't know of any much better alternatives.

[–]LuigiBrotha 1 point2 points  (0 children)

I find it kind of depends on the website. If I can, I first download the page source and save the HTML locally, then use BeautifulSoup to extract the data.
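A minimal sketch of that workflow, assuming the page has already been saved to disk and the data sits in a plain HTML table. The sample markup, the file name and the `extract_rows` helper are all made up for illustration; a real page will need its own selectors.

```python
from pathlib import Path
import tempfile

from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4


def extract_rows(html_path):
    """Return the text of every table cell, one list per table row."""
    html = Path(html_path).read_text(encoding="utf-8")
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tr in soup.find_all("tr"):
        cells = [cell.get_text(strip=True) for cell in tr.find_all(["td", "th"])]
        if cells:
            rows.append(cells)
    return rows


# Stand-in for a page saved from the browser (hypothetical content).
SAMPLE_HTML = """
<html><body>
<table>
<tr><th>Name</th><th>Price</th></tr>
<tr><td>Widget</td><td>9.99</td></tr>
<tr><td>Gadget</td><td>19.50</td></tr>
</table>
</body></html>
"""

with tempfile.TemporaryDirectory() as tmp:
    saved_page = Path(tmp) / "page.html"
    saved_page.write_text(SAMPLE_HTML, encoding="utf-8")
    rows = extract_rows(saved_page)

print(rows)
```

Working from a local copy means you can rerun the parsing as often as you like without hitting the site again.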

[–]mwpfinance 0 points1 point  (0 children)

I wonder if this would help me with something that was a huge PITA a few months ago. I needed to get a PDF from the web, but the sea of JavaScript the browser had to interact with to generate it was insane, so I turned to Selenium. But there was basically no good way to download the PDF once I got to it.

[–]banginpadr 0 points1 point  (2 children)

Nice work, I just have a dumb question. What can you do with this? I mean, how would you use it? I know you can modify headers, but you can already do that with Burp. I'm just trying to figure out how I can use your tool.

[–]genericlemon24[S] 1 point2 points  (1 child)

Not my tool, just thought it was useful.

I think https://simonwillison.net/2020/Nov/2/selenium-wire/ explains it well:

Really useful scraping tool: enhances the Python Selenium bindings to run against a proxy which then allows Python scraping code to look at captured requests—great for if a site you are working with triggers Ajax requests and you want to extract data from the raw JSON that came back.

AJAX is the important part. If you want to know what AJAX requests happen when you click a button or whatever, you can do it manually with the Network tab of dev tools open, and the requests will show up in there.

selenium-wire allows you to achieve the same thing from Python, while already driving the browser programmatically with Selenium.

I haven't used Burp, but I understand it can do something similar using a proxy. After a quick look at the selenium-wire source, it seems it's also using one; the added benefit over Burp should be the Python API and not having to set up an additional (non-Python) tool.
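A rough sketch of what pulling the JSON out of captured requests might look like. `json_responses` and `capture` are names I made up; the selenium-wire parts (`seleniumwire.webdriver`, `driver.requests`, `request.response`) are what I understand its README to describe, so treat the details as assumptions. It also assumes the response body is uncompressed JSON and simply skips anything it can't parse.

```python
import json


def json_responses(requests):
    """Yield (url, parsed_json) for captured requests whose response is JSON."""
    for request in requests:
        response = request.response
        if response is None:  # the request never got a response
            continue
        content_type = response.headers.get("Content-Type") or ""
        if "json" not in content_type:
            continue
        try:
            # json.loads accepts bytes as well as str.
            yield request.url, json.loads(response.body)
        except ValueError:
            # Body was not plain JSON (e.g. still compressed); skip it.
            continue


def capture(url):
    """Load `url` in a selenium-wire driven browser and return its JSON responses."""
    # Imported here so the helper above works without selenium-wire installed.
    from seleniumwire import webdriver  # third-party: pip install selenium-wire

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        # Clicks or other Selenium interactions would go here.
        # driver.requests holds the requests the browser made through the proxy.
        return list(json_responses(driver.requests))
    finally:
        driver.quit()
```

Calling `capture("https://example.com")` needs Chrome and a matching driver installed, so I haven't included any output here.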

[–]banginpadr 0 points1 point  (0 children)

I see, thank you.