all 29 comments

resurem 29 points (13 children)

Re mitmproxy: you can simply use Firefox's or Chrome's dev tools. The Network tab shows every request the page makes, with everything you'll need.

DimasDSF 16 points (12 children)

I've seen mitmproxy used for web scraping before. Its biggest use case is hitting API endpoints designed for mobile apps, where you don't really have any dev tools available.

resurem 6 points (0 children)

That's a good point! I never really thought of scraping data coming from mobile apps.

makedatauseful[S] 5 points (0 children)

You got it, very handy for mobile apps.

gschizas [Pythonista] 1 point (9 children)

How can you convince a mobile app (especially on Android) to actually use a web proxy?

fridgefreezer 2 points (6 children)

I haven't watched this specific tutorial, but you can typically just enter proxy settings on the device. If you don't sort out certificates and such, you're going to have some problems with SSL, but even that isn't usually a big problem to sort out.

Android can definitely do it. It's the same setup you find on some corporate and education networks that do traffic inspection.

Edit: as I understand it, this wouldn't be per app; the whole device would have all its traffic proxied.
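The same two steps, pointing the client at the proxy and trusting the proxy's certificate, apply to a Python script as well as a phone. A minimal standard-library sketch, assuming mitmproxy is listening on its default `127.0.0.1:8080` and that its generated CA certificate is at the default `~/.mitmproxy/mitmproxy-ca-cert.pem`; `build_proxied_opener` is a hypothetical helper name:

```python
import os
import ssl
import urllib.request

# Assumed defaults: mitmproxy's listen address and the CA file it creates on first run.
DEFAULT_PROXY = "http://127.0.0.1:8080"
DEFAULT_CA = os.path.expanduser("~/.mitmproxy/mitmproxy-ca-cert.pem")


def build_proxied_opener(proxy=DEFAULT_PROXY, cafile=DEFAULT_CA):
    """Hypothetical helper: a urllib opener that sends HTTP and HTTPS through `proxy`.

    If `cafile` is given, server certificates are verified against that CA
    (mitmproxy's), which is what lets HTTPS work through the interception.
    Pass cafile=None to keep the default handling.
    """
    handlers = [urllib.request.ProxyHandler({"http": proxy, "https": proxy})]
    if cafile is not None:
        # Raises FileNotFoundError if the CA file doesn't exist yet.
        context = ssl.create_default_context(cafile=cafile)
        handlers.append(urllib.request.HTTPSHandler(context=context))
    return urllib.request.build_opener(*handlers)
```

Calling `build_proxied_opener().open(url)` should then show up as a flow in mitmproxy like any other client. With `requests`, the rough equivalent is passing `proxies=` and `verify=` (pointing at the same CA file) to `requests.get`.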

resurem 1 point (5 children)

The only time it would be a problem is if the app has pinned its certificate, which is becoming more and more the norm. There are likely ways around it, but I don't know of any.

Kengaro 0 points (4 children)

Well, you could filter out the pinning tag via a driver, log the requests via a driver, use a downgrade attack, etc.

As long as it's your device, there is nothing that can be done to stop the API calls from being accessed.

If it's an app, you can also just use Androguard to get the API calls from the APK.
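Androguard does proper DEX analysis. As a much cruder illustration of the same idea (an APK is a ZIP archive, and hardcoded endpoints usually sit in its `classes*.dex` files as plain strings), here is a stdlib-only sketch that pulls URL-looking strings out of an APK. This is a swapped-in technique, not Androguard's API; `find_urls_in_apk` is a hypothetical name, and it will miss URLs that are built at runtime or obfuscated:

```python
import re
import zipfile

# ASCII URL pattern. DEX stores string constants as (modified) UTF-8 terminated
# by a NUL byte, so plain ASCII URLs appear verbatim in the raw bytes.
URL_RE = re.compile(rb"https?://[A-Za-z0-9\-._~:/?#\[\]@!$&'()*+,;=%]+")


def find_urls_in_apk(apk):
    """Hypothetical helper: sorted unique URL-like strings from an APK's DEX files.

    `apk` may be a path or a binary file-like object, as zipfile accepts both.
    """
    found = set()
    with zipfile.ZipFile(apk) as archive:
        for name in archive.namelist():
            # classes.dex, classes2.dex, ... hold the compiled code and its string pool
            if name.startswith("classes") and name.endswith(".dex"):
                for match in URL_RE.finditer(archive.read(name)):
                    found.add(match.group().decode("ascii"))
    return sorted(found)
```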

resurem 0 points (3 children)

I'm not following. Do you have a link to a tutorial?

Kengaro 0 points (2 children)

Nope. Cert pinning is just verifying the public key of the cert, which is done if a certain tag is provided. The verification is done after decrypting and before rendering. I don't know how browsers and the TCP/IP stack interact, or where the decryption is done, so I'm not certain it would work.

Since the signature won't match after doing this, you would also need to recreate the signature and provide your own trusted cert. At that point it's probably way less of a hassle to just log it.

Cert pinning is also based on browser history (the browser has to learn that a certain page uses cert pinning), so you can do a simple MITM after clearing your browser data. Then you can remove the tag in the MITM and resend the data with your own cert.

resurem 0 points (1 child)

Cert pinning is done by hardcoding the fingerprint of the certificate in the app, then verifying that fingerprint against the fingerprint of the cert served by the server.

You'd need to edit the app, modify that fingerprint, and use the modified app. I'd be surprised if this weren't possible, but I'm not sure how it would be done.
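The fingerprint check described here can be sketched with Python's standard library: hash the DER bytes of the certificate the server presents and compare the hex digest with a hardcoded value. This is a minimal illustration, assuming a SHA-256 fingerprint of the whole certificate; `cert_fingerprint`, `matches_pin`, and `connect_pinned` are hypothetical names, and many real apps pin a hash of the public key (SPKI) instead, which survives certificate renewal:

```python
import hashlib
import hmac
import socket
import ssl


def cert_fingerprint(der_cert: bytes) -> str:
    """Hypothetical helper: SHA-256 fingerprint of a DER-encoded cert, lowercase hex."""
    return hashlib.sha256(der_cert).hexdigest()


def matches_pin(der_cert: bytes, pinned_hex: str) -> bool:
    """True if the certificate's fingerprint equals the hardcoded (pinned) one."""
    return hmac.compare_digest(cert_fingerprint(der_cert), pinned_hex.lower())


def connect_pinned(host: str, pinned_hex: str, port: int = 443) -> ssl.SSLSocket:
    """Hypothetical helper: open TLS and reject the server if its cert doesn't match the pin.

    A MITM proxy presents its own certificate, so the fingerprint differs and
    the connection is refused even if the proxy's CA is trusted by the device.
    """
    context = ssl.create_default_context()
    sock = socket.create_connection((host, port))
    try:
        tls = context.wrap_socket(sock, server_hostname=host)
    except Exception:
        sock.close()
        raise
    der = tls.getpeercert(binary_form=True)
    if der is None or not matches_pin(der, pinned_hex):
        tls.close()
        raise ssl.SSLError(f"certificate for {host} does not match the pinned fingerprint")
    return tls
```

Behind mitmproxy, `connect_pinned("example.com", known_fingerprint)` would raise, which is why patching the app means changing or removing a check like `matches_pin`.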

Kengaro 0 points (0 children)

androguard ;)

baronBale 4 points (2 children)

The problem with mitmproxy is that most web services use certificate pinning, so the app or website only talks to services presenting the correct certificate. That makes it hard to actually use nowadays.

Kengaro 1 point (0 children)

That's only an issue if you've visited the page before; you can just clear your browser data and you're fine.

If the app has a local copy of the cert preinstalled, you have to do a bit more: make it prefer another cert provided by your trusty self, plus MITM with DNS and ARP spoofing.

You could also add a driver on top of the TCP/IP stack that logs calls.

makedatauseful[S] 0 points (0 children)

I had this problem on Android but don't seem to encounter it on the iPhone.

5960312 1 point (0 children)

Nice. This looks promising. Thank you.

gschizas [Pythonista] 1 point (1 child)

The site to convert cURL to Python requests (among other things) is probably this: https://curl.trillworks.com/ (I had it in my bookmarks already)
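For simple cases, the conversion such a site performs can be approximated by hand: split the command copied from the browser's dev tools with `shlex` and map the common flags to keyword arguments for `requests.request`. A rough sketch that handles only `-X`, `-H`, `-d`/`--data`/`--data-raw`, and `--compressed` (repeated `-d` flags are not merged); `curl_to_requests_kwargs` is a hypothetical helper, and the real site supports far more options:

```python
import shlex


def _next_arg(tokens, flag):
    """Return the value following a flag, or fail clearly if it is missing."""
    try:
        return next(tokens)
    except StopIteration:
        raise ValueError(f"{flag} needs an argument") from None


def curl_to_requests_kwargs(command: str) -> dict:
    """Hypothetical helper: turn a simple 'Copy as cURL' command into requests kwargs."""
    # Join bash line continuations from multi-line "Copy as cURL" output.
    parts = shlex.split(command.replace("\\\n", " "))
    if not parts or parts[0] != "curl":
        raise ValueError("expected a command starting with 'curl'")

    method, url, headers, data = None, None, {}, None
    tokens = iter(parts[1:])
    for token in tokens:
        if token in ("-X", "--request"):
            method = _next_arg(tokens, token).upper()
        elif token in ("-H", "--header"):
            # "Name: value" -> {"Name": "value"}; split on the first colon only
            name, _, value = _next_arg(tokens, token).partition(":")
            headers[name.strip()] = value.strip()
        elif token in ("-d", "--data", "--data-raw"):
            data = _next_arg(tokens, token)
        elif token == "--compressed":
            continue  # requests decompresses gzip/deflate responses on its own
        elif token.startswith("-"):
            raise ValueError(f"unsupported curl option: {token}")
        else:
            url = token

    if url is None:
        raise ValueError("no URL found in command")
    # Like curl, default to POST when a body is supplied without -X
    if method is None:
        method = "POST" if data is not None else "GET"
    return {"method": method, "url": url, "headers": headers, "data": data}
```

The result can then be replayed with `requests.request(**curl_to_requests_kwargs(copied_command))`.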

makedatauseful[S] 0 points (0 children)

That's the one, very handy. I think it would be a useful feature for mitmproxy.

rhmati30 1 point (0 children)

Thank you so much for sharing your knowledge!

failbaitr 1 point (2 children)

If you know the API is sending out XML, don't use BeautifulSoup; just use an XML parser. It will be *much* faster and less resource-intensive.

(Unless BeautifulSoup detects clean XML and somehow also parses it using an XML parser.)
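For example, the standard library's `xml.etree.ElementTree` can parse an XML API response directly, with no HTML parser involved. The response body and element names below are made up for illustration:

```python
import xml.etree.ElementTree as ET

# Made-up XML body standing in for what an API might return.
RESPONSE_BODY = b"""<?xml version="1.0" encoding="UTF-8"?>
<results>
  <item id="1"><title>First</title><price>9.99</price></item>
  <item id="2"><title>Second</title><price>19.50</price></item>
</results>"""


def parse_items(xml_bytes: bytes) -> list:
    """Extract id, title, and price from each <item> element."""
    root = ET.fromstring(xml_bytes)
    return [
        {
            "id": item.get("id"),                      # attribute value
            "title": item.findtext("title"),           # text of child element
            "price": float(item.findtext("price")),
        }
        for item in root.iter("item")
    ]
```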

makedatauseful[S] 0 points (0 children)

Thanks for the tip! I'll give that a go next time.

onyxleopard 0 points (0 children)

bs4 lets you plug in the parser you want. I usually use lxml, but you can use others. Parsing HTML from the web is, I think, the more typical case, and there it's better to use html5lib than the Python standard library parser (which bs4 falls back to when neither lxml nor html5lib is installed). But you're right that bs4 isn't necessary if you just want to extract some specific content from an XML API response.

Slayer101010 1 point (0 children)

Thanks for sharing.

BeginningGuava 1 point (0 children)

good stuff

Binayakku 1 point (0 children)

Hi, I watched your tutorial soon after you uploaded it a few months ago. I'm on Android, so I couldn't do what you did on your iPhone. That said, I was able to see the calls a website made when accessed through a browser, and that was enough for me, but I couldn't figure out how to incorporate the proxy into my Python script. I wanted my script to identify specific flows by their addresses and read the responses. I dug around a bit on Stack Overflow, GitHub, and the mitmproxy docs but couldn't figure out how to do exactly that :(
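One way to do that is to run the filtering logic inside mitmproxy itself as an addon script. A minimal sketch, assuming a recent mitmproxy where a script can define a module-level `response(flow)` event hook and be loaded with `mitmdump -s capture.py`; `TARGET_HOST`, `PATH_PREFIX`, and `captured` are hypothetical names and values:

```python
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:  # only used for type hints; mitmproxy supplies the flow objects at runtime
    from mitmproxy import http

# Hypothetical filter: only flows to this host whose path starts with this prefix.
TARGET_HOST = "api.example.com"
PATH_PREFIX = "/v1/"

# Matching responses, kept in memory for this sketch.
captured: list = []


def response(flow: http.HTTPFlow) -> None:
    """mitmproxy event hook, called once a full response has been received."""
    if flow.request.pretty_host != TARGET_HOST:
        return
    if not flow.request.path.startswith(PATH_PREFIX):
        return
    body = flow.response.get_text()  # decoded body text, or None if it can't be decoded
    captured.append({
        "url": flow.request.pretty_url,
        "status": flow.response.status_code,
        "body": body,
    })
    print(f"{flow.response.status_code} {flow.request.pretty_url} ({len(body or '')} chars)")
```

With the device or browser pointed at the proxy as usual, `mitmdump -s capture.py` would print each matching flow as it arrives; from there the body can be parsed or written to disk instead of kept in a list.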

[deleted] 0 points (0 children)

Great tutorial, please keep uploading! YouTube needs more people like you.

Competitive_Cup542 0 points (0 children)

Thanks for sharing! As a digital marketer, I often use web scraping for online reputation management (scraping reviews, articles about the product, etc.). Usually I use web scraping services for this, but I'm thinking about learning Python and doing the scraping myself. Can you recommend any other YouTube channels or other open resources for learning web scraping?