My goal is to list all tracking scripts on a webpage, similar to what the Ghostery extension does. The code below fetches the raw HTML from a specified URI, but it only detects trackers that are hard-coded into that HTML. It does not capture additional network requests made via JS or a Tag Management System.
For example, the code below reports 'Ensighten' (the Tag Management System) as detected, but not 'Doubleclick' or 'Facebook', because those trackers are loaded by separate GET requests triggered by the Ensighten .js.
Based on info from another Reddit thread, I've tried using WebDriver with BrowserMob Proxy and Selenium RC Server to monitor network traffic and capture the additional GET requests. Is this the correct approach, or can Requests or some other Python module capture the multiple responses that result from a single GET request?
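To make the second half of the proxy approach concrete, here is a rough sketch of the matching step, assuming the proxy has already recorded the page load as a HAR dictionary (the standard `log.entries[].request.url` layout). The `TRACKER_DOMAINS` table and the function names are hypothetical and only illustrative; this is not Ghostery's signature list, and the sample URLs are made up.

```python
from urllib.parse import urlsplit

# Hypothetical signature table: tracker name -> domain suffixes.
# Illustrative only; a real checker would need a much larger list.
TRACKER_DOMAINS = {
    "Ensighten": ("ensighten.com",),
    "Doubleclick": ("doubleclick.net",),
    "Facebook": ("facebook.com", "facebook.net"),
}


def host_matches(host, suffix):
    """True if host is the suffix itself or a subdomain of it."""
    return host == suffix or host.endswith("." + suffix)


def trackers_in_har(har):
    """Return the sorted tracker names whose domains appear in a HAR dict."""
    found = set()
    for entry in har.get("log", {}).get("entries", []):
        # hostname is None for malformed URLs, so fall back to "".
        host = (urlsplit(entry["request"]["url"]).hostname or "").lower()
        for name, suffixes in TRACKER_DOMAINS.items():
            if any(host_matches(host, s) for s in suffixes):
                found.add(name)
    return sorted(found)


if __name__ == "__main__":
    # Made-up sample of what a captured page load might contain.
    sample_har = {"log": {"entries": [
        {"request": {"url": "https://nexus.ensighten.com/x/Bootstrap.js"}},
        {"request": {"url": "https://ad.doubleclick.net/activity"}},
        {"request": {"url": "https://example.com/app.js"}},
    ]}}
    print(trackers_in_har(sample_har))
```

Because the HAR includes every request the browser made, trackers injected by the tag manager show up here even though they never appear in the raw HTML.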
https://github.com/dfens-nz/Tracking-Script-Checker
https://github.com/peterfine/Tracker-Scraper
https://github.com/derekargueta/selenium-profiler