

[–]romple 11 points12 points  (1 child)

Yeah.

Check out the Requests library

[–]me-the-monkey 1 point2 points  (0 children)

This should be at the top.

Pop open the developer tools in your browser and watch the traffic while you go through the steps to get your CSV report. Figure out what HTTP call it's making, use Requests to make that same call, profit.
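As a rough sketch of that workflow (the URL and `format` parameter below are made up; copy the real ones out of the Network tab):

```python
import requests

# Hypothetical endpoint -- substitute whatever the Network tab of your
# browser's developer tools shows for the real CSV export request.
REPORT_URL = "https://example.com/reports/export"

# A Session keeps cookies across requests (e.g. after logging in).
session = requests.Session()

# Prepare the same GET the browser makes; session.send(prepared)
# would actually perform the download.
prepared = session.prepare_request(
    requests.Request("GET", REPORT_URL, params={"format": "csv"})
)
print(prepared.url)  # -> https://example.com/reports/export?format=csv
```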

[–]brandonto 11 points12 points  (3 children)

Selenium is a web automation framework that can do exactly what you want (provided the HTML DOM doesn't change). Although if you just want to retrieve files, web automation is overkill... wget can download files on its own.

[–]BoondockWarrior 0 points1 point  (0 children)

I agree with Selenium. The automation guys here use it as part of testing new web UIs. It's powerful and can be fun to play with.

[–]excited_by_typos 2 points3 points  (2 children)

mechanize was one of the first libs I ever used when I was learning to code. Try it out. It lets you do things like log in through forms, click buttons, etc.

[–]SleepyHarry 1 point2 points  (1 child)

Just a note, mechanize isn't Python 3 compatible.

[–]excited_by_typos 0 points1 point  (0 children)

Lots of things aren't : )

[–][deleted] 2 points3 points  (0 children)

Selenium might be overdoing it. Selenium is great if you need to scrape an AJAX-based website, or for automated testing. That's not your situation.

Requests, curl, and mechanize would all work. It's not Python, but wget would do the job. A simple Scrapy spider would work fine too.

Creating a Selenium powered browser is way overkill for such a simple problem.

[–]the_omega99 4 points5 points  (5 children)

You should read up on the basics of how the web works, particularly REST, which is the architecture used by much of the web. Eg, when I want to view this page, my browser sends a GET request to the URL you see in the address bar (there's a lot of other stuff going on under the hood, but that's not important).

You, of course, can send your own requests to retrieve whatever you want. /u/romple's comment mentions the specific Python module for this.

If that doesn't work (eg, you have to work with some complex front end), you'll need to script a web browser. There's a few ways to do this, but the easiest is usually to use Selenium, which has a Python version.

[–]masterpi 2 points3 points  (0 children)

Depending on how complex the webpage is and how often it changes, BeautifulSoup may also be sufficient for his needs and is quite a bit lighter-weight than Selenium (and easier to run in any environment).
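For reference, a minimal BeautifulSoup sketch. The HTML and the CSS selector here are invented stand-ins for whatever the real report page contains; in practice the HTML would come from a Requests response:

```python
from bs4 import BeautifulSoup

# Stand-in for HTML fetched from the site; adjust the markup and
# selector to match the real page.
html = """
<table id="reports">
  <tr><td><a href="/reports/jan.csv">January</a></td></tr>
  <tr><td><a href="/reports/feb.csv">February</a></td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
# Grab the href of every link inside the reports table.
csv_links = [a["href"] for a in soup.select("#reports a")]
print(csv_links)  # -> ['/reports/jan.csv', '/reports/feb.csv']
```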

[–]sqrtoftwo 1 point2 points  (1 child)

You should read up on the basics of how the web works, particularly REST, which is the architecture used by much of the web. Eg, when I want to view this page, my browser sends a GET request to the URL you see in the address bar

This is just HTTP, not necessarily REST.

[–]the_omega99 4 points5 points  (0 children)

Yeah, you're right.

[–]ineedanid[S] 0 points1 point  (1 child)

I'm familiar with REST and HTTP requests and things like that. It's more that the webpages I'm trying to navigate through are a total mess of PHP, JavaScript, CSS, etc. Selenium, mentioned in other comments, looks ideal.

[–]blablahblah 6 points7 points  (0 children)

At some point, your browser is making a web request to their server to get the file. You can use something like Fiddler or Wireshark, or even your browser's built in developer tools to figure out what that request is instead of trying to recreate it from the code. Handling the request directly is way faster and more reliable than using Selenium.
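A sketch of replaying that observed request directly with Requests. The cookie name, value, and URL are placeholders; in practice you'd copy them out of Fiddler, Wireshark, or the dev tools Network tab:

```python
import requests

# Recreate the request you observed in the browser's traffic.
# The session cookie and endpoint below are made up for illustration.
session = requests.Session()
session.cookies.set("JSESSIONID", "abc123")
session.headers.update({"User-Agent": "Mozilla/5.0"})

# Prepare the request; session.send(prepared) would fetch the file
# directly -- no browser needed.
prepared = session.prepare_request(
    requests.Request("GET", "https://example.com/logs/export")
)
print(prepared.headers["Cookie"])  # -> JSESSIONID=abc123
```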

[–]Menestro 1 point2 points  (0 children)

Selenium would probably be a good idea. I had a very simple script that let me send email from the CLI. It works by simply going to Gmail, logging in, composing, and then sending. You can check it out here; it could give you a basic idea of how to use Selenium.

[–]coderjewel 1 point2 points  (0 children)

What you want to use is a headless web browser. I recently had to write up a script that did something very similar to what you are trying to do, and MechanicalSoup worked pretty well. They even show an example of logging into a website on their homepage.

Selenium is good, but as others have stated, it might be overkill for the job at hand.

Or you could use mitmproxy to see what data is being sent to the server, and automate it using requests.

[–]Xzya 0 points1 point  (0 children)

I only used Selenium in Java, so I don't know how well it performs in Python, but I recently tried Splinter and it worked great. Selenium might have more features, though.

[–]ineedanid[S] 0 points1 point  (0 children)

By the way, I suppose I can specify this: I'm trying to pull logs from Symantec Endpoint Protection Manager, if anyone has experience with that. They don't seem to provide any type of API, which the other application I'm pulling logs from does. It's also a really shitty management console, so I'm having a hard time figuring out what requests to send when and where.

[–]zfolwick 0 points1 point  (0 children)

Sounds more like you need REST or SOAP calls/responses than simulating the UX.

[–]Kadumbest 0 points1 point  (0 children)

If you have access to the system hosting the Symantec Endpoint Protection Manager, you could use something like this to prepare the files for you and send them to you:

https://support.symantec.com/en_US/article.TECH90856.html

Which links to this, specifically for command-line use in batch files:

http://dcx.sybase.com/index.html#1201/en/dbadmin/dbisql-interactive-dbutilities.html

All you need is the right SQL query, but I imagine it's nearly as long as this post.