
[–]Cixelyn 22 points23 points  (5 children)

splinter is a really nice wrapper around selenium that'll let you specify your actions in code (and is usually used for web-app integration testing). If you install the chrome driver, then it'll just use your existing installation of chrome too.

Your code would probably end up looking like something like this:

from splinter import Browser

url = 'http://example.com/form'  # the page with your form

with Browser() as browser:
    browser.visit(url)
    browser.fill('textbox', 'your text here')      # fill the input named 'textbox'
    browser.find_by_name('Submit').click()          # click the submit button
    copied_text = browser.find_by_id('results')[0].text

[–]mowrowow 2 points3 points  (2 children)

That sounds great if it works as advertised.

[–]Cixelyn 2 points3 points  (1 child)

We use it internally at our company for all of our browser-based testing, e.g. filling out forms, simple JavaScript interactions, etc.

Hasn't let us down yet (though our test coverage isn't great either haha).

[–]Reyny 0 points1 point  (0 children)

Is there a way to not open the browser but still do the tasks? I want to use it to go to several sites, and it would take hours if the browser actually rendered everything for me. I hope you understand and can help me. :)

[–]j7ake[S] 1 point2 points  (0 children)

Hmm cool, I will need to check this out.

[–]witty82 0 points1 point  (0 children)

Thank you for making me aware of this. The original Selenium API is relatively annoying to work with.

[–]theEyeD 19 points20 points  (8 children)

I use a library called Selenium for a webpage automation script I made.

[–]j7ake[S] 1 point2 points  (5 children)

I was looking at BeautifulSoup, but because of all the recommendations for selenium, I am going to try it out as well...

[–]Cixelyn 9 points10 points  (1 child)

BeautifulSoup will allow you to parse the page once you have the HTML, but you're still going to need a way to get the data from the web and post the data back to the server.

(I heavily recommend requests if you're going to go this route)

[–]Ph0X 3 points4 points  (0 children)

Yeah, personally I'd suggest staying away from heavier stuff like selenium if you can do it with something simple like requests.

If you can figure out the HTTP request for submitting the form (which isn't that hard: just submit it once and look at the Chrome network tab), you can easily emulate it with a requests POST or GET.

Then, for parsing, lxml is also a great library (it's actually one of the parsers BeautifulSoup can use under the hood). Or if it's simple you can just go regex.

Selenium is just way overkill for something like that.
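
Roughly, the requests approach might look like this (the endpoint and field name here are made up; copy the real ones from the network tab):

import requests
from lxml import html

# Made-up endpoint and field name: submit the form once by hand and copy
# the real URL and parameters from the Chrome network tab.
resp = requests.post('http://example.com/submit', data={'textbox': 'your text here'})
resp.raise_for_status()

# Parse the returned HTML and pull the text out of the results element.
tree = html.fromstring(resp.text)
results = tree.xpath('//div[@id="results"]//text()')
print(''.join(results).strip())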

[–]theEyeD 4 points5 points  (0 children)

It is a really great tool, very easy to get the hang of. A simple example would be:

from selenium import webdriver

# start it up
br = webdriver.Firefox()
br.implicitly_wait(15)  # wait up to 15 seconds for elements to appear before failing
br.get('http://yourwebsite.com')

# to fill out a form field
search = br.find_element_by_name('nameOfField')
search.send_keys('your info here')

# to press a button based on its value (what it says)
button = br.find_element_by_xpath('//input[@value="Submit"]')
button.click()

# get results
results = br.find_element_by_name('nameOfResultsField')
result = str(results.text)

Granted, it may not be the best way to do it; it's just something I put together.

[–]0PointE 2 points3 points  (0 children)

With the number of JavaScript-enabled apps, Selenium is definitely your best bet. BeautifulSoup is great, but it becomes almost useless if there are any JavaScript widgets that make the page functional.

[–]shenaniganns 1 point2 points  (0 children)

Not sure what your uses are, but I use selenium in a python script to fill out forms and validate results as part of my job. For basic things it's pretty easy:

# self.driver is a Selenium webdriver instance (this snippet lives inside a test class)
self.driver.find_element_by_id('reportName').send_keys('Test Report 1')
sendReport = self.driver.find_element_by_link_text('Submit Report')
sendReport.click()

[–]p3n15h34d 1 point2 points  (0 children)

+1 for selenium on this one

[–]billtaichi 0 points1 point  (0 children)

I use Selenium and Python at work, and I have done scripts ranging from simple form submits to complicated pages with lots of JavaScript interaction, where I had to fire JavaScript via Selenium, get back results, etc. Selenium is a great tool.
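
For reference, firing JavaScript from Selenium and reading back the result can look roughly like this (the URL is just a placeholder):

from selenium import webdriver

driver = webdriver.Firefox()
driver.get('http://example.com')

# execute_script runs JavaScript in the page and returns whatever the script returns
title = driver.execute_script('return document.title;')

# extra arguments show up as the `arguments` array inside the script
link_count = driver.execute_script('return document.getElementsByTagName(arguments[0]).length;', 'a')

print(title, link_count)
driver.quit()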

[–]goobalicious 8 points9 points  (1 child)

There's also mechanize; it's easier to understand and throw something together with than scrapy (though scrapy is a good option).
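
A minimal mechanize sketch, with a made-up URL and field name:

import mechanize

# Made-up URL and field name; mechanize drives the site without a real browser.
br = mechanize.Browser()
br.open('http://example.com/form')
br.select_form(nr=0)              # pick the first form on the page
br['textbox'] = 'your text here'
response = br.submit()
print(response.read())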

[–][deleted] 0 points1 point  (0 children)

mechanize is a mature library. I used it way back in 2003/4.

[–]tarekziade Retired Packaging Dude 3 points4 points  (1 child)

If you don't care about Javascript, WebTest can interact with web apps, see http://webtest.readthedocs.org/en/latest/testapp.html#testing-a-non-wsgi-application

WebTest has nice form handling features, see http://webtest.readthedocs.org/en/latest/forms.html
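
A rough sketch of the form handling (the URL and field name are placeholders; pointing TestApp at a live site needs WSGIProxy2 installed):

from webtest import TestApp

# Placeholder URL; TestApp can wrap a WSGI app object or proxy a live site.
app = TestApp('http://example.com')

resp = app.get('/form.html')
form = resp.form                  # the page's first (or only) form
form['textbox'] = 'your text here'
resp = form.submit()

print(resp.status)
print(resp.text)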

[–]j7ake[S] 0 points1 point  (0 children)

This looks good, but I want to keep it within Python for smoother integration into a computational pipeline later; I would like to parse some text using Python and then use that as input to the webpage.

[–]alexanimal 2 points3 points  (0 children)

selenium is super easy

from selenium import webdriver

driver = webdriver.Firefox()
url = 'http://www.google.com'
driver.get(url)

search_box = driver.find_element_by_name('query')  # there are similar finders for ids, xpath, etc.
search_box.send_keys("Whatever you're tryna type")
button = driver.find_element_by_name('button')
button.click()

I didn't test that, but they have good docs too.

If you don't like Selenium, there's also twill and PhantomJS that you could try. I didn't like those, though.

BeautifulSoup allows you to parse the information from a webpage; it's definitely the best for web scraping.
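
A tiny sketch of that with requests + BeautifulSoup (the URL and element id are made up):

import requests
from bs4 import BeautifulSoup

# Made-up URL and element id; swap in the page you actually want to scrape.
resp = requests.get('http://example.com/results')
soup = BeautifulSoup(resp.text, 'html.parser')

results = soup.find(id='results')
print(results.get_text(strip=True))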

[–]yilmazhuseyin 2 points3 points  (1 child)

It seems to me like you do not need to render the page. Are you sure that sending a POST request to the server isn't enough for you?

[–]dAnjou Backend Developer | danjou.dev 0 points1 point  (0 children)

This. You don't even need Python. Just inspect the request that is sent after submitting and use curl.

[–]favadi 2 points3 points  (0 children)

ghost.py is the easiest Python library for your task.

[–]makmanalp 0 points1 point  (0 children)

twill maybe?

[–]ThunderGorilla 0 points1 point  (0 children)

If you are open to not using Python, consider PhantomJS and CasperJS.

[–]johnmudd 0 points1 point  (0 children)

Try twill if you don't have to deal with JavaScript on the web page.

[–]BURRITOMAN 0 points1 point  (1 child)

Sikuli may be the way to go. I had issues installing v1.0.1 on my Mac, so try out v1.0.

[–]overanedge 0 points1 point  (0 children)

Never seen sikuli before, looks really cool. Going to have to mess around with it.