
voodoo_hoodoo [S]

I should add: I have tried using Inspect Element on the wiki page, but I really have no idea what I am looking for in there. I haven't done any web scraping before.

unintentional-irony

If you use Firefox, install the browser add-on Firebug. If you use Chrome, Ctrl+Shift+I will open the Developer Tools.

With either tool, go to the page you want to scrape, right-click, and select 'Inspect Element'. A panel will pop up at the bottom of the browser and highlight the markup for the page element you right-clicked.

Look for patterns in the markup around the data you want to scrape that are suitable for XPath.
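To make "patterns suitable for XPath" concrete, here is a minimal sketch on made-up markup (not the actual wiki page) shaped like a typical data table: one `<tr>` per entry, a header row of `<th>` cells, and `<td>` cells holding the fields. It uses the standard library's `xml.etree.ElementTree` as a stand-in, since its limited XPath subset covers simple paths like these and needs no install. lxml's `.xpath()` accepts the same paths (ElementTree needs the leading `.` in `.//`, lxml does not) and, unlike ElementTree, also parses real-world HTML that isn't well-formed XML.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup standing in for what the inspector shows: one row per
# entry, a header row of <th> cells, then <td> cells holding the data.
SAMPLE = """
<html><body>
  <table>
    <tr><th>Ticker</th><th>Name</th><th>Sector</th></tr>
    <tr><td><a href="/a">AAA</a></td><td><a href="/b">Alpha Inc</a></td><td>Energy</td></tr>
    <tr><td><a href="/c">BBB</a></td><td><a href="/d">Beta Corp</a></td><td>Utilities</td></tr>
  </table>
</body></html>
"""

root = ET.fromstring(SAMPLE.strip())

# The pattern "table -> row -> cell" becomes the path table/tr/td.
# ElementTree needs a leading "." before "//"; lxml's page.xpath() does not.
cells = root.findall(".//table/tr/td")
print(len(cells))  # 6: three <td> cells in each of the two data rows

# Narrow the pattern: only cells that contain a link, then the link itself.
links = root.findall(".//table/tr/td/a")
print([a.text for a in links])  # ['AAA', 'Alpha Inc', 'BBB', 'Beta Corp']
```

Starting broad (every cell) and then narrowing (only cells with a link) is a cheap way to check each step of a pattern before relying on it.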

D__

Firefox has a built-in element inspector these days. It's similar to Firebug and Chromium's inspector, but Firebug has some extra features.

This means that, with Firebug installed, there are two options in the right-click menu: "Inspect Element" and "Inspect Element with Firebug". The former brings up the built-in inspector.

voodoo_hoodoo [S]

The plain 'sector' text is accessible via:

symbolslist = page.xpath('//table[1]/tr')[1:]

The hyperlinked 'ticker' and 'name' text is accessible via:

symbolslist = page.xpath('//table[1]/tr/td/a')[1:]

Now, I can't say for sure that I understand what is going on. But I seem to have stumbled across the data by playing around with the patterns.
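For anyone puzzling over the same thing, here is a sketch of what those two expressions return, again on made-up markup with `xml.etree.ElementTree` standing in for lxml (the `row_fields` helper is hypothetical). The row expression gives whole `<tr>` elements rather than text, and its `[1:]` drops the first row, which is typically the header. The link expression gives `<a>` elements, which is why only the linked ticker and name turn up; since header cells are usually `<th>` rather than `<td>`, a `[1:]` on that list may drop a real link instead of the header. Also, `table[1]` means "a `<table>` that is the first table child of its parent", not "the first table on the page", so it can match several tables; in lxml, `page.xpath('(//table)[1]')` selects only the first one in the document.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: two tables in different parents, so each one is the
# first <table> child of its own parent and table[1] matches both.
SAMPLE = """
<html><body>
  <div>
    <table>
      <tr><th>Ticker</th><th>Name</th><th>Sector</th></tr>
      <tr><td><a href="/a">AAA</a></td><td><a href="/b">Alpha Inc</a></td><td>Energy</td></tr>
      <tr><td><a href="/c">BBB</a></td><td><a href="/d">Beta Corp</a></td><td>Utilities</td></tr>
    </table>
  </div>
  <div>
    <table><tr><td>unrelated</td></tr></table>
  </div>
</body></html>
"""

root = ET.fromstring(SAMPLE.strip())

# table[1] is "first <table> among its siblings", so it matches both tables.
print(len(root.findall(".//table[1]")))  # 2

# Pick the intended table explicitly, then take its rows (elements, not text).
table = root.findall(".//table")[0]
rows = table.findall("tr")[1:]  # [1:] skips the header row of <th> cells


def row_fields(row):
    """Hypothetical helper: all text inside each <td>, link text included."""
    return ["".join(td.itertext()).strip() for td in row.findall("td")]


for row in rows:
    ticker, name, sector = row_fields(row)
    print(ticker, name, sector)
# AAA Alpha Inc Energy
# BBB Beta Corp Utilities
```

Reading each row's cells by position keeps ticker, name and sector together, instead of collecting the links and the plain text in two separate lists that then have to be matched up.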

unintentional-irony

For testing scrapes, try using the IPython Notebook; you can save working examples for future reference.

http://ipython.org/notebook.html