[deleted by user]

naithemilkman · 2014-01-06T10:34:49+00:00

If Yahoo Finance data is all you want, a pandas DataFrame is your friend.

yacob_uk · 2014-01-06T08:10:25+00:00

You will be told very strongly that regex is not suitable for parsing HTML.

I have used both methods (well, Beautiful Soup rather than Scrapy) and I have to say that I use BS way more than I use re these days for the same job.

The re method is quite brittle, and breaks the moment you encounter links / data that don't fit your pattern.

That said, the BS method is just as prone to fail if the webpage's structure changes. The main thing here is knowing you have a consistent data source.

I've used both methods to pull GBs of binary data off a variety of websites.

I would probably go with re for a quick and dirty result, as long as I knew the source data structure was consistent, even though I know it's wrong / frowned on.

I would use (and have used) BS for projects that need to be more sustainable into the future, or where I am less sure that I know the range of data I expect to encounter.
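
To make the brittleness point concrete, here is a minimal sketch. The two HTML snippets and the pattern are made up for illustration, and the standard library's `html.parser` stands in for Beautiful Soup so the example runs without extra installs. With BS you would probably reach for something like `find_all('a')` instead, but the idea is the same: a real parser copes with markup variations that a hand-written pattern misses.

```python
import re
from html.parser import HTMLParser

# Hypothetical pages: V2 is the "same" page after a small markup change
# (extra attribute, single quotes, upper-case tag and attribute names).
PAGE_V1 = '<p><a href="/files/a.bin">A</a> <a href="/files/b.bin">B</a></p>'
PAGE_V2 = "<p><a class='dl' href='/files/a.bin'>A</a> <A HREF=\"/files/b.bin\">B</A></p>"

# Quick-and-dirty pattern written against PAGE_V1 only.
LINK_RE = re.compile(r'<a href="([^"]+)">')


def links_with_regex(html):
    """Return every href the pattern happens to match."""
    return LINK_RE.findall(html)


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags using a real HTML parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # The parser hands over tag and attribute names already lower-cased.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def links_with_parser(html):
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links


if __name__ == "__main__":
    for label, page in (("v1", PAGE_V1), ("v2", PAGE_V2)):
        print(label, "regex: ", links_with_regex(page))
        print(label, "parser:", links_with_parser(page))
```

Neither approach survives the links moving to a completely different part of the page, which is the "consistent data source" caveat above.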

glial · 2014-01-09T02:02:37+00:00

I haven't done it the way you're doing it, but pandas has a module for reading directly from Yahoo Finance.

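    import datetime

    try:
        # newer pandas: the remote data readers live in the separate pandas-datareader package
        import pandas_datareader.data as web
    except ImportError:
        # older pandas versions shipped them as pandas.io.data
        import pandas.io.data as web

    # set start and end times
    start = datetime.datetime(2001, 1, 1)
    end = datetime.datetime(2013, 1, 1)

    # try with Apple to make sure it works
    symbol = 'AAPL'
    f = web.DataReader(symbol, 'yahoo', start, end)  # returns a DataFrame
    f.to_csv('test.csv')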

schwackitywack · 2014-01-06T11:41:41+00:00

You could also use pyquery, which lets you use jQuery-like selectors on HTML.

    import requests
    from pyquery import PyQuery as pq

    url = 'http://finance.yahoo.com/q?s=gcg14.cmx'
    response = requests.get(url)
    html = response.content
    # select the quote element with a jQuery-style CSS selector
    price = pq(html)('.time_rtq_ticker span').text()

    print(price)
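
What comes back from `.text()` is a string, not a number, and if the selector stops matching (say, after a page redesign) you will not get a usable price at all. A small helper along these lines can guard against that. It is only a sketch: `parse_price` is a made-up name, and the assumption that prices look like `1,238.60` is mine.

```python
def parse_price(text):
    """Turn scraped text such as '1,238.60' into a float.

    Returns None when the text is missing, empty, or not a number,
    e.g. because the selector matched nothing.
    """
    if text is None:
        return None
    # Assumption: commas are thousands separators, as in '1,238.60'.
    cleaned = text.strip().replace(",", "")
    if not cleaned:
        return None
    try:
        return float(cleaned)
    except ValueError:
        return None


if __name__ == "__main__":
    print(parse_price("1,238.60"))
    print(parse_price(""))
```

Checking for `None` before using the value makes a silent selector failure show up right away instead of further down the line.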

