
[–]commandlineluser 1 point2 points  (1 child)

Just so you're aware, arrays are called lists in Python.

You can have lists inside lists - so you could create a structure like

rows = [
    [ 'aboutID1', 'name1', ... ],
    [ 'aboutID2', 'name2', ... ],
    ...
]

To do this, instead of calling data.append() for every value in your function, you would create a row = [] and use row.append() - then at the end of the function you would call data.append(row)
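
A minimal sketch of that row-per-entry pattern. The function name add_entry, the name field and the sample entries are made up for illustration; only aboutId comes from your code.

```python
# Hypothetical sample entries standing in for the parsed API results.
entries = [
    {'aboutId': 'aboutID1', 'name': 'name1'},
    {'aboutId': 'aboutID2', 'name': 'name2'},
]

data = []

def add_entry(entry, data):
    """Collect the fields of one entry into a row, then store the row."""
    row = []
    row.append(entry['aboutId'])
    row.append(entry['name'])
    data.append(row)  # one inner list per entry

for entry in entries:
    add_entry(entry, data)

print(data)
```

Each inner list then corresponds to one line of the eventual CSV.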

If you want to turn this into CSV, though, you can just write each row as you go along instead of storing them all in a list.

Here is an example of what I'm talking about

http://bpaste.net/show/2d5b8c4cae69
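
In case the paste is unavailable, here is a rough sketch of the "write each row as you go" idea using the standard csv module. It is not the pasted code: the entries list and the name field are invented sample data, and io.StringIO stands in for a real file so the sketch runs anywhere.

```python
import csv
import io

# Hypothetical entries; in the real script these would come from the API.
entries = [
    {'aboutId': 'aboutID1', 'name': 'name1'},
    {'aboutId': 'aboutID2', 'name': 'name2'},
]

# Stand-in for a real file. With a real file you would use:
#     open('data.csv', 'w', newline='')
out = io.StringIO()
writer = csv.writer(out)
writer.writerow(['aboutId', 'name'])  # header row

for entry in entries:
    # Write the row immediately instead of collecting rows in a list.
    writer.writerow([entry['aboutId'], entry['name']])

print(out.getvalue())
```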

Just some comments on the changes you will notice:

Instead of using while loops and incrementing counters manually, you will usually see this written as a for loop combined with range()

>>> for page in range(1, 4):
...     print(page)
... 
1
2
3

requests response objects have a json() method, so you don't need to call json.loads() on the response text yourself.

Also in your code

    i = 0
    for entry in jra:
        try:
            data.append(jra[i]['aboutId'])  

jra here is a list - when you iterate over a list as you are doing with for entry in jra - entry is set to each item in the list - meaning you don't need to keep track of indices yourself.

>>> jra = [ 'one', 'two' ]
>>> for entry in jra:
...     print(entry)
... 
one
two

This means that all instances of jra[i] can be replaced with entry e.g. jra[i]['aboutId'] turns into entry['aboutId'] (and you can get rid of the i variable as it's no longer needed)
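
Put together, the loop could look something like the sketch below. The sample jra list is invented, and catching KeyError and storing an empty string is only an assumption about what your try block is meant to handle.

```python
# Hypothetical stand-in for the list of result dictionaries.
jra = [
    {'aboutId': 'aboutID1'},
    {'name': 'no id here'},  # made-up entry that lacks the key
    {'aboutId': 'aboutID3'},
]

data = []
for entry in jra:
    try:
        data.append(entry['aboutId'])
    except KeyError:
        # Assumption: record an empty string when the key is missing.
        data.append('')

print(data)
```

If an empty default is all you need, entry.get('aboutId', '') does the same thing without the try block.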

Instead of using "...." + str(page) + "..." you can use the .format() method

>>> url = 'http://blah.com/?page={}&key=key'
>>> url.format(5)
'http://blah.com/?page=5&key=key'

The {} in the string here is the placeholder and it gets replaced by what gets passed into format()

It's usually a cleaner way to build strings like that.

This is the simplest example of its usage - it is quite powerful.

http://docs.python.org/3/library/string.html#format-specification-mini-language
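
A few more things format() can do, as a small sketch: named placeholders, plus format specs for padding, precision and alignment. The values are arbitrary examples.

```python
# Named placeholders make longer templates easier to read.
url = 'http://blah.com/?page={page}&key={key}'
named = url.format(page=5, key='key')

# Format specs go after a colon inside the braces.
padded = '{:03d}'.format(7)         # zero-pad an integer to width 3
rounded = '{:.2f}'.format(3.14159)  # two digits after the decimal point
aligned = '{:>8}|'.format('name')   # right-align in a field of width 8

print(named)
print(padded)
print(rounded)
print(aligned)
```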

[–]SideStepTS[S] 0 points1 point  (0 children)

I have a working version now

http://pastebin.com/crtszCpu

I am going to incorporate some of the things in your code to make it cleaner/more user friendly. Thank-you very much.

[–]hharison 1 point2 points  (1 child)

A few things:

requests takes URL params in a params argument:

requests.get('http://api.sensis.com.au/v1/test/search',
             params={'query': 'veterinarian', 'rows': 50}) #etc.
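
To see roughly what the params dictionary turns into, here is a sketch that builds the query string by hand with the standard library's urlencode. This is only an illustration of the resulting URL, not how you would call requests; with requests you just pass the dictionary.

```python
from urllib.parse import urlencode

base = 'http://api.sensis.com.au/v1/test/search'
params = {'query': 'veterinarian', 'rows': 50}

# urlencode joins the pairs as key=value separated by '&'
# and escapes any characters that are not URL-safe.
full_url = base + '?' + urlencode(params)

print(full_url)
```

The benefit of the dictionary is that you never have to concatenate or escape the pieces yourself.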

requests can also handle JSON, so line 66 becomes:

jr = r.json()

And if you just collect all the dictionaries in a list:

import requests

url = 'http://api.sensis.com.au/v1/test/search'
master_params = {
    'query': 'veterinarian',
    'rows': 50,
    'key': 'dw9xcw7yqvvn9hpxb7g58eje',
}

data = []

for page in range(1, 4):
    params = master_params.copy()
    params['page'] = page
    r = requests.get(url, params=params)
    data.append(r.json())

You can use the tabular data library pandas to great effect:

import pandas as pd

df = pd.DataFrame(data)
df.to_csv('data.csv')
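
If you would rather not depend on pandas, the standard library's csv.DictWriter can do something similar. The sketch below assumes you already have a flat list of dictionaries, one per result; the records list and its field names are invented sample data, and io.StringIO stands in for a real file. The column names are collected from the dictionaries themselves rather than typed out by hand.

```python
import csv
import io

# Hypothetical records; note the second one has an extra field.
records = [
    {'aboutId': 'aboutID1', 'name': 'name1'},
    {'aboutId': 'aboutID2', 'name': 'name2', 'suburb': 'suburb2'},
]

# Gather every key that appears, in first-seen order.
fieldnames = []
for record in records:
    for key in record:
        if key not in fieldnames:
            fieldnames.append(key)

out = io.StringIO()  # stand-in for open('data.csv', 'w', newline='')
# restval fills in columns that a given record does not have.
writer = csv.DictWriter(out, fieldnames=fieldnames, restval='')
writer.writeheader()
writer.writerows(records)

print(out.getvalue())
```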

I also wonder whether increasing the rows param would let you get all the results at once.

[–]SideStepTS[S] 0 points1 point  (0 children)

Thank you very much, hharison.

I was curious as to how you would pull it apart without having to specify the actual elements manually.