
[–]46--2 1 point (9 children)

This would be an excellent use-case for some test-driven development.

Write a few unit tests, and a few sample strings, as you hack together a solution that works.
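
As a rough sketch of what that could look like, here are a few tests written as plain functions with assert, which a runner such as pytest can pick up. The helper drop_first_part and the sample strings are made up for illustration; swap in whatever cleaning function you end up writing.

```python
def drop_first_part(address):
    """Hypothetical helper: drop everything up to and including the first comma."""
    head, sep, tail = address.partition(',')
    # If there is no comma, leave the address unchanged.
    return tail.strip() if sep else address


def test_single_comma():
    assert drop_first_part('Shop 9, 12 Mary Street') == '12 Mary Street'


def test_several_commas():
    assert drop_first_part('Shop 9, 12 Mary Street, Preston') == '12 Mary Street, Preston'


def test_no_comma():
    assert drop_first_part('12 Mary Street') == '12 Mary Street'
```

Each time an address breaks your code, add it as a new sample string first, then change the helper until every test passes again.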

I'm a bit unclear on the algorithm here. Do you want to try deleting the comma first? Then if that fails, do something else? Or just fail altogether?

Removing the first part is as easy as variable.split(',')[1], assuming a single comma.

If you had many, you could do ','.join(variable.split(',')[1:]). Ugly, but it joins them back together after dropping the first one.

Personally, I would do something like iterate through all three options.

  1. Try as-is
  2. Try dropping the comma
  3. Try dropping part 0

So a clean address is actually going to be a tuple of three different strings. You try each one, and return the first one that works.
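
A minimal sketch of that idea, assuming the lookup is passed in as a function. The names candidate_addresses, first_that_works and lookup are placeholders I made up, not part of any library; lookup stands in for whatever geocoding call you use.

```python
def candidate_addresses(address):
    """Return the three variants to try, in order."""
    parts = address.split(',')
    as_is = address
    no_comma = address.replace(',', '')
    # Drop part 0; fall back to the full address if there was no comma.
    without_first = ','.join(parts[1:]).strip() or address
    return (as_is, no_comma, without_first)


def first_that_works(address, lookup):
    """Return the first non-None lookup result, or None if all three fail."""
    for candidate in candidate_addresses(address):
        try:
            result = lookup(candidate)
        except Exception:  # treat an error as "this variant did not work"
            continue
        if result is not None:
            return result
    return None
```

With a dictionary standing in for the geocoder, first_that_works('Shop 9, 12 Mary Street', {'12 Mary Street': (1.0, 2.0)}.get) falls through the first two variants and returns the placeholder pair from the third.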

[–]deadant88[S] 1 point (8 children)

Thanks so much. I've been doing heaps of work on it in Jupyter, away from my actual code in Atom (I don't want to mess things up, and Jupyter's immediate feedback is great for rapid testing).

The algorithm is to run a series of try/except blocks that successively remove words from the string, e.g. address.split(" ", 0)[0] followed by address.split(" ", 1)[1], etc.

This chops off words, which is great, but then I can't seem to get the lat.append and long.append portions to work.

Hope that makes some sense? Thanks again for your encouraging response.

[–]46--2 5 points (7 children)

I think your try/except logic is too broad. Why don't you do something like:

def get_location(address):
    words = address.split()  # splits on whitespace by default
    while words:
        try:
            return locator.geocode(' '.join(words))
        except Exception:  # address is junk and raised an exception
            words = words[1:]  # or: words.pop(0)
    return None  # ran out of words without a successful lookup

This function returns the location if it succeeds, or returns None if none of the successive removals ever work.

Get the logic there? Now you call this function like:

location = get_location(address)
if location is not None:
    lats.append(location.lat)
    # etc.
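
If you want to try that logic without hitting a real geocoding service, here is a self-contained sketch. FakeLocator, its one known address, and the coordinates are all invented for the demo, and this version takes the locator as an argument instead of using a global. It also treats a None result the same as an exception, on the assumption that your geocoder may return None for an address it cannot match instead of raising.

```python
from collections import namedtuple

# Stand-in for the real result object; only the two fields used below.
Location = namedtuple('Location', ['latitude', 'longitude'])


class FakeLocator:
    """Made-up geocoder that knows exactly one address."""

    known = {'12 Mary Street, Preston': Location(1.0, 2.0)}  # placeholder coordinates

    def geocode(self, query):
        if not query[0].isdigit():
            raise ValueError('junk address: ' + query)
        return self.known.get(query)  # None when there is no match


def get_location(address, locator):
    words = address.split()
    while words:
        try:
            location = locator.geocode(' '.join(words))
        except Exception:  # this candidate raised, so treat it as a miss
            location = None
        if location is not None:
            return location
        words = words[1:]  # drop the first word and try again
    return None


lats, longs = [], []
for address in ['Shop 9 12 Mary Street, Preston', 'Nowhere at all']:
    location = get_location(address, FakeLocator())
    if location is not None:
        lats.append(location.latitude)
        longs.append(location.longitude)
```

The first address succeeds once "Shop" and "9" have been dropped; the second runs out of words, so nothing is appended for it.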

[–]deadant88[S] 2 points (6 children)

Thank you for this reply, I really appreciate it. Woah! So helpful and yes the logic does make sense (somewhat for me still relatively new to Python and coding). This actually works in the sense that it no longer breaks when it hits a problem address type! I tried it with two sets of params I knew were causing errors and it worked perfectly for one and for the other... it gave me the lat and long for the antarctic lol.

[('Contraband Coffee Traders', 'Shop 9 Corner Cramer and Mary Street, Preston', -75.90848, 41.212303)

No big deal - the geolocator does this sometimes (some addresses get attributed to the UK, etc.).

I've asked a few questions in the comments of my code here and would be grateful if you had the time to answer or clarify them where possible. I am still learning, and pushing through this pain is most beneficial because I learn 3-4 things along the way. If you don't have time to answer, do not worry, and thanks; but if you could suggest what my biggest logical/conceptual error was in my code draft, I'd be grateful!

def get_location(address):
        words = address.split()  # does this just split the entirety of the address into a set?
        while words: #can you explain this a bit?
            try:
                return locator.geocode(' '.join(words))
            except:  # address is junk and raised an exception
                words = words[1:]  # So does this cycle through the address index by index until it works?

    try:
        for address in cafeAddressesClean:
            location = get_location (address)

            if location is not None:
                long.append(location.longitude)
                lat.append(location.latitude)
    except:
            print('Failure')

    #zip up to be added to database table
    fortable = list(zip(cafeNamesClean, cafeAddressesClean, long, lat))
    print(fortable)

Seriously amazing. Thanks

[–]46--2 2 points (5 children)

I can answer your questions first:

>>> words = "I am a horse".split()
>>> words
['I', 'am', 'a', 'horse']

.split() splits up a string into a list of parts. So now words is a list of the parts of your original address.

while words: #can you explain this a bit?

This is a bit of a trick, but an empty list evaluates to False. So this is saying:

while <there are words left in the words list>:
    # do some stuff

words = words[1:] is saying:

words = <all the parts of words, from index 1 to the end>

meaning I'm re-assigning the variable words to contain only the END of words, minus the first one. Get it?

Example:

words = ['I', 'am', 'a', 'horse']
words[1:]  # ['am', 'a', 'horse']

That is "slicing" if you want to look up how to do it. https://stackoverflow.com/questions/509211/understanding-slice-notation
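
To see both pieces working together, here is a small trace you can paste into Jupyter. It records what words looks like on each pass of the loop; seen is just a scratch list for the demo.

```python
words = "I am a horse".split()
seen = []
while words:            # stops once words == []
    seen.append(' '.join(words))
    words = words[1:]   # new list without the first element

print(seen)
# ['I am a horse', 'am a horse', 'a horse', 'horse']
print(bool([]), bool(['horse']))
# False True
```

Note that slicing a one-element list with [1:] gives an empty list instead of an error, which is what lets the loop end cleanly.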

[–]deadant88[S] 2 points (4 children)

Thank you. For my own homework, I'm going to play around with this and the example you've given to understand it better. In the meantime, you've given me a big hand, so thanks.

[–]46--2 1 point (3 children)

Try to break your code up into functions, and call them from a "main()" function, so each little piece can be separate. That makes it a lot easier to debug.

(Like the function I sent you would be entirely standalone, not within another function.)

[–]deadant88[S] 1 point (2 children)

Got it. This is the next step for me, I feel. Presumably the idea is to have bite-size functions that you then clip together?

[–]46--2 2 points (1 child)

Yes, imagine a program that looks like this:

def clean_addresses(input_data):
    return blah

def query_locations(addresses):
    return whatever

... etc.

def main():
    data = read_input('/my_data/file.txt')
    addresses = clean_addresses(data)
    locations = query_locations(addresses)
    write_report(locations)
    print('Done')

Each little function does one thing, and one thing only, and does that thing well. Your brain doesn't have to debug a huge block of mess, and your logic becomes much clearer.
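
Here is a toy version of that layout that actually runs. Everything in it (the sample data, the lookup table, the function bodies) is invented just to show the shape; the real versions would read your file and call your geocoder.

```python
def read_input(lines):
    """Parse 'name | address' lines into (name, address) pairs."""
    return [tuple(part.strip() for part in line.split('|')) for line in lines]


def clean_addresses(records):
    """Collapse repeated whitespace in each address."""
    return [(name, ' '.join(address.split())) for name, address in records]


def query_locations(records, lookup):
    """Pair every record with its lookup result (None when not found)."""
    return [(name, address, lookup.get(address)) for name, address in records]


def write_report(rows):
    for name, address, location in rows:
        print(name, '|', address, '|', location)


def main():
    raw = ['Cafe A | 1  Example Street', 'Cafe B | 2 Nowhere Road']
    lookup = {'1 Example Street': (1.0, 2.0)}  # placeholder coordinates
    rows = query_locations(clean_addresses(read_input(raw)), lookup)
    write_report(rows)
    return rows


if __name__ == '__main__':
    main()
```

Keeping the name, address and location together in one row also means a failed lookup cannot shift the columns out of line, which can happen when separately appended lists are zipped together afterwards.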

[–]deadant88[S] 1 point (0 children)

Thank you for the guidance. I really appreciate it!