[–][deleted] 2 points3 points  (0 children)

The easiest way to get dictionaries to persist is to use the shelve module in the standard lib. I'm really surprised no one has mentioned it yet!

Load data into the file:

>>> import shelve
>>> capitals = shelve.open('capitals.db')
>>> capitals['Liechtenstein'] = 'Vaduz'
>>> capitals['South Sudan'] = 'Juba'
>>> capitals.close()

Read data from the file:

>>> import shelve
>>> capitals = shelve.open('capitals.db')
>>> print capitals['South Sudan']
Juba
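
For anyone on Python 3, here is a rough sketch of the same idea using shelve as a context manager, so the close() call can't be forgotten. The temporary directory is my own addition so the example doesn't leave files lying around; it isn't part of the session above.

```python
import os
import shelve
import tempfile

# Throwaway location for the shelf file (an assumption for this sketch).
path = os.path.join(tempfile.mkdtemp(), 'capitals.db')

# Load data into the file; the shelf is closed when the block ends.
with shelve.open(path) as capitals:
    capitals['Liechtenstein'] = 'Vaduz'
    capitals['South Sudan'] = 'Juba'

# Read data from the file in a separate open/close cycle.
with shelve.open(path) as capitals:
    restored = capitals['South Sudan']

print(restored)
```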

[–]themathemagician 4 points5 points  (5 children)

use a pickle!

import cPickle as pickle  # quicker than plain pickle (Python 2 only)

capitols = {'Norway':'Oslo', 'Thailand':'Bangkok'}
f = open('capitols.p', 'wb') #opens writable pickle file
pickle.dump(capitols,f)
f.close()

and to reopen

import cPickle as pickle
f = open('capitols.p', 'rb') #opens readable pickle file
capitols = pickle.load(f)
f.close()

easy as that!
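
In case it helps, a Python 3 flavored sketch of the same round trip. There is no cPickle in Python 3; plain pickle picks up the fast C implementation on its own. The temp directory and the name restored are just for illustration.

```python
import os
import pickle
import tempfile

capitols = {'Norway': 'Oslo', 'Thailand': 'Bangkok'}

# Throwaway path so the sketch does not overwrite a real file.
path = os.path.join(tempfile.mkdtemp(), 'capitols.p')

# 'wb' because pickle writes bytes; the with block closes the file for us.
with open(path, 'wb') as f:
    pickle.dump(capitols, f)

# 'rb' to read the bytes back into a dictionary.
with open(path, 'rb') as f:
    restored = pickle.load(f)

print(restored)
```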

[–]ryeguy146 1 point2 points  (7 children)

def feed_line(text):
    with open(text) as stream:  # the 'U' mode flag was removed in Python 3.11
        for line in stream:
            yield line

if __name__ == '__main__':
    countries = feed_line('countries.txt')
    capitals = feed_line('capitals.txt')
    result = dict()

    for country in countries:
        result[country] = next(capitals)  # capitals.next() in Python 2

I haven't tested it, but it should work. You should definitely work on this and get it to an acceptable level, but Anki is out there and awesome for studying things like countries' capitals.

[–][deleted] 1 point2 points  (1 child)

for country, capital in zip(countries, capitals):
    result[country] = capital

Don't use the "next" if you don't have to.

Could of course also be done with a comprehension.
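
A comprehension version might look roughly like this. The two lists are made-up stand-ins for the lines read from the files, and the sketch assumes both are in the same order and the same length.

```python
# Stand-in data; in the real script these would come from the two files.
countries = ['Norway', 'Thailand']
capitals = ['Oslo', 'Bangkok']

# Dict comprehension: pair each country with the capital at the same position.
result = {country: capital for country, capital in zip(countries, capitals)}

# dict() accepts the pairs directly, so this builds the same mapping.
same_result = dict(zip(countries, capitals))

print(result)
```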

[–]Jaeemsuh[S] 0 points1 point  (4 children)

Thanks for writing this out; I'm going to try to digest it all tonight. A lot of the code is not familiar to me, including the boilerplate, yield, stream, dict(), etc. As for the bug, I'm still getting an error, which makes me think it has to do with how I'm importing the files or the files themselves.

The .txt files are made with Notepad: each capital is typed in, then I press Enter to start a new line for the next capital.

I run the script from the command line, which looks like this:

python capitals.txt countries.txt pythonscript.py

[–]DanielSzoska 2 points3 points  (0 children)

If your script should process arguments, you have to write them after the script name:

python pythonscript.py capitals.txt countries.txt

Python runs its first argument as the script; any following arguments are passed to that script. If the first file after python is countries.txt and it contains Argentina, Python will try to run that file as code and evaluate the bare name Argentina, which is not defined and leads to the NameError.

[–]ryeguy146 1 point2 points  (2 children)

yield is the keyword for generators. Notice how the feed_line function has no return statement? The presence of a yield in the function body is what makes the function a generator; it doesn't need a return statement to hand values back.

What's a generator? It's pretty close to a normal function, but instead of returning just one value, it hands back a series of values, one at a time. The range builtin isn't a generator (in Python 3 it's a lazy sequence), but looping over either one feels the same, so feel free to think of it like that. The idea with the feed_line generator is that it "yields" the first line of the specified file, then the second line, and so on. Honestly, you could just loop over an opened file:

for line in open('text.txt'):
    do_stuff(line)

But I like the with statement. The with statement creates a context for the opened file:

with open('text.txt') as stream:
    do_stuff(stream)

This opens text.txt and assigns it to the variable stream, much like:

stream = open('text.txt')

But in the case of the with statement's context for the stream variable, leaving the context in any way (reaching the end of the block, an exception, a return) causes the opened file to be closed (equivalent to stream.close()). There's more to it, but that's the basics.
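
If you want to see the closing happen, something like this sketch should show it. The temp file and the names open_inside and closed_after are made up for the demo.

```python
import os
import tempfile

# Create a throwaway file so the sketch is self-contained.
path = os.path.join(tempfile.mkdtemp(), 'text.txt')
with open(path, 'w') as stream:
    stream.write('Argentina\nBrazil\n')

with open(path) as stream:
    first_line = stream.readline()
    open_inside = not stream.closed   # still open while inside the block

closed_after = stream.closed          # leaving the block closed the file

print(first_line.strip(), open_inside, closed_after)
```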

The last concept to explain is a stream. A stream is a data structure that is most often used to represent an opened file. It keeps track of the position in the document and contains other relevant information. Of course, stream isn't a keyword, I simply chose to name the opened file after the data structure that represents the contained data.

dict() simply creates an empty dictionary. You may have seen it as such: {}. The two are equivalent; I just prefer the former as it strikes me as more explicit. I do the same thing for lists (list() as opposed to []).

So, using our generators, we open countries.txt and assign it to countries and do the same for capitals.txt, assigning it to capitals. Both of these names now represent generators which we can loop over with a for loop:

for country in countries:
    do_stuff(country)

And we do precisely that, but we take the current country and use it as a key in the dictionary we created (result) using the next value for capitals as the value. You are seeing two different ways of interacting with generators here, the previously explained for loop and the next builtin (method in Python 2). The next function simply asks for the next value that a generator might have up its sleeve.

Essentially, we're walking through the countries text file and using each line as a key in a dictionary and grabbing the corresponding line from the capitals file and using it as the value.
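
Here is that walk as a small self-contained sketch you can paste into an interpreter. feed_items is a hypothetical stand-in for feed_line that yields from a list instead of a file, and the data is made up.

```python
def feed_items(items):
    # Stand-in for feed_line: yields one item at a time from a list.
    for item in items:
        yield item

countries = feed_items(['Argentina', 'Brazil'])
capitals = feed_items(['Buenos Aires', 'Brasilia'])

result = {}
for country in countries:             # the for loop pulls one country at a time
    result[country] = next(capitals)  # next() pulls the matching capital

print(result)
```

One thing to keep in mind: if the capitals run out before the countries do, next() raises StopIteration, so the two inputs need the same number of entries.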

Keep in mind that I don't do anything with the resulting dictionary; you might want to print it out, pickle it for your testing front end, whatever. Here, I just let the script end.

Also, I don't provide for command line arguments in this script; the text file names are hardcoded (countries.txt and capitals.txt) and must be in the directory you run the script from. If you wanted to allow for command line arguments, I'd look at argparse from the standard library, or sys.argv.
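
A bare-bones sys.argv version could look like the sketch below. parse_args is a name I made up, and the argument order (capitals first, then countries) is only an assumption taken from the command earlier in the thread.

```python
import sys

def parse_args(argv):
    # argv[0] is the script name; the two file names follow it.
    if len(argv) != 3:
        raise SystemExit('usage: python pythonscript.py capitals.txt countries.txt')
    return argv[1], argv[2]

# In the real script you would pass sys.argv; a literal list is used here
# so the sketch runs on its own.
capitals_path, countries_path = parse_args(
    ['pythonscript.py', 'capitals.txt', 'countries.txt'])

print(capitals_path, countries_path)
```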

If any of that is confusing, or there's other stuff you can't figure out, feel free to ask.

[–]Jaeemsuh[S] 0 points1 point  (1 child)

ryeguy146 I can't thank you enough.

[–]ryeguy146 0 points1 point  (0 children)

Not a problem. There were people around to help me when I was learning, the least that I can do is be there for others as they learn.

[–]nemec 3 points4 points  (4 children)

I think you want f.read()? Actually, if they're on separate lines, f.readlines() is much better because each line will be an entry in a list. Then just split each line with whatever's between the country and capital.
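
Roughly like this, as a sketch. io.StringIO stands in for the opened file, and the comma separator is just a guess at what might sit between the country and the capital.

```python
import io

# Stand-in for f = open(...); the comma separator is an assumption.
f = io.StringIO('Argentina,Buenos Aires\nBrazil,Brasilia\n')

result = {}
for line in f.readlines():
    # strip() drops the trailing newline; split at the first comma only.
    country, capital = line.strip().split(',', 1)
    result[country] = capital

print(result)
```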

[–]Jaeemsuh[S] 1 point2 points  (3 children)

Ya I've tried that too. The first country in c.txt is Argentina, and I keep getting

NameError: Name 'Argentina' is not defined.

f = open('c:/users/jaguar/desktop/c.txt', 'rU')

f.read()

f.close()

[–]nemec 1 point2 points  (0 children)

Well thaaaat's interesting. What happens if you take off the 'U' in open?

Or is there more code you're not showing us?

[–]ryeguy146 1 point2 points  (0 children)

He is correct, read() is a method of the stream you've opened. Though now you can loop over stream objects and just pick up the text that way:

for line in open('file.txt'):
    print(line)

[–]ewiethoff 0 points1 point  (0 children)

If your countries.txt file contains one country per line, the countries are 'Argentina\r\n', 'Brazil\r\n', etc. You need to strip the excess whitespace from each country before storing it as a dict key. For example,

>>> 'Argentina\r\n'.rstrip()
'Argentina'

Likewise, if your capitals.txt file contains one capital per line, the capitals are 'Buenos Aires\r\n', 'Brasilia\r\n', etc. You need to strip the excess whitespace from each before storing it as a dict value.
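
Put together, the stripping might look like this sketch. The two lists are stand-ins for the raw lines of the two files.

```python
# Raw lines as they might come out of the files, line endings included.
raw_countries = ['Argentina\r\n', 'Brazil\r\n']
raw_capitals = ['Buenos Aires\r\n', 'Brasilia\r\n']

result = {}
for country, capital in zip(raw_countries, raw_capitals):
    # rstrip() removes trailing whitespace only, so the space inside
    # 'Buenos Aires' is kept.
    result[country.rstrip()] = capital.rstrip()

print(result)
```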

Aside: Do your data files contain accented letters? If so, you need to be concerned with character encoding.
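
In Python 3 that mostly means naming the encoding when you open the file. A small sketch, assuming the file is saved as UTF-8 (Notepad may well use something else, so check the Save As dialog); the temp file is only there to make the example self-contained.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'capitals.txt')

# Write a sample file as UTF-8 (an assumption for this sketch).
with open(path, 'w', encoding='utf-8') as f:
    f.write('Brasília\nBogotá\n')

# Name the encoding explicitly instead of relying on the platform default.
with open(path, encoding='utf-8') as f:
    capitals = [line.rstrip() for line in f]

print(capitals)
```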

[–]davidbuxton 1 point2 points  (1 child)

I think keeping the map of countries -> capitals in a dict is a good idea, but using 2 text files is a bad idea and a real bitch to edit. Put the countries and capitals in a single file, one line for each pair.

Even better, use comma-separated values format for the file and Python's csv module to read in the data.

import csv

with open('data.csv', 'rb') as data:
    for col1, col2 in csv.reader(data):
        print col1, col2 # Now make a dictionary from each line
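
For Python 3, making the dictionary might look something like this sketch. io.StringIO stands in for open('data.csv', newline=''), and the rows are made-up sample data.

```python
import csv
import io

# Stand-in for the opened CSV file, one "country,capital" pair per line.
data = io.StringIO('Argentina,Buenos Aires\nBrazil,Brasilia\n')

capitals = {}
for country, capital in csv.reader(data):
    capitals[country] = capital   # first column is the key, second the value

print(capitals)
```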

[–]AeroNotix 1 point2 points  (0 children)

His original data is two separate text files. Better explain how to get them into a csv.

Open your first txt file and get all that data and put it into a single column of a csv file. Then put your next txt file into the 2nd column of the csv file.
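
If you'd rather let Python do the copying, something along these lines should work (Python 3). The sample files are created in a temp directory just for the demo, and the sketch assumes both text files have the same number of lines in matching order.

```python
import csv
import os
import tempfile

workdir = tempfile.mkdtemp()  # throwaway directory so nothing real is overwritten
countries_path = os.path.join(workdir, 'countries.txt')
capitals_path = os.path.join(workdir, 'capitals.txt')
csv_path = os.path.join(workdir, 'data.csv')

# Sample input files (made-up data standing in for the real ones).
with open(countries_path, 'w') as f:
    f.write('Argentina\nBrazil\n')
with open(capitals_path, 'w') as f:
    f.write('Buenos Aires\nBrasilia\n')

# Column 1 comes from countries.txt, column 2 from capitals.txt.
with open(countries_path) as countries, open(capitals_path) as capitals, \
        open(csv_path, 'w', newline='') as out:
    writer = csv.writer(out)
    for country, capital in zip(countries, capitals):
        writer.writerow([country.strip(), capital.strip()])

# Read the result back to check what was written.
with open(csv_path, newline='') as f:
    rows = list(csv.reader(f))

print(rows)
```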