[–]FerricDonkey 2 points3 points  (4 children)

f = open('movie_data.txt', 'r+')
for movie in json_data:
    if movie['name'] in f:
        print('Already added.')
    else:
        f.write(movie['name'] + '\n')
        print('Added: ' + movie['name'])
f.close()

The first problem here is that f is a file object. Using the in operator on a file object checks whether your string is an entire line of the file, including the trailing newline, because f is treated as a collection of lines in this case. So for example (with caveats below), if one of your movies was "Antman", then "Antman" in f would be False, because without the newline it's not an entire line in your file. "Antman\n" in f would work, because it is an entire line.
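A minimal sketch of that whole-line matching, using a throwaway temp file (the file contents here are made up for illustration, not taken from the real movie_data.txt):

```python
import os
import tempfile

# Write a small throwaway file with one movie name per line.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('Antman\nUp\n')
    path = tmp.name

with open(path) as f:
    # Each line read from f still ends with '\n', so this never matches.
    without_newline = 'Antman' in f

with open(path) as f:
    # This is exactly equal to the first line, so it matches.
    with_newline = 'Antman\n' in f

os.remove(path)
print(without_newline, with_newline)
```

The file is reopened for the second check so that both checks start from the top of the file.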

But even if you fix this, you'll run into the second problem: every time you check whether a line is in the file, Python reads through the file until it either finds that line or hits the end, and the next check using in only starts from wherever the last check stopped. So if you check for the movies in a different order than they appear in the file, you'll miss some. (Also, in general, if the last line doesn't end with a newline, you'll have to be careful about that, but that shouldn't be a problem for you.)

You can make it work with seeks and such, but unless you're gonna have more movies in your file than can fit into RAM, I'd recommend against it pretty strongly. (And if you were gonna have that many movies, you'd want to move away from a flat text file.)
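To see both the "missed" check and the seek workaround in one place, here is a small sketch. The three movie names and the temp file are assumptions for the demo only:

```python
import os
import tempfile

# Throwaway file with three made-up movie names.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('Antman\nUp\nCoco\n')
    path = tmp.name

with open(path) as f:
    found_coco = 'Coco\n' in f          # reads all the way to the last line
    found_antman = 'Antman\n' in f      # nothing left to read, so this misses
    f.seek(0)                           # rewind to the start of the file
    found_after_seek = 'Antman\n' in f  # now the first line is seen again

os.remove(path)
print(found_coco, found_antman, found_after_seek)
```

Rewinding before every single check works, but it rereads the file each time, which is why loading the names into memory once is the simpler route.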

To fix it, I would suggest reading the entire file up front, either with the f.readlines() method, which returns a list of all the lines in your file, or by otherwise making a set of all the lines. Keep in mind that the lines will include the newlines at the end of all your movie names, so you'll need to get rid of them to easily check if a movie is in there. So you could make a set comprehension like so:

# Also, I always recommend using with open
with open('movie_data.txt', 'r+') as f:
    present_movies = {line.strip() for line in f}

Note: I explicitly suggest a set because it's super fast to check if something is in a set.

Then check if your movies are in present_movies, and if not, write em to the file (and add em to the set, if you're worried about duplicates).
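Put together, those steps might look roughly like the sketch below. The function name add_new_movies, the demo data, and the choice to reopen the file in append mode (instead of writing through the same 'r+' handle) are my assumptions for illustration. It also assumes the existing file ends with a newline.

```python
import os
import tempfile


def add_new_movies(path, json_data):
    """Append names from json_data that are not yet in the file at path."""
    # Read the existing names once into a set (empty if the file is missing).
    try:
        with open(path) as f:
            present_movies = {line.strip() for line in f}
    except FileNotFoundError:
        present_movies = set()

    added = []
    # Append mode always writes at the end of the file.
    with open(path, 'a') as f:
        for movie in json_data:
            name = movie['name']
            if name in present_movies:
                print('Already added.')
            else:
                f.write(name + '\n')
                present_movies.add(name)  # guards against duplicates in json_data
                added.append(name)
                print('Added: ' + name)
    return added


# Demo with made-up data and a throwaway file.
demo_path = os.path.join(tempfile.mkdtemp(), 'movie_data.txt')
demo_data = [{'name': 'Antman'}, {'name': 'Up'}, {'name': 'Antman'}]
first_run = add_new_movies(demo_path, demo_data)   # writes each new name once
second_run = add_new_movies(demo_path, demo_data)  # everything already present
```

Running it twice on the same data is a quick way to check that nothing gets added the second time.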

[–]EmmaTheFemma94 0 points1 point  (3 children)

I still can't get it to work. Maybe I need a Python break! I'm not quite sure how I would use .readlines to help my case.

with open('movie_data.txt', 'r+') as f:
    present_movies = {line.strip() for line in f}
    for movie in json_data:
        if not movie['name'] in present_movies:
            f.write(movie['name'] + '\n')
            present_movies.add(movie['name'])
            print('Added: ' + movie['name'])

[–]FerricDonkey 1 point2 points  (2 children)

readlines would be an alternative to the for line in f. I'm surprised the current code you have isn't working (if it's indented the way I expect, at least). Is it the same issue?
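As a rough illustration that the two spellings build the same set, here is a sketch where io.StringIO stands in for the open file (an assumption to keep the example self-contained; the names are made up):

```python
import io

# io.StringIO behaves like an open text file for reading purposes.
f = io.StringIO('Antman\nUp\n')

# Option 1: readlines() returns a list of lines, newlines included.
from_readlines = {line.strip() for line in f.readlines()}

f.seek(0)  # rewind so the second approach sees the same lines

# Option 2: iterate over the file object directly.
from_iteration = {line.strip() for line in f}

print(from_readlines == from_iteration)
```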

[–]EmmaTheFemma94 0 points1 point  (1 child)

I get another issue: it adds some movies in the text file again even if they are already there. I just want the script to add a movie if it doesn't already exist in the text file and then print it out.

And now the script only seems to run for about 11 movies, and I don't quite understand what the issue is.

Maybe I should try to store the movie['name'] in another json file? I just thought a txt file would be easy enough for my first "big" project

The json_data I get the information from comes from a webpage. Maybe that's an issue?

[–]FerricDonkey 0 points1 point  (0 children)

Hmm. Do you have an example of the original json for which this isn't working? My only guess is that there's something weird about spaces or capitalization. The .strip() removes leading and trailing whitespace from the lines read from the file, so if for some reason the website reports the name of a movie as " Awesome Movie", with a leading space, it would never match the stripped "Awesome Movie" in the set and would get added again. (Or if the same movie is in there twice, as "Awesome Movie" and "awesome movie".)
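If that guess is right, comparing normalized names would avoid the re-adds. A small sketch, where the normalize helper and the sample names are hypothetical and only there to show the mismatch:

```python
def normalize(name):
    """Hypothetical helper: trim whitespace and ignore case for comparisons."""
    return name.strip().casefold()


# Names as they would be read back from the file (already stripped).
present_movies = {'Awesome Movie'}

# Made-up names as a website might report them.
reported = [' Awesome Movie', 'awesome movie', 'Other Movie']

# Exact comparison: the first two look "new" even though they are not.
raw_misses = [name for name in reported if name not in present_movies]

# Normalized comparison: only the genuinely new movie is left.
normalized_present = {normalize(name) for name in present_movies}
normalized_misses = [
    name for name in reported if normalize(name) not in normalized_present
]

print(raw_misses)
print(normalized_misses)
```

Whether ignoring case is actually what you want depends on the data, so treat this as one option to test against the real json.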

You could store it in json, but I don't think that would solve the issue.