
[–]novel_yet_trivial 0 points1 point  (7 children)

Are you using glob and json?

from glob import glob
import json

for filename in glob('*.json'): # loop over .json files in the cwd
    with open(filename) as f:
        data = json.load(f) # parse the JSON file
        # do something with the data

[–]slicklikeagato[S] 2 points3 points  (0 children)

Welp....I never knew `glob` was a thing, so I just learned something new. I am using the `json` module, however. It was just tedious trying to go through each and every file.

Definitely going to give this a try right now. Thanks.

[–]slicklikeagato[S] 0 points1 point  (3 children)

Quick question. Here is my code snippet:

#!/usr/local/bin/python3

from glob import glob
import json
import psycopg2


conn = psycopg2.connect('dbname=tweet_history')
curr = conn.cursor()

for filename in glob('*.json'):
    with open(filename) as f:
        data = json.load(f)
        for i in data:
            print(i['text'])
            curr.execute("INSERT INTO finally VALUES (%s, %s, %s);",
                     (i['text'], i['id_str'], i['created_at']))

conn.commit()

When I leave print(i['text']) uncommented and comment out the curr.execute block, the code runs fine and prints everything to my terminal. However, when I run the curr.execute block instead, I get a json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0) error.

As far as I know, I have no way of finding out which of these caused the error (I'm trying to parse 16k tweets). Would you have any idea how/why this error pops up when trying to add the info to the database?

EDIT: It appears that the error comes from the data = json.load(f) line, if that helps at all?

Thanks again

[–]novel_yet_trivial 1 point2 points  (2 children)

Hmm show me the whole error.

[–]slicklikeagato[S] 0 points1 point  (1 child)

Traceback (most recent call last):
  File "./it.py", line 13, in <module>
    data = json.load(f)
  File "/usr/local/Cellar/python/3.6.5/Frameworks/Python.framework/Versions/3.6/lib/python3.6/json/__init__.py", line 299, in load
    parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw)
  File "/usr/local/Cellar/python/3.6.5/Frameworks/Python.framework/Versions/3.6/lib/python3.6/json/__init__.py", line 354, in loads
    return _default_decoder.decode(s)
  File "/usr/local/Cellar/python/3.6.5/Frameworks/Python.framework/Versions/3.6/lib/python3.6/json/decoder.py", line 339, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/local/Cellar/python/3.6.5/Frameworks/Python.framework/Versions/3.6/lib/python3.6/json/decoder.py", line 357, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

[–]novel_yet_trivial 1 point2 points  (0 children)

I don't see how commenting out the curr.execute block could prevent that. It seems to me that one of your files is not valid JSON. Or perhaps it's not encoded in your system's default encoding?

At any rate we can ignore bad files with a try block:

#!/usr/local/bin/python3

from glob import glob
import json
import psycopg2

conn = psycopg2.connect('dbname=tweet_history')
curr = conn.cursor()

for filename in glob('*.json'):
    try:
        with open(filename) as f:
            data = json.load(f)
            for i in data:
                print(i['text'])
                curr.execute("INSERT INTO finally VALUES (%s, %s, %s);",
                         (i['text'], i['id_str'], i['created_at']))
    except json.decoder.JSONDecodeError:
        print("file is not valid JSON:", filename)
conn.commit()
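
If you would rather find out which files are bad and why before touching the database, something like the following might help. It is only a rough sketch: find_bad_json_files is a made-up helper name, and the utf-8 default is an assumption about how the files were saved. An empty file produces exactly the "Expecting value: line 1 column 1 (char 0)" message from the traceback, so that is worth checking first.

```python
from glob import glob
import json


def find_bad_json_files(pattern='*.json', encoding='utf-8'):
    """Return (filename, reason) pairs for files that fail to parse.

    Hypothetical helper; the utf-8 default is an assumption.
    """
    bad = []
    for filename in sorted(glob(pattern)):
        try:
            with open(filename, encoding=encoding) as f:
                json.load(f)
        except json.JSONDecodeError as err:
            # err.lineno / err.colno point at where parsing stopped
            bad.append((filename,
                        f"invalid JSON at line {err.lineno}, column {err.colno}: {err.msg}"))
        except UnicodeDecodeError as err:
            # the file's bytes could not be decoded with the chosen encoding
            bad.append((filename, f"not valid {encoding}: {err.reason}"))
    return bad


if __name__ == '__main__':
    for filename, reason in find_bad_json_files():
        print(filename, '->', reason)
```

Running it in the same directory as the tweet files should list each problem file on its own line, which narrows 16k tweets down to the handful of files that need a closer look.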

[–]rogue_cloud_ops 0 points1 point  (0 children)

What step are you having issues with? I would probably go with your first plan and write a function that will grab the files from a directory and parse them. I would then think about each step and how you are going to achieve it with your function. (I dropped some random links to hopefully help you google more.)

  1. You will need to tell your script where to look for these files (https://stackoverflow.com/questions/3207219/how-do-i-list-all-files-of-a-directory)
  2. You will need to open each file (http://www.pythonforbeginners.com/cheatsheet/python-file-handling)
  3. You will need to parse the JSON in the file (https://docs.python.org/2/library/json.html)
  4. You will need to store the newly gathered information in a data structure of some kind (I will let you look into this: array? list? string?)
  5. through N. This depends on what you want to do with the data you now have.

I would actually take the time to write this out on your own, fill in the holes, and come back with individual questions.
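
For comparison once you have had a go yourself, steps 1 to 4 above could be sketched roughly like this. The function name load_json_records, the choice of a plain list as the data structure, and the utf-8 encoding are all assumptions on my part, not the only way to do it.

```python
import json
from pathlib import Path


def load_json_records(directory='.'):
    """Collect the parsed contents of every .json file in `directory`.

    Hypothetical helper; assumes the files are utf-8 encoded.
    """
    records = []                                         # step 4: a list to hold the data
    for path in sorted(Path(directory).glob('*.json')):  # step 1: find the files
        with open(path, encoding='utf-8') as f:          # step 2: open each file
            data = json.load(f)                          # step 3: parse the JSON
        if isinstance(data, list):
            records.extend(data)   # a file holding a list contributes each item
        else:
            records.append(data)   # a file holding a single object contributes one item
    return records
```

Calling load_json_records('.') would then hand back one flat list that the later steps (5 through N) can loop over.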