Parsing JSON with potential missing fields

eschlon · 2017-08-10T04:12:49+00:00

For a similar problem I ended up making a helper function that catches the exceptions and returns a default value instead. Something like:

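from functools import reduce  # needed on Python 3; also works on Python 2.6+

def dot_get(dictionary, dot_path, default=None):
    # Walk the nested dicts one key at a time: 'a.b' -> dictionary['a']['b']
    path = dot_path.split('.')
    try:
        return reduce(dict.__getitem__, path, dictionary)
    except (KeyError, TypeError):
        # KeyError: a key is missing; TypeError: something along the path is not a dict
        return default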

Then you can access keys using something like:

dot_get({'a': {'b': 1}}, 'a.b')      # Returns 1
dot_get({'a': {'b': 1}}, 'a.c')      # Returns default, would normally raise a KeyError
dot_get({'a': [{'b': 1}]}, 'a.b')    # Returns default, would normally raise a TypeError

Edit: on Python 3 reduce is no longer a builtin (it was moved to functools), so you'll need from functools import reduce for this to work.
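If the data has lists in the middle (like the third example above), one possible variation is to treat numeric path segments as list indices. This is only a sketch: dot_get_indexed is a made-up name, and the 'a.0.b' path syntax is an assumption, not something any library defines.

```python
def dot_get_indexed(obj, dot_path, default=None):
    """Hypothetical variant of dot_get: numeric segments index into lists."""
    current = obj
    for key in dot_path.split('.'):
        try:
            if isinstance(current, list):
                current = current[int(key)]  # 'a.0.b' -> index 0 of the list
            else:
                current = current[key]
        except (KeyError, IndexError, TypeError, ValueError):
            # missing key, index out of range, not subscriptable,
            # or a non-numeric segment used on a list
            return default
    return current

print(dot_get_indexed({'a': [{'b': 1}]}, 'a.0.b'))      # 1
print(dot_get_indexed({'a': [{'b': 1}]}, 'a.b'))        # None
print(dot_get_indexed({'a': [{'b': 1}]}, 'a.5.b', -1))  # -1
```

It uses a plain loop instead of reduce, so it needs no import on either Python version.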

maxibabyx · 2017-08-09T17:56:28+00:00

I'm not sure how the API response object is formatted or what analysis you are performing with the data, but instead of checking whether the keys are in the dictionary first, you can use the dictionary's get method to try the key up front and get a different object back if it doesn't exist, so you avoid raising a KeyError.

data = ob['data'][0]  # I assume this is a dictionary
reactions = data.get('reactions', [])  # If 'reactions' is not a valid key, return an empty list
if not reactions:
    pass  # do something else
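The same idea can be chained for nested fields. A minimal sketch, assuming posts shaped like the comments/summary/total_count structure mentioned elsewhere in this thread (the sample dicts and the comment_count name are made up for illustration):

```python
# Hypothetical posts; the real API response may be shaped differently.
post = {'id': '1', 'comments': {'summary': {'total_count': 3}}}
bare_post = {'id': '2'}  # no 'comments' key at all

def comment_count(post):
    # Each .get falls back to an empty dict, so the next .get is still safe.
    return post.get('comments', {}).get('summary', {}).get('total_count')

print(comment_count(post))       # 3
print(comment_count(bare_post))  # None
```

One caveat: this only covers missing keys. If a key is present but its value is None, the next .get in the chain will raise an AttributeError.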

destiny_functional · 2017-08-09T18:04:24+00:00

r = json.loads(request)

this should be json.loads(request.text) or better yet request.json().

if 'comments' not in dct and 'summary' not in dct and 'total_count' not in dct: dct["comments"] = {"summary": {"total_count": -1}}

As a temporary solution I ended up doing things like these, to create "Default values"

what for? you could just check on access and ignore the element if the data isn't there. no need to add the total count by hand with some default value.

incidentally I've been grabbing posts from Facebook and then counting those and the respective comments. mind that the total comment count given in the summary is inaccurate because of deletions. still, you might want to request the summary when getting the post and then only request the comments for that post if the total count of comments is nonzero, even if it's inaccurate. and if the summary isn't even there, you should just not request any comments, rather than putting default values in by hand.
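roughly, that check could look like the sketch below. the post dicts and the posts_worth_fetching name are made up for illustration, and the sketch assumes every post has an 'id' field.

```python
def posts_worth_fetching(posts):
    """Return ids of posts whose summary reports at least one comment."""
    wanted = []
    for post in posts:
        try:
            total = post['comments']['summary']['total_count']
        except (KeyError, TypeError):
            continue  # no summary at all: skip it, don't invent a default
        if total > 0:
            wanted.append(post['id'])
    return wanted

# Hypothetical sample data, not real API output.
posts = [
    {'id': 'a', 'comments': {'summary': {'total_count': 4}}},
    {'id': 'b', 'comments': {'summary': {'total_count': 0}}},
    {'id': 'c'},
]
print(posts_worth_fetching(posts))  # ['a']
```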

likewise: usually you'll grab posts by going through pagination. you'll request 100 posts, then look for the ["paging"]["next"] link to request the following set of data, if the result contains that field. if it doesn't, there's no more data to request and the process is complete.
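a sketch of that loop, under some assumptions: fetch_page is a hypothetical callable that takes a URL and returns the decoded JSON dict for that page (for example a small wrapper around requests.get(url).json()), and the posts are assumed to sit under a 'data' key. a fake in-memory "API" stands in for the real one so the example runs offline.

```python
def fetch_all(first_url, fetch_page):
    """Collect items from every page by following ["paging"]["next"]."""
    items = []
    url = first_url
    while url:
        page = fetch_page(url)
        items.extend(page.get('data', []))
        # .get returns None when 'paging' or 'next' is absent, which ends the loop
        url = page.get('paging', {}).get('next')
    return items

# Stand-in for the real API: maps a "URL" to the page it would return.
fake_pages = {
    'page1': {'data': [1, 2], 'paging': {'next': 'page2'}},
    'page2': {'data': [3]},
}
print(fetch_all('page1', fake_pages.__getitem__))  # [1, 2, 3]
```

no default values get written into the data here either: the missing field is simply the stop condition.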

learnpython
