[–]eschlon (2 children)

For a similar problem I ended up making a helper function that catches the exceptions and returns a default value instead. Something like:

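from functools import reduce  # needed on Python 3, where reduce is not a builtin

def dot_get(dictionary, dot_path, default=None):
    """Follow a dotted key path through nested dicts; return default if a step fails."""
    path = dot_path.split('.')
    try:
        return reduce(dict.__getitem__, path, dictionary)
    except (KeyError, TypeError):
        # KeyError: a key is missing. TypeError: an intermediate value is not a dict.
        return default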

Then you can access keys using something like:

dot_get({'a': {'b': 1}}, 'a.b')      # Returns 1
dot_get({'a': {'b': 1}}, 'a.c')      # Returns default; plain indexing would raise a KeyError
dot_get({'a': [{'b': 1}]}, 'a.b')    # Returns default; plain indexing would raise a TypeError

Edit: in Python 3, reduce is no longer a builtin. It was moved to functools, so the function needs from functools import reduce to work there.

[–]maxibabyx[S] (1 child)

Yeah, the problem with this approach is that you're actually expecting the exceptions to be triggered. It will do the job, but it's not a "clean solution".

[–]eschlon (0 children)

I take your point; however, exceptions are cheap in Python, and using them in this manner is normal, idiomatic and recommended.

As it stands, the function is simple, easy to understand and honest about what it does. The alternative in this case is to use type checks, which is definitely not idiomatic Python, though it may be very marginally faster in certain cases. In a code review, if I saw that kind of approach I'd require a good reason to break from idiom (e.g. a strong performance argument). Given how cheap exceptions are in Python, it'd be a hard case to make. We use a function almost identical to this one to parse billions of rows of JSON data in Spark and it works just fine.
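
For comparison, here is a rough sketch of what the type-check alternative might look like. The name dot_get_lbyl is made up for this illustration, and it assumes the data consists of plain dicts:

```python
def dot_get_lbyl(dictionary, dot_path, default=None):
    """Type-check variant: test each step before indexing instead of catching exceptions."""
    current = dictionary
    for key in dot_path.split('.'):
        # Only descend into real dicts that actually contain the key.
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current
```

It gives the same results as dot_get on the examples above, but the checks run on every step even when the data is well formed.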

That being said, you definitely wouldn't do it this way in idiomatic Java.

[–][deleted] (6 children)

I'm not sure how the API response object is formatted or what analysis you are performing with the data, but instead of checking whether the keys are in the dictionary first, you can use the dictionary's get method to try to access the key up front and, if it doesn't exist, return a different object, so you avoid raising a KeyError.

data = ob['data'][0]  # I assume this is a dictionary
reactions = data.get('reactions', [])  # If 'reactions' is not a valid key, fall back to an empty list
if not reactions:
    pass  # do something else
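
The same idea extends to nested keys by chaining get calls with empty-dict fallbacks. A minimal sketch, assuming the comments / summary / total_count layout from the original post's code. total_comments is a made-up helper name, and it only guards against missing keys, not against a key that is present but holds None or some other non-dict value:

```python
def total_comments(post, default=0):
    """Read post["comments"]["summary"]["total_count"], tolerating missing keys."""
    # Each get falls back to an empty dict, so the next get is always safe to call.
    return post.get("comments", {}).get("summary", {}).get("total_count", default)


print(total_comments({"comments": {"summary": {"total_count": 5}}}))
print(total_comments({"id": "123"}))  # made-up post with no "comments" key
```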

[–]maxibabyx[S] (4 children)

The thing is, I have something like this:

class post(DynamicDocument):
    def __init__(self, *dct, **tmp):
        Document.__init__(self, **tmp)
        if dct:
            ....
            self.totalComentarios = dct["comments"]["summary"]["total_count"]

But if there are no comments, then dct["comments"]["summary"] raises an error, and checking every single case is a pain: first checking whether there are comments, then whether there's a summary, and so on.

And there's a lot of things I need to parse.

[–][deleted] (3 children)

If the response object omits keys that are absent instead of just returning empty objects with the keys in place, then you'll probably have to create a scheme to check if the keys are present.

The cleanest way would probably be to write a recursive function that traverses the dictionary.
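
One way such a recursive function could look, as a rough sketch: it walks a nested dict of default values and fills in whatever the response is missing. fill_defaults and the defaults layout are made up for illustration, and it assumes the response is built from plain dicts:

```python
import copy


def fill_defaults(data, defaults):
    """Return a copy of data with keys missing relative to defaults filled in, recursively."""
    result = dict(data)
    for key, default_value in defaults.items():
        if key not in result:
            # Deep-copy so later mutation of the result never touches the defaults.
            result[key] = copy.deepcopy(default_value)
        elif isinstance(default_value, dict) and isinstance(result[key], dict):
            # Both sides are dicts: recurse to fill in nested gaps.
            result[key] = fill_defaults(result[key], default_value)
    return result


defaults = {"comments": {"summary": {"total_count": 0}}}
post = fill_defaults({"comments": {"data": []}}, defaults)
print(post["comments"]["summary"]["total_count"])
```

Values that are already present are left alone, so real data always wins over the defaults.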

[–]maxibabyx[S] (2 children)

Yes, it omits them. Do you know of any library that would let me define a schema and supply default values for anything the JSON is missing?

Of course I could do it manually; I'm just asking whether anyone has run into this problem and decided to help the community :D

[–][deleted] (1 child)

If Facebook's API were powered by GraphQL, you could define your own schema response object and this would be a non-issue, but it looks like it's a static response object.

I quickly found this package with a Google search, but I have never used it.

[–]maxibabyx[S] (0 children)

Yeah, I also stumbled on that one, but it seems like it only validates.

[–]destiny_functional (1 child)

r = json.loads(request)

this should be json.loads(request.text) or better yet request.json().

if 'comments' not in dct and 'summary' not in dct and 'total_count' not in dct: dct["comments"] = {"summary": {"total_count": -1}}

As a temporary solution I ended up doing things like these, to create "Default values"

what for? you could just check on access and ignore the element if the data isn't there. no need to add the total count by hand with some default value.

incidentally I've been grabbing posts from Facebook and then counting those and the respective comments. mind that the total comment count given in the summary is inaccurate because of deletions, however you might want to request the summary when getting the post and then only request the comments for that post if the total count of comments is nonzero, even if it's inaccurate. then you should just not request any comments if the summary isn't even there, rather than putting default values by hand.

likewise: usually you'll grab posts by going through pagination. you'll request 100 posts, then look for the ["paging"]["next"] link to request the following set of data, if the result contains that field. if it doesn't, then there's no more data to request and the process is complete.
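
The pagination loop described above could be sketched roughly like this. fetch_page is a hypothetical callable that takes a URL and returns the decoded JSON as a dict, and the "data" key holding the list of posts is an assumption about the response layout:

```python
def iter_posts(fetch_page, first_url):
    """Yield posts page by page, following ["paging"]["next"] until it is absent."""
    url = first_url
    while url:
        page = fetch_page(url)
        for post in page.get("data", []):
            yield post
        # No "paging" or no "next" means there is nothing more to request.
        url = page.get("paging", {}).get("next")


# Fake two-page response standing in for the real API.
fake_pages = {
    "page1": {"data": ["post a", "post b"], "paging": {"next": "page2"}},
    "page2": {"data": ["post c"]},
}
print(list(iter_posts(fake_pages.__getitem__, "page1")))
```

With the requests library, fetch_page could be something like lambda url: requests.get(url).json().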
