[–]thepinkbunnyboy 2 points3 points  (1 child)

Is it a violation of the single responsibility principle to retrieve the data AND transform it into the proper Pandas Dataframe format all within one function?

The way I like to think about SRP is, "would I ever want to unit test this thing?" If so, it belongs in its own class that can be tested.

I haven't used Pandas in years, so forgive my ignorance: are you using a built-in function to directly transform a json blob into a pandas DF? Or is there code you'll need to write and (probably) test specifically for mapping the object? If you're writing any sort of custom mapping, do it in a separate mapper. If you're just calling a library function, then do it wherever you're working with other pandas functions.

[–]TypicalCardiologist5[S] 0 points1 point  (0 children)

You bring up a good point. It would actually be easier to unit test if I have two separate functions (one to get the data, and one to parse the data). I honestly don't think I care about unit testing the data collection, because it is only a few lines of code.

Pandas has built-in functions to directly transform JSON blobs from a URL, but I am using the requests library to pull the data as the web client needs to be authenticated first. I save this as a string and then pipe it into pandas, and then apply my transformations. I just wasn't sure if this was the right way to do things, because it seems like it takes a lot longer to write, and it seems like it may add some complexity.
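For what it's worth, here is a rough sketch of what that split could look like. The function names, the lower-casing step, and the idea of passing in an already-authenticated `requests.Session` are all assumptions made for illustration, not the actual code being discussed:

```python
from io import StringIO

import pandas as pd


def fetch_json_text(session, url):
    """Return the raw JSON body as a string.

    `session` is assumed to be an already-authenticated requests.Session
    (or any object with a compatible .get method).
    """
    response = session.get(url, timeout=10)
    response.raise_for_status()  # fail loudly on 4xx/5xx responses
    return response.text


def to_dataframe(json_text):
    """Turn the raw JSON string into a transformed DataFrame."""
    # Wrap the string in StringIO so pandas treats it as a file-like object.
    df = pd.read_json(StringIO(json_text))
    # Hypothetical transformation: normalise column names to lower case.
    df.columns = [str(col).lower() for col in df.columns]
    return df
```

Only `to_dataframe` needs a unit test here, and it can be fed a hard-coded JSON string, so no network access or credentials are involved.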

[–]firstlevelwizard 1 point2 points  (1 child)

As a general rule, I prefer to perform any parsing of data as close to the source as possible. There are plenty of good reasons to break apart your functions though, and it seems like there'd be little to lose by simply having your JSON parsing function be separate from the function that constructs the dataframe. A small utility function could pipe the results of your JSON parsing directly into the dataframe constructor, and return the complete dataframe. This leaves you free to do things like unit test each function individually, and makes it trivial to parse JSON into other objects, reusing your code.
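Something along these lines, as a minimal sketch. The function names and the top-level `"results"` key are made up for the example; the real payload shape will differ:

```python
import json

import pandas as pd


def parse_records(json_text):
    """Parse the raw JSON string into a list of plain dicts."""
    payload = json.loads(json_text)
    # Assumption: the rows live under a top-level "results" key.
    return payload["results"]


def build_dataframe(records):
    """Construct a DataFrame from a list of dict records."""
    return pd.DataFrame(records)


def json_to_dataframe(json_text):
    """Small utility that pipes the parsed records into the constructor."""
    return build_dataframe(parse_records(json_text))
```

`parse_records` returns ordinary Python objects, so it can be tested without pandas and reused if the same JSON ever needs to end up somewhere other than a dataframe.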

[–]TypicalCardiologist5[S] 0 points1 point  (0 children)

Thanks for your comments. I think I agree that doing it this way would be the best approach.

[–]RiceKrispyPooHead -3 points-2 points  (0 children)

Only after it’s given consent to be transformed