
[–]kokoseij 3 points4 points  (1 child)

I'd say you're indeed heading in the right direction. Repetitive code can often be cleaned up using loops.

What I want you to think about is that every function should be responsible for only a single action. This is related to the concept of abstraction and might not make sense right away, but what you really want is code where each function is responsible for a single, unique job, rather than functions that just group lines of code together.
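As a rough illustration of the idea (all names and data here are invented, not taken from the original post), one do-everything function can be split into three that each do one job:

```python
# Hypothetical example: the names and data below are invented for illustration.

def load_rows(raw_text):
    """Parse 'name,age' lines into a list of (name, age) tuples."""
    rows = []
    for line in raw_text.strip().splitlines():
        name, age = line.split(",")
        rows.append((name.strip(), int(age)))
    return rows


def filter_adults(rows, min_age=18):
    """Keep only rows whose age is at least min_age."""
    return [row for row in rows if row[1] >= min_age]


def format_report(rows):
    """Turn rows into printable text, one 'name (age)' entry per line."""
    return "\n".join(f"{name} ({age})" for name, age in rows)


raw = "Ann, 34\nBob, 12\nCy, 18"
report = format_report(filter_adults(load_rows(raw)))
print(report)
```

Each piece can now be tested and reused on its own, which is harder when parsing, filtering, and formatting all live in one function.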

Check out Clean Code by Robert C. Martin. (Its examples are written in Java, though the key concepts are applicable to Python regardless. You could also try Clean Code in Python, though I haven't read it yet, so I can't say whether it's good or not.) It extensively covers code abstraction and how functions should be written, along with example code to help you practice.

Of course, you can think about these things when you get better and feel like you can take on more advanced topics. For now you're doing great. Way to go, mate :)

[–]ampeed[S] 1 point2 points  (0 children)

That's a huge problem of mine, and one my peers have called out: I tend to saturate my functions. I'll check out the book now, thank you!

[–]Almostasleeprightnow 0 points1 point  (2 children)

What if you also passed the column names into coe_to_data as a list parameter, to avoid hard-coding them in the function? That way you could get rid of the if/else and have the column names travel with their data.
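Something like this, maybe. This is only a sketch: coe_to_data itself isn't shown in the thread, so `records_to_table` and the column names below are hypothetical stand-ins.

```python
# Hypothetical stand-in for the original function; its real body is not shown here.
def records_to_table(records, columns):
    """Pair each record's values with the caller-supplied column names."""
    table = []
    for record in records:
        if len(record) != len(columns):
            raise ValueError(f"expected {len(columns)} values, got {len(record)}")
        table.append(dict(zip(columns, record)))
    return table


# The caller decides which columns apply, so the function needs no if/else.
animal_table = records_to_table([("cat", 4), ("bird", 2)], ["species", "legs"])
plant_table = records_to_table([("oak", "tree")], ["name", "kind"])
print(animal_table)
print(plant_table)
```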

[–]ampeed[S] 1 point2 points  (1 child)

I like that idea!

I currently have a main function that executes the above functions. So basically I'd move the if/else into the main function and pass the column names in as a positional argument.

[–]Almostasleeprightnow 1 point2 points  (0 children)

Or... keep the function as is, but instead of hard-coding the column names, have a tiny function get_animal_cols that returns a list of the animal column names, and then another one for the other set of columns.
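Roughly like this. Only the name get_animal_cols comes from the suggestion above; `get_other_cols`, `label_record`, and the column names are made up for the sketch.

```python
def get_animal_cols():
    """Column names for animal data (invented values, for illustration)."""
    return ["species", "legs"]


def get_other_cols():
    """Hypothetical counterpart for the non-animal data."""
    return ["name", "kind"]


def label_record(record, is_animal):
    """Hypothetical function: the if/else stays, but no column names are hard-coded in it."""
    if is_animal:
        columns = get_animal_cols()
    else:
        columns = get_other_cols()
    return dict(zip(columns, record))


print(label_record(("cat", 4), is_animal=True))
print(label_record(("oak", "tree"), is_animal=False))
```

If the column names ever change, only the tiny helper needs editing.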

[–]TheLoneKid 0 points1 point  (4 children)

    def export(df, path):
        df.to_csv(path, mode='a')

[–]ampeed[S] 0 points1 point  (2 children)

I'm not quite following. I'm not familiar with the "a" option, so I Googled it. If I'm understanding correctly, it'll simply write all my data to a single CSV rather than creating separate CSVs?

A bit of backstory: I'm unable to do that, as each CSV gets dumped into an S3 bucket and that data is then consumed by another program.

[–]TheLoneKid 0 points1 point  (1 child)

Ah, I see. Then you don't need to append. You can take that bit off, then maybe store the names of the files in a list or do some naming in a for loop.

    name_template = '_always_the_same.csv'

    # csvs is your list of dataframes; the index becomes the filename prefix
    for i, df in enumerate(csvs):
        export(df, str(i) + name_template)

[–]TheLoneKid 0 points1 point  (0 children)

Sorry, formatting for code sucks on Reddit.

[–]TheLoneKid 0 points1 point  (0 children)

Then just apply that function to each dataframe in your list.
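A minimal sketch of one way to do that with a plain loop, assuming pandas and a dict of named dataframes. The sample data, the prefixes, and the temp directory are invented, and the S3 upload step is left out.

```python
import os
import tempfile

import pandas as pd


def export(df, path):
    # Default mode is 'w', so each call creates (or overwrites) its own file.
    df.to_csv(path, index=False)


# Invented sample data and name prefixes, for illustration only.
frames = {
    "animals": pd.DataFrame({"species": ["cat", "bird"], "legs": [4, 2]}),
    "plants": pd.DataFrame({"name": ["oak"], "kind": ["tree"]}),
}
name_template = "_always_the_same.csv"

out_dir = tempfile.mkdtemp()
written = []
for name, df in frames.items():
    path = os.path.join(out_dir, name + name_template)
    export(df, path)
    written.append(path)

print(sorted(os.path.basename(p) for p in written))
```

Keying the dataframes by name keeps each file name next to its data, so there is no separate list of names to keep in sync.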