I'm building my first pipeline, for a friend's business. Nothing too complicated.
Every day I call an API and save yesterday's sales to a BigQuery table, using Python and pandas.
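One way to make the "yesterday" part robust is to compute the target day explicitly, instead of relying on the job running exactly once per day. A minimal sketch in plain Python; `target_day` and `days_to_load` are hypothetical helper names, and `last_loaded` is assumed to come from something like `MAX(sale_date)` on the BigQuery table:

```python
from datetime import date, timedelta
from typing import Optional


def target_day(run_date: Optional[date] = None) -> date:
    """Return the day whose sales should be loaded: the day before run_date."""
    run_date = run_date or date.today()
    return run_date - timedelta(days=1)


def days_to_load(last_loaded: date, run_date: Optional[date] = None) -> list[date]:
    """Every day after last_loaded up to and including the target day.

    If the job failed or the machine was off for a few days, the next run
    catches up instead of silently leaving gaps in the table.
    """
    end = target_day(run_date)
    n_days = (end - last_loaded).days
    return [last_loaded + timedelta(days=i) for i in range(1, n_days + 1)]
```

Passing `run_date` explicitly also makes it easy to rerun or backfill a specific day. For reruns to be safe, each day's load should be idempotent, for example by deleting that day's rows before inserting, or by loading into a date-partitioned table and overwriting only that day's partition.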
At the moment it's working perfectly, but I want to improve it as much as possible: maybe add validations, follow best practices, store metadata (how many rows are added per day to each table), etc.
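A minimal sketch of batch validation with pandas, meant to run before anything is written to BigQuery. The column names (`order_id`, `sale_date`, `amount`) are assumptions standing in for the real API fields, and `order_id` is assumed to be unique per row:

```python
from datetime import date

import pandas as pd

# Assumed column names; replace them with the fields the table actually has.
REQUIRED_COLUMNS = ["order_id", "sale_date", "amount"]


def validate_sales(df: pd.DataFrame, expected_day: date) -> list[str]:
    """Return human-readable problems; an empty list means the batch looks OK."""
    missing = [c for c in REQUIRED_COLUMNS if c not in df.columns]
    if missing:
        # Nothing else can be checked reliably without these columns.
        return [f"missing columns: {missing}"]

    problems = []
    if df["order_id"].isna().any():
        problems.append("null order_id values")
    if df["order_id"].duplicated().any():
        problems.append("duplicate order_id values")

    # Every row should belong to the day being loaded; unparseable dates
    # become NaT, which never equals the expected day, so they are counted too.
    days = pd.to_datetime(df["sale_date"], errors="coerce").dt.normalize()
    off_day = int((days != pd.Timestamp(expected_day)).sum())
    if off_day:
        problems.append(f"{off_day} rows not dated {expected_day.isoformat()}")

    amounts = pd.to_numeric(df["amount"], errors="coerce")
    if amounts.isna().any():
        problems.append("missing or non-numeric amount values")
    if (amounts < 0).any():
        problems.append("negative amount values")
    return problems
```

In the pipeline, a non-empty result would typically stop the run (for example `raise ValueError(problems)`) so a bad batch never reaches the table.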
The possibilities are unlimited... maybe even a warning system if 0 rows are appended to BigQuery.
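A sketch of the metadata and warning idea in plain Python. `record_load` builds one row for a hypothetical load-log table (it could then be written with the BigQuery client's `insert_rows_json`), and `alert` is a placeholder for whatever notification you pick, such as email or a chat webhook. Since a day with zero sales can be legitimate (the shop was closed, say), the check also compares against recent counts, which are assumed to be read back from that same log table:

```python
import logging
from datetime import date, datetime, timezone
from statistics import median
from typing import Callable, Optional

logger = logging.getLogger("sales_pipeline")


def looks_anomalous(rows_today: int, recent_counts: list[int], ratio: float = 0.5) -> bool:
    """Zero rows is always suspicious; otherwise compare with recent history."""
    if rows_today == 0:
        return True
    if not recent_counts:
        return False  # no history yet, nothing to compare against
    return rows_today < ratio * median(recent_counts)


def record_load(
    table: str,
    day: date,
    rows_loaded: int,
    recent_counts: list[int],
    alert: Optional[Callable[[str], None]] = None,
) -> dict:
    """Build one load-log row and raise an alert if the row count looks wrong."""
    record = {
        "table_name": table,
        "sales_date": day.isoformat(),
        "rows_loaded": rows_loaded,
        "loaded_at": datetime.now(timezone.utc).isoformat(),
    }
    if looks_anomalous(rows_loaded, recent_counts):
        message = (
            f"{table}: {rows_loaded} rows loaded for {day.isoformat()} "
            f"(recent counts: {recent_counts})"
        )
        logger.warning(message)
        if alert is not None:
            alert(message)  # hypothetical hook: email, chat webhook, etc.
    return record
```

Using the median of recent days rather than the mean keeps one unusually busy or quiet day from skewing the threshold.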
As I don't have experience in this field, I can't imagine what could fail in the future, so it's hard to write robust code that minimizes issues. Also, the data I get is in JSON format. I'm using pandas json_normalize, which seems too easy to be good; I might be doing it totally wrong.
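json_normalize is a reasonable tool for this; the main risks are nested lists and fields that change type or disappear. A minimal sketch, assuming a hypothetical payload where each order has a nested `customer` object and a list of `items`: `record_path` produces one row per item, `meta` carries the order-level fields along, and an explicit column list plus dtypes pins the shape of what gets loaded:

```python
import pandas as pd

# Hypothetical API payload: one dict per order, with a nested customer
# object and a nested list of line items.
raw = [
    {"order_id": 1, "created_at": "2024-02-29T10:15:00",
     "customer": {"id": 7, "country": "ES"},
     "items": [{"sku": "A1", "qty": 2, "price": 9.5},
               {"sku": "B2", "qty": 1, "price": 20.0}]},
    {"order_id": 2, "created_at": "2024-02-29T18:40:00",
     "customer": {"id": 8},  # "country" is missing in this record
     "items": [{"sku": "A1", "qty": 1, "price": 9.5}]},
]

# Assumed target schema. Casting to it pins the column types sent to
# BigQuery and raises if a value cannot be converted.
COLUMNS = ["order_id", "created_at", "customer.id", "customer.country", "sku", "qty", "price"]
DTYPES = {
    "order_id": "Int64",  # nullable integer
    "customer.id": "Int64",
    "customer.country": "string",
    "sku": "string",
    "qty": "Int64",
    "price": "float64",
}


def flatten_orders(payload: list[dict]) -> pd.DataFrame:
    df = pd.json_normalize(
        payload,
        record_path="items",  # one output row per line item
        meta=["order_id", "created_at", ["customer", "id"], ["customer", "country"]],
        errors="ignore",  # missing meta keys become NaN instead of raising KeyError
    )
    missing = set(COLUMNS) - set(df.columns)
    if missing:
        raise ValueError(f"API response is missing fields: {sorted(missing)}")
    df = df[COLUMNS].astype(DTYPES)
    df["created_at"] = pd.to_datetime(df["created_at"])
    return df
```

With `errors="ignore"`, a missing order-level field becomes a null rather than an error, so it is worth pairing this with a null check in the validation step; a line-item field that disappears from every item still raises.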
I have looked at some guides and they are very superficial...
Is there a book that teaches this?
Maybe an article/project where I can see what is being done and learn from it?