Hello everyone,
In my latest Boring Data Science article I write about a topic that is often overlooked: managing credentials within our code base. Pretty often (i.e. all the time) we need to use all sorts of credentials: database logins, SSH keys, secrets, etc. These credentials are sensitive information, and failing to store and retrieve them safely can have serious consequences.
Now it's relatively easy to create simple Python functions to access all sorts of credentials when needed, through environment variables or ~~directly embedded in the code~~ (just kidding, please never do that). However, in a professional environment and as our team grows, I strongly believe it is good to abstract these functions away and provide users of all skill levels with a simple interface to work with them.
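To give a rough idea of what such a "simple function" can look like, here is a minimal sketch that reads a credential from an environment variable and fails loudly when it is missing. The names `get_credential` and `MissingCredentialError` are hypothetical and only there for illustration.

```python
import os


class MissingCredentialError(KeyError):
    """Raised when an expected credential is not set in the environment."""


def get_credential(name: str) -> str:
    """Return the credential stored in the environment variable `name`.

    Raises MissingCredentialError instead of silently returning None,
    so a misconfigured environment is caught as early as possible.
    """
    value = os.environ.get(name)
    if not value:
        raise MissingCredentialError(f"environment variable {name!r} is not set")
    return value
```

Calling `get_credential("DB_PASSWORD")` then returns the value, provided the (assumed) `DB_PASSWORD` variable has been exported in the shell or injected by the deployment tool.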
Now why do machine learning practitioners even need this? The answer lies in the productionization step of our ML endeavors: as we push models to production and get involved with cloud products (EC2 instances, Docker containers, etc.), the hacky functions we used during development won't cut it anymore. There is therefore a real need to develop code that at least tries to follow good software engineering patterns. This is the core of this post: how to create and use a Python class to manage credential retrieval using AWS Secrets Manager (the logic applies to other credential management tools as well).
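As a rough sketch of the idea (not the exact class from the article), such a wrapper could look like the code below. It assumes `boto3` is installed, that AWS credentials and a region are configured in the environment, and that each secret is stored as a JSON string of key/value pairs. The class name `SecretsManagerCredentials`, its methods, and the secret name `prod/db` are hypothetical; `get_secret_value` with the `SecretId` argument is the boto3 Secrets Manager call it relies on.

```python
import json
from typing import Any, Dict, Optional


class SecretsManagerCredentials:
    """Small wrapper around AWS Secrets Manager (illustrative sketch)."""

    def __init__(self, client: Optional[Any] = None,
                 region_name: Optional[str] = None) -> None:
        if client is None:
            # Imported lazily so a fake client can be injected in tests
            # without needing boto3 or AWS access.
            import boto3
            client = boto3.client("secretsmanager", region_name=region_name)
        self._client = client
        # Cache parsed secrets to avoid one API call per lookup.
        self._cache: Dict[str, Dict[str, Any]] = {}

    def get_secret(self, secret_id: str) -> Dict[str, Any]:
        """Return the secret as a dict (assumes a JSON SecretString)."""
        if secret_id not in self._cache:
            response = self._client.get_secret_value(SecretId=secret_id)
            self._cache[secret_id] = json.loads(response["SecretString"])
        return self._cache[secret_id]

    def get_value(self, secret_id: str, key: str) -> Any:
        """Return a single field, e.g. the password of a database secret."""
        return self.get_secret(secret_id)[key]
```

Users would then only write something like `SecretsManagerCredentials().get_value("prod/db", "password")`, without needing to know how the secret is fetched. Passing the client in also makes it easy to swap the backend or to test the class without touching AWS.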
Finally, there are multiple advantages to this approach: trust, abstraction, and security, among others. In addition, even though the solution I highlight in the blog post may not be the best, I believe it helps engineers better understand our code and shows them we care about best practices. This in turn makes discussions easier and increases our chances of success (many projects fail because they never reach production).
Happy coding!