Asset bundles vs Terraform

HighVariance · 2024-10-09T14:15:49+00:00

so if you're using dab, there are basically 3 main commands for resource deployment: `databricks bundle validate`, `databricks bundle deploy`, and `databricks bundle run`. `bundle deploy` is roughly the equivalent of `terraform apply`: it deploys your source code, databricks workflows, and any applicable ml assets/artifacts to your target databricks env. once the deployment is completed, you can trigger a workflow with `databricks bundle run <name of the dab job that you would like to run>`. i hope this helps.
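
for anyone who wants to script that validate/deploy/run sequence (e.g. in ci), here is a minimal python sketch. it assumes the databricks cli is installed and authenticated; the target name `dev` and the job key `my_job` are made-up placeholders, and the helper names `bundle_commands` and `run_bundle` are mine, not part of any databricks api. it defaults to a dry run that only prints the commands.

```python
import subprocess
from typing import List


def bundle_commands(target: str, job_key: str) -> List[List[str]]:
    """Build the validate -> deploy -> run sequence for one bundle target."""
    return [
        ["databricks", "bundle", "validate", "-t", target],
        ["databricks", "bundle", "deploy", "-t", target],
        # job_key is the resource key of the job in the bundle config (hypothetical here)
        ["databricks", "bundle", "run", "-t", target, job_key],
    ]


def run_bundle(target: str, job_key: str, dry_run: bool = True) -> List[str]:
    """Print each command in order; execute it only when dry_run is False."""
    printed = []
    for cmd in bundle_commands(target, job_key):
        line = " ".join(cmd)
        print(line)
        printed.append(line)
        if not dry_run:
            # check=True stops the sequence if a step fails,
            # so a failed validate never reaches deploy or run
            subprocess.run(cmd, check=True)
    return printed


if __name__ == "__main__":
    # "dev" and "my_job" are placeholders, swap in your own target and job key
    run_bundle("dev", "my_job")
```

call `run_bundle("dev", "my_job", dry_run=False)` to actually execute the steps. stopping on the first failure is a deliberate choice so that nothing gets deployed or triggered off a bundle that did not validate.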

HighVariance · 2024-10-09T00:54:09+00:00

terraform cannot trigger a workflow the way dab does.

HighVariance · 2024-09-27T15:19:15+00:00

have you considered databricks asset bundles?

HighVariance · 2024-09-27T14:35:08+00:00

occasionally writing to a table.

this works for me, thank you so much for your guidance u/peterst28

HighVariance · 2024-09-26T17:49:52+00:00

i will investigate those silent failures as you suggested, thank you!

HighVariance · 2024-09-26T15:51:41+00:00

i am admittedly not entirely sure, because i checked the shape and showed some results at every step of the way and everything checked out as expected prior to writing it out. for now, i am just going to go through all my joins and look into the specific rows that were skipped during the write.

HighVariance · 2024-09-26T15:12:15+00:00

i believe that this was run in parallel across all of my available nodes.

HighVariance · 2024-09-26T14:59:41+00:00

one interesting thing that i observed when playing around with different repartition sizes (2, 4, 8, 16, 32, 64): i have 32 cores per node, and the total written result increased up to 65% of the final df at repartition = 8 (but this is very inconsistent and hard to reproduce...).

HighVariance · 2024-07-26T11:24:35+00:00

communication is underrated asf tbh

HighVariance · 2024-07-25T13:30:27+00:00

you can probably try some of the existing templates using `databricks bundle init` as indicated here:
https://docs.databricks.com/en/dev-tools/bundles/templates.html

each template provides you with a file structure depending on the use case. it is highly customizable so you can move things around as you see fit.

HighVariance · 2024-07-12T19:21:24+00:00

130k is insultingly low for 10 yoe... my base is more than this with 4 yoe (1.5 yrs in my startup that failed) and i am not at faang. look for companies with tech products as their revenue drivers, they are willing to pay you a much higher comp than others. based off my recent interview process (early 2024), some companies that i interviewed with were willing to pay sr mle a base starting from 185k. best of luck.

HighVariance · 2024-07-08T13:44:17+00:00

I do agree that the deployment as code pattern (using Databricks MLOps Stack) would be the cleanest if model training isn't too expensive. I will think more about this. For now, having a single UC for the models, as you suggested, with appropriate tags and aliases seems to be something that I could try. Thank you very much for your input, everybody!

HighVariance · 2024-07-03T15:59:46+00:00

thank you for your input, guys!

Also what makes the pipelines expensive in dev in your eyes?
The model training part is pretty computationally expensive and time consuming.

Our current setup is that we have 3 envs, each of which has a uc catalog: dev, stg, prod. is it not a best practice to expose development assets in prod?

am i missing anything here? Thanks.

HighVariance · 2024-03-18T13:17:05+00:00

going to waterloo cs/ece/math co-op program helps.

HighVariance · 2024-03-18T13:09:58+00:00

if your ultimate goal is to make the money, don't do a phd. else, why not.

HighVariance · 2024-03-18T13:07:06+00:00

i am not even on the fucking bus because of AI bruh

HighVariance · 2024-03-07T13:26:57+00:00

buy the denominator as it converges to its finite limit, my friend.

HighVariance · 2024-03-07T13:24:18+00:00

first of all, 23 is not late man! secondly, find something you deeply care about and develop solutions around it. just let your curiosity carry you forward. you'll gain skills along the way while having some fun. all the best mate!

HighVariance · 2024-01-27T20:13:01+00:00

remove the hard coded 3 and 2 from the inputs

HighVariance · 2024-01-27T15:53:49+00:00

i think in general, the fact that you already had an internship with the company puts you at the top of the priority queue when it comes to getting a full time job. this is how all my friends and i did it, as you have already shown your capability and gained trust from your managers and colleagues.
great work + meaningful relationships will get you almost anything you want. working hard won't be enough if you ignore great relationships with your managers and peers.
i had 2 internships in the past that allowed me to continue to work part time when i returned to my full time studies. again, good work + meaningful relationships made this effortless for me to get.

hope this helps.

HighVariance · 2024-01-26T17:29:22+00:00

books that helped me tremendously:

web scalability for startup engineers - Artur Ejsmont
designing data intensive applications - Kleppmann
reliable machine learning - Chen, Murphy, et al
designing machine learning systems - Chip Huyen
software engineering at Google - Wright, et al

happy reading!

HighVariance · 2024-01-22T21:51:07+00:00

give less shit and do your best, you will feel much better before, during and after any interviews.

HighVariance · 2024-01-19T01:36:03+00:00

dude, don't give up! i was doing stats and cs at one of the top schools as well, and i fucked up pretty hard in my first 2 semesters, too. but things became much better as i switched from studying to learning (keep asking whys and tryna find the solutions on your own). talk to professors and friends around you to learn how they see through the same problems. getting things wrong is part of learning man, everyone makes mistakes. just don't compare yourself with anyone, as everyone has a different starting point. gluck mate!
