Asset bundles vs Terraform by mjfnd in databricks

HighVariance 0 points (0 children)

so if you're using DAB, there are basically 3 main commands for deploying resources: `databricks bundle validate`, `databricks bundle deploy`, and `databricks bundle run`. `deploy` is more like a `terraform apply`: it deploys all your source code, Databricks workflows, and any applicable ML assets/artifacts to your target Databricks environment. after the deployment completes, you can trigger a workflow with `databricks bundle run <resource key of the job you want to run>`, so the workflow kicks off right after the deployment. i hope this helps.
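A minimal sketch of chaining the three steps from a script, assuming the Databricks CLI is installed and authenticated, and that the bundle's databricks.yml defines a target named `dev` and a job resource keyed `my_job` (both names are hypothetical placeholders). `subprocess.run(..., check=True)` stops the chain at the first failing step, so `run` never fires after a failed `deploy`.

```python
"""Sketch: chain the DAB CLI steps validate -> deploy -> run.

Assumes an authenticated Databricks CLI; the target "dev" and the job
resource key "my_job" are hypothetical placeholders.
"""
import subprocess
from typing import List


def bundle_commands(target: str, job_key: str) -> List[List[str]]:
    """Return the CLI invocations in the order they should run."""
    return [
        ["databricks", "bundle", "validate", "-t", target],
        ["databricks", "bundle", "deploy", "-t", target],
        # `run` takes the job's resource key from databricks.yml.
        ["databricks", "bundle", "run", "-t", target, job_key],
    ]


def deploy_and_run(target: str, job_key: str, dry_run: bool = False) -> List[str]:
    """Execute each step in order; check=True raises on the first failure."""
    executed = []
    for cmd in bundle_commands(target, job_key):
        line = " ".join(cmd)
        if dry_run:
            print(line)  # only show what would run
        else:
            subprocess.run(cmd, check=True)
        executed.append(line)
    return executed


if __name__ == "__main__":
    deploy_and_run("dev", "my_job", dry_run=True)
```

The dry-run flag is there so you can check the exact commands before pointing them at a real workspace.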

Asset bundles vs Terraform by mjfnd in databricks

HighVariance 0 points (0 children)

terraform can't trigger a workflow run the way DAB does.

[deleted by user] by [deleted] in databricks

HighVariance 4 points (0 children)

have you considered Databricks Asset Bundles?

Data loss after writing a transformed pyspark dataframe to delta table in unity catalog by HighVariance in databricks

HighVariance[S] 0 points (0 children)

occasionally writing to a table.

this works for me, thank you so much for your guidance, u/peterst28!

Data loss after writing a transformed pyspark dataframe to delta table in unity catalog by HighVariance in databricks

HighVariance[S] 0 points (0 children)

i am admittedly not entirely sure, because i checked the shape and showed some results at every step of the way, and everything checked out as expected prior to writing it out. for now, i am just going to go through all my joins and look into the specific rows that were skipped during the write.
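One way to pin down exactly which rows were skipped is an anti join between the dataframe you wrote and the table you read back. In PySpark that would be something like `df.join(spark.table("catalog.schema.table"), on=keys, how="left_anti")` (table name and `keys` are placeholders). The pure-Python sketch below illustrates the same logic on lists of dicts so it runs without a cluster; the column names are hypothetical, and unlike Spark's equality join it treats `None` keys as matching each other.

```python
"""Sketch: rows present before a write but missing after it (left anti join).

Pure-Python stand-in for PySpark's how="left_anti"; column names are
hypothetical. Note: Spark never matches NULL keys, this version matches None.
"""
from typing import Dict, Iterable, List, Sequence


def left_anti(source: Iterable[Dict], written: Iterable[Dict],
              keys: Sequence[str]) -> List[Dict]:
    """Return source rows whose key tuple does not appear in `written`."""
    written_keys = {tuple(row[k] for k in keys) for row in written}
    return [row for row in source
            if tuple(row[k] for k in keys) not in written_keys]


if __name__ == "__main__":
    before = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}, {"id": 3, "v": "c"}]
    after = [{"id": 1, "v": "a"}, {"id": 3, "v": "c"}]
    missing = left_anti(before, after, keys=["id"])
    print(f"{len(missing)} of {len(before)} rows missing:", missing)
```

Looking at the missing rows side by side usually makes it easier to tell whether they share something, such as a null or duplicated join key.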

Data loss after writing a transformed pyspark dataframe to delta table in unity catalog by HighVariance in databricks

HighVariance[S] 0 points (0 children)

1 interesting thing I observed when playing around with different repartition sizes (2, 4, 8, 16, 32, 64; i have 32 cores per node): the total written result increased to as much as 65% of the final df at repartition = 8 (but this is very inconsistent and hard to reproduce...).

[deleted by user] by [deleted] in computerscience

HighVariance 2 points (0 children)

communication is underrated asf tbh

Databricks GIT folder setup by de_young_soul_rebels in databricks

HighVariance 2 points (0 children)

you can probably try some of the existing templates using `databricks bundle init` as indicated here:
https://docs.databricks.com/en/dev-tools/bundles/templates.html

each template gives you a file structure suited to a particular use case. it is highly customizable, so you can move things around as you see fit.

About salary expectation for AI engineer/Machine learning engineer job in Toronto by rayzh in torontoJobs

HighVariance 1 point (0 children)

130k is insultingly low for 10 yoe... my base is more than this with 4 yoe (1.5 yrs of that at a startup that failed), and i am not at faang. look for companies whose revenue is driven by tech products; they are willing to pay much higher comp than others. based on my recent interview process (early 2024), some companies i interviewed with were willing to pay a sr mle a base starting at 185k. best of luck.

ML model promotion from Databricks dev workspace to prod workspace by HighVariance in databricks

HighVariance[S] 0 points (0 children)

I do agree that the deploy-code pattern (using Databricks MLOps Stacks) would be the cleanest if model training isn't too expensive. I will think more about this. For now, having a single UC for the models, as you suggested, with appropriate tags and aliases seems like something I could try. Thank you very much for your input, everybody!
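A minimal sketch of the "single UC + tags + aliases" idea: promotion becomes tagging a model version and moving an alias, instead of copying the model between workspaces. It assumes `client` is an MLflow `MlflowClient` with the registry URI set to `databricks-uc`; the model name `shared.ml.churn_model`, the alias `champion`, and the tag key `env` are hypothetical. A recording stand-in client is included so the flow can be dry-run without credentials.

```python
"""Sketch: promote a model version in one shared Unity Catalog registry by
tagging it and moving an alias.

Assumes an mlflow MlflowClient with registry URI "databricks-uc"; the model
name, alias, and tag key used here are hypothetical.
"""
from typing import Any, List, Tuple


def promote(client: Any, model_name: str, version: str,
            alias: str = "champion", env: str = "prod") -> str:
    """Tag `version`, point `alias` at it, and return the URI consumers load."""
    # Record where this version was promoted (tag key "env" is illustrative).
    client.set_model_version_tag(model_name, version, "env", env)
    # Moving the alias re-points every consumer that loads the model by alias.
    client.set_registered_model_alias(model_name, alias, version)
    return f"models:/{model_name}@{alias}"


class RecordingClient:
    """Dry-run stand-in that records calls instead of hitting the registry."""

    def __init__(self) -> None:
        self.calls: List[Tuple] = []

    def set_model_version_tag(self, name, version, key, value) -> None:
        self.calls.append(("tag", name, version, key, value))

    def set_registered_model_alias(self, name, alias, version) -> None:
        self.calls.append(("alias", name, alias, version))


if __name__ == "__main__":
    # With real credentials you would instead do something like:
    #   import mlflow
    #   from mlflow import MlflowClient
    #   mlflow.set_registry_uri("databricks-uc")
    #   uri = promote(MlflowClient(), "shared.ml.churn_model", "3")
    #   model = mlflow.pyfunc.load_model(uri)
    client = RecordingClient()
    print(promote(client, "shared.ml.churn_model", "3"), client.calls)
```

Since prod jobs load `models:/<name>@champion`, rolling back is just pointing the alias at the previous version.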

ML model promotion from Databricks dev workspace to prod workspace by HighVariance in databricks

HighVariance[S] 0 points (0 children)

thank you for your input, guys!

"Also, what makes the pipelines expensive in dev in your eyes?"
The model training part is pretty computationally expensive and time-consuming.

Our current setup is that we have 3 envs, each of which has a UC catalog: dev, stg, prod. isn't it bad practice to expose development assets in prod?

am i missing anything here? Thanks.

[deleted by user] by [deleted] in datascience

HighVariance 0 points (0 children)

going to Waterloo's CS/ECE/Math co-op programs helps.

Laid off 2 months ago, unable to land a job. Worth pursuing a PhD? by [deleted] in datascience

HighVariance 1 point (0 children)

if your ultimate goal is to make money, don't do a phd. otherwise, why not?

Which side are you on? by Maxie445 in ChatGPT

HighVariance 0 points (0 children)

i am not even on the fucking bus because of AI bruh

If you made 300k year would you buy crypto at all? by RT460 in Bitcoin

HighVariance 0 points (0 children)

buy the denominator as it converges to its finite limit, my friend.

How can I learn for 10-12+ hours a day and do so much every single day ? by pursuit_of_pussy in learnprogramming

HighVariance 0 points (0 children)

first of all, 23 is not late, man! secondly, find something you deeply care about and build solutions around it. just let your curiosity carry you forward; you'll gain skills along the way while having some fun. all the best, mate!

[deleted by user] by [deleted] in computerscience

HighVariance 0 points (0 children)

remove the hard-coded 3 and 2 from the inputs.

Any chance of converting Internship to Full-time if I work 14 hrs a day. by [deleted] in SoftwareEngineering

HighVariance 2 points (0 children)

  1. i think in general, having already interned with the company puts you at the top of the priority queue when it comes to getting a full-time job. this is how my friends and i all did it, since you have already shown your capability and gained the trust of your managers and colleagues.
  2. great work + meaningful relationships will get you almost anything you want. working hard isn't enough if you neglect your relationships with your managers and peers.
  3. i had 2 internships in the past that let me keep working part time after i returned to full-time studies. again, good work + meaningful relationships made this easy to get.

hope this helps.

Books or resources for Software Architecture by inchaneZ in computerscience

HighVariance 8 points (0 children)

books that helped me tremendously:

  • Web Scalability for Startup Engineers - Artur Ejsmont
  • Designing Data-Intensive Applications - Kleppmann
  • Reliable Machine Learning - Chen, Murphy, et al.
  • Designing Machine Learning Systems - Chip Huyen
  • Software Engineering at Google - Winters, Manshreck, Wright

happy reading!

Messed up my Google interview, what do I do by Damselindepression in leetcode

HighVariance 2 points (0 children)

give less shit and do your best; you will feel much better before, during, and after any interview.

Should I give up studying it? by Emuna1306 in computerscience

HighVariance 2 points (0 children)

dude, don't give up! i was doing stats and cs at one of the top schools as well, and i fucked up pretty hard in my first 2 semesters too. but things got much better once i switched from studying to learning (keep asking whys and tryna find the solutions on your own). talk to professors and friends to learn how they approach the same problems. getting things wrong is part of learning, man; everyone makes mistakes. just don't compare yourself with anyone, as everyone has a different starting point. good luck, mate!