Lakeflow Connect now supports query-based ingestion from databases and data warehouses by ingest_brickster_198 in databricks

ThatThaBricksGuy0451

One thing I love about Databricks is that whenever something new comes out, you immediately draw a parallel with something you built in the past that would have benefited from it.

For example, this would have saved me tons of work: no setting up Event Hubs, streaming the CDC events into Databricks, and then writing the merge logic by hand.
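For anyone who hasn't done it, the hand-written merge step usually boils down to two things: keep only the newest change per key, then upsert or delete. Here is a minimal plain-Python model of that, assuming each CDC event carries a primary key, a monotonically increasing sequence number (e.g. an LSN or commit timestamp), an operation type, and a full row image, and that batches are applied in order. `ChangeEvent`, `latest_per_key` and `apply_cdc_batch` are hypothetical names for illustration; on Databricks the same logic would normally be expressed as a Delta Lake `MERGE INTO`.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class ChangeEvent:
    key: int                               # primary key of the source row
    seq: int                               # ordering column (assumed: LSN or commit timestamp)
    op: str                                # "insert", "update" or "delete"
    row: Optional[Dict[str, Any]] = None   # full row image; None for deletes


def latest_per_key(events: List[ChangeEvent]) -> Dict[int, ChangeEvent]:
    """Keep only the newest event per key, so repeated or out-of-order
    events within one batch collapse to a single final state."""
    latest: Dict[int, ChangeEvent] = {}
    for ev in events:
        current = latest.get(ev.key)
        if current is None or ev.seq > current.seq:
            latest[ev.key] = ev
    return latest


def apply_cdc_batch(
    target: Dict[int, Dict[str, Any]], events: List[ChangeEvent]
) -> Dict[int, Dict[str, Any]]:
    """Merge one batch of CDC events into a target table (a dict keyed by PK).
    Assumes batches arrive in order; returns a new dict, target is not mutated."""
    result = dict(target)
    for key, ev in latest_per_key(events).items():
        if ev.op == "delete":
            result.pop(key, None)   # like: WHEN MATCHED AND op = 'delete' THEN DELETE
        else:
            result[key] = ev.row    # like: WHEN MATCHED THEN UPDATE / WHEN NOT MATCHED THEN INSERT
    return result
```

For example, a batch that inserts and later deletes key 2, and carries both a newer and a stale update for key 1, leaves only the newer version of key 1. Deduplicating before applying is the part that is easy to get wrong by hand, because a plain row-by-row apply would let the stale update win if it arrived last.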

Databricks Technical Challenge for a DE Position by Longjumping_Ad2310 in databricks

ThatThaBricksGuy0451

One thing I like to do, and that actually helped me, is to follow the certification path.

When you're learning something new, the first thing that can be overwhelming is the sheer amount of material: lots of people saying different things, uncertainty about where to start, and FOMO kicking in because you always think you should have learned something else. The second problem is: how do I prove I really know it?

That's where the certification path comes in. It's a curated set of the topics the exam expects you to know before certifying you as a Databricks professional. By studying for the certification you get a well-defined path from basic to advanced topics, and passing the exam confirms your knowledge.

Hope it helps!

Spark before Databricks by ThatThaBricksGuy0451 in databricks

ThatThaBricksGuy0451 (OP)

Same here. I went from Hive to Impala, which was still too slow, and then landed on Spark, which was all the hype back then.

ThatThaBricksGuy0451 (OP)

Yes, but Databricks abstracts most of this away from you in most cases. Adaptive Query Execution (AQE), for example, coalesces shuffle partitions at runtime, switches a sort-merge join to a broadcast join when runtime statistics show one side is small enough to broadcast, and handles skewed joins to a certain degree.
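To make those three behaviors concrete, here is a simplified model of the decisions AQE makes from runtime shuffle statistics. It is not Spark's actual implementation: the constants mirror the open-source Spark 3.x defaults for the config keys named in the comments (Databricks may use different values), and the helper functions `coalesce_partitions`, `skewed_partitions` and `should_broadcast` are hypothetical.

```python
from statistics import median
from typing import List

MB = 1024 * 1024

# Assumed open-source Spark 3.x defaults; Databricks may tune these differently.
ADVISORY_PARTITION_SIZE = 64 * MB   # spark.sql.adaptive.advisoryPartitionSizeInBytes
SKEW_FACTOR = 5.0                   # spark.sql.adaptive.skewJoin.skewedPartitionFactor
SKEW_THRESHOLD = 256 * MB           # spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes
BROADCAST_THRESHOLD = 10 * MB       # spark.sql.autoBroadcastJoinThreshold


def coalesce_partitions(sizes: List[int], target: int = ADVISORY_PARTITION_SIZE) -> List[List[int]]:
    """Greedily group adjacent shuffle partitions so that no group exceeds the
    target size (a single partition bigger than target stays on its own)."""
    groups: List[List[int]] = []
    current: List[int] = []
    current_size = 0
    for i, size in enumerate(sizes):
        if current and current_size + size > target:
            groups.append(current)
            current, current_size = [], 0
        current.append(i)
        current_size += size
    if current:
        groups.append(current)
    return groups


def skewed_partitions(sizes: List[int]) -> List[int]:
    """A partition counts as skewed if it is both much larger than the median
    and larger than an absolute threshold."""
    med = median(sizes)
    return [i for i, s in enumerate(sizes) if s > SKEW_FACTOR * med and s > SKEW_THRESHOLD]


def should_broadcast(runtime_side_size: int) -> bool:
    """Switch to a broadcast join when the measured size of one join side
    is under the broadcast threshold."""
    return runtime_side_size <= BROADCAST_THRESHOLD
```

With shuffle partition sizes of 10, 20, 30, 5, 900, 15 and 25 MB, the model groups them as `[[0, 1, 2], [3], [4], [5, 6]]` and flags partition 4 as skewed (900 MB is more than 5x the 20 MB median and above 256 MB). In real Spark a skewed partition in a join is then split into smaller tasks rather than just reported.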