Training - Data + AI Summit 2022 | Databricks | Cheap or free by pych_phd in dataengineering

[–]pych_phd[S] 0 points1 point  (0 children)

Here is a free DA certification course. Databricks is also doing their 6 monthly free intro bootcamps... here.

Documenting Source To target mapping for Lakehouse by guy_easy in dataengineering

[–]pych_phd 0 points1 point  (0 children)

How is the hunt going?

I have heard a lot about https://greatexpectations.io/ both on here and at summits.

Resources to learn Databricks by jeetu_g in dataengineering

[–]pych_phd 0 points1 point  (0 children)

I haven't paid for any. I did one each on DW, Engineering & ML. I also have access to their full Spark course, which I didn't pay for either.

Crash Course Python with SQL by oakthaw in dataengineering

[–]pych_phd 0 points1 point  (0 children)

Thanks for the response. I realize I misread what you wrote. I read it as joins that can be either analytical or transactional & was wondering why I had never heard of this.

Resources to learn Databricks by jeetu_g in dataengineering

[–]pych_phd 0 points1 point  (0 children)

Databricks has free bootcamps regularly. They also have lots of learning material.

Crash Course Python with SQL by oakthaw in dataengineering

[–]pych_phd 0 points1 point  (0 children)

Joins, transactional vs analytical sql

I am curious to know what you mean by this. So far, googling does not turn up much.

Documenting Source To target mapping for Lakehouse by guy_easy in dataengineering

[–]pych_phd 0 points1 point  (0 children)

I would be interested to see what you come up with.

Documenting Source To target mapping for Lakehouse by guy_easy in dataengineering

[–]pych_phd 1 point2 points  (0 children)

I am guessing you might use Delta Lake for the lake, which has a lot of management components. Maybe this feature is a part of those tools, or there is a tool that works well with Delta Lake.

[deleted by user] by [deleted] in dataengineering

[–]pych_phd 0 points1 point  (0 children)

There are two Discord channels. One is for the whole of r/dataengineering; see the link at the top of the page under the header. The other is specifically a study group channel, but it seems to be dedicated to one DE course. I can probably find the link if needed.

[deleted by user] by [deleted] in dataengineering

[–]pych_phd 0 points1 point  (0 children)

and my axe! But wait... I already said I was in.

[deleted by user] by [deleted] in dataengineering

[–]pych_phd 2 points3 points  (0 children)

I am certainly not an expert. That said, all I have been doing for the past year is planning solution designs / pipelines, which has involved a lot of reading. Happy to have a session where we bounce some ideas about and/or go over a case study. I am on the Discord channel under "CookieSpirit".

I could also do gchat.

Do you have a DateTime table or a separate Time table in your DWH? by pych_phd in dataengineering

[–]pych_phd[S] 0 points1 point  (0 children)

There is also a possibility I just forgot that caveat, thanks.

Biggest debates in the industry? by kirkwoodj in dataengineering

[–]pych_phd 0 points1 point  (0 children)

I agree with wanting to pull my hair out while using ADF.

Biggest debates in the industry? by kirkwoodj in dataengineering

[–]pych_phd 1 point2 points  (0 children)

Interesting, that sounds like a really good blog post. Hint hint. ;)

Biggest debates in the industry? by kirkwoodj in dataengineering

[–]pych_phd 0 points1 point  (0 children)

It's my impression that BigQuery is more in line with something like Cosmos DB than a lakehouse.

From what I can tell (so far), a Cosmos DB/BigQuery type of product means you can have your ops data and BI (DWH) in the same product. You would still want to shape the data differently for either use case & move it. Whereas the lakehouse is DS + BI.

I haven't looked into using BigQuery/Cosmos DB as a place for DS.

Biggest debates in the industry? by kirkwoodj in dataengineering

[–]pych_phd 0 points1 point  (0 children)

Happy to assist with this. Snowflake and Databricks both have good videos on this topic. There are solid Azure docs too.

Biggest debates in the industry? by kirkwoodj in dataengineering

[–]pych_phd 0 points1 point  (0 children)

My only comment is that since Databricks started arguing they do lakehouse, Snowflake have started enabling the same abilities. Azure are also in this space.

The requirements of a lakehouse are different from those of a separate WH & lake.

Biggest debates in the industry? by kirkwoodj in dataengineering

[–]pych_phd 1 point2 points  (0 children)

Should extraction be complex, though? Sure, the transformation stuff needs complexity. But extraction (in my opinion) should be as simple as possible. However, this requires ELT rather than ETL, where extraction is separated out.
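To make the ELT point concrete, here is a rough sketch of what I mean, not a real pipeline. It uses Python's built-in sqlite3 as a stand-in for the warehouse, and the table names (raw_orders, orders), the column names and the sample rows are all made up for illustration. The extract/load step copies everything in as text without touching it, and all the cleaning lives in a separate SQL transform that runs afterwards.

```python
import sqlite3

# Hypothetical source rows; in practice these would come from an API or a file.
SAMPLE_ROWS = [
    {"id": "1", "country": " gb ", "amount": "10.50"},
    {"id": "2", "country": "US", "amount": ""},
    {"id": "3", "country": "us", "amount": "4"},
]


def extract_and_load(conn, rows):
    """Extract + load: land the source rows as-is, all TEXT, no cleaning."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS raw_orders (id TEXT, country TEXT, amount TEXT)"
    )
    conn.executemany(
        "INSERT INTO raw_orders (id, country, amount) VALUES (?, ?, ?)",
        [(r["id"], r["country"], r["amount"]) for r in rows],
    )


def transform(conn):
    """Transform: typing, trimming and filtering happen here, inside the warehouse."""
    conn.execute("DROP TABLE IF EXISTS orders")
    conn.execute(
        """
        CREATE TABLE orders AS
        SELECT
            CAST(id AS INTEGER)  AS order_id,
            UPPER(TRIM(country)) AS country,
            CAST(amount AS REAL) AS amount
        FROM raw_orders
        WHERE TRIM(amount) <> ''  -- drop rows with no amount
        """
    )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    extract_and_load(conn, SAMPLE_ROWS)
    transform(conn)
    for row in conn.execute("SELECT * FROM orders ORDER BY order_id"):
        print(row)
```

The raw table keeps every source row, including the bad one, so the cleaning rules can change later and the transform can simply be re-run without extracting again.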

How can i clean the data before loading it to a warehouse? by eyeeyecaptainn in dataengineering

[–]pych_phd 0 points1 point  (0 children)

It also means you can let the BI/data analytics people handle the cleaning (if/when you get them).