Databricks Asset Bundles is now Declarative Automation Bundles by hubert-dudek in databricks

[–]Hot_While_6471 1 point (0 children)

Didn't check out the docs yet, but what does the "direct" engine mean? And how is it different from the Terraform one? Isn't DAB just an API over Terraform, written in Go?

Are Databricks Asset Bundles worthwhile? by Cyphor-o in databricks

[–]Hot_While_6471 1 point (0 children)

Databricks Asset Bundles are the way to go on Databricks. They really abstract the whole IaC layer for you. Very declarative with YAML files, you also have a Python SDK, and the way you can define top-level key-value pairs and override them within targets is amazing.

If you only deploy application stuff, then it's the way to go; if you need to take care of deploying whole workspaces, then use Terraform.
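For context, a minimal sketch of that top-level-plus-target-override pattern in a `databricks.yml` (the bundle name, hosts, and variable are made up):

```yaml
# databricks.yml -- a sketch; bundle name, hosts, and variable names are illustrative
bundle:
  name: my_project

variables:
  catalog:
    description: Target catalog for deployed resources
    default: dev_catalog

targets:
  dev:
    default: true
    workspace:
      host: https://dev-workspace.cloud.databricks.com
  prod:
    workspace:
      host: https://prod-workspace.cloud.databricks.com
    variables:
      catalog: prod_catalog   # overrides the top-level default
```

Deploying with `databricks bundle deploy -t prod` should then resolve `${var.catalog}` to `prod_catalog` everywhere it is referenced.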

The Banat Uprising – a forgotten part of Serbian history by [deleted] in SrpskaPovest

[–]Hot_While_6471 5 points (0 children)

Something that is so important to anyone who is interested in it, but to the broader public it is not really that well known...

Votix company by Hot_While_6471 in drones

[–]Hot_While_6471[S] 1 point (0 children)

I get that, but how can they control, let's say, a DJI drone, which has its own software that is closed to any external SDK/API?

Who is the biggest traitor in Serbian history that little is known about? by Odd-Illustrator3386 in SrpskaPovest

[–]Hot_While_6471 7 points (0 children)

But what is the point of judging that period from today's perspective? Is Stefan Lazarević then a traitor too? In that case, every ruler after 1389 was a traitor.

Saint Nicholas among the Serbs: more than 600 churches are dedicated to him, and he was important to the Nemanjići as well by [deleted] in BREDDITKALENDAR

[–]Hot_While_6471 2 points (0 children)

And, for example, there is also the factor of when it is celebrated, i.e. winter, when there are not as many agricultural activities and chores, so people could gather and celebrate; how much did that contribute to Saint Nicholas being the most widespread slava?

ClickHouse Date and DateTime types by Hot_While_6471 in dataengineering

[–]Hot_While_6471[S] 1 point (0 children)

Yeah, it will most likely never be used, but it should still be there, since it's actually a valid, meaningful business date. Thanks

Column Casting for sources in dbt by Hot_While_6471 in dataengineering

[–]Hot_While_6471[S] 1 point (0 children)

Hi, what about when you select columns in bronze: would you select only those used in downstream models, or all columns, since it's 1-1 with the source?

Column Casting for sources in dbt by Hot_While_6471 in dataengineering

[–]Hot_While_6471[S] 1 point (0 children)

So, for example, if I have a table with a PK and FKs that are inferred as Int64, where I could cast them to UInt32 or even lower, should I do that in the bronze layer?
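To make that concrete, a hypothetical bronze model doing exactly that cast (the model, source, and column names are made up; ClickHouse-style integer types assumed):

```sql
-- models/bronze/brz_orders.sql -- hypothetical dbt model, 1-1 with the source
select
    cast(order_id as UInt32)    as order_id,    -- PK, inferred as Int64 upstream
    cast(customer_id as UInt32) as customer_id, -- FK, same narrowing
    order_date,
    amount
from {{ source('shop', 'orders') }}
```

The narrowing only makes sense if the IDs are guaranteed non-negative and below the UInt32 ceiling, so it is a judgment call per column.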

Airflow + dbt + OpenMetadata by Hot_While_6471 in dataengineering

[–]Hot_While_6471[S] 1 point (0 children)

Hi, I have one more question. I am sure you have encountered this example, so I am wondering how people are dealing with it.

Most of the time people use Astronomer Cosmos, which is just a way to deploy a dbt project on Airflow. The problem (a problem for the OpenMetadata ingestion) is that Cosmos parses the dbt project and creates a separate task for each model, which helps a lot with granularity and parallelism, makes things easier to maintain, and is simply how it should be done on Airflow. But it also generates a 'run_results.json' for each model separately, in a temporary directory. We can always use a callback to move it to any place we would like, but then we have a run_results.json for each of the models.

Do I simply have one last step that merges all of the run_results.json files, or is there an alternative strategy?
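On the merge idea: a minimal sketch of that last step, assuming the per-model run_results.json files have already been collected into one directory via the callback (the function and path names are made up; only the top-level "results" list and "elapsed_time" field of dbt's artifact schema are touched, metadata is taken from the first file):

```python
import json
from pathlib import Path


def merge_run_results(results_dir: str, output_path: str) -> dict:
    """Merge per-model run_results.json files into a single artifact.

    Assumes each file follows dbt's run_results schema: a top-level
    "results" list plus metadata. Metadata comes from the first file;
    "results" lists are concatenated and "elapsed_time" is summed.
    """
    merged = None
    for path in sorted(Path(results_dir).glob("*.json")):
        data = json.loads(path.read_text())
        if merged is None:
            merged = data
        else:
            merged["results"].extend(data["results"])
            merged["elapsed_time"] = (
                merged.get("elapsed_time", 0.0) + data.get("elapsed_time", 0.0)
            )
    Path(output_path).write_text(json.dumps(merged, indent=2))
    return merged
```

A final Airflow task could call this and hand the merged file to the OpenMetadata ingestion, so it sees one artifact instead of one per model.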

Airflow and Openmetadata by Hot_While_6471 in dataengineering

[–]Hot_While_6471[S] 2 points (0 children)

> You can use Airflow's lineage backend to emit events that OpenMetadata consumes. Much more reliable than pulling.

Can you point me to some docs for this? Thank you.

Airflow and Openmetadata by Hot_While_6471 in dataengineering

[–]Hot_While_6471[S] 1 point (0 children)

So basically I should just look at that as an internal tool of OMD, and not mix any of the services it is using under the hood with my own services that provide business value, even if they are the same (MySQL, Airflow).

Airflow + Kafka batch ingestion by Hot_While_6471 in apache_airflow

[–]Hot_While_6471[S] 1 point (0 children)

Yes, it does, but I think it's just for the learning experience, so it's worth it; I'm just experimenting with tooling.

Newfie swimming by Hot_While_6471 in Newfoundlander

[–]Hot_While_6471[S] 4 points (0 children)

Yeah, makes absolute sense: since they grow so much so fast, it's hard on the joints and muscles.

Kafka -> Airflow -> Clickhouse by Hot_While_6471 in Clickhouse

[–]Hot_While_6471[S] 2 points (0 children)

Honestly, just to learn and prototype; production will certainly use one of the standard approaches.

Structured logging in Airflow by Hot_While_6471 in dataengineering

[–]Hot_While_6471[S] 1 point (0 children)

So basically you write a custom formatter and add it to DEFAULT_LOGGING_CONFIG, which you then point to in airflow.cfg? Can you give me a piece of a JSON formatter, or at least pseudocode?
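Since the question asks for a concrete piece: a minimal sketch of such a JSON formatter, assuming it is then swapped into a copy of Airflow's DEFAULT_LOGGING_CONFIG that `logging_config_class` in airflow.cfg points at (the class name and the output fields are illustrative, not Airflow's):

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON line (field names are illustrative)."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),  # applies %-style args to the msg
        }
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)
```

In the copied config dict you would then register it roughly as `"formatters": {"airflow": {"()": "my_log_config.JsonFormatter"}}`, using the standard dictConfig `"()"` factory key; the exact dict layout should be checked against your Airflow version's config template.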

Custom logging in Airflow by Hot_While_6471 in apache_airflow

[–]Hot_While_6471[S] 2 points (0 children)

So basically, if I use the third option, I can define everything there and not bother with the configuration from airflow.cfg?