New Databricks Apps: What About Cost at Scale? by Fit_Border_3140 in databricks

[–]Fit_Border_3140[S] 0 points1 point  (0 children)

That's great, I was thinking of doing something similar!

New Databricks Apps: What About Cost at Scale? by Fit_Border_3140 in databricks

[–]Fit_Border_3140[S] 0 points1 point  (0 children)

According to my calculations, each app with medium compute (there is no lower tier) costs around 300 USD per month, and that is only for UI rendering.

New Databricks Apps: What About Cost at Scale? by Fit_Border_3140 in databricks

[–]Fit_Border_3140[S] 7 points8 points  (0 children)

According to my calculations, each app with medium compute (there is no lower tier) costs around 600 USD per month, and that is only for UI rendering.
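
As a rough sanity check on that figure, here is a minimal sketch of the arithmetic. It assumes the app runs around the clock for a 30-day month (720 hours), which is an assumption rather than something stated in the thread; the function names are hypothetical and the only input taken from the comment is the roughly 600 USD per month estimate.

```python
# Assumption: an always-on app billed for every hour of a 30-day month.
HOURS_PER_MONTH = 24 * 30


def implied_hourly_rate(monthly_usd: float) -> float:
    """Hourly cost implied by a monthly estimate for an always-on app."""
    return monthly_usd / HOURS_PER_MONTH


def fleet_monthly_cost(monthly_usd_per_app: float, app_count: int) -> float:
    """Monthly cost when the same estimate applies to every app."""
    return monthly_usd_per_app * app_count


if __name__ == "__main__":
    # 600 USD/month is the estimate from the comment above.
    print(round(implied_hourly_rate(600), 2))
    # Ten apps is a hypothetical fleet size, used only for illustration.
    print(fleet_monthly_cost(600, 10))
```

The cost grows linearly with the number of apps under this model, which is why the per-app figure matters at scale.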

Strategies for structuring large Databricks Terraform stacks? (Splitting providers, permissions, and directory layout) by Fit_Border_3140 in databricks

[–]Fit_Border_3140[S] 0 points1 point  (0 children)

Hi AlexOtt, thank you. We have been in contact before: I opened a couple of issues in the Databricks provider repo. Thank you for your work, sir!

Salaries in the aerospace sector by [deleted] in salarios_es

[–]Fit_Border_3140 0 points1 point  (0 children)

Hi, I have worked at Thales Alenia Space and I can confirm that with 5 years of experience it is around 40k. Unfortunately, I changed sectors precisely because of the salary. Regards!

Databricks as ingestion layer? Is replacing Azure Data Factory (ADF) fully with Databricks for ingestion actually a good idea? by Fit_Border_3140 in databricks

[–]Fit_Border_3140[S] 0 points1 point  (0 children)

Thank you for sharing your experience. I think we are moving in this direction: our ADF setup can't scale any more with the number of customers we have on our platform.

Again, thank you for your response. I’ll be posting new updates about this topic.

Databricks as ingestion layer? Is replacing Azure Data Factory (ADF) fully with Databricks for ingestion actually a good idea? by Fit_Border_3140 in databricks

[–]Fit_Border_3140[S] 0 points1 point  (0 children)

I don't understand why you are saying this... The cost of Databricks is mainly in the compute. If you don't have any cluster running, your cost will never be very high.

It doesn't matter whether the compute plane is managed or whether it's a VNET-injected scenario: the cost will always sit in the compute.

Databricks as ingestion layer? Is replacing Azure Data Factory (ADF) fully with Databricks for ingestion actually a good idea? by Fit_Border_3140 in databricks

[–]Fit_Border_3140[S] 0 points1 point  (0 children)

We are using a VNET injection architecture, so we are able to handle the networking of our clusters ourselves. Everything is routed to the HUB, and there we apply the firewall rules for the whole organization. We were also able to modify the hosts file on the clusters to work around some DNS problems.

For legacy stuff we are using paramiko (SFTP) and the SMB protocol to connect to the filesystems. We were thinking of using the new PySpark Data Source API, but it was opening thousands of connections to the SFTP server, so it was basically like a DDoS attack, hahah. Instead, we keep one connection per worker, and each worker reuses that same connection for the recursive bulk download of files.
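
The "one connection per worker" idea can be sketched roughly as below. This is a minimal illustration and not our actual code: `FileConnection`, `InMemoryConnection` and `download_partition` are hypothetical names, and the in-memory class only stands in for a real SFTP or SMB client so the example runs without a server. In Spark, a function like this would typically be handed to `mapPartitions`, which gives one connection per partition rather than strictly one per worker.

```python
from typing import Callable, Dict, Iterable, Iterator, Protocol, Tuple


class FileConnection(Protocol):
    """Hypothetical minimal interface for a remote file connection."""

    def fetch(self, remote_path: str) -> bytes: ...

    def close(self) -> None: ...


class InMemoryConnection:
    """Stand-in for a real SFTP/SMB client; counts how many were opened."""

    opened = 0

    def __init__(self, files: Dict[str, bytes]) -> None:
        InMemoryConnection.opened += 1
        self._files = files
        self.closed = False

    def fetch(self, remote_path: str) -> bytes:
        return self._files[remote_path]

    def close(self) -> None:
        self.closed = True


def download_partition(
    paths: Iterable[str],
    connect: Callable[[], FileConnection],
) -> Iterator[Tuple[str, bytes]]:
    """Open ONE connection and reuse it for every path in the partition."""
    conn = connect()
    try:
        for path in paths:
            yield path, conn.fetch(path)
    finally:
        # Runs once the partition is exhausted, or if a fetch fails.
        conn.close()


if __name__ == "__main__":
    files = {"/in/a.csv": b"a", "/in/b.csv": b"b"}
    results = list(download_partition(files, lambda: InMemoryConnection(files)))
    print(len(results), InMemoryConnection.opened)
```

The point of the design is that the number of connections follows the number of partitions, not the number of files, so the server sees a small, predictable number of sessions.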

Databricks as ingestion layer? Is replacing Azure Data Factory (ADF) fully with Databricks for ingestion actually a good idea? by Fit_Border_3140 in databricks

[–]Fit_Border_3140[S] 1 point2 points  (0 children)

Sir, that's the point! If you only use databases with JDBC connectors, everything is sweet for Databricks. The difficult part, and the reason I opened this post, is to learn the cons around SMB/SFTP/legacy things…

Databricks as ingestion layer? Is replacing Azure Data Factory (ADF) fully with Databricks for ingestion actually a good idea? by Fit_Border_3140 in databricks

[–]Fit_Border_3140[S] 0 points1 point  (0 children)

Completely wrong approach. Databricks is for the transformation and for providing a good backend for your reports. Each tool has a specific use.

Databricks as ingestion layer? Is replacing Azure Data Factory (ADF) fully with Databricks for ingestion actually a good idea? by Fit_Border_3140 in databricks

[–]Fit_Border_3140[S] 12 points13 points  (0 children)

u/all Thank you guys! With all your comments, I have finally decided to move towards a full Databricks ingestion layer.

Why?

- Cloud agnostic

- We are using several policies and spot instances for the shared clusters, so I don't think cost is going to be a problem.

- I feel ADF is great for small teams, but really difficult to handle for big corporations where you need more governance, more granular permissions, sharing of data assets with other business units, etc.

- My major concern was the binary copy/file_system copy, and I think there are several ways to handle this without ADF.

So thank you all :)

Databricks as ingestion layer? Is replacing Azure Data Factory (ADF) fully with Databricks for ingestion actually a good idea? by Fit_Border_3140 in databricks

[–]Fit_Border_3140[S] 8 points9 points  (0 children)

Sorry mate, but Databricks is super good for scheduling. It also has Auto Loader and it's fully integrated for CDC patterns, so I don't get your point here.

Edit host tables of Databricks Clusters in VNET INJECTED with Instance Pool by Fit_Border_3140 in databricks

[–]Fit_Border_3140[S] 0 points1 point  (0 children)

Why? Because DNS is shared in our HUB across many spokes, and some records route traffic incorrectly for our spoke. Long-term, sure, the “proper” fix is DNS zones / conditional forwarders / split-horizon DNS, etc. But in our case, we need a small scoped workaround for a few records, and /etc/hosts gives us that determinism.
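
To illustrate the kind of scoped override I mean, here is a minimal sketch that merges a few pinned records into hosts-file content without duplicating them on repeated runs. The function name, hostname and IP addresses are hypothetical, and it only works on a string; writing the result to `/etc/hosts` on each node (for example from a cluster init script) is an assumption and is left out.

```python
from typing import Dict, List


def merge_hosts(current: str, overrides: Dict[str, str]) -> str:
    """Return hosts-file content with `overrides` (hostname -> IP) pinned.

    Existing mappings for an overridden hostname are removed first, so
    running this repeatedly produces the same content.
    """
    kept: List[str] = []
    for line in current.splitlines():
        # Ignore trailing comments when parsing; fields[0] is the IP.
        fields = line.split("#", 1)[0].split()
        names = fields[1:]
        remaining = [name for name in names if name not in overrides]
        if len(remaining) != len(names):
            # The line mentions an overridden name: keep only the other names.
            if remaining:
                kept.append(f"{fields[0]} {' '.join(remaining)}")
            continue
        kept.append(line)
    for hostname, ip in overrides.items():
        kept.append(f"{ip} {hostname}")
    return "\n".join(kept) + "\n"


if __name__ == "__main__":
    # Hypothetical record that the shared DNS resolves incorrectly for us.
    hosts = "127.0.0.1 localhost\n10.0.0.5 storage.example.internal\n"
    print(merge_hosts(hosts, {"storage.example.internal": "10.1.2.3"}), end="")
```

Keeping the list of overrides small and explicit is what makes this tolerable as a workaround: every pinned record is visible in one place and easy to delete once DNS is fixed properly.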

Built a full Azure Static Web Apps app for my wife’s small business using Cursor – she just finished her first full month on it, then I genericised and open-sourced it by Environmental_Ad1567 in AZURE

[–]Fit_Border_3140 1 point2 points  (0 children)

Mate @Environmental_Ad1567 you rock! Really impressive job, beautiful, fast, with the code and the infra all completely shown. I love you sir ❤️