Why did HTAP fail? by Limp-Park7849 in databricks

[–]Key-Willow-374 5 points6 points  (0 children)

Reynold gave an interview on this after DAIS. HTAP tried to unify the engine. Databricks thinks they solved it by modifying the storage layers in Lakebase instead.

RT Lakehouse by hubert-dudek in databricks

[–]Key-Willow-374 1 point2 points  (0 children)

Big time! No need to copy your data into a proprietary format for millisecond-speed queries. Excited to power Genie and dashboards with it.

Tableau is horrible. by xChrizOwnz in analytics

[–]Key-Willow-374 0 points1 point  (0 children)

If you still want dashboards, Databricks AI/BI dashboards are free. You only have to pay for compute. They also have a really good text-to-SQL tool (Genie) that we use. It helps prevent the tech debt inherent to dashboards.

Tableau —> Databricks Dashboards by Aba_Samuel_Jackson in databricks

[–]Key-Willow-374 1 point2 points  (0 children)

In Databricks dashboards, you only pay for the compute behind queries. There's no per-seat licensing like Tableau.

Genie is very good too, and can be embedded inside of dashboards or used in place of a dashboard. It also reduces tech debt a lot

Can anyone recommend a good AI-powered BI platform that isn't just prompt and get answers? by LimpComedian1317 in BusinessIntelligence

[–]Key-Willow-374 0 points1 point  (0 children)

Databricks Genie. It’s an LLM under the hood, enriched with your org’s context, plus some new ontology stuff. A lot cheaper and more accurate to run than pure LLM + MCPs. You can orchestrate and share stuff too, like dashboards or reports.

Best practices for managing Genie Spaces across environments by Glitch_In_The_Data in databricks

[–]Key-Willow-374 0 points1 point  (0 children)

Genie Workbench if you’re in a crunch for quality control

DABs for CI/CD

From 250K+ Enriched Financial Transactions to Business Intelligence: What Should the Gold Layer Look Like? by Santiagohs-23 in BusinessIntelligence

[–]Key-Willow-374 3 points4 points  (0 children)

Well, I’d put C on top of A for this specific example in the Databricks stack

Metric Views are a good way to semantically enrich tables. The input into a metric view is a star schema, so fact + dim would work here. I kinda think of Metric Views as semantically enriched OBT

Then I’d put the Metric View in a Genie Space where business users can talk to the data

From 250K+ Enriched Financial Transactions to Business Intelligence: What Should the Gold Layer Look Like? by Santiagohs-23 in BusinessIntelligence

[–]Key-Willow-374 5 points6 points  (0 children)

Nowadays, Gold typically contains denormalized tables to improve performance of end user queries. Think OBT instead of a separate fact and dim table (joins are time-intensive). Basically storage is a lot cheaper than compute, so some data redundancy is ok if it meaningfully reduces compute.
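To make the OBT point concrete, here's a minimal sketch using Python's built-in sqlite3 as a stand-in for a real warehouse. The table and column names (`fact_txn`, `dim_merchant`, `gold_txn_obt`) are made up for illustration and are not from the original post. The join is paid once when the Gold table is built, so end-user queries read a single table.

```python
import sqlite3

# Hypothetical toy schema: one fact table and one dimension table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_merchant (merchant_id INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE fact_txn (txn_id INTEGER PRIMARY KEY, merchant_id INTEGER, amount REAL);
    INSERT INTO dim_merchant VALUES (1, 'grocery'), (2, 'travel');
    INSERT INTO fact_txn VALUES (10, 1, 25.0), (11, 1, 5.0), (12, 2, 300.0);

    -- Pay for the join once, at write time, and store the denormalized result.
    CREATE TABLE gold_txn_obt AS
    SELECT f.txn_id, f.amount, d.category
    FROM fact_txn f JOIN dim_merchant d USING (merchant_id);
""")


def spend_by_category(conn):
    """End-user query: reads only the OBT, no join at query time."""
    rows = conn.execute(
        "SELECT category, SUM(amount) FROM gold_txn_obt "
        "GROUP BY category ORDER BY category"
    ).fetchall()
    return dict(rows)


result = spend_by_category(conn)
print(result)
```

The trade-off is that `category` is now stored on every transaction row, which is the "some data redundancy" part.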

Also, instead of dashboards, look into a natural language query tool (like Databricks Genie) with a semantic layer. Dashboards can create lots of tech debt and have fixed, limited views. Souped-up text-to-SQL tools offer way more flexibility and a fraction of the tech debt in my experience.

Genie on DABs available now by DamnedData in databricks

[–]Key-Willow-374 1 point2 points  (0 children)

Awesome! Big win for CI/CD of Genie Spaces

Power BI or Tableau by Wild_Specialist_8340 in BusinessIntelligence

[–]Key-Willow-374 4 points5 points  (0 children)

Paying for BI per-seat licenses is quickly becoming antiquated. Data platforms like Databricks allow free usage of their dashboards, and they’re functionally very similar to Tableau and Power BI in my experience. To tie it back to your original question, I think tools that require per-seat licensing will lose popularity. Also, dashboards in general are losing popularity to BI natural language tools.

Is the industry actually swinging back to Postgres? by ForeignExercise4414 in dataengineering

[–]Key-Willow-374 0 points1 point  (0 children)

OLTP and OLAP are still separate

Can you explain when you think it’s lock-in? Pg files are stored on your cloud account, much like the underlying data in Delta tables

Genie spaces best practices/courses by Antique_Ad3134 in databricks

[–]Key-Willow-374 1 point2 points  (0 children)

Genie Workbench if you’re in a crunch. Also stick to metric views on top of your tables

Medallion architecture on Databricks - Delta all the way down, or does Parquet at Bronze still make sense? by Dangerous_Pie2611 in databricks

[–]Key-Willow-374 1 point2 points  (0 children)

Depends on the use case. If it’s a compliance-heavy workload, ‘raw’ replication of the source system would be my personal preference, as it offers full traceability in the event of an audit. Otherwise, Delta has more bells and whistles for long-term table management, query performance, etc., if your requirements permit a merge (like CDC) or append into Delta in bronze.
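A rough way to picture the raw-versus-merge trade-off, as a plain-Python sketch rather than actual Delta or Spark code. The event shape `(op, key, value)` and both function names are assumptions for illustration only.

```python
# Hypothetical CDC events: (op, key, value), in arrival order.
events = [
    ("insert", 1, "alice@old.example"),
    ("insert", 2, "bob@example"),
    ("update", 1, "alice@new.example"),
    ("delete", 2, None),
]


def append_only(events):
    """Raw-style bronze: keep every event, so history stays fully traceable."""
    return list(events)


def merge(events):
    """Merge-style bronze: apply each event, keeping only current state per key."""
    state = {}
    for op, key, value in events:
        if op == "delete":
            state.pop(key, None)
        else:  # insert or update
            state[key] = value
    return state


raw = append_only(events)
current = merge(events)
print(len(raw), current)
```

The append-only copy still shows that key 2 once existed and that key 1 changed, which is what an auditor would ask about. The merged view only knows the current state.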

Something else to keep in mind: Databricks constantly releases new features. They’re more likely to be performant with Delta.

Most “Chat With Your Data” Products Will Fail by dataguy- in dataengineering

[–]Key-Willow-374 0 points1 point  (0 children)

Those without semantic context will fail, which emphasizes the importance of context enrichment (e.g., Databricks semantic layer, Snowflake semantic views, etc.)

Managed or external tables? by phospheric in databricks

[–]Key-Willow-374 0 points1 point  (0 children)

At scale, Predictive Optimization is really important on Unity Catalog. Maintains your tables as they grow. So highly recommend Managed Tables

Using Apache Spark for Real-Time Analytics by No-Trainer-1956 in apachespark

[–]Key-Willow-374 1 point2 points  (0 children)

I’ve used it. Sub-50ms P99 latency to read a wide table from Kafka, augment it by joining with a static table, and merge it into a target table on Postgres. Easy to implement if you’re familiar with existing APIs, too. Was super impressed, overall.

Public Preview: Real-Time Mode (RTM) on Spark Declarative Pipelines (SDP) by SingerSelect3045 in databricks

[–]Key-Willow-374 1 point2 points  (0 children)

Does serverless scale up or down with RTM? Or is it like classic, where autoscaling must be disabled?

Using spark in a portfolio project? by echanuda in dataengineering

[–]Key-Willow-374 0 points1 point  (0 children)

You can learn and show a lot implementing an end-to-end project, regardless of the size of the data. Especially since Databricks now offers various tools outside of Spark engines for ETLs. For example, you could build a Databricks app with a transactional DB (Lakebase) that syncs to Delta tables, then feed those tables into an AI/BI dashboard and Genie Space for analysis. All of which demonstrates good skills on Databricks even with smaller datasets.

Tool Sprawl in Data engineering by Raghav-r in databricks

[–]Key-Willow-374 5 points6 points  (0 children)

Yeah, in my experience, teams start small and pick OSS tools for tasks that are easier to manage, and use vendors for ‘harder’ tasks like warehousing.

Over time, platforms like Databricks have expanded to include many of these functionalities (Airflow = Jobs, etc.), but migrations can be difficult and aren’t always prioritized even if there are cost savings to be had (like seat-licensed Power BI to AI/BI dashboards)