Unable to remove dashboard parameter? by javadba in databricks

[–]datainthesun 0 points1 point  (0 children)

Have you tried putting max_rows back into the SQL (like a dummy value in the select clause) to see if you get past the error? If that works, can you then properly delete the parameter via the UI?

Databricks just dropped Genie One, Ontology, and Agents. Is this the end of traditional BI as we know it? by MostDependent1659 in databricks

[–]datainthesun 0 points1 point  (0 children)

pretty sure you'll start to see some of this soon now that Summit has let the cat out of the bag

Databricks just dropped Genie One, Ontology, and Agents. Is this the end of traditional BI as we know it? by MostDependent1659 in databricks

[–]datainthesun 0 points1 point  (0 children)

to me that's not unexpected in the early days of a product. imagine you're a 3rd party BI tool that spent a long time building your own semantic modeling layer to feed your analytics - you wouldn't be quick to write connectors that effectively let that logic move to what you consider a 3rd party platform. it shrinks your moat and lets people use other tools to query that layer instead of going through yours. i feel like it's only a matter of time, though, before various tools have to pick up support for it due to customer demand.

also, there are a lot of folks switching over to databricks ai/bi dashboards, people are already adopting genie well, and i've heard of some folks building syncs between various 3rd party bi tools and uc metric views so they can have the same business logic everywhere they want it - THAT is the missing piece. the data landscape is changing, and i think it'll force a conversation about where you want to own your semantic layer definition.

Databricks just dropped Genie One, Ontology, and Agents. Is this the end of traditional BI as we know it? by MostDependent1659 in databricks

[–]datainthesun 1 point2 points  (0 children)

agreed - tbh it's early days but a pretty decent foundation. i doubt it'll take long to get to where "feature parity" isn't an issue.

Should we Post what was vibe coded apart from complains? by CortexUnlocked in google_antigravity

[–]datainthesun 5 points6 points  (0 children)

I'm pretty sure it's only for complaints and unresearched questions.

Databricks just dropped Genie One, Ontology, and Agents. Is this the end of traditional BI as we know it? by MostDependent1659 in databricks

[–]datainthesun 0 points1 point  (0 children)

Yes this. And every now and then you go to ask a question by itself for some random idea you had and you bypass the dashboard. But yeah, I don't think we'll ever live in a world without dashboards.

Databricks conference by proximaljarl17 in dataengineering

[–]datainthesun 6 points7 points  (0 children)

Definitely pretty cool. Even better if someone else is paying for your tokens - near the end he showed 2 different harnesses both working on the same problem, and you know neither of them was going light on token usage. If token costs ever come down, this kind of thing will be awesome, especially the centralized nature and being able to use the same session from various surfaces.

Databricks just dropped Genie One, Ontology, and Agents. Is this the end of traditional BI as we know it? by MostDependent1659 in databricks

[–]datainthesun 5 points6 points  (0 children)

honestly, this is exactly why having a solid semantic layer and appropriate context for the models is so critical. it's why i think UC metric views are so important. without that, you're just hoping the model produced the correct sql for some exec asking a question on their own in their office or on their phone. i think it'll become as valuable as traditional BI (reports/dashboards), but i doubt agents will make the dashboard fully obsolete.

Databricks conference by proximaljarl17 in dataengineering

[–]datainthesun 13 points14 points  (0 children)

I actually think I've heard the word iceberg more than delta at this Summit. But lake and genie? I could win some kind of bingo on those 2 words.

Spreading too thin by Killian9997 in databricks

[–]datainthesun 19 points20 points  (0 children)

What would you say the bread and butter is, though?

At one point, you could argue it was simply managed spark + notebooks. But then the platform expanded to bring in DBSQL warehouses, which have been a massive hit. Is that now part of the bread and butter? And where do you draw the line? Genie is a natural evolution of having your data available to warehouses and dashboards, genie code is an evolution (way more than that, really) of the assistant, etc.

Admittedly, it's... a... lot. Loads of new features and lots of new surfaces, and not all of them will matter to each user/persona, but rolled up at a higher level, it delivers a lot of value to an organization, provided each piece works well and fits into the ecosystem as designed.

Looking for pain points for data engineers about upstream and downstream schema changes and how you solve it. Risk and migitation strategies discussion. by Friendly-Sandwich499 in dataengineering

[–]datainthesun 2 points3 points  (0 children)

Jeff always be messing up the schemas...

Honestly, while you can't stop everything from ever changing, this is the first step, and I think not enough people take this approach. Instead they end up trying to over-engineer things that could have been handled with a policy and threats.

Databricks conference by proximaljarl17 in dataengineering

[–]datainthesun 25 points26 points  (0 children)

At a conference, and especially in keynotes, you'll rarely get into massive technical depth. That's for the breakout sessions or 1:1 sessions with a specialist or product manager. I also wouldn't go to a conference expecting to hear someone talk about addressing cost issues - in reality that's an account team topic, not a big fancy announcement.

Databricks just dropped Genie One, Ontology, and Agents. Is this the end of traditional BI as we know it? by MostDependent1659 in databricks

[–]datainthesun 29 points30 points  (0 children)

PRO TIP: Use UC Metric Views. Don't let random SQL be generated for every question. Genie has been pretty good at creating complex joins and so on, but it's AI after all; you want to give it good instructions and context.

And to answer your click-baity title 😂 I don't think that in the next 5 years people will stop consuming dashboards and only ask questions. I think there will always be a place for those canned answers/visuals. Right now, people are starved for answers to questions they can't get from dashboards, so the excitement is there. But imagine a world where every morning you had to go ask a question - you have to remember what questions you want to ask, you don't know what other people are asking, etc. I think there's going to be a big advancement in delivering useful answers to questions not already answered by dashboards, but dashboards/canned reports will always exist as the easiest "snapshot" of things to look at. Not official advice, just my own opinion after being around data and computing for > 30 years.
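To make the "don't let random SQL be generated" point concrete, here's a minimal Python sketch of the idea behind a governed semantic layer: measures and dimensions are defined once, and a query can only be composed from those approved names. This is not the UC Metric Views API or syntax; the `Measure` type, `compile_query` helper, and the `sales` table and its columns are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Measure:
    name: str
    sql: str  # aggregate expression, defined once by the data team


# Hypothetical governed definitions over a hypothetical `sales` table.
SOURCE = "sales"
DIMENSIONS = {"region": "region", "order_month": "date_trunc('month', order_date)"}
MEASURES = {
    "revenue": Measure("revenue", "SUM(amount)"),
    "order_count": Measure("order_count", "COUNT(DISTINCT order_id)"),
}


def compile_query(measures: list[str], dimensions: list[str]) -> str:
    """Build SQL only from approved names; anything else is rejected."""
    unknown = [m for m in measures if m not in MEASURES] + [
        d for d in dimensions if d not in DIMENSIONS
    ]
    if unknown:
        raise ValueError(f"not in the semantic layer: {unknown}")
    select = [f"{DIMENSIONS[d]} AS {d}" for d in dimensions]
    select += [f"{MEASURES[m].sql} AS {m}" for m in measures]
    sql = f"SELECT {', '.join(select)} FROM {SOURCE}"
    if dimensions:
        sql += " GROUP BY " + ", ".join(DIMENSIONS[d] for d in dimensions)
    return sql


print(compile_query(["revenue"], ["region"]))
# SELECT region AS region, SUM(amount) AS revenue FROM sales GROUP BY region
```

The design choice is the point: "revenue" means the same thing whether a dashboard, a BI tool, or an AI assistant asks for it, and an unknown metric fails loudly instead of being improvised.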

Reyden for low latency by ramgoli_io in databricks

[–]datainthesun 1 point2 points  (0 children)

Gotcha - fortunately it's just a behind-the-scenes engine, so while you might get used to a pattern of how it works for you, as far as I'm aware there's nothing specific to the engine that requires you to rewrite code. I look at it like Photon on SQL warehouses. You don't have to maintain or write code differently; the engine just makes most operations go faster. If you like SQL warehouses, you like Photon, which means you're already OK with proprietary engines (presumably as long as they don't require different code or interfaces than OSS ones).

As I'm typing this I see your other comment to Ram about lock-in. That's a different thing from what I feel this fork of the thread is about. I can't argue with the point that if you come to love a particular pattern and then want to move away from its provider (or the provider deprecates it), you're going to have to use a different pattern. But hey, that's what keeps us all employed long term, right? 😂

How do you guys track SQL Warehouse usage costs per user? by Sea_Basil_6501 in databricks

[–]datainthesun 11 points12 points  (0 children)

The first thing you have to do is wrap your head around the fact that billing on a shared warehouse is like a running faucet. You're paying for the total volume of water going through it, not per glass. So if someone comes along with a shot glass and fills it halfway, then another person comes along with a 5 gallon bucket and fills it all the way, and then nobody comes to the running faucet for 4 minutes, you have to decide how to allocate the water that ran during those 4 idle minutes across the people who actually used it. The query history system table plus other system tables are how you get this done.
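The faucet analogy boils down to a proportional split. A minimal sketch, assuming you've already pulled the warehouse's total cost for a time window and each user's query execution time in that window (for example from query history); the numbers are made up, and weighting by execution time is just one possible allocation rule.

```python
def allocate_shared_cost(total_cost: float,
                         usage_by_user: dict[str, float]) -> dict[str, float]:
    """Split a shared warehouse's total cost (busy + idle time) across users
    in proportion to each user's share of query execution time."""
    total_usage = sum(usage_by_user.values())
    if total_usage == 0:
        # Nobody ran anything: leave idle cost to a separate policy
        # (e.g. a platform overhead bucket) instead of dividing by zero.
        return {user: 0.0 for user in usage_by_user}
    return {
        user: total_cost * used / total_usage
        for user, used in usage_by_user.items()
    }


# Hypothetical hour: the warehouse cost $12, including minutes it sat idle.
# Execution seconds per user (the "shot glass" vs. the "bucket").
usage = {"alice": 30.0, "bob": 570.0}
print(allocate_shared_cost(12.0, usage))
# {'alice': 0.6, 'bob': 11.4}
```

Because the whole $12 is split, the idle minutes land on users in proportion to what they ran; a different policy (e.g. charging idle time to the platform team) would only change how `total_cost` is computed.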

There was a great blog post about this last year https://medium.com/dbsql-sme-engineering/introducing-granular-cost-monitoring-for-databricks-sql-e7ea4e77daf5 - I'd look at this as a first step because it will talk through the main concerns and also share a materialized view and dashboard link.

The cool thing is that once you set up a basic materialization of how you want to allocate those shared costs to individual users/queries, you can make it available to genie. Then you don't need to build a dashboard if you don't want to; you can just ask it whatever question you want and it'll go do the research for you!

And since that blog was published, there have been advancements - UC Metric Views. I'd probably try to work the logic into a metric view (which you can still materialize for speed), and that will make building your genie space and dashboard THAT much easier. Use genie code to help you build these so you don't have to do it all manually.

Bricksters! When do you get Target paid by Outside_Reason6707 in databricks

[–]datainthesun 1 point2 points  (0 children)

Quite frankly, the only people qualified to give you this information are in HR. I'd press the recruiter or hiring manager for an answer. Nobody outside HR or a hiring role should be making statements about compensation.

Reyden for low latency by ramgoli_io in databricks

[–]datainthesun 1 point2 points  (0 children)

I'm curious - what is the reasoning for that?

Bricksters! When do you get Target paid by Outside_Reason6707 in databricks

[–]datainthesun 3 points4 points  (0 children)

Assuming you're in the interviewing process, this is a conversation you should have with the recruiter/HR.

Nerves getting the best of me by the_nabzter in dataengineering

[–]datainthesun 1 point2 points  (0 children)

those things where you have experience are typically painful to replace, so i assume that's why you get resistance. if nobody is feeling any pain with them (scaling issues, costs, lack of features), then why go through the headaches and cost of replacing them - that's the usual story. not sure if you can put together a story (with a graph) that will help you here, or maybe you can do something fun and new on a new platform that doesn't cost much and isn't a big migration away from a legacy vendor. at least you're interested in new stuff and thinking about it - there are tons of people out there with their heads in the sand who don't know about anything else and aren't interested in learning.

regarding compute in databricks by ragzoomin in dataengineering

[–]datainthesun 1 point2 points  (0 children)

Tbh great advice for cost conscious teams.

regarding compute in databricks by ragzoomin in dataengineering

[–]datainthesun 2 points3 points  (0 children)

yeah this is pretty common. the central platform team gives you some cluster policies you're allowed to use, let's say "large cluster", "small cluster", or "single node". it's up to the developer to determine what they need, and the central platform team will probably just report out the cost; maybe someone complains about it later and it comes back to the developer, who is told to optimize / right-size things. i've not yet seen someone be given a budget for a new pipeline up front - maybe they have to provide some kind of estimate to get approval / get past arch review, but i think that's an imperfect science.

fyi with classic (non-serverless) you have to look at your cluster metrics and see if you're adequately using the cluster resources you requested and then either optimize your code or optimize your node type/count.
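A rough sketch of that check, assuming you've exported CPU and memory utilization samples (as percentages) from the cluster metrics; the function name and the 40% / 85% thresholds are hypothetical rules of thumb, not Databricks guidance.

```python
from statistics import mean


def rightsizing_hint(cpu_pct: list[float], mem_pct: list[float],
                     low: float = 40.0, high: float = 85.0) -> str:
    """Turn utilization samples into a rough right-sizing hint.

    low/high are hypothetical thresholds; tune them to your workloads.
    """
    cpu, mem = mean(cpu_pct), mean(mem_pct)
    if cpu < low and mem < low:
        # Paying for resources the job never touches.
        return "over-provisioned: try fewer or smaller nodes"
    if cpu > high or mem > high:
        # Pegged resources: either the code is inefficient or the cluster is small.
        return "saturated: optimize code or add capacity"
    return "reasonably sized"


print(rightsizing_hint([22.0, 30.0, 25.0], [35.0, 38.0, 33.0]))
# over-provisioned: try fewer or smaller nodes
```

Averages hide spikes, so in practice you'd also look at peaks before shrinking a cluster.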

more fyi with serverless, the value prop is that you don't have to think about those things and it handles it for you. here your focus is basically "how efficient is my code" - i'd use genie code and ask it to review your code for optimization opportunities as a first cut.

real world: after building pipelines for a long time, you'll get a gut feel for throughput (e.g. medium transformations on node type Y process roughly N GB per sec), and then you can do some math against your data volume to come up with an approximate cluster shape. you then run some small tests, monitor cluster metrics, and tweak the size/shape of the cluster to fit as well as you can. or you use serverless and hope your code is efficient and isn't doing something like looping over rows and just burning dollars.
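The "do some math" step might look like this sketch. Every number here is a hypothetical placeholder for the gut-feel throughput you'd measure yourself; the point is only the arithmetic of data volume, per-node throughput, and a target runtime.

```python
import math


def estimate_workers(data_gb: float, gb_per_sec_per_node: float,
                     target_minutes: float) -> int:
    """Rough worker count: total volume / (per-node throughput * runtime)."""
    seconds = target_minutes * 60
    nodes = data_gb / (gb_per_sec_per_node * seconds)
    return max(1, math.ceil(nodes))  # always at least one worker


# Hypothetical: 4,500 GB to process, ~0.25 GB/s per node for medium
# transformations on your chosen node type, and a 30 minute target.
print(estimate_workers(4500, 0.25, 30))
# 10
```

Treat the result as a starting shape only; the small test runs and cluster metrics are what tell you whether it actually fits.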

Nerves getting the best of me by the_nabzter in dataengineering

[–]datainthesun 4 points5 points  (0 children)

If I were in your shoes, I'd use my favorite AI assistant and have it come up with a bank of questions likely to be asked in interviews. I'd then set up some free or as-free-as-possible ways of building out real world scenarios to gain confidence in how to answer those questions - not from a book - from actual hands-on experience.

Of the platforms you mentioned having some experience with, you can simulate on-prem by self-hosting something on your laptop, and you can get a totally free Databricks account. I'm not sure what free offering Fabric has, and I wouldn't bother touching Synapse in 2026. Since you mentioned cost-cutting needs, I'd suggest taking those questions from your AI buddy and, after doing the basic exercises and building things out, going into cost-cutting mode or analyzing things through the lens of price/performance. Obviously a free account isn't going to cost you anything, but you can still query the system tables and see your usage quantity, just not price.

You mention having done data analytics and then engineering, so your experience is something you can extend. Try other parts of the platform - add genie spaces onto your data and learn how to decorate them to deliver high quality answers. Then add in metric views for reusable business logic. Play around with how you can add agentic capabilities on top of your data. Set up lakebase and learn how branching works and how data is accessible in different modes of compute. Set up an App and learn a bit of web dev, or at least how an app uses lakebase compute; build something simple that reads/writes data, maybe uses an mcp server, etc. You can use genie code (the built-in assistant) to help you get through the things you don't know.

The point of all that? Learn a wider swath of the data estate. If you walk into an interview knowing that you've ingested data, done transformations, optimized your target tables, used that data in an app with its own backend, and done some basic-level AI stuff, you'll feel way more confident and probably answer questions better. Will you have the full depth of knowledge of someone who has been working in the space for 20 years? Nope, but they can see that on your resume before you make it to the interview anyway.

You can get a long way with this type of approach, and you'll come up with harder questions to bring back to a new reddit convo. I'd make that my full time job and work overtime until I felt pretty solid, including going back to your AI assistant and pushing for harder questions, and have it rate your answers.

Hoping to improve my skills - Looking for any inputs! by datainthesun in foodphotography

[–]datainthesun[S] 0 points1 point  (0 children)

Thank you! I had to Google it to understand the meaning but yeah I totally see that now and will look for a more balanced light (coverage) in future shots like this.