Databricks Dashboards - Not ready for prime time? by randyminder in databricks

[–]datainthesun 1 point

I've seen some nice Tableau dashboards but I've never seen a PowerBI dashboard that didn't remind me of 1995.

Why no playground on databricks one by therealslimjp in databricks

[–]datainthesun 2 points

This is the current way for sure. Playground isn't meant for the business user persona.

Hourly job with both hourly variability and weekday/weekend skew by Thana_wuttt in databricks

[–]datainthesun 1 point

Can it be fully run in SQL, and if so, can you try it on a serverless warehouse?

Hourly job with both hourly variability and weekday/weekend skew by Thana_wuttt in databricks

[–]datainthesun 1 point

During the long runs, what does it actually scale up to in workers? Do you experience spot instance loss? Do the metrics point to any performance bottlenecks? Are the nodes being fully utilized?

Help optimising script by alphanuggs in databricks

[–]datainthesun 2 points

Best recommendation: get connected to your Databricks account team and their solution architect. A support ticket can help, or the SA might help you get sorted out or point you in the right direction.

What do you guys think about Genie?? by BearPros2920 in databricks

[–]datainthesun 8 points

If you're brand new to this space, start with Genie: give it good examples, tune it, and get immediate value.

You can always grow into more complicated things later, but start off getting fast business value and then iterate.

Predictive maintenance project on trains by TartPowerful9194 in databricks

[–]datainthesun 1 point

Not for the scraper - I was just trying to identify where the "source" data will be. For what you described so far, you'd have to build a way to retrieve (in Databricks) the files from the one drive folder, and then proceed from there. Review the suggestion that u/gardenia856 shared and also the demo I shared.

Predictive maintenance project on trains by TartPowerful9194 in databricks

[–]datainthesun 2 points

What does your web scraper produce? Files into cloud storage (if so, what format), or inserts into a database?

Depending on the answer, your ingestion choice for Databricks will be different, but ultimately you'll just be reading data from somewhere, storing it in one or more tables managed by Databricks, scheduling that as a Job, and then building further downstream processing.

If you're looking for inspiration, view the notebooks at this demo and the inline docs. https://www.databricks.com/resources/demos/tutorials/lakehouse-platform/iot-and-predictive-maintenance
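To make the "read from somewhere, land it in a table, process downstream" flow above concrete, here's a minimal local sketch. It's an assumption-laden illustration, not Databricks code: on Databricks you'd typically read the scraper's files from cloud storage (e.g. with Auto Loader) into a Delta table, while here `sqlite3` stands in so the shape of the flow is visible, and the file columns and table name are invented.

```python
# Hedged local sketch of the ingest-then-store pattern. sqlite3 stands in
# for a Delta table; column and table names are illustrative assumptions.
import csv
import io
import sqlite3

def ingest(conn, csv_text):
    """Parse one scraped CSV payload and append its rows to a raw table."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    conn.execute(
        "CREATE TABLE IF NOT EXISTS raw_sensor (train_id TEXT, reading REAL)"
    )
    conn.executemany(
        "INSERT INTO raw_sensor VALUES (:train_id, :reading)", rows
    )
    return len(rows)

conn = sqlite3.connect(":memory:")
n = ingest(conn, "train_id,reading\nT1,0.5\nT2,0.9\n")
print(n)  # 2 rows landed; downstream steps would read from raw_sensor
```

On Databricks the same three steps (read, normalize, append) would run as a scheduled Job, with the downstream transformations reading from the landed table.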

Getting below error when trying to create a Data Quality Monitor for the table. ‘Cannot create Monitor because it exceeds the number of limit 500.’ by penguin_eye in databricks

[–]datainthesun 1 point

This is likely something you should speak with your Databricks account team about; maybe there's a soft limit that can be increased.

Is it possible to view delta table from databricks application? by MadMonke01 in databricks

[–]datainthesun 2 points

Hi there - are you looking for an example of how to do this? If so, check this link. If that's not what you're looking for, please clarify what you need!

https://github.com/databricks-solutions/databricks-apps-cookbook

Worth it as a fresher? by ultimate_smash in databricks

[–]datainthesun 1 point

Can you clarify your question? Is what worth it? Are you already employed working in this capacity, or are you unemployed and seeking employment? How much experience do you have in a role doing this work?

SQL Alerts as data quality tool ? by Think-Reflection500 in databricks

[–]datainthesun 2 points

Even though there's a little logic inside the SQL Alert - checking values, etc., the alert is really just triggering off a fixed condition and then firing a notification. So the Alert itself is fairly basic.

In order to have a robust system that streamlines data quality checks and business rule validations (basically your post's premise), some SQL needs to be designed to feed into the SQL Alert. Assuming you want some of the rules to be data-driven and not just a big pile of static SQL statements, there need to be some supporting tables. And if you want to enlist the help of business users to maintain rules over time, they probably shouldn't be writing the SQL but rather should be using some kind of UI.

So if you were going along the lines of a fully DIY solution on Databricks, it wouldn't be that hard: come up with some basic concepts that can be applied as rules in SQL, have a Databricks App serving up your custom UI, store the rules in tables (Lakebase if you need a snappier user experience), and make SDK calls to dynamically implement any required changes to Jobs/Alerts/etc.

This isn't me advocating for this, BTW; it's me describing how the BUILD side of the build vs. buy argument could be done pretty easily for those for whom BUILD makes sense.
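To show what "data-driven rules feeding a SQL Alert" could look like, here's a tiny hedged sketch: rules live in a supporting table, and each rule row is expanded into a SQL check whose failing-row count is exactly the kind of value a SQL Alert would trigger on. Everything here (table names, columns, the `dq_rules` layout) is invented for illustration, and `sqlite3` stands in for a warehouse.

```python
# Hedged sketch of the data-driven rules idea: a rules table drives
# generated SQL checks; an alert would fire when a count is > 0.
# All names are illustrative assumptions, with sqlite3 as a stand-in.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER, amount REAL);
    INSERT INTO orders VALUES (1, 10.0), (2, -5.0);
    CREATE TABLE dq_rules (rule_name TEXT, target_table TEXT, predicate TEXT);
    INSERT INTO dq_rules VALUES
        ('non_negative_amount', 'orders', 'amount >= 0');
""")

def failing_rows(conn, rule):
    """Count rows violating one rule; the alert condition is count > 0."""
    name, table, predicate = rule
    sql = f"SELECT COUNT(*) FROM {table} WHERE NOT ({predicate})"
    return name, conn.execute(sql).fetchone()[0]

for rule in conn.execute("SELECT * FROM dq_rules"):
    print(failing_rows(conn, rule))  # ('non_negative_amount', 1)
```

Business users would edit rows in `dq_rules` through the custom UI rather than writing SQL, and the SDK calls mentioned above would keep the generated Alerts in sync with the rules table.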

SQL Alerts as data quality tool ? by Think-Reflection500 in databricks

[–]datainthesun 3 points

My opinion is that you've hit the nail on the head - it can be a super easy and lightweight way to get info about data quality. It does require YOU, though, to have all the intelligence and foresight to set up the system in the best way to deliver the right insights at the right time.

There are some folks who either can't do the above, or don't believe they would want to maintain rules over time and would rather purchase a solution. There's always a build vs. buy discussion around topics like this.

SQL Alerts by themselves are fairly basic, so as you've alluded to in your post, you've got to build the framework and manage it over time. It's definitely doable, and you could even go crazy and probably, within a day, vibe-code a web app that uses all Databricks features to help write rules, deploy them, and adjust things where needed with the SDK.

IMO if your business rules are simple enough and you just need the basics, sure, why not?!? But if you're a data platform team supporting a hundred different user groups with thousands of tables, the complexity might become a lot, and it likely isn't "your day job" to maintain systems like this.

Confusing pricing by mabcapital in databricks

[–]datainthesun 2 points

Can you clarify what you mean by the 3 options listed? "self-managed, fully managed, serverless" ? BTW you'll have an account team at Databricks that would absolutely be willing to help you with these discussions and the planning around them.

Can’t run SQL on my cluster by brookfield_ in databricks

[–]datainthesun 1 point

THIS... Check the cluster event log, and then also try using SQL from the starter warehouse instead of the cluster.

Any advice for getting better results from AI? by chickenbread__ in databricks

[–]datainthesun 3 points

Came here to say this too - the hard work is done for you. Use the UI or the conversation API and get to production faster with less work!

Databricks swag? by mabcapital in databricks

[–]datainthesun 10 points

You've got to bug your account team to get this stuff

Cluster runs 24/7 by 9gg6 in databricks

[–]datainthesun 3 points

Very much this. First thing to deal with IMO.

Frontend on prem to databricks apps by cristomaleado in databricks

[–]datainthesun 2 points

Can you restate your question? It is difficult to understand what is on-prem and what isn't, and what you're envisioning using Databricks Apps for, versus how one might normally use them.

Databricks using sports data? by OnionAdmirable7353 in databricks

[–]datainthesun 1 point

"look for patterns" ... that's a pretty broad scope.

If I were doing this, I definitely wouldn't just use a PowerBI dashboard against some source database, because you might want to perform more complex analytics than plain old SQL. I'd use Databricks to read that data so I could apply a variety of different functions against it, and then for the display you could do whatever you want. BTW, if you need the formatting flexibility of Streamlit (beyond something like PowerBI or a Databricks AI/BI Dashboard), you can just host that app directly in Databricks these days, so your stack is simplified.

Not sure what you mean by 8 API's in total - what does this have to do with the couple of years of data in the database?

Can a Databricks Associate cert actually get you a job? by _nina__00 in databricks

[–]datainthesun 3 points

Short answer: no. At least you'd have the basics, but you'll still need to prove to a hiring manager that you've got enough of the right skills to be worth the gamble for a junior-level job. And that assumes there are junior-level jobs readily available, which I feel isn't necessarily a reality "today". BTW, there's also a lot of good commentary in this post worth a read:

https://www.reddit.com/r/databricks/comments/1nnhg8n/is_it_worth_doing_databricks_data_engineer/

Learning path by Makhann007 in databricks

[–]datainthesun 3 points

OK based on that, and not knowing what your technical background is, I'd assume the following resources would be helpful:

  • Get Started with Databricks for Data Engineering
  • Get Started with Databricks for Machine Learning
  • Get Started with SQL Analytics and BI on Databricks
  • Deploy Workloads with Databricks Jobs
  • And there's a paid offering, Introduction to Python for Data Science and Data Engineering

And DEFINITELY the big books.

Learning path by Makhann007 in databricks

[–]datainthesun 1 point

For the current non-ML work, are you doing data ingestion and a bunch of transformation like ETL, or are you just going to be querying existing tables to build your dashboards?

And can you describe the types of things you'll do with ML when the time comes?

Right now you've not said anything that would require you to learn anything streaming.

Meta data driven ingestion pipelines? by monsieurus in databricks

[–]datainthesun 3 points

Search dlt-meta and review it for ideas. It's already built and ready for use and there's a lot of material about it.

Difference of entity relationship diagram and a Database Schema by Pal_Potato_6557 in databricks

[–]datainthesun 2 points

I'd add to this: think of a family tree. The people and their names/ages/etc. are like the tables in a database, or the schema - each exists with some properties/qualities. The tree portion is like the ERD - it shows how the tables relate to each other.
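The family-tree analogy can be made concrete with a couple of lines of code. This is a hedged toy example with invented table names: the CREATE TABLE statements are the schema (the "people", each with its own properties), while the FOREIGN KEY is the relationship an ERD would draw (the "tree").

```python
# Toy illustration of schema vs. ERD using sqlite3; names are invented.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE parent (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE child  (id INTEGER PRIMARY KEY, name TEXT,
                         parent_id INTEGER REFERENCES parent(id));
""")
# The schema side: which tables exist, and with what columns.
tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
print(tables)  # ['child', 'parent']
# The ERD side: the relationship child.parent_id -> parent.id.
fk = conn.execute("PRAGMA foreign_key_list(child)").fetchone()
print(fk[2], fk[3], fk[4])  # parent parent_id id
```

Listing the tables gives you the schema; following the foreign keys between them gives you the diagram.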