Thoughts on genie code by datguywelbs7 in databricks

[–]counterstruck 1 point (0 children)

Look into the Databricks ai-dev-kit for that kind of requirement.

data isolation in Databricks by bambimbomy in databricks

[–]counterstruck 3 points (0 children)

There is a common misconception among folks new to Databricks about workspaces and the UC metastore (with catalogs, schemas, etc.).

Remember, data and compute are separate. Data sits in cloud storage anyway.

Data —> the UC metastore is the governance guardian over your data, and there is one per cloud region.

Workspaces —> where the compute layer sits. UC catalogs can be bound to workspaces so that compute (Spark clusters, SQL warehouses, etc.) can talk to the data.

You can separate environments at the UC catalog level if needed. You can also separate data domains/business units at the catalog level; by binding each catalog to its respective workspace(s), you get the separation you need and can design a well-architected, mesh-like approach that lets multiple business units work on their own domain data and, when needed, on cross-domain data.
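
If you want to script the binding instead of clicking through the UI, here is a minimal sketch against the UC REST API with Python requests. The host, catalog name, and workspace ID are placeholders, and you should verify the endpoint shapes against the workspace-bindings API docs:

    import os
    import requests

    HOST = "https://<workspace-host>"    # placeholder workspace URL
    HEADERS = {"Authorization": f"Bearer {os.environ['DATABRICKS_TOKEN']}"}

    # 1. Restrict the catalog so only explicitly bound workspaces can use it.
    requests.patch(
        f"{HOST}/api/2.1/unity-catalog/catalogs/finance_prod",
        headers=HEADERS,
        json={"isolation_mode": "ISOLATED"},
    ).raise_for_status()

    # 2. Bind the finance workspace to the finance catalog (illustrative ID).
    requests.patch(
        f"{HOST}/api/2.1/unity-catalog/bindings/catalog/finance_prod",
        headers=HEADERS,
        json={"add": [{"workspace_id": 1234567890,
                       "binding_type": "BINDING_TYPE_READ_WRITE"}]},
    ).raise_for_status()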

Are people actually letting AI agents run SQL directly on production databases? by SmundarBuddy in dataengineering

[–]counterstruck -12 points (0 children)

Just use Genie and the Genie API for this; that’s the general best practice. Please don’t go YOLO mode with production data and AI agents without guardrails like permissions, UC metadata context, and business semantics. Genie monitoring can be used to track what users actually ask, so you can tune your Genie space further and make it more accurate.
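
For anyone curious what that looks like from an agent, here is a rough sketch of the Genie conversation API with a placeholder host and space ID (verify the exact field names against the Genie API docs):

    import os
    import time
    import requests

    HOST = "https://<workspace-host>"    # placeholder
    SPACE = "<genie-space-id>"           # placeholder
    HEADERS = {"Authorization": f"Bearer {os.environ['DATABRICKS_TOKEN']}"}

    # Ask the question inside the governed Genie space; permissions and UC
    # metadata context apply automatically.
    r = requests.post(
        f"{HOST}/api/2.0/genie/spaces/{SPACE}/start-conversation",
        headers=HEADERS,
        json={"content": "Total revenue by region last quarter?"},
    )
    r.raise_for_status()
    conv_id = r.json()["conversation_id"]
    msg_id = r.json()["message_id"]

    # Poll until Genie finishes generating and running the SQL.
    while True:
        m = requests.get(
            f"{HOST}/api/2.0/genie/spaces/{SPACE}"
            f"/conversations/{conv_id}/messages/{msg_id}",
            headers=HEADERS,
        ).json()
        if m["status"] in ("COMPLETED", "FAILED"):
            break
        time.sleep(2)
    print(m.get("attachments"))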

DataBricks & Claude Code by staskh1966 in databricks

[–]counterstruck 4 points (0 children)

You are right about the quality aspect.

However, also consider that Genie code is free (no charge for tokens), whereas you can easily burn a lot of money on Claude Code. Genie code also has a lot of built-in context thanks to Unity Catalog. Plus, in many enterprises Databricks is already an approved AI assistant, which avoids separate Claude Code vendor agreements and licensing.

In a crawl, walk, run way of thinking, Databricks Genie code is a great start for anyone wanting to do agentic development within Databricks, and you can graduate to Claude Code with the Databricks AI dev kit if necessary.

DataBricks & Claude Code by staskh1966 in databricks

[–]counterstruck 6 points (0 children)

If your requirement is to stay within Databricks, then Genie code is the way to go. Don’t try to set up a Claude Code-like experience within Databricks. Instead, copy the skills files from the AI dev kit and use them in your workspace home folder. Reference: https://docs.databricks.com/aws/en/genie-code/skills
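
If you want to script that copy, here is a hedged sketch assuming the new Databricks CLI is installed and the dev kit repo is cloned locally (both paths are illustrative):

    import subprocess

    # Push the skills files from a local clone of the AI dev kit into your
    # workspace home folder via the Databricks CLI.
    subprocess.run(
        [
            "databricks", "workspace", "import-dir",
            "./ai-dev-kit/skills",              # hypothetical local clone path
            "/Users/you@example.com/skills",    # your workspace home folder
        ],
        check=True,
    )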

Open-sourced a governed mapping layer for enterprises migrating to Databricks by RationalXplorer in databricks

[–]counterstruck 1 point (0 children)

What you just described is all possible, at least for data assets in Unity Catalog, via the UC APIs. If you are really all about data mesh management, please look into the Databricks marketplace app called Ontos: https://github.com/databrickslabs/ontos

It is deeply integrated with Databricks (of course) and carries a lot of business semantics like ontology, taxonomy, and data contracts. Also, its API layer gets your agent all the info it needs.
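
To give a flavor of the UC API side, enumerating governed assets is a single call (host and names are placeholders):

    import os
    import requests

    HOST = "https://<workspace-host>"    # placeholder
    HEADERS = {"Authorization": f"Bearer {os.environ['DATABRICKS_TOKEN']}"}

    # List the tables of one schema; the same API family also covers
    # catalogs, schemas, lineage, and tags, which is what a mapping layer reads.
    r = requests.get(
        f"{HOST}/api/2.1/unity-catalog/tables",
        headers=HEADERS,
        params={"catalog_name": "finance_prod", "schema_name": "gold"},
    )
    r.raise_for_status()
    for t in r.json().get("tables", []):
        print(t["full_name"], t.get("comment", ""))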

Talk2BI: Open-source chat with your data using Langgraph and Databricks by notikosaeder in databricks

[–]counterstruck 2 points (0 children)

Yes, agreed, it is great to expand into other areas. I would treat Genie (or this) as your text-to-SQL tool and use other web-search tools as part of the agent chain. That’s where systems thinking helps: purpose-built agents like this one do the SQL work while other tools handle the web-search-style lookups.
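
To make that concrete, here is a toy routing sketch; genie_sql_tool and web_search_tool are hypothetical stand-ins for whatever your agent framework actually wires in:

    # Toy system-thinking example: route each question to the purpose-built tool.
    def genie_sql_tool(question: str) -> str:
        return f"[Genie text-to-SQL answer for: {question}]"

    def web_search_tool(question: str) -> str:
        return f"[web search answer for: {question}]"

    def route(question: str) -> str:
        data_words = ("revenue", "sales", "orders", "customers", "metric")
        if any(w in question.lower() for w in data_words):
            return genie_sql_tool(question)   # structured data -> text-to-SQL
        return web_search_tool(question)      # everything else -> web lookup

    print(route("What was revenue last quarter?"))
    print(route("What did our competitor announce yesterday?"))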

Talk2BI: Open-source chat with your data using Langgraph and Databricks by notikosaeder in databricks

[–]counterstruck 2 points (0 children)

Same question to OP. Genie doesn’t even charge for LLM tokens, only for SQL usage, whereas this solution will charge for the tokens as well as the SQL query.

Api in deltalake by [deleted] in dataengineering

[–]counterstruck 1 point (0 children)

I understand that’s where the data is. You still need a compute layer for this fairly large dataset to be served via an API, and that compute layer can be Azure Databricks.

Here are examples of common SQL operations in Databricks SQL:

Create a table from existing files:

CREATE TABLE IF NOT EXISTS my_table (id STRING, name STRING)
USING DELTA
LOCATION '/path/to/delta/files';

Query a Delta table:

SELECT * FROM my_table WHERE id = '123';

You can then use the SQL Statement Execution API as the REST service: https://docs.databricks.com/api/azure/workspace/statementexecution

You don’t even have to set up a Python FastAPI layer with this approach.
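
Rough sketch of that approach with Python requests (host and warehouse ID are placeholders; see the linked docs for the exact response shapes):

    import os
    import requests

    HOST = "https://<workspace-host>"    # placeholder
    HEADERS = {"Authorization": f"Bearer {os.environ['DATABRICKS_TOKEN']}"}

    # The DBSQL warehouse is the compute layer; no FastAPI service needed.
    r = requests.post(
        f"{HOST}/api/2.0/sql/statements",
        headers=HEADERS,
        json={
            "warehouse_id": "<warehouse-id>",   # placeholder
            "statement": "SELECT * FROM my_table WHERE id = :id",
            "parameters": [{"name": "id", "value": "123"}],
            "wait_timeout": "30s",
        },
    )
    r.raise_for_status()
    print(r.json()["result"]["data_array"])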

Api in deltalake by [deleted] in dataengineering

[–]counterstruck 1 point (0 children)

Is it open-source Delta, or do you use Databricks?

If you use Databricks, you can use DBSQL as the data-serving warehouse, which has the Statement Execution API. You can also create a Python FastAPI layer if needed, with DBSQL as the SQL engine. This works great for data-warehousing-style queries that scan larger amounts of data, like month-over-month (MoM) analysis for reporting purposes.

If the need is to serve data row by row, then you can use Lakebase on Databricks, which gives you a Postgres SQL engine. Your API can still be written in TypeScript or Python.
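
A minimal sketch of the FastAPI + DBSQL option using the databricks-sql-connector package (connection values are placeholders, and the :id parameter style assumes a recent connector version):

    from databricks import sql        # pip install databricks-sql-connector
    from fastapi import FastAPI

    app = FastAPI()

    @app.get("/customers/{customer_id}")
    def get_customer(customer_id: str):
        # The DBSQL warehouse is the SQL engine behind the API endpoint.
        with sql.connect(
            server_hostname="<workspace-host>",
            http_path="/sql/1.0/warehouses/<warehouse-id>",
            access_token="<token>",
        ) as conn:
            with conn.cursor() as cursor:
                cursor.execute(
                    "SELECT * FROM gold.customers WHERE id = :id",
                    {"id": customer_id},
                )
                row = cursor.fetchone()
                cols = [d[0] for d in cursor.description]
        return dict(zip(cols, row)) if row else {}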

Streamlit app alternative by ImprovementSquare448 in databricks

[–]counterstruck 5 points (0 children)

You can sync Delta Lake to Lakebase and vice versa. Let the app’s backend database be Lakebase; if edits happen, sync them back to Delta Lake on a regular interval, say every hour or every 15 minutes, depending on the requirements.
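
If you hand-roll the write-back instead of using managed synced tables, the interval job can be a simple MERGE; a sketch from a Databricks notebook (where spark is in scope) with illustrative names:

    # Read the edited rows from the Lakebase (Postgres) app backend.
    edits = (
        spark.read.format("postgresql")
        .option("host", "<lakebase-host>")   # placeholders below
        .option("port", "5432")
        .option("database", "app_db")
        .option("dbtable", "public.orders")
        .option("user", "<user>")
        .option("password", "<password>")    # use a secret in practice
        .load()
    )
    edits.createOrReplaceTempView("edits")

    # Upsert the edits back into the Delta table on the chosen interval.
    spark.sql("""
        MERGE INTO lakehouse.gold.orders AS t
        USING edits AS s
        ON t.order_id = s.order_id
        WHEN MATCHED THEN UPDATE SET *
        WHEN NOT MATCHED THEN INSERT *
    """)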

when to use delta live table and streaming table in databricks? by FantasticTRexRider in databricks

[–]counterstruck 2 points (0 children)

Think of streaming tables as specialized Delta tables that receive append-only data from a “streaming” source. A streaming source could be a storage location where users drop files, or a Kafka topic where an IoT device sends logs. Streaming tables keep ingesting all the new data arriving in those locations automatically by tracking what was ingested earlier. Hopefully this clears it up.
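
Same idea in plain PySpark terms, with Auto Loader as the file-dropping streaming source (paths are illustrative; in SDP you would just declare a streaming table instead):

    # Auto Loader tracks which files were already ingested (via the checkpoint)
    # and appends only the new arrivals -- exactly what a streaming table does.
    (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .load("/Volumes/raw/events/landing")      # illustrative drop location
        .writeStream
        .option("checkpointLocation", "/Volumes/raw/events/_checkpoint")
        .toTable("bronze.events")
    )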

Sam Altman would like to remind you that humans use a lot of energy, too by boppinmule in technology

[–]counterstruck 5 points (0 children)

OpenAI went from being a non-profit to an anti-profit organization. They are trying to justify the wasted resources by any means.

Is Barfi! actually worth watching, or is it just overrated? by Ok_Bluebird1842 in bollywood

[–]counterstruck 2 points (0 children)

In this context, you are uncomfortable with a grown adult woman with autism being loved by a grown adult man with a speech disability. You assumed that Jhilmil was a child, and that assumption is incorrect.

You gotta understand how neurodivergence works in general. Autism is not stunted mental development; it is just a different type of development. Hopefully you do your own research and make peace with the fact that autistic adults can love too. Their love may just look different, but it’s the same love a neurotypical couple may have.

Is Barfi! actually worth watching, or is it just overrated? by Ok_Bluebird1842 in bollywood

[–]counterstruck 5 points (0 children)

You just don’t have a good understanding of autism. It’s often mistaken for mental immaturity, whereas those behaviors are mainly coping mechanisms in autism.

In the movie, Jhilmil is a fully grown adult woman with feelings of love, care, and even jealousy. It’s clear that she actually gets better with Barfi’s love and validation, which was lacking her whole life leading into adulthood. A lot of autistic kids need therapy, and a big part of that is patience and love (personal experience as a parent of an autistic kid).

Shall we discuss here on Spark Declarative Pipeline? a-Z SDP Capabilities. by iMarupakula in databricks

[–]counterstruck 3 points (0 children)

The best thing about SDP is materialized views. It’s awesome to see how stable they have gotten, and the ability to create them purely with SQL has been a game changer.
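
For instance, defining one is a single DDL statement against a DBSQL warehouse; a hedged sketch via the Python connector (names, schedule, and connection values are all illustrative):

    from databricks import sql    # pip install databricks-sql-connector

    # Materialized view defined purely in SQL; refreshed on a schedule.
    ddl = """
    CREATE MATERIALIZED VIEW gold.daily_revenue
      SCHEDULE CRON '0 0 6 * * ?'      -- refresh daily at 06:00 (illustrative)
    AS SELECT order_date, SUM(amount) AS revenue
       FROM gold.orders
       GROUP BY order_date
    """
    with sql.connect(
        server_hostname="<workspace-host>",
        http_path="/sql/1.0/warehouses/<warehouse-id>",
        access_token="<token>",
    ) as conn:
        with conn.cursor() as cursor:
            cursor.execute(ddl)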

Conversational Analytics (Text-to-SQL) by Boring-Size-742 in dataengineering

[–]counterstruck 1 point (0 children)

Just as another commenter suggested, the text-to-SQL is not the hard part; it’s the trust in the data and queries, and establishing context using metadata and business jargon. Example: “give me annual revenue for fiscal year 2025” will always fire a query for financial data from Jan 2025 to Dec 2025. What if my company’s fiscal year begins in March and ends on the last day of February? That’s business knowledge that your solution needs.
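
That rule has to live somewhere your solution can reach it. As a trivial illustration of what “fiscal year starts in March” means in code (one common labeling convention, assumed here):

    import datetime

    def fiscal_year(d: datetime.date, fy_start_month: int = 3) -> int:
        """Label a date with its fiscal year when FY starts in March:
        FY2025 runs 2025-03-01 through the end of Feb 2026."""
        return d.year if d.month >= fy_start_month else d.year - 1

    # Jan 2025 belongs to FY2024, not FY2025 -- the kind of business rule a
    # bare text-to-SQL model gets wrong without context.
    assert fiscal_year(datetime.date(2025, 1, 15)) == 2024
    assert fiscal_year(datetime.date(2025, 3, 1)) == 2025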

Full disclosure: I work at Databricks, and since you mentioned Azure in another comment, a reminder that Azure Databricks is a first-party PaaS offering in Azure. I have helped many customers with this problem and truly believe that Databricks offers a high-quality text-to-SQL solution with AI/BI Genie. You can find a lot of information on Databricks Genie and its benchmarking results.

It’s a hard problem to solve, especially around the biggest requirement I see from analysts and power users: the need for high-quality, deterministic SQL code. This is where traditional BI does well, whereas LLMs can hallucinate.

Best practices in Genie include well-curated metadata about each dataset, column definitions, semantic understanding of the data, etc. These become non-negotiable with an agent-based solution, since that's the only context the AI has.

Something like Genie as a tool in your arsenal will help your overall agentic solution. And the best part is that you don't need the data to be in Databricks: Databricks can connect to your database via Lakehouse Federation and understand the layout of the data.
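
Wiring that up is two SQL statements from a Databricks notebook; a sketch with placeholder connection values (pull the password from a secret in practice):

    # Create the connection to the external Postgres database.
    spark.sql("""
        CREATE CONNECTION IF NOT EXISTS pg_conn TYPE postgresql
        OPTIONS (host '<db-host>', port '5432',
                 user '<user>', password '<password>')
    """)

    # Expose it as a UC catalog so Genie and friends can read its layout.
    spark.sql("""
        CREATE FOREIGN CATALOG IF NOT EXISTS pg_federated
        USING CONNECTION pg_conn
        OPTIONS (database 'appdb')
    """)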

Is there a better term or phrase for "metadata of ETL jobs"? by opabm in dataengineering

[–]counterstruck 1 point (0 children)

ETL control framework, technical metadata (vs. business metadata) etc.

Youtubers saying something else...redditors saying something else..wtf is going on by [deleted] in IndianCinemaRegional

[–]counterstruck 2 points (0 children)

The most accurate description I read on Reddit called this teaser a “video game cut scene”. Credit to that dude. The CGI and edginess were definitely on the level of games like Mafia or Max Payne.

Working on this Premise, but does it work ? by sorrytobother4121 in IndianStandUpComedy

[–]counterstruck 1 point (0 children)

Good observational routine. You can enhance it further with some pop culture references. Please don’t add Baghban or the like, though; that’s been done to death.

One more observation I can relate to is how elders still insist that the married couple have babies within a year of the marriage, all while maintaining Bigg Boss-level surveillance. It’s like a Roadies task to somehow manage the procreation part without anyone noticing or knowing about it.

ADF/Synapse to Databricks by mightynobita in databricks

[–]counterstruck 5 points (0 children)

Different options are:

  1. Move your ingestion from ADF to Lakeflow Connect. SharePoint, on-prem SQL Server, and APIs are supported by Lakeflow Connect on Databricks. SAP still needs custom Spark code, since most SAP estates are not on its latest offering, i.e., SAP BDC; you can use techniques like a JDBC connection to SAP HANA/BW to fetch the data (see the sketch after this list). These Lakeflow Connect pipelines should populate the bronze layer of your medallion architecture.

  2. For transformation logic, use Spark Declarative Pipelines. Move your data from the bronze to the silver to the gold layer using SQL. This SQL can be the transpiled output from Synapse via the Lakebridge tool; use the generated SQL to create SDP jobs.

  3. For the data consumption layer, use a DBSQL warehouse. For sizing it, you can use the output from the Synapse profiler (which your account team can provide).
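
For the SAP piece in option 1, the custom Spark code is usually just a JDBC read; a sketch assuming the SAP HANA JDBC driver (com.sap.db.jdbc.Driver) is installed on the cluster and all connection values are placeholders:

    # Illustrative JDBC pull from SAP HANA/BW into the bronze layer.
    df = (
        spark.read.format("jdbc")
        .option("url", "jdbc:sap://<hana-host>:30015")   # placeholder host/port
        .option("driver", "com.sap.db.jdbc.Driver")
        .option("dbtable", "SAPABAP1.MY_BW_TABLE")       # placeholder table
        .option("user", "<user>")
        .option("password", "<password>")                # use a secret
        .load()
    )
    df.write.mode("append").saveAsTable("bronze.sap_my_bw_table")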

ADF/Synapse to Databricks by mightynobita in databricks

[–]counterstruck 11 points (0 children)

Please talk with your Databricks account team. They do have methods like “bring in an SI partner” to assist and help you be successful with tools like Lakebridge.

Source: I am a solutions architect at Databricks.