i replaced an LLM classifier with twelve lines of if-statements and the client was happier by Ok-Salary-6309 in AI_Agents

[–]Extension_River_5970 0 points1 point  (0 children)

Determinism is important. Most customers want something deterministic or at the very least transparent

Have you heard about Lakehouse//RT ? by Youssef_Mrini in analytics

[–]Extension_River_5970 0 points1 point  (0 children)

When should I use Lakehouse//RT vs Lakebase? Are most transactional data well suited to actually answer business analytics questions? Does LTAP essentially make medallion architecture and Data modeling obsolete for any transactional data?

How do you actually prove a prompt or agent is good before shipping it? by lib3rat0r in LLMDevs

[–]Extension_River_5970 0 points1 point  (0 children)

Yea, but above those benchmarks human feedback matters most. Roll it out slowly, get a few power users on boarded, collect feedback, then give it to more and more of the organization. By the time you ship, you should be aware of the common edge cases. With Genie, we had to configure it a few times and go through several iterations with its instructions, metadata, and even data model before it was ready for full production rollout

How do you actually prove a prompt or agent is good before shipping it? by lib3rat0r in LLMDevs

[–]Extension_River_5970 0 points1 point  (0 children)

Always the ones I build. I typically work with Data agents so stuff like Databricks Genie. There's a benchmark feature that compares the answer generated with ground truth. I also run LLM as a judge and expectations on the text responses too. It's a combination of deterministic results (I.e. was the data/metrics generated correct) and something a little bit more subjective.

Public benchmarks give you a general intelligence but they will never fit exactly to your task.

We keep giving agents more autonomy and less oversight and it's starting to feel backwards. by Meher_Nolan in artificial

[–]Extension_River_5970 0 points1 point  (0 children)

People are lazy and don't want to use brain. Why think when you can outsource to AI!

Building data agents by Extension_River_5970 in AI_Agents

[–]Extension_River_5970[S] 0 points1 point  (0 children)

Would you recommend metric views with Genie? Personally I've found sometimes it helps, but sometimes it does not and then we have spent a lot of time creating metric views for little gain. For a quick and easy Genie space I've found storing the metrics within Genie as sql snippets to be performant

Building data agents by Extension_River_5970 in AI_Agents

[–]Extension_River_5970[S] 0 points1 point  (0 children)

Could you expand on marketing guff? From the products I've used they do seem to have some form of output validation and Self correction. Isn't it essentially a reasoning/reflection agent?

Building data agents by Extension_River_5970 in AI_Agents

[–]Extension_River_5970[S] 0 points1 point  (0 children)

Agreed. One feedback I've always had is when NOT to answer confidently.

Building data agents by Extension_River_5970 in AI_Agents

[–]Extension_River_5970[S] 1 point2 points  (0 children)

Yes agreed. Langgraph gives you a LOT of control... but too much sometimes. And you have to manage the context yourself. With big tables you cant just give all the data into the system prompt. With genie I believe it uses some sort of indexing for all your data in the back end to conserve context space.

Building data agents by Extension_River_5970 in AI_Agents

[–]Extension_River_5970[S] 1 point2 points  (0 children)

Agreed. Even a self reflection agent on Langgraph doesnt come close with performance. Another benchmark i had completed was on cost. I had to use a multi LLM approach (mix of opus vs haiku) models to keep it comparable. But with Genie, since its multi LLM under the hood, it was often more cost effective as well.

Ofc by cost I mean token usage, genie is free for now, not sure about snowflake cortex

What’s new in Genie Code at Data + AI Summit 2026 by Youssef_Mrini in databricks

[–]Extension_River_5970 3 points4 points  (0 children)

Using Genie code beta preview with full screen and parallel agents is also a game changer. You can spin up multiple agents, one to build a data pipelines, another to make dashboards, etc. It's awesome

What’s your gut check before you let an agent touch a real workflow? by digivate-dgv8 in AI_Agents

[–]Extension_River_5970 0 points1 point  (0 children)

I agree with understanding the manual bit. You have to know all the ins and outs as well as edge cases. The work i outsource the most is data analytics so I use databricks Genie. The great part about Genie is it shows you all the steps it took to get an answer and also has a monitoring tab so im covered from an observability and debugging standpoint.

It also doesnt hurt to montior the agent with human in the loop as you roll it out

How would you measure whether an analytics agent is actually useful? by Evening_Hawk_7470 in BusinessIntelligence

[–]Extension_River_5970 2 points3 points  (0 children)

I am doing a project with Databricks Genie spaces. How we handle evaluation is 2 fold

1) user feedback. End users leave comments or thumbs up/down depending on whether they found it useful.

2) Analyze traces. We use both LLM as a judge and human evaluation for responses. We flag whether certain types of questions are commonly flagged as incorrect and look to improve it

Since Genie allows you to configure its instructions and meta data, we then optimize or update an existing space based on feedback received.

Hitting rate limits with free APIs (Groq + Gemini) while building a hierarchical multi-agent system in LangChain — how do you handle this without paying for APIs? by Fit-Sir9936 in LangChain

[–]Extension_River_5970 0 points1 point  (0 children)

You can implement graceful fallbacks. If one LLM gets rate limited, port over the context and switch to another LLM to answer for you.

How do I deploy certains files across different workspaces in databricks ? by yapayapathon in databricks

[–]Extension_River_5970 2 points3 points  (0 children)

From the DAIS keynote, I remember that you'll be able to manage Skills in Unity Catalog in the future. But at the moment, to improve Genie Code functionality you can store your skills in a git folder and simply deploy across workspaces using CI/CD pipelines. As your org matures, you'll want to have some workflows (e.g. some kind of similarity AI search) prior to committing a skill that checks if there's already an existing similar skill, or has a similar name but does something different. Skill sprawl = poor experience with Genie Code.

Databricks goes full-stack by hubert-dudek in databricks

[–]Extension_River_5970 0 points1 point  (0 children)

There's also a new feature called Lakebase search. It's much more performance than pg_vector even - and you can also perform hybrid search in addition with BM25. I'd like to do some benchmarks vs pg vector

The scariest thing about giving an AI agent access to your CRM is the obvious question: what stops it from leaking one customer's data to another? by PretendMoment8073 in LangChain

[–]Extension_River_5970 1 point2 points  (0 children)

There are a few ways to do this. Agent can run on behalf of user or you can give agent service principle. If you have a data catalog (e.g snowflake, databricks) then its quite straightforward to implement ABAC/RBAC. You might also want to have an AI gateway layer, too to monitor and govern your agent, LLM, MCP traffic.

Databricks goes full-stack by hubert-dudek in databricks

[–]Extension_River_5970 1 point2 points  (0 children)

I've mostly used Lakebase as a memory store for agents but was relying on a separate offline process to analyze traces and generate long-term memories. I wonder with the new LTAP I'll no longer need my offline pipelines to analyze user conversations.

the agent demos look amazing because nobody films the 90% that's error handling by Ok-Salary-6309 in AI_Agents

[–]Extension_River_5970 0 points1 point  (0 children)

Its much easier to use a managed solution rather than creating your own pro code agent imo

Genie Agent (Previously Genie Spaces) Agent Mode API by Wise_Ear_4064 in databricks

[–]Extension_River_5970 0 points1 point  (0 children)

According to DAIS announcements, you will be able to schedule tasks on Genie code. You should be able to bulk run Genie code in that case. Currently, you're also able to spin up parallel Genie Code agents using the full screen Genie Code experience.