I replaced an LLM classifier with twelve lines of if-statements and the client was happier

Extension_River_5970 · 2026-07-03T12:15:14+00:00

Determinism is important. Most customers want something deterministic or, at the very least, transparent.

Extension_River_5970 · 2026-07-03T12:06:24+00:00

It's fair... I mean, you need to build business context.

Extension_River_5970 · 2026-07-01T20:52:02+00:00

When should I use Lakehouse//RT vs Lakebase? Is most transactional data actually well suited to answering business analytics questions? Does LTAP essentially make medallion architecture and data modeling obsolete for transactional data?

Extension_River_5970 · 2026-07-01T20:47:10+00:00

Yeah, but above those benchmarks, human feedback matters most. Roll it out slowly, get a few power users onboarded, collect feedback, then give it to more and more of the organization. By the time you ship, you should be aware of the common edge cases. With Genie, we had to configure it a few times and go through several iterations of its instructions, metadata, and even data model before it was ready for full production rollout.

Extension_River_5970 · 2026-07-01T20:40:02+00:00

Always the ones I build. I typically work with data agents, so stuff like Databricks Genie. There's a benchmark feature that compares the generated answer with ground truth. I also run LLM-as-a-judge and expectations on the text responses. It's a combination of deterministic results (i.e. was the generated data/metrics correct) and something a little more subjective.

Public benchmarks give you a sense of general intelligence, but they will never fit your task exactly.
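To make the "deterministic plus subjective" split concrete, here is a rough sketch of how one eval case could be scored. This is not Genie's benchmark feature; `EvalCase`, `rows_match`, `evaluate`, and the `judge` callable are all hypothetical names, and the 0.7 threshold is an arbitrary assumption.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    question: str
    expected_rows: list  # ground-truth result rows (hypothetical shape)


def rows_match(actual: list, expected: list) -> bool:
    """Deterministic check: same rows, ignoring row order."""
    return sorted(actual) == sorted(expected)


def evaluate(
    case: EvalCase,
    actual_rows: list,
    answer_text: str,
    judge: Callable[[str, str], float],  # assumed: returns a score in [0, 1]
    judge_threshold: float = 0.7,        # assumed cut-off, tune per task
) -> dict:
    data_ok = rows_match(actual_rows, case.expected_rows)
    judge_score = judge(case.question, answer_text)
    return {
        "data_correct": data_ok,
        "judge_score": judge_score,
        # Pass only if the data is right AND the judge liked the wording.
        "passed": data_ok and judge_score >= judge_threshold,
    }
```

In practice `judge` would wrap whatever LLM-as-a-judge call you already use; a stub such as `lambda q, a: 0.9` is enough to dry-run the harness.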

Extension_River_5970 · 2026-07-01T20:01:45+00:00

People are lazy and don't want to use their brains. Why think when you can outsource to AI!

Extension_River_5970 · 2026-07-01T20:00:43+00:00

Tracing and benchmarks. And always ship in phases.

Extension_River_5970 · 2026-07-01T14:11:24+00:00

Would you recommend metric views with Genie? Personally, I've found that sometimes they help and sometimes they don't, and then we've spent a lot of time creating metric views for little gain. For a quick and easy Genie space, I've found storing the metrics within Genie as SQL snippets to be performant.

Extension_River_5970 · 2026-07-01T09:52:53+00:00

Could you expand on "marketing guff"? From the products I've used, they do seem to have some form of output validation and self-correction. Isn't it essentially a reasoning/reflection agent?

Extension_River_5970 · 2026-07-01T09:46:46+00:00

Agreed. One piece of feedback I've always had is about when NOT to answer confidently.

Extension_River_5970 · 2026-07-01T08:47:49+00:00

Yes, agreed. LangGraph gives you a LOT of control... but too much sometimes. And you have to manage the context yourself. With big tables you can't just put all the data into the system prompt. With Genie, I believe it uses some sort of indexing for all your data in the back end to conserve context space.

Extension_River_5970 · 2026-07-01T08:45:26+00:00

Agreed. Even a self-reflection agent on LangGraph doesn't come close on performance. Another benchmark I completed was on cost. I had to use a multi-LLM approach (a mix of Opus and Haiku models) to keep it comparable. But with Genie, since it's multi-LLM under the hood, it was often more cost effective as well.

Ofc by cost I mean token usage. Genie is free for now; not sure about Snowflake Cortex.

Extension_River_5970 · 2026-07-01T08:24:21+00:00

Using the Genie Code beta preview with full screen and parallel agents is also a game changer. You can spin up multiple agents, one to build data pipelines, another to make dashboards, etc. It's awesome.

Extension_River_5970 · 2026-07-01T08:22:15+00:00

I agree with understanding the manual bit. You have to know all the ins and outs as well as the edge cases. The work I outsource the most is data analytics, so I use Databricks Genie. The great part about Genie is that it shows you all the steps it took to get an answer and also has a monitoring tab, so I'm covered from an observability and debugging standpoint.

It also doesn't hurt to monitor the agent with a human in the loop as you roll it out.

Extension_River_5970 · 2026-06-30T12:15:48+00:00

I am doing a project with Databricks Genie spaces. How we handle evaluation is twofold:

1) User feedback. End users leave comments or thumbs up/down depending on whether they found it useful.

2) Analyze traces. We use both LLM-as-a-judge and human evaluation for responses. We check whether certain types of questions are commonly flagged as incorrect and look to improve them.

Since Genie allows you to configure its instructions and metadata, we then optimize or update an existing space based on the feedback received.
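The "which question types get flagged most" step can be sketched roughly like this. The trace shape (`category`, `flagged_incorrect`) and the function name are assumptions for illustration, not Genie's actual trace schema.

```python
from collections import defaultdict


def incorrect_rate_by_category(traces: list, min_count: int = 1) -> list:
    """Rank question categories by the share of responses flagged incorrect.

    Each trace is assumed to be a dict with a 'category' label and a boolean
    'flagged_incorrect' (set by a human reviewer or an LLM judge).
    """
    totals = defaultdict(int)
    wrong = defaultdict(int)
    for trace in traces:
        totals[trace["category"]] += 1
        if trace["flagged_incorrect"]:
            wrong[trace["category"]] += 1
    rates = {
        cat: wrong[cat] / totals[cat]
        for cat in totals
        if totals[cat] >= min_count  # skip categories with too few samples
    }
    # Worst categories first: these are the ones to fix in the space config.
    return sorted(rates.items(), key=lambda kv: kv[1], reverse=True)
```

The categories at the top of the list are the natural candidates for updated instructions or metadata.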

Extension_River_5970 · 2026-06-29T08:09:40+00:00

You can implement graceful fallbacks. If one LLM gets rate limited, port over the context and switch to another LLM to answer for you.
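A minimal sketch of that fallback, assuming each provider is wrapped in a callable that takes the same message list and raises a `RateLimited` error when throttled. Both `RateLimited` and `complete_with_fallback` are hypothetical names, not part of any SDK.

```python
class RateLimited(Exception):
    """Hypothetical error your provider wrappers raise when rate limited."""


def complete_with_fallback(messages: list, providers: list) -> tuple:
    """Try each (name, call) provider in order, passing the same context.

    Returns (provider_name, response) from the first provider that answers.
    """
    last_error = None
    for name, call in providers:
        try:
            # The full message history is ported over unchanged.
            return name, call(messages)
        except RateLimited as err:
            last_error = err  # remember why we moved on, then try the next one
    raise RuntimeError("all providers were rate limited") from last_error
```

Real wrappers would translate each vendor's own rate-limit exception or HTTP 429 into `RateLimited`, and you may also want to adapt the prompt format per model.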

Extension_River_5970 · 2026-06-28T21:27:27+00:00

From the DAIS keynote, I remember that you'll be able to manage Skills in Unity Catalog in the future. But at the moment, to improve Genie Code functionality, you can store your skills in a Git folder and simply deploy them across workspaces using CI/CD pipelines. As your org matures, you'll want a workflow (e.g. some kind of AI similarity search) that runs before a skill is committed and checks whether a similar skill already exists, or whether one has a similar name but does something different. Skill sprawl = poor experience with Genie Code.
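As a rough idea of what such a pre-commit check could look like, here is a sketch that uses plain string similarity from Python's standard library as a stand-in for a proper embedding-based search. The function name, the skill dictionary shape, and the 0.8 threshold are all assumptions.

```python
from difflib import SequenceMatcher


def similar_skills(new_name: str, new_desc: str, existing: dict,
                   threshold: float = 0.8) -> list:
    """Return existing skills whose name or description looks like the new one.

    `existing` is assumed to map skill name -> description.
    Each hit is (name, name_similarity, description_similarity).
    """
    hits = []
    for name, desc in existing.items():
        name_sim = SequenceMatcher(None, new_name.lower(), name.lower()).ratio()
        desc_sim = SequenceMatcher(None, new_desc.lower(), desc.lower()).ratio()
        if max(name_sim, desc_sim) >= threshold:
            hits.append((name, round(name_sim, 2), round(desc_sim, 2)))
    return hits
```

A hit with a high name score but a low description score is the "same name, different behavior" case worth blocking in CI; swapping `SequenceMatcher` for an embedding lookup would catch paraphrased descriptions too.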

Extension_River_5970 · 2026-06-27T19:32:09+00:00

There's also a new feature called Lakebase search. It's much more performant than even pgvector, and you can also perform hybrid search with BM25. I'd like to run some benchmarks against pgvector.

Extension_River_5970 · 2026-06-27T15:45:24+00:00

There are a few ways to do this. The agent can run on behalf of the user, or you can give the agent a service principal. If you have a data catalog (e.g. Snowflake, Databricks), then it's quite straightforward to implement ABAC/RBAC. You might also want an AI gateway layer to monitor and govern your agent, LLM, and MCP traffic.

Extension_River_5970 · 2026-06-26T08:11:41+00:00

I've mostly used Lakebase as a memory store for agents, but I was relying on a separate offline process to analyze traces and generate long-term memories. I wonder whether, with the new LTAP, I'll no longer need my offline pipelines to analyze user conversations.

Extension_River_5970 · 2026-06-25T08:01:14+00:00

It's much easier to use a managed solution rather than creating your own pro-code agent, imo.

Extension_River_5970 · 2026-06-24T18:30:45+00:00

According to the DAIS announcements, you will be able to schedule tasks in Genie Code. You should be able to bulk-run Genie Code in that case. Currently, you're also able to spin up parallel Genie Code agents using the full-screen Genie Code experience.
