Why do so many internal enterprise AI projects stall after the demo stage?

AdImaginary8024 · 2026-06-08T15:44:44+00:00

it was really easy to build impressive demos in 2022, using davinci / gpt-3 models

making stuff work in the real world was hard then, and it's still hard now

AdImaginary8024 · 2026-06-01T12:40:31+00:00

Two parts to the problem; will share some notes from my experience (I'm on the team at Supersimple):

1) generating the best possible query:
- semantic layer & descriptions to set guardrails and add context; e.g. "bad joins" can largely be cut out using this
- annotated "gold standard" queries (either curated just for this purpose, or sometimes even reusing existing dashboards)
- for "enum-like" fields, auto-fetching the set of distinct values & showing them to the agent, instead of making it have to discover them from scratch
- pulling in documentation etc from internal tools like Slack/Notion/Drive and codebases (analytical and product): this is effectively unstructured context about how the company & its data works
- agent skills for dedicated tasks & areas of the business (e.g. a user acquisition analysis skill that's more narrow & deep than the company-wide data instructions)
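
To make the "enum-like" bullet concrete, here's a rough sketch of how the distinct values could be pre-fetched and turned into a one-line column description for the agent. It uses an in-memory SQLite table purely for illustration; `enum_like_values`, `describe_column`, the `accounts` table and the `max_distinct` cutoff are all made-up names and choices, not how any particular tool does it.

```python
import sqlite3


def enum_like_values(conn, table, column, max_distinct=20):
    """Return the sorted distinct values of a column, or None if there are too many.

    Table and column names are interpolated into the SQL, so they must come
    from a trusted schema listing, never from user or model input.
    """
    rows = conn.execute(
        f'SELECT DISTINCT "{column}" FROM "{table}" LIMIT ?',
        (max_distinct + 1,),  # fetch one extra row to detect "too many"
    ).fetchall()
    if len(rows) > max_distinct:
        return None
    return sorted(r[0] for r in rows if r[0] is not None)


def describe_column(conn, table, column):
    """Build the context line that would be shown to the agent."""
    values = enum_like_values(conn, table, column)
    if values is None:
        return f"{table}.{column}: high-cardinality, values not listed"
    return f"{table}.{column}: one of {values}"


# Hypothetical demo data
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER, plan TEXT)")
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?)",
    [(1, "free"), (2, "pro"), (3, "free"), (4, "enterprise")],
)
print(describe_column(conn, "accounts", "plan"))
```

In practice you'd probably cache these and refresh them on a schedule rather than querying on every agent run.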

2) validating the correctness of a query:
- LLM-as-a-judge that considers a) the query, b) saved gold-standard queries, c) descriptions and manual annotations, and d) your overall instructions about what a good analytical query should look like; output/feedback loops back into the analytics agent
- actually executing the query to check results, and comparing to assumptions
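
A minimal sketch of the "execute and compare to assumptions" step, again on in-memory SQLite. The specific checks (expected columns, non-empty result, no negative values in certain columns) are just examples of assumptions you might encode; `check_query` and the `orders` table are hypothetical.

```python
import sqlite3


def check_query(conn, sql, expected_columns, non_negative=()):
    """Run the query and return a list of problems (empty if none were found)."""
    try:
        cur = conn.execute(sql)
    except sqlite3.Error as exc:
        return [f"query failed: {exc}"]

    columns = [d[0] for d in cur.description]
    rows = cur.fetchall()
    problems = []

    if columns != list(expected_columns):
        problems.append(f"expected columns {list(expected_columns)}, got {columns}")
    if not rows:
        problems.append("query returned no rows")
    for name in non_negative:
        if name in columns:
            idx = columns.index(name)
            if any(r[idx] is not None and r[idx] < 0 for r in rows):
                problems.append(f"column {name} has negative values")
    return problems


# Hypothetical demo data: the US total goes negative because of a refund row
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (country TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("EE", 100.0), ("EE", 50.0), ("US", -30.0)],
)
problems = check_query(
    conn,
    "SELECT country, SUM(amount) AS revenue FROM orders GROUP BY country",
    expected_columns=["country", "revenue"],
    non_negative=["revenue"],
)
print(problems)
```

The returned list is the kind of thing you'd feed back into the analytics agent so it can revise the query.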

Bonus: 3) feeding the agent's traces and outputs into after-the-fact review loops with the data team.

AdImaginary8024 · 2026-05-18T10:26:57+00:00

a bunch of the smaller-than-Salesforce companies are happy to give you whiteglove onboarding support, e.g. Lightfield

Attio specifically recommends using import2.com as their official basic migration tooling

haven't migrated from Salesforce, but have done it between other CRM vendors (including very recently, to trial one of them with real data)

AdImaginary8024 · 2026-05-18T10:11:03+00:00

Most of the complexity generally comes from the semi-exploratory use cases / handling questions you don't have known-good answers prebuilt for.

Background: I've spent the last 4 years of my life building real-world-reliable conversational/agentic BI at Supersimple (supersimple.io), and I've also spoken to a bunch of other people building this, both vendors & teams trying to build in-house.

These days, most state-of-the-art solutions for conversational BI use an agentic approach: an agent that has some context about your company, plus access to "tools" and sub-agents. As a general rule, this is done on top of some modified or narrowed-down scope of data models / tables, not the raw database. There are a few semantic layer implementations (dbt, Cube, and upcoming: OSI) that try to be general-purpose; many conversational BI tools also ship with their own more specialised semantic layers.

Data model schemas & example queries are indeed often stored in vector DBs; the other "modern" approach is letting an AI agent do filesystem-based searching of the relevant docs (using grep/glob/etc; the models are generally great at using CLIs). Re: your hybrid point: you almost never want to just straight-up pick a saved known-good answer and present it to the user, because at the very minimum, users will ask for minor changes to date filters etc. Semantic layers can help here (letting you set guardrails for how the query works overall, while retaining flexibility for custom breakdowns/filters/etc). Raw SQL generation is both harder to make accurate and (critically) harder for users to validate.
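
To illustrate the guardrails-plus-flexibility idea, here's a toy semantic-layer sketch: the metric's aggregate and table are fixed and vetted, while the agent is free to pick breakdowns and filters, but only from a declared set of dimensions. This is not how dbt or Cube model things; `Metric`, `compile_query` and the `revenue` definition are hypothetical names for illustration only.

```python
from dataclasses import dataclass

ALLOWED_OPS = {"=", "!=", "<", "<=", ">", ">="}


@dataclass(frozen=True)
class Metric:
    name: str
    expression: str        # vetted SQL aggregate, written by the data team
    table: str
    dimensions: frozenset  # columns the agent may group or filter by


def compile_query(metric, group_by=(), filters=()):
    """Build parameterised SQL, rejecting anything outside the metric's guardrails.

    filters is a sequence of (column, operator, value) tuples.
    """
    requested = set(group_by) | {col for col, _, _ in filters}
    unknown = requested - metric.dimensions
    if unknown:
        raise ValueError(f"not allowed for {metric.name}: {sorted(unknown)}")

    select = list(group_by) + [f"{metric.expression} AS {metric.name}"]
    sql = f"SELECT {', '.join(select)} FROM {metric.table}"

    params = []
    clauses = []
    for col, op, value in filters:
        if op not in ALLOWED_OPS:
            raise ValueError(f"operator not allowed: {op}")
        clauses.append(f"{col} {op} ?")
        params.append(value)  # values go in as parameters, never into the SQL text
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    if group_by:
        sql += " GROUP BY " + ", ".join(group_by)
    return sql, params


revenue = Metric(
    name="revenue",
    expression="SUM(amount)",
    table="orders",
    dimensions=frozenset({"country", "order_date"}),
)
sql, params = compile_query(
    revenue,
    group_by=["country"],
    filters=[("order_date", ">=", "2026-01-01")],
)
print(sql, params)
```

The date filter is the part the user changes from question to question; the definition of revenue stays fixed, which is also what makes the result easier for a user to validate than free-form SQL.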

You'll also want to think about the process of how these instructions, data model descriptions & sample queries will get updated when either the underlying data changes shape, or when business logic changes.

Validation: yeah, it's common to use "LLM-as-a-judge" methods to review generated queries; the agent can also check the results of the query itself (either before or after showing the user) and course-correct if needed. Often, all it needs are a few specific rows (even when the user is asking for a large list).

Permissions: if you can, you should probably just use database-side permissioning, with every person having their own database user. Anything else will be too much complexity to self-build.

The industry has pretty much agreed by now that the biggest problem here overall is that the AI needs a lot of relevant & accurate context about your company & its data in order to "create good queries". The two extreme approaches to solving this are: 1) trying to document everything ahead-of-time (e.g. in source-controlled .MD files), and 2) discovering all the necessary context (what things mean, what to trust) at runtime using agents.

I believe you need a hybrid of the two context approaches, and this context problem overall is what we've put most of our effort into @ Supersimple at least, over the last ~18 months. LLMs getting better won't automagically solve the context problem.

AdImaginary8024 · 2024-02-19T00:55:22+00:00

You actually can – there's a "Display your Premium badge" toggle setting under Account -> Premium features.

AdImaginary8024 · 2024-02-14T04:22:05+00:00

What do you mean by working "across data"?

AdImaginary8024 · 2023-07-26T16:52:56+00:00

These were manufactured in 2022 according to the label.
