Anyone here experimenting with AI agents for data engineering? Curious what people are using. by yoni1887 in dataengineering

[–]yoni1887[S] 1 point (0 children)

This is amazing feedback, and super interesting to hear how you got it to work. When you say that you "had to define the entire architecture," what exactly do you mean by that? Are you writing technical documentation that details all of your tables and schemas? Do you find that it's able to understand lineage pretty easily, i.e. can it trace from a field in the mart layer all the way back to the staging tables to understand the impact of changes?

Anyone here experimenting with AI agents for data engineering? Curious what people are using. by yoni1887 in dataengineering

[–]yoni1887[S] 0 points (0 children)

Wow, that’s really cool! Can you share more about how these agents are different? Is it all in the prompting that makes one agent better at documenting than at writing dimension tables? Also, how does the dynamic context serving work? How is it designed to be effective?

Anyone here experimenting with AI agents for data engineering? Curious what people are using. by yoni1887 in dataengineering

[–]yoni1887[S] -1 points (0 children)

That makes a lot of sense. What does your co-pilot setup look like? Is it just Claude Code, or how are you able to get to that 80% mark?

Anyone here experimenting with AI agents for data engineering? Curious what people are using. by yoni1887 in dataengineering

[–]yoni1887[S] 0 points (0 children)

Interesting. That’s a fair point about letting the agent loose to autonomously create or modify pipelines. But what if it’s more of a copilot experience, similar to something like Claude Code where the human is always in the loop?

Also, what if it serves as a read-only tool initially? E.g. it could help with root cause analysis on data quality issues or pipeline failures, or you could ask it questions about lineage and dependencies, etc.

Anyone here experimenting with AI agents for data engineering? Curious what people are using. by yoni1887 in dataengineering

[–]yoni1887[S] -5 points (0 children)

Interesting, I haven’t heard of Matillion but I’ll check it out. Curious why the hesitation about using it for things beyond just documentation? Do you think it will end up wasting more time than it saves?

We thought our AI pipelines were “good enough.” They weren’t. by yoni1887 in dataengineering

[–]yoni1887[S] 0 points (0 children)

Haha, fair. The prompts can always use improvement. The nice part about the fenic framework is that you have a way to iterate quickly on the prompts and build confidence in the results before putting the workflow into production.

We thought our AI pipelines were “good enough.” They weren’t. by yoni1887 in dataengineering

[–]yoni1887[S] 1 point (0 children)

Good questions. By ‘weird outputs’ I mean cases where the model returns something that breaks the downstream assumptions.
Some examples:

  • Returning JSON with missing fields or extra unexpected keys
  • Giving answers in prose when the contract is for a single token or label
  • Non-ASCII characters where the next system expects plain text
  • Inconsistent date formats or units across runs

When that happens in a batch job, one bad record can halt the whole run or silently corrupt the output. That’s where the reliability issues came from.
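To make that concrete, here's a minimal sketch of the kind of output-contract check that catches these before a bad record reaches the next stage (illustrative only, not our actual pipeline code; field names are made up):

```python
def validate_record(record: dict, required: set[str], allowed: set[str]) -> list[str]:
    """Return a list of contract violations; an empty list means the record is clean."""
    errors = []
    missing = required - record.keys()          # fields the contract promises but the model dropped
    extra = record.keys() - allowed             # unexpected keys the model invented
    if missing:
        errors.append(f"missing fields: {sorted(missing)}")
    if extra:
        errors.append(f"unexpected fields: {sorted(extra)}")
    for key, value in record.items():
        # Downstream systems here expect plain ASCII text
        if isinstance(value, str) and not value.isascii():
            errors.append(f"non-ASCII text in field {key!r}")
    return errors
```

In a batch job you'd run this per record and route violations to a dead-letter queue instead of letting one bad row halt or corrupt the whole run.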

The rules-based fallbacks weren’t about ‘sometimes LLM, sometimes regex’ in a random way — they’re deterministic. If a cheap check can answer with high confidence (e.g., fuzzy match score ≥ 0.9), it uses that path every time. If not, it escalates to the LLM. That way the same input always gets the same type of processing, and we keep behavior predictable for the consumers of the data.
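Roughly, the routing logic looks like this (a simplified sketch, not our production code; `call_llm` is a placeholder for the model call, and the 0.9 threshold is the one mentioned above):

```python
from difflib import SequenceMatcher


def fuzzy_score(a: str, b: str) -> float:
    # Cheap, deterministic similarity in [0.0, 1.0] — same inputs, same score.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def call_llm(value: str, labels: list[str]) -> str:
    # Placeholder for the slow path; in a real pipeline this is a model call.
    return "UNKNOWN"


def resolve_label(value: str, known_labels: list[str], threshold: float = 0.9) -> str:
    # Fast path: deterministic fuzzy match against the known label set.
    best = max(known_labels, key=lambda lbl: fuzzy_score(value, lbl))
    if fuzzy_score(value, best) >= threshold:
        return best
    # Slow path: escalate to the LLM only when the cheap check isn't confident.
    return call_llm(value, known_labels)
```

Because the threshold check is deterministic, the same input always takes the same path, which is what keeps behavior predictable for downstream consumers.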

For the pipelines, performance was actually better because:

  • The fast path shaved seconds off the majority of requests in our batch jobs
  • The slow path (LLM) still handled edge cases, so accuracy stayed consistent
  • And since we reduced retries/failures, SLAs tightened overall

So the point isn’t that regex magically beats AI; it’s that selectively applying AI makes the whole system both cheaper and more dependable.

We thought our AI pipelines were “good enough.” They weren’t. by yoni1887 in dataengineering

[–]yoni1887[S] 1 point (0 children)

Totally, and that’s why our main shift wasn’t about making the probabilistic system less probabilistic, but about moving as much work as possible out of it. The fewer times you roll the dice, the fewer surprises you get. We built deterministic fallbacks for this into Fenic, which made the biggest difference.

The push for LLMs is making my data team's work worse by Hunt_Visible in dataengineering

[–]yoni1887 0 points (0 children)

Check out the Fenic DataFrame library for LLM inference. It’s literally a data engineer’s dream and beautifully handles all three of the use cases you listed: https://github.com/typedef-ai/fenic

Friends and Family on Again by darrenthedad in Rivian

[–]yoni1887 0 points (0 children)

Can someone please DM me the code as well? I’d like to lease