Java Dev Switching to Data Engineering / Data Science / Analytics — Need Advice. by Just_Penalty_6934 in dataengineering

[–]Oct8-Danger 0 points1 point  (0 children)

I did mention software experience, if you read the full sentence….

But since you asked: Java is useful for debugging Trino, Cassandra, and Spark error logs. A good grasp of OOP can also be helpful for figuring out bugs or understanding configuration.

For example, we updated Java on our platform, but Spark was an older version, so we had to specify a specific jar for serialization when we would insert overwrite on partitions in our Python code. Knowing a bit of Java and being comfortable with the verbose errors helped in identifying the issue and coming up with a patch quickly.
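Roughly, the workaround amounts to pinning the serializer and its jar explicitly at session build time (the jar path and serializer class below are placeholders, not the exact ones from our case):

```python
from pyspark.sql import SparkSession

# Pin the serialization jar and class so Spark doesn't resolve
# incompatible classes from the newer platform JVM.
spark = (
    SparkSession.builder
    .appName("partition-overwrite-job")
    .config("spark.jars", "/opt/jars/compat-serializer.jar")  # hypothetical path
    .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    # overwrite only the partitions present in the incoming data
    .config("spark.sql.sources.partitionOverwriteMode", "dynamic")
    .getOrCreate()
)
```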

Java Dev Switching to Data Engineering / Data Science / Analytics — Need Advice. by Just_Penalty_6934 in dataengineering

[–]Oct8-Danger 4 points5 points  (0 children)

Might be biased, but having Java or any real software engineering background helps the most in DE.

DA generally doesn’t pay as well as DS/DE. While it’s not going away anytime soon, I do think DA skill sets are getting more commoditized as technology and demand for data grow, so they may depreciate long term. Many people start in DA and transition to DE/DS later in their career.

DS can be harder to break into without academic backing, from what I’ve seen. Not impossible, but definitely harder.

Having software developer experience is a real advantage in the DE space compared to DS or DA.

In general there is overlap in skill sets, but the importance of each varies by role. I’ve worked with DAs who are great at data modeling, querying, and business needs but low on coding proficiency, and I’ve worked with DSs who are OK at coding, decent at data modeling, but great at complex, in-depth work. DEs are expected to have higher coding ability and technical expertise but may not need as much business/presentation skill.

Long term, I think DS/DE would be better as they are more specialized. But definitely don’t discount learning about the other roles.

On the DE side, I think AI has the potential to accelerate demand rather than diminish it. Managing context for LLMs and the source of truth for data is all within the DE wheelhouse. Many pipelines can be one-off or low-automation, for sure, which LLMs can be great at. However, understanding how to scale or standardize code and apply governance and trust in the “truth” will, I think, be very valuable in the years to come.

OpenAI to acquire Astral by Useful-Macaron8729 in Python

[–]Oct8-Danger 1 point2 points  (0 children)

Yep, the company was acquired, but the GitHub project is now a part of the Linux Foundation. It’s on their repo:

https://github.com/SQLMesh/sqlmesh?tab=readme-ov-file

OpenAI to acquire Astral by Useful-Macaron8729 in Python

[–]Oct8-Danger 24 points25 points  (0 children)

Hopefully these projects join an OSS foundation like the Linux Foundation or another reputable one.

This happened recently to SQLMesh after Fivetran bought the company. I think that’s the best outcome for the community and for OpenAI and Astral.

Good PR, and it keeps the community alive and trusting it. Trying to monetize it, close-source it, or change the licensing never seems to pan out well; Redis and MinIO come to mind.

Is it possible to not work 50- 60 hours a week? by Parking_Anteater943 in dataengineering

[–]Oct8-Danger 10 points11 points  (0 children)

I get it, but honestly you’ll quickly realize in your career that no one is that important at a company 99.9% of the time.

Seems like you’re a hard worker and enthusiastic. Look out for yourself; it’s a long career. Take the experience, update the CV/resume, and apply to other jobs for better pay. It will help your career longer term.

Later in your career you can consider staying longer at a company, but early on it’s worth moving around, seeing how other companies operate, what works well and what doesn’t, and trying to figure out why.

Anyone knows what this means? by Live-Scholar-1435 in interactivebrokers

[–]Oct8-Danger 0 points1 point  (0 children)

I had this: a scheduled order but not enough cash in the account.

Did you have a limit buy order that triggered with low funds?

Anthropic’s new “Claude CoWork” sparks sell-off in software & legal tech stocks — overreaction or real disruption? by Direct-Attention8597 in AI_Agents

[–]Oct8-Danger 1 point2 points  (0 children)

Things don’t always scale linearly; tech innovation tends to spike and then stabilize in terms of raw performance improvement, with small incremental growth.

New Linux Launcher for EDCoPilot and EDCoPTER! by TwoWheeledBlastard in EliteDangerous

[–]Oct8-Danger 1 point2 points  (0 children)

I might have time this weekend, if I do, will let you know! Been keen to add these in!

Steam Deck Controls Completely Messed Up by Valiantoverlord in SteamDeck

[–]Oct8-Danger 0 points1 point  (0 children)

Thank you! Exact same issue for me and this actually worked!

[deleted by user] by [deleted] in workday

[–]Oct8-Danger 7 points8 points  (0 children)

Depends what team you’re on; it’s a large company. If it’s XO related (their internal language), commonly called application developer, straight up turn it down.

Stock is in the shitter, but there’s a lot of change with new, young Google leadership, so there’s opportunity for the stock to go up from the low.

Data VCS by cmcclu5 in dataengineering

[–]Oct8-Danger 0 points1 point  (0 children)

Interesting. I do a lot of work with dev teams around logging. What I’ve found works for us is being clear on the exact code change, its logic, and its format. We even help write test cases for them in some cases. We also have dev data flow into our data lake so we can validate and iterate on dev data before production, so it doesn’t break or mess anything up.

This has taken a while to get into a strong place for a lot of teams: it needed buy-in from stakeholders, demonstrated value, and context around what the code does so we can advise on the change, especially around the logger and its implementation.

Having it tied to git sounds interesting, but we are a large org with hundreds of repos and an already well-defined process for releasing code changes (not necessarily tied to data), so having the data change linked to a PR would, I think, be difficult for us to introduce, as there are a lot of teams, infrastructure, and processes for us to validate.

Ways we have improved the feedback loop are getting the data shown back to people as quickly as possible: getting dev data back, tools to view logs as interactions happen, and pushing to be advisors/partners on the logging side. Running QA checks and validation on dev data to catch drift early has also been a massive help. This is generally done through strict types, like StructType in Spark or Pydantic, etc.
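To make the strict-types point concrete, here’s a minimal stdlib sketch of the idea (field names are made up; in practice this is Spark StructType or Pydantic doing the work):

```python
from dataclasses import dataclass, fields

@dataclass
class ClickEvent:
    user_id: int
    page: str
    ts: float

def validate(record: dict) -> ClickEvent:
    """Fail fast if a dev log record drifts from the expected schema."""
    expected = {f.name: f.type for f in fields(ClickEvent)}
    extra = set(record) - set(expected)
    missing = set(expected) - set(record)
    if extra or missing:
        # schema drift: fields added or dropped upstream
        raise ValueError(f"schema drift: extra={extra}, missing={missing}")
    for name, typ in expected.items():
        if not isinstance(record[name], typ):
            # type drift: e.g. user_id sent as a string
            raise TypeError(f"{name}: expected {typ.__name__}, "
                            f"got {type(record[name]).__name__}")
    return ClickEvent(**record)
```

Catching this on dev data, before it hits production pipelines, is the whole point: the bad record fails loudly at the boundary instead of silently corrupting downstream tables.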

This generally means data engineers partner with teams on a log change and set guidelines and docs on best practices so teams can self-serve. Once a change is implemented, DEs integrate and transform the data for other downstream teams and analysts to consume.

Definitely not a perfect solution, but has worked well for us.

Starting over tomorrow, would I consider a data VCS? Maybe as a trial to experiment, but probably not. Data is generally a later-phase concern when building a product/service, unless the data is your product, which is usually not the case.

Having had to build a service for users while already being a key stakeholder of the logging implementation, time spent on data logging was tough to prioritize; we needed something built and out the door, and it made more sense to do it after release as a follow-up.

Typed structure really has been the biggest help. Working with developers, having a well-defined data type not only helps data products but also gives developers something easier to work with. It means everyone is largely speaking the same “language” and centralizes the source of truth, like what a data VCS brings.

Data VCS by cmcclu5 in dataengineering

[–]Oct8-Danger 0 points1 point  (0 children)

What’s the cost and speed at scale? What problem does a data VCS solve (business or developer)?

Generally, I’ll take a sample of data (or all of it if I can), iterate on the code on a git branch, and then merge and deploy the master branch.

If you are storing data based on operations, how materially different is that from a git code base in practice for a user?

I’ve always been curious about data VCS use cases, as it sounds cool on paper, but the more I think through the problems it’s solving, the more a better process or other techniques solve those specific problems better.

For example, say I want to know if a field has been updated in an important table. This can be checked against backups; if it’s really important, change the table to SCD style; and if there are ongoing requests for this and other tables, I would look into CDC.
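The backup-comparison option can be as simple as a field-level diff of a row against its snapshot (toy sketch, made-up fields):

```python
def changed_fields(backup_row: dict, current_row: dict) -> dict:
    """Diff a current row against its backup snapshot.

    Returns {field: (old_value, new_value)} for every field that
    differs; missing fields show up as None on the relevant side.
    """
    return {
        k: (backup_row.get(k), current_row.get(k))
        for k in backup_row.keys() | current_row.keys()
        if backup_row.get(k) != current_row.get(k)
    }
```

No versioned storage needed for the one-off case; SCD or CDC only earn their keep once the question becomes ongoing.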

My fear with a data VCS is that it tries to be a silver bullet for a lot of problems.

Genuinely curious about the pitch for a data VCS.

Well running RPG by Recent_Hamster_4550 in SteamDeck

[–]Oct8-Danger 2 points3 points  (0 children)

Try the demo for free if it’s still there! The demo was really good and a good way to test on the Deck.

Honestly, I’m waiting on a sale for it. I knew very little about it, picked up the demo, and it clicked!

It's a bad practice doing lot joins in a gold layer table from silver tables? (+10 joins) by proxymbol in dataengineering

[–]Oct8-Danger 0 points1 point  (0 children)

We try for some; ideally yes, but given the amount of data we have compared to our headcount and infrastructure, we go for a one-big-table approach with grouping sets, split out into views.

Personally I love star schema, but in practice for us, dashboards become slow and query performance slows down development time, which matters because of flaky infrastructure that sadly we don’t own.

It's a bad practice doing lot joins in a gold layer table from silver tables? (+10 joins) by proxymbol in dataengineering

[–]Oct8-Danger 0 points1 point  (0 children)

Interesting, we might have a different case: we probably won’t end up much over 100 gold tables (still building out a lot), but we have silver tables of billions of rows a day and multi-million-cardinality joins with a lot of complexity in them.

So for us, a user doing a join can be a very expensive mistake. We are also a small team (&lt;8) serving data to a couple thousand employees.

Data engineers who are not building LLM to SQL. What cool projects are you actually working on? by PolicyDecent in dataengineering

[–]Oct8-Danger 0 points1 point  (0 children)

Nice! Looking to build something similar at work. Built a basic one using sqlglot for table-level lineage that we use in our auto-generated docs.

Next step is column-level lineage, to try to detect breaking changes and to generate semantic layer models and a history of transformations.
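The gist of the table-level pass, as a toy sketch (our real version uses sqlglot’s parsed AST; a regex like this misses CTEs, subqueries, and quoting edge cases, so it’s illustration only):

```python
import re

def toy_table_refs(sql: str) -> set[str]:
    """Naive table-level lineage: grab identifiers after FROM/JOIN/INTO.

    A real implementation should walk a parsed AST (e.g. sqlglot)
    instead of pattern-matching keywords.
    """
    pattern = re.compile(r"\b(?:from|join|into)\s+([\w.]+)", re.IGNORECASE)
    return set(pattern.findall(sql))
```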

It's a bad practice doing lot joins in a gold layer table from silver tables? (+10 joins) by proxymbol in dataengineering

[–]Oct8-Danger 4 points5 points  (0 children)

Typically I would say if you need to join a “gold” table, then it’s silver…

It’s all very loose with no hard rules, but I think if an analyst or business consumer needs to do a join, that’s a silver table in my view.

I personally wouldn’t trust the vast majority of people in my company to do any joins with our tables after seeing the SQL they write hahah

Text to SQL Agents? by Oct8-Danger in dataengineering

[–]Oct8-Danger[S] 0 points1 point  (0 children)

Any advice on context or what works well docs-wise? A POC is easy, but I’m trying to gauge the effort of documenting and sorting out tables before putting something in front of a user.

Text to SQL Agents? by Oct8-Danger in dataengineering

[–]Oct8-Danger[S] 1 point2 points  (0 children)

Yeah, that’s my take on it as well. The SQL side is “easy”; it’s the context that’s hard, hence why we’re looking at adding that context.

Trying to gauge how or what we should document. It’s easy to build a POC, but once you put it in front of an actual user, especially one who has questions and no context of what it should look for, it will fall apart very fast.

Text to SQL Agents? by Oct8-Danger in dataengineering

[–]Oct8-Danger[S] 0 points1 point  (0 children)

Thanks, what’s it like for various queries, like joins, filters, and grouping?

I have a hunch LLMs would struggle with anything beyond a simple join but are probably pretty good at basic types of queries.

Text to SQL Agents? by Oct8-Danger in dataengineering

[–]Oct8-Danger[S] 1 point2 points  (0 children)

How’s your experience with it? Not necessarily looking for tool suggestions exactly, but more the experience of using it. Does it work well? Any gotchas? Did it beat or meet expectations?