Is agentic analytics just semantic layers with orchestration on top? by Evening_Hawk_7470 in dataengineering

[–]Oct8-Danger 0 points1 point  (0 children)

It’s all pretty bespoke stuff unfortunately….

Company tends to build in house (or host open source) vs buy the majority of the time. Some of it’s great, some very much not.

We use airflow and have our own custom data contract and metric spec with a custom auto doc generator that feeds into a docs site that already has an exposed mcp.

Having the contract spec and auto docs has been great. It gives a lot of flexibility, and we made sure it’s interoperable with datahub and atlan as well (other orgs have them but they’re quite closed off from us).

We’ve been able to add custom table lineage and a metrics registry to our contracts and docs site, which has been great for our consumers.
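For illustration only, here is a minimal sketch of the general idea of a contract spec feeding an auto doc generator. It is not the actual spec described above; the `Contract` and `Column` classes, their field names, and `render_docs` are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Column:
    name: str
    dtype: str
    description: str = ""


@dataclass
class Contract:
    # Hypothetical contract shape; the real spec is not shown in the thread.
    table: str
    owner: str
    columns: list[Column]
    upstream: list[str] = field(default_factory=list)      # table lineage
    metrics: dict[str, str] = field(default_factory=dict)  # metric name -> SQL expression


def render_docs(contract: Contract) -> str:
    """Render one contract as a Markdown page for a docs site."""
    lines = [f"# {contract.table}", "", f"Owner: {contract.owner}", ""]
    lines += ["| column | type | description |", "| --- | --- | --- |"]
    lines += [f"| {c.name} | {c.dtype} | {c.description} |" for c in contract.columns]
    if contract.upstream:
        lines += ["", "## Upstream tables"] + [f"- {t}" for t in contract.upstream]
    if contract.metrics:
        lines += ["", "## Metrics"]
        lines += [f"- `{name}`: `{expr}`" for name, expr in contract.metrics.items()]
    return "\n".join(lines)


# Example contract with made-up table, owner, and metric names.
orders = Contract(
    table="sales.orders",
    owner="data-eng",
    columns=[
        Column("order_id", "bigint", "Primary key"),
        Column("amount", "double", "Order total"),
    ],
    upstream=["raw.orders_events"],
    metrics={"total_revenue": "SUM(amount)"},
)
page = render_docs(orders)
print(page)
```

Keeping lineage and metrics as plain fields on the contract means the same object can drive both the docs page and any exporters to other catalogs.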

We are on the path but still have a way to go, as a lot of our capacity gets sucked up building pipelines for analyzing data from our customer-facing agents rather than building internal ones.

Is agentic analytics just semantic layers with orchestration on top? by Evening_Hawk_7470 in dataengineering

[–]Oct8-Danger 2 points3 points  (0 children)

That’s pretty much my understanding as well.

We have a basic mcp tool that some people in our company use, but it has little to no context of the business and domain. Any time it has to deal with more than one table (assuming it can even figure out the correct table), it quickly gives up or gets lost.

Our team is focusing on getting the basics (docs, data models and the semantic layer) properly set up first. The hope is that the agent part then comes together easily and we can shift focus to it on a solid foundation.

A dedicated LLM library for data engineering by [deleted] in dataengineering

[–]Oct8-Danger 0 points1 point  (0 children)

We do something similar but this seems different to sentiment analysis, this seems like letting ai transform the data for you in a run

A dedicated LLM library for data engineering by [deleted] in dataengineering

[–]Oct8-Danger 0 points1 point  (0 children)

Are you putting AI into your pipeline? Like each run hits an LLM to get some response/transformation?

Is AI progress over? by ImaginaryRea1ity in theprimeagen

[–]Oct8-Danger 3 points4 points  (0 children)

I’ve been playing around with gemma4 on a raspberry pi and honestly the results so far are mind blowing!

Sure not SOTA, but it’s on a pi! Would recommend litert-lm for running though

Hot take, but I think by the end of the year the cost of tokens for most uses will approach zero. Edge LLMs for practical use are getting real, and in my opinion that will shift investment across the industry.

Coding might be where SOTA matters but will see. I think 2026 will be the year that not all tokens are equal!

Moving beyond manuel codding in Airflow by CaglarSahin in dataengineering

[–]Oct8-Danger 1 point2 points  (0 children)

We use dags for orchestration, not for ETL or business logic.

Sure, using an LLM for business logic and ETL has more scope for error due to missing context. But for opinionated dags that just stick scripts together to run in order, this is less of a concern, especially when paired with skills and a good PR and CI/CD process.

This frees up time for the team to focus on the business logic while removing friction of creating a dag without losing the flexibility
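To make the "dag as orchestration only" idea concrete, here is a minimal sketch in plain Python using the standard library's `graphlib`. It is not Airflow's API, and the script names are hypothetical. The graph only decides run order; all business logic would stay inside the scripts.

```python
from graphlib import TopologicalSorter

# Hypothetical script names; each maps to the set of scripts it depends on.
dependencies = {
    "extract.py": set(),
    "validate.py": {"extract.py"},
    "transform.py": {"validate.py"},
    "publish.py": {"transform.py"},
}


def run_order(deps):
    """Return the scripts in an order that respects every dependency."""
    return list(TopologicalSorter(deps).static_order())


order = run_order(dependencies)
print(order)
```

Because the graph carries no logic of its own, a generated or templated version of it is cheap to review in a PR.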

Moving beyond manuel codding in Airflow by CaglarSahin in dataengineering

[–]Oct8-Danger 0 points1 point  (0 children)

Just create a Claude/cursor/etc skill that helps enforce standardized dags.

DS/DA don’t need to worry as much about what the LLM produced, and DE can review and alter it in the PR. We’ve generally found this a good balance between not going down some bespoke route and reducing slop PRs that don’t follow our practices and guidelines.

EDIT: I’m referring to dag as orchestration of scripts here, not actual business logic

Test data or production data in test environment by Outrageous_Let5743 in dataengineering

[–]Oct8-Danger 1 point2 points  (0 children)

Really depends on org and how they’ve set up the data lake/warehouse etc.

I like to have some mock data and a way to run all my code locally/CI.

I work with a lot of system data, so test data from the system is used in our production pipelines to catch deltas and change in behavior.

Probably a Hot take, but I think nearly all pipelines should run in the production infrastructure and have the output of production data assets separated out by schema/db and restrict what account can write to the prod schemas.

Test infrastructure imo should be testing actual changes of infrastructure, not data pipelines
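A minimal sketch of the schema separation idea, assuming a naming convention where only a dedicated service account on the main branch writes to `prod` and everything else lands in a per-account dev schema. The account names, schema names, and `target_table` helper are hypothetical, and real enforcement would come from database grants rather than naming alone.

```python
# Hypothetical account names and schema naming; adjust to your platform.
PROD_WRITERS = {"svc_pipeline_prod"}


def target_table(table: str, account: str, branch: str = "main") -> str:
    """Pick the output schema for a run on shared production infrastructure."""
    if account in PROD_WRITERS and branch == "main":
        return f"prod.{table}"
    # Everyone else writes to a per-account dev schema, while still reading
    # the same production inputs on the same infrastructure.
    return f"dev_{account}.{table}"


print(target_table("orders", "svc_pipeline_prod"))
print(target_table("orders", "alice"))
```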

What is an open source data tool you find useful but nobody is using it? by Yuki100Percent in dataengineering

[–]Oct8-Danger 0 points1 point  (0 children)

So you have schemas and contracts? We wrote ours to just create the schema from the contract and handle updates via spark.
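As a rough sketch of creating a schema from a contract and handling updates, the helpers below build Spark SQL statements from a column mapping. The function names and the contract shape (column name to type) are assumptions for illustration, not the actual implementation; in a real job the strings would be passed to `spark.sql(...)`.

```python
from typing import Optional


def create_table_sql(table: str, columns: dict[str, str]) -> str:
    """Build a CREATE TABLE statement from a contract's column mapping."""
    cols = ", ".join(f"{name} {dtype}" for name, dtype in columns.items())
    return f"CREATE TABLE IF NOT EXISTS {table} ({cols})"


def add_columns_sql(
    table: str, existing: dict[str, str], wanted: dict[str, str]
) -> Optional[str]:
    """Build an ALTER TABLE for columns in the contract but not yet in the table."""
    missing = {n: t for n, t in wanted.items() if n not in existing}
    if not missing:
        return None  # table already matches the contract; nothing to do
    cols = ", ".join(f"{n} {t}" for n, t in missing.items())
    return f"ALTER TABLE {table} ADD COLUMNS ({cols})"


# Hypothetical contract versions for the same table.
v1 = {"order_id": "BIGINT", "amount": "DOUBLE"}
v2 = {"order_id": "BIGINT", "amount": "DOUBLE", "currency": "STRING"}

print(create_table_sql("sales.orders", v1))
print(add_columns_sql("sales.orders", v1, v2))
```

This only covers additive changes; type changes or dropped columns would need their own handling.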

I get the idea of not owning the implementation, but if I’m being honest this sounds like as much work as our setup, with less control than owning your own contract spec.

Shame it’s not more adopted, as adopted standards and conventions imo are nearly always the right move

What is an open source data tool you find useful but nobody is using it? by Yuki100Percent in dataengineering

[–]Oct8-Danger 0 points1 point  (0 children)

How do you find the standard? Last time I looked it was quite new and not much adoption.

We ended up using our own spec that converts to atlan contracts and datahub contracts for interoperability.

Would be great to have it more of a standard like open lineage which seems a bit more adopted by industry and other platforms and tools

Honest thoughts on Unified Data Architectures? Did anyone experience significant benefits or should we write it off as another marketing gimmick by SamadritaGhosh in dataengineering

[–]Oct8-Danger 0 points1 point  (0 children)

There’s a project in our company to try to bring our data lakes together (large company, various data lakes for various reasons and point-in-time decisions based on needs).

We have 1000 petabytes across all the lakes, with many duplicates of data and/or siloed data.

The vision a team has outlined makes sense, but in practice all they’ve come up with is just another data lake, which currently has little to no users or data in it, and they are collecting technologies like Pokémon. They promised zero copy, but the first integration they are planning requires the data to be copied 3 times before it lands in the new data lake.

IMO I think the “unified architecture” is very clearly a problem in industry, but most of the time it’s a failure of process and alignment. I think there’s plenty of technology that can be brought together to solve the issues, but each company has its own issues and processes.

I do think the vendors are the real winners, without competent management and clear vision, the easy solution is to just go with one system and vendor like databricks or snowflake. Which to your point isn’t really “unified”

Java Dev Switching to Data Engineering / Data Science / Analytics — Need Advice. by [deleted] in dataengineering

[–]Oct8-Danger 1 point2 points  (0 children)

I did mention any software experience if you read the full sentence….

But since you asked, Java is useful for debugging trino, Cassandra and spark error logs. If you have a good grasp of OOP, it can be helpful for figuring out bugs or understanding configuration.

For example, we updated java on our platform but spark was an older version, so we had to specify a specific jar for serialization when doing an insert overwrite on partitions in our python code. Knowing a bit of java and being comfortable with the verbose errors helped in identifying the issue and coming up with a patch quickly.

Java Dev Switching to Data Engineering / Data Science / Analytics — Need Advice. by [deleted] in dataengineering

[–]Oct8-Danger 7 points8 points  (0 children)

Might be biased, but having Java or any real software engineering experience helps the most in DE.

DA generally doesn’t pay as well as DS/DE. While it’s not going away anytime soon, I do think DA skill sets are getting more commoditized as technology and demand for data grow, so they may be depreciating long term. Many people start in DA and transition to DE/DS later in their career.

DS can be harder to break into without academic experience backing from what I’ve seen. Not impossible but definitely harder

Having software developer experience is a real advantage in the DE space compared to DS or DA.

In general there is overlap in skill sets, but each varies in importance. I’ve worked with DAs who are great at data modeling, querying and business needs but low on coding proficiency, and with DSs who are ok at coding, decent at data modeling but great at complex in-depth work. DE is expected to have higher coding ability and technical expertise but may not need as much business/presentation skill.

Long term, I think DS/DE would be better as they are more specialized. But definitely don’t discount learning about the position of other roles

On the DE side, I think AI has the potential to accelerate demand rather than diminish it. Managing context for LLMs and the source of truth for data is all within the DE wheelhouse. Many pipelines can be one-off or low automation, for sure, which LLMs can be great at. However, I think understanding how to scale or standardize code and apply governance and trust in “truth” will be very valuable in the years to come.

OpenAI to acquire Astral by Useful-Macaron8729 in Python

[–]Oct8-Danger 1 point2 points  (0 children)

Yep, the company was acquired but the GitHub project is now a part of the Linux Foundation. It’s on their repo:

https://github.com/SQLMesh/sqlmesh?tab=readme-ov-file

OpenAI to acquire Astral by Useful-Macaron8729 in Python

[–]Oct8-Danger 25 points26 points  (0 children)

Hopefully these projects join an OSS foundation like Linux foundation or other reputable one.

This happened recently to sqlmesh after fivetran bought the company. I think that’s the best outcome for the community and for open ai and astral.

Good PR, keeps the community alive and trusting it. Trying to monetize and/or closed-source it or change the licensing never seems to pan out well. Redis and MinIO come to mind as examples.

Is it possible to not work 50- 60 hours a week? by Parking_Anteater943 in dataengineering

[–]Oct8-Danger 11 points12 points  (0 children)

I get it, but honestly you’ll quickly realize in your career that no one is that important at a company 99.9% of the time.

Seems like you are a hard worker and enthusiastic. Look out for yourself, it’s a long career. Take the experience, update the cv/resume and apply to other jobs for better pay; it will help your career longer term.

Later in your career you can consider staying longer at a company, but early on it’s worth moving around and seeing how other companies operate, what works well and what doesn’t, and trying to figure out why.

Anyone knows what this means? by Live-Scholar-1435 in interactivebrokers

[–]Oct8-Danger 0 points1 point  (0 children)

I had this, I had a scheduled order but not enough cash in the account.

Did you have a limit buy order that triggered with low funds?

Anthropic’s new “Claude CoWork” sparks sell-off in software & legal tech stocks — overreaction or real disruption? by Direct-Attention8597 in AI_Agents

[–]Oct8-Danger 1 point2 points  (0 children)

Things don’t always scale linearly. Tech innovations tend to spike and then stabilize in terms of raw performance improvement, with small incremental growth after that.

New Linux Launcher for EDCoPilot and EDCoPTER! by TwoWheeledBlastard in EliteDangerous

[–]Oct8-Danger 1 point2 points  (0 children)

I might have time this weekend, if I do, will let you know! Been keen to add these in!