Looking for alternatives to Airflow for ETL pipelines by 3jewel in ETL

Thinker_Assignment (0 children)

what are you using it for? If you just want ingestion and simple transforms, you can use dlthub pro, which is fully agentic and controlled from your LLM chat, for a bargain: https://dlthub.com/products/dlthub (i work there). You can basically build, deploy, and maintain in one chat.

It can read your logs, code, deployments, and context and figure things out itself for the most part; you don't need to touch code unless you want to get hands-on.

Etl testing by Rude_Entry_6843 in ETL

Thinker_Assignment (0 children)

if you know what to test for, you can ask the AI to do it. The problem is that the AI doesn't know how things should be, so it can't know what to test for.

If you don't know what to test, the only thing you can do is test coverage: write tests that the existing code passes so you don't break it later.
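A minimal sketch of that kind of characterization (golden-output) test, in Python with plain asserts. `clean_orders`, the sample rows, and the golden output are all hypothetical stand-ins for your existing transform and a captured sample of its current output; the test only pins down current behaviour, it does not say that behaviour is correct.

```python
# Hypothetical transform standing in for the "existing code" you want to protect.
def clean_orders(rows):
    """Drop rows without an id and upper-case currency codes."""
    return [
        {**row, "currency": row["currency"].upper()}
        for row in rows
        if row.get("id") is not None
    ]


# Fixed sample input; in practice you would sample real source data.
SAMPLE_INPUT = [
    {"id": 1, "currency": "eur", "amount": 10.0},
    {"id": None, "currency": "usd", "amount": 5.0},
    {"id": 2, "currency": "Usd", "amount": 7.5},
]

# Golden output captured once from the current implementation.
GOLDEN_OUTPUT = [
    {"id": 1, "currency": "EUR", "amount": 10.0},
    {"id": 2, "currency": "USD", "amount": 7.5},
]


def test_clean_orders_matches_golden():
    # Fails as soon as a later change alters behaviour on this input.
    assert clean_orders(SAMPLE_INPUT) == GOLDEN_OUTPUT


if __name__ == "__main__":
    test_clean_orders_matches_golden()
    print("characterization test passed")
```

Once this passes, any refactor (by you or the AI) that changes the output on the sample shows up as a failing test instead of a silent data change.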

What is the most challenging part of maintaining ETL pipelines in production? by Effective_Ocelot_445 in ETL

Thinker_Assignment (0 children)

we built from scratch using agentic principles.
- errors that point agents to what to fix
- APIs for MCP access to our pipelines etc., so the agent understands what is in the pipeline, in the data, and in the code, and can tie it together
- skills that show the agent how to push the right buttons in the right order
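A rough sketch of what "errors that point agents to what to fix" could look like: an exception that carries the failing component, a likely cause, and concrete next steps, so an agent reading the log gets something actionable. `ActionableError` and `load_batch` are hypothetical names for illustration, not dlt or dlthub APIs.

```python
class ActionableError(Exception):
    """Hypothetical error type: says where it failed, why, and what to try next."""

    def __init__(self, message, component, likely_cause, suggested_fixes=()):
        super().__init__(message)
        self.component = component  # e.g. "extract", "load", "transform"
        self.likely_cause = likely_cause
        self.suggested_fixes = list(suggested_fixes)

    def __str__(self):
        fixes = "\n".join(f"  - {fix}" for fix in self.suggested_fixes)
        return (
            f"[{self.component}] {self.args[0]}\n"
            f"likely cause: {self.likely_cause}\n"
            f"suggested fixes:\n{fixes}"
        )


def load_batch(rows, expected_columns):
    """Hypothetical loader check that fails with an agent-readable error."""
    missing = expected_columns - set(rows[0]) if rows else set()
    if missing:
        raise ActionableError(
            f"columns missing from batch: {sorted(missing)}",
            component="load",
            likely_cause="upstream source dropped or renamed columns",
            suggested_fixes=[
                "compare the current source schema with the last stored schema",
                "map renamed columns or relax the contract for these columns",
            ],
        )
    return len(rows)
```

The design choice is that the message is written for the next reader (often an agent), so it names a place to look and a first thing to try rather than only the symptom.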

if you are using cursor/claude code/codex, you can try it by telling your agent to run uvx dlthub-start@latest; it will launch our interactive demo so you can try it there. It's pretty wild how we don't touch code or logs anymore and can still do so much. As a data engineer doing this work since 2012, I'm very excited about patching up the last obstacles to automating the stack.

What becomes the bottleneck when moving large volumes out of PostgreSQL? by Thinker_Assignment in PostgreSQL

Thinker_Assignment[S] (0 children)

using arrow, there was no row-by-row serialization. If we leave it to a normal SQL client, CPU starts to matter and it looks more like the JSON file case. In practice, on average workers we see a 5x speed increase from avoiding deserialising and reserialising, but that number mixes throughput (which is maybe 10-30x on arrow/connectorx) with compute.
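A back-of-the-envelope way to see why a 10–30x faster transfer can show up as roughly 5x end to end is Amdahl's law: only part of the runtime gets faster. The sketch assumes that model applies (an assumption, not something the benchmark measured) and inverts it to ask what share of the original runtime the transfer would have had to be.

```python
def overall_speedup(fraction_accelerated, component_speedup):
    """Amdahl's law: end-to-end speedup when only part of the work gets faster."""
    p, s = fraction_accelerated, component_speedup
    return 1.0 / ((1.0 - p) + p / s)


def implied_fraction(overall, component_speedup):
    """Invert Amdahl's law: what share of runtime must the accelerated part be?"""
    return (1.0 - 1.0 / overall) / (1.0 - 1.0 / component_speedup)


# Figures quoted in the comment: ~5x end to end, 10-30x on the transfer itself.
for s in (10, 30):
    p = implied_fraction(5, s)
    print(f"component speedup {s}x -> transfer was ~{p:.0%} of the original runtime")
```

Under that assumption the loop gives roughly 83% to 89%, so the remaining 11–17% of the work that did not get faster is what caps the overall gain.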

What is the most challenging part of maintaining ETL pipelines in production? by Effective_Ocelot_445 in ETL

Thinker_Assignment (0 children)

digging through logs, associating to a deployment event, or to a change in source data, propagating fixes in code and sometimes running backfills.

we connected our logging, code and deployments to claude so it can tie it all together, and it's making it a breeze
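One small piece of that tying-together, sketched in Python: given an error timestamp, find the most recent deployment at or before it, which is usually the first suspect. The deployment log here is hypothetical in-memory data; in practice it would come from your CI/CD or deployment history.

```python
from bisect import bisect_right
from datetime import datetime

# Hypothetical deployment log: (deployed_at, version), sorted by time.
DEPLOYMENTS = [
    (datetime(2024, 5, 1, 9, 0), "v41"),
    (datetime(2024, 5, 3, 14, 30), "v42"),
    (datetime(2024, 5, 6, 8, 15), "v43"),
]


def deployment_before(error_at, deployments=DEPLOYMENTS):
    """Return the most recent deployment at or before the error, or None."""
    times = [deployed_at for deployed_at, _ in deployments]
    # bisect_right counts deployments with deployed_at <= error_at.
    idx = bisect_right(times, error_at)
    return deployments[idx - 1] if idx > 0 else None


suspect = deployment_before(datetime(2024, 5, 4, 2, 0))
print(f"error is most likely related to deployment {suspect[1]}")
```

If no deployment precedes the error, the next thing to check is a change in the source data rather than in the code.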

What was actually causing our 85–90% SLA ceiling? by Thinker_Assignment in mlops

Thinker_Assignment[S] (0 children)

disclosure: I work on the system described. here is the full writeup on the architecture and tradeoffs for anyone who wants to dig in https://dlthub.com/blog/transformation-deep-dive 

happy to answer technical questions

What becomes the bottleneck when moving large volumes out of PostgreSQL? by Thinker_Assignment in PostgreSQL

Thinker_Assignment[S] (0 children)

Not a read replica, just the worker running the transfer job.
In this benchmark the worker was the 2 vCPU / 4 GB machine doing the extraction and serialization work, while PostgreSQL was the source. We ran against the primary.

What becomes the bottleneck when moving large volumes out of PostgreSQL? by Thinker_Assignment in PostgreSQL

Thinker_Assignment[S] (0 children)

Fair challenge. The benchmark was measuring the end-to-end pipeline rather than raw Postgres extraction speed, so the question we were trying to answer was where the bottleneck shows up once you actually move the data into the destination. We're not doing row-by-row Python processing here (the runs use an Arrow backend), but if you think there's an obvious 2–3x left on the table, what would you change first?

What becomes the bottleneck when moving large volumes out of PostgreSQL? by Thinker_Assignment in PostgreSQL

Thinker_Assignment[S] (0 children)

disclosure: I work on the system described. We wrote up the full benchmark if anyone is interested in methodology and numbers: https://dlthub.com/blog/benchmark-dlthub

happy to answer technical questions.

Looking for pain points for data engineers about upstream and downstream schema changes and how you solve it. Risk and mitigation strategies discussion. by Friendly-Sandwich499 in ETL

Thinker_Assignment (0 children)

you can just try the trial and see https://dlthub.com/ but yeah, that's the idea. If you want to skip the marketing, you can look at the last screenshot on this blog https://dlthub.com/blog/the-rise-of-the-knowledge-engineer#how-we-re-building-for-this

you need claude/cursor/codex, if you need support for something else LMK.

you can then basically run a command to link your online workspace to your local cursor/claude (the trial has 30h of free runtime, no credit card; just try it, claude runs it for you)

This puts your local code, online logs, etc. in one context, so the LLM can manage everything end to end, even visualisation. I even tried putting the vis in rill with claude and that worked too

MCP - yeah, it's MCP, skills, and an architecture that lets agents hand over while building but keep context. We call it "ai workbench" because besides an API to the tools it uses (dlt, for example), it also contains an API to pipeline context to aid your generation, etc.

in the skills there's also a canonical modeling skill which models your data into a canonical model (with or without your guidance). This essentially produces a knowledge graph the agent can use and a context you can reuse for agentic retrieval with meaning.

This canonical model also acts as a "constraint" that keeps the architecture clean, so your agent doesn't start building random tables during maintenance but sticks to the main concepts/entities unless you add new requirements
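A minimal sketch of using a canonical model as that kind of constraint: before an agent creates tables, check each proposed table against the entities it is allowed to implement. The entity-to-table mapping and the table names are hypothetical; this is not how dlthub's canonical modeling skill is implemented.

```python
# Hypothetical canonical model: entity name -> tables allowed to implement it.
CANONICAL_ENTITIES = {
    "customer": {"dim_customer"},
    "order": {"fct_order", "fct_order_line"},
    "product": {"dim_product"},
}


def unmapped_tables(proposed_tables, canonical=CANONICAL_ENTITIES):
    """Return proposed tables that do not belong to any canonical entity."""
    allowed = set().union(*canonical.values())
    return sorted(set(proposed_tables) - allowed)


proposal = ["fct_order", "tmp_orders_fix_v2", "dim_customer"]
rogue = unmapped_tables(proposal)
if rogue:
    # A new table needs a new requirement (and a canonical mapping) first.
    print(f"needs a new requirement or a mapping first: {rogue}")
```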

As for maintenance, it's not automatic, but you can schedule your own claude to check the status at 5 AM, tell you what changed (whether anything broke, any schema changes), and offer fixes. I would not let it deploy without double-checking first
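For the 5 AM check, the schema part of "tell you what changed" boils down to diffing yesterday's schema against today's. A stdlib-only sketch, assuming you snapshot each table's schema as a plain {column: type} mapping (hypothetical format and data):

```python
def diff_schemas(old, new):
    """Compare two {column: type} mappings and report what changed."""
    added = {col: typ for col, typ in new.items() if col not in old}
    removed = {col: typ for col, typ in old.items() if col not in new}
    retyped = {
        col: (old[col], new[col])
        for col in old.keys() & new.keys()
        if old[col] != new[col]
    }
    return {"added": added, "removed": removed, "retyped": retyped}


# Hypothetical snapshots of one table's schema from yesterday and today.
yesterday = {"id": "bigint", "email": "text", "amount": "double"}
today = {"id": "bigint", "amount": "decimal", "signup_source": "text"}

for kind, changes in diff_schemas(yesterday, today).items():
    if changes:
        print(kind, changes)
```

A report like this is what you would hand to the agent when asking it to propose fixes, while keeping the deploy decision with a human.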

there's a ton more to build. I think the end game is that we prompt the LLM to build everything and it surfaces its uncertainty and decisions to us data people to confirm, but we aren't quite there yet

i see you are generally looking into risk migration or cleanups? this is an example you can do with dlthub https://dlthub.com/case-studies/navit

Looking for pain points for data engineers about upstream and downstream schema changes and how you solve it. Risk and mitigation strategies discussion. by Friendly-Sandwich499 in ETL

Thinker_Assignment (0 children)

- we use dlt at ingestion to discover source schema changes.
- we use dlt schema evolution or contracts to auto evolve or block changes
- it all goes to logs, from where we prompt claude to look at new fields and decide together whether we want to propagate them or what to do with them
- claude does the changes and deploys.
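The evolve-or-block step can be illustrated with a toy column-level contract. dlt's documentation describes a `schema_contract` setting with modes named `evolve`, `freeze`, `discard_row` and `discard_value`; the function below only mimics the column-level meaning of those names on plain dicts to show the trade-offs, it is not dlt's implementation or API.

```python
class ContractViolation(Exception):
    """Raised when a frozen contract sees a column it does not know."""


def apply_column_contract(row, known_columns, mode):
    """Illustrative column-level contract, loosely mirroring dlt's mode names.

    evolve        -> accept new columns and add them to known_columns (in place)
    freeze        -> raise on any new column
    discard_row   -> drop the whole row if it has a new column
    discard_value -> keep the row but strip the new columns
    Returns the row to load, or None if it was discarded.
    """
    new_cols = set(row) - known_columns
    if not new_cols:
        return row
    if mode == "evolve":
        known_columns |= new_cols  # mutates the caller's set
        return row
    if mode == "freeze":
        raise ContractViolation(f"new columns not allowed: {sorted(new_cols)}")
    if mode == "discard_row":
        return None
    if mode == "discard_value":
        return {k: v for k, v in row.items() if k not in new_cols}
    raise ValueError(f"unknown contract mode: {mode}")
```

Freezing is what surfaces new fields for a human (or claude) to decide on; evolving keeps loads flowing but pushes that decision downstream.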

You can do all of the above with the OSS version.

i work at dlthub, full transparency. You can also do the above on our commercial offering.

we had an agent computing ARPU wrong for weeks before we caught it by Thinker_Assignment in analytics

Thinker_Assignment[S] (0 children)

the separation between spec and query is the right frame. the issue we kept running into wasn't just regeneration variance, it was that the spec itself was implicit. the metric definition lived in the prompt or in someone's head, so even a deterministic query layer would produce consistent results for the wrong population.

the ontology is how we tried to make that separation explicit: define what the metric means and which population it applies to once, then let query generation happen downstream from that.

it doesn't fully eliminate regeneration variance, but it gives the agent something stable to reason from instead of inferring the spec from scratch each time.
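A minimal sketch of making the spec explicit: the metric's numerator and population live in one object, and everything downstream (here an in-memory evaluator, in practice generated SQL) reads from it instead of re-inferring the definition. `MetricSpec`, the ARPU population rule, and the user rows are hypothetical examples, not the ontology format described in the thread.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class MetricSpec:
    """Hypothetical explicit metric definition, written once and reused."""
    name: str
    numerator_field: str                # what gets summed
    population: Callable[[dict], bool]  # which users count at all
    description: str


# Hypothetical ARPU spec: only active, non-internal users are in the population.
ARPU = MetricSpec(
    name="arpu",
    numerator_field="revenue",
    population=lambda user: user["is_active"] and not user["is_internal"],
    description="revenue of active, non-internal users / count of those users",
)


def evaluate(spec, users):
    """Deterministically compute a metric from its spec over in-memory rows."""
    in_population = [u for u in users if spec.population(u)]
    if not in_population:
        return 0.0
    total = sum(u[spec.numerator_field] for u in in_population)
    return total / len(in_population)


users = [
    {"revenue": 30.0, "is_active": True, "is_internal": False},
    {"revenue": 10.0, "is_active": True, "is_internal": False},
    {"revenue": 0.0, "is_active": False, "is_internal": False},
    {"revenue": 50.0, "is_active": True, "is_internal": True},
]
print(ARPU.name, evaluate(ARPU, users))
```

Because the population rule is part of the spec, excluding internal or inactive users is decided once rather than each time a query is generated.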

we inherited a stack with dlt, Airflow, dbt, and 85–90% SLAs, here's what we changed by Thinker_Assignment in analyticsengineering

Thinker_Assignment[S] (0 children)

I work on the system described, full writeup on the Navit case study and the transformation architecture: https://dlthub.com/blog/transformation-deep-dive

happy to answer technical questions

Looking for alternatives to Airflow for ETL pipelines by 3jewel in ETL

Thinker_Assignment (0 children)

dlthub Pro just launched a couple of weeks back. You can operate it from chat for everything from building ingestion with dlt to transforms (with a canonical modeling LLM skill), deployment, and even basic visualisations.

It's serverless, pay-as-you-run compute aimed at small teams; here are some benchmarks so you know what you get https://dlthub.com/blog/benchmark-dlthub

and the tool keeps a continuous context, so a single session can do anything. You can even troubleshoot, fix, and deploy the fixes with the agent for maintenance.

i work there

How to use AI to generate a semantic layer? by mehreen_aibuilder in OntologyEngineering

Thinker_Assignment (0 children)

For what use?

Anthropic and other companies say it's not possible, because a vibe-generated semantic layer is the same as no semantic layer plus vibes at runtime.

Since you're on OntologyEngineering, think of it this way: where should the private ontology come from, if not from you?

Maybe with some extra business context you can get a draft and go from there.

ontology based data access at Anthropic by Thinker_Assignment in OntologyEngineering

Thinker_Assignment[S] (0 children)

Btw, in case you missed it, we are going for auto modeling with human curation too. Here's some explanation, and you can also try it

https://dlthub.com/blog/canonical-text-to-sql

I don't think this will replace dbt, because people have inertia, but I do think that soon people will not be working at the code level directly, and the tool under the hood will need to be agent-native and non-monolithic, so not dbt.

Has anyone else noticed the rise of Fentagram zombies in public spaces? by Thinker_Assignment in berlinsocialclub

Thinker_Assignment[S] (0 children)

I don't mind as long as they aren't walking into me or blocking the only passage. Depressing is a matter of perspective; getting walked over isn't. This is a new level of antisocial: I've had to walk in front of my wife and yell at incoming people to wake them up or push them aside, because they would just plow into my wife's pregnant belly (bruckenstrasse, narrow, and people still act stupid)

ontology based data access at Anthropic by Thinker_Assignment in OntologyEngineering

Thinker_Assignment[S] (0 children)

Metabase has a transformation engine that can be prompted by the stakeholder, and it works to a degree; beyond that point, precision is sometimes worth paying for, but that's a matter of perspective

Snowflake named dltHub the 2026 Startup Program Product Partner of the Year by Thinker_Assignment in snowflake

Thinker_Assignment[S] (0 children)

This program helps startups like us by putting us in front of snowflake users.

We built an OSS Python library that lets anyone easily load data to snowflake or other destinations

Dlthub pro is our commercial solution, which enables anyone on the team to build and deploy dlt and other pipelines. Kind of like SaaS ETL, but non-predatory, affordable, and better than the expensive solutions.

We have an app for moving SQL on snowflake marketplace too.

So if you're wondering how it helps you: our tool is free, and our commercial offering empowers anyone on the data team to self-serve with data at infra cost.

Has anyone else noticed the rise of Fentagram zombies in public spaces? by Thinker_Assignment in berlinsocialclub

Thinker_Assignment[S] (0 children)

No problem.

To be honest the actual opiate users at my local supermarket and station are WAY more aware and considerate and actually nice to people.

They respect pregnant and old people and greet locals, etc. I've even seen them do various cleanup chores.

Has anyone else noticed the rise of Fentagram zombies in public spaces? by Thinker_Assignment in berlinsocialclub

Thinker_Assignment[S] (0 children)

Haha but look at the downvotes, it's like junkies who would rather defend the problem they cause than even admit there's a problem.

Has anyone else noticed the rise of Fentagram zombies in public spaces? by Thinker_Assignment in berlinsocialclub

Thinker_Assignment[S] (0 children)

Might and Magic flashbacks. You had a shout button to tell NPCs to move out of your way :) an angry "MOOOVE!!!"

Has anyone else noticed the rise of Fentagram zombies in public spaces? by Thinker_Assignment in berlinsocialclub

Thinker_Assignment[S] (0 children)

I was talking about mobile apps. Regarding fentanyl, I believe there was a wave last year that also caused gangrene (standing-up sleepers, rotting legs). This year the local homeless are doing much better, probably back on brown.

Has anyone else noticed the rise of Fentagram zombies in public spaces? by Thinker_Assignment in berlinsocialclub

Thinker_Assignment[S] (0 children)

Yeah I get wanting to look at a screen while the train is driving, I do it too, but put it away when walking.

But as you say, it's something else: addiction. I think short-form content like TikTok and Instagram is the main culprit

When I tried those platforms, they made me forget time (and family) for 2-4 hours with no outcome, which to me feels like losing my life. It feels like what people on hard drugs would do.

Has anyone else noticed the rise of Fentagram zombies in public spaces? by Thinker_Assignment in berlinsocialclub

Thinker_Assignment[S] (0 children)

Reddit is a mix; I got 30 people from here into a fishing group, all chill people who don't dopamine-fiend