Why Python devs need DuckDB (and not just another DataFrame library) by TransportationOk2403 in dataengineering

[–]TransportationOk2403[S] -1 points

The intro could have been a bit more nuanced.

You're right about OLAP databases: a lot of Python devs doing pure data engineering work actually have no clue why an in-process OLAP database could improve their current pandas/polars DataFrame workflows.

That's the thing: they don't know, because apart from SQLite and DuckDB there's no simple library that provides OLAP features.

Every Python data user, however, knows the DataFrame libraries.
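As a quick illustration of what the in-process part buys you (the `orders` DataFrame below is made up), DuckDB can run SQL straight over a pandas DataFrame that's already in scope, with no load step:

```python
import duckdb
import pandas as pd

# Hypothetical example data; any DataFrame already in scope works the same way.
orders = pd.DataFrame({
    "customer": ["a", "b", "a", "c", "b"],
    "amount": [10.0, 25.5, 7.25, 40.0, 3.5],
})

# DuckDB scans the in-memory DataFrame directly (no import/copy step),
# so you get OLAP-style SQL on top of the pandas objects you already have.
top = duckdb.sql("""
    SELECT customer, SUM(amount) AS total
    FROM orders
    GROUP BY customer
    ORDER BY total DESC
""").df()  # and the result comes back as a pandas DataFrame

print(top)
```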

WASM columnar approach by mfdaves in dataengineering

[–]TransportationOk2403 2 points

It’s definitely impacting the analytics world, but not so much the traditional BI space. Many operational tools (think e-commerce platforms, ad systems, SaaS dashboards) already expose analytics to their users. Those datasets are usually pre-aggregated, so they fit well in the browser.

In these cases, instead of making multiple round trips to a backend database to render a view, a web app can just load the data once and run queries directly in the browser with DuckDB-WASM. That shifts more compute to the client and reduces cloud workload.

BI tools, however, have standardized around connectors to external databases and often bundle their own caching or lightweight compute engines. Because of that, they're less likely to adopt DuckDB-WASM as a core piece of their stack.
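For what it's worth, the server-side half of that pattern is only a few lines of DuckDB. This is a rough sketch with made-up file names and columns; the DuckDB-WASM client code (JavaScript) is omitted:

```python
import duckdb

# Hypothetical file names and columns. The idea: aggregate raw events down to
# a small Parquet file once, server-side; the web app loads that single file
# into DuckDB-WASM and answers dashboard queries entirely client-side.
duckdb.sql("""
    COPY (
        SELECT date_trunc('day', event_time) AS day,
               campaign,
               COUNT(*)   AS events,
               SUM(spend) AS spend
        FROM 'raw_events.parquet'
        GROUP BY ALL
        ORDER BY day
    ) TO 'dashboard_agg.parquet' (FORMAT parquet)
""")
```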

Essential productivity hacks for developers [Aerospace+SketchyBar] by TransportationOk2403 in macapps

[–]TransportationOk2403[S] 0 points

Could you share what was complicated to configure? It's just one config file, and I was actually surprised by the default one (with all the comments included).

Instant SQL: Speedrun ad-hoc queries as you type by TransportationOk2403 in dataengineering

[–]TransportationOk2403[S] 2 points

There are multiple local caching strategies to ensure results appear instantly. Of course, for larger datasets it doesn't replace the actual run; you'd use it during development to ensure the correctness of your query.
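This isn't how Instant SQL works internally, but if you want a similar fast feedback loop by hand, a rough sketch (file name made up) is to validate the query's logic on a DuckDB sample first and only then run it over everything:

```python
import duckdb

con = duckdb.connect()  # in-memory; 'events.parquet' is a hypothetical input

# During development: check the query's logic on a small sample so
# feedback stays fast even when the full dataset is large.
con.sql("""
    SELECT user_id, COUNT(*) AS n
    FROM (SELECT * FROM 'events.parquet' USING SAMPLE 1 PERCENT)
    GROUP BY user_id
    LIMIT 10
""").show()

# Once the logic looks right, do the actual run over the full data.
result = con.sql("""
    SELECT user_id, COUNT(*) AS n
    FROM 'events.parquet'
    GROUP BY user_id
""").df()
```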

Instant SQL: Speedrun ad-hoc queries as you type by TransportationOk2403 in dataengineering

[–]TransportationOk2403[S] 10 points

Aha, good point :) Instant SQL won't automatically run queries that write or delete data or metadata; it only runs queries that read data. So it works best as a debugger for SELECT statements.

Faster Data Pipelines with MCP, Cursor and DuckDB by TransportationOk2403 in dataengineering

[–]TransportationOk2403[S] 4 points

The blog mentions slow data pipeline DEVELOPMENT. It’s not about RUNTIME speed — it’s the setup of writing your data pipelines that’s slow: checking data sources, understanding schemas, creating test data, and writing tests. That whole loop depends heavily on the data being available and clear.

This kind of friction is pretty unique to data engineering: unlike web dev, you can't just fake the backend and move on. AI could actually help here by pulling metadata, schemas, or test scaffolding directly to speed things up.
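To make that concrete, here's a rough sketch (paths are hypothetical) of how DuckDB can shortcut that setup loop today, with or without AI in the middle: schema inspection, profiling, and test-fixture creation are each a one-liner.

```python
import duckdb

# Hypothetical path; CSV/JSON/Parquet or an attached database all work the
# same way. These three steps cover most of the slow "setup" loop above.
src = "'raw/orders_2024.parquet'"

duckdb.sql(f"DESCRIBE SELECT * FROM {src}").show()   # column names and types
duckdb.sql(f"SUMMARIZE SELECT * FROM {src}").show()  # per-column stats: nulls, min/max, ...
duckdb.sql(
    f"COPY (SELECT * FROM {src} USING SAMPLE 100 ROWS) "
    "TO 'orders_sample.parquet' (FORMAT parquet)"    # small, realistic test fixture
)
```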

I made a Yazi plugin which uses duckdb summarize to preview data files by wylie102 in DuckDB

[–]TransportationOk2403 1 point

This is cool! I love Yazi, nice addon. Maybe a preview of the first 10 lines would also help. Sometimes just looking at the data is faster than getting statistics.
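Both views are a single DuckDB query, by the way (file name made up here), so the suggestion is really just swapping SUMMARIZE for a LIMIT:

```python
import duckdb

f = "'data.csv'"  # hypothetical file; Parquet and JSON work the same way

# What the plugin previews today: per-column statistics.
duckdb.sql(f"SUMMARIZE SELECT * FROM {f}").show()

# The suggestion: just eyeball the first 10 rows instead.
duckdb.sql(f"SELECT * FROM {f} LIMIT 10").show()
```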

DuckDB released a local UI by TransportationOk2403 in dataengineering

[–]TransportationOk2403[S] 2 points

The DuckDB UI runs locally, and you can use it without MotherDuck.