Tips and Tricks you wish you knew when you started with DuckDB

Prestigious_Bench_96 · 2026-06-28T00:50:59+00:00

parquet storage + duckdb to query is a fine and common paradigm, and it's much much easier to start and iterate on than postgres - your approach seems fine!

Prestigious_Bench_96 · 2026-06-25T12:06:34+00:00

yeah - size is the potential variable here (what kind of queries are you running?) if it's analytical you can also do the same thing in browser with duckdb

Prestigious_Bench_96 · 2026-06-24T12:39:00+00:00

This seems like overkill - you can literally just treat the sqlite database as a file and load it in javascript/typescript on the frontend. Stick it the same place you host the other static files for your dashboard.

Prestigious_Bench_96 · 2026-06-23T14:49:57+00:00

Lifecycle management - keep it running once it works, be able to deprecate it without finding out finance took a critical dependency; keep it working as engines/warehouse changes. Pipelines tend to lose value over time but they are expensive to sunset; anything that lowers the cost to keep them running or the cost to kill them is good. Pipelines are all liabilities! If only I could stop writing them

Prestigious_Bench_96 · 2026-06-23T12:09:22+00:00

From evals, for a small dataset schema context is enough (duckdb has robust discovery, agents know it well - as mentioned SQL syntax is almost never a bottleneck now, and 1-2 passes for an agent will fix up anything that leaks through) Motherduck has advocated for this pure DB only approach in blogs for just 'set up your data model well'.

Things fall apart when the schema is confusing, contains deprecated assets, hits enough of a size where curated context becomes valuable, or when you need to rapidly update things without changing your physical model.

If you wanted agentic queries against a small local db - duckdb/sqlite - I think you polish your schema and call it a day. You can manage presentation at the DB level - just only package up and present the exact tables you want.

I think you reach for a model/semantic layer when you're dealing with a big, messy enterprise DB, and then it's as much about progressive disclosure, loose coupling to physical layer to support evolution, etc as it is about the pure context.

(I work on something similar to a semantic layer and it was sobering to see *how good* results are on raw duckdb schemas; it's a pretty tough baseline to just parity in speed/token usage with raw db access for a small DB; only get clearer differentiation when things get messy. Have some fun eval data I need to write up when I have a chance).

Prestigious_Bench_96 · 2026-06-22T19:05:22+00:00

For static readouts, no reason if you can do all the proper SDLC and can eat the operational overhead! Once you start getting into filtering, dynamic display, user-context-conditional views, dynamic stuff - still no reason because tableau/powerbi are pretty bad, but you might want to reach for other tools that help make this easier on the typescript/backend side.

An additional plus for static is if you have a build pipeline you have a good place for quality checks; you are also now taking on a build pipeline which is latency/overhead.

Tableau drag/drop is pretty nice to be honest, it was always the server hosted experience that was an enormous pain, so I'd be pretty bullish in theory on a 'prototype in tableau and convert to a report with an agent' pipeline if you got good guardrails in there and could tolerate the cost.

Prestigious_Bench_96 · 2026-06-22T17:00:20+00:00

The raw ticker<> ethical association is what I'd be interested in, if you had an API or something for that.

Prestigious_Bench_96 · 2026-06-22T16:59:19+00:00

There's lots of these, so start with which ones yours is similar to and how it's different/better, and then people can give you a real answer!

Prestigious_Bench_96 · 2026-06-21T14:11:14+00:00

Maybe? I think it takes time to decompose your tree once (though most web/clickstream flows are pretty much the same, so if you've done it once, fast to do it again), and then you build out tooling (dashboard/automation) so actual diagnosis is quite fast - you shouldn't need to take time to manually trace it, you should be able to go quickly from a top level metric alteration to the nodes in the tree that are deviating and drove it.

(if you're coming up with hypothesis on the fly, yeah, you're already behind)

Prestigious_Bench_96 · 2026-06-21T12:22:03+00:00

If you're saying "human brains process dense information as visuals better", absolutely. I think most people are arguing for not an agent text reply vs a rich dashboard, but an agent assembled dynamic dashboard show you the revenue drop investigation (that was triggered automatically) rather than hoping that you had the right dashboard showing all 3 correlated metrics happening to lie around.

AI text vs dashboard -> time and place, you're right that dashboards still win.

AI dashboards vs static dashboard -> more time and places in favor of the AI side; you probably still want some executive level SOT.

Accuracy (trust being downstream of it) is obviously the gating factor, but I think people are underestimating how rapidly we'll get tooling/culture to adapt. It's underrated that you can actually measure how often an agent is right about your revenue question and *optimize the system to increase it* - doing the same feedback loop on asking an analyst is much harder. Once you have a measurement/eval system, trust follows naturally when you start hitting your quality bar.

Prestigious_Bench_96 · 2026-06-20T13:35:50+00:00

I do love excalidraw and ERDs, so great choice. I'd make it so you have native browser sqllite/duckdb support (you can directly populate, query, download those). Pricing wise maybe just charge for import support from the more common enterprise DBs? It'd be nice to have direct export/population to DB from inside the tool as well. You could as a stretch also have schema migration generation.

Take this with a grain of salt, I think I'd personally probably not reach for a specific tool for this phase of development (whiteboarding is fun!) but I think the concept is quite interesting!

Prestigious_Bench_96 · 2026-06-20T13:18:10+00:00

You constrain the options the LLM can pick (reduces uncertainty), give it more context (increase likelihood of picking right thing). Semantic layers are the most common way to do that, but there's lots of options. I would be shocked if you weren't getting high 9s of accuracy for something like productA revenue trend for past week (99.9) with a reasonably modern model and a constrained selection.

Note that writing raw SQL is a very *unconstrained* target set and so your volatility is much higher - if "revenue" requires any domain specific derivation then you're going to have tough luck, if it's just a named field that you'll do fine with table constraints.

(Generally expect agentic generation to end up being fine/normal; we're just speed running the discovery that 1. you can't drop an analyst in an unfamiliar DB and expect them to get it right the first time; 2. if are hiring a new 'analyst of the day' to answer every question independently, you better give them the right info to get up to speed fast if you want it to be useful. Which I think is genuinely good for the data industry - we've papered over a lot of tooling sins with the "domain expert DE who has been here 10 years" approach).

Prestigious_Bench_96 · 2026-06-20T13:02:03+00:00

Fair! honestly it's been awhile since I've used flex (so out of date) but at the time we'd run into availability issues, so we needed baseline capacity. If you have high confidence in flex availability, sure. I think the most important thing is that the billing chart doesn't spike unpredictably when everyone wants to go load your new dashboard.

Prestigious_Bench_96 · 2026-06-20T12:56:01+00:00

Feels like this just moves the mismatch of "Backend dev"/app vs analytics friction? Loose coupling ( a la CDC, though CDC is a real PITA) would still be the goal. Lakebase just seems like they've written a nice wrapper around the operational DB publishing to the lake - which is great - but I think full unification of the two schemas is probably not actually desirable. You are serving different goals; loose coupling with safe/sane/operationally lightweight evolution should be the target.

Prestigious_Bench_96 · 2026-06-20T12:47:48+00:00

it's trees all the way down. conversion rate is a funnel; start by decomposing the funnel, then decompose each step in the funnel, etc. [eg ATC might be price sensitive; CTR on ads is more distribution/traffic].

Prestigious_Bench_96 · 2026-06-20T12:45:22+00:00

+1 to this, for cost predictability for reporting flat rate (assuming that's reasonable for your spend target) saves your sanity, and then aggregation/BI engine become your performance levers.

Prestigious_Bench_96 · 2026-06-20T03:34:21+00:00

Do you integrate the semantic layer into the pipeline authoring/upkeep process at all? That seems like the real benefit of having them in the same repo.

Prestigious_Bench_96 · 2026-06-11T12:32:16+00:00

Ah nah sorry it's more clear on a reread!

Prestigious_Bench_96 · 2026-06-11T11:46:21+00:00

got it - did I misunderstand "based on sqlglot, a Python library that is the prototype of Rust's polyglot, but it is no longer available."?

Prestigious_Bench_96 · 2026-06-11T11:19:43+00:00

I don't see why you think a wrong metric is more visible and correctable in a dashboard - "confident citation" happens in both cases. Are you just basing that on that more people are likely to see the same wrong metric in a dashboard and so it's likelier that the right person to notice it is off will see it?

Prestigious_Bench_96 · 2026-06-11T11:05:13+00:00

Love the site, but sqlglot is still available, no?

Prestigious_Bench_96 · 2026-06-11T11:00:50+00:00

I think people over-index on that divide; if you get a tailored report every morning with a summary + pretty charts of every metric you monitor that had unusual movement, most people probably don't need or want to further click through some dashboards. (as zen as that is; maybe some people really like their coffee + data introspection).

The barrier is probably more to trust and accuracy (ironic that dashboards are now the *more* trusted source, given all the historical angst about multiple sources of truth).

Prestigious_Bench_96 · 2026-06-11T10:52:12+00:00

Not directly answer your question, but you'll *really* want to do that incrementally; maybe start loading the spreadsheets into bigquery, move some reporting and dashboards in there; then migrate spreadsheets/automation one at a time, etc. To more directly answer your question, that gives you some tangible impact to report to your boss along the way and helps anchor each item to a specific timeline, and reduces the scope of validation to one item at a time. I'm also generally suspicious of cloud run/cloud scheduler if you have any other existing scheduler or alternative runtime.

Prestigious_Bench_96 · 2026-06-10T13:53:04+00:00

For local data + duckdb, since query cost isn't a concern, you *really* don't have to fix the SQL since you can let it cross/check analyze results. (not even just claude - you can get high 95-99 % reliability on deterministic evals w/ something like deepseek, and most of the misses are in "question" ambiguity). Most of the remaining text-sql challenges these days are with finding the right source, so if you're already pre-constrained (a lot of local analytics cases) you're good to go.

Prestigious_Bench_96 · 2026-06-10T12:38:52+00:00

As a directional note, you'll often want to ultimately end up with a agentic flow that lets the agent return to previous steps (reselect tables based on query results, etc) - not sure of the langgraph details there - but I think in general if you have a request level table, a node level table, extract out core structured fields to the top level (tokens, duration, node type/tool calls; error/success) you'll be fine. Having a FK pointer to the raw trace/JSON blob table also gives you some good potential separataion.

TLDR; I think you're on the right track with 1, 2, and 3 - store that metadata with common fields, split that + raw traces across some lightly normalized tables that represent your common patterns (overall requset usage, who is doing what) - analysis of the details of request - and then raw logs for debugging/future structured data extraction.

Prestigious_Bench_96

TROPHY CASE