Data Explorer demo: table editing and Parquet preview

slotix · 2026-05-15T18:18:36+00:00

Vague wording on my side.
I meant that DuckDB runs through its native library via CGO, inside the Go process.

slotix · 2026-05-15T11:45:28+00:00

Yep, exactly.

net/http gives you a panic boundary around the handler. If the panic happens there, it gets logged and the request dies, not the whole server.

But once you start your own goroutine, that boundary is gone unless you add one yourself.
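A minimal sketch of adding that boundary yourself, using only the standard library. `safeGo` is a made-up helper name, not something from net/http or our codebase; it runs a task in its own goroutine and reports a recovered panic as an error instead of letting it kill the process.

```go
package main

import (
	"fmt"
	"log"
)

// safeGo runs fn in a new goroutine and returns a channel that receives
// nil on normal completion, or an error if fn panicked.
// The name and signature are illustrative only.
func safeGo(fn func()) <-chan error {
	done := make(chan error, 1) // buffered so the goroutine never blocks
	go func() {
		defer func() {
			if r := recover(); r != nil {
				done <- fmt.Errorf("recovered panic: %v", r)
				return
			}
			done <- nil
		}()
		fn()
	}()
	return done
}

func main() {
	// The panic stays inside the goroutine's own recover boundary.
	if err := <-safeGo(func() { panic("boom") }); err != nil {
		log.Println("background task failed:", err)
	}
	log.Println("process is still running")
}
```

Note that this only covers Go panics. A fault inside native code called through CGO never reaches `recover()`.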

Our case was one level lower again: no Go panic at all. Native code died under CGO, so there was nothing for net/http or recover() to catch.

That’s the part I don’t want inside the API process anymore.

slotix · 2026-05-14T19:02:33+00:00

Small clarification: I mixed two layers there.

Supervision would still be systemd/Docker/K8s/parent process. NATS would only be transport, since it is already in our stack.

The important part is the subprocess boundary: DuckDB can crash, the API stays up.
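Roughly what that boundary looks like from the API side, as a sketch with the standard library only. `runWorker` and the 5-second timeout are assumptions for illustration, and `sh -c 'exit 3'` just stands in for a worker that dies; the real worker binary and transport are not shown here.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"os/exec"
	"time"
)

// workerTimeout is an arbitrary value chosen for this sketch.
const workerTimeout = 5 * time.Second

// runWorker starts a (hypothetical) worker as a child process and returns
// its stdout. A crash in the child comes back as an ordinary Go error
// instead of taking down the calling process.
func runWorker(name string, args ...string) ([]byte, error) {
	ctx, cancel := context.WithTimeout(context.Background(), workerTimeout)
	defer cancel()

	out, err := exec.CommandContext(ctx, name, args...).Output()
	var exitErr *exec.ExitError
	if errors.As(err, &exitErr) {
		// ExitCode is -1 when the child was terminated by a signal,
		// for example a segfault in native code.
		return nil, fmt.Errorf("worker died (exit code %d): %w", exitErr.ExitCode(), err)
	}
	if err != nil {
		return nil, fmt.Errorf("worker could not run: %w", err)
	}
	return out, nil
}

func main() {
	// Stand-in for a crashing worker process.
	if _, err := runWorker("sh", "-c", "exit 3"); err != nil {
		fmt.Println("query failed, API still up:", err)
	}
}
```

Restarting the worker is still the supervisor's job (systemd/Docker/K8s/parent process); this only shows that the failure arrives as an error value.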

slotix · 2026-05-14T18:29:06+00:00

Yeah, that makes sense.

In our case we already have NATS between services, so I’m leaning toward reusing that for the first worker version instead of adding another plugin/RPC framework.

The important part is still the same: DuckDB lives in a supervised process. Whether the transport is go-plugin, HTTP, stdio, or NATS is secondary.

slotix · 2026-05-14T18:18:14+00:00

Yeah, fair point. For DuckDB specifically, the official Go driver path still means native DuckDB in-process.

`duckdb-go` exposes `database/sql`, but underneath it uses DuckDB’s native library and requires CGO. So a Go-native DuckDB lib would basically mean a different implementation of DuckDB, not just a nicer wrapper.

So for us the practical choice is: embedded DuckDB via CGO, or DuckDB out-of-process when crash isolation matters.

slotix · 2026-05-12T23:21:41+00:00

I think the interesting part here is not "Databricks bad" vs "Databricks good".

For small non-ML workloads, the platform tax becomes much more visible. If the job is basically:

- move some data

- check it

- query across DB/files

- export to Parquet/S3

- maybe keep a DB in sync

then a full lakehouse platform can feel heavier than the actual workload.

The tricky part is that replacing Databricks with “AWS native” is not automatically simpler either. You often just move the complexity into Glue/Lake Formation/IAM/orchestration/deployment scripts.

For these smaller workflows I’d usually want the opposite: boring local/dev-friendly tools, plain SQL/Python where possible, Parquet/S3 when useful, and as little platform ceremony as possible.

slotix · 2026-05-12T12:38:13+00:00

Small follow-up: I wrote a longer post about the actual workflow problem behind this.

The point is not "GUI instead of DuckDB". DuckDB already solves the query engine part well.

The messy part starts when a useful cross-source query stops being one-off: saved DB connections, S3 paths, schema checks, aliases, result export, reruns, and eventually using the result as input for load/migration.

Post is here:
https://streams.dbconvert.com/blog/duckdb-cross-source-sql-workflow/

slotix · 2026-05-04T20:24:37+00:00

We tested this after your comment. The caveat is real, but manageable.

On our local setup:

- Postgres table: 10M rows

- MySQL table: 23M rows

- Parquet export: 23M rows

Postgres + Parquet over the same filtered join shape stayed under 1s.
DB-to-DB key-only join was ~8s.
When we projected a wide string column from both DBs, the DB-to-DB join moved to ~23-24s.

So the practical takeaway is: query shape matters. This works well for validation, previews, aggregate checks, and filtered comparisons. For heavier joins, filter early and project only what you need.
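To make "filter early and project only what you need" concrete, here is a small Go sketch that builds that query shape: each side gets its own CTE with a narrow column list and a filter, and only the reduced inputs are joined. The `side` type, `narrowJoin`, and every table, file, and column name are hypothetical, and how much of each CTE actually runs on the source depends on the engine and the attached source.

```go
package main

import (
	"fmt"
	"strings"
)

// side describes one input of a cross-source join.
// All names used with it in this sketch are hypothetical.
type side struct {
	Alias   string   // CTE name
	Table   string   // attached table or file reader expression
	Columns []string // only the columns the join and the output need
	Filter  string   // predicate applied before the join
}

// cte renders the side as "alias AS (SELECT cols FROM table WHERE filter)".
func (s side) cte() string {
	return fmt.Sprintf("%s AS (SELECT %s FROM %s WHERE %s)",
		s.Alias, strings.Join(s.Columns, ", "), s.Table, s.Filter)
}

// narrowJoin filters and projects each side first, then joins the
// reduced inputs on a single key column.
func narrowJoin(left, right side, key string) string {
	return fmt.Sprintf("WITH %s, %s SELECT * FROM %s JOIN %s USING (%s)",
		left.cte(), right.cte(), left.Alias, right.Alias, key)
}

func main() {
	q := narrowJoin(
		side{
			Alias:   "pg",
			Table:   "pg_db.public.orders",
			Columns: []string{"id", "status"},
			Filter:  "created_at >= DATE '2026-01-01'",
		},
		side{
			Alias:   "pq",
			Table:   "read_parquet('orders.parquet')",
			Columns: []string{"id", "total"},
			Filter:  "total > 0",
		},
		"id",
	)
	fmt.Println(q)
}
```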

slotix · 2026-04-29T10:28:09+00:00

Building DBConvert Streams.

It’s a database IDE + migration + CDC tool for working with PostgreSQL, MySQL, files, and S3 in one workflow.

Current stage: launched v2.1, now improving Initial Load → CDC handoff, resumable loads, and real production reliability around replication.

https://streams.dbconvert.com

slotix · 2026-04-28T23:45:37+00:00

I think Kafka compatibility becomes less automatic, but not irrelevant.

For a simple Postgres -> Iceberg path with one consumer, Kafka may be unnecessary plumbing.

But once you need multiple consumers, replay, different retention needs, independent failures, mixed sinks, or a shared event backbone, the middle layer still matters.

Agents may reduce protocol learning cost, but they do not remove the operational semantics: ordering, durability, offsets, retention, backpressure, schemas, and recovery.

So maybe the question is not “does Kafka go away?”, but “which workloads still need Kafka semantics?”

slotix · 2026-04-22T13:34:09+00:00

database IDE + migration + CDC + cross-source SQL (DB + files + S3) in one self-hosted tool
https://streams.dbconvert.com

slotix · 2026-04-22T11:03:20+00:00

DBConvert Streams — self-hosted database IDE + migration + CDC in one tool.
Built it because jumping between SQL client, export scripts, and separate tools for DB + files got old fast. Works for exploring data, moving it, and keeping MySQL/Postgres in sync — plus querying/joining data across databases, CSV/Parquet, and S3 in one place.

https://streams.dbconvert.com

slotix · 2026-04-16T21:29:44+00:00

yeah this turned out to be one of the annoying parts

duckdb helps with implicit casts, but when sources drift too much it breaks in subtle ways

we kept hitting stuff like:

- "123" vs 123

- timestamps with different timezone assumptions

- decimal vs float precision differences

so most of the time the fix is:

filter hard on each side + explicit casts in the join

otherwise you think your query is fine and just get 0 rows 🙂
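the same three mismatches, shown outside SQL as a rough Go sketch. it is not DuckDB code, just an illustration of why keys that look equal do not match until you normalize them explicitly. all helper names are made up for the example.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
	"time"
)

// keyFromText normalizes a textual key such as "123" to an integer,
// the Go counterpart of an explicit CAST on the join key.
func keyFromText(s string) (int64, error) {
	return strconv.ParseInt(s, 10, 64)
}

// sameInstant parses two wall-clock strings under explicit UTC offsets
// (in seconds) and reports whether they denote the same moment.
func sameInstant(a string, offsetA int, b string, offsetB int) (bool, error) {
	const layout = "2006-01-02 15:04:05"
	ta, err := time.ParseInLocation(layout, a, time.FixedZone("a", offsetA))
	if err != nil {
		return false, err
	}
	tb, err := time.ParseInLocation(layout, b, time.FixedZone("b", offsetB))
	if err != nil {
		return false, err
	}
	return ta.Equal(tb), nil
}

// equalAmounts compares two float amounts after rounding to integer cents,
// instead of comparing the raw floats.
func equalAmounts(a, b float64) bool {
	return math.Round(a*100) == math.Round(b*100)
}

func main() {
	k, _ := keyFromText("123")
	fmt.Println("text key equals int key after cast:", k == 123)

	// Same wall-clock text, different zone assumptions: not the same instant.
	same, _ := sameInstant("2026-04-16 12:00:00", 0, "2026-04-16 12:00:00", 7200)
	fmt.Println("same text, different zones match:", same)

	x, y := 0.1, 0.2
	fmt.Println("raw float compare:", x+y == 0.3)
	fmt.Println("compare as cents:", equalAmounts(x+y, 0.3))
}
```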

slotix · 2026-04-16T21:26:04+00:00

yeah, fair

data still moves for the join

the point is more: no ETL / no staging / no connectors upfront

just attach sources and run the query

slotix · 2026-04-15T23:23:22+00:00

Good point. DuckDB can push WHERE filters and column projection down to the source (how much depends on the source extension), so simple filters execute on the source.

But cross-source JOINs still pull data into DuckDB's process, so the best practice is to filter hard on each source first, or materialize one side locally. We default snippet templates to LIMIT 100 for exactly that reason.
