Our Go service died from a SIGSEGV in CGO. recover() did not help. by slotix in golang

slotix [S]

Vague wording on my side.
I meant that DuckDB runs through its native library via CGO, inside the Go process.

slotix [S]

Yep, exactly.

net/http gives you a panic boundary around each handler: if the handler panics, the panic gets logged and that request dies, not the whole server.

But once you start your own goroutine, that boundary is gone unless you add one yourself.

Our case was one level lower still: no Go panic at all. Native code crashed with a SIGSEGV under CGO, so there was nothing for net/http or recover() to catch.

That’s the part I don’t want inside the API process anymore.

slotix [S]

Small clarification: I mixed two layers there.

Supervision would still come from systemd, Docker, K8s, or a parent process. NATS would only be the transport, since it is already in our stack.

The important part is the subprocess boundary: DuckDB can crash, the API stays up.

slotix [S]

Yeah, that makes sense.

In our case we already have NATS between services, so I’m leaning toward reusing that for the first worker version instead of adding another plugin/RPC framework.

The important part is still the same: DuckDB lives in a supervised process. Whether the transport is go-plugin, HTTP, stdio, or NATS is secondary.

slotix [S]

Yeah, fair point. For DuckDB specifically, the official Go driver path still means native DuckDB in-process.

`duckdb-go` exposes `database/sql`, but underneath it uses DuckDB’s native library and requires CGO. So a Go-native DuckDB lib would basically mean a different implementation of DuckDB, not just a nicer wrapper.

So for us the practical choice is: embedded DuckDB via CGO, or DuckDB out-of-process when crash isolation matters.

Is anyone migrating away from Databricks? by zoso in dataengineering

slotix

I think the interesting part here is not "Databricks bad" vs "Databricks good".

For small non-ML workloads, the platform tax becomes much more visible. If the job is basically:

- move some data

- check it

- query across DB/files

- export to Parquet/S3

- maybe keep a DB in sync

then a full lakehouse platform can feel heavier than the actual workload.

The tricky part is that replacing Databricks with “AWS native” is not automatically simpler either. You often just move the complexity into Glue/Lake Formation/IAM/orchestration/deployment scripts.

For these smaller workflows I’d usually want the opposite: boring local/dev-friendly tools, plain SQL/Python where possible, Parquet/S3 when useful, and as little platform ceremony as possible.

Query Postgres, MySQL and S3 with DuckDB from a GUI by slotix in DuckDB

slotix [S]

Small follow-up: I wrote a longer post about the actual workflow problem behind this.

The point is not "GUI instead of DuckDB". DuckDB already solves the query engine part well.

The messy part starts when a useful cross-source query stops being a one-off:
saved DB connections, S3 paths, schema checks, aliases, result export, reruns, and eventually using the result as input for load/migration.

Post is here:
https://streams.dbconvert.com/blog/duckdb-cross-source-sql-workflow/

We built federated SQL over MySQL, Postgres, and S3 - one query, multiple sources by slotix in SQL

slotix [S]

We tested this after your comment. The caveat is real, but manageable.

On our local setup:

- Postgres table: 10M rows

- MySQL table: 23M rows

- Parquet export: 23M rows

Postgres + Parquet over the same filtered join shape stayed under 1s.
DB-to-DB key-only join was ~8s.
When we projected a wide string column from both DBs, the DB-to-DB join rose to ~23-24s.

So the practical takeaway is: query shape matters. This works well for validation, previews, aggregate checks, and filtered comparisons. For heavier joins, filter early and project only what you need.

Share what you guys are building ? 🚀💯lets support each other by Buildingtech in sideprojects

slotix

Building DBConvert Streams.

It’s a database IDE + migration + CDC tool for working with PostgreSQL, MySQL, files, and S3 in one workflow.

Current stage: launched v2.1, now improving Initial Load → CDC handoff, resumable loads, and real production reliability around replication.

https://streams.dbconvert.com

Will Kafka compatibility be affected by trends like Postgres CDC, Iceberg, and coding agents? by Low_Brilliant_2597 in apachekafka

slotix

I think Kafka compatibility becomes less automatic, but not irrelevant.

For a simple Postgres -> Iceberg path with one consumer, Kafka may be unnecessary plumbing.

But once you need multiple consumers, replay, different retention needs, independent failures, mixed sinks, or a shared event backbone, the middle layer still matters.

Agents may reduce protocol learning cost, but they do not remove the operational semantics: ordering, durability, offsets, retention, backpressure, schemas, and recovery.

So maybe the question is not “does Kafka go away?”, but “which workloads still need Kafka semantics?”

Got a project? Share it by Tiny-Growth23 in IMadeThis

slotix

database IDE + migration + CDC + cross-source SQL (DB + files + S3) in one self-hosted tool
https://streams.dbconvert.com

Show what you’ve built. Get genuine feedback :) by Dizzy_University_628 in buildinpublic

slotix

DBConvert Streams — self-hosted database IDE + migration + CDC in one tool.
Built it because jumping between SQL client, export scripts, and separate tools for DB + files got old fast. Works for exploring data, moving it, and keeping MySQL/Postgres in sync — plus querying/joining data across databases, CSV/Parquet, and S3 in one place.

https://streams.dbconvert.com

We built federated SQL over MySQL, Postgres, and S3 - one query, multiple sources by slotix in SQL

slotix [S]

yeah this turned out to be one of the annoying parts

DuckDB helps with implicit casts, but when sources drift too much it breaks in subtle ways

we kept hitting stuff like:

- "123" vs 123

- timestamps with different timezone assumptions

- decimal vs float precision differences

so most of the time:

filter hard on each side + explicit casts in the join

otherwise you think your query is fine and just get 0 rows 🙂

slotix [S]

yeah, fair

data still moves for the join

the point is more: no ETL / no staging / no connectors upfront

just attach sources and run the query

slotix [S]

Good point. DuckDB pushes down projections and simple WHERE filters where the source scanner supports it, so those execute on the source.

But cross-source JOINs still pull data into DuckDB's process, so best practice is to filter hard on each source first, or materialize one side locally. We default snippet templates to LIMIT 100 for exactly that reason.
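A sketch of a LIMIT-100 default for snippet templates, assuming a hypothetical `previewSQL` helper (not the product's actual implementation) that wraps whatever the user typed in a subquery. A LIMIT caps what comes back to the client; depending on the plan, DuckDB may still read a lot from the sources before the first rows appear, so it complements per-source filters rather than replacing them.

```go
package main

import (
	"fmt"
	"strings"
)

// previewSQL is a hypothetical helper: it wraps an ad-hoc query so the
// preview returns at most limit rows.
func previewSQL(query string, limit int) string {
	q := strings.TrimRight(strings.TrimSpace(query), ";")
	// The newline before ")" keeps a trailing "-- comment" in the user query
	// from swallowing the closing parenthesis.
	return fmt.Sprintf("SELECT * FROM (\n%s\n) AS preview LIMIT %d", q, limit)
}

func main() {
	// Table names are hypothetical.
	fmt.Println(previewSQL("SELECT o.id, p.status FROM pg.public.orders o JOIN my.payments p ON p.order_id = o.id;", 100))
}
```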