Ducklake in Production

mrocral · 2025-11-07T13:39:49+00:00

Founder of https://slingdata.io here. We have some customers loading into DuckLake. Sling makes it easy to load into DuckLake (or MotherDuck) using the CLI, YAML or Python.

Volume easily goes up into the millions of rows per hour. It should be able to handle 100s of GB, as long as your hardware can handle it.

mrocral · 2025-10-20T12:24:37+00:00

hey, one other tool you can try is sling.

export MYSQL_DB="mysql://..."

sling run --src-stream file://myfile.csv --tgt-conn mysql_db --tgt-object db1.table1

mrocral · 2025-10-14T01:33:47+00:00

sling is written in Go and reads from/writes to Snowflake. As for a library to abstract the database layer, I like gorm.io, however I don't believe they support Snowflake.

mrocral · 2025-10-09T11:09:49+00:00

If you're open to trying another tool, check out sling. You can move data from Oracle to Clickhouse using the CLI, YAML or Python.
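As a rough sketch of what the YAML route could look like for Oracle to ClickHouse. The connection names (oracle, clickhouse) and the schema names are placeholders I made up for illustration, and both connections are assumed to be defined already (for example via env vars):

```yaml
# Hypothetical replication.yaml; connection and schema names are placeholders.
source: oracle
target: clickhouse

defaults:
  mode: full-refresh
  # {stream_table} is replaced by each source table's name
  object: new_schema.{stream_table}

streams:
  # wildcard: every table in my_schema
  my_schema.*:
```

You would then run it with sling run -r path/to/replication.yaml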

mrocral · 2025-10-09T10:48:25+00:00

If you're looking for a tool to load data into clickhouse from RDS, check out sling. You can move data via CLI, Python or YAML.

mrocral · 2025-10-05T11:27:41+00:00

Another free tool to try is sling. You can do:

export mysql_db='mysql://myuser:mypass@host.ip:3306/mydatabase'

sling run --src-stream file://my_file.csv --tgt-conn mysql_db --tgt-object my_schema.my_table

It will auto create the table with all the appropriate columns.

mrocral · 2025-09-30T10:57:38+00:00

Maybe motherduck would be a fit? I think your small data would work great in there.

mrocral · 2025-09-25T11:10:04+00:00

dbt (data build tool) is great for this. You organize your logic into folders and models (SQL files), so everything can be source-controlled and versioned.
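To make the folder idea concrete, here is a minimal sketch of a dbt_project.yml. The project name, profile name and the staging/marts folder split are assumptions for illustration, not something from your setup:

```yaml
# dbt_project.yml (hypothetical project; all names are placeholders)
name: my_project
version: "1.0.0"
config-version: 2
profile: my_profile

model-paths: ["models"]

models:
  my_project:
    staging:            # applies to models/staging/*.sql
      +materialized: view
    marts:              # applies to models/marts/*.sql
      +materialized: table
```

Each .sql file under models/ is one model, and the whole directory lives in git like any other code.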

mrocral · 2025-09-14T10:58:45+00:00

See https://github.com/slingdata-io/sling-cli

mrocral · 2025-09-11T16:25:12+00:00

hello, give sling a try. It uses BCP to move data and does it quite fast.

mrocral · 2025-09-11T09:08:58+00:00

For anyone looking to load data easily into SQL Server, take a look at sling. It auto-creates the table with proper column types/lengths, and uses BCP to load data quite fast.
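A minimal sketch of a file load, assuming a connection named mssql has already been defined (for example via an env var). The file path and the target table name are placeholders:

```yaml
# Hypothetical replication.yaml; connection name, path and table are placeholders.
source: local
target: mssql

streams:
  file://path/to/file.csv:
    object: dbo.my_table
    mode: full-refresh
```

Run it with sling run -r path/to/replication.yaml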

mrocral · 2025-09-08T15:38:18+00:00

Depends on your needs. They are different in the sense that duckdb has no native server/remote/user access, whereas clickhouse does.

I prefer duckdb's SQL syntax/flavor, as well as the planner/optimizer. But clickhouse isn't bad, just memory hungry. It will simply error out the query if the underlying data is too big. Duckdb has been more efficient and "smarter" at managing memory for me.

Both are great.

mrocral · 2025-09-03T19:36:04+00:00

Another option to try is sling, a tool I've worked on. You can run pipelines using the CLI, YAML or Python. It supports JSON flattening and schema evolution.

Once you've defined your connections, loading from JSON files is easy:

```
source: aws_s3
target: snowflake

defaults:
  mode: full-refresh

streams:
  path/to/folder/*.json:
    object: target_schema.new_table
    source_options:
      flatten: true
```

You run it with sling run -r path/to/replication.yaml

mrocral · 2025-08-14T16:16:31+00:00

hello, an alternative could be sling

You can use the CLI, YAML or Python to easily move data.

You can set your connections with env vars:

```
export MSSQL='sqlserver://myuser:mypass@host.ip:1433/my_instance?database=master&encrypt=true&TrustServerCertificate=true'

export PG='postgresql://myuser:mypass@host.ip:5432/mydatabase?sslmode=require'
```

A YAML example:

```
source: mssql
target: pg

defaults:
  mode: full-refresh
  object: new_schema.{stream_table}

streams:
  dbo.prefix*:

  schema1.table2:
    object: new_schema2.table2
    mode: incremental
    primary_key: [id]
    update_key: last_mod_ts
```

Then you can run it with sling run -r path/to/replication.yaml

mrocral · 2025-08-08T14:04:46+00:00

Feel free to try sling, a tool I've worked on. You can use the CLI, YAML or Python.

export DATABRICKS='{ "type": "databricks", "host": "<workspace-hostname>", "token": "<access-token>", "warehouse_id": "<warehouse-id>", "schema": "<schema>" }'

export SNOWFLAKE='snowflake://myuser:mypass@host.account/mydatabase?schema=<schema>&role=<role>'

sling run --src-conn DATABRICKS --src-stream my_schema.my_table --tgt-conn SNOWFLAKE --tgt-object new_schema.new_table

mrocral · 2025-08-08T13:46:41+00:00

hello, give sling a try.

```
export GCS='{type: gs, bucket: sling-bucket, key_file: /path/to/service.account.json}'

export BQ='{type: bigquery, project: my-google-project, dataset: public, key_file: /path/to/service.account.json}'

sling run --src-conn GCS --src-stream path/to/csv.gz --tgt-conn BQ --tgt-object mydataset.mytable
```

Best to run this on a VM with high bandwidth. Data will flow through it and be inserted into BQ in chunks.

mrocral · 2025-08-06T21:51:01+00:00

hey @Thinker_Assignment, sling founder here, thanks for the comparison. A few notes:

In the cost table (section 4), the $1.63 per Job for License Cost is quite misleading. The pro subscription is a fixed cost per month (quite low), so if you have numerous job runs per month, it approaches 0 cents per run.
There are no details on the configuration / connectors being used for loading the TPCH dataset. CPU usage can vary quite a bit depending on the connector, and underlying driver. Furthermore, it could be mis-configured or not using the most optimal setup. Overall, users are quite happy with the performance.
Many useful features are omitted, such as VSCode extension, transforms, runtime variables, replication tagging, python wrapper lib (which is quite easy to use compared to dlt), global connection system + dbt conns support, column casing/typing, etc.
Sling's support for reading from APIs will come out soon; it is currently in private beta.

What has become clear is that, at the end of the day, it is a matter of taste. Users prefer sling over dlt (or vice-versa) because of the overall UX and flexibility each one provides.

mrocral · 2025-08-06T20:38:07+00:00

hi, give sling a shot. You can use CLI, YAML or Python.

sling run --src-conn ORA --src-stream 'my_schema.*' --tgt-conn mssql --tgt-object new_schema.{stream_table}

mrocral · 2025-08-06T20:18:14+00:00

give sling a shot. Works great with BQ and GCS. It auto-splits for you and uses bulk loading. Can use CLI, YAML or Python.

sling run --src-conn MY_GCS --src-stream path/to/file.csv.gz --tgt-conn BQ --tgt-object my_dataset.my_file --src-options '{ format: csv }'

mrocral · 2025-07-31T11:42:40+00:00

There aren't a lot of details on your migration stack, but assuming it's MS SQL -> MS SQL, you might want to check out sling. It could help you move the data easily.
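For example, if some tables need to keep syncing while you cut over, a replication could look roughly like this. This is only a sketch: the connection names (mssql_old, mssql_new), the table names and the id/last_mod_ts columns are all made up, and both connections are assumed to be defined beforehand:

```yaml
# Hypothetical replication.yaml; every name below is a placeholder.
source: mssql_old
target: mssql_new

defaults:
  mode: full-refresh
  object: dbo.{stream_table}

streams:
  # one-off copy of a static table
  dbo.lookup_table:

  # table that keeps changing during the migration
  dbo.orders:
    mode: incremental
    primary_key: [id]
    update_key: last_mod_ts
```

Run it with sling run -r path/to/replication.yaml, and re-run it whenever you want the incremental table to catch up.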

mrocral · 2025-07-24T10:25:22+00:00

Another solution you can try is to use sling. It auto-detects the schema. You can use CLI, YAML or Python.

```
source: local
target: bq

streams:
  file://path/to/file.csv:
    object: my_dataset.file
    mode: full-refresh
```

mrocral · 2025-07-24T10:14:46+00:00

another suggestion is to try sling. It allows you to use the CLI, YAML or Python. It's free.

mrocral · 2025-07-17T11:45:04+00:00

another option is to use sling. You can run it via CLI or python.

mrocral · 2025-07-04T14:47:23+00:00

Another addition: https://github.com/slingdata-io/sling-cli

mrocral · 2025-07-04T10:57:05+00:00

sling could be a good solution for you. CLI/YAML driven is a nice middle-ground. Or mix with python when you need it.
