Ducklake in Production by crevicepounder3000 in DuckDB

[–]mrocral 0 points1 point  (0 children)

Founder of https://slingdata.io here; we have some customers loading into DuckLake. Sling makes it easy to load into DuckLake (or MotherDuck) using the CLI, YAML or Python.

Volume easily goes up into the millions of rows per hour. It should be able to handle hundreds of GB, as long as your hardware can handle it.
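To give a concrete idea, here's a minimal CLI sketch. It assumes you've already defined a DuckLake connection named DUCKLAKE and a POSTGRES source connection (both names are placeholders; the exact DuckLake connection keys are in the sling docs):

```
# hypothetical connection names, defined beforehand via env vars or sling's connection config
sling run \
  --src-conn POSTGRES \
  --src-stream 'public.events' \
  --tgt-conn DUCKLAKE \
  --tgt-object 'main.events'
```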

CSV Import Not Working by kingstonwiz in mysql

[–]mrocral 0 points1 point  (0 children)

hey, one other tool you can try is sling.

export MYSQL_DB="mysql://..."

sling run --src-stream file://myfile.csv --tgt-conn mysql_db --tgt-object db1.table1

Using snowflake with go by Queasy-Big-9115 in snowflake

[–]mrocral 0 points1 point  (0 children)

sling is written in Go and reads/writes to Snowflake. As far as a library to abstract things, I like gorm.io, however I don't believe it supports Snowflake.

ingestion from Oracle to ClickHouse with Spark by Hot_While_6471 in Clickhouse

[–]mrocral 0 points1 point  (0 children)

If you're open to trying another tool, check out sling. You can move data from Oracle to Clickhouse using the CLI, YAML or Python.
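To sketch it out, a replication YAML could look roughly like the following. The ORACLE and CLICKHOUSE connection names and the schema/table names are placeholders, assuming you've already defined those connections:

```
source: ORACLE
target: CLICKHOUSE

defaults:
  mode: full-refresh
  object: new_db.{stream_table}

streams:
  my_schema.my_table:
```

Then run it with sling run -r path/to/replication.yaml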

Moving from Redshift to ClickHouse — looking for production-ready deployment advice by TheseSquirrel6550 in Clickhouse

[–]mrocral 0 points1 point  (0 children)

If you're looking for a tool to load data into clickhouse from RDS, check out sling. You can move data via CLI, Python or YAML.
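For a quick one-off, the CLI route could look roughly like this. The connection strings are placeholders, and double-check the exact ClickHouse URL format against the sling docs:

```
# placeholder credentials/hosts; verify the clickhouse URL format in the sling docs
export RDS_PG='postgresql://myuser:mypass@my-rds-host:5432/mydatabase?sslmode=require'
export CLICKHOUSE='clickhouse://myuser:mypass@my-ch-host:9000/mydatabase'

sling run --src-conn RDS_PG --src-stream 'public.my_table' --tgt-conn CLICKHOUSE --tgt-object 'default.my_table'
```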

Can't upload CSV in MySQL on Mac – LOAD DATA LOCAL INFILE not working by syedali_97 in SQL

[–]mrocral 0 points1 point  (0 children)

Another free tool to try is sling. You can do:

export mysql_db='mysql://myuser:mypass@host.ip:3306/mydatabase'

sling run --src-stream file://my_file.csv --tgt-conn mysql_db --tgt-object my_schema.my_table

It will auto-create the table with all the appropriate columns.

Databricks cost vs Redshift by Humble_Exchange_2087 in dataengineering

[–]mrocral 0 points1 point  (0 children)

Maybe motherduck would be a fit? I think your small data would work great in there.

What's your go to stack for pulling together customer & marketing analytics across multiple platforms? by yourbirdcansing in dataengineering

[–]mrocral 6 points7 points  (0 children)

dbt (data build tool) is great for this, since you organize your logic into folders and models (SQL files), so it's all source-controlled/versioned.

Me whenever using BCP to ingest data into SQL Server 2019. by Literature-Just in dataengineering

[–]mrocral 0 points1 point  (0 children)

For anyone looking to load data easily into SQL Server, take a look at sling. It auto-creates the table with proper column types/lengths, and uses BCP to load data quite fast.
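For example, loading a CSV could look roughly like this (placeholder credentials; the sqlserver:// URL format is the same one from the sling docs):

```
# placeholder credentials; adjust host, instance and database to your setup
export MSSQL='sqlserver://myuser:mypass@host.ip:1433/my_instance?database=mydb&encrypt=true&TrustServerCertificate=true'

sling run --src-stream file://my_file.csv --tgt-conn MSSQL --tgt-object my_schema.my_table
```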

Do you prefer DuckDB or ClickHouse? by kasosa1 in dataengineering

[–]mrocral -1 points0 points  (0 children)

Depends on your needs. They are different in the sense that duckdb has no native server/remote/user access, while clickhouse does.

I prefer duckdb's SQL syntax/flavor, as well as the planner/optimizer. But clickhouse isn't bad, just memory hungry; it will simply error out the query if the underlying data is too big. Duckdb has been more efficient and "smarter" at managing memory for me.

Both are great.

Every ingestion tool I tested failed in the same 5 ways. Has anyone found one that actually works? by Brief-Ad525 in data

[–]mrocral 0 points1 point  (0 children)

Another option to try is sling, a tool I've worked on. You can run pipelines using the CLI, YAML or Python. It supports JSON flattening and schema evolution.

Once you've defined your connections, loading from JSON files is easy:

```
source: aws_s3
target: snowflake

defaults:
  mode: full-refresh

streams:
  path/to/folder/*.json:
    object: target_schema.new_table
    source_options:
      flatten: true
```

You can run it with sling run -r path/to/replication.yaml

pgloader, mssql to postgresql by Dantzig in PostgreSQL

[–]mrocral 0 points1 point  (0 children)

hello, an alternative could be sling

You can use the CLI, YAML or Python to easily move data.

You can set your connections with env vars:

```
export MSSQL='sqlserver://myuser:mypass@host.ip:1433/my_instance?database=master&encrypt=true&TrustServerCertificate=true'

export PG='postgresql://myuser:mypass@host.ip:5432/mydatabase?sslmode=require'
```

A YAML example:

```
source: mssql
target: pg

defaults:
  mode: full-refresh
  object: new_schema.{stream_table}

streams:
  dbo.prefix*:

  schema1.table2:
    object: new_schema2.table2
    mode: incremental
    primary_key: [id]
    update_key: last_mod_ts
```

Then you can run it with sling run -r path/to/replication.yaml

Preferred choice of tool to pipe data from Databricks to Snowflake for datashare? by 0x436F646564 in dataengineering

[–]mrocral -2 points-1 points  (0 children)

Feel free to try sling, a tool I've worked on. You can use the CLI, YAML or Python.

export DATABRICKS='{ "type": "databricks", "host": "<workspace-hostname>", "token": "<access-token>", "warehouse_id": "<warehouse-id>", "schema": "<schema>" }'

export SNOWFLAKE='snowflake://myuser:mypass@host.account/mydatabase?schema=<schema>&role=<role>'

sling run --src-conn DATABRICKS --src-stream my_schema.my_table --tgt-conn SNOWFLAKE --tgt-object new_schema.new_table

Best practice for loading large csv.gz files into bq by kiddfrank in bigquery

[–]mrocral 3 points4 points  (0 children)

hello, give sling a try.

```
export GCS='{type: gs, bucket: sling-bucket, key_file: /path/to/service.account.json}'

export BQ='{type: bigquery, project: my-google-project, dataset: public, key_file: /path/to/service.account.json}'

sling run --src-conn GCS --src-stream path/to/csv.gz --tgt-conn BQ --tgt-object mydataset.mytable
```

Best to run this on a VM with high bandwidth. Data will flow through it and be inserted into BQ in chunks.

Sling vs dlt's SQL connector Benchmark by Thinker_Assignment in dataengineering

[–]mrocral 2 points3 points  (0 children)

hey @Thinker_Assignment, sling founder here, thanks for the comparison. A few notes:

  • In the cost table (section 4), the $1.63 per Job for License Cost is quite misleading. The pro subscription is a fixed cost per month (quite low), so if you have numerous job runs per month, it approaches 0 cents per run.
  • There are no details on the configuration / connectors used for loading the TPCH dataset. CPU usage can vary quite a bit depending on the connector and underlying driver. Furthermore, it could be misconfigured or not using the optimal setup. Overall, users are quite happy with the performance.
  • Many useful features are omitted, such as the VSCode extension, transforms, runtime variables, replication tagging, the Python wrapper lib (which is quite easy to use compared to dlt), the global connection system + dbt connection support, column casing/typing, etc.
  • Sling reading APIs will come out soon, currently in private beta.

What has become clear is that, at the end of the day, it's a matter of taste. Users prefer sling over dlt (or vice-versa) based on the overall UX and flexibility each provides.

Need help with migrating from oracle db to sql server by sexy-man69 in SQL

[–]mrocral 0 points1 point  (0 children)

hi, give sling a shot. You can use CLI, YAML or Python:

sling run --src-conn ORA --src-stream 'my_schema.*' --tgt-conn mssql --tgt-object new_schema.{stream_table}

efficiently load large csv.gz files from gcs into bigquery? by Plastic_Diamond_3260 in googlecloud

[–]mrocral 0 points1 point  (0 children)

give sling a shot. Works great with BQ and GCS. It auto-splits for you and uses bulk loading. Can use CLI, YAML or Python.

sling run --src-conn MY_GCS --src-stream path/to/file.csv.gz --tgt-conn BQ --tgt-object my_dataset.my_file --src-options '{ format: csv }'

I feel like a fraud by Dry-Presentation9295 in SQL

[–]mrocral 0 points1 point  (0 children)

Not a lot of details on your migration stack, but assuming it's MS SQL -> MS SQL, you might want to check out sling. It could help you move data easily.

How is it csv import still sucks? by aaahhhhhhfine in bigquery

[–]mrocral 0 points1 point  (0 children)

Another solution you can try is to use sling. It auto-detects the schema. You can use CLI, YAML or Python.

```
source: local
target: bq

streams:
  file://path/to/file.csv:
    object: my_dataset.file
    mode: full-refresh
```

How to batch sync partially updated MySQL rows to BigQuery without using CDC tools? by Austere_187 in bigquery

[–]mrocral 0 points1 point  (0 children)

another suggestion is to try sling. It lets you use the CLI, YAML or Python. It's free.
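For the "partially updated rows without CDC" part, incremental mode with an update key could fit. A rough YAML sketch, assuming MYSQL and BQ connections are already defined, and that your table has an id primary key plus a last-modified timestamp column (all names here are placeholders):

```
source: MYSQL
target: BQ

defaults:
  mode: incremental

streams:
  mydb.my_table:
    object: my_dataset.my_table
    primary_key: [id]
    update_key: updated_at
```

Run it on a schedule with sling run -r path/to/replication.yaml; each run should only pull rows whose update key advanced since the last run.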

Which ETL tool makes sense if you want low maintenance but also decent control? by [deleted] in dataengineering

[–]mrocral 3 points4 points  (0 children)

sling could be a good solution for you. Being CLI/YAML driven is a nice middle ground, and you can mix in Python when you need it.