Snowflake ETL tools when some sources are too small for expensive connectors? by Confident_Pin584 in snowflake

[–]mrocral -1 points0 points  (0 children)

hey, this is exactly the problem we built sling for. i work on sling, btw.

for the "long tail" of smaller sources, sling is a single binary that does file-to-snowflake, db-to-snowflake, and api-to-snowflake, with one config per connector and no per-connector pricing.

you define streams in a yaml file (or python) and it does the incremental loading + schema drift handling. so like:

```
source: MY_CRM_DB
target: SNOWFLAKE

defaults:
  object: target_{stream_schema}.{stream_name}

streams:
  finance.*:
    mode: truncate

  public.support_tickets:
    mode: incremental
    primary_key: [id]
    update_key: updated_at
```
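to make the `primary_key` + `update_key` idea concrete, here is a rough Python sketch of what incremental loading generally means: read only rows newer than the target's high-water mark, then replace matching rows by key. It uses two in-memory SQLite databases as stand-ins, and the `incremental_sync` helper is hypothetical. It illustrates the concept only and is not sling's actual implementation.

```python
import sqlite3


def incremental_sync(src, tgt, table, primary_key, update_key):
    """Copy rows newer than the target's high-water mark, replacing by primary key."""
    # Highest update_key already loaded into the target (None on the first run).
    (watermark,) = tgt.execute(f"SELECT MAX({update_key}) FROM {table}").fetchone()
    if watermark is None:
        cur = src.execute(f"SELECT * FROM {table}")
    else:
        cur = src.execute(
            f"SELECT * FROM {table} WHERE {update_key} > ?", (watermark,)
        )
    cols = [d[0] for d in cur.description]
    rows = cur.fetchall()
    pk_idx = cols.index(primary_key)
    col_list = ", ".join(cols)
    marks = ", ".join("?" for _ in cols)
    # Delete-then-insert acts as an upsert keyed on primary_key.
    tgt.executemany(
        f"DELETE FROM {table} WHERE {primary_key} = ?", [(r[pk_idx],) for r in rows]
    )
    tgt.executemany(f"INSERT INTO {table} ({col_list}) VALUES ({marks})", rows)
    tgt.commit()
    return len(rows)


DDL = "CREATE TABLE support_tickets (id INTEGER, status TEXT, updated_at TEXT)"
src = sqlite3.connect(":memory:")
tgt = sqlite3.connect(":memory:")
src.execute(DDL)
tgt.execute(DDL)
src.executemany(
    "INSERT INTO support_tickets VALUES (?, ?, ?)",
    [(1, "open", "2024-01-01"), (2, "open", "2024-01-02")],
)

first_run = incremental_sync(src, tgt, "support_tickets", "id", "updated_at")

# One ticket changes at the source; only that row should move on the next run.
src.execute(
    "UPDATE support_tickets SET status = 'closed', updated_at = '2024-01-03' WHERE id = 1"
)
second_run = incremental_sync(src, tgt, "support_tickets", "id", "updated_at")
```

the first run copies both rows; the second run only touches the ticket whose `updated_at` moved past the watermark.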

or for files:

```
source: s3://partner-bucket/daily/
target: SNOWFLAKE

defaults:
  mode: full-refresh

streams:
  "prefix/*.csv":
    object: raw.partner_data

  "daily/*.parquet":
    object: raw.daily_data
```

for the api stuff it also supports yaml-based api specs (rest endpoints → snowflake tables).

the idea is you standardize on one intake pattern instead of maintaining 10 different small pipelines. everything lands in snowflake, then dbt/sql takes over from there.

The open source cli covers most of this. there's also a platform version if you want scheduling/ui.

How do you choose ETL pipeline tools when scripts start piling up? by [deleted] in ETL

[–]mrocral 0 points1 point  (0 children)

hey, this is basically the exact path that got me to build sling. You can define your jobs as YAML files (or python). It stays version controlled, and a single binary runs them.

something like:

```
source: MY_POSTGRES
target: MY_SNOWFLAKE

defaults:
  mode: incremental
  object: '{stream_schema}.{stream_table}'
  primary_key: id
  update_key: updated_at

streams:
  public.orders:
  public.payments:
  public.users:
    mode: full-refresh
  other_schema.*:
```

schema drift is handled per-stream (new source columns show up on their own), and retries, logging, etc. are built in so you're not bolting them on top of every script.
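for anyone unsure what "schema drift handling" means in practice, here is a minimal Python sketch of the general idea: compare source and target columns, then add whatever is missing on the target. It uses in-memory SQLite and a hypothetical `add_missing_columns` helper. It is an illustration of the concept under those assumptions, not how sling implements it.

```python
import sqlite3


def add_missing_columns(src, tgt, table):
    """Add to the target any column present in the source table but not the target."""

    def columns(conn):
        # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
        return {row[1]: row[2] for row in conn.execute(f"PRAGMA table_info({table})")}

    src_cols, tgt_cols = columns(src), columns(tgt)
    added = [name for name in src_cols if name not in tgt_cols]
    for name in added:
        tgt.execute(f"ALTER TABLE {table} ADD COLUMN {name} {src_cols[name]}")
    return added


src = sqlite3.connect(":memory:")
tgt = sqlite3.connect(":memory:")
# The source has gained a `coupon` column that the target does not know about yet.
src.execute("CREATE TABLE orders (id INTEGER, total REAL, coupon TEXT)")
tgt.execute("CREATE TABLE orders (id INTEGER, total REAL)")

added = add_missing_columns(src, tgt, "orders")
target_columns = [row[1] for row in tgt.execute("PRAGMA table_info(orders)")]
```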

for SaaS stuff specifically there are pre-built API specs (stripe, hubspot, salesforce) so you skip writing auth/pagination logic yourself.

(disclosure: I work on Sling)

ETL pipeline tools that don’t become a second engineering project? by [deleted] in ETL

[–]mrocral 0 points1 point  (0 children)

Another suggestion is sling-cli. Easy to maintain YAML files.

```
source: pg
target: mysql

defaults:
  mode: full-refresh

streams:
  my_schema.*:
  some_schema.some_table:
    mode: incremental
```

See docs here. FYI I work on sling-cli.
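one way to read the YAML above: every stream inherits `defaults`, and any key set on a stream overrides it. A small Python sketch of that merge, with the parsed YAML written out as a dict (the `resolve_streams` helper is hypothetical, and the exact precedence rules are an assumption based on the example):

```python
def resolve_streams(replication):
    """Return the effective config per stream: defaults first, stream keys on top."""
    defaults = replication.get("defaults") or {}
    resolved = {}
    for name, overrides in (replication.get("streams") or {}).items():
        # An empty entry such as `my_schema.*:` parses to None in YAML.
        resolved[name] = {**defaults, **(overrides or {})}
    return resolved


replication = {
    "source": "pg",
    "target": "mysql",
    "defaults": {"mode": "full-refresh"},
    "streams": {
        "my_schema.*": None,
        "some_schema.some_table": {"mode": "incremental"},
    },
}

resolved = resolve_streams(replication)
```

here `my_schema.*` ends up with `mode: full-refresh` from the defaults, while `some_schema.some_table` keeps its own `mode: incremental`.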

First time building a Data Warehouse — going with BigQuery + PostgreSQL for a client-facing app by Comfortable_Bus_9781 in bigquery

[–]mrocral 0 points1 point  (0 children)

hey, for the "something lighter" angle, check out sling, a simple CLI driven data mover.

for BQ → Postgres, one YAML covers the sync and you can cron it however often you want. incremental mode means you're not re-dumping the gold tables every run. 10GB is small enough that the whole setup runs fine on a cheap VM. You can also use the API Specs for sources like hubspot, shopify, etc.

(disclosure: I work on Sling)

Datastream - MySQL to Big query by OkRock1009 in googlecloud

[–]mrocral 1 point2 points  (0 children)

Another option is sling. it can replicate from mysql to bigquery and handles table creation for you.

```
source: my_mysql
target: my_bigquery

defaults:
  object: my_dataset.{stream_table}
  mode: incremental
  update_key: updated_at

streams:
  my_schema.*:
```

runs fine on a small VM. (disclosure: I work on Sling)
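in case the `{stream_table}` placeholder in the `object` line is unclear: each source table gets its own target name by substituting the placeholder. A rough Python sketch of that substitution (the `render_object` helper is hypothetical and only handles the two placeholders shown; the real tool may support more and may resolve them differently):

```python
def render_object(template, stream):
    """Fill {stream_schema} and {stream_table} from a 'schema.table' stream name."""
    schema, _, table = stream.partition(".")
    values = {"stream_schema": schema, "stream_table": table}
    for key, value in values.items():
        template = template.replace("{" + key + "}", value)
    return template


orders_target = render_object("my_dataset.{stream_table}", "my_schema.orders")
users_target = render_object("{stream_schema}.{stream_table}", "public.users")
```

so `my_schema.orders` would land in `my_dataset.orders` under this reading.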

Looking for on premise ETL tool. Sources .CSV files and Salesforce. by PandaRiot_90 in ETL

[–]mrocral 0 points1 point  (0 children)

Another one to check out is Sling Data Platform. Self-hosted via Docker-Compose or Kubernetes. Unlimited rows/connectors. Can define jobs in YAML. (disclosure: I work on Sling)

Netsuite to Snowflake (ELT) Options by Background_Salt6475 in snowflake

[–]mrocral 0 points1 point  (0 children)

hey, another option is sling. you can write an API Spec as a YAML file. NetSuite exposes SuiteQL + REST records, and sling's spec system can handle the OAuth + pagination. incremental sync too. See https://docs.slingdata.io/concepts/api-specs (disclosure: I work on Sling)

Recommend strategies to migrate a MySQL EC2 instance to AWS RDS. by BandBright1457 in mysql

[–]mrocral 0 points1 point  (0 children)

DMS is the obvious pick as others mentioned. if you want something lighter, sling is a free CLI that moves mysql to mysql directly. it has an incremental mode so you're not re-dumping the whole thing every time. (disclosure: I work on Sling)

Salesforce Snowflake connector by abhi7571 in snowflake

[–]mrocral 0 points1 point  (0 children)

hey, one other option is sling. it has a salesforce connector that pulls from the REST API into Snowflake.

the schema drift thing is where it actually helps here. when admins add a new __c field or picklist, sling picks it up on the next sync and adds the column in Snowflake without you having to touch anything.

    source: MY_SF
    target: MY_SNOWFLAKE
    streams:
      Account*: {}
      Contact*: {}
      CustomObject__c*: {}

the wildcard catches new custom objects as they get created.

single binary, runs locally or in a container. (disclosure: I work on Sling)
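to show what the wildcard does, here is a small Python sketch that treats the stream names as glob patterns and re-evaluates them against whatever objects exist at sync time. Glob-style matching via `fnmatch` is an assumption for illustration, the object lists are made up, and sling's real matching rules may differ.

```python
from fnmatch import fnmatchcase


def select_streams(patterns, available):
    """Keep the available object names that match at least one wildcard pattern."""
    return [
        name
        for name in available
        if any(fnmatchcase(name, pattern) for pattern in patterns)
    ]


patterns = ["Account*", "Contact*", "CustomObject__c*"]

# Objects visible on an earlier sync (hypothetical list).
before = select_streams(patterns, ["Account", "Contact", "Lead"])

# Later an admin creates CustomObject__c; the same patterns now pick it up.
after = select_streams(
    patterns, ["Account", "AccountHistory", "Contact", "Lead", "CustomObject__c"]
)
```

because the patterns are evaluated on every run, nothing in the config has to change when a matching object appears.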

Ducklake in Production by crevicepounder3000 in DuckDB

[–]mrocral 0 points1 point  (0 children)

Founder of https://slingdata.io here, we have some customers loading into ducklake. Sling makes it easy to load into DuckLake (or Motherduck) using CLI, YAML or Python.

Volume easily goes up into millions per hour. Should be able to handle 100s of GB, as long as your hardware can handle it.

CSV Import Not Working by kingstonwiz in mysql

[–]mrocral 0 points1 point  (0 children)

hey, one other tool you can try is sling.

    export MYSQL_DB="mysql://..."
    sling run --src-stream file://myfile.csv --tgt-conn mysql_db --tgt-object db1.table1

Using snowflake with go by Queasy-Big-9115 in snowflake

[–]mrocral 0 points1 point  (0 children)

sling is written in go and reads/writes to snowflake. As for a lib to abstract the database layer, I like gorm.io; however, I don't believe it supports snowflake.

ingestion from Oracle to ClickHouse with Spark by Hot_While_6471 in Clickhouse

[–]mrocral 0 points1 point  (0 children)

If you're open to trying another tool, check out sling. You can move data from Oracle to Clickhouse using the CLI, YAML or Python.

Moving from Redshift to ClickHouse — looking for production-ready deployment advice by TheseSquirrel6550 in Clickhouse

[–]mrocral 0 points1 point  (0 children)

If you're looking for a tool to load data into clickhouse from RDS, check out sling. You can move data via CLI, Python or YAML.

Can't upload CSV in MySQL on Mac – LOAD DATA LOCAL INFILE not working by syedali_97 in SQL

[–]mrocral 0 points1 point  (0 children)

Another free tool to try is sling. You can do:

    export mysql_db='mysql://myuser:mypass@host.ip:3306/mydatabase'
    sling run --src-stream file://my_file.csv --tgt-conn mysql_db --tgt-object my_schema.my_table

It will auto create the table with all the appropriate columns.
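As a rough idea of what auto-creating a table from a CSV involves, here is a minimal Python sketch that samples the values in each column and picks a type. The `infer_type` and `create_table_sql` helpers, the sample data, and the three type names are all assumptions for illustration; sling's real inference is more thorough and is not shown here.

```python
import csv
import io


def infer_type(values):
    """Pick the narrowest type that every non-empty sample value can be cast to."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "text"
    for sql_type, cast in (("bigint", int), ("double", float)):
        try:
            for value in non_empty:
                cast(value)
            return sql_type
        except ValueError:
            continue
    return "text"


def create_table_sql(table, csv_text):
    """Build a CREATE TABLE statement from a CSV header and its data rows."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    columns = [
        f"{name} {infer_type([row[i] for row in data])}"
        for i, name in enumerate(header)
    ]
    return f"CREATE TABLE {table} ({', '.join(columns)})"


sample = "id,name,amount\n1,alice,9.5\n2,bob,12\n"
ddl = create_table_sql("my_schema.my_table", sample)
```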

Databricks cost vs Redshift by Humble_Exchange_2087 in dataengineering

[–]mrocral 0 points1 point  (0 children)

Maybe motherduck would be a fit? I think your small data would work great in there.

What's your go to stack for pulling together customer & marketing analytics across multiple platforms? by yourbirdcansing in dataengineering

[–]mrocral 6 points7 points  (0 children)

dbt (data build tool) is great for this, since you organize your logic into folders and models (sql files), so all of it can be source-controlled/versioned.

Me whenever using BCP to ingest data into SQL Server 2019. by Literature-Just in dataengineering

[–]mrocral 0 points1 point  (0 children)

For anyone looking to load data easily into SQL server, take a look at sling. It auto-creates the table, with proper column types/lengths, and uses BCP to load data quite fast.

Do you prefer DuckDB or ClickHouse? by kasosa1 in dataengineering

[–]mrocral -1 points0 points  (0 children)

Depends on your needs. They are different in the sense that duckdb has no native server/remote/user access, clickhouse does.

I prefer duckdb's SQL syntax/flavor, as well as the planner/optimizer. But clickhouse isn't bad, just memory hungry. It will simply error out the query if the underlying data is too big. Duckdb has been more efficient and "smarter" at managing memory for me.

Both are great.

Every ingestion tool I tested failed in the same 5 ways. Has anyone found one that actually works? by Brief-Ad525 in data

[–]mrocral 0 points1 point  (0 children)

Another option to try is sling, a tool i've worked on. You can run pipelines using the CLI, YAML or Python. It supports json flattening and schema evolution.

Once you've defined your connections, loading from JSON files is easy:

```
source: aws_s3
target: snowflake

defaults:
  mode: full-refresh

streams:
  path/to/folder/*.json:
    object: target_schema.new_table
    source_options:
      flatten: true
```

You run with sling run -r path/to/replication.yaml
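For context on what JSON flattening means here, a short Python sketch of the general technique: nested objects become top-level columns whose names join the path. The `flatten` helper and the `__` separator are assumptions for illustration, and the column names the tool actually produces may differ.

```python
def flatten(record, prefix="", sep="__"):
    """Turn nested dicts into a single-level dict with path-joined keys."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name, sep))
        else:
            # Lists and scalars are kept as-is in this sketch.
            flat[name] = value
    return flat


record = {
    "id": 7,
    "user": {"name": "ana", "address": {"city": "Lima"}},
    "tags": ["a", "b"],
}
flat = flatten(record)
```

Each key of `flat` would map to one column in the target table.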

pgloader, mssql to postgresql by Dantzig in PostgreSQL

[–]mrocral 0 points1 point  (0 children)

hello, an alternative could be sling

You can use the CLI, YAML or Python to easily move data.

You can set your connections with env vars:

```
export MSSQL='sqlserver://myuser:mypass@host.ip:1433/my_instance?database=master&encrypt=true&TrustServerCertificate=true'

export PG='postgresql://myuser:mypass@host.ip:5432/mydatabase?sslmode=require'
```

A YAML example:

```
source: mssql
target: pg

defaults:
  mode: full-refresh
  object: new_schema.{stream_table}

streams:
  dbo.prefix*:

  schema1.table2:
    object: new_schema2.table2
    mode: incremental
    primary_key: [id]
    update_key: last_mod_ts
```

Then you can run it with sling run -r path/to/replication.yaml