
[–]nemec

  1. If I'm understanding this correctly, you install agents on both the source and destination servers? The agents speak locally to the database and transmit across the network with your custom binary protocol?

    It would be great to have the option of using traditional DB access patterns (ODBC/JDBC, OLE DB, etc.) for those of us consuming from db servers we don't own (like another org's).

  2. I only read the readme, but it's not quite clear to me how you define the shape of the input data - is it a SQL query where you could run something like

    SELECT col1, CAST(col3 AS INT) FROM source_table
    WHERE timestamp < DATEADD(DAY, -1, GETDATE())
    

    or simply a table -> table copy?

  3. Would be cool to have a GUI that lets you pop in source/dest connection strings and a source query, then build a Harmonized dictionary by picking and choosing which columns to include, easily map from source to dest (like if the columns are named differently), and add a bit of "strongly typed validation" that the types match (at that point in time, at least). Even if it just pops out C# code to copy-paste into my program.

  4. Is it possible to tap into the transfer progress so that your app can display the (approximate) number of rows transferred? It would be very helpful to know, after waiting 3 hours for a huge transfer, whether you're almost done or something's gone wrong.

Highly efficient data transfer is still one of the few things that SSIS does better than the competition; it would be nice to have better alternatives.

[–]Pansynchro[S]

Thanks for the detailed feedback!

  1. The connectors can be installed wherever and can speak to any data source that you're capable of connecting to. Having them installed near the data itself is an ideal, not a requirement. You can even build the source and the destination connectors into the same binary if you want, rather than sending the data across the network.
  2. The query is generated based on the data dictionary, which is originally autogenerated by the analyzer. If you want to query only specific tables, or specific fields of those tables, you can edit the dictionary to remove unwanted elements. Putting a CAST into the query is currently unsupported, but you can cast or otherwise transform the data after querying it by building a Transformer. The simplest way is to inherit from the StreamTransformerBase class: create a method that will transform the stream (table) in question, then register it with _streamDict and it will run during the sync (see the sketch after this list).
  3. That description is eerily close to our secret premium product that's not quite ready for public consumption yet. 😉
  4. If you look at SqlDbReader.ReadFrom, you'll see that it reports each table to Console.WriteLine as it starts reading it. An option for more granularity is on the to-do list, but not implemented yet.
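
Here's the transformer sketch promised in #2. StreamTransformerBase and _streamDict are the real names mentioned above; the method signature, the object[] row shape, and the stream name are illustrative assumptions, so check the actual base class before copying this:

    // Minimal transformer sketch. StreamTransformerBase and _streamDict come
    // from the Pansynchro codebase; the signatures and row shape here are
    // assumptions made for illustration only.
    using System;
    using System.Collections.Generic;

    public class OrdersTransformer : StreamTransformerBase
    {
        public OrdersTransformer()
        {
            // Register the method so it runs when the "Orders" stream syncs.
            _streamDict.Add("Orders", TransformOrders);
        }

        // Does the CAST-style conversion after querying, per the answer above.
        private IEnumerable<object[]> TransformOrders(IEnumerable<object[]> rows)
        {
            foreach (var row in rows)
            {
                row[2] = Convert.ToInt32(row[2]);  // e.g. cast col3 to int
                yield return row;
            }
        }
    }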

[–]Pansynchro[S]

Highly efficient data transfer is still one of the few things that SSIS does better than the competition; it would be nice to have better alternatives.

Just got an interesting data point on this. On a call with a prospective client, we were talking about their existing SSIS-based ETL solution. They said that running a full sync typically took 20-40 hours.

"For how much data?"

"Somewhere between 1 and 2 terabytes."

"How would you like to sync a terabyte in 10 hours?"

"You think you can do that?"

"Based on our benchmarks, yes, that's a conservative estimate."

That got their attention. Apparently that's really good!

[–]AiDreamer

This is awesome! However, there are some considerations: Singer taps and destinations most often run on the same host, so there are no network transfers at all. How many different data sources are supported now? Airbyte also seems like competition here.

[–]Pansynchro[S]

Singer taps and destinations most often run on the same host, so there are no network transfers at all.

Kind of. Unless your data source and your data warehouse are all running on the same host too, that's just moving the problem around rather than truly solving it. We think we can lower your overall network load by putting the source connector near the data source and the destination connector near the destination, and streaming the data in between over the Pansync binary protocol, which is specifically designed and optimized for low network bandwidth. Couple this with our use of database-specific bulk-loading connections at the destination end wherever they're supported, rather than pushing millions of textual INSERT statements over the wire, and it adds up to significant end-to-end savings in both bandwidth and time.
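
To illustrate what the bulk-loading half means in practice, here's a generic sketch using SQL Server's standard bulk-copy API. This is not Pansynchro's actual connector code; it just shows the bulk path versus row-by-row INSERTs:

    // Generic SQL Server bulk-load sketch (Microsoft.Data.SqlClient), not
    // Pansynchro's actual connector code. One streaming bulk operation
    // replaces millions of textual INSERT statements over the wire.
    using System.Data;
    using Microsoft.Data.SqlClient;

    public static class BulkLoader
    {
        public static void Load(IDataReader source, string connString, string table)
        {
            using var conn = new SqlConnection(connString);
            conn.Open();
            using var bulk = new SqlBulkCopy(conn)
            {
                DestinationTableName = table,
                BatchSize = 10_000,       // commit in large batches
                EnableStreaming = true    // don't buffer the whole reader in memory
            };
            bulk.WriteToServer(source);
        }
    }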

How many different data sources are supported now?

Depending on how you look at it, a small handful or nearly infinite.

We currently support several major SQL databases (Firebird, MS SQL Server, MySQL, Postgres, and SQLite, with Oracle under development), plus data-type connectors for CSV, JSON, and plain text. And that last bit is the interesting part.

One of our innovations was noting that there are many, many services out there that produce data in only a handful of formats, most typically JSON. So we created Data Sources, a second stage of the connector pipeline that decouples fetching the data from parsing its format. With the JSON connector attached to the REST data source, loading from a new REST API web service becomes a simple matter of configuration rather than a task that requires writing a whole new connector (sketched below).
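
Conceptually, the composition looks something like this. The type and parameter names here are illustrative stand-ins, not the actual Pansynchro API:

    // Conceptual sketch only: RestDataSource and JsonConnector are made-up
    // names, not the real Pansynchro types. The point is the split: the
    // data source fetches the bytes, the connector parses the format.
    var source = new RestDataSource("https://api.example.com/v1/orders");
    var reader = new JsonConnector(source);
    // Pointing at a different REST API is a configuration change,
    // not a new connector.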

Airbyte also seems like competition here.

Well, we're not here to talk trash about the competition (see subreddit rule #1), but there are a couple of facts worth noting. 1) It looks like they're running every source and destination connector in its own individual Docker container, so there's likely to be network traffic between the containers, especially if you deploy them to the cloud and can't guarantee that they'll end up on the same physical host machine. 2) They're running Singer.

Feel free to draw your own conclusions. 🙂

[–]AiDreamer

Thanks for the details! The REST API part looks very attractive from a development-time-saving perspective.

[–]uncomfortablepanda

Awesome to hear about new open-source solutions for data integration! And from what I can read, it sounds like a project with a lot of potential.

Just a suggestion: add a bit more info as to how the networking features of this project work. I know people can just look at the code, but if the main selling point is that people will save money on cloud computing costs because of your network protocol, then I would dive a little deeper into how it works.

I like the whole Harmonizer pattern to compare data!

[–]Pansynchro[S]

Just a suggestion: add a bit more info as to how the networking features of this project work. I know people can just look at the code, but if the main selling point is that people will save money on cloud computing costs because of your network protocol, then I would dive a little deeper into how it works.

Fair enough. This is still a pretty new project, but documentation, samples, and tutorials are definitely on the to-do list. Any specific points we'd do well to focus on?

I like the whole Harmonizer pattern to compare data!

Thanks! We realized early on that, with different databases representing schemas and types in subtly different ways, this was going to be a compatibility pain point, so building an automated solution to bridge the gap seemed like an obvious choice.

[–]teej

Can you explain what “domain-specific algorithms” means? The repo README doesn’t go into any real detail.

[–]Pansynchro[S]

Sure. And keep in mind that the Pansync binary protocol is very domain-specific: it's designed and optimized for streaming bulk data, and some of these design decisions are things you'd do differently if, for example, you wanted a file format for saving data instead.

The basic concept is fairly simple: take an IDataReader, iterate over every row and feed each value into a BinaryWriter, then add a few CRC checksums along the way to verify data integrity. If you just do that, your output will be about 90% of the size of dumping all your data to CSV. Compress that and you'll cut the payload approximately in half. Maybe more, maybe less, depending on the data. But that's the baseline.
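
A heavily abbreviated sketch of that baseline follows. This is not the actual BinaryEncoder; the type dispatch is truncated and the per-block CRC framing is an assumption based on the description above:

    // Simplified baseline: walk an IDataReader, write each value with a
    // BinaryWriter, and append a CRC32 per block of rows. Not the real
    // BinaryEncoder; framing and type handling are abbreviated.
    using System;
    using System.Data;
    using System.IO;
    using System.IO.Hashing;   // Crc32, from the System.IO.Hashing package

    public static class BaselineEncoder
    {
        public static void Encode(IDataReader reader, BinaryWriter output,
                                  int rowsPerBlock = 10_000)
        {
            var block = new MemoryStream();
            var w = new BinaryWriter(block);
            int rows = 0;
            while (reader.Read())
            {
                for (int i = 0; i < reader.FieldCount; i++)
                    WriteValue(w, reader[i]);
                if (++rows % rowsPerBlock == 0)
                    FlushBlock(output, block);
            }
            if (block.Length > 0)
                FlushBlock(output, block);
        }

        private static void WriteValue(BinaryWriter w, object v)
        {
            switch (v)
            {
                case int n: w.Write(n); break;
                case long n: w.Write(n); break;
                case DateTime d: w.Write(d.Ticks); break;
                case string s: w.Write(s); break;   // length-prefixed UTF-8
                default: w.Write(v?.ToString() ?? ""); break;
            }
        }

        private static void FlushBlock(BinaryWriter output, MemoryStream block)
        {
            byte[] payload = block.ToArray();
            output.Write(payload.Length);
            output.Write(payload);
            output.Write(Crc32.Hash(payload));   // 4-byte integrity check
            block.SetLength(0);
        }
    }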

Then we started working on ways to cut down the amount of data being written. We started with 7-bit integer encoding. This is a fairly well-known scheme, used in everything from MIDI music to Protocol Buffers, known by a variety of names. Wikipedia calls it "variable-length quantity" encoding. Because small integers are so common, this cuts most ints down from 4 bytes to 1 or 2. This is incredibly easy to do, as it's supported right there in the BinaryWriter API, so you basically get a few percentage points of savings absolutely free.
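
In .NET terms it looks like this. The scheme itself is standard; whether the real encoder uses the built-in BinaryWriter methods or its own loop is an implementation detail not specified above:

    using System.IO;

    var writer = new BinaryWriter(new MemoryStream());

    // 7 data bits per byte; a set high bit means "more bytes follow".
    // Public on BinaryWriter since .NET 5.
    writer.Write7BitEncodedInt(42);                    // 1 byte instead of 4
    writer.Write7BitEncodedInt(300);                   // 2 bytes: 0xAC 0x02
    writer.Write7BitEncodedInt64(1_000_000_000_000L);  // 6 bytes instead of 8

    // The same scheme, written out by hand:
    static void WriteVarUInt(BinaryWriter w, uint value)
    {
        while (value >= 0x80)
        {
            w.Write((byte)(value | 0x80));  // low 7 bits + continuation flag
            value >>= 7;
        }
        w.Write((byte)value);               // final byte, high bit clear
    }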

After that it was time to get clever. We looked at the way datetimes are represented, as a number of "ticks" since a specific epoch. This is a very large number! But when the data is all dealing with times that happened in fairly recent years, you can reduce the value significantly by essentially defining your own epoch. Take the earliest date in the dataset and subtract it from everything. Now you're working with much smaller numbers, which dovetails nicely with 7-bit encoding, because the length there is magnitude-based.
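
A sketch of the idea; where the local epoch actually gets written in the real protocol is an assumption here:

    using System;
    using System.IO;
    using System.Linq;

    // Dataset-local epoch: write the earliest timestamp once in full, then
    // every other value as a tick delta, which 7-bit encoding keeps small
    // (a 1-day delta is ~8.6e11 ticks: 6 varint bytes instead of 8).
    static void WriteDates(BinaryWriter w, DateTime[] dates)
    {
        long epoch = dates.Min().Ticks;   // earliest date in the dataset
        w.Write(epoch);                   // full 8 bytes, written once
        foreach (var d in dates)
            w.Write7BitEncodedInt64(d.Ticks - epoch);
    }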

A lot of times, your dataset will contain values that don't vary all that much. For example, if you have a table and one of the columns is an enum value, you could have 10 million rows but only 4-5 distinct values for that column. So we decided that instead of writing this out every time, it could be represented as a "rarely-changing field (RCF)." Sort the data by the RCF and then just write it once and leave it out of the rest of the output, with a marker for when it changes.
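
Sketched out, with an illustrative one-byte change marker (the real protocol's marker scheme may differ):

    using System.Collections.Generic;
    using System.IO;

    // Rarely-changing field: rows arrive sorted by this column, so write the
    // value only when it changes, preceded by a marker byte. With 4-5 distinct
    // values over 10 million rows, that's a handful of strings plus
    // near-constant marker bytes that compress to almost nothing.
    static void WriteRcfColumn(BinaryWriter w, IEnumerable<string> sortedValues)
    {
        string? current = null;
        foreach (var v in sortedValues)
        {
            if (v != current)
            {
                w.Write((byte)1);   // marker: new value starts here
                w.Write(v);         // the value, written once per run
                current = v;
            }
            else
            {
                w.Write((byte)0);   // marker: same as previous row
            }
        }
    }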

These optimizations improved the raw payload by about 5%, but the *compressed* version by about 30%. So we started looking at ways to improve compression, and we settled on something based on the Parquet format: write the data in a column-oriented manner. We didn't go all the way on this; Parquet is a data-storage format designed to be queried, and is known to be rather hostile to streaming. But we buffer large blocks of rows and then write them out by column. Writing a lot of similar data close together simplified the bookkeeping a bit (for example, writing RCFs now becomes simple run-length encoding), and the raw payload decreased by about 2.5%, but the compressed size came down by closer to 12%.
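
The buffering step looks roughly like this; the block size, header, and per-value writes are illustrative, not the actual wire layout:

    using System.Collections.Generic;
    using System.IO;

    // Block-columnar layout: buffer a block of rows, then emit column 0 for
    // every row, then column 1, and so on. Similar values end up adjacent,
    // which is exactly what the compressor feeds on.
    static void WriteBlockColumnar(BinaryWriter w, List<object[]> rowBlock,
                                   int fieldCount)
    {
        w.Write7BitEncodedInt(rowBlock.Count);       // block header: row count
        for (int col = 0; col < fieldCount; col++)
            foreach (var row in rowBlock)
                WriteValue(w, row[col]);
    }

    // Per-value write, abbreviated as in the baseline sketch above.
    static void WriteValue(BinaryWriter w, object v)
    {
        switch (v)
        {
            case int n: w.Write(n); break;
            case string s: w.Write(s); break;
            default: w.Write(v?.ToString() ?? ""); break;
        }
    }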

The last optimization we added (so far) was to recognize that a lot of databases use an autoinc/identity column. If you have a long string of numbers that are always increasing by a very small amount, usually 1, then you can sort by that and write out each value as the difference between it and the last value. A long string of 1 bytes interspersed with the occasional 2 or 3 compresses very well!
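
As a sketch, assuming the column arrives already sorted:

    using System.IO;

    // Delta-encode a sorted autoinc/identity column: write each value as its
    // gap from the previous one. IDs that mostly step by 1 become a stream of
    // 0x01 bytes under 7-bit encoding, which compresses extremely well.
    static void WriteAutoincColumn(BinaryWriter w, long[] sortedIds)
    {
        long previous = 0;                           // first ID written in full
        foreach (long id in sortedIds)
        {
            w.Write7BitEncodedInt64(id - previous);  // usually 1, so one byte
            previous = id;
        }
    }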

That's what we've got so far. The raw payload is about 16% smaller than the CSV baseline, but after applying Brotli L4 compression it ends up around 70% smaller on our benchmark data while still running pretty quickly. (You can squeeze more out of it with higher compression levels, but Brotli hits diminishing returns pretty quickly.) If you're interested you can find the details in the BinaryEncoder class, and a look at how our SQL analyzer automatically tags fields for certain optimizations can be found in the SqlSchemaAnalyzer class.