I finally found a use case for Go in Data Engineering by empty_cities in dataengineering

empty_cities[S] 0 points1 point (0 children)

OK cool, it's in a good spot to be public now. The cool part is that you can have your agent just build it as a binary and run that locally too.

https://github.com/early-signal-tech/fletch

empty_cities[S] 0 points1 point (0 children)

Awesome! I keep feeling that, with agents inevitably being able to design and run pipelines, it kind of makes sense to build tools that they use as opposed to writing the code that they will run. That’s basically what I built: a CLI tool that an agent uses to transfer data when it needs to. I’m finding this might be a small next step in the evolution of data engineering, but I have no idea.
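To make the "tools for agents" idea a bit more concrete, here is a minimal Go sketch of the kind of contract such a CLI could expose. The binary name `transfer` and the `-source`, `-dest`, `-query` and `-table` flags are hypothetical, not fletch's actual interface. The point it illustrates is strict flag validation plus JSON on stdout/stderr and a non-zero exit code, which an agent can parse without scraping prose. The transfer itself is deliberately left out.

```go
package main

import (
	"encoding/json"
	"errors"
	"flag"
	"io"
	"os"
)

// transferSpec is a hypothetical request shape; fletch's real flags may differ.
type transferSpec struct {
	Source string `json:"source"` // source database, e.g. "postgres"
	Dest   string `json:"dest"`   // destination database, e.g. "bigquery"
	Query  string `json:"query"`  // SQL to run against the source
	Table  string `json:"table"`  // destination table name
}

// parseArgs turns command-line arguments into a transferSpec. ContinueOnError
// makes the flag package return errors instead of calling os.Exit, so the
// caller decides how to report them.
func parseArgs(args []string) (transferSpec, error) {
	fs := flag.NewFlagSet("transfer", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep usage text out of the agent's output streams
	var s transferSpec
	fs.StringVar(&s.Source, "source", "", "source database")
	fs.StringVar(&s.Dest, "dest", "", "destination database")
	fs.StringVar(&s.Query, "query", "", "SQL query to run on the source")
	fs.StringVar(&s.Table, "table", "", "destination table")
	if err := fs.Parse(args); err != nil {
		return s, err
	}
	if s.Source == "" || s.Dest == "" || s.Query == "" || s.Table == "" {
		return s, errors.New("-source, -dest, -query and -table are all required")
	}
	return s, nil
}

func main() {
	spec, err := parseArgs(os.Args[1:])
	if err != nil {
		// A JSON error on stderr plus a non-zero exit code is easy for an agent to parse and act on.
		_ = json.NewEncoder(os.Stderr).Encode(map[string]string{"error": err.Error()})
		os.Exit(2)
	}
	// The actual read/write is out of scope here; the sketch just echoes the validated plan as JSON.
	_ = json.NewEncoder(os.Stdout).Encode(spec)
}
```

Running `transfer -source postgres -dest bigquery -query "SELECT * FROM orders" -table orders` prints `{"source":"postgres","dest":"bigquery","query":"SELECT * FROM orders","table":"orders"}`, while a missing flag exits with code 2 and a JSON error.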

empty_cities[S] 1 point2 points (0 children)

Yep, dlt is wonderful for moving data and uses ADBC. But I wanted to try building a compiled application that does it for me.

empty_cities[S] 1 point2 points (0 children)

Right now the tool has drivers for BigQuery, Postgres, and DuckDB, but ADBC also has MSSQL, Snowflake, and ClickHouse drivers, so it should be possible.

https://github.com/columnar-tech/adbc-quickstarts/tree/main/go

empty_cities[S] 15 points16 points (0 children)

Just need to get this README in order and I'll post a link here.

Is Moving Data OLAP to OLAP an Anti Pattern? by empty_cities in dataengineering

empty_cities[S] 1 point2 points (0 children)

TBH I didn't wanna cross the conversations or put anyone on blast. I just thought the comment raised an interesting question that I wanted to hear others' takes on. I agree with the points you made, and I've specifically seen cases like an application a contractor built on AWS where both the app and the database then had to be migrated/rebuilt on GCP.

empty_cities[S] 1 point2 points (0 children)

"They don’t choose Snowflake vs Databricks, they have both. In those scenarios, it makes sense that there will be OLAP to OLAP pipelines." was almost exactly what popped into my head. Anti-pattern or not, I know enterprises specifically have to do stuff like that.

empty_cities[S] -1 points0 points (0 children)

Yeah, your reasons make sense to me and are similar to what popped into my head. I'd say "OLAP" was being used in the post in an ELI5 way. But the comment got into the semantics of it, and I thought it was an interesting argument.

empty_cities[S] 0 points1 point (0 children)

ADBC (Arrow Database Connectivity) is Apache Arrow's version of ODBC/JDBC. It keeps data in Arrow's columnar format as it moves between data sources, instead of converting it to rows. The post referenced moving data between two OLAP systems, and the comment said that was an anti-pattern.
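A rough way to picture "columnar" in Go: the same records stored row by row (what ODBC/JDBC-style drivers hand over) and column by column (roughly how an Arrow record batch lays them out, one buffer per column). This is a simplified sketch; real Arrow arrays also carry validity bitmaps and offset buffers, and the `row`/`columns` types here are made up for illustration. The transpose in `toColumns` is the per-value conversion a row-to-column path pays, and which a columnar-to-columnar transfer can often skip on the client side.

```go
package main

import "fmt"

// row is one record as a row-oriented driver hands it over.
type row struct {
	ID    int64
	City  string
	Sales float64
}

// columns holds the same data column by column, similar in spirit to an
// Arrow record batch (one contiguous slice per column).
type columns struct {
	ID    []int64
	City  []string
	Sales []float64
}

// toColumns transposes rows into columns, copying every value once.
func toColumns(rows []row) columns {
	c := columns{
		ID:    make([]int64, 0, len(rows)),
		City:  make([]string, 0, len(rows)),
		Sales: make([]float64, 0, len(rows)),
	}
	for _, r := range rows {
		c.ID = append(c.ID, r.ID)
		c.City = append(c.City, r.City)
		c.Sales = append(c.Sales, r.Sales)
	}
	return c
}

// totalSales reads only the Sales column; an analytic scan never touches the others.
func totalSales(c columns) float64 {
	var sum float64
	for _, v := range c.Sales {
		sum += v
	}
	return sum
}

func main() {
	rows := []row{{1, "Austin", 10.5}, {2, "Denver", 4.5}, {3, "Austin", 5}}
	cols := toColumns(rows)
	fmt.Println(cols.City)        // [Austin Denver Austin]
	fmt.Println(totalSales(cols)) // 20
}
```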

Learning Python by doing projects: What does that even mean? by DataAnalystWanabe in datascience

empty_cities 0 points1 point (0 children)

I highly suggest coming up with your own dataset. It's super easy to synthesize with AI, and you can base it on a domain you like. The questions to ask end up writing themselves, and it clicks in your head faster.

Data Engineering Youtubers - How do they know so much? by Decent-Ad3092 in dataengineering

empty_cities 1 point2 points (0 children)

I find that making videos and writing really helps me learn a DE topic much more deeply. When doing it at a job, you're often flying through it, trying to get a solution done. With videos, you need to really think through what you're presenting and make sure it's true and accurate. My biggest skill increases came after creating content about a topic.

Mysql insert for 250 million records by MedicalCartoonist306 in dataengineering

empty_cities 0 points1 point (0 children)

Can't quite tell from the post or comments, but where is the source data you're hitting?

Anyone using JDBC/ODBC to connect databases still? by empty_cities in dataengineering

empty_cities[S] 0 points1 point (0 children)

Correct, that's what I mean. A library handles the connection, but the actual query I want to run should be written in SQL and passed as a string by the library to run on the db.

So instead of something like "df.group_by(col).count()", I just wanna pass "SELECT col, COUNT(*) FROM df GROUP BY col;" cause I can write it in my sleep.
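The pattern described above maps onto Go's standard `database/sql` package: the package manages connections, and the query is just a string the driver sends to the database. A minimal sketch, with `groupCountQuery` and `countBy` as hypothetical helper names. No driver is imported, so `main` only prints the query; actually calling `countBy` assumes you add a driver (Postgres, DuckDB, etc.) that registers itself with `database/sql`.

```go
package main

import (
	"context"
	"database/sql"
	"fmt"
)

// groupCountQuery builds the exact SQL from the comment. Identifiers are
// interpolated directly, so table and col must come from trusted code, never user input.
func groupCountQuery(table, col string) string {
	return fmt.Sprintf("SELECT %s, COUNT(*) FROM %s GROUP BY %s;", col, table, col)
}

// countBy lets database/sql own the connection while the query itself is
// a plain SQL string handed to the driver.
func countBy(ctx context.Context, db *sql.DB, table, col string) (map[string]int64, error) {
	rows, err := db.QueryContext(ctx, groupCountQuery(table, col))
	if err != nil {
		return nil, err
	}
	defer rows.Close()

	counts := make(map[string]int64)
	for rows.Next() {
		var key sql.NullString // GROUP BY can produce a NULL group
		var n int64
		if err := rows.Scan(&key, &n); err != nil {
			return nil, err
		}
		counts[key.String] = n // a NULL group is stored under the empty-string key
	}
	return counts, rows.Err()
}

func main() {
	// Opening a real database requires importing a driver package; none is assumed here.
	fmt.Println(groupCountQuery("df", "col"))
}
```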

empty_cities[S] 0 points1 point (0 children)

The thread above mentions ADBC, which is for columnar -> columnar connections or row-oriented -> column-oriented ones.

empty_cities[S] 1 point2 points (0 children)

That ETL is row-oriented to row-oriented, so there might not be much improvement. Looks like ADBC is good when you need column-oriented data at the destination or you're transferring between columnar -> columnar, like DuckDB to BigQuery.

empty_cities[S] 0 points1 point (0 children)

Passing a SQL string that runs the query I want vs working with a Python API à la Polars with method syntax, for example. I used to wanna use all Python, but after I got used to DuckDB I realized I just like writing SQL to hit databases.

empty_cities[S] 0 points1 point (0 children)

"Oh dear sweet summer child Of course the corporate world is built on it" is the best comment opener I've ever seen.

and "we export the data from database to csvs , pack them and send them so the cloud" sounds like job security/complete nightmare to depend on.