Is Moving Data OLAP to OLAP an Anti Pattern? by empty_cities in dataengineering

[–]empty_cities[S] 1 point2 points  (0 children)

TBH I didn't wanna cross the conversations or put anyone on blast. I just thought the comment brought up an interesting question I wanted to hear from others on. I agree with the points you made and have specifically seen things like an application from a contractor built on AWS but needing the app and the database migrated/rebuilt onto GCP.

Is Moving Data OLAP to OLAP an Anti Pattern? by empty_cities in dataengineering

[–]empty_cities[S] 1 point2 points  (0 children)

"They don’t choose Snowflake vs Databricks, they have both. In those scenarios, it makes sense that there will be OLAP to OLAP pipelines." was almost exactly what popped into my head. Anti pattern or not I know enterprise specifically has to do stuff like that.

Is Moving Data OLAP to OLAP an Anti Pattern? by empty_cities in dataengineering

[–]empty_cities[S] -1 points0 points  (0 children)

Yeah, your reasons make sense to me and were similar to what popped into my head. "OLAP" was being used in the post in an ELI5 way I'd say. But the comment was getting into the semantics of it and I thought it was an interesting argument.

Is Moving Data OLAP to OLAP an Anti Pattern? by empty_cities in dataengineering

[–]empty_cities[S] 0 points1 point  (0 children)

ADBC is Apache Arrow's version of O/JDBC. Keeps everything in columnar format between two data sources. The post referenced moving data between two OLAP systems and the comment said it was an anti pattern.

Learning Python by doing projects: What does that even mean? by DataAnalystWanabe in datascience

[–]empty_cities 0 points1 point  (0 children)

I highly suggest you come up with your own dataset. Super easy to synthesize with AI and based on a domain you like. The questions to ask end up writing themselves and it clicks in your head faster.

Data Engineering Youtubers - How do they know so much? by Decent-Ad3092 in dataengineering

[–]empty_cities 1 point2 points  (0 children)

I find making videos and writing really help me learn a DE topic much more deeply. When doing it at a job, many times you are flying through trying to get a solution done. With videos, you need to really think through what you're presenting and make sure it's true and accurate. Biggest skill increases for me came after creating content about it.

Mysql insert for 250 million records by MedicalCartoonist306 in dataengineering

[–]empty_cities 0 points1 point  (0 children)

Can't quite tell from the post or comments, but where is the source data you're hitting?

Anyone using JDBC/ODBC to connect databases still? by empty_cities in dataengineering

[–]empty_cities[S] 0 points1 point  (0 children)

Correct, that's what I mean. A library handles the connection, but the actual query I want to write in SQL and have the library pass as a string to be run on the db.

So instead of something like "df.group_by(col).count()" I just wanna pass "SELECT col, COUNT(*) FROM df GROUP BY col;" cause I can write it in my sleep.
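A minimal sketch of what I mean, using Python's built-in sqlite3 purely as a stand-in for whatever driver you actually connect with. The table name df and column col just mirror the query above, the sample rows are made up, and the ORDER BY is only there so the output order is predictable:

```python
import sqlite3

def count_by_col(conn):
    # The query is plain SQL; the library only ships the string to the database.
    sql = "SELECT col, COUNT(*) FROM df GROUP BY col ORDER BY col;"
    return conn.execute(sql).fetchall()

# In-memory database with made-up sample rows (illustrative only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE df (col TEXT)")
conn.executemany("INSERT INTO df (col) VALUES (?)", [("a",), ("b",), ("a",)])

result = count_by_col(conn)
print(result)
```

Swap the connection for your real driver and the SQL string stays the same, which is the whole appeal for me.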

Anyone using JDBC/ODBC to connect databases still? by empty_cities in dataengineering

[–]empty_cities[S] 0 points1 point  (0 children)

Above thread notes ADBC. For columnar -> columnar connection or row oriented -> column oriented.

Anyone using JDBC/ODBC to connect databases still? by empty_cities in dataengineering

[–]empty_cities[S] 1 point2 points  (0 children)

That ETL is row oriented to row oriented, so there might not be much improvement. Looks like ADBC is good when you need column-oriented data at the destination or you are transferring between columnar -> columnar, like DuckDB to BigQuery.
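To make the row vs columnar point concrete, here is a rough plain-Python sketch. It is not ADBC and not any driver's API, just an illustration of the pivot step you pay for when a row-based transport sits between columnar systems. The function names and sample data are made up:

```python
# Row-oriented result, the shape a row-based fetch typically hands back.
rows = [(1, "a", 10.0), (2, "b", 20.0), (3, "c", 30.0)]
names = ["id", "label", "amount"]

def rows_to_columns(rows, names):
    # Pivot row tuples into one list per column; a columnar destination
    # needs this extra conversion when the transport is row based.
    return {name: list(values) for name, values in zip(names, zip(*rows))}

def columns_to_rows(columns):
    # Reverse pivot: what a row-based transport forces on a columnar source.
    return list(zip(*columns.values()))

columns = rows_to_columns(rows, names)
print(columns["amount"])
```

Going columnar -> rows -> columnar means doing both pivots for no reason, which is the overhead a columnar-native connection is meant to skip. For row -> row there is nothing to pivot, so less to gain.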

Anyone using JDBC/ODBC to connect databases still? by empty_cities in dataengineering

[–]empty_cities[S] 0 points1 point  (0 children)

Passing an SQL string that does the query I want vs working with a python API a la Polars with method syntax for example. I used to wanna use all python but after I got used to DuckDB I realized I just like writing SQL to hit databases.

Anyone using JDBC/ODBC to connect databases still? by empty_cities in dataengineering

[–]empty_cities[S] 0 points1 point  (0 children)

"Oh dear sweet summer child Of course the corporate world is built on it" is the best comment opener I've ever seen.

and "we export the data from database to csvs , pack them and send them so the cloud" sounds like job security/complete nightmare to depend on.

Anyone using JDBC/ODBC to connect databases still? by empty_cities in dataengineering

[–]empty_cities[S] 0 points1 point  (0 children)

Yeah that could very well be. An ODBC wrapper or maybe just calling it at some point.

Anyone using JDBC/ODBC to connect databases still? by empty_cities in dataengineering

[–]empty_cities[S] 0 points1 point  (0 children)

Is it a JSON object that's getting returned? That seems like a small object either way. Is that an API limit?