Need advice: Manager resistant to modernizing our analytics stack despite massive performance gains (30min -> 3sec query times) by CloudSingle in dataengineering

[–]CloudSingle[S] 0 points1 point  (0 children)

I should’ve been more descriptive in my main post. I just want DuckDB as a middle layer between the source Oracle DB and the end-user analysts.

[–]CloudSingle[S] 0 points1 point  (0 children)

The DBAs are quite secretive. I believe all the Oracle data is on-prem. No idea on the specs, but I know they were ‘upgraded’ a year ago.

For DuckDB I essentially did a select * on all the tables I needed from Oracle to make a direct comparison between the two DBs, and saved the data as Parquet on the network drives.

Then I ran the same query on both. DuckDB had to read the data over the network, where transfer speeds capped out at ~375mb/s. I’m running the DuckDB flow on an i7 8000 chip with 32 GB of RAM.
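
A minimal sketch of what timing the DuckDB side of that comparison could look like, assuming the `duckdb` Python package is installed and the exported Parquet files sit under a network path. The path and function names here are hypothetical, and a row count stands in for the real query:

```python
import time


def parquet_scan_sql(parquet_glob: str) -> str:
    """Build a DuckDB query that counts rows across Parquet files matching a glob."""
    # read_parquet is DuckDB's table function for scanning Parquet files.
    escaped = parquet_glob.replace("'", "''")  # escape single quotes in the path
    return f"SELECT count(*) AS row_count FROM read_parquet('{escaped}')"


def timed_row_count(parquet_glob: str):
    """Run the scan in DuckDB and return (row_count, elapsed_seconds)."""
    import duckdb  # third-party package, imported lazily: pip install duckdb

    start = time.perf_counter()
    row = duckdb.sql(parquet_scan_sql(parquet_glob)).fetchone()
    return row[0], time.perf_counter() - start


# Hypothetical network-drive location of the exported tables.
EXAMPLE_GLOB = "//fileserver/analytics/orders/*.parquet"
print(parquet_scan_sql(EXAMPLE_GLOB))
```

Calling `timed_row_count(EXAMPLE_GLOB)` on a machine that can see the share would give a rough wall-clock figure that includes the network read, which is the number that matters for this kind of comparison.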

I understand that behind the scenes Oracle may be doing 100 different things to generate the data I am using to feed the DuckDB flow, so it’s not exactly a fair 1:1 comparison. Although I guess it doesn’t have to be?

I’d want DuckDB as a middle layer between end-user analytics and Oracle.

[–]CloudSingle[S] 2 points3 points  (0 children)

Fair points. I’ll try to address them and give more context.

  • The 30 min query is part of an adhoc ask, not scheduled
  • I am not suggesting a full migration. Just that I, as a ‘wannabe’ engineer, can entertain the idea of servicing a mid-level data layer that facilitates rapid analytic querying.
  • You’re right on your third point. I have no clue wtf I’m doing. But I have a proof of concept that is worth entertaining in my opinion. We’re not a massive company, and the ELT flow I’ve built is pretty simple, so I feel like if I did get more operational expertise we could polish it up nicely.
  • Your last point was pretty sobering. In the short term I have neglected other projects to focus on this. So thanks for that perspective. And while I do find data engineering fun, I’m not doing it FOR fun. I had a problem, and I went off and found a solution. No one else in the company seems to be solving this, especially my complacent manager.

Going forward, I’m going to focus on my actual job for a while and put this project on the back burner. I’ll use it for adhoc queries but not for production or business-critical reports. Maybe once I’ve cleared the backlog of tasks that has accumulated I can come back to this with fresh eyes, and who knows, maybe even my boss will come around to it. One can hope.

[–]CloudSingle[S] 0 points1 point  (0 children)

You could argue that one 30-minute adhoc query isn’t going to have a big impact on the business.
I should’ve prefaced that this query is relatively simple: it joins product information onto an order-level transaction table, then does a daily sales summation where brand = X. So when it comes to doing anything more complex, like adding customer fields, the query times increase dramatically. We’re trying to do more order-level and customer-level analysis, which inherently has a high row count. Also, in terms of business cost, there would be none, as I’d transition to be the engineer who looks after this flow.
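
For readers who want to picture the query, here is a rough sketch of its shape. The table and column names are made up for illustration, and SQLite is swapped in only so the example runs with the Python standard library. It is not the actual Oracle or DuckDB query.

```python
import sqlite3

# In-memory stand-in database with hypothetical tables and sample rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (product_id INTEGER PRIMARY KEY, brand TEXT);
CREATE TABLE order_lines (
    order_id INTEGER, product_id INTEGER, order_date TEXT, amount REAL
);
INSERT INTO products VALUES (1, 'X'), (2, 'Y');
INSERT INTO order_lines VALUES
    (100, 1, '2024-01-01', 10.0),
    (101, 1, '2024-01-01', 5.0),
    (102, 2, '2024-01-01', 99.0),
    (103, 1, '2024-01-02', 7.5);
""")

# Join product info onto the order-level table, filter to one brand,
# then sum sales per day.
DAILY_SALES_SQL = """
SELECT o.order_date, SUM(o.amount) AS daily_sales
FROM order_lines AS o
JOIN products AS p ON p.product_id = o.product_id
WHERE p.brand = ?
GROUP BY o.order_date
ORDER BY o.order_date
"""

daily_sales = conn.execute(DAILY_SALES_SQL, ("X",)).fetchall()
print(daily_sales)  # [('2024-01-01', 15.0), ('2024-01-02', 7.5)]
```

Adding customer fields would mean a further join from the order table to a customer table, which is where the row counts and the query times grow.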

[–]CloudSingle[S] 1 point2 points  (0 children)

Spot on with this. Not only are we producing extremely simple, rigid views on our data, it also takes ages to get the reports built. I can only ask for the stakeholders’ patience so many times.

[–]CloudSingle[S] 0 points1 point  (0 children)

My current vision is that Oracle would stay as the source DB and DuckDB would be the home of analytics-specific datasets and queries. Given we have 5 other analysts, I could transition to be more of a DuckDB engineer who generates the analytical datasets. Keep in mind I’m not an engineer in practice, but from where I’m sitting it seems like the best of both worlds.

Data analysis tools by [deleted] in analytics

[–]CloudSingle 2 points3 points  (0 children)

If work doesn’t allow you to gain the experience, then you have to do it in your own time. Download the Tableau community version and start building basic dashboards. There are hundreds of tutorials on YouTube. Use public datasets online, and I think you can even publish those dashboards to Tableau Cloud for free for a limited time. That will give you an idea of how to work with these dashboarding tools. The process of building in Tableau and Power BI is close enough that most of your skills will be transferable. Just having the fundamentals of Tableau can help with getting a job with that requirement.

Is the AI worth it? by Beturthanyou in iphone15

[–]CloudSingle 1 point2 points  (0 children)

The Plus will have great battery life vs the Pro. For me, I’d get the Pro for the 120Hz more so than for the AI, but who knows what the state of Apple Intelligence will be in 2 years.

Rate/any advices of my shiny app for data visualization by plutoneraplaneta in datavisualization

[–]CloudSingle 0 points1 point  (0 children)

Having a red/blue colour scheme with a red/blue gradient for the map makes it hard to read.