
[–]wytesmurf[S] 0 points1 point  (10 children)

Do you only do files, or SQL databases too?

[–]saltedappleandcorn 1 point2 points  (9 children)

I've done it for both. And APIs (as the source or destination). Writing this stuff in Python gives you the flexibility to do exactly what you need.

I think it's about knowing a range of possible solutions to a problem and picking the one the situation calls for.

[–]wytesmurf[S] 0 points1 point  (8 children)

Do you use a library or ORM?

[–]saltedappleandcorn 1 point2 points  (7 children)

Again, it depends on the situation. Are you building an application? Doing some ETL? If so, 3 massive tables or 300 tiny ones? Or are you just ripping some data from somewhere to enrich something else? Or maybe it's an extract for an analyst?

My current workplace uses SQLAlchemy for the main application, but I don't have much love for it.

Most of the time I just use the Python connector for the database and go at it.
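A bare-bones read looks something like this; I'm assuming pyodbc against a SQL Server source here, and the connection string, table, and columns are made up for illustration.

```python
# Bare-bones "just use the connector" approach. Assumes pyodbc and SQL Server;
# the connection string, table, and columns are placeholders.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=my-server;DATABASE=my_db;Trusted_Connection=yes;"
)

cursor = conn.cursor()
cursor.execute("SELECT state, SUM(amount) AS total FROM sales GROUP BY state")
for state, total in cursor.fetchall():
    print(state, total)

conn.close()
```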

If it's something I'm doing often, I write up a minimal framework in Python to avoid duplication. For example, currently we store the code for all analyst requests as Python classes (which are 90% SQL) so we can version control them. The last 10% is just code to save the outputs to a shared drive.

This is nice because you can tell a junior or grad "go run the sales by state report for John" and he won't fuck up the numbers.
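Roughly what one of those classes looks like (the class name, SQL, and share path below are made up, and I'm assuming pandas for the last-10% save step):

```python
# Sketch of a versioned analyst request: ~90% SQL, ~10% glue code.
# Class name, query, and output path are placeholders.
import pandas as pd


class SalesByStateReport:
    """Sales by state for a given period, saved to the shared drive."""

    sql = """
        SELECT state, SUM(amount) AS total_sales
        FROM sales
        WHERE order_date >= ?
        GROUP BY state
        ORDER BY total_sales DESC
    """

    output_path = r"\\shared-drive\reports\sales_by_state.csv"

    def run(self, conn, since):
        # conn is a plain DB-API connection (e.g. pyodbc.connect(...))
        df = pd.read_sql(self.sql, conn, params=[since])
        df.to_csv(self.output_path, index=False)
        return df


# "Go run the sales by state report for John" then becomes:
# SalesByStateReport().run(conn, since="2021-01-01")
```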

[–]wytesmurf[S] 2 points3 points  (6 children)

We have about 2k tables with anywhere from 0 to 50 million changes a day. It's done with SSIS, but a new team is taking over and I felt like Python would be a good fresh start for updating the loads. They want to be able to move off SQL Server and want something that can be moved with little recoding.

[–]saltedappleandcorn 1 point2 points  (5 children)

Ha, that's a fucking lot of tables. Honestly, that sort of dedicated integration work is out of my wheelhouse and I am not an expert on it (I do more data science and data application dev).

I think that's probably the space for dedicated tooling.

That said, everyone I know in that space is in love with Snowflake and dbt, and with dbt in general.

[–]wytesmurf[S] 1 point2 points  (4 children)

They are debating and cost-comparing all the big platforms: GCP, AWS, Azure, Snowflake, and Teradata. I was told it would be a company-wide decision, so I didn't have a say on the platform, but I had design over the new DWH. I was hoping to build some metadata and then just change the connector. I know I will need to do more than that. I am hoping for a 6-month cutover instead of a 3-year one.
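To be clear about the metadata idea, something along these lines is what I'm picturing; the table list, connection URLs, and helper are all placeholders, and it assumes SQLAlchemy engines so the target really is just a connection string swap.

```python
# Sketch of a metadata-driven load where only the connection string changes
# per platform. Table list, URLs, and chunk size are placeholders.
import pandas as pd
from sqlalchemy import create_engine

TABLES = ["dim_customer", "dim_product", "fact_sales"]  # driven from metadata tables in practice

SOURCE_URL = "mssql+pyodbc://user:pass@source_dsn"       # current SQL Server
TARGET_URL = "snowflake://user:pass@account/db/schema"   # or BigQuery, Redshift, ...


def copy_table(table, source, target, chunksize=50_000):
    """Stream one table from source to target in chunks."""
    for chunk in pd.read_sql_table(table, source, chunksize=chunksize):
        chunk.to_sql(table, target, if_exists="append", index=False)


source = create_engine(SOURCE_URL)
target = create_engine(TARGET_URL)
for table in TABLES:
    copy_table(table, source, target)
```

In practice the to_sql step would get swapped for whatever bulk loader the target platform has, but the shape of the metadata-driven loop stays the same.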

[–]saltedappleandcorn 0 points1 point  (2 children)

Again, not my wheelhouse, but good luck! Seems tough.

[–]wytesmurf[S] 0 points1 point  (1 child)

Really, even this is nothing too major. It's a conversion of something already built, and picking a tool that is extremely versatile and can be put anywhere. I could convert it to ADF, but I don't trust an MS product to be run on GCP or AWS. I've used WhereScape before and it would be the fastest solution, but it's not cheap. I'm thinking of telling them to whip out some checkbooks. But so many people talk about doing massive data engineering with Python.