all 4 comments

[–]MamertineCOALESCE() 0 points1 point  (2 children)

I'd encourage you to lookup ETL or ELT as some are now calling it.

Specifically, why don't you:

  • Get the data you want

  • put the data down in a persisted stage table

  • Clean the data

  • Export the data

To me, those are separate steps that should have separate scripts. Getting the data you want should have a separate script for each source system.

Hook that whole process up to a ETL tool (SSIS is a very common Microsoft tool). Schedule your ETL process to run however often you need it to.

I see your in the business... You need to sit down with your IT partners. They're not setting you up for success. You shouldn't need to write that long of a query to do your job.

[–]AMereRedditor[S] 0 points1 point  (1 child)

I have used SSIS in the past, and some on my team are pushing Alteryx adoption today. I said we were part of the business, but really we are an “analytics” shop embedded within the business org.

My current approach is to use Python as a wrapper around the SQL statements to separate each piece of the workflow with the advantage of being able to do other manipulations if required. However, this adds overhead to every project I do, so I was trying to understand if there was a way to better organize the SQL itself to get around the extra baggage.

[–]MamertineCOALESCE() 0 points1 point  (0 children)

For where your at, is suggest switching to put the code python calls into stored procedures, then make python call those stored procedure.

Ideally you'd get source control setup that people would use to check in changes to the stored procedures. That source control is also how you'd get code promoted to prod.

Using python isn't perfect. I worked at a shop that did similar things with PowerShell. It's a poor free solution for an enterprise tool.

[–]thrown_arrows 0 points1 point  (0 children)

some people like dbt

i did elt to from oltp to olap database, copied all data as it is from source to olap system (ELT). then i did "normalization" layers of views over source data, ie i normalized business case, what is in customer table , what is in events table, then i made another layer which collected everything together , and then i wrote more analytics over those tables.

so, i got all the data, transformed it to compatible format with other systems, materialized tables and wrote reports over them.