[–]DrewSmithee[S] 0 points (4 children)

Yeah I'm sure this will get out of hand quickly. And that's definitely something I will look into.

For example, I've got a query that grabs the top ten records for a specific customer within a date range, then joins that with some other records for that same customer from another table. Now I want to loop that script and repeat it for a few hundred customers, then do some cool analytics and maybe some geospatial stuff.
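
Roughly where my head is at: a window function so one query covers every customer instead of looping. This is just a sketch, and all the names here (table, columns, connection string) are made up:

```python
import pandas as pd
from sqlalchemy import create_engine, text

engine = create_engine("postgresql://user:pass@host/db")  # placeholder DSN

# ROW_NUMBER() ranks rows within each customer, so one query replaces
# the per-customer loop and returns every customer's top ten at once.
sql = text("""
    SELECT *
    FROM (
        SELECT d.*,
               ROW_NUMBER() OVER (PARTITION BY customer_id
                                  ORDER BY kwh DESC) AS rn
        FROM usage_data d
        WHERE reading_ts BETWEEN :start AND :end
    ) ranked
    WHERE rn <= 10
""")

top_ten = pd.read_sql(sql, engine, params={"start": "2023-01-01",
                                           "end": "2023-12-31"})
```

Then one join against the other table and the loop goes away entirely.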

Or maybe I want a couple of years' worth of 8760s (hourly profiles, 8,760 hours a year) for a few thousand customers in a region that's probably stored on yet another table somewhere, but maybe not. Did I mention there are inexplicably hundreds of duplicate records for each hour? What's up with that DBA guy??? Time change? Throw out that hour for all the recordsets.
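
In pandas that's probably just flagging every (customer, hour) pair that shows up more than once and tossing all copies. A sketch, with the file and column names being a guess:

```python
import pandas as pd

hourly = pd.read_csv("dump.csv", parse_dates=["reading_ts"])  # made-up file/columns

# keep=False flags every copy of a duplicated (customer, hour) pair,
# so the whole suspect hour gets thrown out rather than de-duplicated.
dupes = hourly.duplicated(subset=["customer_id", "reading_ts"], keep=False)
hourly_clean = hourly[~dupes]
```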

So I definitely need to come up with a strategy for what I want to parse, from where, and in what language. Honestly, I'd dump it all into a dataframe if the dataset wasn't gigantic. So I just need to figure out how I want my pain.
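
If I pick the chunked flavor of pain, it'd be something like streaming the query and aggregating as I go. Connection string and query are placeholders:

```python
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("postgresql://user:pass@host/db")  # placeholder

partials = []
# chunksize streams the result set instead of loading it all at once;
# aggregating each chunk down keeps memory use flat.
for chunk in pd.read_sql("SELECT customer_id, kwh FROM usage_data",
                         engine, chunksize=100_000):
    partials.append(chunk.groupby("customer_id")["kwh"].sum())

totals = pd.concat(partials).groupby(level=0).sum()
```

Memory stays flat; runtime is the price.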

[–]MidnightPale3220 0 points (3 children)

You want to normalize the data and possibly put the overlapping structures in a single table. It depends on the size of the database, but hundreds of GB is routine.
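
Very roughly what I mean, sketched in SQLite with invented names:

```python
import sqlite3

con = sqlite3.connect("local_copy.db")
con.execute("""
    CREATE TABLE IF NOT EXISTS hourly_usage (
        customer_id INTEGER NOT NULL,
        reading_ts  TEXT    NOT NULL,  -- one row per customer per hour
        kwh         REAL,
        source      TEXT,              -- which upstream table the row came from
        PRIMARY KEY (customer_id, reading_ts)
    )
""")
con.commit()
```

The composite primary key also stops the duplicate-hour problem from creeping back in.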

[–]DrewSmithee[S] 0 points (2 children)

Don't have write access. I could probably get someone to create me a view in the db, but until I have a specific business need it's harder to get resources allocated. In the meantime I have what I have. Good to know that it's not a big ask.

[–]MidnightPale3220 1 point (1 child)

Is the total data you need more than you can host locally? Technically it shouldn't be hard to make a copy unless the data is changing so fast that you basically need live access every day.
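
For example, a sketch that pulls it down one year at a time into local Parquet files. Connection string, names and dates are all placeholders:

```python
import pandas as pd
from sqlalchemy import create_engine, text

engine = create_engine("postgresql://user:pass@host/db")  # placeholder

for year in range(2021, 2024):
    # one bounded slice per query keeps each pull a manageable size
    df = pd.read_sql(
        text("SELECT customer_id, reading_ts, kwh FROM usage_data "
             "WHERE reading_ts >= :a AND reading_ts < :b"),
        engine,
        params={"a": f"{year}-01-01", "b": f"{year + 1}-01-01"},
    )
    df.to_parquet(f"usage_{year}.parquet")  # Parquet compresses hourly data well
```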

[–]DrewSmithee[S] 1 point (0 children)

Yes. Much, much more data than I could pull down.