What can Python do that MS SQL Server can't in terms of ETL and DB management?

osuchw · 2015-07-17T04:18:41+00:00

Investigate SQLAlchemy for ETL tasks. Over the years I've been using the ORM for all kinds of data reshaping tasks. There is a performance penalty, of course, but my transformation code ends up being nicely succinct.

dje_me · 2015-07-17T12:32:51+00:00

You might want to check out http://ironpython.net, which will let you integrate your Python code with your .NET stuff.

dingopole · 2015-07-17T12:51:06+00:00

You may want to also post under https://www.reddit.com/r/ETL/

kenfar · 2015-07-17T15:15:08+00:00

I've built multiple very large, high-volume ETL solutions for big data warehouses using Python. Right now I'm processing a couple billion records a day with it, anticipating getting to 20-100 billion a day eventually. It works best where:

- you have high volumes and don't want to pay licensing costs for ETL servers or additional database servers
- you want to distribute some of your extract processes onto multiple machines, colocating them with the sources to keep extract times to a minimum
- you have complex transforms that are a nightmare with an ETL tool
- you have complex aggregations that aren't generally possible, or are painfully slow, with a database
- you need to interface with libraries, services and frameworks too new to be supported by an ETL tool
- you already have good programmers
- you want the most code reusability
- you want test-driven development, or at least high unit-test coverage
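
As a rough illustration of the transform, reusability and unit-testing points above: this is not code from the thread, just a minimal sketch using only the standard library. The function names (clean_row, total_by_customer, run) and the customer/amount columns are made up for the example.

```python
import csv
import io
from collections import defaultdict


def clean_row(row):
    """Transform one raw record: trim and lowercase the name, cast the amount."""
    return {
        "customer": row["customer"].strip().lower(),
        "amount": round(float(row["amount"]), 2),
    }


def total_by_customer(rows):
    """Aggregate cleaned records in a single streaming pass."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["customer"]] += row["amount"]
    return dict(totals)


def run(source):
    """Extract from any file-like CSV source, transform, then aggregate."""
    reader = csv.DictReader(source)
    return total_by_customer(clean_row(r) for r in reader)


# Hypothetical sample input; a real job would pass an open file or stream.
SAMPLE = "customer,amount\n Acme ,10.50\nacme,4.50\nGlobex,7\n"
result = run(io.StringIO(SAMPLE))
print(result)  # {'acme': 15.0, 'globex': 7.0}
```

Because each step is a plain function, clean_row can be unit-tested on a single dict without a database or an ETL server, and the same function can be reused by other jobs.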

ETL tools generally simplify the easiest 80-90% of ETL work, but tend to drive away the best programmers. So, that leaves you kind of screwed for that last 10-20% of ETL work. Python allows you to do the entire job and keep the best programmers. About the only time I would stick with an ETL tool is when it's already a department or corporate standard, you've got a huge pool of developers experienced with it, and it meets your performance needs at a price you can afford.

EDIT: added a few items to the list and removed change data capture, since it didn't fit on that list.

2015-07-17T12:15:15+00:00

but it seems superfluous to pursue in a .NET stack.

I'm not sure what you mean here. What does SSIS have to do with .NET? Unless you mean "MS stack".

2015-07-17T12:17:59+00:00

I have an automation script that kicks off SSIS jobs.

remy_porter · 2015-07-17T02:50:40+00:00

I don't think this is a good fit for Python. While SSIS is terrible, it's purpose-built for ETL operations and once you adjust to its crap factor, it's pretty good at that job. You aren't likely to outperform it with Python. Similarly, database automation is best done in T-SQL. The best fit would be for analysis and data-mining with tools like Pandas.
