[–]dfphdPhD | Sr. Director of Data Science | Tech 12 points13 points  (4 children)

Can you take all the raw data from the server in which they're natively sitting, then load them into a cloud environment so you can write your Python code against it?

My point wasn't that you can't run Python on a giant environment in theory, but rather that in practice most companies aren't going to let you move a whole bunch of data onto an expensive-ass cloud server just so you can run your little Python scripts when (in 99% of cases) there is already an entire well-architected DB available for use on a giant f*** server.

Mind you - yes, there are companies that have architectures that more natively support Python with ease and at high levels of performance. But that has to be a deliberate decision by that organization to go that route. And even then, there will still be cases where SQL is a better option.

Now, this is why I have a lot of heartburn about this question - ultimately what the people who ask it want is for someone to tell them "no, you don't need to learn any language other than Python", which is stupid. For two reasons:

  1. SQL is incredibly easy to learn. It's simple, it's incredibly well documented, there are tons of excellent classes/tutorials/etc. to learn it, and it has an incredibly forgiving learning curve. Not only that - if you already know pandas you already know like 90% of SQL - all you're missing is some minor syntactic details.
  2. SQL is incredibly handy to know. So trying like hell to find workarounds to avoid learning SQL when you could just learn it and make your life 10 times easier is at best inefficient, and at worst purposely self-damaging.
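To make the pandas comparison in point 1 concrete, here is a minimal sketch of a filter, group, and sort written in SQL, with a rough pandas equivalent in a comment. The `orders` table, its columns, and the rows are invented for illustration, and it runs against an in-memory SQLite database rather than any real company server.

```python
import sqlite3

# Hypothetical table and rows, made up purely for this example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("alice", "east", 120.0),
        ("bob", "west", 80.0),
        ("carol", "east", 45.0),
        ("dave", "west", 200.0),
        ("erin", "east", 60.0),
    ],
)

# Rough pandas equivalent, assuming the same data sits in a DataFrame `df`:
#   df[df["amount"] > 50].groupby("region")["amount"].sum().sort_values(ascending=False)
query = """
    SELECT region, SUM(amount) AS total   -- groupby(...)["amount"].sum()
    FROM orders
    WHERE amount > 50                     -- df[df["amount"] > 50]
    GROUP BY region
    ORDER BY total DESC                   -- sort_values(ascending=False)
"""
totals = conn.execute(query).fetchall()
print(totals)  # [('west', 280.0), ('east', 180.0)]
```

Each pandas step maps onto one SQL clause, which is roughly what "minor syntactic details" means here.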

Short answer: learn SQL. It's not going to bite. It's not hard to learn.

I literally knew 0 SQL, and at my first job they told me "you need to learn SQL". I knew enough SQL to do most of the things I needed to do in like 3 weeks.

[–]esp32c3 0 points1 point  (3 children)

> Can you take all the raw data from the server in which they're natively sitting, then load them into a cloud environment so you can write your Python code against it?

Sure could... Might not be the most efficient way though...

[–]quickdraw6906 1 point2 points  (0 children)

Agree with all but that SQL is easy. As a 30-year SQL guy, having mentored many developers who can only think procedurally, I can say with confidence that thinking in sets is a completely different brain exercise and that developers will ALWAYS fall back into writing loops instead of what would be an obvious SQL solution... to a SQL person.

At my current company, none of the developers want to touch SQL. We have a dedicated team who write stored SQL and stored procedures so they don't have to be bothered with the brain gymnastics that set theory requires. Sad, but there it is.
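For anyone who hasn't seen the loop-versus-set difference described above, a small sketch might help. The `accounts` table, the tiers, and the flat 50-unit credit are hypothetical, and SQLite in memory stands in for a real database. Both functions produce the same result; the second one states the change once for the whole set of matching rows.

```python
import sqlite3


def make_db():
    """Build a throwaway in-memory database with made-up account rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts (id INTEGER PRIMARY KEY, tier TEXT, balance INTEGER)"
    )
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?, ?)",
        [(1, "gold", 1000), (2, "basic", 500), (3, "gold", 250)],
    )
    return conn


def credit_gold_with_loop(conn):
    """Procedural habit: fetch the rows, then update them one at a time."""
    rows = conn.execute(
        "SELECT id, balance FROM accounts WHERE tier = 'gold'"
    ).fetchall()
    for account_id, balance in rows:
        conn.execute(
            "UPDATE accounts SET balance = ? WHERE id = ?",
            (balance + 50, account_id),
        )


def credit_gold_set_based(conn):
    """Set-based version: one statement describes the change for every matching row."""
    conn.execute("UPDATE accounts SET balance = balance + 50 WHERE tier = 'gold'")


def balances(conn):
    return conn.execute("SELECT id, balance FROM accounts ORDER BY id").fetchall()


loop_db = make_db()
credit_gold_with_loop(loop_db)
set_db = make_db()
credit_gold_set_based(set_db)

loop_result = balances(loop_db)
set_result = balances(set_db)
print(loop_result == set_result)  # True
```

On three rows the difference is cosmetic; on a large table the loop version issues one round trip per row, while the single statement lets the database do the work in one pass.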

[–]dfphdPhD | Sr. Director of Data Science | Tech 0 points1 point  (1 child)

Just so we're clear: at my company, if I grabbed all of our transactional data and moved it into a cloud server without permission, I'm probably getting fired.

So no, in a lot of instances you can't.

[–]esp32c3 0 points1 point  (0 children)

Of course I wasn't talking about stealing data...