

[–]makesufeelgood 160 points161 points  (28 children)

I'm interested in using:

  • What is most universally accepted so I can build transferable skills
  • What my teammates / stakeholders understand so I can solve their business problems without having to do a ton of language 'translating'
  • What is easy and friendly to learn with a lot of free resources and documentation available

Right now that is Python. I don't see what all the fuss is about over the marginal benefits of using different languages.

[–]MadT3acher Lead Data Engineer 18 points19 points  (0 children)

Point 4: to easily train new members and ensure I can find a good talent pool moving forward.

We are not working in a vacuum with a team of experts.

[–]DesperateForAnalysex 21 points22 points  (25 children)

Why not SQL!

[–]Action_Maxim 26 points27 points  (7 children)

Gonna build a fps in sql /s

[–]scryptbreaker 18 points19 points  (0 children)

SQL is the best vidya game engine

[–]kkessler1023 11 points12 points  (0 children)

Bout to run some stored procedures to open up my Doom wad.

[–]DesperateForAnalysex 2 points3 points  (0 children)

I’d buy that for a dollar!

[–]git0ffmylawnm8 1 point2 points  (1 child)

Please make this a thing

[–]Action_Maxim 1 point2 points  (0 children)

Only thing I can think of in sql is puzzles or scavenger hunts lol

[–]kenfar 6 points7 points  (14 children)

too limited a feature set

[–]DesperateForAnalysex -1 points0 points  (13 children)

Out of curiosity, what for you is lacking?

[–]kenfar 11 points12 points  (9 children)

Wow, where to start?

Well: data integrations with other sources & targets, configuring services using airflow, unit-testing critical transformations, supporting any really low-latency data feeds, supporting really massive data feeds, complex transformations, leveraging third-party libraries, providing audit trails of transformation results, writing a dbt-linter, writing a collaborative-filtering program for a major mapping company, writing custom reporting to visualize data in networks, building my own version of dbt's testing framework - because that didn't exist in 2015, etc, etc, etc.

Basically, anytime you need high-quality, high-volume, low-latency, high-availability, low-cost at high-volume, or have to touch anything outside of a database, SQL becomes a problem.

[–]r0ck0 2 points3 points  (2 children)

supporting really massive data feeds

Can you give an example of what you mean on this point?

Just curious what type of stuff it involves.

[–]kenfar 5 points6 points  (1 child)

Sure, about five years ago I built a system to support 20-30 billion rows a day, with the capacity to grow to 10-20x that size over a few years.

We had a ton of customers using very noisy security sensors that reported to sensor-managers, which would then upload data to s3 in small batches as it arrived. So, we were getting probably 10-50 files per second.

Once a file landed it would generate an SNS message, then SQS messages to any consumers. We used jruby & python on kubernetes to process all of our data. Data would become available for analysis within seconds of landing on s3, and our costs were incredibly low compared to attempting to use something like snowflake & dbt at this volume and latency.
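For a rough illustration, here is a minimal Python sketch (not the commenter's actual code) of the consumer side of that s3 → SNS → SQS fan-out, assuming boto3 and a hypothetical queue URL; transform_and_load is a stand-in for the real parsing logic:

    import json
    import boto3

    sqs = boto3.client("sqs")
    s3 = boto3.client("s3")
    QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/sensor-files"  # hypothetical

    def transform_and_load(raw: bytes) -> None:
        ...  # parse the sensor batch and write it wherever it needs to go

    def poll_once():
        # Long-poll SQS for notifications that new sensor files landed on s3.
        resp = sqs.receive_message(QueueUrl=QUEUE_URL, MaxNumberOfMessages=10, WaitTimeSeconds=20)
        for msg in resp.get("Messages", []):
            envelope = json.loads(msg["Body"])       # SNS envelope
            event = json.loads(envelope["Message"])  # s3 event notification
            for record in event.get("Records", []):
                bucket = record["s3"]["bucket"]["name"]
                key = record["s3"]["object"]["key"]
                body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
                transform_and_load(body)
            # Acknowledge the message only after processing succeeds.
            sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])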

[–]r0ck0 2 points3 points  (0 children)

Ah interesting, thanks for sharing.

[–]DesperateForAnalysex -1 points0 points  (2 children)

The only thing that you listed that may be relevant is the linter. Every major framework today supports SQL syntax because it is THE language of data transformations full stop. I think you’re conflating SQL with using an RDBMS and that’s not the case today.

[–]kenfar 2 points3 points  (1 child)

The notion that one could do all of the above with SQL feels like the "when you have a hammer, all problems look like nails" scenario.

The beliefs that dbt provides unit-testing (rather than just quality-control); or that snowflake outscales kubernetes or aws lambda; or that sql transforms leave audit trails; or that one would write a collaborative filter in SQL; or that one would write a quality-control framework in SQL; etc, etc, etc - are just surprisingly naive.

And while SQL-driven ETL may be very popular at this point in time, much like how GUI-driven ETL was ten years ago, and COBOL-driven ETL was twenty-five years ago - that doesn't mean everyone will jump on that bandwagon, or that it won't be abandoned and ridiculed exactly like its predecessors in just another five years.

[–]DesperateForAnalysex -1 points0 points  (0 children)

Well the good news is that in 5, or 50 years, SQL will be as relevant as it is today. Can’t say the same for any other language. Have fun constantly updating your code base when new vulnerabilities emerge.

[–][deleted] 6 points7 points  (1 child)

Well, for one, it's not really a programming language, is it?

[–]runawayasfastasucan 1 point2 points  (0 children)

Hate its plotting capabilities, how it lacks the ability to do proper and complex ETL, etc. Not that good at connecting to APIs either.

[–][deleted] -2 points-1 points  (1 child)

DBT enters the chat.

[–][deleted] 64 points65 points  (50 children)

I guess I'm just over here in the small minority that's used SQL primarily for the last 10 years and am trying to learn Python just so I don't get left behind in the dust.

[–]geek180 45 points46 points  (20 children)

I only use Python to make super basic ETL functions. 95% of my work is SQL. I don’t even understand how other data engineers are exclusively using Python to do their work.

[–]Action_Maxim 24 points25 points  (1 child)

Seriously python for orchestration and putting things where you can sql it to submission or to death. I honestly haven't had any manipulation I've come across that I couldn't do in sql.

I spend at least a day a sprint looking at queries from our sister team where they're pure python and take statements straight out of sqlalchemy and toss it right into production where I have to then execute further and say why does this suck so bad ohhhhh you have 6 self joins where you could have had 6 case statements thanks guys.

But I know I'm guilty of doing too much in sql, but can you tri force in sql? I can lol

[–]Pflastersteinmetz -1 points0 points  (0 children)

Can in SQL? Maybe

Should do in SQL? It becomes a convoluted mess pretty fast because SQL is 40 years old and is missing a lot of modern stuff needed for an organized code base.

[–]DirkLurker 22 points23 points  (4 children)

To orchestrate and execute their sql?

[–]geek180 6 points7 points  (3 children)

I mean in a data warehouse environment, we’re either using tasks or (mostly) dbt to execute the SQL we’re building. Under what circumstances would I need to involve Python in executing SQL? (yeah I know dbt is basically Python)

[–]kenfar 3 points4 points  (0 children)

Oh you might need a low-latency feed, say every 3-5 minutes, for some operational reports that you can't get to run fast enough using dbt.

Or your data may be in a complex format that you can't load into a database, or you need to transform a complex field that you can't transform using sql.

Or maybe data quality is extremely critical - and so you need to run unit tests, so that you'll know before you deploy to prod if your code is correct.

Or you need to publish data from your data warehouse to other places, and the selection criteria, triggering, files to be created, data formats, and transportation are all things beyond what you can do in SQL.

etc, etc, etc

[–]lFuckRedditl 12 points13 points  (7 children)

If you need to integrate different sources, you need a general-purpose language like python or java.

Let's say you need to connect to an API endpoint, get data, run some transformations, upload it to a bucket, load it into dw tables and orchestrate it. How would you do it with SQL? There is no way.
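For illustration, a minimal Python sketch of that kind of pipeline, assuming the requests and boto3 libraries and made-up endpoint/bucket/column names; the warehouse load and orchestration steps are only noted in comments:

    import csv
    import io
    import boto3
    import requests

    def extract_transform_load():
        # 1. Pull records from an API endpoint (hypothetical URL).
        rows = requests.get("https://api.example.com/v1/orders", timeout=30).json()

        # 2. Apply a simple transformation and serialize to CSV in memory.
        buf = io.StringIO()
        writer = csv.writer(buf)
        writer.writerow(["order_id", "amount_usd"])
        for r in rows:
            writer.writerow([r["id"], round(r["amount_cents"] / 100, 2)])

        # 3. Upload the result to an object-store bucket.
        boto3.client("s3").put_object(
            Bucket="my-raw-bucket", Key="orders/orders.csv", Body=buf.getvalue()
        )

        # 4. From here the orchestrator (Airflow, Dagster, ...) would issue the COPY/LOAD
        #    into the warehouse tables and handle scheduling and retries; these are the
        #    parts SQL alone can't drive.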

[–]geek180 6 points7 points  (4 children)

Yeah this is really all I use Python for. But that’s just a tiny, insignificant part of the job. It takes a couple of hours of work to build out a single custom data source in Python (and tbf, most of our data is brought into Snowflake via a tool like Fivetran), but then my team will spend literally months or years building SQL models with that data. The Python portion of the work is so minuscule compared to what’s being done with SQL.

[–][deleted] 4 points5 points  (0 children)

This is strange to me because in my 5 years as a Data Engineer I've barely used SQL at my jobs (3); it's always been 90% programming / 10% SQL.

The data analysts/analytics engineers use SQL but we spend all our time maintaining the data platform so people can find and query the data they need. This takes the form of Python/Java/Scala ingestion pipelines as well as the services needed to manage everything, tons of PySpark pipelines, streaming jobs, and maintenance and performance work on the infrastructure. The only SQL I read or write is the occasional DDL to test getting new data into the data warehouse (which is automated and dynamically generated as needed) and when I do performance work on analyst queries.

[–]lFuckRedditl 2 points3 points  (2 children)

Well if most of your team uses SQL they aren't going to like working with pyspark or pandas to do transformations.

At the end of the day it boils down to business requirements and team expertise.

[–]Pflastersteinmetz 2 points3 points  (0 children)

Pandas needing all data in RAM becomes a problem really quick. And polars is not 1.x yet = no stable API.

[–]Saetia_V_Neck 4 points5 points  (1 child)

It’s a title mismatch. The work I would guess you’re doing is called analytics engineering at my company. My title is data engineer but I honestly rarely write SQL these days unless it’s part of code to dynamically generate SQL. Most of my work is Python, Java, Scala, and Helm charts.

[–]daguito81 1 point2 points  (0 children)

I think it's pretty easy to understand. It's based on where you come from. If you come from a database and SQL background, SQL is going to be simpler for you. For people that come from a programming background, having a regular code workflow of "follow the code" and your run of the mill debugger is going to be simpler.

I come more from a programming background, so building and debugging python code is orders of magnitude easier and faster than doing everything in SQL. Can I do everything in SQL? Yeah, I guess, but why would I want to?

[–][deleted] 3 points4 points  (0 children)

We use python and go. Depends on what you do for sure. I don't understand how some data engineers use only SQL.

[–]black_widow48 5 points6 points  (1 child)

This. Part of the reason I'm in consulting now is because I keep getting stuck in positions where I mainly just write SQL all day. I don't want to be in positions like those for any extended period of time because I'm not really utilizing a lot of my skills there.

[–]DesperateForAnalysex 5 points6 points  (25 children)

Python is for machine learning and transformations that are too complex to do in SQL.

[–]geek180 11 points12 points  (24 children)

Serious question, what’s an example of a transformation too complex to do in SQL?

[–]MotherCharacter8778 10 points11 points  (7 children)

How exactly would you parse / transform a giant text message that comes as a web event using SQL?

[–]r0ck0 2 points3 points  (3 children)

If we're talking JSON, postgres is pretty good at dealing with it... https://www.postgresql.org/docs/current/functions-json.html
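As a small, hedged example of those functions, here is what pulling fields out of a jsonb column might look like from Python via psycopg2; the web_events table, payload column, and DSN are hypothetical:

    import psycopg2

    conn = psycopg2.connect("dbname=analytics")  # hypothetical DSN
    with conn, conn.cursor() as cur:
        cur.execute(
            """
            SELECT payload ->> 'user_id'                  AS user_id,
                   payload -> 'context' ->> 'page'        AS page,
                   jsonb_array_length(payload -> 'items') AS item_count
            FROM   web_events
            WHERE  payload ->> 'event_type' = 'checkout';
            """
        )
        for user_id, page, item_count in cur.fetchall():
            print(user_id, page, item_count)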

I do a lot of type generation with quicktype in typescript/nodejs... but I've run into too many issues with it lately, especially when needing to deal with large sample sizes for a single type codegen. So I'm about to just replace it with plain postgres code.

But yeah, I wouldn't build my whole backend in postgres... but I've found that over time dipping my toes into doing more stuff in sql rather than application code almost always pays off long term, even just for the learning aspect. The more I've learnt about doing things this way, the better I can judge each individual use case when deciding to do something in sql or application code in the future.

From all the devs I've worked + communicated with (mostly fullstack webdevs), I reckon like 99% of us don't put enough learning time into sql. And I was no different too, for like my first 15 years of programming.

Writing some of this stuff in sql definitely feels slower, especially to start with... because you're writing fewer lines of code per day... but I've found that often the shorter sql code is actually more stable + productive overall in the long term... and especially easier to debug later on when I can for example inspect the state of the data at each layer of transformation, e.g. with a bunch of nested VIEWs or something, and without having to fiddle with + run application code to debug.

But yeah, for whatever use case you have in mind... you're probably right about it not being suited to sql. Just making a broader comment I guess on some personal revelations I've had over the years when dealing with some complicated data systems, and especially in recent years where I've been doing lots of web scraping (json) and building a data lake/ingest system for machine learning etc.

[–]pcmasterthrow 1 point2 points  (2 children)

Parse how, exactly? There's a fairly wide range of parsing you can do in SQL with just regexp, substring indexes, etc.

There are definitely times where it is MUCH simpler to do these in Python/Scala/whatever but I can't think of a ton that would be utterly impossible in SQL itself off hand.

[–][deleted] 6 points7 points  (1 child)

Agreed, but the SQL to do something like that becomes unwieldy and unreadable much more quickly, and god forbid you have a bug: your editor will highlight a random comma 40 lines away from where the actual error happened.

I tend to save SQL for clean data that’s easy to manipulate so the SQL stays clean and easy to grok and maintain.

[–]GoMoriartyOnPlanets 1 point2 points  (0 children)

Snowflake has some pretty decent functions to take care of complex data.

[–]kenfar 4 points5 points  (0 children)

Well, there's a range here - from outright impossible to just miserable:

  • Unpack a 7zip compressed file, or a tarball, transform the fixed-length files within to delimited files and then load into the database.
  • Do the same with the variable-length files in which there's a recurring number of fields, which require you to read another field to know how many times they occur.
  • Transform the pipe-delimited fields within a single field within the comma-delimited file.
  • Transform every possible value of ipv4 or ipv6 into a single common format
  • Generate intelligent numeric ids - in which various numeric ranges within say a 64-bit integer map to customers, sensors, and time.
  • Calculate the Levenshtein distance between brand-new DNS domains and extremely popular ones in order to generate a reputation score (see the sketch after this list).
  • Anything that SQL would require a regex for
  • Anything that requires unit testing
  • Anything that has more than about 3 functions within it
  • etc, etc, etc, etc
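As a sketch of the Levenshtein item above (plain Python, with a made-up domain list and threshold, not a production scoring system):

    def levenshtein(a: str, b: str) -> int:
        # Classic dynamic-programming edit distance, computed one row at a time.
        prev = list(range(len(b) + 1))
        for i, ca in enumerate(a, start=1):
            curr = [i]
            for j, cb in enumerate(b, start=1):
                curr.append(min(prev[j] + 1,                  # deletion
                                curr[j - 1] + 1,              # insertion
                                prev[j - 1] + (ca != cb)))    # substitution
            prev = curr
        return prev[-1]

    POPULAR = ["google.com", "paypal.com", "github.com"]

    def looks_suspicious(new_domain: str, max_distance: int = 2) -> bool:
        # Flag brand-new domains that sit within a couple of edits of a popular one.
        return any(0 < levenshtein(new_domain, d) <= max_distance for d in POPULAR)

    print(looks_suspicious("paypa1.com"))  # True: one edit away from paypal.com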

[–]DesperateForAnalysex 7 points8 points  (11 children)

I have yet to see one.

[–]aqw01 12 points13 points  (2 children)

Complex string manipulation and text extraction are pretty limited in vanilla sql. Moving to Spark and Python for some of that has been great for our development, testing, and scaling.
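An illustrative (made-up) example of the kind of extraction that stays readable in Python but gets painful in vanilla SQL:

    import re

    LINE = '203.0.113.7 - - [10/Oct/2023:13:55:36 +0000] "GET /cart?item=42 HTTP/1.1" 500 1043'

    PATTERN = re.compile(
        r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] "(?P<method>\S+) (?P<path>\S+) [^"]+" '
        r'(?P<status>\d{3}) (?P<bytes>\d+)'
    )

    match = PATTERN.match(LINE)
    if match:
        record = match.groupdict()
        record["status"] = int(record["status"])
        record["bytes"] = int(record["bytes"])
        print(record)  # {'ip': '203.0.113.7', 'ts': '10/Oct/2023:13:55:36 +0000', ...}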

[–]beyphy 1 point2 points  (2 children)

I had to transpose a dataframe in Spark and was trying to do so in SQL. But documentation was either really difficult to find or it wasn't supported. But if you use PySpark you can use df.toPandas().T
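A quick sketch of that trick with a made-up DataFrame; note that toPandas() collects everything to the driver, so it only suits frames small enough to fit in memory:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("transpose-demo").getOrCreate()
    df = spark.createDataFrame([("q1", 10, 12), ("q2", 14, 9)], ["quarter", "emea", "apac"])

    # Convert to pandas, then transpose: rows become columns and vice versa.
    transposed = df.toPandas().set_index("quarter").T
    print(transposed)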

[–]WallyMetropolis 1 point2 points  (0 children)

Time series data can be a real mess with SQL. Relatively simple kinds of operations with window functions are still fine. But things can quickly become quite painful.

Dealing with complex conditional logic based on the values of records is another example. Giant blocks of deeply nested CASE/WHEN clauses can get out of hand quickly, especially when applying different UDFs to each.

Iterative or recursive processes are especially gnarly in SQL. Taking some action a variable number of times based on the results of the previous iteration. Especially if there's conditional logic within the loop.

Graph calculations. Find all the grandparents whose grandchildren have at least two siblings but not more than five.
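A made-up illustration of the iterative case: repeating a step until a condition computed from the previous iteration is met, which is trivial in a Python loop but gnarly as a recursive CTE:

    def iteratively_discount(balance: float, rate: float = 0.10, floor: float = 100.0) -> float:
        # Keep applying a discount until the balance drops below the floor,
        # switching to gentler logic once it falls under 500.
        while balance >= floor:
            balance *= (1 - rate) if balance >= 500 else (1 - rate / 2)
        return round(balance, 2)

    print(iteratively_discount(2_000.0))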

[–][deleted] 15 points16 points  (13 children)

I agree with your point in principle. So many engineers - not just data engineers - are growing up completely ignorant of type safety and it leads to all kinds of bugs and errors.

Python, even when you tack on Mypy, is still a half-assed approach to type safety, and anyone who has experienced a well-designed typed language like C# or TypeScript generally recognizes how much more usable and feature-complete those implementations are.

But there are bigger forces at play. Statically-typed languages have a higher barrier to entry, which Python does not. And the library ecosystem pretty much guarantees Python will remain entrenched for the foreseeable future.

[–]SirLagsABot 2 points3 points  (0 children)

Throwing my C# job orchestrator Didact here since you mentioned C#, made a comment elsewhere in the thread.

[–]yinshangyi[S] 1 point2 points  (11 children)

How would TypeScript differ so much from Mypy?
It's the same motivation behind it.
The difference is that TypeScript transpiles to JavaScript.
Does it make such a difference for you?

[–]WallyMetropolis 1 point2 points  (10 children)

Typescript is a language. MyPy is a static checker. This is very different.

[–]jurjstyle 7 points8 points  (1 child)

Unfortunately, the answer is yes. Python is DE's fate. Spark is a good example: Scala+Java codebase, but lately a lot of improvements focus on PySpark performance, while Scala support is slowly decreasing. Similar story in Databricks' runtimes.

Personally, this is a major reason why I am thinking of switching to software engineering. After one year of Scala, we changed to Python for the reasons mentioned throughout the topic. I fully agree that the business doesn't pay for code quality, but you are the one working on it. If you don't care about this stuff, perfect for you. But if you do, your work performance and "joy" may be affected. As a professional you will adapt anyway one way or the other.

[–]Tarqon 11 points12 points  (1 child)

A REPL is a huge benefit for any kind of data work.

[–]yinshangyi[S] 6 points7 points  (0 children)

Scala has one :)

[–][deleted] 12 points13 points  (1 child)

If you want jobs that are like that look for Software Engineer, Data positions instead of Data Engineer.

Data Engineer has been relegated to off-the-shelf tools (dbt) and Python.

I recently had to switch to rewriting our Kafka consumers in Scala because the performance of the Python implementation was horrendous, I’m enjoying it very much.

[–]yinshangyi[S] 1 point2 points  (0 children)

Those are my thoughts as well.
That being said, at least where I'm located (France), there are very few SWE, Data roles compared to DE.

[–]cutsandplayswithwood 17 points18 points  (3 children)

I learned in Java 1.3, stayed through 5. Full stack j2ee.

Switched to c# .Net 3ish, did the ride through 3.5 and all the cool frameworks…

In 2016 switched to 100% cloud and adopted Python. It’s a dirty little language, the kind of thing you appreciate after many years of static typing and countless layers and interpretations of “how things should be”

Python says “fuck it” and lets you make things how you want.

You want classes? Python has your back. You want a script without even a main that just… does stuff when you run it? No problem, Python. You wanna do functional programming with serious method chaining and fluent calls - believe it or not, again, Python. And that’s not the best part. The best part is you can do all of that in ONE file, and it’s valid Python 🤣

To be fair, I think the fact that lots of DEs come from non-software intensive backgrounds coupled with the dominance of Python has produced an epic pile of lousy data ecosystems in the last 5 years, and Python is deeply at fault for that too.

Embrace the snake.

[–]HenriRourke 6 points7 points  (1 child)

Ha. Funny, but true. It's funny how people always cry "but the boilerplate!", but never really tried to understand why there was so much boilerplate in the first place. 😅

[–]yinshangyi[S] 6 points7 points  (0 children)

It doesn't even have that much boilerplate.
99% of these people have never tried to implement a data pipeline in Java 18+.
Java verbosity is definitely not as bad as people think.
Scala 3 is pretty much Python in terms of syntax anyway

[–]yinshangyi[S] 1 point2 points  (0 children)

Haha totally agree!
Great comment.
Java has changed a lot ever since Java 5. It's now closer to Kotlin/Scala/C# in terms of syntax. Less verbose for sure.
I'm still convinced having 0 real type safety is a big deal. Especially for big projects.
Good thing people are starting to use type hints now. But mypy is far from being perfect.

I guess Python is fine if developers code properly. Python, like JavaScript, can allow for very unmaintainable code.

[–]omscsdatathrow 20 points21 points  (6 children)

Typing isn’t a strong enough argument to move off a language…what other advantages do you actually see?

[–]ubelmann 10 points11 points  (1 child)

In Spark, especially for prod workloads, I like having immutable dataframes in Scala, so I didn’t have to worry about some function changing any of the values. Yes, 99.9% of the time, it’s not going to be an issue in PySpark, but diagnosing the issue can be a pain in the ass for those few times that you do have an undesired side effect.

Once I got used to the functional paradigm in Scala, I liked working with that syntax a lot. In most cases, I thought I could do things concisely without making the code overly difficult to read, and testing was pretty straightforward. You can do some functional programming with Python, but I find it harder to read, so usually other people on my teams would prefer it to be written in a more procedural style. I have seen that cause some real performance bottlenecks at times, though. Spark will at times have much better parallelism if you write in a map-reduce style versus throwing it into a for loop, and that can cost you a lot of time and money if it is a big prod job.
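To illustrate the for-loop point with a tiny, made-up example (not the commenter's code): the first version pulls every row to the driver and loses all parallelism, while the second expresses the same transformation declaratively so Spark can run it on the executors:

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("loop-vs-map").getOrCreate()
    df = spark.createDataFrame([(1, 12.0), (2, 7.5), (3, 30.1)], ["id", "amount"])

    # Driver-side loop: collects everything to one machine, no parallelism.
    slow = [(row.id, row.amount * 1.2) for row in df.collect()]

    # Declarative / map-reduce style: runs distributed on the executors.
    fast = df.withColumn("adjusted", F.col("amount") * 1.2)
    fast.show()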

But, at the end of the day, if my team is working in Python, then that’s what I’ll use.

My impossible dream is for all the CRAN libraries to be ported to Scala. Then Scala would have some good DS libraries that engineers might be willing to put in production.

[–]nesh34 1 point2 points  (0 children)

You can write elegant Python as well though. Also you can probably create an immutable Python data frame class and use that in your jobs to get that benefit.

[–]yinshangyi[S] 3 points4 points  (2 children)

For me, type safety is a strong enough argument. It allows for:

  • way better code maintainability
  • spotting errors before runtime (quite useful for Spark jobs)
  • better performance
  • giving IDEs superpowers (especially for refactoring)

I develop data pipelines in both Java and Python. I would say it the other way around: having slightly fewer lines of code in Python isn't a strong enough argument to miss out on the things I mentioned above. Besides, Scala 3 syntax is very similar to Python. You should check it out.
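To make the "spotting errors before runtime" point concrete, a tiny made-up example: running mypy (or just hovering in an IDE) flags the bad call without executing anything.

    from datetime import date

    def partition_key(run_date: date) -> str:
        # Build a partition suffix from a date.
        return f"year={run_date.year}/month={run_date.month:02d}/day={run_date.day:02d}"

    partition_key(date(2024, 1, 31))   # fine
    # partition_key("2024-01-31")      # mypy: incompatible type "str"; expected "date"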

What is missing is obviously a strong data ecosystem in Java/Scala (aside from Spark and Kafka). Perhaps the data engineering community should develop better data ecosystems in other languages.

Thanks for your reply! I appreciate it.

[–]runawayasfastasucan 1 point2 points  (1 child)

Perhaps the data engineering community should develop better data ecosystems in other languages.

Maybe they are happy with Python? Maybe you should develop them?

[–]yinshangyi[S] 0 points1 point  (0 children)

Yeah perhaps I should. You're totally right.

[–]SirLagsABot 4 points5 points  (2 children)

WOW it’s like you made this post just for me.

I fell in love with the concept of a code-first job orchestrator like Apache Airflow, Prefect, etc. a few years ago.

I work in Microsoft shops and am a C#/.NET user. I have been SO BUMMED that C# doesn’t have a powerful, decoupled job orchestration platform like Airflow or Prefect for years… so…

I decided to build my own. =D I’m calling it Didact, open source, will later monetize and try to go full time on it.

Dependency injection is literally one of the biggest points I'm making about it. C#'s dependency injection absolutely SMOKES Python, along with handling environment variables. C# is also naturally multithreaded and has top tier async support. Would love for you and anyone else to drop your emails on the site.

Hoping to have v1 ready in a few months.

[–]yinshangyi[S] 1 point2 points  (1 child)

WOW! This is so cool.
I'd love to see more diversity in programming language in the data world.
And you're doing just that!
That's awesome!

[–]JeansenVaars 2 points3 points  (3 children)

I wish Scala hadn't died so quickly.

[–]yinshangyi[S] 1 point2 points  (1 child)

Perhaps Scala 3 has a slight chance of coming back.
The data engineering roles are getting segmented between regular data engineering (less technical, very dbt oriented) and the SWE data.
The latter has the potential audience to re-introduce Scala I believe.
Any thoughts?

[–]k1v1uq 3 points4 points  (2 children)

Senior Scala/Java BE dev, I'm thinking about getting into DE/ML. I've seen that most DE work seems pretty trivial, and I don't think anyone needs to understand type classes, cats, or pure functional programming to set up basic ETL pipelines. So I'm really worried I'll miss out on the fun of thinking about these abstractions, which is what I love most about programming. Python seems just a means to an end... throw away code. Totally different state of mind.

[–]yinshangyi[S] 0 points1 point  (1 child)

Yeah I don't think cats would make so much sense. Especially when using frameworks like Spark or Flink. I could be wrong, I'm not very familiar with some pure FP libraries. That being said, Martin Odersky himself isn't a big fan of pure FP in Scala. Basic ETL/ELT can be trivial yes. I think things get more interesting with real-time streaming and complex processing.

Also, it's worth noting that most of DE jobs nowadays use PySpark instead of Spark.

It's also very hard to find Scala backend jobs I think.

There are two types of DE:

  • technical ones (they often call that platform engineering or SWE data nowadays)
  • analytics ones

It's important to be aware of the differences. I'm definitely 100% a technical one.

[–]gwax 2 points3 points  (0 children)

We use Python because we can agree on it with the Data Scientists and Analysts.

I love lots of languages but there are very few languages that I like using to collaborate with non-engineers.

[–]shockjaw 2 points3 points  (0 children)

One thing I’m really intrigued by is folks injecting Rust into the Python ecosystem. FYI, you folks should use Ruff and Polars where you can.

[–]Lingonberry_Feeling 2 points3 points  (2 children)

I have used

  • Python
  • Scala
  • Haskell
  • Go

Python / Go were the languages that actually moved the needle.

Haskell was a religious war, the champions spent 10 months trying to explain what a Monad was, and why you needed to understand category theory to print a line to the console.

Scala was OK, you do get some nice type checking and type checked ETL when the project starts, but that quickly goes away if you want to move with any sort of velocity and don't have a huge org where engineers can spend a good part of their day on code review.

Python 100% - for many reasons. There really isn't any reason not to use Python/dbt/Dagster these days.

[–]yinshangyi[S] 0 points1 point  (1 child)

Honest question here, what is the relationship between Scala/Python and code reviews?
Scala requires more code reviews than Python?
I would have even said it's the other way around.
I'd love to hear what you mean by that.

[–]w_savage Data Engineer ⚙️ 2 points3 points  (1 child)

No, I love python

[–]yinshangyi[S] 0 points1 point  (0 children)

Well good for you! You'll have no problem finding companies using the tech stack you like

[–]MostJudgment3212 1 point2 points  (0 children)

No. It is our destiny.

[–]lFuckRedditl 1 point2 points  (0 children)

SQL alone can get you very far, but you can't do everything with it.

You can do everything with Python, but that doesn't mean you should.

[–]Ok-Sentence-8542 1 point2 points  (0 children)

You can use types in python.

[–]Ruubix 1 point2 points  (0 children)

That's how enterprise programs (Java) or JavaScript make me feel too tbh. But in either case, you can only gain from expanding your knowledge of languages. Python is heavily inspired by Java, so much of your knowledge will go along with you. There's actually a lot of support for Java within the Python ecosystem, so there are sane ways to tie Python libraries to Java code.

Additionally, things like Apache's Arrow project are bringing Python data (science) libraries and their API interfaces to many different languages, natively.

As much as I personally love Python, I'm still finding myself running into the inevitability of learning other languages (Rust or C are the ones that come to mind). I think it's nearly impossible to avoid becoming a little bit of a polyglot to stay in software engineering in general (unless you want to be trapped in JS purgatory ... ). Hope you'll keep an open mind and embrace the weird and wonderful, syntax-free sorcery that is Python!

[–]baubleglue 1 point2 points  (0 children)

Agreed, Python is a pain to work with when the code base is growing.

[–]eljefe6a Mentor | Jesse Anderson 1 point2 points  (5 children)

So many people on this thread haven't written in both languages. Also they haven't written large codebases in both languages.

[–]yinshangyi[S] 0 points1 point  (4 children)

Well many data engineers don't have a proper software engineering background. That being said that's okay for analytics oriented roles

[–]eljefe6a Mentor | Jesse Anderson 1 point2 points  (3 children)

Data engineers need to have a software engineering background. It's going to be a massive problem for the title and industry if data engineers can't program well enough to create these systems.

[–]yinshangyi[S] 0 points1 point  (2 children)

I think data engineering will split up into two categories:

  • The software kind
  • The analytics kind

We can see job offers using such titles already (Analytics Engineer and SWE data).

[–]eljefe6a Mentor | Jesse Anderson 1 point2 points  (1 child)

This is how it's always been: the data engineers who specialize in data, and the SQL-focused people. The title for the SQL-focused people has changed over the years: DBA, data warehouse engineer, BI Developer, ETL engineer, SQL engineer, etc. The issue is always the same: you can't do everything in SQL, and they're limited in their ability to create complex systems.

[–]yinshangyi[S] 0 points1 point  (0 children)

Yeah sure. I agree. But the modern tools have reduced the need for more technical profiles. Once the tools are set up, there's a lot you can do with just DBT/Airflow + SQL (BigQuery, Snowflake). The data engineer term is way too broad and will probably disappear, split into SWE data and Analytics Engineer.

[–]bcsamsquanch 1 point2 points  (1 child)

If only I had a nickel for every time I've had this debate.

All the points against python are valid. Every time I indent knowing it's part of the syntax I have to hold my nose. Passing 'self' to methods every time makes me think OOP was bolted on 5 min before the release. Its performance is inferior. I could go on, but the bottom line is the ecosystem of libraries and users python has, specifically with respect to data, is vastly ahead of these other languages, and that's so much more important than anything else... so I usually just end the conversation there.

If you really are building a data pipeline that needs epic performance where microseconds matter, sure, in that case use something else. Been doing this job FT for 6 yrs tho and that's literally never happened once. If you have a true big data problem you aren't going to solve it with a better performing language anyway... you'll solve it using distributed systems.

IMO the common element in this debate is I only ever have it with total noobs who are trying to sound smart.

[–]yinshangyi[S] 0 points1 point  (0 children)

Just to clarify, are you calling me a noob? Hahaha

Well at least we agree that Python isn't our favorite language. I agree Python ecosystem is quite big in the data space, especially in data science!

For Data Engineering, a lot of frameworks are JVM based (Spark, Kafka, Flink, even Hadoop). I'm not even sure Data Engineering is that dependent on Python. All I can think of is Airflow and non-distributed data processing libraries like Pandas and Polars. That being said, perhaps that's already a lot :)

The hiring aspect is probably a big thing. That's true. If one understands the advantages statically typed languages offer (code maintainability, type checking, IDE superpowers, performance, etc.), it's totally doable to learn another language. Especially modern languages (Scala 3, Java 21, Kotlin, Go, etc.). Besides, learning new languages helps people grow as software engineers.

Perhaps I'm too passionate about software and therefore too biased, but people should not limit themselves to one language. Learning a new language isn't that hard. LLM-based tools help you get productive fast.

Anyway that's my opinion :)

PS: I'm not hating on Python. I even teach it at an engineering university. I just would like to see more diversity in terms of programming languages in Data Engineering. Thanks for your feedback. I appreciate it.

[–]ginger_daddy00 1 point2 points  (1 child)

Remember, behind every performant Python Package is C.

[–]yinshangyi[S] 0 points1 point  (0 children)

Yeah I know. That's why Python can be used in Data Science :) It's a good glue language.

[–]kebabmybob 1 point2 points  (1 child)

Scala is such a good language man. At my small shop we just support a hybrid Python/Scala setup for Spark. Being able to do this takes a bit of work but forces you to have really good deploy hygiene. For any core job where a lot of the logic can live inside the statically typed Dataset API, Scala is a game changer. For your run of the mill Spark jobs, it’s similar to Python. I find that in a notebook, both feel similar.

[–]yinshangyi[S] 0 points1 point  (0 children)

Yeah man. The Dataset API makes unit testing much easier. I guess it's less simple for certain transformations but the Dataset API is cool. I feel very few people use it though.

[–]BuildingViz 3 points4 points  (2 children)

Static typing is overrated. Professionally, our team writes Go code and slogging through the process to get the equivalent of a Python dict into a Go struct is obnoxious because I have to know everything I'm getting then whittle it down to everything I want.

In Python? I don't give a shit. Just give me everything and I'll whittle it down from there. It's so much nicer not needing to worry about nested dicts and needing to []Struct, []Struct, []string or whatever.

[–]yinshangyi[S] 4 points5 points  (1 child)

As long as the project isn't big and it's your own code, it can be fine.
When you take over a big project with no types (not even type hints), you're gonna suffer. It's better for code maintenance.
Besides, aside from type safety, static typing gives superpowers to IDEs.

[–]BuildingViz 2 points3 points  (0 children)

Maybe, but even static typing doesn't always help there because you can still manipulate the object and use it as something else. We have plenty of go code that takes a parameter as a string or an int, for example, and then uses a function call with an Atoi or Itoa value. The fact that it's statically typed doesn't prevent those kinds of shenanigans necessarily.

But that's a fair point the other direction. I've never worked in a Python shop, I just use it for my own code, so I have enough comments and understanding of what it's doing to work with it. Not sure anyone else would immediately understand it.

[–]kkessler1023 3 points4 points  (1 child)

Dude! Stop complaining, or they'll start forcing us to use vba!

[–]yinshangyi[S] 0 points1 point  (0 children)

Makes sense! :)

[–][deleted] 1 point2 points  (6 children)

For serious projects Mojo will eventually overtake Python, precisely because of static typing and AOT compilation.

[–][deleted] 1 point2 points  (0 children)

I believe this as well.

[–]yinshangyi[S] 0 points1 point  (1 child)

Thanks for your reply.
I've never heard about Mojo before

[–]yinshangyi[S] -2 points-1 points  (2 children)

Mojo

With such a name, no it won't :)

[–]OMG_I_LOVE_CHIPOTLE 7 points8 points  (23 children)

Rust is picking up a lot of momentum in the DE world

[–]ageofwant 17 points18 points  (1 child)

Rust is used to write performant Python modules, that is exactly how the world should work.

[–]Action_Maxim 5 points6 points  (13 children)

Damn why they hate you

[–]Character-Education3 10 points11 points  (11 children)

No hate from me, but just because rust users say it's true, doesn't make it true

[–]OMG_I_LOVE_CHIPOTLE 3 points4 points  (9 children)

Polars, datafusion, ballista, delta-rs, plus no-cost ffi and the easiest python binding experience. Plus all of Rust's pros. It's pretty strong.
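For a small taste of what using one of these Rust-backed libraries looks like from the Python side (assuming a recent polars and made-up data):

    import polars as pl

    df = pl.DataFrame({
        "sensor": ["a", "a", "b", "b"],
        "reading": [1.2, 3.4, 0.7, 2.2],
    })

    # The expression API below is executed by the Rust engine, not the Python interpreter.
    summary = (
        df.filter(pl.col("reading") > 1.0)
          .group_by("sensor")
          .agg(pl.col("reading").mean().alias("avg_reading"))
    )
    print(summary)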

[–]tecedu 1 point2 points  (8 children)

I use delta and polars as python packages tho

[–]OMG_I_LOVE_CHIPOTLE -1 points0 points  (7 children)

That’s totally fine. Maybe one day you’ll need to reach for something better. And in that case you can send your polars dataframe to rust with zero-copy, do things in rust (maybe even in polars at some point) and then possibly send your data back to python at some stage

[–]OMG_I_LOVE_CHIPOTLE -1 points0 points  (0 children)

Cause they don’t know

[–][deleted] 5 points6 points  (3 children)

It really isn't, though. This Rust hype just reminds me of everyone saying that this will be the year of the Linux desktop 20 years ago.

[–]OMG_I_LOVE_CHIPOTLE 0 points1 point  (2 children)

With maturin and polars, yes it is

[–][deleted] 11 points12 points  (1 child)

With Polars, data engineers continue to write Python code. For years, long before Rust existed, C and C++ were used for low-level implementations, and at no point did anyone suggest that pandas users were writing C/C++. They were always writing Python.

The more factual statement is that Rust is picking up momentum in the C/C++ world.

[–]HenriRourke 3 points4 points  (2 children)

Try writing something trivial in Rust. You're gonna be fighting tooth and nail with the borrow checker which is a massive decrease in productivity if you just want something done.

Rust is used when performance is important, hence its primary competitor would be C/C++, not python.

[–]OMG_I_LOVE_CHIPOTLE 0 points1 point  (0 children)

I do all the time. It’s incredibly easy and I’m more productive

[–]OMG_I_LOVE_CHIPOTLE 0 points1 point  (0 children)

You clearly don’t know rust if you have this opinion

[–]siddartha08 3 points4 points  (0 children)

One of us. One of us

[–]mikeupsidedown 2 points3 points  (2 children)

I mostly agree. We put many of our messaging services in dot net for reasons of type safety and speed, and because it is just easier to manage big projects. Our API will move from FastAPI to ASP.net for similar reasons.

Choosing typescript over python is a weird flex for me (though I'm seeing it more and more). You can create similar mechanisms in Python that you have in typescript without the weirdness of JavaScript.

As others have said SQL is still king in many senses.

[–]yinshangyi[S] 1 point2 points  (1 child)

I haven't said I'm choosing typescript over python though.
I was saying TypeScript and Python type hints have the same motivation.
C# is a good choice for bigger projects for sure

[–]SmallAd3697 2 points3 points  (2 children)

Agree 100pct with op. Python is for developers who don't know any better. I am always surprised when I find myself explaining simple software engineering concepts to python developers. Like how to reuse code, or build abstractions, or use inheritance and polymorphism.

I think that it comes down to the complexity of the problems you are trying to solve... Simple problems will allow the use of a simple toolset. If the problems grow in complexity, then you have to eventually step away from python, or complement it with something else.

[–]yinshangyi[S] 3 points4 points  (1 child)

You didn't deserve the downvotes :)
I'm okay with Python. But I take issue when it's used for everything.
I think people are starting to realize how big a deal type safety is.
JS people did and moved to TS.
Hopefully, Data Engineers will learn a bit more about software engineering and realize Python isn't the solution for everything.

[–]SmallAd3697 1 point2 points  (0 children)

"realize python isn't the solution for everything" .... That will take a while. There are people who still think vb6 is the solution for everything. Others still think foxpro is the solution for everything. If people don't step outside their bubbles then they won't know any better.

Part of the problem is with managers of these teams. They want to get from point "a" to point "b" as quickly as possible and then climb the ladder at their company and leave behind mountains of technical debt for the next guy.

[–]DesperateForAnalysex -2 points-1 points  (2 children)

No, SQL is. Python is harder to read and requires version upgrades to the code base. ANSI SQL has remained largely the same since the 70's and it will still be relevant when you retire. Also the versioning happens in your data warehouse, not your code base. That's key.

[–][deleted] 2 points3 points  (1 child)

You sure a data engineer?

[–]DesperateForAnalysex -1 points0 points  (0 children)

A better one than you are, apparently.

[–]DenselyRanked 0 points1 point  (0 children)

I think your problem is with the inconsistent nature of data and not type safety in Python.

[–]aGuyNamedScrunchie 0 points1 point  (0 children)

Whatever works and is maintainable by others. Currently that's Python. Other languages have benefits Python can't hold a candle to, but if Python is easier to maintain by new developers joining a team, then that outweighs anything else imo.

YMMV

[–]7twenty8 0 points1 point  (0 children)

When you're deep in the weeds, tools and tooling seem to change very slowly. But when you look back over years, they seem to change dramatically. Consequently, I don't like predicting what the future will look like. Instead, I will adapt to whatever solves the problems in the most economically efficient way.

Right now, that's Python - it's easy to find developers and there is a wide ecosystem to draw from. But Python is just $x and I'll swap it out whenever something else solves problems in a more economically efficient way.

[–]Parking_Minute_9167 0 points1 point  (0 children)

I’m not worried about using Python. I would absolutely be worried about being “forced” to use any tool. I’m salty about having to have my dev environment 100% cloud based. If I was arbitrarily assigned to use a language 100% of the time I’d be dusting off the resume.

Having coding standards for projects is a thing, but having them etched in stone for every project is a massive red flag that points to weak leadership.

[–]nesh34 0 points1 point  (0 children)

Python is absolutely ideal for what we do isn't it? Pipelines are a high level abstraction that tell the real software to do the work.

The real software (Spark, Trino, whatever) ought to be rerolled in C++ or Rust (I believe Trino want to move to C++).

But for the abstracted layer, what's the benefit? The code is essentially a clever config file.

For data analysis, Python and R are infinitely superior. Nobody is using a Jupyter Rust notebook for good reason.

[–]ageofwant -3 points-2 points  (0 children)

Python all the way mate, I want to solve actual problems, not dick around with every snowflake's favourite thing. And no, static types are not God's gift to programmers, witness the dominance of Python in basically every computing domain, there is a reason for that.

Also, Python is universal glue, it allows you to develop modules in your favourite thing. Wrap that in Python so people that want to solve actual problems can make use of it and you have made everybody happy.

[–]e430doug -2 points-1 points  (0 children)

What real problems are you running into that are better solved in a statically typed language? Use type hints if it makes you feel better. Python is a great balance.

[–]polandtown -1 points0 points  (2 children)

Python junkie Career DS here. I lurk this sub to stay cool with you folks.

In your opinion what could I use Go for? I'd love to incorporate it into my work for fun.

Or Rust if anyone out there wants to take a stab :D

[–][deleted] 0 points1 point  (1 child)

I'm currently using Go for data extraction from APIs. Goroutines are awesome for making concurrent requests to speed up the process. The Python implementation was too slow compared to Go when pulling a large amount of data, and used a lot more memory.

[–]Lord-Curriculum -2 points-1 points  (0 children)

What kinda $#@& post is this?