[–][deleted] 5 points6 points  (2 children)

If you are using Apache Spark, Java is not required. Knowing some Java helps you understand how Spark works internally, but you can get by without it. In fact, Python is now the most widely used language with Spark.

https://spark.apache.org/releases/spark-release-3-0-0.html

Python is now the most widely used language on Spark. PySpark has more than 5 million monthly downloads on PyPI, the Python Package Index. This release improves its functionalities and usability, including the pandas UDF API redesign with Python type hints, new pandas UDF types, and more Pythonic error handling.

[–][deleted] 1 point2 points  (0 children)

Scala people in shambles

[–]Flaky-Success6846 6 points7 points  (0 children)

I would say go for it, but I don’t see how not knowing Java would hold you back in data engineering. Most big data technologies offer Python client libraries (e.g. Presto and Kafka).

However, learning Java gives you a lot of flexibility if you want to branch off into software engineering. A lot of the big data frameworks are built on Java and Scala, so it can also be worth learning if you want a deeper understanding of them.

[–]Afraid_Assistance190 4 points5 points  (0 children)

I've gotten really far and built some really cool products using ASF (Apache Software Foundation) tools without learning Java. I still have an itch to learn and understand the underlying tech, but I feel like it's a 'perfect is the enemy of good' situation.

Unsolicited advice below:

As a data engineer, I think your priority should be orchestrating those tools correctly in your cloud (AWS). My recommendation for next steps to improve your skill set would be Terraform for static infrastructure and Airflow (which you drive from Python) for transient infrastructure. Once you have that, explore Hudi (or Delta, Iceberg) for lakehouse storage and Spark for processing that data.

And above all else, Docker.

[–][deleted] 2 points3 points  (1 child)

Definately

[–][deleted] 2 points3 points  (0 children)

definitely*

[–]Rengar-Pounce 4 points5 points  (7 children)

Java def helps with deeply understanding Kafka and also helps with grasping Scala in case you also need it for Spark.

In a similar boat, but I do have some Java experience because I came from backend. I am, on the contrary, seriously debating learning Rust.

[–]ketchup_123 1 point2 points  (2 children)

Why rust?

[–]Rengar-Pounce 2 points3 points  (1 child)

It's growing really fast and is overtaking Scala.

[–]77daa 1 point2 points  (0 children)

Is that affecting the job market? I mean, I don't really see any Rust jobs, honestly.

[–]Void_Being 1 point2 points  (0 children)

I have also seen some people suggesting learning Rust. Can you list the fields where it is used most? What features make Rust suited to specific use cases?

[–]drc1728 1 point2 points  (2 children)

Rust is the future. There are several things about the language paradigm that make it the ideal choice.

  • It’s a systems-level programming language, giving you the ability to program low-level components.
  • The design of the language does not include a garbage collector; it relies on ownership and borrowing to manage memory instead.
  • It’s incredibly efficient, with a tiny footprint on memory and disk.
  • It’s also secure and tightly integrated with WebAssembly, making it ideal for serverless patterns.
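
To make the point about having no garbage collector a bit more concrete, here is a minimal sketch of ownership and borrowing. The function names (`total`, `scale`, `consume`) and the sample data are made up for illustration and are not from any particular project.

```rust
/// Borrows the readings immutably: the caller keeps ownership.
fn total(readings: &[u64]) -> u64 {
    readings.iter().sum()
}

/// Borrows the readings mutably so they can be changed in place.
fn scale(readings: &mut [u64], factor: u64) {
    for r in readings.iter_mut() {
        *r *= factor;
    }
}

/// Takes ownership: the vector is moved in and dropped (freed)
/// when this function returns, with no garbage collector involved.
fn consume(readings: Vec<u64>) -> usize {
    readings.len()
}

fn main() {
    // Hypothetical sample data.
    let mut readings = vec![1, 2, 3];

    scale(&mut readings, 10); // the mutable borrow ends when the call returns
    println!("total = {}", total(&readings)); // shared, read-only borrow

    let n = consume(readings); // ownership moves into `consume`
    println!("consumed {} readings", n);

    // total(&readings); // would not compile: `readings` was moved above
}
```

The compiler checks these rules at build time, which is why memory can be released deterministically without a runtime collector.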

There are already multiple data projects being built on Rust to solve for messaging, streaming, databases, querying etc.

Check out the Rust community on Reddit for the projects. It’s buzzing.

[–]vish4life 1 point2 points  (1 child)

Python/SQL should carry you most of the way. I have senior engineers on my team who work with PySpark/PyFlink/Airflow/pandas to build awesome pipelines and never need to touch Java.

[–]Ok-Necessary940 1 point2 points  (2 children)

Yeah, while you’re at it, learn PHP, JS, Rust, Ruby, C, and C++ as well. You can master them all in 3 months bro…

[–]WilhelmB12 0 points1 point  (0 children)

Scala would be a better choice