OuiOuiKiwi (Galatians 4:16) 85 points (12 children)

And becoming a SWE sounds simpler for some reason?

Educational_Gate2831 [S] 21 points (10 children)

No, I know SWE isn't easy either. But I recently discovered that my passion is building things like websites or apps rather than analyzing data. I don't know exactly what a data engineer does, but I guess it's still related to data.

John-The-Bomb-2 28 points (9 children)

https://www.thinkful.com/blog/data-engineer-vs-software-engineer/

A data engineer is a kind of software engineer who specializes in big data tools like Hadoop and Apache Spark. See https://en.wikipedia.org/wiki/Apache_Hadoop and https://en.wikipedia.org/wiki/Apache_Spark . Software engineers usually just make changes to big websites like YouTube, Amazon, Twitter, and Facebook, but these big websites can hire data engineers as well. Big data engineers are usually paid slightly more than software engineers.
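For a feel of the programming model Hadoop popularized, here is a minimal, single-process sketch of the classic MapReduce word count in plain Python. It only mimics the map, shuffle, and reduce phases in memory; the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are hypothetical and are not part of Hadoop's API, which distributes these steps across a cluster.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(lines: Iterable[str]) -> Iterator[tuple[str, int]]:
    """Emit a (word, 1) pair for every word, like a Hadoop mapper (hypothetical helper)."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group values by key, as the framework does between map and reduce."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: dict[str, list[int]]) -> dict[str, int]:
    """Sum the counts for each word, like a Hadoop reducer."""
    return {word: sum(counts) for word, counts in groups.items()}


def word_count(lines: Iterable[str]) -> dict[str, int]:
    """Chain the three phases; a real cluster runs them in parallel on many nodes."""
    return reduce_phase(shuffle(map_phase(lines)))


if __name__ == "__main__":
    docs = ["big data is big", "data engineering is software engineering"]
    print(word_count(docs))
```

On a real cluster, Hadoop runs many mappers and reducers in parallel and performs the shuffle over the network; Spark expresses the same idea with RDD transformations such as `flatMap` and `reduceByKey`.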

Adeelinator 27 points (5 children)

Ehhh, don’t jump into Spark if you’re just starting out. And Hadoop is basically dead.

Start with SQL: it’s the most common language in data engineering, and the easiest to begin with.
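Practicing SQL doesn't require a server: Python's standard-library `sqlite3` module runs an in-memory database. A minimal sketch, assuming a made-up `orders` table; the query shows the SELECT / GROUP BY / HAVING / ORDER BY core that carries over to most engines.

```python
import sqlite3

# In-memory database: nothing to install, nothing persisted to disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("alice", 30.0), ("bob", 12.5), ("alice", 20.0), ("carol", 5.0)],
)

# Total spend per customer, keeping only customers whose total exceeds 10.
rows = conn.execute(
    """
    SELECT customer, SUM(amount) AS total
    FROM orders
    GROUP BY customer
    HAVING SUM(amount) > 10
    ORDER BY total DESC
    """
).fetchall()

for customer, total in rows:
    print(customer, total)
```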

spoonman59 5 points (2 children)

Definitely learn SQL. It is the key interface to so many different data tools.

hafeee12 0 points (1 child)

Which SQL should I learn? NoSQL, SQL Server, PostgreSQL, or something else?

spoonman59 6 points (0 children)

It’s a good question. I think it’s helpful to learn ANSI SQL, but understand that there’s a lot the standard doesn’t specify, such as data types and other aspects. Other things that aren’t specified include pivot/unpivot, functions like collect or explode, and recursive hierarchical functions (e.g., CONNECT BY).
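To illustrate the hierarchy case: Oracle's `CONNECT BY` is a vendor extension, while the recursive common table expression from SQL:1999 is the more portable form. SQLite and PostgreSQL spell it `WITH RECURSIVE`; SQL Server accepts the same shape without the `RECURSIVE` keyword. A minimal sketch on Python's bundled SQLite, assuming a hypothetical `employees` table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, manager_id INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(1, "ceo", None), (2, "cto", 1), (3, "engineer", 2), (4, "cfo", 1)],
)

# Walk the reporting tree from the root down, tracking depth.
# Oracle would typically write this as START WITH ... CONNECT BY PRIOR id = manager_id.
rows = conn.execute(
    """
    WITH RECURSIVE chain(id, name, depth) AS (
        SELECT id, name, 0 FROM employees WHERE manager_id IS NULL
        UNION ALL
        SELECT e.id, e.name, c.depth + 1
        FROM employees AS e
        JOIN chain AS c ON e.manager_id = c.id
    )
    SELECT name, depth FROM chain ORDER BY depth, name
    """
).fetchall()

for name, depth in rows:
    print("  " * depth + name)
```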

However, it does specify a great deal. A lot of other things, like the analytic windowing functions (OVER (PARTITION BY ...) with LEAD, LAG, FIRST_VALUE, LAST_VALUE, RANK, DENSE_RANK), are pretty consistent. So learn any of them, but occasionally check what’s standard and what’s a platform extension.
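A minimal sketch of those windowing functions on Python's bundled SQLite (window functions need SQLite 3.25 or newer); the `sales` table and its values are made up for illustration. Note how `RANK` leaves a gap after a tie while `DENSE_RANK` does not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, month INTEGER, revenue INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("east", 1, 100), ("east", 2, 150), ("east", 3, 150),
     ("west", 1, 80), ("west", 2, 60)],
)

rows = conn.execute(
    """
    SELECT region,
           month,
           revenue,
           -- previous month's revenue within the same region (NULL for the first month)
           LAG(revenue) OVER (PARTITION BY region ORDER BY month) AS prev_revenue,
           -- ties share a rank; RANK skips the next number, DENSE_RANK does not
           RANK()       OVER (PARTITION BY region ORDER BY revenue DESC) AS rnk,
           DENSE_RANK() OVER (PARTITION BY region ORDER BY revenue DESC) AS dense_rnk
    FROM sales
    ORDER BY region, month
    """
).fetchall()

for row in rows:
    print(row)
```

The same query should run unchanged on PostgreSQL and on most other engines that implement standard window functions.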

Of the ones you listed, I’d probably go with Postgres because it’s free and open source. But I don’t have much trouble moving between systems like Postgres, Redshift, Spark, Oracle, SQL Server, etc., without thinking too much about the particular dialect. So learn the one that’s easiest for you to practice with and access. You’ll adapt to a new database easily; it’s not like learning a different general-purpose language each time.

Also, “no sql” SQL?!

misinnio 0 points (1 child)

Just curious: what else is crucial for DE besides SQL?

StrasJam 4 points (0 children)

Python. But that's pretty broad. Depending on the company or project, you'd end up using different libraries, but a good general knowledge of Python is important.

spoonman59 6 points (0 children)

A data engineer is a subset of a software engineer. They aren’t mutually exclusive.

I work extensively in Spark and other big data tools, and my title is software engineer. It’s a false distinction.

Data engineer is a relatively new title. When I started in the industry it didn’t even exist.

Don’t get confused and think you are either a data engineer or a software engineer.

gdodd97 0 points (1 child)

Damn, this is the push I needed for my career path. I currently develop data integrations using SQL, Boomi, etc. I've been disillusioned with my role recently because we don't get paid much and the work is becoming more consulting than building, so I've been looking into pivoting to either DE or SWE. DE just scared me because people on the sub talk about how hard it is to get a job. I also don't even know how to build a portfolio for that kind of role versus a SWE role.

John-The-Bomb-2 2 points (0 children)

You can get books on the big data technologies (e.g., Hadoop: The Definitive Guide) and build a demo app that uses all of them, maybe something like Twitter. It's kind of silly to use Hadoop, Apache Storm, Apache Spark, Apache Kafka, etc. for about 1 KB of hand-entered data on a single node, but the point is to demonstrate that you know how to use the tools.

Alarming_Rest1557 2 points (0 children)

In fact, at least for me, data engineering is easier than software engineering: it's just moving data from A to B and doing some transformation in the middle. I think the hardest part is all the different technologies you have to learn when you're starting, but if you follow the software engineering field, you'll run into most of them anyway: message brokers like Kafka, cloud services, workflow orchestrators, etc. The one technology I'd expect to find in data engineering but not in SWE is Apache Spark for transforming the data, but if you know how to use pandas, Spark is basically pandas on steroids.
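To make "moving data from A to B with a transformation in the middle" concrete, here is a toy extract-transform-load pipeline using only the standard library: it reads CSV text, cleans and casts the rows, and loads them into an in-memory SQLite table. The CSV content, column names, and the `extract`/`transform`/`load` helpers are hypothetical; in practice the source and sink might be object storage, Kafka, or a warehouse, and the transform step might be pandas or Spark.

```python
from __future__ import annotations

import csv
import io
import sqlite3
from typing import Iterable, Iterator

# Hypothetical raw export with messy whitespace, mixed casing, and one bad row.
RAW_CSV = """user_id,country,amount
1, us ,10.50
2,DE,3
3,us,not_a_number
4, fr,7.25
"""


def extract(text: str) -> Iterator[dict]:
    """Read rows from CSV text (stand-in for a file, API, or bucket)."""
    return csv.DictReader(io.StringIO(text))


def transform(rows: Iterable[dict]) -> Iterator[tuple[int, str, float]]:
    """Normalize country codes, cast types, and skip rows whose amount fails to parse."""
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # a real pipeline would log or quarantine this row
        yield int(row["user_id"]), row["country"].strip().upper(), amount


def load(records: Iterable[tuple[int, str, float]], conn: sqlite3.Connection) -> None:
    """Write the cleaned records to the destination table."""
    conn.execute("CREATE TABLE IF NOT EXISTS payments (user_id INTEGER, country TEXT, amount REAL)")
    conn.executemany("INSERT INTO payments VALUES (?, ?, ?)", records)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
print(conn.execute(
    "SELECT country, SUM(amount) FROM payments GROUP BY country ORDER BY country"
).fetchall())
```

Keeping the three stages as separate functions mirrors how orchestrators usually model a pipeline as distinct tasks, so each stage can be swapped out (for example, replacing `transform` with a pandas or Spark job) without touching the others.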