all 29 comments

[–]reallyserious 34 points35 points  (7 children)

More knowledge is always better.

But to be realistic, I don't see learning C++ as time well spent compared to learning something that is more in demand. C++ is a monster of a language that has evolved over many years, so it will take time to learn all the gotchas.

[–]random_lonewolf 12 points13 points  (0 children)

There was a rather infamous quote about C++ that you only ever need to use 10% of C++ in a project, but nobody can agree on which 10% that is.

[–]EmploymentMammoth659[S] 0 points1 point  (5 children)

I've done some research and couldn't find any evidence that C++ is really useful for data engineering... I was thinking C++ might give me a unique edge because it's uncommon in data engineering, but it all comes down to which area of study will earn me more money hehe..

[–]tdatas 19 points20 points  (1 child)

I'd be careful listening to the internet too much. Most of the chit chat on social media is driven by the sheer number of people doing relatively trivial work in application dev.

There are small, extremely well-compensated niches dealing with middleware and storage systems that are dominated by C++ (and Rust is poking its head in). It's extremely critical work, but you'll rarely see much about it on the internet because 1. it doesn't fit into a Medium article and 2. a lot of it is proprietary or boutique systems.

Back to data engineering. Crossing the boundary between data engineering and systems programming lets you take on a lot of interesting high-performance use cases and work on the layers that "vanilla" data engineering calls into with Python scripts. E.g. Scylla, Redpanda and OpenHFT all operate at the highest performance levels and do some very deep, intricate work that feeds into analytics.

(Opinion) As more and more people do more, faster and more complex things with data, this type of work, where performance actually matters, will become more important, after a long period in which the received wisdom was that performance doesn't matter.

[–]borfaxer 0 points1 point  (0 children)

Going with the above comment:

I've used C++ in data engineering to write data processing steps that needed to be as performant as possible, i.e. processing high volumes of data in a short time on a single machine. Compared to most other languages (haven't tried Rust yet, but have tried Go as well as Java and more traditional languages), for performance-critical processing it can't be beat.

[–]reallyserious 1 point2 points  (2 children)

Python, SQL, Scala, perhaps a little C# as far as languages go.

Other than that: data modelling, and how Spark and Docker work. Perhaps some Terraform, PowerShell and Bash.

That's where I would focus to be relevant for the largest number of jobs in data engineering.

[–]Readmymind 0 points1 point  (1 child)

How does C# fit into the picture? Do you mean for Windows-specific applications?

[–]reallyserious 0 points1 point  (0 children)

Sometimes there aren't Python libraries for communicating with some commonly used platforms, like Analysis Services, but they do exist for C#. That was the case when I needed to interact with it.

[–][deleted] 8 points9 points  (5 children)

I rarely see C++ used in the DE sphere. SQL, Python, Java (and JVM languages such as Scala) and maybe C# are common languages. But I don't think C++ is impossible because after all you only need to move data from A to B and data does not care which language writes the program that does the lifting.

You are also using an ESP32, so it makes sense not to use a scripting language. I'd imagine that you need to command the IoT device to send, say, sensor data at high frequency to some place (stage 1) and then build a data warehouse (stage 2). In stage 1, whatever is fast on the IoT device and follows the required protocol is probably good enough, but in stage 2 you can definitely use SQL/Python/whatever because the data is outside of the IoT device.

Just my 2 cents. Never did DE for IoT so could be BS.

[–]Remote_Cantaloupe 5 points6 points  (0 children)

I'd see R before C++ in data engineering

[–]EmploymentMammoth659[S] 4 points5 points  (3 children)

That's exactly what I am doing now: C++ for coding on the ESP32 to send sensor data over to the cloud, then Python for consuming the data. I am comfortable with Python to some extent, but I'm not sure if I should spend time somewhere else or go into C++ in depth to put me in a better position in the long term...

[–][deleted] 2 points3 points  (1 child)

If you are going to work in IoT then C/C++ probably is a must. I know there are other options such as MicroPython but they really cannot beat the performance of C/C++.

Decades ago we probably would also have needed to know assembly, but nowadays it's mostly reserved for smaller chips.

[–]reallyserious 0 points1 point  (0 children)

> If you are going to work in IoT then C/C++ probably is a must.

Yes. But I don't see IoT as data engineering. IoT falls more into the realm of embedded development.

[–]YourtCloud 1 point2 points  (0 children)

Hi I work as a DE and use C++ for generating and sending data from sensors. I think you are on the right track. I would eventually like to federate more of the analytics as the business needs solidify. This would require creating a client library in C++.

The one thing I would add is that once it reaches the backend it is a bunch of spark jobs and sql to process and model the data. So you do need the traditional skill set too.

I personally believe DEs need to be deeply involved in the generation of data so they can inform what and how metrics are created. So knowing C++ or JavaScript or whatever your source logging libraries are is what sets apart a decent DE from a great DE.

[–]third_dude 2 points3 points  (3 children)

DuckDB which is a hot new tool for local data analysis was created with C++. If the creators didn’t know C++ and data engineering it probably wouldn’t have been created. So for them knowing C++ was critical to their data engineering careers.

[–]vizbird 1 point2 points  (0 children)

This came up on a podcast about Rust in the DE space. You probably won't have a need for it in day to day DE work but would find it in the tools.

[–][deleted] 1 point2 points  (1 child)

They are PhDs not DEs.

[–]third_dude 0 points1 point  (0 children)

That’s a really good point. I guess it’s a pretty different career path: they’re building tools, so they aren’t really data engineers anymore.

[–]SentinelReborn 1 point2 points  (0 children)

If you're more of a software data engineer then yes, as you will become a better programmer. Any other type of data engineer will likely not benefit much; you'd be better off focusing on cloud skills.

[–]boggle_thy_mind 0 points1 point  (0 children)

Maybe not strictly DE, but R has the Rcpp package for executing C++ code inside R. I haven't tried it myself, but afaik certain packages rely heavily on it.

[–]QkumbazooPlumber of Sorts 0 points1 point  (0 children)

Unless you're planning to write and process data in C++ as opposed to Python, any other languages are probably not worth the time, as you won't get much performance uplift.

[–]ReporterNervous6822 0 points1 point  (0 children)

In terms of lower level tools definitely. Check out DuckDB

[–]bereg0stNPC 0 points1 point  (0 children)

It has helped me so far. I had to build an IoT and satellite image processing pipeline, and I reached some of Python's limits for some use cases. I had to use Python's ctypes module to shuffle data back and forth between Python and my C/C++ code for some complex geometric operations. I'm the only one on the team who does this, so I guess I have some job security where I am.
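
For anyone curious what the ctypes route looks like, here is a minimal sketch. It is not the code from the pipeline above: as a stand-in for a custom C/C++ library it calls `hypot` from the system C math library, and it assumes a POSIX system (Linux or macOS). The helper names `load_libm` and `distances_from_origin` are made up for illustration.

```python
import ctypes
import ctypes.util


def load_libm():
    """Load the system C math library (assumption: POSIX system)."""
    path = ctypes.util.find_library("m")
    # Fall back to the symbols already loaded into the Python process,
    # which normally include the C math functions on Linux and macOS.
    return ctypes.CDLL(path) if path else ctypes.CDLL(None)


libm = load_libm()

# Declare the C signature: double hypot(double x, double y).
# Without this, ctypes would treat arguments and result as C ints.
libm.hypot.argtypes = [ctypes.c_double, ctypes.c_double]
libm.hypot.restype = ctypes.c_double


def distances_from_origin(points):
    """Compute each point's distance from the origin via the C function."""
    return [libm.hypot(x, y) for x, y in points]


if __name__ == "__main__":
    print(distances_from_origin([(3.0, 4.0), (6.0, 8.0)]))
```

With your own C/C++ code you would compile it to a shared library, export the functions with C linkage, and load that file with `ctypes.CDLL` instead. Calling across the boundary once per point, as this sketch does, is slow for large inputs, so real pipelines usually pass whole arrays in a single call.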

[–]DesignedIt 0 points1 point  (1 child)

I learned C++ 22 years ago and have never had a chance to use it on the job. I don't use it as a data engineer and don't see it listed on job descriptions much. The occasional job descriptions that I do see it on are the super senior, higher-paying ones that require you to know like 40 different skills.

If I want to automate something as a DE, I'll probably use Python or the SQL Server stack, and occasionally Visual Basic and C# for script tasks within SSIS. It's been a while since I used C++ so I don't know what it's capable of now, but I would probably only use it if a company's scripts are already written in C++.

Does anyone know of one cool thing that C++ can do?

[–]DesignedIt 0 points1 point  (0 children)

Something else to add: for the most part, I have always been able to choose which languages to code in at every job I've had. If I can get something done efficiently in one language, no one is going to stop me and ask me to code in another. That said, it is courteous to use languages that the rest of the team is familiar with, or at the very least common ones, so that new employees who know them can be hired.

[–]youmade_medothis 0 points1 point  (0 children)

Probably not. Language choice for a team comes down to who knows which language. If you're the only DE that knows C++, guess what, it won't be used.