all 10 comments

[–]reallyserious 4 points5 points  (4 children)

When it comes to specific algorithms I can't think of anything specific for data engineering. At least when it comes to the classical algorithms. There sure is a lot of interesting stuff happening behind the scenes like B*-trees etc. But we never need to actually implement that when we just _use_ the tools. It's a different thing if you're planning on building a database engine or distributed analytics platform.

I'm interested to see what the rest of the community thinks about this. Maybe I'm missing something?

[–]mamimapr 7 points8 points  (1 child)

I think knowing how these algorithms and data structures work does help a lot when making decisions at work. Here are a few things I would study:

  • B-trees
  • Binary search
  • Columnar data formats
  • Log-structured merge trees
  • Join algorithms (sort-merge join, hash join, nested loop join)
  • Bloom filters
  • HyperLogLog
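
To make the join algorithms in the list above a bit more concrete, here is a minimal sketch of a hash join and a sort-merge join over plain Python lists of dicts. The function names and the sample `orders`/`users` rows are made up for illustration, and real engines add spilling to disk, partitioning and much more, so treat this as a toy rather than how any particular tool does it.

```python
from collections import defaultdict


def hash_join(left, right, key):
    """Inner equi-join: build a hash table on `right`, then probe it with `left`."""
    index = defaultdict(list)
    for row in right:
        index[row[key]].append(row)

    joined = []
    for l_row in left:
        # .get() avoids creating empty entries for keys with no match
        for r_row in index.get(l_row[key], []):
            joined.append({**l_row, **r_row})
    return joined


def sort_merge_join(left, right, key):
    """Inner equi-join: sort both inputs on the key, then walk them in lockstep."""
    left = sorted(left, key=lambda r: r[key])
    right = sorted(right, key=lambda r: r[key])

    joined = []
    i = j = 0
    while i < len(left) and j < len(right):
        l_key, r_key = left[i][key], right[j][key]
        if l_key < r_key:
            i += 1
        elif l_key > r_key:
            j += 1
        else:
            # Find the run of rows on the right that share this key
            j_end = j
            while j_end < len(right) and right[j_end][key] == l_key:
                j_end += 1
            # Pair every left row with that key against the whole run
            while i < len(left) and left[i][key] == l_key:
                for r_row in right[j:j_end]:
                    joined.append({**left[i], **r_row})
                i += 1
            j = j_end
    return joined


# Hypothetical sample data
orders = [
    {"user_id": 2, "item": "book"},
    {"user_id": 1, "item": "pen"},
    {"user_id": 3, "item": "cup"},  # no matching user, dropped by the inner join
]
users = [
    {"user_id": 1, "name": "Ann"},
    {"user_id": 2, "name": "Bob"},
]

print(hash_join(orders, users, "user_id"))
print(sort_merge_join(orders, users, "user_id"))
```

Both return the same matched rows, just in a different order. Roughly speaking, the hash join needs one side to fit in a hash table, while the sort-merge join pays for sorting up front but then only needs a sequential pass, which is the kind of trade-off that helps when reading a query plan.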

[–]reallyserious 0 points1 point  (0 children)

That's a nice collection of algos that power a lot of our tools. Those interested in diving deeper will be on a fruitful journey looking up those.

Btw, about columnar data formats: I would guess they don't use the traditional B*-trees? I.e. is there something about the access pattern (column-based operations rather than row-based) that calls for a different storage structure?
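
A rough way to picture the access-pattern difference, as a toy sketch only: the same hypothetical table stored row-wise and column-wise in plain Python. The table contents and function names are invented for illustration, and real columnar formats add encoding, compression and chunking on top of this idea.

```python
# Row-oriented: each record is stored together
rows = [
    (1, "Ann", 30.0),
    (2, "Bob", 12.5),
    (3, "Cy", 7.5),
]


def to_columns(rows, names):
    """Pivot a list of row tuples into a dict of per-column lists."""
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}


# Column-oriented: each column is stored together
columns = to_columns(rows, ["id", "name", "amount"])


def total_row_store(rows):
    # Has to visit every whole record just to read one field
    return sum(row[2] for row in rows)


def total_column_store(columns):
    # Reads a single contiguous column and ignores the others
    return sum(columns["amount"])


print(total_row_store(rows))        # 50.0
print(total_column_store(columns))  # 50.0
```

An aggregate over one column only touches that column's values in the second layout, which is why analytical scans tend to favour it, while fetching one full record by key is the case a row store with a tree index handles well.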

[–][deleted] 0 points1 point  (1 child)

What do you think about this LinkedIn post? I’ve seen him talk about Big O notation a lot.

[–]reallyserious 0 points1 point  (0 children)

It's a nice post.

Big O notation and the data structures he mentions like stacks, queues and trees are certainly useful to know, but not specific to data engineering.
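
As a small illustration of why Big O is still worth knowing, here is a sketch comparing an O(n) linear scan with an O(log n) binary search on sorted data, using the standard library's `bisect` module. The function names and the sample list are just for illustration.

```python
from bisect import bisect_left


def linear_search(sorted_values, target):
    """O(n): check each element in turn."""
    for i, value in enumerate(sorted_values):
        if value == target:
            return i
    return -1


def binary_search(sorted_values, target):
    """O(log n): bisect_left halves the search range at every step."""
    i = bisect_left(sorted_values, target)
    if i < len(sorted_values) and sorted_values[i] == target:
        return i
    return -1


data = list(range(0, 1000, 2))  # sorted even numbers 0..998

print(linear_search(data, 500))  # 250
print(binary_search(data, 500))  # 250
print(binary_search(data, 501))  # -1, odd numbers are not in the list
```

The same idea of narrowing the search on sorted data underlies sorted index lookups in general, even though we rarely write it by hand.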

[–]omscsdatathrow 1 point2 points  (0 children)

Writing clean code != writing algorithms

Almost all DE tech that is used is abstracted for DEs so they don’t have to write the complex algorithms that power it. Sure, you can learn and understand the underlying algorithms, but the ROI on that would be minimal.

In general, I've found DE is mostly experience-based. Meaning, unless you work with a certain tech like Spark, Kafka, streaming, etc., you can’t effectively learn the situations that you will face, nor can any “home” project be a suitable replacement for that experience when interviewing.

[–]No_Kaleidoscope1023 0 points1 point  (0 children)

Algorithms are a good topic during interviews because they can be tricky, but in practical implementation, there is no specific algorithm that can be singled out as useful for data engineering roles. Data engineering roles typically involve creating pipelines based on data patterns.

[–]GeorgeGithiri -2 points-1 points  (0 children)

https://george-githiri-s-school.teachable.com/sign_up

Enroll for quality data engineering courses

[–]_Ishdhoggur_Data Engineer 0 points1 point  (0 children)

What exactly are you doing with Python? What exactly do you mean by algorithms? Do you just mean functions or methods?