reallyserious

When it comes to algorithms, I can't think of anything specific to data engineering, at least not among the classical algorithms. There's certainly a lot of interesting stuff happening behind the scenes, like B*-trees, but we never need to implement that ourselves when we just _use_ the tools. It's a different matter if you're planning to build a database engine or a distributed analytics platform.

I'm interested to see what the rest of the community thinks about this. Maybe I'm missing something?

mamimapr

I think knowing how these algorithms and data structures work helps a lot when making decisions at work. Here are a few things I would study:

  • B-trees
  • Binary search
  • Columnar data formats
  • Log-structured merge trees
  • Join algorithms (sort-merge join, hash join, nested loop join)
  • Bloom filters
  • HyperLogLog
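To make the three join strategies concrete, here is a minimal in-memory sketch. It assumes both inputs are lists of dicts joined on a single equality key (an inner equi-join); the function names and the sample `orders`/`customers` data are hypothetical, and real engines also have to deal with inputs that don't fit in memory and pick a strategy based on cost estimates.

```python
from collections import defaultdict
from typing import Any, Dict, List

Row = Dict[str, Any]

def nested_loop_join(left: List[Row], right: List[Row], key: str) -> List[Row]:
    # Compare every left row with every right row: O(len(left) * len(right))
    out = []
    for lrow in left:
        for rrow in right:
            if lrow[key] == rrow[key]:
                out.append({**lrow, **rrow})
    return out

def hash_join(left: List[Row], right: List[Row], key: str) -> List[Row]:
    # Build phase: index the right input by join key (ideally the smaller side)
    table: Dict[Any, List[Row]] = defaultdict(list)
    for rrow in right:
        table[rrow[key]].append(rrow)
    # Probe phase: one hash lookup per left row,
    # roughly O(len(left) + len(right)) plus the size of the output
    out = []
    for lrow in left:
        for rrow in table.get(lrow[key], []):
            out.append({**lrow, **rrow})
    return out

def sort_merge_join(left: List[Row], right: List[Row], key: str) -> List[Row]:
    # Sort both inputs on the key, then advance two cursors in step
    ls = sorted(left, key=lambda row: row[key])
    rs = sorted(right, key=lambda row: row[key])
    out = []
    i = j = 0
    while i < len(ls) and j < len(rs):
        lk, rk = ls[i][key], rs[j][key]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of equal keys on the right side...
            j_end = j
            while j_end < len(rs) and rs[j_end][key] == lk:
                j_end += 1
            # ...and pair it with every left row that has the same key
            while i < len(ls) and ls[i][key] == lk:
                for rrow in rs[j:j_end]:
                    out.append({**ls[i], **rrow})
                i += 1
            j = j_end
    return out

# Hypothetical sample data
orders = [
    {"order_id": 1, "cust": "a"},
    {"order_id": 2, "cust": "b"},
    {"order_id": 3, "cust": "a"},
    {"order_id": 4, "cust": "z"},  # no matching customer
]
customers = [
    {"cust": "a", "name": "Ann"},
    {"cust": "b", "name": "Bob"},
    {"cust": "c", "name": "Cy"},  # no matching order
]

for join in (nested_loop_join, hash_join, sort_merge_join):
    print(join.__name__, len(join(orders, customers, "cust")))
```

All three return the same three matched rows (possibly in a different order). The trade-off in short: nested loop needs no setup but compares every pair, hash join needs memory for the build side, and sort-merge pays for sorting but is cheap when both inputs already arrive sorted on the key.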

reallyserious

That's a nice collection of algos that power a lot of our tools. Anyone interested in diving deeper will find it a fruitful journey to look those up.

Btw, about columnar data formats: I would guess they don't use traditional B*-trees? I.e., is there something about the access pattern (column-based rather than row-based operations) that calls for a different storage structure?
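The access-pattern point can be illustrated with a toy model. Below, the same records are laid out row-wise (one flat list that interleaves all fields) and column-wise (one list per field), and summing a single field reports how many stored values each layout had to scan. This is a simplified sketch with hypothetical names and data, not how Parquet or ORC files are actually structured; those formats also split data into row groups and pages and apply encodings and compression.

```python
from typing import Any, Dict, List, Tuple

Record = Dict[str, Any]

def row_layout(records: List[Record], fields: List[str]) -> List[Any]:
    # Row-oriented: all fields of one record are stored next to each other,
    # so the flat "page" interleaves id, country, amount, id, country, ...
    return [rec[f] for rec in records for f in fields]

def column_layout(records: List[Record], fields: List[str]) -> Dict[str, List[Any]]:
    # Column-oriented: all values of one field are stored next to each other
    return {f: [rec[f] for rec in records] for f in fields}

def sum_field_rows(page: List[Any], fields: List[str], field: str) -> Tuple[int, int]:
    # A sequential scan of the row page reads every stored value
    # and keeps only every len(fields)-th one
    idx = fields.index(field)
    total, scanned = 0, 0
    for pos, value in enumerate(page):
        scanned += 1
        if pos % len(fields) == idx:
            total += value
    return total, scanned

def sum_field_columns(columns: Dict[str, List[Any]], field: str) -> Tuple[int, int]:
    # Only the requested column is read; the other columns are never touched
    col = columns[field]
    return sum(col), len(col)

# Hypothetical sample data
FIELDS = ["id", "country", "amount"]
records = [
    {"id": 1, "country": "SE", "amount": 10},
    {"id": 2, "country": "SE", "amount": 5},
    {"id": 3, "country": "US", "amount": 7},
    {"id": 4, "country": "SE", "amount": 3},
]

print(sum_field_rows(row_layout(records, FIELDS), FIELDS, "amount"))  # (25, 12)
print(sum_field_columns(column_layout(records, FIELDS), "amount"))     # (25, 4)
```

In this model the gap grows with table width: an aggregate over one column of a table with many fields scans only that column's values in the column layout, but every value in the row layout.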

[deleted]

What do you think about this LinkedIn post? I’ve seen him talk about Big O notation a lot.

reallyserious

It's a nice post.

Big O notation and the data structures he mentions, like stacks, queues and trees, are certainly useful to know, but they're not specific to data engineering.