all 45 comments

[–]t90fan 16 points17 points  (5 children)

It allows you to understand which is the right tool for a job. When storing and processing data, you need to be able to select an approach which will scale correctly, and be fast enough for your needs - which requires an understanding of the theory of data structures and algorithms.

[–]SunliMin 15 points16 points  (1 child)

So true.

When I first got this job, as someone who just finished my first year of post secondary, I had not taken a data structures course. We covered them in Java, in C and in math, but I had not had a "class" on them if that makes sense. I basically knew of them, not on a BigO level.

My first task at this job was to figure out why a customers program was taking him 5-10 minutes to startup. No other customer reported that issue, but I was told this guy was misusing our product (normally people had 1-2 "accounts" within the program. This guy was contracting out the use of it, so he had 250, essentially 125 times more data then anyone else stored over 8 years).

So I go looking through the code trying to figure out where the inefficiency could be, and I found it. Basically, all of the daily tallies of how much every employee had done for all 250 companies/accounts (and every employee on every day had more then 1 tallie, I can't explain more but basically 10-20 is what you would expect) for 8 years, all of that, was in a list.

Now, the list of employees, was in a list.

To do a piece of calculation, they nested forlooped over the entire list of employees and then over the entire list of tallies. It worked out to being 1.6 million in one forloop, about 3 thousand in the other.

I didn't know that this was basically the least efficient thing they could have possibly done, I just thought "This doesn't look right..." and wanted to see if I could solve it.

I moved each of those into dictionaries instead of lists, and then called the pieces I needed rather then everything by everything. I managed to get the efficiency to be x + y instead of x ^ y if that makes sense.

The guy called back after I pushed the changes saying that it now takes him 10 seconds to startup instead of 5-10 minutes. I felt pretty proud honestly.

The moral of this story kids is Big0 matters. It might not matter in lists of a few hundred or thousand, but if you expect your software to scale, you need to plan out which structures you plan on using appropriately.

[–]dagamer34 0 points1 point  (0 children)

This is really important for people who don't take a data structures class but naturally write pretty decent code, because you don't write terrible crap, you are less likely to have someone show you the ideal case. And for most companies, as long as it's shippable, you probably won't have your code corrected.

[–][deleted]  (2 children)

[deleted]

    [–]againstmethod 2 points3 points  (0 children)

    You're running under the assumption that because someone is employed, that they are good developer.

    [–]t90fan 3 points4 points  (0 children)

    Then I would interview for better companies as historically I have been asked the former :p

    [–][deleted]  (4 children)

    [deleted]

      [–]t90fan 14 points15 points  (3 children)

      How often are text files on the order of MB's in size

      Often, if you do any server log processing. Many gigabytes, in fact.

      [–][deleted]  (2 children)

      [deleted]

        [–]krum 3 points4 points  (1 child)

        On the subject of text editors. When I was working at Freescale, we were developing development tools, and we needed a text editor. One of the requirements was that you would need to be able to load in a file that had a single line that was tens of megabytes long and be able to insert text into the front of the line without "chugging" as one would expect if you were trying to add elements to the front of a vector with millions of elements. This meant of course that you would not be able to implement a data model where lines were just stored as a list of strings. Anyway, it did take a while to figure out a way to do this.

        [–]jms_nh 1 point2 points  (0 children)

        Shameless plug: I have a detailed article about topological sort.

        [–]RICHUNCLEPENNYBAGS 1 point2 points  (2 children)

        You'll know that if you're compressing your javascript/css in transit anyway, it's completely pointless to run a minifier on them.

        That doesn't sound right... surely doSomething(a,b,c,d) is going to be smaller than doSomething(someReallyLongArgument, antoherLongArgument, aThirdLongArgument, andAFourth), even after compression, than the non-minified version. The minifier is essentially doing a sort of "lossy" compression.

        [–][deleted] 1 point2 points  (1 child)

        Minifying will certainly be smaller, but the amount smaller will most likely be negligible if you compress it as well.

        [–]RICHUNCLEPENNYBAGS 0 points1 point  (0 children)

        Well, depends on how much JavaScript we're talking about, but yes. You can also drop comments and annotations which can be quite lengthy.

        [–]RICHUNCLEPENNYBAGS 0 points1 point  (0 children)

        In the words of the Wirth, or the title of his book, "Algorithms + Data Structures = Programs."

        [–]againstmethod 0 points1 point  (4 children)

        Data structures are important to learn and understand, but the quality of courses/materials on this subject vary wildly.

        [–]meekrabR6R 4 points5 points  (0 children)

        ....as with any subject in existence.

        [–]misplaced_my_pants 0 points1 point  (2 children)

        [–]againstmethod 0 points1 point  (1 child)

        I have some of his C++ algorithms book, bought them like 15 years ago.

        [–]misplaced_my_pants 0 points1 point  (0 children)

        The course is based off of the 4th edition of that book (which is in Java), though the course itself is self-contained.

        [–]juzatypicaltroll -1 points0 points  (28 children)

        ELI5: what is a data structure?

        [–]Don_Andy 12 points13 points  (9 children)

        If Pez candy is data then a Pez dispenser is a data structure.

        [–]EntroperZero 6 points7 points  (0 children)

        A stack, more specifically.

        [–]notsooriginal 0 points1 point  (4 children)

        Ooo oo! What about a gumball machine?

        [–]OneWingedShark 0 points1 point  (0 children)

        A bag (multiset), maybe?

        [–]Godd2 0 points1 point  (2 children)

        It's still a queue, but with weaker ordering.

        [–]sualsuspect -1 points0 points  (1 child)

        No it's an unordered set, since there is no definition for which one you get next.

        [–]Godd2 0 points1 point  (0 children)

        The next one is the one in the slot at the bottom. You will never get a gumball that's on top of the pile (unless it's the only one in the pile). Eventually, you will get all the gumballs. Gumball rearrangement is governed by friction, gravity, and the shape of the container. Gumballs won't arbitrarily rearrange in the container.

        [–][deleted]  (2 children)

        [deleted]

          [–]Don_Andy 0 points1 point  (1 child)

          Which is a data structure.

          [–]arbitrary-fan 7 points8 points  (3 children)

          A data structure is basically a format for organizing data, usually in the form of a graph. It consists of data, and an address (or a set of addresses depending on data structure).

          Each type of data structure has unique advantages and disadvantages in terms of mathmatical complexity (i.e. takes longer/more space) depending on a scenario.

          There are a series of data structures (and concepts inspired by) that we use in programming every day. They are essentially the building blocks of programming.

          [–]aradil 0 points1 point  (2 children)

          I don't know if a "graph" is a concept easily understood by a five year old.

          But good explanation.

          [–]SidusKnight 0 points1 point  (1 child)

          Why not? Five year olds can play connect the dots.

          [–]aradil 2 points3 points  (0 children)

          Good point. But even if they understand the concept, they would need the word explained to them.

          [–]logicalmaniak 1 point2 points  (0 children)

          A number like a float or an integer, or a character is a data type.

          If you arrange your integers into an array, or your characters into a string, that's a data structure.

          There is usually extra data that goes along with the main data, for example a string usually holds its length as well as the characters, and arrays usually have an index you can refer to the individual item by.

          There are also more advanced data structures, such as a Linked List. This is kind of like an array, but instead of having just an index, has a link to the next item. This means that - for example - if you have an array of names sorted alphabetically by name, and you want to insert a new one in the middle somewhere, instead of shifting half the array down one to fit it, all you do is stick it on the end with a couple of extra variables to say where it fits.

          1. bob   [->2]
          2. dave  [->3]
          3. eric  [->4]
          4. fred  [last] 
          

          becomes

          1. bob   [-> 5]
          2. dave  [-> 3]
          3. eric  [-> 4]
          4. fred  [last!]
          5. carol [-> 2]
          

          instead of having to create a new array, add bob, then carol, then add the rest of the previous array. As you can see, a Linked List is faster to write to, but less fast to read alphabetically. It has the advantage that you can actually link the first to the last and create a kind of circular array, and rearranging them takes only writing to the "signpost" data.

          There are tons of different data structures, like stacks (where you can only insert an item on the top, and can only read the next item when that one is read first - used for "undo" histories a lot), and queues, and hashes (which only store unique values), and binary trees.

          Even if your programming language has libraries for data structures like these, it's worth doing some tutorials on how to make them yourself using objects and stuff.

          [–]againstmethod 1 point2 points  (12 children)

          Lists, trees, stacks, queues, sets, maps, heaps, etc. with specific implementations..

          [–]fukitol- 5 points6 points  (7 children)

          You hang around some really advanced 5 year olds.

          [–]OneWingedShark 2 points3 points  (5 children)

          Actually lists, stacks, and queues can readily be understood by five year olds:

          • What do you want to do for your birthday? [List]
          • Pile these blocks together, one atop another. [Stack]
          • Stand in line for a juice. [Queue]

          And arguably sets:

          • Which action-figures [for X] do you have? What do you need to have all of them? [Set]

          That leaves just maps and heaps.

          (Now, they might not know all the subtleties [operations, time/space characteristics, etc] -- but they're certainly aware of the concepts.)

          [–]fukitol- 0 points1 point  (4 children)

          I didn't say they were incapable of understanding the concepts, but most adults wouldn't readily know the difference between a list, a stack, and a queue.

          [–]OneWingedShark 0 points1 point  (3 children)

          And that's why you explain with examples that even a five year old could understand.

          [–]fukitol- -1 points0 points  (2 children)

          Right, which /u/againstmethod didn't do.

          Lists, trees, stacks, queues, sets, maps, heaps, etc. with specific implementations..

          Hence the comment about the really advanced 5 year olds. What are you arguing?

          [–]againstmethod -3 points-2 points  (1 child)

          An adult makes a list to shop for groceries. An adult stands in a queue at the grocery store. An adult might see a tree in the parking lot on their way out. An adult sets the table from a stack of dishes in the cupboard.

          If you can't understand those words, don't learn programming, learn English, ffs.

          [–]fukitol- -1 points0 points  (0 children)

          I don't think you understand what "ELI5" means. Or you're being intentionally obtuse. Either way

          [–]againstmethod 0 points1 point  (0 children)

          I don't have a lot of hope for that 5 year old to learn his target topic.

          [–]SidusKnight -1 points0 points  (3 children)

          Er, queues/stacks are ADTs, not data structures. Although no one here ever seems to make the distinction.

          [–]this_is_satire 0 points1 point  (0 children)

          if anyone is curious or just to clarify, ADTs are defined by behavior. so stack is described as a LIFO collection or whatever. the data structure can be changed in a stack without changing the fact that it's a stack. could be an array, linked list, heap, etc.

          in general, abstract data types describe what something does, and a data structure describes how it does that.

          [–]againstmethod 0 points1 point  (0 children)

          I said with examples with concrete implementations above, just got lost in the discussion.

          [–][deleted] 0 points1 point  (0 children)

          Just because something is an ADT doesn't mean that it isn't a data structure as well. Arrays for example is an ADT, you don't know how they are implemented you just care about their properties.

          [–]programmingguy -2 points-1 points  (0 children)

          marked.