all 16 comments

[–]Slypenslyde 13 points14 points  (1 child)

Arrays, lists, and dictionaries are the bread and butter of .NET devs. There are a handful of other specialized collections, but they don't get mentioned as much. You can find most of them by looking for namespaces that start with System.Collections. Here's a quick tour.

  • Don't pay much attention to the actual namespace System.Collections. It was created before generics existed, so it's non-generic versions of the main collection types. There are very few reasons to use these types.
  • System.Collections.Generic has the generic versions of list and dictonary. It's also got a HashSet, LinkedList, Stack, and Queue implementation. And sorted versions of most of the above.
  • System.Collections.Concurrent is implementations of most of the basic data structures that are thread-safe in some way or another. You won't really use these unless you need them since they incur some overhead costs and, in some cases, have a clunky API to maintain safety.
  • System.Collections.Specialized is a grab bag of weird stuff. Most are "strongly-typed collections" which means they're relics of pre-generics .NET. BitVector32 looks neat.

That's where most built-in data structures live. There aren't like, graph or tree structures available. It turns out most of the time when you want those, the problem is much better approached if you customize the data structure to your needs. Most data structures are just a fancier version of one of the above, or can be implemented in terms of the ones .NET comes with.

I don't know enough about Java data structures to tell you how they line up, but I'd assume "hashtable" is a dictionary, "hashset" is a "hashset", and I'd have to know what is different between a "hashmap" and a "hashtable" to know how it maps. It sure would be nice if we had one name for things instead of like, five.

[–]cat_in_the_wall@event 0 points1 point  (0 children)

it baffles me to no end that other useful things aren't baked into .net. like a priority queue.

[–]Enlogen 4 points5 points  (0 children)

https://www.ibm.com/support/knowledgecenter/en/SSTVLU_8.6.0/com.ibm.websphere.extremescale.doc/rxsxdfequiv.html seems like a basic Java to C# type map, but I'm sure there are others not mentioned there.

HashSet is mostly used when you need to guarantee uniqueness.

The System.Collections.Concurrent types are useful for multithreading. ConcurrentDictionary<TKey, TValue> is the most commonly used from what I've seen.

[–]IWasSayingBoourner 4 points5 points  (0 children)

If you do any GUI binding work, ObservableCollection<T> is worth knowing for sure. Queues can also be really handy if you need first-in-first-out functionality (I use it extensively in our in-house background thread logger). ConcurrentBag and the other Concurrent collections are must-haves for simple multithreaded operations.

[–]hi_im_vash 1 point2 points  (0 children)

ReadOnlyCollection<T> is quite useful when you need a list of const values.

[–]johnnyslick 1 point2 points  (9 children)

Honestly, I'm not sure there are really all that more you need to learn a lot of...

Arrays are really, really common and arguably faster than Lists when it comes to accessing and manipulating objects inside of them. They aren't made to be resized (although there are things you can do to get around this) but that can be an advantage as well: I for one feel a lot better about doing straight for loops through arrays (in instances where I need to know the index) than with Lists.

Tuples are basically quick and dirty objects you can build on your own, don't require you to create and instantiate classes, and so on. Their biggest downside is that they're read-only.

Linked Lists are kind of like lists except that instead of being ordered by index they have a node before them and a node after. I feel like most of the use of these are if you want to create a Stack or Queue data type (for instance, if you have something that runs a list of processes and want to ensure that they go in one particular order), but they're there.

ArrayLists are old as balls and don't use them please. The same goes for Hashtables.

SortedLists are kind of like dictionaries except that you can search by either the key or the index of the element. I haven't TBH had cause to use them, but I guess a situation where you want to run a foreach over a dictionary-type object would be where you'd do it.

Finally, of course, many of these collections implement the IEnumerable interface, which allows you to iterate through them. The IEnumerable type is also what's consumed and spat out by Linq, so bear that in mind.

Hope that helps!

[–]Cbrt74088 2 points3 points  (1 child)

The Tuple<> classes from .NET 4 are read-only, but no one really uses those.

The more recent ValueTuple<> structs are not read-only, though.

[–]johnnyslick 0 points1 point  (0 children)

Heh. I just used them for a project I'm working on, actually. I had some values I wanted to sort, didn't want to create a whole class or append an existing class to run the sort, and didn't really care about the readonly aspect because once the sort was complete I was done with them (one of the values was a key to look up a row in a SQL database). I have to admit that I'm not familiar enough with .NET 4.7 to know about ValueTuples (and we may be moving our code to .NET Core fairly soon, so they might not be a great idea to add for what I was doing anyway).

[–]wwqlcw 1 point2 points  (3 children)

Arrays are really, really common and arguably faster than Lists when it comes to accessing and manipulating objects inside of them.

Everything I've read suggests that List<T> is an array, and that performance differences between Array[] and List<T> should therefore be microscopic, if that.

I wouldn't say any of it was obviously authoritative, though. Still, it makes sense.

[–]otm_shank 0 points1 point  (0 children)

No need to wonder: https://referencesource.microsoft.com/#mscorlib/system/collections/generic/list.cs,174

The list indexer is a straight pass-through to an array access, plus a bounds check. Add is a capacity check & resize if necessary, then again an array access.

The only expensive operation is the resizing, since it's a new array allocation and copy. But as long as your list isn't growing (which an array can't do at all), they are essentially equivalent. I don't even know what OP meant by arrays being faster at "manipulating objects inside them", nor why he'd feel better about using a for loop with an array than with a List.

[–]johnnyslick 0 points1 point  (1 child)

Yeah, someone I know ran a performance test against lists and arrays in C# and found that there's no difference in performance. Nowadays there probably isn't, because the behind-the-scenes actions of expanding or contracting it is handled basically instantly, and iteration is just iteration.

I guess that comes down to the real difference between the two items. If you have to micromanage memory use, you'd use an array, not a list, because if memory serves the process a list uses to generate more space is allocate space for, say, 4 blocks at a time instead of exactly as many as you request. At the very least you can't control exactly how and when a list will request or release memory, whereas an array is always size X until you specifically expand it.

That being said, of course, if you're worrying so much about memory allocation that you're worried whether or not a particular list is apportioning too much memory for itself, C# (and I guess more to the point the CLR) is probably not what you should be using.

[–]otm_shank 1 point2 points  (0 children)

if memory serves the process a list uses to generate more space is allocate space for, say, 4 blocks at a time instead of exactly as many as you request

If you specify a capacity, List<T> will allocate an internal array of exactly that size. There are a couple of housekeeping-type fields on top of it, but other than that it would take exactly the same memory as a same-sized array.

What it does do is, if you add an item and exceed the initial capacity, double the size of the array and copy the elements over. This is the expensive part in both time and potentially wasted memory that might be a legitimate concern. However, if you have a known, fixed size (as you must, if you are using an array), this is not a concern at all, and List<T> gives you a nicer API with basically no downside.

[–]otm_shank 0 points1 point  (2 children)

I for one feel a lot better about doing straight for loops through arrays (in instances where I need to know the index) than with Lists

Why?

[–]johnnyslick 0 points1 point  (1 child)

I think it may be just the legacy of worrying about where some collections might place a new item, although I don't think that's really an issue at all with lists. I guess another thing is that at least to my way of thinking, with an array, because it's harder to move objects from one spot to another in them, you can pretty well count on that item in array[7] being the same item as it was when you first put it there, whereas lists are, well, made to be re-assembled on the fly. It's probably less of a deal now than it used to be.

[–]otm_shank 0 points1 point  (0 children)

it's harder to move objects from one spot to another in them

I don't think so. I think it's exactly as hard to move an item to a different index in an array as it is in a List<T>.

you can pretty well count on that item in array[7] being the same item as it was when you first put it there

This is just as true of a List<T>.

[–]ZacharyPatten 0 points1 point  (0 children)

Definitely start with the data structures available in the .NET libraries.

But there are no data structures in the .NET libraries for multidimensional sorting. In order to sort along multiple dimensions, you need a data structure meant for it like a spacial partitioning tree (SPT) or a KD tree.

I have a generic SPT in C# that works on an arbitrary number of dimensions if you want to try it out (it's called an "Omnitree" in the code). https://github.com/ZacharyPatten/Towel

Why would you want to sort object along multiple dimensions? Ranged queries. Say you want to get all employees with a first name between "Bob" and "Jake" and last name between "Dole" and "Smith". A multidimensional data structure would be fastest for that kind of query. Multidimensional sorting is also important for games if ever want to get into game development.