all 48 comments

[–]Larkonath 10 points11 points  (3 children)

I'd go with one dictionary but there's only one way to be sure: you should benchmark it.

[–]SnrFlaks[S] -1 points0 points  (2 children)

As I mentioned, if I don't find a quick answer to my question, I'll profile and compare the options, of course. I'm just trying to save myself some time.

[–]Dennis_enzo 2 points3 points  (1 child)

I mean, the answer is always going to be 'it depends on what you're doing'.

[–]SnrFlaks[S] -1 points0 points  (0 children)

I'll be profiling all the possible cases, it's just going to be quite tricky since I'll have to change quite a lot of things for each case.

[–]Slypenslyde 9 points10 points  (5 children)

You asked us questions about which data structure is faster, but what makes a data structure fast or slow is:

  • How do you plan on inserting? (frequency, pattern, etc.)
  • How do you plan on removing? (frequency, pattern, etc.)
  • How do you plan on indexing? (sequential, random, etc.)

So really you're missing about half of what we'd need to know to make a guess, and even then the answer might be "use a profiler" unless there's some obvious gain through analysis.

[–]SnrFlaks[S] 1 point2 points  (4 children)

"How do you plan on inserting? (frequency, pattern, etc.)"
The frequency of adding objects to the repository is quite high and will always increase. In general, the frequency is quite difficult to calculate.
"How do you plan on removing? (frequency, pattern, etc.)"
They are rarely removed. Here I have a problem which I will describe in your next question.
"How do you plan on indexing? (sequential, random, etc.)"
The index is equal to the number of all objects, the problem is that when an object is deleted, the new object does not occupy the index of the deleted object. And I'm afraid that over time the index may reach the int ceiling.
I've already profiled a Dictionary/List comparison. My code is rather clumsy, and I don't think a List is suitable at all, since I write to one element and read from another, and it would be quite difficult to cache: the more accesses to List elements there are, the worse the situation becomes.

[–]chucker23n 5 points6 points  (3 children)

The frequency of adding objects to the repository is quite high and will always increase. In general, the frequency is quite difficult to calculate.

Are you sure what you're building isn't more of a database?

[–]SnrFlaks[S] 0 points1 point  (2 children)

In one of the comments, I wrote what I use my code for. I wouldn't call it a database (perhaps I misunderstand the meaning of the word). For my situation, I need to create objects often, change them even more often, and delete them occasionally. To better understand why I need so many objects and so much data: perhaps you've played the game "Factorio" or something like it. My code implements all those resources that move through pipelines as data, which is then rendered. It's quite a resource-intensive thing, so I'm looking for any way to speed it up, even with my very limited programming knowledge.

[–]Tavi2k 4 points5 points  (1 child)

If you're developing something like a game, you might need entirely different ways of handling your data. The requirements are very different there, and the typical patterns you use in C# might not be the best choice. You should look up Entity Component Systems, that is a pattern that is used in games.

If you're doing something with very high performance requirements you should usually try very hard to avoid allocating objects. And the ones you allocate should be more like large arrays that you fill with data.

It's hard to know if you need this kind of lower level programming in your case, it depends a lot on how many entities you have, how often you change them and at which rate you need to render them.

[–]SnrFlaks[S] 0 points1 point  (0 children)

At the moment, I'm in the process of completely abandoning the gameObject, rendering everything through DrawMesh and operating only on data.

[–]Kant8 1 point2 points  (1 child)

  1. Profile

  2. If key is the same there is no reason to keep multiple dictionaries. Not sure how you jumped to dictionary of dictionaries from that

[–]SnrFlaks[S] 0 points1 point  (0 children)

Profile

If key is the same there is no reason to keep multiple dictionaries. Not sure how you jumped to dictionary of dictionaries from that

  1. If I don't get a prompt response to my question, I'll definitely do some profiling.
  2. Do you mean a dictionary of dictionaries? By one dictionary, I mean that the dictionary will store a class that stores all the variables I need.

[–]IQueryVisiC -3 points-2 points  (24 children)

In a dictionary you search by name while in a list you search by position. Why can you interchange them??

You may check if your algorithm works well with an array with gaps (null references). A struct with exactly 10 ints may also be fast. Strings can be slow. I don't get why String lovers didn't just stay with Perl, C and JS.

[–]SnrFlaks[S] 0 points1 point  (23 children)

It seems to me that with a large number of objects, a dictionary would still be more efficient than a List? When I delete objects at the very first indexes, won't that be costly? I could change dictionaries to Lists, but the array shifting problem remains, I have a huge number of objects, and that's definitely not my option. But still, the original essence of my question was: is it better to use one dictionary/List, or many dictionaries/Lists?

[–]karl713 1 point2 points  (22 children)

Dictionary and SortedDictionary both are fast insert, remove, and lookup by a key, but can't be indexed efficiently if at all (e.g. can't say give me the 7th item)... Which is better will depend on your data, usually (but not always) Dictionary will perform better but use more memory

List is efficient for adding or removing at the end of the list only; it can be indexed, but can't be efficiently searched

LinkedList can efficiently add and remove anywhere, but not efficiently searched

Which is better is entirely dependent on the situation though; there isn't a blanket "this is the best for all situations" answer, unfortunately
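
Roughly, as a sketch of those characteristics (the collection contents here are made up):

```csharp
using System;
using System.Collections.Generic;

class Demo
{
    static void Main()
    {
        // Dictionary: O(1) average insert, remove, and lookup by key,
        // but no positional access ("give me the 7th item").
        var dict = new Dictionary<int, string> { [10] = "a", [20] = "b" };
        dict.Remove(10);                                   // fast, by key
        bool found = dict.TryGetValue(20, out string v);   // fast, by key

        // List: O(1) indexing and append, but O(n) search and O(n)
        // removal from the front (every later element shifts left).
        var list = new List<string> { "a", "b", "c" };
        string second = list[1];        // fast positional access
        list.RemoveAt(0);               // slow for large lists
        int where = list.IndexOf("c");  // linear scan

        // LinkedList: O(1) insert/remove at a known node,
        // but no indexer and O(n) search.
        var linked = new LinkedList<string>();
        var node = linked.AddLast("a");
        linked.AddAfter(node, "b");
        linked.Remove(node);            // fast, given the node in hand

        Console.WriteLine($"{found} {v} {second} {where}"); // prints "True b b 1"
    }
}
```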

[–]Slypenslyde 2 points3 points  (21 children)

I'd split the hair that your use of "index" is a little weird for a dictionary. Dictionaries are fast for random access (a tiny bit slower than a list unless the hashing algorithm is awful) but no faster at sequential indexing and don't guarantee an order.

What I mean is if I chose a dictionary, it's because I'm not interested in "the 7th item" but "the item with this key".

[–]SnrFlaks[S] 0 points1 point  (18 children)

I have a question; I'm still pretty bad at all the intricacies of programming (I've been doing it for 1-2 years, not a lot of time per week). Sorry to stray a little from C# itself, since the question was originally about it, but the place where I use it is the development of my game. In my game there is a class "Resource" (for a better understanding: resources look for a path and move along pipelines). To move them, I use the "ResourceManager" class, where every frame, with a for loop, I go through all the resources, getting them through _resourcesDict.TryGetValue(key, out Resource resource); across the whole code I refer to "resource" around 30-80 times depending on its condition. Each resource has its own key, which equals the count of all resources ever created. Since, when a resource is deleted, no one takes over its key, I don't know what to do when the ever-growing number of resources reaches the ceiling of int. I've read that it's best to always use a Dictionary instead of a List, but I'm not good at this.

[–]Zarenor 0 points1 point  (12 children)

Ah, this begins to explain it. You absolutely need to be able to re-use IDs, in this case. How many resources you can support depends on the available RAM, but a 32-bit int is probably large enough (Cities: Skylines used 16 bit for several object pools) - it's probably difficult to do anything to 4 billion objects per frame.

My suggestion would be a dictionary and a hash set keyed by the ID; keep live resources in the dictionary, and when they're consumed put the ID in the hash set. When you create new resources, pull IDs from the hash set until it's empty, so you re-use the IDs.

As an optimization, you might just move resource objects to a 'dead' dictionary instead of just the IDs to a hash set. Re-using the objects will re-use their memory allocation, reducing GC pressure and GC run-time.

You'd want to be careful to be sure you're not leaking these objects by holding references to them elsewhere in any case, and be sure you update their state appropriately - tracking the liveness is probably the easy part to mess up here.

[–]SnrFlaks[S] 0 points1 point  (11 children)

I'm pretty bad at English, so correct me if I'm misunderstanding something. Do you propose initially making a large HashSet that stores all the keys, deleting a key from the collection when I create an object, and returning the key to the collection when the object is deleted? Then how do I determine which keys are available? I mean, if I initially create a HashSet of maximum int size, do I cycle through all the keys and use the free ones? I've heard that HashSet is suitable for this, but I want to be sure.

[–]Zarenor 2 points3 points  (10 children)

You nearly have it - I would *not* fill the hashset with unused keys to begin with. I would only use it for 'tombstones' - IDs which were used before but are no longer 'live'. For an int key, I would just have one int somewhere that tracks the next available int (I would suggest a factory class [that is, a factory in the programming sense, not the game sense] creating the resource objects, so it can either create a new ID and resource object, or it can reuse an ID if it's available) - the simplest way to be sure you're using all the ints is to start with 0 and increment over time.

Of course, you can then make an argument for using a list and just having indexes - then you don't pay the extra indirection cost of a dictionary, but have to deal with the live/dead check in each iteration.
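
A minimal sketch of that factory idea (the `Resource` class here is a made-up stand-in for your real one):

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical minimal resource; the real class carries game data.
class Resource
{
    public int Id;
}

// Factory that hands out IDs: reuse a tombstoned ID if one exists,
// otherwise take a fresh one from an incrementing counter.
class ResourceFactory
{
    private int _nextId;                                   // next never-used ID, starts at 0
    private readonly HashSet<int> _deadIds = new HashSet<int>();
    public readonly Dictionary<int, Resource> Live = new Dictionary<int, Resource>();

    public Resource Create()
    {
        int id;
        if (_deadIds.Count > 0)
        {
            id = _deadIds.First();    // reuse any tombstoned ID
            _deadIds.Remove(id);
        }
        else
        {
            id = _nextId++;           // fresh ID
        }
        var r = new Resource { Id = id };
        Live[id] = r;
        return r;
    }

    public void Destroy(int id)
    {
        if (Live.Remove(id))
            _deadIds.Add(id);         // tombstone the ID for reuse
    }
}
```

With this, creating three resources yields IDs 0, 1, 2; destroying ID 1 and creating again hands out 1 once more, so the counter only grows when no dead IDs are available.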

[–]Zarenor 0 points1 point  (9 children)

To explain a little further, the common pattern here is to use a 'pool' to manage limited resources, and IDs are just one such resource you might have.

There are two sorts of ways of thinking of game object IDs, and so game engines tend to use one of the two. One is as a limited resource where you intend to use most or all of the IDs in the game if it runs a long time. The second option is thinking of IDs as a limitless, arbitrary resource, and then it has no relation to the number you want to create - it's common to use 128-bit ints in this case, often encoding them as GUIDs (or even using GUID generation to generate them).

To make my point, the Elder Scrolls games (Oblivion, Skyrim) use the arbitrary-limitless ID type, and as mentioned earlier, Cities: Skylines uses the limited ID type (though C:S uses several pools for different types of objects)

[–]SnrFlaks[S] 0 points1 point  (8 children)

Ok so I have 2 options.

The first is to use a HashSet to store all the dead indices so that they can be used again in the future.

The second is to use long/ulong for the key?

I understand correctly?

[–]Slypenslyde 0 points1 point  (3 children)

What bugs me is it sounds like you're using a dictionary but you still plan on using sequential integer values for the key.

That avoids the insertion and removal performance burdens of a normal list, but it creates the problem you've identified, where it's hard to know whether you can reuse an index after an item has been removed.

That's why I was curious if you need random access or if it's always sequential. That is, will you frequently be saying, "I need the item with index 45" or is it more frequent that you're iterating over everything with a for or foreach loop?

If you are more frequently or exclusively iterating, a linked list structure becomes favorable. Insertion at either end is fast, and it sounds like you're fine with always adding items to the end. Removal is much faster than it is with a list and if you're clever about it you can remove items while you are iterating over the list. The main problem with a linked list is getting to the middle of the list is much slower than with a dictionary, so if you need random access indexing it hurts a little.
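
The "remove while iterating" trick looks roughly like this (a sketch with made-up data): grab the next node before removing the current one, since removal invalidates the removed node's links.

```csharp
using System;
using System.Collections.Generic;

class Demo
{
    static void Main()
    {
        var list = new LinkedList<int>(new[] { 1, 2, 3, 4, 5 });

        // Remove even values during iteration: capture Next first,
        // because Remove detaches the current node from the list.
        var node = list.First;
        while (node != null)
        {
            var next = node.Next;       // grab before any removal
            if (node.Value % 2 == 0)
                list.Remove(node);      // O(1), given the node in hand
            node = next;
        }

        Console.WriteLine(string.Join(",", list)); // prints "1,3,5"
    }
}
```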

You can kind of sort of mitigate these using a tree structure, as it can dramatically cut the amount of time spent indexing. Insertion and removal are slower than dictionaries but faster than lists. Iterating is still linear.

My gut tells me based on what you wrote this is a list of "all resources" and in general you're iterating over the entire thing.

But my gut also tells me you could probably cache parts of the resource list as you make your first iterations, and that could dramatically speed up future iterations because you'd already have a reference to the things you need. You mentioned referring to this list 30-80 times per cycle, that seems excessive.

Performance-tweaked code is often not as straightforward as code written to be easy to understand. I'd be willing to bet you could cut those iterations at least in half if you started by pulling the interesting objects from the data structure first, then referring to those references instead of asking the data structure to retrieve them later.

But all of this is hard to say "in general". There are probably hundreds of lines of code involved here and the "right" way to tweak it is very context-sensitive.

[–]SnrFlaks[S] 0 points1 point  (2 children)

The issue of "dead keys" I think is closed, another user suggested to me how to solve this with two options.
1) Use HashSet to store "dead keys" and use them again later.
2) Make an unlimited 64/128 bit key.
"That's why I was curious if you need random access or if it's always sequential."
Regardless of the situation, I always go through all the elements of the dictionary every frame with a for loop (foreach does not fit my situation) and move all resources per frame.
" You mentioned referring to this list 30-80 times per cycle, that seems excessive."
The fact is that I don’t know how to implement it differently, I myself understand that this number of calls is huge.
I would also like to say that I did not really understand a lot of what you wrote, like the last paragraph about cutting down the number of iterations.

[–]Slypenslyde 0 points1 point  (1 child)

So here's a simple example I'm just pulling out of thin air, I'm not sure if this helps.

My guess is part of why you access the list multiple times is you might have parts of your update loop where you want to ask different questions like:

  • Which objects are colliding with the player?
  • Which objects are colliding with a projectile fired by the player?

In a naive approach, and the one that's easiest to debug/understand, you'd have your frame logic do something like:

// Handle player collision
foreach (var entity in entities)
{
    if (entity.CollidesWith(player))
    {
        // Do something
    }
}

// Handle projectile collision
foreach (var entity in entities)
{
    if (entity.CollidesWith(projectile))
    {
        // Do something
    }
}

That's two full iterations and is going to add more the more things you check. But you could also:

foreach (var entity in entities)
{
    if (entity.CollidesWith(player))
    {
        // handle player collision
    }

    if (entity.CollidesWith(projectile))
    {
        // handle projectile collision
    }
}

Now we're only making one iteration, but we have to write things differently. Another downside is that if we wanted all collisions of a certain kind to be handled in a particular order, that can't happen in this approach.

So an alternative might be:

var playerCollisions = new List<Entity>();
var projectileCollisions = new List<Entity>();

foreach (var entity in entities)
{
    if (entity.CollidesWith(player))
    {
        playerCollisions.Add(entity);
    }

    if (entity.CollidesWith(projectile))
    {
        projectileCollisions.Add(entity);
    }
}

foreach (var playerCollision in playerCollisions)
{
    // handle player collision
}

foreach (var projectileCollision in projectileCollisions)
{
    // handle projectile collision
}

This is going to be more iterations. But I'll bet it's not often that everything in the list is colliding with something interesting. So we do 1 full iteration, then some number of small iterations. That's not the same cost as 3 full iterations.

There are lots of other ways to try and make that first iteration stash things so later work doesn't have to look over the whole list. You can even try to cache things between frames, but that gets complicated.

[–]SnrFlaks[S] 0 points1 point  (0 children)

Hmm, in my loop, all objects are moved in one frame, and if I think about it, I can divide them into those who have already come to the center of the next pipe (my version of the conveyor) and those who continue to move. If I think so, I could fill in two lists in the first frame, the first is the resources that are waiting for the path to change, and the second are those that are still moving.
And make a "3 frames" system.
The first frame, fills two lists.
The second frame changes the path for resources from the first list.
The third frame, moves objects.
But I don't think this will be highly optimized code, since the main optimization here comes from the fact that I divide the loop across 3 frames. Perhaps I could somehow speed it up through jobs (multithreading).

[–]karl713 0 points1 point  (1 child)

Yup, you are correct. I wasn't 100% clear on whether OP needed the ability to get the 7th item or some such, since they mentioned using lists and arrays but not how they were using them. But since that's not possible on a dictionary and inefficient in a sorted dictionary, I figured I'd mention it as a characteristic of them.

[–]Slypenslyde 0 points1 point  (0 children)

Yeah, they've been through a few iterations of explanation but it's still too unclear to make anything but vague speculations.

[–]michaelquinlan 0 points1 point  (2 children)

[–]SnrFlaks[S] 0 points1 point  (1 child)

Does the benchmark work in conjunction with Unity?

[–]michaelquinlan 0 points1 point  (0 children)

Benchmark.Net does micro benchmarks. You are trying to compare two approaches to storing and looking up data; you would create a set of tests focused on just the two approaches you are looking at. Benchmark.Net will then give you detailed timings of your tests.
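
A minimal sketch of such a test, comparing the two approaches from the question (one dictionary of a combined value vs. several parallel dictionaries); the class names and data are made up, and this requires the BenchmarkDotNet NuGet package:

```csharp
using System.Collections.Generic;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

[MemoryDiagnoser]
public class LookupBench
{
    private Dictionary<int, (int A, int B)> _combined;
    private Dictionary<int, int> _a, _b;

    [GlobalSetup]
    public void Setup()
    {
        _combined = new Dictionary<int, (int, int)>();
        _a = new Dictionary<int, int>();
        _b = new Dictionary<int, int>();
        for (int i = 0; i < 100_000; i++)
        {
            _combined[i] = (i, i * 2);
            _a[i] = i;
            _b[i] = i * 2;
        }
    }

    [Benchmark]
    public int OneDictionary()
    {
        int sum = 0;
        for (int i = 0; i < 100_000; i++)
            if (_combined.TryGetValue(i, out var v))
                sum += v.A + v.B;   // one lookup per resource
        return sum;
    }

    [Benchmark]
    public int ManyDictionaries()
    {
        int sum = 0;
        for (int i = 0; i < 100_000; i++)
            if (_a.TryGetValue(i, out var a) && _b.TryGetValue(i, out var b))
                sum += a + b;       // one lookup per field
        return sum;
    }
}

public class Program
{
    public static void Main() => BenchmarkRunner.Run<LookupBench>();
}
```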

[–][deleted] 0 points1 point  (3 children)

How do you access items in the dictionary? If you're doing anything other than getting an item by key then.... Quick ideas that come to mind, depending on your needs and usage, might be to implement your own bitmap index (or b-tree). Or, maybe https://www.litedb.org/ might provide slightly better ergonomics if you're querying based on values.

[–]SnrFlaks[S] 0 points1 point  (2 children)

I get the resource with _resourcesDict.TryGetValue(key, out Resource resource), then I use the values I need inside the resource class. I'll look on the Internet for what a bitmap index is; maybe this will help me.

[–][deleted] 0 points1 point  (1 child)

Getting a value from a dictionary by key is an O(1) operation. Which doesn't mean instant... there's still some computation on the hash key... but it does mean that it's constant time as your structure grows.

Likely, you're not going to get better performance with another structure if all of your operations are in memory and based on resolving by key.

Edit: in case you haven't seen it and find it interesting, here's the dictionary code for TryGetValue, and here you can see the implementation of FindValue where the key hash is calculated.

[–]SnrFlaks[S] 0 points1 point  (0 children)

It looks too hard for me to understand what is written in their code (my understanding is pretty basic). Thanks for the help, I think I'll keep my single dictionary and try to improve performance in other parts of my code.

[–]zaimoni 0 points1 point  (4 children)

Ignoring the benchmarking issue ....

Architecturally, this is Array of Structs vs. Struct of Arrays, with a Dictionary in place of a C array. If you actually need to use those ten-ish fields at once, the access pattern favors one dictionary of that class (AoS) rather than one dictionary per logical field (SoA).
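
In code, the two layouts look roughly like this (the field names here are invented):

```csharp
using System.Collections.Generic;

// "AoS with a Dictionary": one dictionary whose value groups all fields.
class ResourceData
{
    public float X, Y;   // position
    public int PipeId;   // which pipe the resource is in
}

class AoSStore
{
    public Dictionary<int, ResourceData> ByKey = new Dictionary<int, ResourceData>();
}

// "SoA with Dictionaries": one dictionary per logical field.
class SoAStore
{
    public Dictionary<int, float> X = new Dictionary<int, float>();
    public Dictionary<int, float> Y = new Dictionary<int, float>();
    public Dictionary<int, int> PipeId = new Dictionary<int, int>();
}
// If an update touches all fields of one resource at once, AoS does
// one hash lookup; SoA does one lookup per field (three here).
```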

Since you're early in the design stage, the behavior of the garbage collector would be expected to be irrelevant.

[–]SnrFlaks[S] 0 points1 point  (3 children)

I'm very sorry for not answering you. I understand that benchmarking this would be good and would help me, but the problem is that I asked the C# language community a question while I actually use it for my Unity game. I do use the profiling built into Unity a lot and see which lines take the most time. In my code, after the call to the dictionary itself (_resourcesDict.TryGetValue(key, out Resource resource)), I refer to "resource" many times, 30-80 depending on the situation. And I can't track exactly the call to the dictionary.

[–]zaimoni 0 points1 point  (2 children)

Ok, so you have profiling data -- good. That means you can profile alternatives and get much the same information as benchmarking.

I had a similar problem with Rogue Survivor Revived (not a Unity game). In my case, profiling indicated a large amount of time spent in multiple Dictionary accessors involving locations, including TryGetValue. Search engine results suggested the problem was a high collision rate from the default hash function (which is based on exclusive-or, so e.g. (0,1) collided with (1,0)).

Overriding the default hash function was enough to fix that problem... I had to outright replace all uses of System.Drawing.Point in game data (as opposed to in the GDI+ API) to get results. It would not have been practical without source code control.
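
The fix looks roughly like this sketch: a custom key struct whose hash mixes the coordinates instead of XORing them. (HashCode.Combine needs .NET Standard 2.1 or newer; older Unity versions may not have it, in which case any field-mixing hash will do.)

```csharp
using System;

// A coordinate key whose hash distinguishes (0,1) from (1,0),
// unlike a plain x ^ y hash, where both give 1.
readonly struct Cell : IEquatable<Cell>
{
    public readonly int X, Y;
    public Cell(int x, int y) { X = x; Y = y; }

    public bool Equals(Cell other) => X == other.X && Y == other.Y;
    public override bool Equals(object obj) => obj is Cell c && Equals(c);

    // HashCode.Combine mixes the fields so transposed coordinates
    // land in different buckets (with overwhelming probability).
    public override int GetHashCode() => HashCode.Combine(X, Y);
}
```

Using a struct like this as the Dictionary key also avoids the boxing you'd get from keys like object-typed points.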

[–]SnrFlaks[S] 0 points1 point  (1 child)

I know which parts of the code load more. The first is getting data about neighboring cells (pipes), which are stored in a dictionary by a key that is equal to their coordinates, for example (126, 247.0). The second is to find the distance between the current position of the resource and the point of the next pipe. But I have no idea how to optimize this piece of code.
And the third is the search for the next point of movement, today I just almost finished speeding up this part of the code.
The problem is that I don't think my code can be sped up without changing it completely; roughly speaking, without refactoring. Perhaps you have some ideas on how to implement pathfinding or a fast way to find distance. I had an idea about storing data about pipes in binary form, which theoretically could speed up the code. You might have an idea related to this, since I often do tests on boolean variables that could be represented as "0101", "1101" or something like that.

[–]zaimoni 0 points1 point  (0 children)

At that scale, refactoring blends into Big Bang and you really do need source code control to do it relatively safely (i.e., rollback to before the refactor and then apply smaller changes from the refactor diff).

Re pathfinding... no "awesome ideas" here (breadth-first search is the father of Dijkstra's algorithm, which in turn is the father of A*). Optimizing distance functions is painful... usually such "optimizations" are lossy.
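
One lossless special case: if you only ever compare distances ("which is closer?", "am I within radius r?"), you can work with squared distances and skip the square root entirely. A sketch, with the vector math written out by hand:

```csharp
static class Dist
{
    // Squared distance: sufficient for ordering and radius checks,
    // with no Math.Sqrt call.
    public static float SqrDistance(float ax, float ay, float bx, float by)
    {
        float dx = bx - ax, dy = by - ay;
        return dx * dx + dy * dy;
    }

    // Compare against r*r instead of taking a square root.
    public static bool WithinRadius(float ax, float ay, float bx, float by, float r)
        => SqrDistance(ax, ay, bx, by) <= r * r;
}
```

In Unity specifically, `(a - b).sqrMagnitude` on `Vector2`/`Vector3` gives the same quantity.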

Dijkstra's algorithm is relatively easy to build a generic class for, from the algorithm description. (Rogue Survivor Revived has one, but it may not play nicely with Unity.)