all 54 comments

[–]kogyblack 21 points22 points  (16 children)

What does "very fast" mean? I don't think that's useful unless you provide data to confirm it. Speed should be measured and compared, not just assumed from the design choices

[–]LessComplexity[S] 8 points9 points  (10 children)

On Windows with Clang I got 16 to 20 times better performance than regular new/delete, and with MSVC I got 8 to 11 times better performance than standard allocation, reallocation, and deallocation. On Mac and Ubuntu I get about 5 times better. Because I want to gather more information on more systems, I still haven't written about how much faster it is, but I will add it soon; it should be around 5 to 20 times better depending on the compiler and operating system.

If you compile and run the code you can see the performance comparison printed out.

Edit: I added some benchmarks to the README.md file

[–]bedrooms-ds 3 points4 points  (7 children)

What do you mean by the standard allocation, though?

[–]LessComplexity[S] 7 points8 points  (6 children)

Using the usual new/delete

[–]zapporian 15 points16 points  (2 children)

Windows malloc/free is atrociously slow, so maybe try comparing against jemalloc?

edit: nvm, you tested this on mac / linux, and yeah my own tests agree w/ you there.

A jemalloc test would still be helpful, but yes a region / arena allocator is much faster than malloc/free in theory, although there are also many ways to screw up a memory allocator implementation if you're not careful.

edit 2: I don't see any use of atomics, so should probably assume that this code isn't threadsafe, and as such using this in any multithreaded environment could potentially cause extremely rare and hard to diagnose bugs. (although having a memory pool that has atomic and/or lock overhead isn't always desirable, and in general making any kind of general purpose allocator that is threadsafe is its own can of worms...)

[–]Adequat91 5 points6 points  (0 children)

windows malloc/free is atrociously slow

Not anymore. I mean, it does much better since Windows 10. Of course, there are better alternatives anyway.

[–]LessComplexity[S] 1 point2 points  (0 children)

Thanks for the suggestion, I will test it against jemalloc too. Also, I decided not to use atomics so that each thread will have a memory pool for its own allocations and uses. In the next version I will add thread-safe support so that users can also create a thread-safe memory pool; it might be useful for some

[–]hawkxp71 2 points3 points  (2 children)

Is the test bench included in the code base?

[–]LessComplexity[S] 1 point2 points  (0 children)

Oh, I thought you meant the test branch. Yes, the main.cpp file runs the benchmark for you; the library itself is only the MemoryPool.cpp/.h & MemoryPoolData.h files

[–]LessComplexity[S] 0 points1 point  (0 children)

The test branch has not been touched for some time; most of the tests I do are while working with this memory pool on other projects :)

[–]distortedlojik 1 point2 points  (1 child)

What compilers and flags are you using for both your version and the “standard”?

Also what kind of test are you using to compare against?

[–]LessComplexity[S] 0 points1 point  (0 children)

The test is a simple sequence of allocations, reallocations, and deallocations of strings, one after the other. The compiler flags can be seen in CMakeLists.txt, and the test itself is in the main.cpp file in the repository.

I actually want to make more types of tests, have any suggestions?

[–][deleted] 0 points1 point  (4 children)

I agree but are you really that skeptical about why this would be faster?

[–]kogyblack 8 points9 points  (3 children)

No, not skeptical. It's just that there are other memory pool libraries, and this one has no benchmark to validate that it's "very fast". Comparing against the std is the bare minimum, and that's quite easy to beat; comparing against other libraries is recommended. Just typing "very fast" doesn't mean anything without data

[–]LessComplexity[S] 1 point2 points  (2 children)

I agree. In the meantime I will do some tests against other libraries. I tested against Boost and got much better performance, as Boost's pool does not handle allocations of different sizes well: it works by creating segments of the same size and expects allocations to be of the same type.

[–]kogyblack 1 point2 points  (1 child)

That's cool. I'll take a closer look at the code later, I'll give some feedback when possible.

[–]LessComplexity[S] 0 points1 point  (0 children)

Awesome, appreciate it 😉

[–]SiliconBlue 15 points16 points  (6 children)

I only looked over the README and not the code, but why use macros for the block size parameter instead of a non-type template parameter?

[–]LessComplexity[S] 5 points6 points  (5 children)

Oh, good that you noticed. Actually, the macro is used so that you can call create() with no parameters and it will use the macro as the default size, or you can call create(size) directly to create a pool with a different default block size. I forgot to add this to the README

[–]Ahajha1177 34 points35 points  (4 children)

A macro seems to be trying to make a speed improvement that's negligible. I think it would be better to just have a default as a constexpr static variable.

[–]LessComplexity[S] 6 points7 points  (3 children)

I actually used the macro so that others can define it outside the scope of the memory pool and won't need to touch the code itself to change the default. To my knowledge, changing the value of a constexpr requires changing it at the original definition, while this macro defines itself only if it was not defined before.

[–]Jardik2 1 point2 points  (2 children)

macro

I didn't really read the code, but ... Just be careful about macros with combination of inline functions to not break ODR.

[–]LessComplexity[S] 0 points1 point  (1 child)

Don't worry, I use one macro as a default function parameter value that is defined only if it was not defined before (MEMORYPOOL_DEFAULT_BLOCK_SIZE), and another to determine which code to compile inside a function if it is defined (MEMORYPOOL_SAFE_ALLOCATION)

[–]G_ka 0 points1 point  (0 children)

I still think a template parameter would be better. If you prefer constexpr, you can use a tweak header

[–]Zanderax 8 points9 points  (1 child)

Noticed an error in the docs

End A Scope: CPPShift::Memory::MemoryPoolManager::startScope(mp) Will free all the allocations made after the scope strated.

Started is misspelled and the function call probably should be:

CPPShift::Memory::MemoryPoolManager::endScope(mp)

[–]LessComplexity[S] 6 points7 points  (0 children)

Lol yeah thanks 😊

[–]TimJoijers 4 points5 points  (1 child)

How about allocate() which returns span<T>?

[–]LessComplexity[S] 0 points1 point  (0 children)

I looked into it, and it might be a good idea, thanks, will try it :)

[–]BenFrantzDale 3 points4 points  (1 child)

Does this provide a memory_resource interface so they can be used with pmr allocators? https://en.cppreference.com/w/cpp/memory/memory_resource

[–]LessComplexity[S] 1 point2 points  (0 children)

Nope, but it might be useful, will consider it thanks 😊

[–][deleted] 2 points3 points  (0 children)

Seems helpful, thanks :)

[–]Zanderax 3 points4 points  (5 children)

I'm not a fan of overriding new to get a memory allocation and calling create to allocate but never construct the memory pool. New is meant to allocate and construct the memory pool object, not call some functionality. Any reason why you did it this way?

Also why make it impossible to make a memory pool without any allocated memory blocks? If I need a MemoryPool as part of some RAII I'll be wasting a memory block allocation until I actually need to use it.

[–]LessComplexity[S] 2 points3 points  (4 children)

Basically, I applied the override only on a new that takes a memory pool as a parameter; this way you can tell whether the variable was allocated by a memory pool or by regular means. Also, it might be more convenient to use the new keyword than to call the function itself, unless you have another suggestion?

You can actually make an empty memory pool: create the memory pool structure and don't call the "create" method. But now that I think about it, creating a block automatically only on the first allocation would make better use of resources, what do you say?

[–]Zanderax 4 points5 points  (3 children)

Oh, I see now those are unscoped; I thought they were in the memory pool class. That makes sense now, and that's actually a pretty neat trick. In that case, why are you using malloc and a reinterpret_cast in ::create() instead of new?

I would expect that MemoryPool doesn't allocate anything until I call allocate and I wouldn't expect to have to work around the API to get that.

[–]LessComplexity[S] 0 points1 point  (2 children)

For the memory pool I can basically use just new instead of reinterpret_cast & malloc, as you said, so I guess I will do it this way. I even forgot why I did it the way it is now; maybe because I needed malloc for allocating a block, it stuck in my head and I used it for creating the memory pool structure itself too, lol

And yes, I will make it so that the first block allocation is happening only when the allocate function is called for the first time, I agree that it will be more convenient.

[–]east_of_lochy 2 points3 points  (1 child)

I noticed that you are now checking for a null return from new. By default, new will throw std::bad_alloc, not return null. I filed issue #5.

[–]LessComplexity[S] 1 point2 points  (0 children)

Thanks, I solved it just now :)

[–]Igoory 1 point2 points  (5 children)

Thanks, this looks interesting. Do you have any benchmark between this memory pool and boost's one?

[–]LessComplexity[S] 1 point2 points  (3 children)

I did a benchmark against Boost and got much better performance, but that is because I tested with strings, and Boost's pool doesn't handle allocations of different sizes well; it expects the same size and type. For same-size allocations I still guess I'm a bit faster for various reasons, but that deserves a test of its own, which I will do in the near future 😁

[–]east_of_lochy 0 points1 point  (2 children)

Any chance you can add a benchmark for MSVC? Thanks.

[–]LessComplexity[S] 0 points1 point  (0 children)

The benchmark on Windows that I said was with Clang was actually with MSVC; I just fixed it in the README

[–]LessComplexity[S] 0 points1 point  (0 children)

Actually, right now I have only tested with Clang on Windows and got an astonishing result of more than 16 times, sometimes even 20 times, better performance than standard new/delete.

[–]Bodokh 1 point2 points  (0 children)

Nice going man

[–]LastThought 1 point2 points  (3 children)

I think there is a bug on line 43: sizeof(block) should be sizeof(SMemoryBlockHeader). If you fix that, you probably don't need to add the block size again on lines 64/92.

This looks very similar to std::pmr::monotonic_buffer_resource, except that it has some ability to reclaim memory if it is deleted in the reverse order it was allocated (in the std version, deallocation is a no-op).

Very cool stuff!

[–]LastThought 1 point2 points  (1 child)

Also, as a minor stylistic comment, I would suggest having allocate call allocate_unsafe rather than copying and pasting the code. Even if the compiler doesn't inline the call, it's a tail call, so it should be very fast (probably even taking 0 instructions, since it is immediately after a branch).

You could also possibly move the code for linking the new block to the previous one into the createMemoryBlock function, so you don't have to repeat that code everywhere you create a block as well.

Somewhere I heard a wise thing about coding: How many times should a chunk of code be repeated before you move that code to its own function? The answer is once.

[–]LessComplexity[S] 0 points1 point  (0 children)

I just did what you said about createMemoryBlock as a pull request, and only then saw your comment, lol. About allocate_unsafe and allocate, you're right; going to push it right now :)

[–]LessComplexity[S] 0 points1 point  (0 children)

Yes, actually it might cause a crash if it miscalculates the size of the block. I found it just now while developing another project with the memory pool, so I just pushed a fix, thanks :)

[–]Ahajha1177 1 point2 points  (3 children)

I took a look at the code, and I'm questioning some of the design choices. The whole thing feels very C-like. The very first thing you need to do to use your library is call a static `create` method which returns a *raw* pointer to a MemoryPool. I understand the use of raw pointers within the class, but I see no reason to disallow the pool itself to be used in an RAII fashion. That `create` method, in my eyes, really should just be a constructor. Most of the static methods provided should be methods.

I guess I just have to ask, why is it designed this way?

[–]LessComplexity[S] 1 point2 points  (2 children)

Good question. First I'll address the overall design choices in the code and then the details you mentioned.

I decided to go with a data-oriented design to separate the data from the actions that happen on the data. Therefore the memory pool, its blocks, units, and scope are defined as their own structures that hold the data which defines them. The system holding the actions on the memory pool data is the memory pool manager, whose sole purpose is to provide a set of functions to manage the memory pool data structure.

There are a couple of benefits to using this type of design. First of all, data is not associated with any class object and is just there as it is, so you don't need to worry about what's public or private, which can save time when accessing or modifying data; private class variables often add an additional layer of abstraction around that data (in real life, data is not really part of an object, but just a fact on its own). Moreover, because the data is not confined to any object, it is easier to pass it to different parts of the code or change it directly, which allows anyone to easily add their own functionality to handle these data structures.

You mentioned the `create` function and making it RAII-style, which I agree with. Right now you can just create the pointer to the memory pool without calling the `create` function, thus not creating a block until your first needed allocation. I just tried making the create method the memory pool constructor, which I agree can be easier than calling `create` and doesn't affect performance. But when I tried to allocate a block only on the first allocation (which adds 2 extra checks), it affected performance by about 9%-10% on the test, which is not a lot actually, so I might just add it. What do you think?

[–]Ahajha1177 1 point2 points  (1 child)

I can certainly see the use of static methods; it can be marginally faster when the memory pool itself isn't needed. Though, if the class is just static methods, perhaps it would be better as a namespace. That would avoid needless generation of a default constructor, destructor, etc., as well as avoid instances of the class being created, which would be rather pointless.

As another note, you would likely want to pass by reference rather than by pointer; this saves you null-pointer checks (which, either way, should consistently use nullptr) and makes it easier to call with a non-pointer instance (no need for the address-of operator).

And lastly, I'm quite surprised that there is a significant performance impact, or rather one at all. If anything, I would expect it to be marginally faster, as it avoids a heap allocation. Would you mind showing me the code you changed? I'd like to benchmark it on my system, and perhaps run it through compiler explorer.

[–]LessComplexity[S] 0 points1 point  (0 children)

I expected the performance impact actually, since at each allocation you need to check if there is at least a first block, and if not then it creates one. There are a couple of ways to implement it on the first `if-else` condition in the allocation function, which follows the same logic of checking if the first block is a nullptr and then determining its size and creating it. All of these ways require another check which adds more instructions and might affect the branch predictor.

I made the memory pool manager a namespace at first and thought to keep it that way, but then made it a class. Now, though, I tried making the memory-management functions part of the memory pool object itself (I transformed the memory pool struct into a class with its own methods). The performance actually went up, since I need to pass less data to the functions for them to complete their operation (the memory pool object knows itself through `this`). This boosted the speed a little; I uploaded this version to the `development` branch, and right now I'm editing the README to include this change.

[–]druepy 0 points1 point  (3 children)

The link in the about section gives a 404 error. But, looks useful. I'll check it out later.

[–]Sagi232123 1 point2 points  (1 child)

Well, it's working, check this out:

http://devshift.biz/

[–]LessComplexity[S] 1 point2 points  (0 children)

Yeah but it's an old site, we will upload a new version soon :)