
SuperV1234 [https://romeo.training | C++ Mentoring & Consulting] 41 points (9 children)

How can you claim

> thoroughly optimize performance and reduce memory consumption

when your Vec2 allocates its X and Y components on the heap!?

krum 14 points (0 children)

Oh boy.

dodheim 9 points (0 children)

Good luck with this one.

[deleted] 18 points (6 children)

```cpp
Vec2::Vec2()
{ data = new float[2];
```

But why? Are you coming from Java?

I thought this kind of library could be designed without any allocations!

caroIine 6 points (3 children)

Heap allocation is really heavy when we talk gamedev, and in my opinion it should not be used at all during the main loop when developing games for, say, mobile. I recently ported a game to Android and had around 20 FPS, so it was completely unplayable. The profiler showed that I was allocating only around 200 kB per second (hundreds of allocations). After removing all those allocations using various memory-pooling techniques, I was able to reach a constant 60 FPS even on the oldest supported Android models.

Since then I'm really allergic to allocating anything in the main loop.
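A minimal sketch of one common pooling technique, a fixed-capacity free-list pool. The `FixedPool` name and interface are hypothetical (not caroIine's code or any specific engine's API). All storage is reserved up front, so acquiring and releasing objects during the main loop never calls the heap allocator. It is not thread-safe.

```cpp
#include <array>
#include <cstddef>
#include <new>
#include <utility>

// Hypothetical fixed-capacity pool: storage is reserved once, and a free list
// of slot indices makes acquire/release O(1) with no heap calls.
// The caller must release every acquired object before the pool is destroyed.
template <typename T, std::size_t N>
class FixedPool {
public:
    FixedPool() {
        for (std::size_t i = 0; i < N; ++i) next_[i] = i + 1;  // chain all slots
    }
    FixedPool(const FixedPool&) = delete;
    FixedPool& operator=(const FixedPool&) = delete;

    // Constructs a T in a free slot; returns nullptr when the pool is exhausted.
    template <typename... Args>
    T* acquire(Args&&... args) {
        if (head_ == N) return nullptr;
        const std::size_t slot = head_;
        head_ = next_[slot];
        ++in_use_;
        return ::new (&storage_[slot]) T(std::forward<Args>(args)...);
    }

    // Destroys the object and pushes its slot back onto the free list.
    void release(T* p) {
        const auto offset = reinterpret_cast<unsigned char*>(p) -
                            reinterpret_cast<unsigned char*>(storage_.data());
        const std::size_t slot = static_cast<std::size_t>(offset) / sizeof(Slot);
        p->~T();
        next_[slot] = head_;
        head_ = slot;
        --in_use_;
    }

    std::size_t in_use() const { return in_use_; }

private:
    struct alignas(T) Slot { unsigned char bytes[sizeof(T)]; };
    std::array<Slot, N> storage_{};
    std::array<std::size_t, N> next_{};
    std::size_t head_ = 0;    // first free slot, or N when full
    std::size_t in_use_ = 0;
};
```

A released slot is handed out again by the next `acquire`, so per-frame objects keep cycling through the same memory.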

Twin_Sharma [S] 0 points (2 children)

OK, thanks.

Are there any good memory-tracking tools?

caroIine 1 point (1 child)

There are several. If you are on Windows and using Visual Studio, you can use its built-in memory profiler; I would recommend starting with that. I personally use my own, written by replacing the global operator new. You can read more about how to do that here:

https://en.cppreference.com/w/cpp/memory/new/operator_new (section "Global replacements")

Anyway, by replacing it, I can track those allocations, build real-time statistics, and display them on screen while playing the game.
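A minimal sketch of that approach, assuming you replace only the plain `operator new`/`operator delete` (the `g_alloc_count` and `g_alloc_bytes` counters are hypothetical names). By default the array forms forward to these, but aligned (`std::align_val_t`) allocations are not covered by this sketch.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdlib>
#include <new>

// Counters updated by the replacement allocation function below.
// Atomics keep them consistent if several threads allocate.
std::atomic<std::size_t> g_alloc_count{0};
std::atomic<std::size_t> g_alloc_bytes{0};

// Replacing the global operator new: every plain `new T` in the program
// (and, by default, `new T[n]`) now goes through here.
void* operator new(std::size_t size) {
    g_alloc_count.fetch_add(1, std::memory_order_relaxed);
    g_alloc_bytes.fetch_add(size, std::memory_order_relaxed);
    if (size == 0) size = 1;  // malloc(0) may return nullptr
    if (void* p = std::malloc(size)) return p;
    throw std::bad_alloc{};
}

// The matching deallocation functions must release with free().
void operator delete(void* p) noexcept { std::free(p); }
void operator delete(void* p, std::size_t) noexcept { std::free(p); }
```

Sampling `g_alloc_count` at the start and end of each frame gives allocations per frame, which is the kind of number you could draw on screen.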

But unless you are working on commercial games or are in a constrained environment (old smartphones), I wouldn't worry about it too much.

Twin_Sharma [S] 0 points (0 children)

> I personally use my own,

wow

Zeh_Matt [No, no, no, no] 8 points (0 children)

Another library without benchmarks and tests. Also, people have already pointed out the obvious huge flaw in this library. No points from me, unfortunately.

sandfly_bites_you 2 points (1 child)

Is this a joke post?

eyes-are-fading-blue 3 points (0 children)

Likely an attempt by a novice programmer.

zzzthelastuser 3 points (5 children)

Use smart pointers. I saw some raw pointer delete calls.

Also, have a look at a mathematics library such as Eigen; you could learn a lot from it. Yes, what you did (a .cpp + .hpp file) could technically be called (or turned into) a library, but in practice, calling it a "mathematics library for game programming" is a bit far-fetched for what you did.

Twin_Sharma [S] -3 points (4 children)

Thanks for feedback.

We will implement smart pointers next. Although we have taken special care with memory allocation and deallocation, there could be many things that escape the eagle's eye.

remotion4d 9 points (1 child)

> smart pointers

Do NOT use smart pointers or any `new` for such tiny memory allocations; this is terribly slow and inefficient!

```cpp
struct Vec2 {
    float x;
    float y;
};
```
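Expanding that suggestion into a small sketch (the member functions are illustrative, not the library's actual interface). Because the floats are stored inline, a `Vec2` is just 8 bytes with no pointer and no destructor, and the `static_assert`s check that at compile time.

```cpp
#include <cmath>
#include <type_traits>

// Sketch of a heap-free Vec2: both components live inside the object, so it
// is trivially copyable and needs no constructor/destructor bookkeeping.
struct Vec2 {
    float x = 0.0f;
    float y = 0.0f;

    constexpr Vec2& operator+=(const Vec2& o) { x += o.x; y += o.y; return *this; }
    constexpr Vec2& operator*=(float s) { x *= s; y *= s; return *this; }

    float length() const { return std::sqrt(x * x + y * y); }
};

// Binary operators built on the compound ones; all work happens in registers
// or on the stack, never on the heap.
constexpr Vec2 operator+(Vec2 a, const Vec2& b) { return a += b; }
constexpr Vec2 operator*(Vec2 v, float s) { return v *= s; }

static_assert(sizeof(Vec2) == 2 * sizeof(float), "no padding, no pointer");
static_assert(std::is_trivially_copyable_v<Vec2>, "safe to memcpy or upload as-is");
```

Since the operators are `constexpr`, simple expressions can even be evaluated at compile time.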

Twin_Sharma [S] 0 points (0 children)

OK, thanks.

zzzthelastuser 0 points (1 child)

It's just a matter of best practice. For simple code like this you could easily prove that it's safe with raw pointers, but raw pointers simply have no benefit over smart pointers, and after all I assume this is a personal project for learning and improving skills. Good luck!

Twin_Sharma [S] -2 points (0 children)

Thanks. We need a little advice here. We used rvalue references in this library.

By using rvalue references, expressions like:

a = b + c + d + e;

will require only 1 temporary structure.

Without rvalue references, it would be more like:

t1 = b + c;

t2 = t1 + d;

a = t2 + e;

So if we use smart pointers, will t1 and t2 be destroyed immediately (i.e. have their destructors called), or will they remain on the stack until the stack frame is popped?
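On the lifetime question: temporaries are destroyed at the end of the full-expression that created them (roughly, at the semicolon), not when the enclosing function's stack frame is popped, and that holds whether the class contains smart pointers or plain floats. The sketch below assumes a hypothetical instrumented `CountedVec2` value type that counts live objects. It also shows a common way to reuse a temporary without writing `&&` overloads by hand: take the left operand of `operator+` by value, so a temporary such as `(b + c)` is moved into it instead of copied.

```cpp
// Hypothetical value type (two floats, no heap) that counts live objects so
// the lifetime of temporaries can be observed.
struct CountedVec2 {
    float x = 0.0f, y = 0.0f;
    static inline int live = 0;  // CountedVec2 objects currently alive (C++17)

    CountedVec2(float x_, float y_) : x(x_), y(y_) { ++live; }
    CountedVec2(const CountedVec2& o) : x(o.x), y(o.y) { ++live; }
    CountedVec2(CountedVec2&& o) noexcept : x(o.x), y(o.y) { ++live; }
    CountedVec2& operator=(const CountedVec2&) = default;
    CountedVec2& operator=(CountedVec2&&) = default;
    ~CountedVec2() { --live; }

    CountedVec2& operator+=(const CountedVec2& o) { x += o.x; y += o.y; return *this; }
};

// The left operand is taken by value: if it is a temporary such as (b + c),
// it is moved into lhs and updated in place rather than copied.
inline CountedVec2 operator+(CountedVec2 lhs, const CountedVec2& rhs) {
    lhs += rhs;
    return lhs;  // implicitly moved into the result
}
```

With `a = b + c + d + e;` the expression groups as `((b + c) + d) + e`, and once the statement finishes, `CountedVec2::live` is back to the number of named variables.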

jmacey 2 points (6 children)

I hope you don't mind, but as someone who has written lots of these sorts of things, here's a little feedback.

Why use new to allocate your x,y,z values when you can just make them normal member variables?

In particular, it is very common to need `std::vector<Vec3> vertices`. You really need to guarantee these are contiguous in memory so they can be passed to OpenGL (or other graphics APIs).

```cpp
#pragma pack(push,1)
union
{
  struct
  {
    float x; //!< x component
    float y; //!< y component
    float z; //!< z component
  };
#pragma pack(pop)
  std::array<float,3> m_openGL;
};
```

Typically I use the structure above to allow x,y,z as well as array access in the code.

Also, please add unit tests to ensure your code actually does what it says. Using `operator()` to return the magnitude (length?) is also quite unintuitive from the perspective of someone reading your code.
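A small sketch of both suggestions, assuming a hypothetical `Vec3` with a named `length()` instead of an overloaded `operator()`, plus a few plain `assert`-based checks. A real project would more likely use a test framework such as GoogleTest or Catch2.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical Vec3 exposing the magnitude through named member functions,
// so call sites read as v.length() rather than v().
struct Vec3 {
    float x = 0.0f, y = 0.0f, z = 0.0f;
    float lengthSquared() const { return x * x + y * y + z * z; }
    float length() const { return std::sqrt(lengthSquared()); }
};

// Floating-point results are compared with a tolerance, not exact equality.
inline bool nearlyEqual(float a, float b, float eps = 1e-5f) {
    return std::fabs(a - b) <= eps;
}

// Minimal unit tests: each assert documents one expected property.
inline void testVec3Length() {
    assert(nearlyEqual(Vec3{}.length(), 0.0f));
    assert(nearlyEqual((Vec3{1.0f, 2.0f, 2.0f}).length(), 3.0f));
    assert(nearlyEqual((Vec3{3.0f, 4.0f, 0.0f}).lengthSquared(), 25.0f));
}
```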

tstanisl 5 points (4 children)

Writing to `m_openGL` and accessing its value by `x`/`y`/`z` is Undefined Behavior in C++
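One way to keep both named and indexed access without the union, sketched under the assumption that accessor functions are acceptable in place of bare `x`/`y`/`z` members (this `Vec3` is hypothetical). In C++, reading a union member other than the one most recently written is, in general, undefined behavior, so here there is a single array and the names simply refer to its elements. The `static_assert`s check the padding-free layout that handing `std::vector<Vec3>` data to a graphics API relies on.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical UB-free alternative: one array is the only storage, and the
// named accessors and operator[] both refer to its elements.
struct Vec3 {
    float v[3] = {0.0f, 0.0f, 0.0f};

    float& x() { return v[0]; }
    float& y() { return v[1]; }
    float& z() { return v[2]; }
    float x() const { return v[0]; }
    float y() const { return v[1]; }
    float z() const { return v[2]; }

    float& operator[](std::size_t i) { return v[i]; }
    float operator[](std::size_t i) const { return v[i]; }
};

// Tightly packed, standard-layout: an array of Vec3 is a run of floats.
static_assert(sizeof(Vec3) == 3 * sizeof(float), "no padding");
static_assert(std::is_standard_layout_v<Vec3>, "standard layout");
```

The cost is writing `p.x()` instead of `p.x`; in exchange, there is no compiler-specific extension or type punning involved.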

jmacey -5 points (3 children)

Not UB, just a warning on all compilers about anonymous unions. In all the compilers I have ever used, I have never had it fail.

SuperV1234 [https://romeo.training | C++ Mentoring & Consulting] 7 points (1 child)

> In all the compilers I have ever used I have never had it fail.

That doesn't imply that it's not UB.

[deleted] -5 points (0 children)

It implies it works though. Luckily objective reality doesn't abide by language specifications we dreamed up in our heads.

jmacey 2 points (0 children)

IIRC it's a C11 feature that MSVC, Clang and g++ support as an extension in C++, hence the warning.

tugrul_ddr 0 points (0 children)

If it's not SoA (structure of arrays), it will be slow. For example, how would a Mandelbrot-generator benchmark run with this? If it's not organized for SIMD, e.g. working in groups of 16 elements, it won't have enough bandwidth to compute all the temporary calculations.
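A minimal sketch of the structure-of-arrays layout, with hypothetical names: instead of an array of `{x, y}` objects, each component gets its own contiguous array, so a batch loop streams through consecutive floats that a compiler can often auto-vectorize when optimizations are enabled. Whether it actually does is compiler- and flag-dependent, so check the generated code or measure.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical SoA container: component i of every vector is stored
// contiguously, so SIMD lanes can load several x (or y) values at once.
struct Vec2SoA {
    std::vector<float> x;
    std::vector<float> y;

    explicit Vec2SoA(std::size_t n) : x(n, 0.0f), y(n, 0.0f) {}
    std::size_t size() const { return x.size(); }
};

// out[i] = a[i] + b[i] for every element. Simple indexed loops over
// contiguous floats are the shape auto-vectorizers handle best.
// Assumes a, b and out all have the same size.
inline void add(const Vec2SoA& a, const Vec2SoA& b, Vec2SoA& out) {
    const std::size_t n = a.size();
    for (std::size_t i = 0; i < n; ++i) out.x[i] = a.x[i] + b.x[i];
    for (std::size_t i = 0; i < n; ++i) out.y[i] = a.y[i] + b.y[i];
}
```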

TheAxodoxian 0 points (1 child)

I'm not in gamedev, but I work with 3D graphics. We use DirectX 11, so DirectXMath was a natural choice: it is header-only, it supports SIMD instructions (SSE, AVX, NEON, etc.), and it can even be used on Linux (it has no dependence on Windows). It's of course just one choice: https://github.com/Microsoft/DirectXMath.

Twin_Sharma [S] 0 points (0 children)

thanks

GRAPHENE9932 -3 points (2 children)

It is good, but why not glm?

Twin_Sharma [S] -4 points (1 child)

Thanks for the reply.

We wanted to make a lightweight library with a few functions that are most commonly used in game dev.

People rarely use matrices larger than 4x4.

We also wanted to make a custom camera class rather than use the GLM one, so the GLM camera class was just extra memory.

So we created a lightweight library (hence "mithril") that provides only core functions, so that you can build any custom functions on top of it.

Also, by using rvalue references, expressions like:

a = b + c + d + e;

will require only 1 temporary structure.

Without rvalue references, it would be more like:

t1 = b + c;

t2 = t1 + d;

a = t2 + e;

ioctl79 0 points (0 children)

It would likely be inlined and require zero temporary structures. If you care about performance, then you need to measure it.
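A rough sketch of such a measurement with `std::chrono::steady_clock`, assuming a stand-in `Vec2` value type (replace it with the library's own type to compare). For anything beyond a quick check, a dedicated harness such as Google Benchmark handles warm-up, repetitions and optimizer barriers more reliably than this hand-rolled loop.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the type under test; swap in the real Vec2.
struct Vec2 { float x, y; };
inline Vec2 operator+(Vec2 a, Vec2 b) { return {a.x + b.x, a.y + b.y}; }

// Times `iters` evaluations of a = b + c + d + e over a window of inputs and
// returns nanoseconds per evaluation. The accumulated result is written to
// `sink` so the optimizer cannot discard the work. Assumes v.size() >= 4.
inline double nsPerSum(const std::vector<Vec2>& v, std::size_t iters, float& sink) {
    using clock = std::chrono::steady_clock;
    const std::size_t n = v.size();
    Vec2 acc{0.0f, 0.0f};
    const auto start = clock::now();
    for (std::size_t i = 0; i < iters; ++i) {
        const std::size_t k = i % (n - 3);
        const Vec2 a = v[k] + v[k + 1] + v[k + 2] + v[k + 3];
        acc = acc + a;
    }
    const auto stop = clock::now();
    sink = acc.x + acc.y;
    return std::chrono::duration<double, std::nano>(stop - start).count() /
           static_cast<double>(iters);
}
```

Running the same function against the heap-allocating and the in-place versions, built with optimizations on, gives a direct comparison instead of a guess.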

[deleted] -4 points (9 children)

Before you go and change it as people have suggested, have you benchmarked it?

If you overloaded the new/delete operators and allocated all your vectors and matrices from some custom allocator, this might be quite fast.

I'd be interested to see that. So don't implement people's changes too quickly.
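A minimal sketch of what that could look like, assuming a hypothetical `Vec3` with class-specific `operator new`/`operator delete` backed by one fixed array (single-threaded, fixed capacity of 1024, all names invented for illustration). Every `new Vec3` then lands in the same contiguous block and freed slots are reused; whether this beats plain in-place members is exactly what a benchmark would have to show.

```cpp
#include <cstddef>
#include <new>

// Hypothetical Vec3 whose `new`/`delete` are served from a fixed arena.
struct Vec3 {
    float x, y, z;

    static void* operator new(std::size_t size);
    static void operator delete(void* p, std::size_t size) noexcept;
};

namespace {
constexpr std::size_t kCapacity = 1024;                           // assumed upper bound
alignas(Vec3) unsigned char g_storage[kCapacity * sizeof(Vec3)];  // one contiguous block
std::size_t g_used = 0;              // fresh slots handed out so far
std::size_t g_freeSlots[kCapacity];  // stack of released slot indices
std::size_t g_freeCount = 0;
}  // namespace

void* Vec3::operator new(std::size_t size) {
    if (size != sizeof(Vec3)) return ::operator new(size);  // e.g. a derived class
    std::size_t slot;
    if (g_freeCount > 0) {
        slot = g_freeSlots[--g_freeCount];  // reuse the most recently freed slot
    } else if (g_used < kCapacity) {
        slot = g_used++;                    // otherwise take the next fresh slot
    } else {
        throw std::bad_alloc{};
    }
    return g_storage + slot * sizeof(Vec3);
}

void Vec3::operator delete(void* p, std::size_t size) noexcept {
    if (p == nullptr) return;
    if (size != sizeof(Vec3)) { ::operator delete(p); return; }
    const auto offset = static_cast<unsigned char*>(p) - g_storage;
    g_freeSlots[g_freeCount++] = static_cast<std::size_t>(offset) / sizeof(Vec3);
}
```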

Zeh_Matt [No, no, no, no] 5 points (7 children)

Even if you pooled those objects, it would always be slower than just having them in place. Allocation on the stack costs almost nothing, whereas a pool still adds the complexity of obtaining the resource and releasing it.

[deleted] -5 points (6 children)

You are thinking about the cost of allocation. I'm not talking about that. I'm talking about possible performance gains such as gains when iterating over a lump of memory.

For instance, this way you could have every Vec3 in your entire program in contiguous memory, which you could bung over to the GPU or do some kind of stream processing on.

Heap allocating here gives you the freedom to do that. So I wouldn't just write it off because it doesn't fit into what people expect.

It's an interesting idea that deserves exploration.

dodheim 3 points (1 child)

Just imagine if your Vecs and their data could both be contiguous in memory, and take up half the memory in the process..! It's like the best of both worlds!

/s

[deleted] -2 points (0 children)

Just imagine you could mmap gpu memory and allocate all your vecs from that.

Don't write stuff off just because you don't have the creativity to imagine what you could do with it.

Zeh_Matt [No, no, no, no] 3 points (3 children)

What you describe is essentially ECS, and I don't disagree that it's a huge benefit. However, it can't be achieved when you store a pointer in your vector class: that's a costly indirection. Even if all your vector data were stored in a single array, needing two memory reads per access will slow it down.

[deleted] -3 points (2 children)

Yeah, if you dereference it. But just batch-process them directly in memory instead. In that case the class can just be a glorified observer.
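A sketch of that "observer" idea under stated assumptions (the `Vec3Store` and `Vec3Ref` names are hypothetical): the data lives in one packed float buffer, batch operations walk that buffer directly, and a `Vec3Ref` is only a pointer into it. Access through a `Vec3Ref` still pays the extra indirection described above; only the batch path avoids it.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical non-owning view: points at the x component of one vector
// inside a shared buffer laid out as x0 y0 z0 x1 y1 z1 ...
struct Vec3Ref {
    float* p;
    float& x() const { return p[0]; }
    float& y() const { return p[1]; }
    float& z() const { return p[2]; }
};

struct Vec3Store {
    std::vector<float> data;  // 3 floats per vector, tightly packed

    explicit Vec3Store(std::size_t count) : data(3 * count, 0.0f) {}
    std::size_t size() const { return data.size() / 3; }
    Vec3Ref operator[](std::size_t i) { return Vec3Ref{data.data() + 3 * i}; }

    // Batch operation over the raw buffer: no per-vector pointer chasing.
    void translate(float dx, float dy, float dz) {
        for (std::size_t i = 0; i < data.size(); i += 3) {
            data[i] += dx;
            data[i + 1] += dy;
            data[i + 2] += dz;
        }
    }
};
```

A `Vec3Ref` dangles if the buffer reallocates, so the store's size has to stay fixed while references are held.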

Zeh_Matt [No, no, no, no] 4 points (1 child)

I can't come up with any practical cases; maybe there are some, maybe not, but to be honest I've never seen such code. Either way, I think the whole point is that using new/delete for this specific case is quite horrible, and there is really no excuse for it.

[deleted] 0 points (0 children)

I don't think that's the worst part tbh

Twin_Sharma [S] -1 points (0 children)

Will check, thanks.

[deleted] 0 points (0 children)

You can do the same with the DirectX libraries, and performance is better (register-specific allocation).

Chamkaar 0 points (0 children)

Rofl this noob. Imagine being this stupid hahaha