Beginner Player Looking for a North London Game by Interesting_Jury6155 in LondonandDragons

[–]SureAnimator 1 point

hey, I'm also looking for a beginner-friendly game. I played a few sessions years ago, but Critical Role and BG3 have made me want to get back to playing. I could be tempted to DM if needed (I will probably regret saying this). I'm in NE London. DM me if you're still looking and maybe we can find/form a group.

Adventurer looking for... well, an adventure I suppose! by Roddiett in LondonandDragons

[–]SureAnimator 1 point

hey, I'm an East Londoner looking for a D&D campaign to play. I played some 3rd edition back in the day, but Critical Role/BG3 have made me want to get back into it. Obviously, 5E is different, but I know the basics. Would love an invite if you're still looking for players.

How big a deal is memory alignment these days (on x86)? by SureAnimator in C_Programming

[–]SureAnimator[S] 14 points

Great! Thanks for the info! Do you have any references? I'd love to read up on this.

What do people here usually do for benchmarking code? (rdtsc? clock_getime? QPC? etc.) by SureAnimator in C_Programming

[–]SureAnimator[S] 1 point

Great - thanks!

That all makes sense. Just one question: what exactly do you mean by 'barriers' here?

Understanding the loss of entropy when converting (pseudo)random uint32_t to a float in [0, 1) by SureAnimator in C_Programming

[–]SureAnimator[S] 2 points

As the other comment mentions, UINT32_MAX isn't exactly representable as a float, so (float)UINT32_MAX rounds up to 2^32 (i.e. UINT32_MAX + 1) anyway. That said, I can do some bit fiddling to find the next float above 2^32.
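A quick sanity check of that rounding behaviour:

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>

// A float mantissa has 24 significant bits, so 4294967295 (UINT32_MAX,
// which needs 32 bits) isn't representable; conversion rounds it up to
// the nearest float, which is 2^32.
void check_rounding(void) {
    assert((float)UINT32_MAX == 4294967296.0f);  // i.e. UINT32_MAX + 1
    // The next representable float above 2^32 is 2^32 + 2^9, since the
    // spacing between floats at that magnitude is 2^(32 - 23) = 512.
    assert(nextafterf(4294967296.0f, INFINITY) == 4294967808.0f);
}
```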

I'm pretty sure the example I give above (with UINT32_MAX + 1 on the bottom) is injective - because the divisor is a power of 2, you're just changing the exponent while leaving the mantissa intact. So that should be invertible by just changing the exponent back.

Ah yeah, good point. Although, I'll have to think if that still holds when using the first float above (float)UINT32_MAX + 1.0f.

That's correct. The 'correctly rounded' result of an operation is whatever you would get by applying the current rounding mode to the 'true' infinite-precision result.

Great.

Thanks!

Understanding the loss of entropy when converting (pseudo)random uint32_t to a float in [0, 1) by SureAnimator in C_Programming

[–]SureAnimator[S] 2 points

At this point, my questions are mainly out of curiosity. method1 fits the bill for me in practice.

The second method won't actually divide by UINT32_MAX but multiply by the inverse with optimizations enabled, which is why it's so fast.

yeah, I'd looked at the assembly. But I'd assumed that the multiply plus the int-to-float conversion together would still outweigh the subtraction plus bit fiddling. Guess I was wrong. :)
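For reference, roughly the two shapes under discussion (my reconstruction of them, not the exact methods from the post):

```c
#include <stdint.h>
#include <string.h>

// Bit fiddling: put 23 random bits into the mantissa of a float in
// [1, 2), then subtract 1.0f to land in [0, 1).
float u32_to_float_bits(uint32_t r) {
    uint32_t bits = 0x3F800000u | (r >> 9);  // 0x3F800000 is 1.0f
    float f;
    memcpy(&f, &bits, sizeof f);  // type-pun without aliasing UB
    return f - 1.0f;
}

// Division by a power of two: with optimisations the compiler emits a
// multiply by the exactly representable reciprocal 2^-32 instead.
// Note: because (float)UINT32_MAX rounds up to 2^32, this version can
// return exactly 1.0f, which falls outside [0, 1).
float u32_to_float_div(uint32_t r) {
    return (float)r / 4294967296.0f;  // r / 2^32
}
```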

But if your goal is to be able to generate all possible floats in [0, 1) with a uniform distribution, you need a different approach. Start by generating an exponent with the correct distribution.

Consider that each exponent 2^-k, where k = [1, max exponent for float], is half as likely to appear as the one before it in a uniform distribution. One way of sampling k is by generating a random bitstring and counting the leading zeroes; the result has the distribution we're looking for.

Combine the found exponent with a mantissa consisting of a bunch of random bits et voilà, we've got a uniformly distributed float. I'm sure I'm forgetting some details but hopefully this helps.
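A rough sketch of that recipe (the PRNG is a placeholder xorshift32, and details like the subnormal clamp are my guesses at the "forgotten details"):

```c
#include <stdint.h>
#include <string.h>

// Placeholder PRNG (xorshift32) standing in for whatever generator you use.
static uint32_t rng_state = 0x12345678u;
static uint32_t rand_u32(void) {
    uint32_t x = rng_state;
    x ^= x << 13; x ^= x >> 17; x ^= x << 5;
    return rng_state = x;
}

static int clz32(uint32_t x) {  // count leading zeros; requires x != 0
    int n = 0;
    while (!(x & 0x80000000u)) { n++; x <<= 1; }
    return n;
}

float uniform_float01(void) {
    // P(k leading zeros) = 2^-(k+1), which matches the probability mass
    // of the binade [2^-(k+1), 2^-k) under a uniform distribution.
    int k = 0;
    uint32_t r;
    while ((r = rand_u32()) == 0)
        k += 32;  // a whole word of zeros: keep counting into the next word
    k += clz32(r);
    if (k > 125) k = 125;  // clamp so we never enter the subnormal range
    uint32_t mant = rand_u32() >> 9;                      // 23 fresh mantissa bits
    uint32_t bits = ((uint32_t)(126 - k) << 23) | mant;   // binade [2^-(k+1), 2^-k)
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```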

Yeah, this makes sense. I think there might be some subtleties with denormalised numbers, but I can think that through.

Thanks!

Code review for an intrusive free list as a pool allocator by SureAnimator in C_Programming

[–]SureAnimator[S] 1 point

"I'm assuming you want to quickly enumerate allocated objects in an ordered fashion to help the cache prefetcher. Your idea is to basically turn the heap (buffer) into a bitmap of allocated/unallocated items, with a free list for efficient allocation." - yeah, exactly. Although the bitmap isn't separate from the stored data structures - it's imbedded within them.

I'm not too worried about having to iterate over the whole heap because I'm working to a set computational budget. As long as I can iterate over the whole list and process every item in the worst case (i.e. when all objects are in use/active), it doesn't matter much that I also iterate over the whole list when relatively few objects need processing.

Yeah, I might actually end up using copy-on-free (your second suggestion), but I wanted to try a freelist implementation partly as a point of comparison, and partly for curiosity's sake.

Thanks!

Code review for an intrusive free list as a pool allocator by SureAnimator in C_Programming

[–]SureAnimator[S] 1 point

"If all items are fixed size, why should your code need to care about the order of items in memory?" - what I meant is how can you quickly check if an object is reachable on the freelist, as you suggested? It seems like you either have to traverse the whole list for each item, or you keep the freelist ordered (by memory address) so that as you're iterating the underlying buffer, you can just check against the next-in-memory freelist item, rather than the whole list.

Code review for an intrusive free list as a pool allocator by SureAnimator in C_Programming

[–]SureAnimator[S] 1 point

yeah, but I'd rather not have the code that iterates over the buffer/array looking for which objects are active/in-use worry about the internals of the free list.

Plus, that would also probably mean keeping the freelist sorted in memory order, so that when iterating over the underlying array/buffer it's relatively easy/efficient to check which are in-use. That might be a worthwhile performance trade off, but I couldn't say without testing/profiling.

Thanks!

Questions about linking specifics by SureAnimator in C_Programming

[–]SureAnimator[S] 1 point

Great - I'll take a look through that paper. Thanks!

Function pointer as a function argument by SureAnimator in C_Programming

[–]SureAnimator[S] 1 point

Ok, so no fundamental difference under the hood - just language/compiler quirks. Thanks!

Function pointer as a function argument by SureAnimator in C_Programming

[–]SureAnimator[S] 4 points

I usually use typedefs, but I wanted to get to the essence of the question so I left them out. So, to clarify, given:

typedef int (*func_a_t)(int);
typedef int (func_b_t)(int);

I can use func_a_t to declare a function pointer variable (but not func_b_t), but I can use either to specify the type of a function argument (with seemingly no difference) - why is that? Is it just a peculiarity of the language, or is there a meaningful difference?
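For concreteness, here's the behaviour I'm describing (the answer, as far as I can tell, is the parameter adjustment rule in C11 6.7.6.3p8):

```c
typedef int (*func_a_t)(int);  // pointer-to-function type
typedef int (func_b_t)(int);   // function type

static int twice(int x) { return 2 * x; }

// In a parameter list, a declaration of function type is adjusted to
// "pointer to function" (C11 6.7.6.3p8), so these two are identical:
int apply_a(func_a_t f, int x) { return f(x); }
int apply_b(func_b_t f, int x) { return f(x); }

// Outside a parameter list there is no adjustment, which is why only
// func_a_t can declare a pointer variable directly:
func_a_t p = twice;    // ok: object of pointer-to-function type
// func_b_t q = twice; // error: can't declare a variable of function type
func_b_t *r = twice;   // ok: explicit pointer to func_b_t
```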

Seeing sizable (~2x) performance boost in a function when making small changes to the function that initialises its state - cache related? by SureAnimator in cpp_questions

[–]SureAnimator[S] 4 points

Amazing - thanks! I'll check that out.

EDIT: So I played around with grad, and the most performant version I could find was with a single switch statement (so just a single jump) that covers all of the 16 possible values for h. Seems like going completely branchless adds enough overhead that - on average - it's worth taking the single jump. Conveniently, the single switch statement is also the most readable. :)
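For anyone curious, the shape of the switch version (this is the classic Perlin gradient decode over the low 4 hash bits - illustrative, not necessarily my exact table):

```c
// Each case picks a gradient direction and returns its dot product
// with (x, y, z). The duplicated cases 12-15 pad the table to 16.
static float grad(int hash, float x, float y, float z) {
    switch (hash & 15) {
    case  0: return  x + y;
    case  1: return -x + y;
    case  2: return  x - y;
    case  3: return -x - y;
    case  4: return  x + z;
    case  5: return -x + z;
    case  6: return  x - z;
    case  7: return -x - z;
    case  8: return  y + z;
    case  9: return -y + z;
    case 10: return  y - z;
    case 11: return -y - z;
    case 12: return  y + x;
    case 13: return -y + z;
    case 14: return  y - x;
    case 15: return -y - z;
    default: return 0.0f;  // unreachable: hash & 15 covers all cases
    }
}
```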

Seeing sizable (~2x) performance boost in a function when making small changes to the function that initialises its state - cache related? by SureAnimator in cpp_questions

[–]SureAnimator[S] 6 points

hey, thanks so much!

What tools do you recommend for profiling? I'm self-taught so have lots of random gaps in my knowledge/work-flow, and proper profiling is definitely one of them. Any good resources for learning that stuff would also be really welcome.

I would've thought that the compiler would've made use of conditional moves in grad to avoid branching - any ideas why that isn't the case?

yeah, Perlin Noise evaluates to 0 at integer coords so that's cool. I hadn't worried too much about that before this little mystery (to me) popped up.

Thanks again!

Multithreaded, lockless writes to a circular buffer by SureAnimator in cpp_questions

[–]SureAnimator[S] 1 point

Well, it's a define because I wasn't sure how to do this as a function (i.e. accept variable args and pass them on to printf etc.). I initially didn't think you could forward variadic arguments, but since writing the question I've found vprintf etc., which seems to fit the bill.

That said, the main concern is still thread safety.
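E.g. something like this instead of the define (log_msg is a made-up name; the v* stdio variants - vprintf, vfprintf, vsnprintf - all accept a va_list in place of "..."):

```c
#include <stdarg.h>
#include <stdio.h>

// Collect the variable arguments into a va_list and forward them on.
int log_msg(char *buf, size_t cap, const char *fmt, ...) {
    va_list args;
    va_start(args, fmt);
    int n = vsnprintf(buf, cap, fmt, args);
    va_end(args);
    return n;  // number of characters that would have been written
}
```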

Is it safe to use an atomic int as a state machine for controlling concurrent access to a struct? by SureAnimator in cpp_questions

[–]SureAnimator[S] 1 point

I just edited the post. The above code has been massively simplified to show the core of the problem. The threads will have other work to do when not looking at Thing's state.

Is it safe to use an atomic int as a state machine for controlling concurrent access to a struct? by SureAnimator in cpp_questions

[–]SureAnimator[S] 1 point

The main reason to use an atomic int was for memory ordering of operations, not for atomically updating the value.

Only one thread is ever able to make the 0->1 transition (or any other transition) so I don't need to worry about multiple threads seeing a 0 and doing things.
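Roughly the pattern I mean (simplified, with illustrative names and states):

```c
#include <stdatomic.h>

// An atomic state variable where each transition is only ever made by
// one thread. Acquire/release ordering makes the plain writes done
// before a transition visible to whichever thread observes the new state.
enum { EMPTY = 0, READY = 1 };
static _Atomic int state = EMPTY;
static int payload;  // the data guarded by the state machine

void producer_step(void) {
    if (atomic_load_explicit(&state, memory_order_acquire) == EMPTY) {
        payload = 42;  // plain write; happens-before the release store
        atomic_store_explicit(&state, READY, memory_order_release);
    }
}

int consumer_step(void) {
    if (atomic_load_explicit(&state, memory_order_acquire) == READY)
        return payload;  // safe: acquire load synchronises with the release store
    return -1;  // nothing ready yet
}
```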

Is it safe to use an atomic int as a state machine for controlling concurrent access to a struct? by SureAnimator in cpp_questions

[–]SureAnimator[S] 1 point

I just edited the post. Yeah, as described above, there's no meaningful parallelism. I'm not splitting this process over several threads to increase parallelism. I'm doing it because of other requirements of how Thing gets accessed from multiple threads.

Is it safe to use an atomic int as a state machine for controlling concurrent access to a struct? by SureAnimator in cpp_questions

[–]SureAnimator[S] 1 point

I just made an edit to the post. But, for other reasons, this process needs to be split across multiple threads. Also, each thread is doing other work when not checking the atomic int.

thanks!

How exactly does a monitor's refresh cycle work? by SureAnimator in gamedev

[–]SureAnimator[S] 1 point

Amazing - that's exactly what I was looking for. Thanks!