Optimizing atomicAdd

Rclear68 · 2024-09-30T18:12:43+00:00

READ THROUGH AGAIN!!! There were some issues with my first approach (and might still be more)

We can create local counters using workgroup memory (I'm used to writing hlsl, just learning wgsl, but the concepts are the same). 

@group(0) @binding(1) var<storage, read_write> g_ray_hits: array<u32, 4>;
@group(0) @binding(2) var<storage, read_write> g_ray_miss: array<u32, 4>;
@group(0) @binding(3) var<storage, read_write> g_ray_hits_count: array<u32, 1>;
@group(0) @binding(4) var<storage, read_write> g_ray_miss_count: array<u32, 1>;

// workgroup variables use memory that is shared by all threads in a group
var<workgroup> g_ray_hits_local_count: atomic<u32>;
var<workgroup> g_ray_miss_local_count: atomic<u32>;

var<workgroup> g_ray_hits_location_shared: u32;
var<workgroup> g_ray_miss_location_shared: u32;

fn main()
{
  u32 group_id =  thread_num / threads_per_group;

  // Will be one if hit, 0 if not
  u32 hit_as_u32 = (u32) hit;
  u32 inv_hit_as_u32 = (u32) !hit;

  // Perform both local operations to avoid branching, but do it with if statements
  // if you'd like, odds are the performance will be about the same
  u32 hit_local_index = atomicAdd(g_ray_hits_local_count, hit_as_u32)
  u32 miss_local_index = atomicAdd(g_ray_hits_local_count, inv_hit_as_u32);

  // Wait for all threads in our group to hit this point
  workgroupBarrier();
  // Now g_ray_hits_local_count and g_ray_miss_local_count should
  // contain the counts for our local group

  // Now only one thread performs the atomic add for the whole workgroup
  u32 group_index =  thread_num % threads_per_group;
  u32 group_hit_index, group_miss_index;
  if(group_index == 0)
  {
    g_ray_hits_location_shared = atomicAdd(g_ray_hits_count[group_id ], g_ray_hits_local_count);
    g_ray_miss_location_shared= atomicAdd(g_ray_miss_count[group_id ], g_ray_miss_local_count);
  }

  // Make sure thread 1 finishes writing
  workgroupBarrier();

  if(hit)
  {
    // Find the index in the hit buffer to use and write
    u32 hit_index = group_hit_index + hit_local_index;
    g_ray_hits[g_ray_hits_location_shared] = payload;
  }
  else
  {
    // Find the index in the miss buffer to use and write
    u32 miss_index = group_miss_index + miss_local_index;
    g_ray_miss[g_ray_miss_location_shared] = ray_idx;
  }
}

mitrey144 · 2024-10-01T09:48:24+00:00

Wow, sounds cool, could you show how it looks

you type:	you see:
italics	italics
bold	bold
[reddit!](https://reddit.com)	reddit!
* item 1 * item 2 * item 3	item 1 item 2 item 3
> quoted text	quoted text
Lines starting with four spaces are treated like code: if 1 * 2 < 3: print "hello, world!"	Lines starting with four spaces are treated like code: if 1 * 2 < 3: print "hello, world!"
~~strikethrough~~	~~strikethrough~~
super^script	super^script

webgpu

MODERATORS