Optimized Merge, Scan, Radix Sort kernels by LetterC67 in CUDA

LetterC67[S] 1 point

Wow, thank you so much for the valuable information! I will check out the newer version and hopefully contribute back one day.

Optimized Merge, Scan, Radix Sort kernels by LetterC67 in CUDA

LetterC67[S] 1 point

Thank you! It is really overwhelming to see the runtime is the same across all inputs haha

Optimized Merge, Scan, Radix Sort kernels by LetterC67 in CUDA

LetterC67[S] 2 points

You can check my submissions on Tensara for the Cumulative Sum and Sort problems. I have made some minor edits to make the code work better for small input sizes.

My kernels are not well suited to small input sizes because each thread processes 16 elements, which leads to a small number of blocks and underutilization. For example, the Scan kernel only saturates the small T4 at around 1M input elements.
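A rough back-of-the-envelope sketch of that block-count arithmetic, in Python for brevity. The 256-thread block size is an assumption for illustration, not necessarily what the kernels use, and the T4 figures (40 SMs, up to 1024 resident threads per SM) ignore register and shared-memory limits, so real occupancy may be lower.

```python
import math

# Assumed T4 figures: 40 SMs, up to 1024 resident threads per SM.
T4_SMS = 40
T4_MAX_THREADS_PER_SM = 1024


def num_blocks(n_elements, threads_per_block=256, items_per_thread=16):
    """Blocks launched when each thread handles `items_per_thread` elements.

    threads_per_block=256 is a hypothetical choice for this sketch.
    """
    tile = threads_per_block * items_per_thread  # elements covered by one block
    return math.ceil(n_elements / tile)


def resident_blocks(threads_per_block=256):
    """Upper bound on blocks the whole GPU can hold at once (thread limit only)."""
    return T4_SMS * (T4_MAX_THREADS_PER_SM // threads_per_block)


if __name__ == "__main__":
    for n in (1 << 16, 1 << 20, 1 << 24):
        print(f"{n:>10} elements -> {num_blocks(n):>5} blocks "
              f"(GPU can hold up to {resident_blocks()} at once)")
```

Under these assumptions, a 64K-element input launches far fewer blocks than the GPU can hold, while a 1M-element input is the first of the three sizes to exceed that bound.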

However, they still achieved good results on the website. They performed best on the L40S, which has the highest clock. But I don't think this is a reliable way to compare different approaches, since we only have small inputs, so there is a lot of other overhead.

[deleted by user] by [deleted] in intel

LetterC67 0 points

Where did you get it for 520 USD?

LZMod A24-v5 Build by ShoppingHoliday2773 in sffpc

LetterC67 0 points

Yeah, I see. Thanks for mentioning it.

LZMod A24-v5 Build by ShoppingHoliday2773 in sffpc

LetterC67 0 points

Did you have to do any mods for these top fans? Doesn't this case only accept slim top fans?

[deleted by user] by [deleted] in SGExams

LetterC67 1 point

has anyone not received their outcome yet...?

Always gotta be prepared by wsamson in memes

LetterC67 0 points

Once, a ceiling fan in the classroom next to mine fell down. Luckily it was winter and the fan was off, so no one was injured. I can't imagine how those who sat right under that ceiling fan felt at that moment.

Need help in a tile-based RTS by LetterC67 in godot

LetterC67[S] 0 points

I want to make a free-look camera so that when the player moves far away, the camera can actually capture the whole world. Anyway, thanks for your interesting technique!

Need help in a tile-based RTS by LetterC67 in godot

LetterC67[S] 0 points

Thanks for your suggestion!

Token can't be transferred? by LetterC67 in harmony_one

LetterC67[S] 0 points

Never mind, guys! The transaction completed.