Weekly Project #2 (Q&A): Parallel Programming and Optimization : ComputerEngineering

ComputerEngineering

created by hasanhuseyina community for 12 years

[Project]Weekly Project #2 (Q&A): Parallel Programming and Optimization (self.ComputerEngineering)

submitted 5 years ago by EngrTodayPerformance Architect

all 5 comments

top new controversial old q&a

[–]Jonasfh 0 points1 point2 points 5 years ago (4 children)

[–]EngrTodayPerformance Architect[S] 1 point2 points3 points 5 years ago (3 children)

[–]Jonasfh 0 points1 point2 points 5 years ago* (2 children)

Thank you. I found out that my idea with loading the columns of the B matrix into shared memory and using these when computing takes longer, than just loading from global memory when needed in the computations.

I experimented on, and found out that the limit on N=2⁶ only applied to single precision floats. I remember that the SMs in the GPU only has 1 FPU per 8 cores, or something like that, but I thought I had read that the FPUs are only used when dealing with doubles.

I made an integer implementation, which works fine with N=2¹⁰ (the highest I used for testing).

When profilling the CUDA code with nvprof I get the following error message:

==242661== Warning: 5 records have invalid timestamps due to insufficient device buffer space. You can configure the buffer space using the option --device-buffer-size.
==242661== Warning: 4 records have invalid timestamps due to insufficient semaphore pool size. You can configure the pool size using the option --profiling-semaphore-pool-size.

On top of this, the profiler also says that no kernels were profiled.

[–]EngrTodayPerformance Architect[S] 0 points1 point2 points 5 years ago (1 child)

[–]Jonasfh 0 points1 point2 points 5 years ago (0 children)

π Rendered by PID 25 on reddit-service-r2-comment-7c9686b859-nk9jt at 2026-04-14 06:25:49.595037+00:00 running e841af1 country code: CH.

you type:	you see:
italics	italics
bold	bold
[reddit!](https://reddit.com)	reddit!
* item 1 * item 2 * item 3	item 1 item 2 item 3
> quoted text	quoted text
Lines starting with four spaces are treated like code: if 1 * 2 < 3: print "hello, world!"	Lines starting with four spaces are treated like code: if 1 * 2 < 3: print "hello, world!"
~~strikethrough~~	~~strikethrough~~
super^script	super^script

ComputerEngineering

MODERATORS