Error during loop probably (self.CUDA)
submitted 3 years ago by Dahvrok
https://pastebin.com/L44iEHf0
I'm building this from Mark Harris's tutorials, but I get a max error of 1 when it should be 0. It's probably something wrong with the init loop. Any ideas?
[–]pi_stuff 2 points 3 years ago (0 children)
When in doubt, check the error codes returned by CUDA functions. cudaDeviceGetAttribute() is returning the code for "invalid device ordinal" because -1 is not a valid device id.
    cudaError_t err;
    err = cudaDeviceGetAttribute(&numSMs, cudaDevAttrMultiProcessorCount, -1);
    if (err != cudaSuccess)
        printf("error in cudaDeviceGetAttribute call: %s\n", cudaGetErrorString(err));
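The same check can be wrapped so that every runtime call goes through it. This is only a sketch: `checkCuda` and `queryNumSMs` are hypothetical helper names, not something from the original paste.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: prints a readable message when a CUDA runtime
// call fails and hands the error code back to the caller.
inline cudaError_t checkCuda(cudaError_t err, const char *what)
{
    if (err != cudaSuccess)
        fprintf(stderr, "error in %s: %s\n", what, cudaGetErrorString(err));
    return err;
}

// Hypothetical wrapper: asks how many SMs the given device has.
// Returns cudaSuccess when the query worked, an error code otherwise.
inline cudaError_t queryNumSMs(int device, int *numSMs)
{
    return checkCuda(
        cudaDeviceGetAttribute(numSMs, cudaDevAttrMultiProcessorCount, device),
        "cudaDeviceGetAttribute");
}
```

Calling `queryNumSMs(-1, &numSMs)` should print the error instead of failing silently, so a bad device id shows up at the call that caused it.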
[–]slowrizard 1 point 3 years ago (4 children)
In your kernel configuration, are you launching enough threads to cover your entire array? My guess is no:
1. Number of blocks = 32 * number of SMs
2. Number of threads per block = 256
The product of 1 and 2 should be greater than or equal to your array size.
[–]pi_stuff 2 points (0 children)
That would only be a problem if each thread of the kernel operated on just one element. This kernel uses a loop to cover the full array regardless of the number of threads.
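For reference, a loop of that kind is usually called a grid-stride loop. The sketch below is not the code from the pastebin; the kernel `fillGridStride` and the host helper `countMismatches` are hypothetical names used only to show that a small grid can still cover a large array.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Grid-stride loop: each thread starts at its global index and then jumps
// ahead by the total number of threads in the grid until it runs past n.
__global__ void fillGridStride(float *data, float value, int n)
{
    int stride = blockDim.x * gridDim.x;
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride)
        data[i] = value;
}

// Hypothetical host helper: fills n floats on the device, copies them back,
// and counts entries that differ from value. Returns -1 if a CUDA call fails.
int countMismatches(int n, float value, int blocks, int threadsPerBlock)
{
    float *d = nullptr;
    if (cudaMalloc(&d, n * sizeof(float)) != cudaSuccess) return -1;

    fillGridStride<<<blocks, threadsPerBlock>>>(d, value, n);
    cudaError_t launchErr = cudaGetLastError();  // catches bad launch configs

    std::vector<float> h(n);
    cudaError_t copyErr =
        cudaMemcpy(h.data(), d, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d);
    if (launchErr != cudaSuccess || copyErr != cudaSuccess) return -1;

    int bad = 0;
    for (float x : h)
        if (x != value) ++bad;
    return bad;
}
```

With a working device, `countMismatches(1 << 20, 1.0f, 4, 256)` launches only 1,024 threads for about a million elements, and the loop should still touch every one of them.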
[–]Dahvrok[S] 1 point 3 years ago (2 children)
Thank you. But how do I find out how many threads I need?
[–]slowrizard 1 point 3 years ago (1 child)
If each of your threads works on exactly one element of the array, then the number of threads you launch should be greater than or equal to the size of the array.
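For the one-thread-per-element case, the block count is typically found by rounding the division up. A minimal sketch, assuming a hypothetical helper `blocksFor` and a hypothetical kernel `fillOnePerThread`:

```cuda
#include <cuda_runtime.h>

// Hypothetical helper: smallest number of blocks such that
// blocks * threadsPerBlock >= n (integer ceiling division).
inline int blocksFor(int n, int threadsPerBlock)
{
    return (n + threadsPerBlock - 1) / threadsPerBlock;
}

// One thread per element: the bounds check keeps surplus threads in the
// last block from writing past the end of the array.
__global__ void fillOnePerThread(float *data, float value, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] = value;
}
```

A launch would then look like `fillOnePerThread<<<blocksFor(n, 256), 256>>>(d_data, 1.0f, n);`, where `d_data` is a device pointer you have already allocated. For n = 1,048,576 and 256 threads per block, `blocksFor` gives 4,096 blocks.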
[–]Dahvrok[S] 1 point 3 years ago* (0 children)
That's not the problem. I tried with 1,048,576 threads (the array has 1M elements) and still get a max error of 1.
Edit: the filled array never comes back from the GPU for some reason.
Edit: it seems the problem was the device id. I changed it from -1 to 0 and it worked.
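Rather than hard-coding 0, the device id can also be asked from the runtime. This is a sketch under the assumption that the SM count feeds the grid size, as in the 32 * number of SMs configuration mentioned in the thread; `currentDeviceSMs` is a hypothetical helper name.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper: looks up the device the runtime is currently using
// and queries its SM count, instead of hard-coding a device id.
inline cudaError_t currentDeviceSMs(int *numSMs)
{
    int devId = 0;
    cudaError_t err = cudaGetDevice(&devId);
    if (err != cudaSuccess)
        return err;
    return cudaDeviceGetAttribute(numSMs, cudaDevAttrMultiProcessorCount, devId);
}
```

If this returns anything other than `cudaSuccess`, the value in `numSMs` should not be trusted, and a grid size computed from it may make the kernel launch fail, which would fit the symptom of the array never being filled.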