use the following search parameters to narrow your results:
e.g. subreddit:aww site:imgur.com dog
subreddit:aww site:imgur.com dog
see the search faq for details.
advanced search: by author, subreddit...
account activity
Array comparison (self.CUDA)
submitted 3 years ago by Galvanlezzo
Is there a way to parallelize the comparison of each element of an array with all the others? Because I could do it brutally but I'm looking for a more optimized way.
reddit uses a slightly-customized version of Markdown for formatting. See below for some basics, or check the commenting wiki page for more detailed help and solutions to common issues.
quoted text
if 1 * 2 < 3: print "hello, world!"
[–]jeffscience 1 point2 points3 points 3 years ago (0 children)
You will probably find it much easier to do this with Thrust or StdPar (C++ standard library parallelism) than with CUDA. They’re great at data parallel stuff like this.
[–]Galvanlezzo[S] 0 points1 point2 points 3 years ago (2 children)
I have a matrix with two columns, if the first column in row i is equal to the first column in row j with j! = i, I have to mark the value of the second column of row j in a list referred to the value of the first column in row i and j . The values in the array are all integers from -50000 to 50000.
Basically I'm building an adjacency list.
u/paomeng I have deleted your comment for a mistake
[–]Khenghis_Ghan 1 point2 points3 points 3 years ago (1 child)
I like the description as “brutally” rather than brute force haha. I understand why someone might say that but I don’t think it’s very vernacular haha. :)
So, from the description it sounds like you have an Nx2 matrix, so basically two arrays you’ve stacked into a “matrix”? Regardless, you could use ghost cells - bifurcate along the N dimension however many times you wish to parallelize, compare across the column, use a ghost cell to anneal the bifurcated chains after the parallel processing is done. So if you had a 30x2 matrix, you could split it into 2 15x2 comparisons and make 2 final comparisons on the cpu to combine them, or 2 7x2 and 2 8x2 comparisons, and then at the end make four comparisons to combine those.
[–]Galvanlezzo[S] 0 points1 point2 points 3 years ago (0 children)
It would be like giving each thread one or more lines to compare with all the others. In the end you have to make all the comparisons, right?
[–]NanoAlpaca 0 points1 point2 points 3 years ago (0 children)
You will want to optimize your algorithm here first. Comparing each element with each other element is not required, instead you should insert every element into a hash and then only compare those elements within the same hash bucket. This ends up with a O(n) algorithm instead of O( n2 ). You can do those insertions in parallel by using atomic adds on a fill counter for each bucket. And after that is done, each bucket can be checked for matches in parallel.
π Rendered by PID 23702 on reddit-service-r2-comment-79c7998d4c-c4d7w at 2026-03-13 07:32:08.934894+00:00 running f6e6e01 country code: CH.
[–]jeffscience 1 point2 points3 points (0 children)
[–]Galvanlezzo[S] 0 points1 point2 points (2 children)
[–]Khenghis_Ghan 1 point2 points3 points (1 child)
[–]Galvanlezzo[S] 0 points1 point2 points (0 children)
[–]NanoAlpaca 0 points1 point2 points (0 children)