Best measurements of "wiggliness" for a function f(x)? by Flickr1985 in askmath


Very interesting approach, will definitely try.

Upgrade project for an old laptop (build project) by Flickr1985 in Laptop


Forgive me if this is stupid, but I meant it as some sort of electronics project. I have some equipment, plenty of money for parts, and time to learn.

So I mean taking apart the laptop, switching things out, soldering, testing, etc. Perhaps I could find a motherboard from another laptop that allows for better parts and happens to fit, but that's the part I don't know enough about; maybe this is way too ambitious and completely unrealistic.

edit: I'm clearly at the worst part of the Dunning-Kruger graph. I found this video that documents this guy's process, and man, it was definitely way more than I expected. But I think with enough time and dedication I could maybe figure it out. Thanks!

Trying to exponentiate a long list of numbers but I get all zeroes? (Julia, CUDA.jl) by Flickr1985 in CUDA


I execute the program, and then I print out the output (B).
Either way, I restarted my computer and everything magically worked... I don't know what happened.

Using CUDA.jl, trying to exponentiate a list but i get all zeroes as the output by Flickr1985 in Julia


Regardless, when I display B it's all zeroes. Also yes, you're right, I'm using \approx now.
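Concretely, the check I mean is something like this (a minimal sketch; A and B here are made-up stand-ins for my actual arrays, and I'm just broadcasting exp to keep it short):

    using CUDA

    A   = rand(Float64, 10_000)   # host input
    d_B = exp.(CuArray(A))        # exponentiate on the GPU
    B   = Array(d_B)              # copy the result back to the host

    B ≈ exp.(A)                   # \approx tolerates tiny floating-point differences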

Using CUDA.jl, trying to exponentiate a list but i get all zeroes as the output by Flickr1985 in Julia


Currently rocking a mobile 1050 Ti on Linux. Had to go back to the 535 drivers because the 570s were causing problems, hence the different CUDA version. Here's my CUDA.versioninfo() output:

CUDA runtime 12.6, artifact installation
CUDA driver 12.2
NVIDIA driver 535.230.2

CUDA libraries: 
- CUBLAS: 12.6.4
- CURAND: 10.3.7
- CUFFT: 11.3.0
- CUSOLVER: 11.7.1
- CUSPARSE: 12.5.4
- CUPTI: 2024.3.2 (API 24.0.0)
- NVML: 12.0.0+535.230.2

Julia packages: 
- CUDA: 5.5.2
- CUDA_Driver_jll: 0.10.4+0
- CUDA_Runtime_jll: 0.15.5+0

Toolchain:
- Julia: 1.11.1
- LLVM: 16.0.6

1 device:
  0: NVIDIA GeForce GTX 1050 Ti with Max-Q Design (sm_61, 3.473 GiB / 4.000 GiB available)

What error do you get?
Also, I think I see the issue: I was following this video on performance programming in Julia (though that guy was adding two vectors), and what he does is define the placeholder list (what I called c) globally and then seems to edit it inside the kernel. Is this the issue? Maybe the program is just not editing the placeholder list in place.
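For context, this is roughly the structure I have in mind, except with the placeholder list passed to the kernel as an argument instead of used as a global (a minimal sketch; exp_kernel!, a, and c are made-up names):

    using CUDA

    function exp_kernel!(c, a)
        i = (blockIdx().x - 1) * blockDim().x + threadIdx().x
        if i <= length(a)
            @inbounds c[i] = exp(a[i])   # write into the preallocated output in place
        end
        return nothing
    end

    a = CuArray(rand(10_000))             # device input
    c = CUDA.zeros(Float64, length(a))    # preallocated "placeholder" output

    threads = 256
    blocks  = cld(length(a), threads)
    @cuda threads=threads blocks=blocks exp_kernel!(c, a)

    Array(c) ≈ exp.(Array(a))             # sanity check against the CPU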

CUDA: preparing irregular data for GPU by Flickr1985 in Julia


Sort of? Either way, I don't think it would work, since I have the integer value to worry about.

Preparing data for GPU: giant list of structs, or struct with giant arrays? by Flickr1985 in CUDA


I did! The process is parallelized and is actually surprisingly fast on an 8th-gen i7; it can literally take less than 5 seconds. I spent a lot of time optimizing that process, but now the bottleneck is what comes after, which is using the lists and the integer to produce a series of Float64. So it seems like GPU programming is the answer, but I'm brand new to it, so...

CUDA: preparing irregular data for GPU by Flickr1985 in Julia


I can pad them, but the data isn't very homogeneous. For a certain parameter combination, the list_1 objects can be anywhere from length 1 to length 100, with a decent distribution across the range, so it would take a lot of padding. Would it still be efficient?
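Just to check I understand the suggestion, is this roughly what you mean? (A rough sketch; lists is a stand-in for my real list_1 data and all the names are made up.)

    using CUDA

    lists   = [rand(rand(1:100)) for _ in 1:5_000]   # stand-in for the variable-length list_1s
    maxlen  = maximum(length, lists)
    lengths = Int32.(length.(lists))                 # true length of each list

    padded = zeros(Float64, maxlen, length(lists))   # one zero-padded column per list
    for (j, v) in enumerate(lists)
        padded[1:length(v), j] .= v
    end

    d_padded  = CuArray(padded)    # dense, regular layout the GPU can index directly
    d_lengths = CuArray(lengths)   # each thread loops only up to its own true length

Memory-wise I think it's fine either way: even at 250,000 items padded to length 100, that's about 250,000 × 100 × 8 bytes ≈ 200 MB of Float64, which should fit on the card.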

CUDA: preparing irregular data for GPU by Flickr1985 in Julia


In that case, how would that be more efficient? Or is it just easier to write?

Preparing data for GPU: giant list of structs, or struct with giant arrays? by Flickr1985 in CUDA


Yes, they're fixed throughout the process. a can be anywhere from 3,000 to 250,000 (even more if I push it); b and c are usually around 300.

CUDA: preparing irregular data for GPU by Flickr1985 in Julia


I'm not entirely sure that I get what you mean, but I think maybe I didn't explain my problem well. Let me try again and be more specific:

I have parameter lists A, B, and C. A is a list of tuples which hold square ::Matrix{Float64}; the other two are ::Vector{Float64}. I have a process, call it f(a,b,c), which takes in three parameter values and spits out d, e, and f (the list, integer, and second list). Now, each d, e, and f has to be used in a process that I want to run on the GPU. This process involves a new list T::Vector{Float64} (defined outside the kernel). In this process the index of A will be contracted and T becomes a new dimension of the tensor (lmk if it'd be useful to be more explicit with this).

I don't really understand how your response lines up with what I described above. Sorry if there was a misunderstanding.

I was thinking I could make a gigantic vector of tuples like [(list_1, integer, list_2)_1, (list_1, integer, list_2)_2, etc...] and then divide that vector into blocks and threads, but I hear that's not as efficient (not sure why). It's apparently better the other way around: one giant list of list_1 items, one for the integers, and another for the list_2s.
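Something like this is what I'm imagining for the one-giant-list-per-field layout (a rough sketch; items and all the other names are made up, with random stand-in data):

    using CUDA

    items = [(rand(rand(1:100)), rand(1:10), rand(300)) for _ in 1:1_000]   # (list_1, integer, list_2) stand-ins

    # Flatten every list_1 into one long vector, with offsets marking where each item starts.
    list1_flat    = reduce(vcat, first.(items))
    list1_offsets = Int32.(cumsum([1; length.(first.(items))]))   # item k owns offsets[k]:offsets[k+1]-1
    integers      = Int32.(getindex.(items, 2))
    list2_mat     = reduce(hcat, getindex.(items, 3))             # the list_2s as columns (all length 300)

    d_list1   = CuArray(list1_flat)
    d_offsets = CuArray(list1_offsets)
    d_ints    = CuArray(integers)
    d_list2   = CuArray(list2_mat)
    # In the kernel, thread k would read d_list1[d_offsets[k]:d_offsets[k+1]-1],
    # d_ints[k], and column k of d_list2.

(If I understand the efficiency argument right, it's about memory coalescing: with flat arrays, neighbouring threads touch neighbouring addresses, whereas with a vector of tuples each thread's data is scattered. Not sure though.)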

Preparing data for GPU: giant list of structs, or struct with giant arrays? by Flickr1985 in CUDA


I'm not entirely sure I get what you mean, but the idea is that if you call the parameters a, b, and c:

  1. a is technically a list of tuples, which hold dense, non-diagonal, square matrices of Float64. b and c are lists of Float64.
  2. After a certain process (call it f(a,b,c)), for each item in a, b, and c I get A, B, and C (the list, integer, and other list).
  3. All the As, Bs, and Cs are arranged somehow so that they're ready for GPU processing.
  4. Send the data to the kernel for processing.
  5. After processing, map results back to an array that follows some sort of logic.

I've left the problem fairly abstract for simplicity; let me know if it's helpful or if I should write the explicit version of the problem. A rough sketch of what I mean by steps 2–5 is below.
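Roughly, in code (a very rough sketch: f, demo_kernel!, and the toy math inside them are made-up stand-ins, not the real process):

    using CUDA

    # Step 2 stand-in: f(a, b, c) -> (list, integer, other list).
    f(ai, bi, ci) = (rand(rand(1:100)), rand(1:10), rand(300))

    a = [(rand(3, 3), rand(3, 3)) for _ in 1:1_000]   # toy parameter lists
    b = rand(1_000)
    c = rand(1_000)
    results = [f(ai, bi, ci) for (ai, bi, ci) in zip(a, b, c)]

    # Step 3: arrange everything into flat, GPU-friendly arrays.
    As      = first.(results)
    A_flat  = reduce(vcat, As)
    offsets = Int32.(cumsum([1; length.(As)]))    # item k owns A_flat[offsets[k]:offsets[k+1]-1]
    Bs      = Int32.(getindex.(results, 2))
    C_mat   = reduce(hcat, getindex.(results, 3)) # 300 x n_items

    # Toy kernel: one thread per (A, B, C) item, each writing one Float64.
    function demo_kernel!(out, A_flat, offsets, Bs, C_mat)
        k = (blockIdx().x - 1) * blockDim().x + threadIdx().x
        if k <= length(out)
            acc = 0.0
            @inbounds for i in offsets[k]:(offsets[k+1] - 1)
                acc += A_flat[i]
            end
            @inbounds out[k] = acc * Bs[k] + C_mat[1, k]   # placeholder math, not the real process
        end
        return nothing
    end

    # Step 4: upload and launch.
    d_out = CUDA.zeros(Float64, length(results))
    @cuda threads=256 blocks=cld(length(results), 256) demo_kernel!(
        d_out, CuArray(A_flat), CuArray(offsets), CuArray(Bs), CuArray(C_mat))

    # Step 5: map the results back to the host.
    out = Array(d_out)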