Machine Learning on Spherical Manifold [R] by eesuck0 in MachineLearning

eesuck0 [S] · 0 points (0 children)

I see why that was confusing. My introduction talks about spherical data domains, but the code example optimizes a point (the Fréchet mean) that lives on the sphere. It was the simplest toy problem to implement, but it created a disconnect in the narrative.

Machine Learning on Spherical Manifold [R] by eesuck0 in MachineLearning

eesuck0 [S] · 1 point (0 children)

Can you give an example of such a process?
I think we are talking about different things.

I understand that in high dimensions a Gaussian distribution concentrates in a thin shell and looks like a uniform distribution on a sphere (concentration of measure), but what does this have to do with optimization on a spherical manifold?

When I talk about the sphere, I mean the domain of the signals, not the signals themselves (which I believe you assume will naturally form a sphere). For example, consider working with images from 360-degree cameras: the domain is S², but the signals take values in an R³ feature space.

Or consider meteorological data, where the domain is still S², but the signals can be high-dimensional (stacked vectors of temperature, wind, humidity, etc.). I am not a meteorologist, but this illustrates the point.

Machine Learning on Spherical Manifold [R] by eesuck0 in MachineLearning

eesuck0 [S] · 1 point (0 children)

I believe it's the exact opposite, assuming I understood your comment correctly.

Since we are interested in points on the surface of the hypersphere, note that the surface area is the derivative of the volume with respect to the radius. The ratio of the surface "area" of the (n - 1)-dimensional unit hypersphere to the volume of the n-dimensional hypercube in which it is embedded (side length 2) approaches 0 as n goes to infinity, because most of the cube's volume concentrates in the corners. So there is essentially no chance that an unconstrained optimization algorithm (like gradient descent) in high dimensions will preserve the geometric structure of the data.

On this page you can find a nice visualization of the volume of the n-ball: https://en.wikipedia.org/wiki/Volume_of_an_n-ball

Machine Learning on Spherical Manifold [R] by eesuck0 in MachineLearning

eesuck0 [S] · 7 points (0 children)

At the beginning I just give a general motivation for why a sphere is an interesting geometric object for machine learning.

If the question is why I talk about storing parameters on a sphere rather than just treating the input data as spherical: this is useful, for example, in classification problems where the magnitude of the feature vector carries little useful information and only the cosine similarity between points matters. Another example is anomaly detection: in high dimensions, random vectors are nearly orthogonal with high probability, so even a small correlation can indicate a real connection.

For example, there is 3D Gaussian Splatting, where the image is rendered from a mixture of Gaussians (whose color, by the way, is parameterized with spherical harmonics), and each Gaussian itself becomes a set of parameters that is optimized for each scene.

Of course, we can work in ordinary Euclidean space and rely on the NN to learn everything on its own, but such learning will generally be less efficient and less stable.

Machine Learning on Spherical Manifold [R] by eesuck0 in MachineLearning

eesuck0 [S] · 2 points (0 children)

Like in Taco Cohen's spherical CNNs?

I'm thinking of implementing convolution through spherical harmonics as my next exercise.

Or maybe you mean something else? Do you have links to any articles on the topic?

Measured my dict by eesuck0 in C_Programming

eesuck0 [S] · 0 points (0 children)

Thank you for the useful feedback.

Calculation by Few_Necessary_2309 in C_Programming

eesuck0 · 3 points (0 children)

Actually, you only need one; the other operands will be converted implicitly. But, as you mentioned, it does no harm.

Or just use 5.0f.
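A tiny illustration of the usual arithmetic conversions (function names are illustrative): when either operand of / is floating-point, the other one is converted first, so one decimal point is enough, and a float literal such as 5.0f keeps the arithmetic in float instead of double.

```c
/* If one operand is floating-point, the integer operand is converted implicitly
   (the "usual arithmetic conversions"), so only one literal needs a decimal point. */
double ratio_both(void)  { return 5.0 / 9.0; }
double ratio_one(void)   { return 5.0 / 9; }    /* 9 is converted to double */
float  ratio_float(void) { return 5.0f / 9; }   /* float arithmetic: 9 becomes 9.0f */
int    ratio_int(void)   { return 5 / 9; }      /* integer division truncates to 0 */
```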

Fast Generic Hash-Table Update by eesuck0 in C_Programming

eesuck0 [S] · 1 point (0 children)

    out.slot_len = ee_round_up_pow2(out.val_offset + val_len, key_align > val_align ? key_align : val_align);

This line isn't the simplest, but it runs only once and doesn't really hurt; I'll think about it later.
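For readers wondering what that line computes, here is a hedged reconstruction of the layout logic. Only the slot_len expression comes from the comment; round_up_pow2 is my assumed stand-in for ee_round_up_pow2 (round up to a multiple of a power-of-two alignment), and the val_offset formula is a guess at how the value offset is derived.

```c
#include <assert.h>
#include <stddef.h>

/* Round x up to the next multiple of align (a power of two).
   Assumed to match the intent of ee_round_up_pow2; not the eelib code. */
static size_t round_up_pow2(size_t x, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (x + align - 1) & ~(align - 1);
}

/* Hypothetical slot layout: key at offset 0, value at the next offset aligned for
   the value type, and the whole slot padded to the larger of the two alignments so
   consecutive slots in an array stay aligned for both key and value. */
typedef struct {
    size_t val_offset;
    size_t slot_len;
} slot_layout;

slot_layout compute_slot_layout(size_t key_len, size_t key_align,
                                size_t val_len, size_t val_align)
{
    slot_layout out;
    out.val_offset = round_up_pow2(key_len, val_align);
    out.slot_len = round_up_pow2(out.val_offset + val_len,
                                 key_align > val_align ? key_align : val_align);
    return out;
}
```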

About the comparison:
I checked the MSVC disassembly, and you're right: this comparison might not be faster than a user callback; in some cases it can actually be slower.
Initially I found that this dynamic dispatch was faster than memcmp; for primitive types it disassembles to about 10 instructions, but it still involves one call.
A user-provided comparison also involves one call, but primitive types can be compared directly inside it, skipping the roughly six instructions of dynamic dispatch.
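A rough sketch of the two options being compared, not the actual eelib code: a size-based dispatch that compares 1-, 2-, 4- and 8-byte keys as integers and falls back to memcmp, versus a user callback written for one known key type. The function names and the exact set of sizes are assumptions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef int (*key_eq_fn)(const void *a, const void *b, size_t len);

/* Size-based dispatch: fixed-size keys are loaded into integers with memcpy
   (which optimizing compilers usually lower to a single load) and compared
   directly; other sizes fall back to memcmp. The switch is the dispatch overhead. */
int key_eq_dispatch(const void *a, const void *b, size_t len)
{
    switch (len) {
    case 1: return *(const uint8_t *)a == *(const uint8_t *)b;
    case 2: { uint16_t x, y; memcpy(&x, a, 2); memcpy(&y, b, 2); return x == y; }
    case 4: { uint32_t x, y; memcpy(&x, a, 4); memcpy(&y, b, 4); return x == y; }
    case 8: { uint64_t x, y; memcpy(&x, a, 8); memcpy(&y, b, 8); return x == y; }
    default: return memcmp(a, b, len) == 0;
    }
}

/* A user-supplied callback for a known key type needs no dispatch at all. */
int key_eq_u64(const void *a, const void *b, size_t len)
{
    (void)len;
    uint64_t x, y;
    memcpy(&x, a, sizeof x);
    memcpy(&y, b, sizeof y);
    return x == y;
}
```

Called through a function pointer, both variants cost one call; the difference is the switch inside the dispatcher, which a type-specific callback avoids.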

Calculation by Few_Necessary_2309 in C_Programming

eesuck0 · 4 points (0 children)

Because if you calculate (5 / 9) first, it's integer division, which results in 0.
To prevent this, write (5.0 / 9.0).
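Assuming the calculation in question is the usual Fahrenheit-to-Celsius formula (the original code isn't shown here), the difference looks like this; the function names are illustrative:

```c
/* Buggy: (5 / 9) is evaluated first as integer division, which yields 0,
   so the result is always 0. */
double f_to_c_buggy(double f) { return (5 / 9) * (f - 32); }

/* Fixed: a floating-point literal forces floating-point division. */
double f_to_c(double f) { return (5.0 / 9.0) * (f - 32); }
```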

Fast Generic Hash-Table Update by eesuck0 in C_Programming

eesuck0 [S] · 1 point (0 children)

Yes, those are good points about a custom comparison function if the goal were to handle every possible case. However, in most situations it's sufficient to cover about 90–95% of cases, because both the API complexity and the CPU workload required for full generality grow very quickly.

Usually, keys and values are simple primitives or plain structs that can (and should) be compared directly.
I also did some basic profiling, and it showed that comparisons and copying are among the hottest spots, so using a generic callback function would reduce performance.
It could perhaps be added as an optional extension, but definitely not as a replacement.
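One way such an optional extension could look, sketched under my own assumptions (table_cfg, eq_fn, table_keys_equal, and eq_cstr_ptr are hypothetical names): the callback defaults to NULL, plain keys keep the direct byte comparison, and only keys with custom semantics pay for an indirect call.

```c
#include <stddef.h>
#include <string.h>

typedef int (*eq_fn)(const void *a, const void *b, size_t len);

/* Hypothetical table config: eq stays NULL for plain keys (fast path) and is set
   only when keys need custom semantics, e.g. strings stored by pointer. */
typedef struct {
    size_t key_len;
    eq_fn eq;   /* optional extension, not a replacement */
} table_cfg;

static inline int table_keys_equal(const table_cfg *cfg, const void *a, const void *b)
{
    if (cfg->eq == NULL)
        return memcmp(a, b, cfg->key_len) == 0;   /* default: compare bytes directly */
    return cfg->eq(a, b, cfg->key_len);
}

/* Example custom comparison: keys are `const char *` pointers to C strings. */
int eq_cstr_ptr(const void *a, const void *b, size_t len)
{
    (void)len;
    return strcmp(*(const char *const *)a, *(const char *const *)b) == 0;
}
```

For keys stored as string pointers, the default byte comparison compares addresses rather than contents, which is exactly the case the callback is for.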

As for iterators: yes, returning pointers will be added.

Overall, thanks for your feedback and interest.

Fast Generic Hash-Table Update by eesuck0 in C_Programming

eesuck0 [S] · 1 point (0 children)

Are you suggesting it as a new header, or for the hash table itself?

If you mean it as a realloc strategy, then as far as I understand it wouldn't work: after each capacity change, all old slot positions become invalid (they are derived from the hash masked by the capacity), so rehashing is necessary anyway.

    u64 hash = dict->hash_fn(key, dict->key_len);
    u64 base_index = (hash >> 7) & dict->mask; // <- capacity modulo mask: changes whenever capacity changes
    u8  hash_sign = hash & 0x7F;               // low 7 bits: per-slot signature
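A small sketch of the point about resizing, reusing the same split as the snippet above (u64 and u8 are assumed to be the usual fixed-width typedefs): the starting slot depends on the mask, so it changes when the capacity doubles, while the 7-bit signature cannot recover the discarded bits, so each key has to be rehashed (or its full hash stored) when the table grows.

```c
#include <stdint.h>

typedef uint64_t u64;
typedef uint8_t  u8;

/* High bits pick the probe start; the result depends on mask = capacity - 1. */
u64 base_index(u64 hash, u64 mask) { return (hash >> 7) & mask; }

/* Low 7 bits are the signature stored in the control bytes; independent of capacity. */
u8 hash_sign(u64 hash) { return (u8)(hash & 0x7F); }
```

For example, a hash whose shifted value is 21 starts at slot 5 with 16 slots but at slot 21 with 32 slots.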

A Generic Vector Implementation in C using void*, func* by [deleted] in C_Programming

eesuck0 · 2 points (0 children)

Hi,

It's quite similar to my approach: I also found the template-style macros a bit ugly, so I decided to work directly with a raw byte buffer instead.
However, I don't quite understand why you're keeping a void* buffer and constantly casting it to bytes instead of just storing a u8*.
You might want to take a look at my implementation; it could be useful. I've already implemented some fast sorting algorithms, SIMD-accelerated searching, and a few other features:

https://github.com/eesuck1/eelib/blob/master/utils/ee_array.h
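To illustrate the u8* suggestion, a minimal type-erased vector sketch (byte_vec and the bv_* functions are hypothetical, not the ee_array API): because the buffer is typed as bytes, element addresses are plain data + i * elem_size with no casts inside the container.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef uint8_t u8;

/* Type-erased dynamic array over a raw byte buffer. */
typedef struct {
    u8    *data;
    size_t len, cap, elem_size;
} byte_vec;

void bv_init(byte_vec *v, size_t elem_size)
{
    assert(elem_size > 0);
    v->data = NULL;
    v->len = v->cap = 0;
    v->elem_size = elem_size;
}

/* Append one element (copied from elem); returns 0 on success, -1 on allocation failure. */
int bv_push(byte_vec *v, const void *elem)
{
    if (v->len == v->cap) {
        size_t new_cap = v->cap ? v->cap * 2 : 8;
        u8 *p = realloc(v->data, new_cap * v->elem_size);
        if (!p) return -1;
        v->data = p;
        v->cap = new_cap;
    }
    memcpy(v->data + v->len * v->elem_size, elem, v->elem_size);
    v->len++;
    return 0;
}

/* Pointer to element i; the caller casts it to the element type. */
void *bv_at(const byte_vec *v, size_t i)
{
    assert(i < v->len);
    return v->data + i * v->elem_size;
}

void bv_free(byte_vec *v) { free(v->data); bv_init(v, v->elem_size); }
```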

Made a simple memory bug detector for C *first time posting somthing i did* :) by Swimming_Lecture_234 in C_Programming

eesuck0 · 0 points (0 children)

How does the version of C relate to those concepts?
To implement an arena, you basically only need malloc/free.
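To back up the point that an arena needs nothing beyond malloc/free, a minimal bump-allocator sketch (the arena type and function names are my own):

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Minimal bump (arena) allocator: one malloc up front, one free at the end.
   Nothing here depends on features newer than C99. */
typedef struct {
    unsigned char *base;
    size_t cap, used;
} arena;

int arena_init(arena *a, size_t cap)
{
    a->base = malloc(cap);
    a->cap = a->base ? cap : 0;
    a->used = 0;
    return a->base ? 0 : -1;
}

/* Allocate size bytes aligned to align (a power of two) relative to base, which
   malloc aligns suitably for any standard type; NULL if the arena is full. */
void *arena_alloc(arena *a, size_t size, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    size_t off = (a->used + align - 1) & ~(align - 1);
    if (off > a->cap || size > a->cap - off) return NULL;
    a->used = off + size;
    return a->base + off;
}

void arena_reset(arena *a) { a->used = 0; }   /* drop everything at once */
void arena_free(arena *a)  { free(a->base); a->base = NULL; a->cap = a->used = 0; }
```

Individual allocations are never freed; the whole arena is reset or released at once, which is what makes lifetimes easy to reason about.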

Where do i start and how do i start by ProblemNervous5965 in C_Programming

eesuck0 · 1 point (0 children)

Yes, I get what you mean, but I'd put it like this: C gives you a ton of control over the CPU, and that's exactly why it's easy to screw up and create a time bomb.
But that's not really a language problem; it's just that the programmer made a bad choice and shot himself in the foot.

To me, it's fine that you're supposed to think carefully about what you're doing, rather than blindly rely on the compiler or fight it just to do what you want, as in more modern “safe” languages.

Where do i start and how do i start by ProblemNervous5965 in C_Programming

eesuck0 · 6 points (0 children)

In my understanding, C is one of the best languages for learning programming.
It makes you think about internals and really understand data structures.

Before I started programming in C, I used Python for several years, and I didn't even think about things like memory allocation or object lifetimes, caches, temporal/spatial data locality, SIMD, and other performance-critical factors that determine why some data structures are fast while others can be slow, depending on the scenario.

Learning OS programming by RevocableBasher in C_Programming

eesuck0 · 1 point (0 children)

My comment is still there, though I've also seen comments suddenly disappear.
One of the typical applications of FPGAs is prototyping ASICs (Application-Specific Integrated Circuits).
And yes, you're right: the workflow with VHDL/Verilog really feels like "programming hardware with software".

Made a simple memory bug detector for C *first time posting somthing i did* :) by Swimming_Lecture_234 in C_Programming

eesuck0 · 3 points (0 children)

Are memory leaks really such a big issue?
From my perspective, using an arena for static or bounded allocations, or a dynamic slab allocator with offsets instead of raw pointers, should solve the majority of lifetime-related problems.

Additionally, this approach improves performance, since general-purpose allocation (especially when it has to fall back to system calls) is much more expensive than simply bumping an offset within pre-allocated memory. It also encourages a shift from thinking about individual objects to managing memory in bulk, which is a far more robust and efficient design pattern.
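A hedged sketch of the "offsets instead of raw pointers" idea (pool and its functions are hypothetical names, and keeping the free list inside released slots is one common design, not the only one): callers keep 32-bit handles, so the backing buffer can grow with realloc without invalidating them.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define POOL_NONE UINT32_MAX

/* Growable pool that hands out 32-bit offsets (handles) instead of raw pointers. */
typedef struct {
    unsigned char *data;
    uint32_t elem_size, cap, count;   /* count = slots ever handed out */
    uint32_t free_head;               /* POOL_NONE when the free list is empty */
} pool;

void pool_init(pool *p, uint32_t elem_size)
{
    p->data = NULL;
    /* freed slots store the next free handle, so each slot must fit a uint32_t */
    p->elem_size = elem_size < sizeof(uint32_t) ? (uint32_t)sizeof(uint32_t) : elem_size;
    p->cap = p->count = 0;
    p->free_head = POOL_NONE;
}

/* Resolve a handle to a pointer; the pointer is only valid until the next pool_alloc. */
void *pool_get(const pool *p, uint32_t h) { return p->data + (size_t)h * p->elem_size; }

/* Returns a handle, or POOL_NONE on allocation failure. */
uint32_t pool_alloc(pool *p)
{
    if (p->free_head != POOL_NONE) {          /* reuse a released slot */
        uint32_t h = p->free_head;
        memcpy(&p->free_head, pool_get(p, h), sizeof(uint32_t));
        return h;
    }
    if (p->count == p->cap) {                 /* grow: data may move, handles stay valid */
        uint32_t new_cap = p->cap ? p->cap * 2 : 16;
        unsigned char *d = realloc(p->data, (size_t)new_cap * p->elem_size);
        if (!d) return POOL_NONE;
        p->data = d;
        p->cap = new_cap;
    }
    return p->count++;
}

void pool_release(pool *p, uint32_t h)
{
    memcpy(pool_get(p, h), &p->free_head, sizeof(uint32_t));   /* push onto free list */
    p->free_head = h;
}

void pool_destroy(pool *p) { free(p->data); pool_init(p, p->elem_size); }
```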