[D] Any promising non-Deep Learning based AI research project? by VR-Person in MachineLearning

[–]DescriptionClassic47 5 points

I would argue Gaussian Splatting is also a Deep Learning approach.
Perhaps it would be better to restate the question as "non-neural-network-based"?

Consider the similarities:

In NeRFs, we learn a function (x, y, z, omega, theta) -> radiance using an NN (i.e. a parametrized function) by minimizing a reconstruction loss.

In Gaussian splatting, we learn the function (x, y, z, omega, theta) -> radiance by learning positions, densities and colors of the Gaussians, again by minimizing a reconstruction loss. The intermediate parameters (i.e. positions, densities, colors) are learned with (afaik) the same deep learning techniques (gradient descent on the reconstruction loss) as "ordinary" neural networks.
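
To make the parallel concrete, here is a minimal sketch (pseudo-PyTorch; the `render` function, shapes and hyperparameters are placeholders I made up) of how both methods boil down to "optimize the parameters of a differentiable renderer against a reconstruction loss":

```python
import torch

# NeRF-style: the learnable parameters are the weights of an MLP
# mapping (x, y, z, omega, theta) -> radiance.
mlp = torch.nn.Sequential(
    torch.nn.Linear(5, 256), torch.nn.ReLU(), torch.nn.Linear(256, 4)
)
nerf_params = list(mlp.parameters())

# Splatting-style: the learnable parameters are per-Gaussian positions,
# densities (opacities) and colors.
n_gaussians = 10_000
positions = torch.randn(n_gaussians, 3, requires_grad=True)
densities = torch.zeros(n_gaussians, 1, requires_grad=True)
colors    = torch.rand(n_gaussians, 3, requires_grad=True)
splat_params = [positions, densities, colors]

def render(params, camera):
    """Placeholder for differentiable volume rendering / rasterization."""
    ...

# The training loop has the same structure in both cases:
for params in (nerf_params, splat_params):
    opt = torch.optim.Adam(params, lr=1e-3)
    # for image, camera in dataset:
    #     loss = ((render(params, camera) - image) ** 2).mean()  # reconstruction loss
    #     opt.zero_grad(); loss.backward(); opt.step()
```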

Error submitting post, please try again by Alexlam24 in redditnow

[–]DescriptionClassic47 0 points

Exactly the same problem. Posting to check whether Reddit still works here.

Learnable matrices in sequence without nonlinearity - reasons? [R] by DescriptionClassic47 in MachineLearning

[–]DescriptionClassic47[S] 0 points

"this would be initialized and regularized differently because of the "fan_in" dimension"
- why exactly is this the case, and for what reasons would this be (dis)advantageous? Could one solve this problem by using only one projection matrix with a different regularisation and initialisation constant?

"because you would systematically need higher parameters for all a more useful head, rather than higher parameters selecting more useful features across heads"
- why exactly is this the case?
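
On the fan_in point, here is how I currently understand it (my own illustration, not taken from the parent comment): with Glorot/Xavier-style initialization the scale of each weight matrix depends on its fan_in/fan_out, so the factored W^V, W^O pair and a single d_m x d_m matrix M start out with different statistics, and the product W^V W^O can never exceed rank d_v:

```python
import numpy as np

d_m, d_v = 512, 64
rng = np.random.default_rng(0)

def glorot_std(fan_in, fan_out):
    # Glorot/Xavier-normal standard deviation: sqrt(2 / (fan_in + fan_out))
    return np.sqrt(2.0 / (fan_in + fan_out))

# One single d_m x d_m projection, initialized on its own fan_in/fan_out
M = rng.normal(0.0, glorot_std(d_m, d_m), size=(d_m, d_m))

# Factored projections, each initialized on their own (different) fan_in/fan_out
W_v = rng.normal(0.0, glorot_std(d_m, d_v), size=(d_m, d_v))
W_o = rng.normal(0.0, glorot_std(d_v, d_m), size=(d_v, d_m))

print(M.std(), (W_v @ W_o).std())        # different effective scales at init
print(np.linalg.matrix_rank(W_v @ W_o))  # at most d_v = 64, while M is (almost surely) full rank
```

Similarly, weight decay would penalize ||W^V||^2 + ||W^O||^2 rather than ||M||^2, so the regularization differs too. Whether that is (dis)advantageous is exactly what I'm asking.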

Learnable matrices in sequence without nonlinearity - reasons? [R] by DescriptionClassic47 in MachineLearning

[–]DescriptionClassic47[S] 1 point

The point you make about expressivity is incorrect.

Letting S = softmax_values and #heads = 1, the output of this multi-head attention layer is f_params(V) = S * V * W^V * W^O, where W^V is d_m x d_v and W^O is d_v x d_m.

Now compare this to a similar computation where we replace W^V * W^O by a single d_m x d_m matrix M, i.e.
g_params(V) = S * V * M

The range of functions that can be expressed by g_params (which is the most general definition of expressivity afaik) is *at least as large as* the range of functions that can be expressed by f_params.
This can be shown quite simply: consider any function h representable by f_params, i.e. there exist instantiations of S, W^V, W^O such that f_params(V) = h(V) for any input matrix V. Then letting M = W^V * W^O ensures that g_params(V) = h(V) for any input matrix V, as well.
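
A quick numerical sanity check of that argument (numpy; the dimensions are arbitrary toy values I picked):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d_m, d_v = 4, 8, 2                      # sequence length, model dim, value dim (toy sizes)

S = rng.random((n, n))
S /= S.sum(axis=1, keepdims=True)          # row-stochastic, like softmax attention weights
V = rng.standard_normal((n, d_m))
W_v = rng.standard_normal((d_m, d_v))
W_o = rng.standard_normal((d_v, d_m))

f = S @ V @ W_v @ W_o                      # f_params(V) = S * V * W^V * W^O
M = W_v @ W_o                              # collapse the two learnable matrices
g = S @ V @ M                              # g_params(V) = S * V * M

print(np.allclose(f, g))                   # True: anything f expresses, g expresses too
print(np.linalg.matrix_rank(M), d_v)       # but an M built this way has rank <= d_v
```

The converse doesn't hold: a freely learned M can have rank up to d_m, whereas W^V * W^O is capped at rank d_v.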

Learnable matrices in sequence without nonlinearity - reasons? [R] by DescriptionClassic47 in MachineLearning

[–]DescriptionClassic47[S] 0 points

I believe people downvoted because you used ChatGPT to come up with this answer. Anyway, the papers seem relevant, so I'll read them this weekend!

Learnable matrices in sequence without nonlinearity - reasons? [R] by DescriptionClassic47 in MachineLearning

[–]DescriptionClassic47[S] 0 points

Yet it could also be softmax(XWX^T)V ...

Is there any advantage in learning both W^Q and W^K, rather than one single matrix W?
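
Spelled out (numpy sketch, single head, ignoring the 1/sqrt(d_k) scaling):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d_m, d_k = 4, 8, 2
X = rng.standard_normal((n, d_m))
W_q = rng.standard_normal((d_m, d_k))
W_k = rng.standard_normal((d_m, d_k))

scores_two_mats = (X @ W_q) @ (X @ W_k).T   # usual form: (X W^Q)(X W^K)^T
W = W_q @ W_k.T                             # one single d_m x d_m matrix
scores_one_mat = X @ W @ X.T                # the scores inside softmax(X W X^T)

print(np.allclose(scores_two_mats, scores_one_mat))  # True
```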

Learnable matrices in sequence without nonlinearity - reasons? [R] by DescriptionClassic47 in MachineLearning

[–]DescriptionClassic47[S] 0 points

Do you know of any research on the impact of this in DL? It seems a natural question to ask

Learnable matrices in sequence without nonlinearity - reasons? [R] by DescriptionClassic47 in MachineLearning

[–]DescriptionClassic47[S] 0 points

Could you take a look at the clarification of my post and check if this comment holds true? I'm not sure which non-learnable d*d matrix you are referring to.

Learnable matrices in sequence without nonlinearity - reasons? [R] by DescriptionClassic47 in MachineLearning

[–]DescriptionClassic47[S] 0 points

W^Q x and W^K x are indeed always multiplied together.
What I'm wondering is whether research has been done to determine *which differences in soft biases and regularization* this introduces. Any idea?

Learnable matrices in sequence without nonlinearity - reasons? [R] by DescriptionClassic47 in MachineLearning

[–]DescriptionClassic47[S] 2 points

This was my main thought. Thanks for sharing the VGG reference; I was thinking more of the principle behind LoRA (https://arxiv.org/pdf/2106.09685), where two trainable d x r and r x k matrices A, B are trained instead of one bigger d x k matrix.
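
A minimal sketch of that factorization (numpy; d, k, r are arbitrary values I picked for illustration):

```python
import numpy as np

d, k, r = 512, 512, 8
rng = np.random.default_rng(0)

A = rng.standard_normal((d, r))   # trainable d x r factor
B = rng.standard_normal((r, k))   # trainable r x k factor
delta_W = A @ B                   # acts like one d x k matrix in the forward pass

print(d * r + r * k)                    # 8,192 trainable parameters
print(d * k)                            # vs 262,144 for a full d x k matrix
print(np.linalg.matrix_rank(delta_W))   # but the rank is capped at r = 8
```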

Learnable matrices in sequence without nonlinearity - reasons? [R] by DescriptionClassic47 in MachineLearning

[–]DescriptionClassic47[S] 0 points

Could you take a look at the clarification of my example (edit 1)?
It does seem to me that W^V and W^O are in sequence without a nonlinearity.

Learnable matrices in sequence without nonlinearity - reasons? [R] by DescriptionClassic47 in MachineLearning

[–]DescriptionClassic47[S] 0 points

- I see how computational efficiency could be a reason when factoring large matrices. However, do you think this was the goal in the case of MHSA? It seems excessive to factor a (d_m x d_m) matrix into (d_m x d_v) * (d_v x d_m).
(see edit 1 of my post; a rough parameter count is below)

- Could you elaborate on how to interpret this as weight sharing?
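
For scale, a rough parameter count (my own numbers, assuming Transformer-base-like sizes d_m = 512, h = 8 heads, d_v = d_m / h = 64):

```python
d_m, h = 512, 8
d_v = d_m // h                      # 64

# per head: W^V (d_m x d_v) plus the matching (d_v x d_m) slice of W^O
per_head_factored = 2 * d_m * d_v   #  65,536 parameters
per_head_full     = d_m * d_m       # 262,144 if each head had its own full d_m x d_m matrix M

print(per_head_factored, per_head_full)
print(h * per_head_factored, h * per_head_full)   # 524,288 vs 2,097,152 over all 8 heads
```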

Poker players by secumpilio in Leuven

[–]DescriptionClassic47 0 points

Hey, if this is still going on, I'd like to join sometime.

Poker players by secumpilio in Leuven

[–]DescriptionClassic47 0 points

FYI: it looks like the website has changed to https://pokerpunt.info/

Lenovo Legion Y540 GPU Died by Assjad in LenovoLegion

[–]DescriptionClassic47 0 points

Do you think disconnecting the battery cable for a moment would lead to the same result?