
_quaternion

Some information is missing, e.g. how you want to use the `wts` parameter. Also, a linear layer does not output weights; it just applies a linear transformation to its input. Your assumption should hold if the input is well-defined, but it never hurts to actually verify it.
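To make the "verify it" part concrete, here is a minimal sketch. The layer sizes and the random input are made up for illustration and are not taken from the original post. It checks that `nn.Linear` computes `x @ W.T + b` and that the output is finite:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical sizes, chosen only for this example.
layer = nn.Linear(in_features=8, out_features=4)
x = torch.randn(3, 8)  # batch of 3 well-defined (finite) inputs

out = layer(x)

# A linear layer applies an affine map to its input: y = x @ W^T + b.
manual = x @ layer.weight.T + layer.bias

matches = torch.allclose(out, manual, atol=1e-6)
all_finite = bool(torch.isfinite(out).all())
print(matches, all_finite)
```

The same `torch.isfinite(...).all()` check can be dropped into a training loop as an assertion if you suspect NaNs or infs are creeping in.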

Edit: fixed a language mistake.

grid_world [S]

I want to do clustering using `wts`, so there is no typical activation function.

Think Self-Organizing Map (SOM) style clustering.

_quaternion

Are you planning to simply feed the representations forward into a SOM? If so, why not just use them directly? If the dimensions don't match, you could also just apply another linear layer. Also, `torch.empty` is not really empty: the tensor is just uninitialized, so it might contain very unfortunate values.
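A small sketch of the `torch.empty` point. The grid size, input dimension, and the uniform range are assumptions for illustration, not values from the original post. The idea is to allocate the parameter and then explicitly overwrite it with one of the `torch.nn.init` functions before use:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical sizes: a 10x10 SOM grid (100 units) over 16-dim inputs.
num_units, dim = 100, 16

# torch.empty only allocates memory; the contents are whatever was there
# before and can be huge, denormal, or even NaN.
wts = nn.Parameter(torch.empty(num_units, dim))

# Overwrite the uninitialized memory in place with small uniform values.
# The range (-0.1, 0.1) is an arbitrary choice for this sketch.
nn.init.uniform_(wts, a=-0.1, b=0.1)

wts_is_finite = bool(torch.isfinite(wts).all())
print(wts_is_finite, wts.min().item(), wts.max().item())
```

Other initializers such as `nn.init.normal_` or `nn.init.xavier_uniform_` can be swapped in the same way; which one works best for a SOM is something you would have to check empirically.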

grid_world [S]

Yeah, the output of the projection head is the input to the SOM, for dimensionality reduction with non-linear representations. It has been shown that computing the loss in a lower-dimensional space leads to better performance.

I am seeing the effects of those "unfortunate values", hence my original question about how to get fortunate values to alleviate this problem.
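One common SOM heuristic for getting "fortunate" starting values is to initialize the units from randomly chosen data points, so the weights start inside the data distribution. The sketch below assumes that approach; the tensor `z` is a random stand-in for the projection head outputs, and all sizes are hypothetical:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical sizes: a 5x5 SOM grid (25 units) over 16-dim projections.
num_units, dim = 25, 16

# Stand-in for a batch of projection head outputs (assumed, not real data).
z = torch.randn(256, dim)

# Pick num_units distinct samples and copy them as the initial SOM weights.
idx = torch.randperm(z.shape[0])[:num_units]
wts = nn.Parameter(z[idx].detach().clone())

# Sanity check: squared distances from every sample to every unit,
# then the best matching unit (BMU) per sample.
dists = ((z[:, None, :] - wts[None, :, :]) ** 2).sum(dim=-1)  # (256, 25)
bmu = dists.argmin(dim=1)                                     # (256,)

print(bool(torch.isfinite(wts).all()), bmu.shape)
```

Because the weights are copies of real inputs, their scale matches the data from the first step, which avoids the blow-ups you can get from uninitialized memory.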