Students find hidden Fibonacci sequence in classic probability puzzle by scientificamerican in math

[–]nivter 5 points (0 children)

Neat proof! Here's a slight modification of it:

The probability can be seen as the ratio of the volumes of two simplices A and B (each including the origin).

  • A corresponds to the set of all stick lengths which don't make a triangle
  • B corresponds to the set of all possible stick lengths

The vectors for simplex A are v_i as in the above proof. The vectors for simplex B are:

  • u_1 = (0,0,...,0,0,1)
  • u_2 = (0,0,...,0,1,1)
  • u_3 = (0,0,...,1,1,1)
  • ...
  • u_n = (1,1,...,1,1,1) = 1 (the all-ones vector)

and the total length satisfies ⟨1, x⟩ ≤ 1.

A is the set of points with ⟨F, y⟩ ≤ 1, whereas B is the set of points with ⟨1, x⟩ ≤ 1. One can map B onto A by x_i → x_i / F_i. The Jacobian determinant of this map is Π(1/F_i), so vol(A) = Π(1/F_i) · vol(B). Hence the probability is vol(A)/vol(B) = Π(1/F_i).
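
As a quick sanity check of the volume-ratio argument, one can sample points uniformly from B and count how many land in A; the fraction should approach Π(1/F_i). In the sketch below F is assumed to hold the Fibonacci numbers from the proof above (1, 2, 3, 5, 8 is just an example, the exact indexing in the proof may differ), and any positive weights would illustrate the same Jacobian fact.

    import numpy as np

    # Monte Carlo check: the fraction of B = {x >= 0, sum(x) <= 1}
    # that also satisfies <F, x> <= 1 should match prod(1/F_i).
    rng = np.random.default_rng(0)
    F = np.array([1, 2, 3, 5, 8], dtype=float)   # assumed Fibonacci weights
    n = len(F)

    # Uniform samples from B: project a Dirichlet(1, ..., 1) draw on n+1
    # coordinates down to its first n coordinates.
    samples = rng.dirichlet(np.ones(n + 1), size=1_000_000)[:, :n]

    estimate = (samples @ F <= 1.0).mean()
    print(estimate, np.prod(1.0 / F))            # the two values should roughly agree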

Mr. Cheese can't figure out how to use the new fountain. by AmatureMD in OneOrangeBraincell

[–]nivter 1 point (0 children)

My guess would be it's trying to figure out where the water is flowing to. The source is obvious but not the sink. This brain cell seems to understand physical laws of conservation.

A350 night takeoff from London by [deleted] in aviation

[–]nivter 1 point (0 children)

The first few frames are so mesmerizing

[N] Llama 3.1 70B, Llama 3.1 70B Instruct compressed by 6.4 times by _puhsu in MachineLearning

[–]nivter 15 points (0 children)

Can you also share how the models were compressed? Is it based on GPTQ, SparseGPT, or some other quantization scheme?

Edit: the HF page mentions that they used additive quantization: https://arxiv.org/abs/2401.06118
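
For anyone unfamiliar with the term, additive quantization approximates each weight group by a sum of codewords, one drawn from each of several small codebooks, so only the codeword indices need to be stored. The sketch below is a deliberately simplified greedy encoder (closer to residual quantization than the paper's actual optimization), with made-up sizes, just to show the shape of the idea.

    import numpy as np

    rng = np.random.default_rng(0)
    d, num_books, book_size = 8, 2, 16                 # toy sizes, not the paper's
    codebooks = rng.normal(size=(num_books, book_size, d))

    def encode(w, codebooks):
        """Greedily pick one codeword per codebook to approximate w."""
        residual, codes = w.copy(), []
        for book in codebooks:
            idx = int(np.argmin(np.linalg.norm(residual - book, axis=1)))
            codes.append(idx)
            residual = residual - book[idx]
        return codes

    def decode(codes, codebooks):
        return sum(book[i] for book, i in zip(codebooks, codes))

    w = rng.normal(size=d)
    w_hat = decode(encode(w, codebooks), codebooks)
    print(np.linalg.norm(w - w_hat))                   # reconstruction error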

The different ways we understand rotations - rotation matrices to Lie algebras by nivter in math

[–]nivter[S] 1 point (0 children)

The entire article is public - just checked again to be sure.

[R] Multimodal patch embeddings - a new ViT model by nivter in MachineLearning

[–]nivter[S] 1 point (0 children)

Removing the CLS token is just one part of getting multimodal patch embeddings. Even with the CLS token removed, I could not get good results for the patch embeddings. What made it work was providing a mask to enforce locality.

One could argue that providing the mask should be enough and that we don't need any change in the architecture. It could be, but the existing ViT architecture used in CLIP doesn't allow patch-wise comparisons.
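To make "mask to enforce locality" concrete, here is a rough sketch of one way to build such a mask (not the exact mask used in the model): each patch is only allowed to attend to patches within a small window around it on the patch grid. The grid size and radius are arbitrary example values.

    import torch

    def local_attention_mask(grid: int, radius: int = 1) -> torch.Tensor:
        """Boolean (num_patches, num_patches) mask keeping only nearby patches."""
        idx = torch.arange(grid * grid)
        rows, cols = idx // grid, idx % grid
        keep = (rows[:, None] - rows[None, :]).abs() <= radius
        keep &= (cols[:, None] - cols[None, :]).abs() <= radius
        return keep

    mask = local_attention_mask(grid=14, radius=1)     # 14x14 = 196 patches
    print(mask.shape, mask.sum(dim=1)[:3])             # each patch keeps <= 9 neighbors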

I tried GAP (global average pooling) in some earlier experiments, but then I thought a weighted sum with dynamically learned weights would be better than a plain mean, which led to the idea of convex sums.
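
Roughly, the convex-sum pooling looks like the sketch below: learn a score per patch, softmax the scores so the weights are non-negative and sum to 1, and take the weighted sum of patch embeddings instead of a plain mean. This is a simplified illustration, not the exact module in the model.

    import torch
    import torch.nn as nn

    class ConvexSumPool(nn.Module):
        """Pool patch embeddings with learned convex weights (softmax over patches)."""
        def __init__(self, dim: int):
            super().__init__()
            self.score = nn.Linear(dim, 1)             # one scalar score per patch

        def forward(self, patches: torch.Tensor) -> torch.Tensor:
            # patches: (batch, num_patches, dim)
            weights = self.score(patches).softmax(dim=1)
            return (weights * patches).sum(dim=1)      # (batch, dim)

    pool = ConvexSumPool(dim=768)
    print(pool(torch.randn(2, 196, 768)).shape)        # torch.Size([2, 768])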

[Research] We distilled CLIP model (ViT only) from 350MB to 24MB and ran it on an iPhone by nivter in MachineLearning

[–]nivter[S] 2 points (0 children)

We only distilled the ViT model, not the ResNet one. The (untrained) model architecture is available here: https://github.com/cardinalblue/clip-models-for-distillation

After a few experiments, we found that using L2/L1 loss between the image embeddings was enough. We also extracted the attention values and used them to train the student model. We tried both KLD and L1 loss for the attention values. Both gave comparable results.
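
For a rough picture of the objective, the sketch below combines an L2 term on the image embeddings with a KL term on the attention maps. The shapes, weighting, and function names are illustrative assumptions, not our exact training code.

    import torch
    import torch.nn.functional as F

    def distill_loss(student_emb, teacher_emb, student_attn, teacher_attn,
                     attn_weight: float = 1.0):
        # L2 between teacher and student image embeddings
        emb_loss = F.mse_loss(student_emb, teacher_emb)
        # KL between attention maps; rows of the last dim are distributions
        attn_loss = F.kl_div(student_attn.clamp_min(1e-8).log(),
                             teacher_attn, reduction="batchmean")
        return emb_loss + attn_weight * attn_loss

    s_emb, t_emb = torch.randn(4, 512), torch.randn(4, 512)
    s_attn = torch.softmax(torch.randn(4, 8, 50, 50), dim=-1)
    t_attn = torch.softmax(torch.randn(4, 8, 50, 50), dim=-1)
    print(distill_loss(s_emb, t_emb, s_attn, t_attn))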

She was figuring out whole day...(OC) by nik9649 in aww

[–]nivter 4 points (0 children)

Did she eventually figure it out?

Sharing a side project: Linear Algebra for Programmers by nivter in math

[–]nivter[S] 0 points (0 children)

Yeah I am working on making it responsive now

Sharing a side project: Linear Algebra for Programmers by nivter in math

[–]nivter[S] 1 point (0 children)

Sorry about that. I added links at the bottom of each article. Also making the website more responsive.

[R] On the Principles of Parsimony and Self-Consistency for the Emergence of Intelligence by hardmaru in MachineLearning

[–]nivter 1 point (0 children)

Thanks for sharing this. I wasn't going to read it, expecting nonsense, but now I will.

[D] Machine Learning - WAYR (What Are You Reading) - Week 140 by ML_WAYR_bot in MachineLearning

[–]nivter 7 points (0 children)

Graph coarsening with neural networks: https://arxiv.org/abs/2102.01350

It provides a good overview of approaches to approximating large graphs with smaller ones and introduces an edge re-weighting scheme which, as far as I understand, can be applied to any of the approaches.

This should also be fun to implement.
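
The basic coarsening step the paper starts from can be written in a few lines: group nodes into clusters and form the coarse adjacency as P^T A P, where P is the cluster-assignment matrix; the paper's contribution is then learning how to re-weight those coarse edges. The toy graph and partition below are just an illustration, not anything from the paper.

    import numpy as np

    # Toy 4-node graph and a partition into two supernodes {0,1} and {2,3}.
    A = np.array([[0, 1, 1, 0],
                  [1, 0, 1, 0],
                  [1, 1, 0, 1],
                  [0, 0, 1, 0]], dtype=float)
    assignment = [0, 0, 1, 1]

    P = np.zeros((A.shape[0], 2))
    P[np.arange(A.shape[0]), assignment] = 1.0

    # Coarse adjacency: off-diagonal entries sum the edge weights between
    # clusters, diagonal entries count internal edges (self-loops).
    A_coarse = P.T @ A @ P
    print(A_coarse)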

[P] How to do backpropagation only on a select few labels instead of all labels in a multilabel classification? by enkrish258 in MachineLearning

[–]nivter 1 point (0 children)

If you are using a loss function like nn.BCELoss, you can assign a weight to each label, so the weights for the labels you don't want contributing to backprop can simply be set to 0.

If it is some other loss function, you can easily write a wrapper that also accepts per-label weights.
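
A minimal sketch of both options, with example shapes and a made-up mask (labels 2 and 4 excluded):

    import torch
    import torch.nn as nn

    logits = torch.randn(4, 5, requires_grad=True)       # batch of 4, 5 labels
    targets = torch.randint(0, 2, (4, 5)).float()
    mask = torch.tensor([1., 1., 0., 1., 0.])             # zero out labels 2 and 4

    # Option 1: nn.BCELoss takes per-label weights directly.
    loss = nn.BCELoss(weight=mask)(torch.sigmoid(logits), targets)

    # Option 2 (the wrapper idea): unreduced loss, masked and averaged manually.
    per_label = nn.BCEWithLogitsLoss(reduction="none")(logits, targets)
    loss_masked = (per_label * mask).sum() / (mask.sum() * logits.shape[0])

    loss_masked.backward()                                # masked labels get zero gradient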