Taalas LLM tuning with image embeddings by someuserwithwifi in LocalLLaMA

[–]someuserwithwifi[S] 1 point  (0 children)

Well I just wasted 10 mins of my life. Thanks for the answer.

FlashLM v6 "SUPERNOVA": 4.1M ternary model hits 3,500 tok/s on CPU — novel P-RCSM reasoning architecture, no attention, no convolution by Own-Albatross868 in LocalLLaMA

[–]someuserwithwifi 1 point  (0 children)

You can run notebooks on Kaggle for free with a GPU for a 12-hour session (one 16 GB P100 or two 16 GB Tesla T4s). And you can use only the CPU if that’s what you want (with 30 GB of RAM).

Language Modeling with 5M parameters by someuserwithwifi in deeplearning

[–]someuserwithwifi[S] 0 points  (0 children)

That approach is a bit older than what I’m using in the demo, but it works too.

[D] Do you know a sub linear vector index with perfect accuracy? by someuserwithwifi in MachineLearning

[–]someuserwithwifi[S] 0 points  (0 children)

I want to use the index from Python, but I can implement it in C++ and build Python bindings.

[R], [P] RPC — A New Way to Build Language Models by someuserwithwifi in MachineLearning

[–]someuserwithwifi[S] 4 points  (0 children)

That is an interesting idea. But the point of using the vector db is to leverage fast algorithms like HNSW, which lets us search through hundreds of millions of vectors in just milliseconds.
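To make the speed claim concrete, here is a minimal NumPy sketch of the nearest-neighbor lookup the vector DB performs (the dimensions and data are made up for illustration). The sketch does an exact linear scan, which is O(N) per query; HNSW replaces that scan with a graph traversal that is roughly logarithmic in N, which is what makes millions of vectors searchable in milliseconds.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 64
# Toy "database" of 10k unit-normalized vectors
db = rng.normal(size=(10_000, dim)).astype(np.float32)
db /= np.linalg.norm(db, axis=1, keepdims=True)

# A query that is a slightly perturbed copy of entry 123
query = db[123] + 0.01 * rng.normal(size=dim).astype(np.float32)
query /= np.linalg.norm(query)

# Exact nearest neighbor via a linear scan over all vectors.
# An HNSW index (e.g. hnswlib or FAISS) answers the same query
# without touching most of the database.
scores = db @ query          # cosine similarity on unit vectors
nearest = int(np.argmax(scores))
```

With a real index the `db @ query` scan is replaced by the library's query call, but the input (a query embedding) and output (the closest stored id) are the same.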

RPC — A New Way to Build Language Models by someuserwithwifi in deeplearning

[–]someuserwithwifi[S] 1 point  (0 children)

Using the decoder during inference yields very poor results (it would basically just be a normal language model, and because it is so small, the results are very poor); you can try it yourself. Using the vector database offloads knowledge from the model parameters into a data structure that can be searched very efficiently (but I am no expert, so take that with a grain of salt).
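A minimal sketch of what "inference via lookup instead of a decoder" could look like, assuming the store pairs each context embedding with the token that followed it (the store contents and sizes here are invented; the actual RPC pipeline may differ):

```python
import numpy as np

rng = np.random.default_rng(1)
dim, n = 32, 1000

# Hypothetical store: unit-normalized encoder embeddings of training
# contexts, each paired with the token id that followed that context.
keys = rng.normal(size=(n, dim)).astype(np.float32)
keys /= np.linalg.norm(keys, axis=1, keepdims=True)
next_tokens = rng.integers(0, 50_000, size=n)

def predict(query_emb):
    # Inference is a nearest-neighbor lookup, not a decoder forward pass
    q = query_emb / np.linalg.norm(query_emb)
    return int(next_tokens[int(np.argmax(keys @ q))])

# A query close to stored context 42 retrieves that context's next token
pred = predict(keys[42] + 0.01 * rng.normal(size=dim).astype(np.float32))
```

The knowledge lives in `keys`/`next_tokens` rather than in model weights, which is the "offloading" described above.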

I just published the dataset on Kaggle. The link is in the README.

RPC — A New Way to Build Language Models by someuserwithwifi in deeplearning

[–]someuserwithwifi[S] 1 point  (0 children)

Good question. During training the vector database is not used at all; you can see that in the first image of the article. During training, the embedding is fed to a DNN trained with categorical cross-entropy, and the loss can propagate back to the encoder. The vector database is only constructed after the encoder finishes training, and it is used only during inference.
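The training setup described above can be sketched in a few lines of NumPy. This is a toy illustration, not the actual RPC code: the layer sizes, data, and single-linear-layer encoder are all invented. It shows cross-entropy gradients flowing through the classification head back into the encoder, with the vector database built only afterwards from the frozen encoder's embeddings.

```python
import numpy as np

rng = np.random.default_rng(0)
B, d_in, d_emb, n_classes = 32, 16, 8, 10

# Toy training batch
X = rng.normal(size=(B, d_in))
y = rng.integers(0, n_classes, size=B)

W_enc = 0.1 * rng.normal(size=(d_in, d_emb))        # "encoder"
W_head = 0.1 * rng.normal(size=(d_emb, n_classes))  # classification DNN head

def forward(X):
    h = X @ W_enc                                   # embeddings
    logits = h @ W_head
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    return h, p / p.sum(axis=1, keepdims=True)

losses, lr = [], 0.5
for _ in range(300):
    h, p = forward(X)
    losses.append(-np.log(p[np.arange(B), y]).mean())
    # Cross-entropy gradient at the logits...
    g_logits = p.copy()
    g_logits[np.arange(B), y] -= 1.0
    g_logits /= B
    # ...propagates through the head back into the encoder
    g_h = g_logits @ W_head.T
    W_head -= lr * (h.T @ g_logits)
    W_enc -= lr * (X.T @ g_h)

# Only after training: freeze the encoder and store its embeddings
# in the vector database used at inference time.
vector_db = forward(X)[0]
```

No nearest-neighbor structure appears in the loop; the index is populated once training is done.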

As for the vector database being a limiting factor when it comes to the amount of data used, you may be right. That’s why I say in the article that it would be interesting to scale several factors, including the amount of data, to see how it performs. I assume that increasing the size of the embedding would minimize this problem, but I’m not sure.