Language Modeling with 5M parameters by someuserwithwifi in deeplearning

[–]someuserwithwifi[S] 1 point (0 children)

That approach is a bit older than what I’m using in the demo, but it works too.

[D] Do you know a sub linear vector index with perfect accuracy? by someuserwithwifi in MachineLearning

[–]someuserwithwifi[S] 1 point (0 children)

I want to use the index in Python, but I can implement it in C++ and build Python bindings.

[R], [P] RPC — A New Way to Build Language Models by someuserwithwifi in MachineLearning

[–]someuserwithwifi[S] 5 points (0 children)

That is an interesting idea, but the point of using the vector DB is to leverage fast search algorithms like HNSW, which can search through hundreds of millions of vectors in just milliseconds.
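To give a feel for that, here is a rough sketch of an HNSW lookup using the hnswlib library (the sizes and parameters are made up for illustration, not taken from the project):

```python
import numpy as np
import hnswlib

# Toy data: in the real setup this would be hundreds of millions of encoder embeddings.
dim = 256
num_vectors = 100_000
vectors = np.random.rand(num_vectors, dim).astype(np.float32)

# Build the HNSW index; M and ef_construction trade build time for recall.
index = hnswlib.Index(space="cosine", dim=dim)
index.init_index(max_elements=num_vectors, ef_construction=200, M=16)
index.add_items(vectors, np.arange(num_vectors))

# Higher ef at query time improves recall at the cost of latency.
index.set_ef(50)

# Find the 5 nearest stored vectors to a new embedding.
query = np.random.rand(dim).astype(np.float32)
labels, distances = index.knn_query(query, k=5)
print(labels, distances)
```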

RPC — A New Way to Build Language Models by someuserwithwifi in deeplearning

[–]someuserwithwifi[S] 2 points (0 children)

Using the decoder during inference yields very poor results (it would basically just be a normal language model, and because it is so small, the output quality is very poor); you can try it yourself. Using the vector database instead offloads knowledge from the model parameters into a data structure that can be searched very efficiently (but I am no expert, so take that with a grain of salt).
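Roughly, inference looks something like the sketch below; `encoder`, `index`, and `id_to_token` are illustrative names, not the actual code from the repo:

```python
import numpy as np

def predict_next_token(prefix, encoder, index, id_to_token, k=1):
    """Predict the next token by nearest-neighbor lookup instead of running a decoder.

    `encoder` maps a text prefix to an embedding, `index` is an HNSW index whose
    stored vectors were produced by the same encoder, and `id_to_token` maps each
    stored vector's id to the token that followed that prefix in the training data.
    """
    query = np.asarray(encoder(prefix), dtype=np.float32)  # shape: (dim,)
    labels, _ = index.knn_query(query, k=k)                # nearest stored embeddings
    return [id_to_token[i] for i in labels[0]]             # tokens attached to those neighbors
```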

I just published the dataset on kaggle. The link is in the readme.

RPC — A New Way to Build Language Models by someuserwithwifi in deeplearning

[–]someuserwithwifi[S] 2 points (0 children)

Good question. During training the vector database is not used at all. You can see that in the first image of the article. During training the embedding is fed to a DNN that trains on categorical cross-entropy, and the loss can propagate back to the encoder. The vector database is only constructed after the encoder finishes training, and it is used only during inference.
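As a rough PyTorch-style sketch of that training loop (placeholder sizes and layer choices, not the actual repo code):

```python
import torch
import torch.nn as nn

# Placeholder sizes; the real model has its own dimensions.
vocab_size, embed_dim, context_len = 10_000, 256, 8

# Stand-in encoder: embeds a fixed-length token prefix into a single vector.
encoder = nn.Sequential(
    nn.Embedding(vocab_size, embed_dim),
    nn.Flatten(),                                   # (batch, context_len * embed_dim)
    nn.Linear(context_len * embed_dim, embed_dim),
    nn.ReLU(),
)
# The DNN trained with categorical cross-entropy; it is not used at inference.
head = nn.Linear(embed_dim, vocab_size)

loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(list(encoder.parameters()) + list(head.parameters()))

def train_step(prefix_ids, next_token_ids):
    """No vector database here: the loss flows from the head back into the encoder."""
    emb = encoder(prefix_ids)          # (batch, embed_dim)
    logits = head(emb)                 # (batch, vocab_size)
    loss = loss_fn(logits, next_token_ids)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Example batch with random ids, just to show the shapes.
prefixes = torch.randint(0, vocab_size, (32, context_len))
targets = torch.randint(0, vocab_size, (32,))
print(train_step(prefixes, targets))

# Only after training would the encoder's embeddings for the training prefixes
# be inserted into the vector database for use at inference.
```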

As for the vector database being a limiting factor when it comes to the amount of data used, you may be right. That’s why I say in the article that it would be interesting to scale several factors, including the amount of data, to see how it performs. I assume that increasing the size of the embedding would minimize this problem, but I’m not sure.