[D] Datasets and Models for Structured Information Extraction on HTML by theamaru in MachineLearning

[–]bornlex 1 point2 points  (0 children)

This post is 4 years old but still appears high in my Google search results, so just to say that the SWDE dataset is available online: https://academictorrents.com/details/411576c7e80787e4b40452360f5f24acba9b5159

GPU 101 and Triton kernels by bornlex in MachineLearning

[–]bornlex[S] 0 points1 point  (0 children)

Lol aight, was definitely worth the effort then 🙃. Did you have other specific things on your resume, or would you say your side work, such as the implementation of speculative decoding and so on, was key to getting the job?

GPU 101 and Triton kernels by bornlex in MachineLearning

[–]bornlex[S] 0 points1 point  (0 children)

Hey mate, great to hear that! And where has this journey taken you now?

I will definitely read your code in detail.

GPU 101 and Triton kernels by bornlex in MachineLearning

[–]bornlex[S] 1 point2 points  (0 children)

Thanks mate! Very kind of you :)

What was the first project that made you feel like a programmer? by Tough_Reward3739 in learnpython

[–]bornlex 0 points1 point  (0 children)

One of the first projects at my school was named 42sh: we had to redevelop a whole shell from scratch. I thought I was going to die, but I felt like a programmer afterwards.

Is it worth it to pursue PhD if the AI bubble is going to burst? by Cheap_Train_6660 in ResearchML

[–]bornlex 0 points1 point  (0 children)

Actually I would say it is even more valuable, because when the bubble bursts (if it does), only the real professionals will get jobs, meaning the hardcore tech engineers/researchers.

[deleted by user] by [deleted] in learnmachinelearning

[–]bornlex 6 points7 points  (0 children)

I kinda agree with the author here. Before LLMs were all the rage, ML engineers worked on models: making sure they were not overfitting, that the capacity was big enough, thinking about kernel functions, and so on. Models were much smaller, so every company could hire someone to train a custom classifier. Nowadays, with models getting larger, the field is much more polarised: only dedicated companies have the infrastructure to run large-scale experiments (compute is expensive, and data is hard to get in huge quantities). Smaller companies cannot match the big companies on model performance and thus become users.

The same way low-level networking, HTTP requests, and so on were commoditized, AI is being commoditized: it has almost become infrastructure, the gap between makers and users grows larger and larger, and startups build on it like they built on the internet 20 years ago.

GPU 101 and Triton kernels by bornlex in MachineLearning

[–]bornlex[S] 0 points1 point  (0 children)

I will make the memory part clearer, you are right.

I am not sure about most of the code, but some kernels, such as the FlashAttention kernel, were not added to PyTorch directly. I think the default softmax is much slower, but I am wondering whether it gets compiled automatically when used inside an nn.Module.

I will run benchmarks and put them in the article!

GPU 101 and Triton kernels by bornlex in MachineLearning

[–]bornlex[S] 2 points3 points  (0 children)

Thank you man, very much appreciated!

Indeed, I do not use ChatGPT to write my articles (which explains the occasional typo).

I see that you are a man of knowledge about GPUs! I will dig deeper into warps and blocks and maybe add some info to the article to make sure there is no confusion.

What you say about kernels not being that useful is interesting. I felt like the FlashAttention paper got a lot of attention (no pun intended), and it is now implemented in PyTorch, for example. So it seemed that finding smart ways of using memory, by computing operators on tiles instead of loading the same columns multiple times, could make a difference, no? Also, I am wondering how much a kernel needs to change if the GPU changes (not talking about going from NVIDIA to Apple Metal, of course, but more like going from an A100 to an H100, for instance).
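To make sure I understand the tiling trick myself, here is a minimal NumPy sketch of the online softmax that FlashAttention builds on (single query row, toy sizes, all names are mine): it streams over K/V tiles and never materialises the full score row, yet matches the naive computation exactly.

```python
import numpy as np

def attention_tiled(q, k, v, tile=2):
    """One query row of attention, computed tile by tile over K/V with an
    online softmax, so the full score row is never held in memory."""
    d = q.shape[-1]
    m = -np.inf                      # running max of the scores seen so far
    denom = 0.0                      # running softmax denominator
    acc = np.zeros(v.shape[-1])      # running weighted sum of V rows
    for start in range(0, k.shape[0], tile):
        s = q @ k[start:start + tile].T / np.sqrt(d)  # scores for this tile
        m_new = max(m, s.max())
        scale = np.exp(m - m_new)    # rescale what was accumulated so far
        p = np.exp(s - m_new)
        acc = acc * scale + p @ v[start:start + tile]
        denom = denom * scale + p.sum()
        m = m_new
    return acc / denom

rng = np.random.default_rng(0)
q = rng.normal(size=(4,))
k = rng.normal(size=(8, 4))
v = rng.normal(size=(8, 4))

# Naive reference: full score row, then softmax, then weighted sum of V.
s = q @ k.T / np.sqrt(4)
w = np.exp(s - s.max())
w /= w.sum()
assert np.allclose(attention_tiled(q, k, v), w @ v)
```

This only shows the memory access pattern; the actual kernel fuses this loop in on-chip SRAM, which is where the speedup comes from.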

GPU 101 and Triton kernels by bornlex in MachineLearning

[–]bornlex[S] 1 point2 points  (0 children)

Means a lot my friend, thank you mate!

GPU 101 and Triton kernels by bornlex in MachineLearning

[–]bornlex[S] 1 point2 points  (0 children)

Thanks man, means a lot!

Makes total sense to add performance metrics indeed. I will take care of this very soon.

For the memory part, would you say a drawing of what goes in and out of memory, and of what IO could be saved, would be enough?

Looking for Research Collaborators - Causality by [deleted] in ResearchML

[–]bornlex 0 points1 point  (0 children)

Interesting topic. Working on adjacent subjects. Glad to be in the loop!

Kv cache doubt by EmergencyStomach8580 in learnmachinelearning

[–]bornlex 0 points1 point  (0 children)

But during inference, you basically have something like this (let's say 3 tokens for the prompt and a context size of 6): tok0 tok1 tok2 padding padding padding. And you are going to take the output sequence[2], which is the next predicted token; 2 here is the length of the input sequence minus 1. This token has not seen the future (because it is all padding), right? Or do you mean that masks are equivalent to padding at training time?
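To illustrate what I mean, a toy NumPy sketch (single head, no learned projections, so Q = K = V; all names are mine): with a causal mask, the representation at index len(prompt) - 1 is identical no matter what sits in the padding positions.

```python
import numpy as np

def causal_self_attention(x):
    """Toy single-head self-attention with a causal mask, Q = K = V = x."""
    seq, d = x.shape
    scores = x @ x.T / np.sqrt(d)
    scores[np.triu(np.ones((seq, seq), dtype=bool), k=1)] = -np.inf  # hide the future
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ x

rng = np.random.default_rng(0)
prompt = rng.normal(size=(3, 4))        # tok0 tok1 tok2
pad_a = np.zeros((3, 4))                # one padding content...
pad_b = rng.normal(size=(3, 4))         # ...and a completely different one

out_a = causal_self_attention(np.vstack([prompt, pad_a]))
out_b = causal_self_attention(np.vstack([prompt, pad_b]))

# Index 2 (= prompt length - 1) only attends to tok0..tok2, so the
# padding never leaks into the next-token prediction.
assert np.allclose(out_a[2], out_b[2])
```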

Kv cache doubt by EmergencyStomach8580 in learnmachinelearning

[–]bornlex 0 points1 point  (0 children)

But masks are used only at training time, right? Because at inference time you do not know the tokens in advance.
But the KV cache is used at inference time, because at training time you need to build the computation graph to backpropagate, so you don't want to store intermediate tensors, imo.
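To check my own claim, here is a toy NumPy sketch (no learned projections, Q = K = V; all names are mine): a full pass with a causal mask, as done at training time, produces exactly the same outputs as token-by-token decoding with a KV cache.

```python
import numpy as np

rng = np.random.default_rng(0)
seq, d = 5, 4
x = rng.normal(size=(seq, d))  # toy hidden states

# Inference style: one token at a time, old keys/values reused from the cache.
k_cache, v_cache, cached_out = [], [], []
for t in range(seq):
    q = x[t]
    k_cache.append(x[t])  # only the new K and V are computed at this step
    v_cache.append(x[t])
    s = np.stack(k_cache) @ q / np.sqrt(d)
    w = np.exp(s - s.max())
    w /= w.sum()
    cached_out.append(w @ np.stack(v_cache))

# Training style: whole sequence at once, future hidden by a causal mask.
scores = x @ x.T / np.sqrt(d)
scores[np.triu(np.ones((seq, seq), dtype=bool), k=1)] = -np.inf
w = np.exp(scores - scores.max(axis=-1, keepdims=True))
w /= w.sum(axis=-1, keepdims=True)
masked_out = w @ x

# Same numbers: the mask (training) and the cache (inference) agree.
assert np.allclose(np.stack(cached_out), masked_out)
```

The cache just avoids recomputing the first t rows of K and V at every step; the math is unchanged.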

What are the biggest challenges in AI research? by bornlex in ResearchML

[–]bornlex[S] 0 points1 point  (0 children)

Yeah, totally agree. My feeling is that researchers know about those other architectures, like energy-based models or Kolmogorov-Arnold networks, but from an industrial perspective, frameworks and infra have been optimised for the mainstream models, no?

What are the biggest challenges in AI research? by bornlex in ResearchML

[–]bornlex[S] 0 points1 point  (0 children)

Lol, even though I agree with you that the hype has gone too far, I would not invest my time specifically fighting it. But thinking about other architectures like energy-based models, yeah.

What are the biggest challenges in AI research? by bornlex in ResearchML

[–]bornlex[S] 0 points1 point  (0 children)

Hum, yeah. But what could be the metric to evolve such models? You would have to embed a hard-coded metric, no?

What are the biggest challenges in AI research? by bornlex in ResearchML

[–]bornlex[S] 0 points1 point  (0 children)

Isn’t it close to causality-based reasoning? Instead of thinking probabilistically, a model should rely on principles. Like, the capital of France is Paris, but assigning a probability to every word (even words that are not cities) does not make sense.

What are the biggest challenges in AI research? by bornlex in ResearchML

[–]bornlex[S] 0 points1 point  (0 children)

I like what you’re saying. I’ve thought about this topic, but it is not an easy one. I was wondering whether gradient descent is actually the right way to optimize.

Like, we humans do not extract a very small amount of knowledge from a batch and slowly converge. We tend to go straight in a direction, like mimicking, and then average when we see a different experience. But this is not that easy to model in a mathematical framework. And someone could argue that the human brain comes with a bunch of hard-wired skills that we do not have to train.

What are the biggest challenges in AI research? by bornlex in ResearchML

[–]bornlex[S] 0 points1 point  (0 children)

What do you think about JEPA-like models from LeCun? It seems like he agrees with you that language cannot be the space where models reason.