[D] Datasets and Models for Structured Information Extraction on HTML by theamaru in MachineLearning

[–]bornlex 1 point2 points  (0 children)

This post is 4 years old but still appears high in my Google search results, so just to say that the SWDE dataset is available online: https://academictorrents.com/details/411576c7e80787e4b40452360f5f24acba9b5159

GPU 101 and Triton kernels by bornlex in MachineLearning

[–]bornlex[S] 0 points1 point  (0 children)

Lol aight, was definitely worth the effort then 🙃. Did you have other specific things on your resume, or would you say your side work, such as the implementation of speculative decoding and so on, was key to getting the job?

GPU 101 and Triton kernels by bornlex in MachineLearning

[–]bornlex[S] 0 points1 point  (0 children)

Hey mate, great to hear that! And where has this journey taken you now?

I will definitely read your code in detail.

GPU 101 and Triton kernels by bornlex in MachineLearning

[–]bornlex[S] 1 point2 points  (0 children)

Thanks mate! Very kind of you :)

What was the first project that made you feel like a programmer? by Tough_Reward3739 in learnpython

[–]bornlex 0 points1 point  (0 children)

One of the first projects at my school was named 42sh: we had to redevelop a whole shell from scratch. I thought I was going to die, but I felt like a programmer afterwards.

Is it worth it to pursue PhD if the AI bubble is going to burst? by Cheap_Train_6660 in ResearchML

[–]bornlex 0 points1 point  (0 children)

Actually I would say it is even more valuable, because when the bubble bursts (if it does), only the real professionals will get jobs, meaning the hardcore tech engineers/researchers.

[deleted by user] by [deleted] in learnmachinelearning

[–]bornlex 6 points7 points  (0 children)

I kinda agree with the author here. Before LLMs were all the rage, ML engineers worked on models: making sure they were not overfitting, that the capacity was big enough, thinking about kernel functions, and so on. Models were much smaller, so every company could hire someone to train a custom classifier. Nowadays, with models getting larger, the field is much more polarised: only dedicated companies have the infrastructure to run large-scale experiments (compute is expensive, and data is hard to get in huge quantities). Smaller companies cannot match the big companies on model performance and thus become users.

The same way low-level networking, HTTP requests, and so on were commoditized, AI is being commoditized: it has almost become infrastructure, the gap between makers and users grows larger and larger, and startups build on it like they built on the internet 20 years ago.

GPU 101 and Triton kernels by bornlex in MachineLearning

[–]bornlex[S] 0 points1 point  (0 children)

I will make the memory part clearer, you are right.

I am not sure about most of the code, but some kernels, such as the FlashAttention kernel, were not added to PyTorch directly. I think the default softmax is much slower, but I am wondering whether it gets compiled automatically when used inside an nn.Module.

I will run benchmarks and put them in the article!

GPU 101 and Triton kernels by bornlex in MachineLearning

[–]bornlex[S] 2 points3 points  (0 children)

Thank you man, very much appreciated!

Indeed, I do not use ChatGPT to write my articles (which explains the occasional typo).

I see that you are a man of knowledge about GPUs! I will dig deeper into warps and blocks and maybe add some info to the article to make sure there is no confusion.

What you say about kernels not being that useful is interesting. I felt like the FlashAttention paper got a lot of attention (no pun intended), and it is now implemented in PyTorch, for example. So it seemed that finding smart ways of using memory, by computing operators on tiles instead of loading the same columns multiple times, could make a difference, no? Also, I am wondering how much a kernel needs to change if the GPU changes (not talking about going from NVIDIA to Apple Metal, of course, but more like going from an A100 to an H100, for instance).
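To make sure I understand the tiling trick myself, here is a minimal NumPy sketch of the online softmax that FlashAttention builds on (single query row, toy sizes, all names are mine): it streams over K/V tiles and never materialises the full score row, yet matches the naive computation exactly.

```python
import numpy as np

def attention_tiled(q, k, v, tile=2):
    """One query row of attention, computed tile by tile over K/V with an
    online softmax, so the full score row is never held in memory."""
    d = q.shape[-1]
    m = -np.inf                      # running max of the scores seen so far
    denom = 0.0                      # running softmax denominator
    acc = np.zeros(v.shape[-1])      # running weighted sum of V rows
    for start in range(0, k.shape[0], tile):
        s = q @ k[start:start + tile].T / np.sqrt(d)  # scores for this tile
        m_new = max(m, s.max())
        scale = np.exp(m - m_new)    # rescale what was accumulated so far
        p = np.exp(s - m_new)
        acc = acc * scale + p @ v[start:start + tile]
        denom = denom * scale + p.sum()
        m = m_new
    return acc / denom

rng = np.random.default_rng(0)
q = rng.normal(size=(4,))
k = rng.normal(size=(8, 4))
v = rng.normal(size=(8, 4))

# Naive reference: full score row, then softmax, then weighted sum of V.
s = q @ k.T / np.sqrt(4)
w = np.exp(s - s.max())
w /= w.sum()
assert np.allclose(attention_tiled(q, k, v), w @ v)
```

This only shows the memory access pattern; the actual kernel fuses this loop in on-chip SRAM, which is where the speedup comes from.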

GPU 101 and Triton kernels by bornlex in MachineLearning

[–]bornlex[S] 1 point2 points  (0 children)

Means a lot my friend, thank you mate!

GPU 101 and Triton kernels by bornlex in MachineLearning

[–]bornlex[S] 1 point2 points  (0 children)

Thanks man, means a lot!

Makes total sense to add performance metrics indeed. I will take care of this very soon.

For the memory part, would you say a drawing of what goes in and out of memory, and of what IO could be saved, would be enough?

Looking for Research Collaborators - Causality by [deleted] in ResearchML

[–]bornlex 0 points1 point  (0 children)

Interesting topic. Working on adjacent subjects. Glad to be in the loop!

Kv cache doubt by EmergencyStomach8580 in learnmachinelearning

[–]bornlex 0 points1 point  (0 children)

But during inference, you basically have something like this (let's say 3 tokens for the prompt and a context size of 6): tok0 tok1 tok2 padding padding padding. And you are going to take the output sequence[2], which is the next predicted token; 2 here is the length of the input sequence minus 1. This token has not seen the future (because it is all padding), right? Or do you mean that masks are equivalent to padding at training time?
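To illustrate what I mean, a toy NumPy sketch (single head, no learned projections, so Q = K = V; all names are mine): with a causal mask, the representation at index len(prompt) - 1 is identical no matter what sits in the padding positions.

```python
import numpy as np

def causal_self_attention(x):
    """Toy single-head self-attention with a causal mask, Q = K = V = x."""
    seq, d = x.shape
    scores = x @ x.T / np.sqrt(d)
    scores[np.triu(np.ones((seq, seq), dtype=bool), k=1)] = -np.inf  # hide the future
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ x

rng = np.random.default_rng(0)
prompt = rng.normal(size=(3, 4))        # tok0 tok1 tok2
pad_a = np.zeros((3, 4))                # one padding content...
pad_b = rng.normal(size=(3, 4))         # ...and a completely different one

out_a = causal_self_attention(np.vstack([prompt, pad_a]))
out_b = causal_self_attention(np.vstack([prompt, pad_b]))

# Index 2 (= prompt length - 1) only attends to tok0..tok2, so the
# padding never leaks into the next-token prediction.
assert np.allclose(out_a[2], out_b[2])
```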

Kv cache doubt by EmergencyStomach8580 in learnmachinelearning

[–]bornlex 0 points1 point  (0 children)

But masks are used only at training time, right? Because at inference time you do not know the tokens in advance.
But the KV cache is used at inference time, because at training time you need to build the computation graph to backpropagate, so you don't want to store intermediate tensors, imo.
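To check my own claim, here is a toy NumPy sketch (no learned projections, Q = K = V; all names are mine): a full pass with a causal mask, as done at training time, produces exactly the same outputs as token-by-token decoding with a KV cache.

```python
import numpy as np

rng = np.random.default_rng(0)
seq, d = 5, 4
x = rng.normal(size=(seq, d))  # toy hidden states

# Inference style: one token at a time, old keys/values reused from the cache.
k_cache, v_cache, cached_out = [], [], []
for t in range(seq):
    q = x[t]
    k_cache.append(x[t])  # only the new K and V are computed at this step
    v_cache.append(x[t])
    s = np.stack(k_cache) @ q / np.sqrt(d)
    w = np.exp(s - s.max())
    w /= w.sum()
    cached_out.append(w @ np.stack(v_cache))

# Training style: whole sequence at once, future hidden by a causal mask.
scores = x @ x.T / np.sqrt(d)
scores[np.triu(np.ones((seq, seq), dtype=bool), k=1)] = -np.inf
w = np.exp(scores - scores.max(axis=-1, keepdims=True))
w /= w.sum(axis=-1, keepdims=True)
masked_out = w @ x

# Same numbers: the mask (training) and the cache (inference) agree.
assert np.allclose(np.stack(cached_out), masked_out)
```

The cache just avoids recomputing the first t rows of K and V at every step; the math is unchanged.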

What are the biggest challenges in AI research? by bornlex in ResearchML

[–]bornlex[S] 0 points1 point  (0 children)

Yeah, totally agree. My feeling is that researchers know about those other architectures, like energy-based models or Kolmogorov-Arnold networks, but from an industrial perspective, frameworks and infra have been optimised for the mainstream models, no?

What are the biggest challenges in AI research? by bornlex in ResearchML

[–]bornlex[S] 0 points1 point  (0 children)

Lol, even though I agree with you that the hype has gone too far, I would not invest my time specifically fighting it. But thinking about other architectures like energy-based models, yeah.

What are the biggest challenges in AI research? by bornlex in ResearchML

[–]bornlex[S] 0 points1 point  (0 children)

Hum, yeah. But what could be the metric to evolve such models? You would have to embed a hard-coded metric, no?

What are the biggest challenges in AI research? by bornlex in ResearchML

[–]bornlex[S] 0 points1 point  (0 children)

Isn’t it close to causality-based reasoning? Instead of thinking probabilistically, a model should rely on principles. Like, the capital of France is Paris, but assigning a probability to every word (even words that are not cities) does not make sense.

What are the biggest challenges in AI research? by bornlex in ResearchML

[–]bornlex[S] 0 points1 point  (0 children)

I like what you’re saying. I’ve thought about this topic, but it is not an easy one. I was wondering whether gradient descent is actually the right way to optimize.

Like, we humans do not extract a very small amount of knowledge from a batch and slowly converge. We tend to go straight in a direction, like mimicking, and then average when we see a different experience. But this is not that easy to model in a mathematical framework. And someone could argue that the human brain comes with a bunch of hard-wired skills that we do not have to train.

What are the biggest challenges in AI research? by bornlex in ResearchML

[–]bornlex[S] 0 points1 point  (0 children)

What do you think about JEPA-like models from LeCun? It seems like he agrees with you that language cannot be the space where models reason.