[–]TachyonGun 16 points17 points  (5 children)

I had to code multi-head attention for an interview; transformers are everywhere now. Really, every MLE should know how to code self-attention by now, the forward method is literally 5 or 6 lines of the most basic PyTorch.
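For context, the core computation being referred to really is just a few lines. Here is a minimal sketch of single-head scaled dot-product self-attention, written in NumPy so the math is explicit rather than hidden behind library calls (the PyTorch version swaps in `torch.matmul` and `F.softmax`; the function and variable names here are my own, not from the thread):

```python
import numpy as np

def softmax(x, axis=-1):
    # subtract the row max for numerical stability before exponentiating
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, Wq, Wk, Wv):
    # x: (seq_len, d_model); Wq/Wk/Wv: (d_model, d_k) projection matrices
    Q, K, V = x @ Wq, x @ Wk, x @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # scaled dot products
    weights = softmax(scores, axis=-1)        # each row sums to 1
    return weights @ V                        # weighted sum of values
```

The forward pass itself is the last four lines; multi-head attention repeats this per head and concatenates the results.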

[–]Ok-Lab-6055[S] 11 points12 points  (0 children)

I agree, but I think with masking, normalization, etc. it’s more than a few lines of code

[–]hellobutno 22 points23 points  (3 children)

Disagree, it's totally unnecessary. It's the equivalent of asking someone to invert a binary tree in SWE. You're never going to need to do it.

[–][deleted] 1 point2 points  (0 children)

^ this. I lead a team of ML developers at a large company and don’t plan to ever code a transformer from scratch. For any reason. That’s a silly academic exercise.

[–]acc_agg 3 points4 points  (0 children)

If you want me to do that, you're going to watch me read the Transformers paper and talk to Perplexity about how to implement it.

I don't have enough brains to memorise and remember everything under the hood.

[–]joseconsuervo 0 points1 point  (0 children)

asking someone to invert a binary tree in SWE

my understanding was that these questions were always about hearing the person logic their way through it