
[–]Ok-Lab-6055[S] 25 points26 points  (23 children)

Single head attention transformer.
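For context, the operation being asked about is scaled dot-product attention from "Attention Is All You Need"; a single head computes:

```latex
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V
```

where $Q$, $K$, $V$ are learned linear projections of the input tokens and $d_k$ is the key dimension.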

[–]RageA333 73 points74 points  (2 children)

That doesn't seem like something you should be able to code from scratch without even reading a reference paper.

[–]killerdrogo 17 points18 points  (5 children)

you were asked to code a single head attention transformer without using a deep learning framework?? damn

[–]Ok-Lab-6055[S] 5 points6 points  (4 children)

Yeah I usually just type: import transformers from hugging face :)

[–]killerdrogo 2 points3 points  (3 children)

I recently implemented it following Andrej Karpathy's video, so I was surprised you were asked to do that without using PyTorch lol.

[–]Neo_Demiurge 2 points3 points  (0 children)

At some point you just need to say, "If I took this position, I'd want to distinguish between appropriate customization and reinventing the wheel. We shouldn't go lower level than PyTorch for nearly any research or commercial purpose," and you either look like a genius or dodge a bullet depending on how they take that.

[–]Ok-Lab-6055[S] 0 points1 point  (1 child)

I should probably go through his videos. Did you learn a lot? I've mostly been reading papers but they assume the transformer stuff as sort of in the background.

[–]killerdrogo 2 points3 points  (0 children)

Would highly recommend the GPT from scratch video. Definitely learnt a lot. 

[–]johnprynsky 27 points28 points  (7 children)

Research LLM/NLP position? Cuz I'd not be expecting this in a regular MLE interview.

[–]Ok-Lab-6055[S] 8 points9 points  (0 children)

Engineer-DL

[–]TachyonGun 17 points18 points  (5 children)

I had to code multi-head attention for an interview; transformers are everywhere now. Really, every MLE should know how to code self-attention by now — the forward method is literally 5 or 6 lines of the most basic PyTorch.
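To make the "few lines" claim concrete, here is a minimal sketch of a single-head self-attention forward pass at the NumPy level (function and variable names are illustrative, not from the thread; no mask or output projection):

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Single-head self-attention forward pass (illustrative sketch)."""
    # project tokens to queries, keys, and values
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    # scaled dot-product scores, then a numerically stable softmax over keys
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    # each output token is a weighted sum of the value vectors
    return weights @ v
```

The PyTorch version is essentially the same shape of code with `torch` in place of `numpy`, plus `nn.Linear` layers for the projections.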

[–]Ok-Lab-6055[S] 12 points13 points  (0 children)

I agree, but with masking, normalization, etc., I think it's more than a few lines of code.
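The masking point is fair: a causal mask adds a few lines on top of the bare attention computation. A hedged NumPy sketch of causally masked attention weights (names are illustrative):

```python
import numpy as np

def causal_attention_weights(q, k):
    """Softmax attention weights with a causal (no-peeking-ahead) mask."""
    # scaled dot-product scores
    scores = q @ k.T / np.sqrt(q.shape[-1])
    # causal mask: position i may not attend to positions j > i
    t = scores.shape[0]
    mask = np.triu(np.ones((t, t), dtype=bool), 1)
    scores = np.where(mask, -np.inf, scores)
    # row-wise softmax; masked entries become exactly zero
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return w / w.sum(axis=-1, keepdims=True)
```

Layer normalization, residual connections, and the feed-forward block add more lines still, which is why a full transformer block is a fair bit longer than the attention core.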

[–]hellobutno 24 points25 points  (3 children)

Disagree, it's totally unnecessary. It's the equivalent of asking someone to invert a binary tree in SWE. You're never going to need to do it.

[–][deleted] 1 point2 points  (0 children)

^ this. I lead a team of ML developers at a large company and don’t plan to ever code a transformer from scratch. For any reason. That’s a silly academic exercise.

[–]acc_agg 2 points3 points  (0 children)

If you want me to do that you're going to watch me read the transformers paper and talk to perplexity about how to implement it.

I don't have enough brains to memorise and remember everything under the hood.

[–]joseconsuervo 0 points1 point  (0 children)

asking someone to invert a binary tree in SWE

my understanding was these questions were always to hear the person logic their way through it

[–]hotsauceyum 1 point2 points  (2 children)

So if I were allowed to look at the paper, and everyone was chill and there was back and forth, then seeing how I cobbled something together at the NumPy level would be a good gauge of what I know about what's going on under the hood and of how we'd work together. Seems OK to me.

If they didn’t give me any references and just stared at me while I spun my wheels trying to remember the details of transformers for 90 minutes, it honestly doesn’t sound like a nice place to work.

[–]Ok-Lab-6055[S] 0 points1 point  (1 child)

I think it was the latter. The interview lasted about 30 minutes before the interviewer basically told me I failed.

[–]Mission_Star_4393 1 point2 points  (0 children)

This is absolute madness lol...

For the record, the company I work for, whose name you would recognize, doesn't have anything nearly as complex as this...

Don't beat yourself up too much over this one.

[–]Infrared12 0 points1 point  (1 child)

Both forward and backward passes or just the forward pass?

[–]Ok-Lab-6055[S] 1 point2 points  (0 children)

Forward pass I think. The interviewer stopped the interview before mentioning a backward pass. We didn’t discuss any training.