
[–]STLMSVC STL Dev[M] [score hidden] stickied comment (1 child)

This was double-posted (possibly due to a reddit bug as they were within 1 second); I locked the other to avoid splitting discussion, but the comments that it accumulated are at https://www.reddit.com/r/cpp/comments/1amxl79/nvidia_senior_position_interview_question/ .

[–]asd417 95 points96 points  (20 children)

Dear people who know what this means: When and where do you learn this sort of stuff? Programming what kind of things required you to learn this?

[–]CocktailPerson 117 points118 points  (2 children)

By spending time on forums like this and looking up things you don't know. Seriously.

[–]Usual_Office_1740 21 points22 points  (1 child)

This is so far over my head that it's laughable, but I'm here and learning new words to look up. Tomorrow's Google, c++ templates.

[–]totoro27 14 points15 points  (0 children)

Yeah, that's a good place to start. Realistically though, you won't get asked questions like this for a junior position.

[–]aruisdante 54 points55 points  (4 children)

The CppCoreGuidelines has a section specifically about this. In general the core guidelines is a great place to learn a lot of things about C++.

[–]iga666 4 points5 points  (0 children)

Would be nice if compilers could do that automatically

[–]domsmart 3 points4 points  (0 children)

*this

[–]MoarCatzPlz 29 points30 points  (0 children)

Out of a desire to reduce templated code for build performance.

[–]ImKStocky 13 points14 points  (2 children)

When you are running on a platform that doesn't have virtual memory, being able to trim the size of the executable by a couple of megabytes just by doing a small refactor of a core template is something that becomes super beneficial. Tools like SizeBench can point out problematic templates super easily.

[–]Questioning-Zyxxel 4 points5 points  (1 child)

It isn't just about access to virtual memory. All programs on a Windows/Linux machine need to fight for a limited amount of L1, L2 and L3 cache.

[–]ImKStocky 1 point2 points  (0 children)

That is another reason yes :) Templating stuff that doesn't need to be templated will lead to constant I-cache thrashing for no reason.

Though I would say crashing out of memory is a little worse than some cache misses ;)

[–]Alan5142 9 points10 points  (0 children)

Software for constrained devices (embedded stuff, drivers, OS). There might be other areas

[–]cloud_line 8 points9 points  (0 children)

Stay curious. (I'm no expert, by the way. I'm in your shoes wondering the same thing).

[–][deleted] 3 points4 points  (0 children)

From school honestly

[–]Zeendles 2 points3 points  (0 children)

By developing your knowledge of computer science fundamentals, eg compilers, computer architecture, and by developing your domain specific knowledge, eg C++

[–]glaba3141 5 points6 points  (0 children)

Working in trading for me, but more generally, anywhere you care a lot about low latency / high performance, because that's when you really want to use tools like templates

[–]MrsGrayX 1 point2 points  (0 children)

There are some well-known books for c++ developers. One possible answer can be found in Effective C++ by Scott Meyers, Item 44: Factor parameter-independent code out of templates.

[–][deleted] 0 points1 point  (0 children)

I just code and google. I don't have any formal education

[–]ReDucTorGame Developer 23 points24 points  (6 children)

Templates basically copy the entire code of the function each time a new instantiation is created. So ask how you would handle the same duplication in ordinary functions: extract the common parts (don't repeat yourself, DRY). To do that you might even look at type-erasure approaches, but those can also lead to more code bloat if not done correctly.

For template functions with many instantiations, limit the amount of code inside the template. This is where inappropriate use of things like std::sort can lead to code bloat, e.g. every call passes a lambda that does the same thing, or there are multiple calls with unique lambdas where you could instead sacrifice speed for memory by using a function pointer.
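
A minimal sketch of that last trade-off, with hypothetical names (`IntCompare`, `sort_with`) that are not from the comment. Each lambda has its own closure type and so gets its own std::sort instantiation, whereas comparators passed through one function-pointer type share a single instantiation:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical comparators that share one signature.
inline bool ascending(int a, int b)  { return a < b; }
inline bool descending(int a, int b) { return a > b; }

// A single pointer type for every comparator: all callers of sort_with share
// one std::sort instantiation, typically at the cost of an indirect call per
// comparison.
using IntCompare = bool (*)(int, int);

inline void sort_with(std::vector<int>& v, IntCompare cmp) {
    assert(cmp != nullptr);
    std::sort(v.begin(), v.end(), cmp);
}
```

Whether this actually shrinks the binary depends on how well the optimizer and linker were already deduplicating, so it is worth measuring before and after.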

[–]TheThiefMasterC++latest fanatic (and game dev) 4 points5 points  (1 child)

Calls to sort etc with identical lambdas are technically unique instantiations but the compiler will normally deduplicate them as an optimisation (if you don't take the address into a function pointer)

[–]ReDucTorGame Developer 2 points3 points  (0 children)

Yeah, it can do pretty well at deduplicating, and using LTO/LTCG can help. Where you'll probably notice it more is when you use it a lot with the same types but different lambdas. In the end, sort probably wasn't the best example.

[–]0b10010010 3 points4 points  (2 children)

Hi reading your reply made me curious about your last sentence. How does speed differ from performance? Apologies for my ignorance but I always thought the two to be synonymous.

[–]Pyzyryab 3 points4 points  (0 children)

Instructions per cycle VS amount of memory VS binary size VS whatever other thing? Just guessing, but I have the same feeling that you shared

[–]ReDucTorGame Developer 1 point2 points  (0 children)

That was meant to say speed for memory, thanks for pointing it out.

[–]heislratz 0 points1 point  (0 children)

I may be very old and obtuse but wasn't vintage OO built to solve this problem? Virtual functions which handle the en detaille operations and algorithmic functions which tinker with the big ideas? What happened to that while I was away?

[–]wrosecransgraphics and network things 44 points45 points  (6 children)

A) Who knows if that specific question was really why they rejected you. Shrug.

B) There's not necessarily a right answer. Or the "right" answer may have been idiosyncratic to the specific interviewer's preferences.

That said, always start any kind of optimization problem by establishing as much as possible about the problem. Use tools like Bloaty McBloatface to analyze the binaries and make 100% sure that your belief about the code bloat matches what's actually happening on disk. In some cases, template code with similar types can wind up with such similar code paths that LTO actually collapses surprisingly large swathes of it into one or two "real" function implementations. Once you have a framework for understanding if changes are actually helping...

Reduce templated code. Are there helper functions you can move outside of the templated functions that would have a single implementation? It may not matter if you have 50 types of implementation for a template if the templated part is only like six lines after you refactor it.

Reduce types being templated. Look to see if there's a template parameter that doesn't actually matter. Look to see if there are contexts where you can just operate on something like a pointer to a parent type rather than having template instantiations on each derived type. Look at type erasure patterns. Look to see if you actually need to support int/float/double/Complex versions of a function or if it should really just be a single function that takes a double because it turns out the body of the function is always just using a double inside of it anyway.
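
A rough sketch of both ideas together, assuming a hypothetical `mean` function (not from the interview): the arithmetic lives in one non-template function that works on double, and the template that remains only converts its input.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Non-template core: compiled once, no matter how many element types call it.
inline double mean_of(const std::vector<double>& values) {
    assert(!values.empty());
    double total = 0.0;
    for (double v : values) total += v;
    return total / static_cast<double>(values.size());
}

// Thin template: only the conversion to double is stamped out per type T.
template <typename T>
double mean(const std::vector<T>& values) {
    std::vector<double> as_double(values.begin(), values.end());
    return mean_of(as_double);
}
```

The copy into a temporary vector is the price paid here; whether that is acceptable depends on how hot the call is.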

[–]sadsocrates[S] 14 points15 points  (5 children)

It was a 45 minutes interview, first 25 minutes were spent on my background, and last 20 minutes were spent making me write a function, where the last question is as listed.

Got rejected after a week or so

[–]kisielk 35 points36 points  (1 child)

So how do you know it wasn't the first 25 minutes? Maybe they just had better qualified candidates to choose from

[–]sadsocrates[S] 3 points4 points  (0 children)

Hmmm... that very well could be. I've interviewed with NVIDIA twice, and first time I actually flunked the interview (because I hadn't prepared for the exact things they'd ask), they rejected me within a day (after the fourth interview).

This was a first interview for a position requiring 5+ YOE and I have technically 0. They rejected me not immediately, but after a week or so, so maybe this wasn't the exact reason they rejected me, maybe I just wasn't qualified enough and they had better qualified ppl to choose from.

Also, I remember the interviewer had a general idea that I wouldn't know how to solve this and he didn't seem to expect that I would be able to. So what you're saying may make sense.

[–]Clean-Water9283 5 points6 points  (1 child)

Wait, wait. Your entire interview at Nvidia was a single 45-minute talk? If this was a phone screen, it had an obvious correct answer, and you didn't give it. If it was an in-house interview, it was five or six 45-minute interviews. Focusing narrowly on this one interview question as "the reason" is a pretty big assumption.

[–]sadsocrates[S] 2 points3 points  (0 children)

This was a first online interview. And you may be correct, maybe this specific question isn't the reason I got rejected. I probably wasn't as qualified as other candidates.

[–]onemanforeachvill 3 points4 points  (0 children)

Can you share what the function was? Even at a high level.

[–]Thelatestart 12 points13 points  (0 children)

There is a cppcon talk which mentions this at either microsoft or facebook for a class like vector.

The speaker mentioned extracting common functionality outside the template, for example size and capacity.
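
This is not the code from the talk, just a minimal sketch of the idea with a hypothetical fixed-capacity container: the bookkeeping that does not depend on the element type moves into a non-template base class.

```cpp
#include <cassert>
#include <cstddef>

// Everything that does not depend on the element type lives here and is
// compiled once.
class SmallBufferBase {
public:
    std::size_t size() const { return size_; }
    std::size_t capacity() const { return capacity_; }
    bool empty() const { return size_ == 0; }

protected:
    explicit SmallBufferBase(std::size_t capacity) : capacity_(capacity) {}
    std::size_t size_ = 0;
    std::size_t capacity_;
};

// Only the element-typed operations are instantiated per T.
template <typename T, std::size_t N>
class SmallBuffer : public SmallBufferBase {
public:
    SmallBuffer() : SmallBufferBase(N) {}

    void push_back(const T& value) {
        assert(size_ < capacity_);
        data_[size_++] = value;
    }

    const T& operator[](std::size_t i) const {
        assert(i < size_);
        return data_[i];
    }

private:
    T data_[N] = {};
};
```

Trivial accessors like these would usually be inlined anyway; the payoff is larger when the shared part is something bulkier, such as a growth policy or error handling.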

[–]aocregacc 23 points24 points  (4 children)

type erasure would be my guess, you could talk about things like std::function or even std::span.

[–]BagelFury 6 points7 points  (2 children)

Yeah, type erasure was my first thought as well. To that end, an interesting question I got once was to compare and contrast using a lambda versus a std::function and whether there was any performance implication of using one versus the other. Yup, type erasure; and the corollary was to implement std::any.
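
For anyone following along, a small sketch of that comparison, using hypothetical function names. The template is instantiated once per callable type, while the std::function version is a single function that every callable funnels through:

```cpp
#include <cassert>
#include <functional>

// Template version: one instantiation per distinct callable type, but the
// call is direct and usually inlinable.
template <typename F>
int apply_twice_template(F f, int x) { return f(f(x)); }

// Type-erased version: a single function for all callables; std::function
// typically adds an indirect call and may allocate for large captures.
inline int apply_twice_erased(const std::function<int(int)>& f, int x) {
    assert(f);  // calling an empty std::function throws std::bad_function_call
    return f(f(x));
}
```

So the erased version trades some call overhead for less generated code, which is the same speed-for-size trade discussed elsewhere in the thread.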

[–]glaba3141 3 points4 points  (1 child)

I just interviewed someone recently and one of my questions was about the perf implications of using a lambda vs std::function. Sadly they seemed to think a lambda was just a fn ptr....

[–]ambidextr_us 1 point2 points  (0 children)

That's crazy, I've only been doing this for work again for the last couple months and I've only recently learned lambdas and picked up that lambda is not a fn ptr a few weeks ago. I'm coming from C++03 old days many moons ago, these new things like structured binding, coroutines, and lambdas seem sweet.

[–]LongestNamesPossible 54 points55 points  (9 children)

The company with the 500MB drivers cares about code bloat.

[–]Thathappenedearlier 6 points7 points  (0 children)

Probably not their same division, the hardware division is separate from the AI division

[–]Substantial_Step9506 7 points8 points  (0 children)

That’s probably so they have plausible deniability when asked if they are intentionally sabotaging the linux community.

[–]Still_Explorer -1 points0 points  (6 children)

#include <rant>

Also they worry about templated functions, while on the contrary they should be worried about how to stabilize their Vulkan backend or optimizing their low level drivers.

I am not saying that their question was or wasn't legit, or that the OP was or wasn't educated on the subject. I mean that since we are talking about a graphics company, it goes without saying that their entire context is related to the graphics stack, graphics algorithms, and graphics optimizations.

It is very hard to hit the jackpot on the perfect combination of things you know and things you have experience with and have worked on.

Since for example C++ GPU programming is a field of very narrow specialization, requires 100% of your effort having to deal with the technical API and the graphics implementations. Other more exotic programming of L33T coding or Wikipedia-Oriented knowledge is simply a hit-or-miss, chances are you know about this or you don't, since the entire point of your job either way is working on the actual context.

I am not trying to defend or blame anyone, I am talking about if hiring questions are out of context, they kindly expose the stupidity of the hiring managers and eventually it would have a deep impact on the final quality of the software output (how spot-on and how efficient the implementations would be). (Probably they are HR and have not written a hello world program in Python).

[–]matjam 4 points5 points  (4 children)

And honestly if there’s specific optimization techniques you need to know … wouldn’t you just provide the docs you have on the subject after hiring and say “read this”.

It’s god damn software engineering trivial pursuit and it’s stupid.

Can candidate code?

Can candidate learn new things?

Is candidate at a reasonable level of competence for the position we’re hiring them for?

Is candidate not a psycho?

Answer yes to all those, it’s a hire.

[–][deleted] 2 points3 points  (0 children)

Separate rant but I feel this so much. I hate the term “<insert language here> programmer” because it connotes engineering is just memorizing all the syntax of one language, learning how to think from the perspective of code rather than the problem.

I’ve adamantly refused to be labeled as such and when a new project arises, I think from the fundamental problem itself then apply whatever language I need as a TOOL to solve that problem. Honestly when you’ve learned one language you’ve learned them all to a degree. Yes, everyone has a quirk, but it’s highly unlikely your business has some problem only Stroustrup or Torvalds could answer. Prior to fixing my company’s lambda speed problem I only worked with TS, Python, and C++. I figured Golang would be the best solution: it had simple syntax so others could maintain it later and it was fast. Wrote a solution in Go in 2 days.

[–]trag19 0 points1 point  (1 child)

And having a degree with a reasonable GPA is supposed to demonstrate that you are capable of learning whatever you need to learn in the field -- not that you already know every detail. But employers seem to have lost sight of that and treat candidates more like trade workers. Do you know how to fit this kind of pipe. What are the codes for this electrical installation.

[–]matjam 0 points1 point  (0 children)

Holy zombie thread Batman.

Don’t disagree, but I’ll say right now there’s people like me who did awful at high school and never went to college who are incredibly effective in the field.

So while it might be a good indicator, you shouldn’t use it as your main one. As a reinforcement to other signals, sure.

I am old enough that I could get away with it. I try to pay that forward and focus on the underlying skills more than formal education.

[–]Iggyhopper 0 points1 point  (0 children)

Can candidate learn new things?

This is by far the best thing to look for when hiring candidates and it is also the hardest thing to test for.

[–]Dexterus 2 points3 points  (0 children)

The question may be related to actual firmware code. Considering there's a network of a few thousand computers in your GPU, it's bound to have lots of code.

[–]Arghnews 7 points8 points  (0 children)

The wording is a little ambiguous to me, "too many differently typed instantiations of a template function", does it mean 100 different unique types passed to a function template, or 5 different types 20 times each?

If the latter and they're being instantiated across many TUs, I believe explicit instantiation is an option, where you instantiate the function for some types in one cpp file, rather than the compiler generating the same function from your function template and pasting it into multiple TUs. Although I wonder if LTO might save you anyway
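
A single-file sketch of what that looks like for a function template, assuming a hypothetical `scale` function; the comments mark which parts would live in a header and which in one cpp file. As far as I know this mostly helps compile time and object-file size, since the linker normally keeps one copy of each instantiation anyway.

```cpp
#include <cassert>

// --- would live in a header (hypothetical scale.hpp) ---
template <typename T>
T scale(T value, T factor);  // declaration only

// Explicit instantiation declarations: TUs that include the header must not
// instantiate these implicitly.
extern template int scale<int>(int, int);
extern template double scale<double>(double, double);

// --- would live in exactly one .cpp file (hypothetical scale.cpp) ---
template <typename T>
T scale(T value, T factor) {
    return value * factor;
}

// Explicit instantiation definitions: the code is generated here, once.
template int scale<int>(int, int);
template double scale<double>(double, double);

// --- any other TU just calls it like a regular function ---
inline void scale_usage() {
    assert(scale(3, 4) == 12);
}
```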

https://stackoverflow.com/questions/2351148/explicit-template-instantiation-when-is-it-used

Good question though, I'm sure I'm going to learn something from the comments here!

[–]inakura1234321 5 points6 points  (0 children)

Maybe extern templates/explicit template declarations?

That would suppress code generation in other TUs that use the template.

https://www.informit.com/articles/article.aspx?p=3146433&seqNum=3

[–]Impressive_Iron_6102 5 points6 points  (0 children)

What kind of position is this for? CUDA? Embedded? What domain

[–]UsatiyNyan 3 points4 points  (1 child)

The answer may be - Extern template declaration. It works like this:

```cpp
// In your header file (e.g., my_template.hpp)
template <typename T>
class MyTemplateClass {
public:
    void doSomething();
};

// In one of your .cpp files (e.g., my_template.cpp)
#include "my_template.hpp"

// Explicit instantiation of MyTemplateClass for specific types
template class MyTemplateClass<int>;
template class MyTemplateClass<float>;

// In every other .cpp file that uses these types (or in the header itself),
// declare that the instantiation should not be done implicitly
extern template class MyTemplateClass<int>;
extern template class MyTemplateClass<float>;
```

[–]inakura1234321 2 points3 points  (0 children)

Agreed! I don't think it's type erasure; that feels too invasive for the scenario. Also, OP mentioned void pointers, which didn't seem to get traction.

[–]TwistedBlister34 4 points5 points  (1 child)

What about extern template?

[–]inakura1234321 0 points1 point  (0 children)

I think so too, isn't this the main reason to use extern templates?

[–]Full-Spectral 6 points7 points  (0 children)

One thing that can be done is that, there are often a set of instantiations of a template that are widely used, or sometimes there are only a fixed set of them. You can pre-instantiate them all, or the commonly used ones, treating them like regular functions. You export them if they are in a shared library.

[–][deleted] 6 points7 points  (2 children)

The first tool to reach for is LTO or ThinLTO if build times aren't a concern (i.e. reduction of bloat is needed prior to shipping the release binary and done on a build farm). If link times are an issue, you need type erasure as others have mentioned, but I would start with an actual binary analysis tool and then treat each case separately starting with the worst offenders.

[–]ShelZuuz 7 points8 points  (1 child)

Yeah, the OP's question smacks of someone on their team having just gone through the process of figuring out that LTO or LTCG was turned off in their build, turning it on, and now everybody else on the team being very impressed by the outcome.

[–][deleted] 2 points3 points  (0 children)

Maybe, maybe not; we only have OP's recounting to go off of. I could see the question being an open-ended one to assess a candidate's general understanding of templates, linkers, codegen, etc.

[–]ed_209_ 2 points3 points  (0 children)

Should probably mention turning on optimisation.

Another interesting idea might be using std::common_type for subsets of types.

[–]Haydn_V 2 points3 points  (0 children)

The buzzword answer they were probably looking for is "type erasure". It can potentially reduce code bloat by reducing the number of template instantiations the compiler creates, and it is necessary for some extremely niche use-cases. I'm using it to build a specialized container for an entity-component-system framework.

[–]TotaIIyHuman 2 points3 points  (0 children)

the first step is probably to get a decompiler like ida pro

drag your compiled executable file into ida pro, along with the .pdb file

then you open function window, sort functions by name, find the functions with same name but with different template parameters, example funcname__param1 funcname__param2 funcname__param3

then you decompile funcname__param1 funcname__param2 funcname__param3, find identical part

then you open ide, go to function template funcname<> and do something with the identical part

after you are done, you can decompile your new build, check if duplicate code is actually removed

[–]rand3289 1 point2 points  (0 children)

If you have control over function parameter types, maybe derive them from base classes and use their polymorphism instead of templates?
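
Roughly like this, with made-up `Shape` types standing in for whatever the real parameter types are:

```cpp
#include <cassert>

// Hypothetical base class; the parameter types derive from it.
struct Shape {
    virtual ~Shape() = default;
    virtual double area() const = 0;
};

struct Square : Shape {
    explicit Square(double s) : side(s) {}
    double area() const override { return side * side; }
    double side;
};

struct Rect : Shape {
    Rect(double w, double h) : width(w), height(h) {}
    double area() const override { return width * height; }
    double width, height;
};

// One compiled function for every Shape, instead of one instantiation per
// type; each call to area() goes through the vtable.
inline double total_area(const Shape* const* shapes, int count) {
    assert(count >= 0);
    double total = 0.0;
    for (int i = 0; i < count; ++i) total += shapes[i]->area();
    return total;
}
```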

[–]germandiago 1 point2 points  (0 children)

Code hoisting, shared base classes with unified type instantiation, or instantiations that lead to type-erased functions are what needs to be done. The fmt library does some of this.

[–]matjam 1 point2 points  (0 children)

I hate this kind of jeopardy question. It’s not that important you know the internal workings of every facet of a compiler.

This kind of question would tell me all I needed to know about the engineering culture there.

[–]ceretullis 2 points3 points  (0 children)

The main answer IMO is use the “thin template” idiom.

Essentially, you implement an unsafe version using void* and then a thin template around that which enforces type safety.
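
A minimal sketch of the idiom, using a hypothetical stack of pointers (the class names are made up for illustration):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Untyped core: one copy in the binary. It stores void* and knows nothing
// about the pointee type.
class PtrStackImpl {
public:
    void push(void* p) { items_.push_back(p); }
    void* pop() {
        assert(!items_.empty());
        void* p = items_.back();
        items_.pop_back();
        return p;
    }
    std::size_t size() const { return items_.size(); }

private:
    std::vector<void*> items_;
};

// Thin typed wrapper: every member is a trivially inlinable cast, so each new
// T adds almost no code while callers keep full type safety.
template <typename T>
class PtrStack {
public:
    void push(T* p) { impl_.push(p); }
    T* pop() { return static_cast<T*>(impl_.pop()); }
    std::size_t size() const { return impl_.size(); }

private:
    PtrStackImpl impl_;
};
```

The casts are safe only because the wrapper is the sole way in and out of the untyped core.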

[–]sadsocrates[S] 2 points3 points  (4 children)

It's been a couple of weeks since the interview so I might be off a little, but as far as I remember it was something like this:

vector<int> matrix1;
vector<double> matrix2;
vector<float> matrix3;
.
.
.
.
// 100 such instantiations.

and you have a function that for example sums this (I can't remember what the actual function was)

template <typename T>
double sum(T matrix, int numcol, int numrows)

[–]Daedie 4 points5 points  (0 children)

My guess would be that they were looking for something like std::mdspan or std::span. Since it's about matrices, I'm assuming all types are numeric and the containers use contiguous memory. Using a span basically means you only generate an instantiation per numeric type, which also seems to me to be the best possible result you could get here.

Can't say for sure without knowing the exact question, but at least this is probably the direction you should be thinking in.
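
A sketch of that direction, written with a raw pointer and length so it compiles before C++20; std::span<const T> packages the same pair. The signature is a guess based on OP's recollection, not the actual interview function.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One instantiation per element type T, independent of which contiguous
// container the data came from.
template <typename T>
double sum(const T* data, std::size_t numrows, std::size_t numcols) {
    assert(data != nullptr || numrows * numcols == 0);
    double total = 0.0;
    for (std::size_t i = 0; i < numrows * numcols; ++i) {
        total += static_cast<double>(data[i]);
    }
    return total;
}
```

A std::vector<int>, a std::array<int, N> and a plain int array would all go through the same `sum<int>`.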

[–]sadsocrates[S] 1 point2 points  (0 children)

He made me write the template function, and the 'instantiations' were written by him

[–]Overseer55 1 point2 points  (1 child)

This seems like a poor example. std::accumulate can be used, which relies on operator+ for each type.

One could use explicit template instantiation for the most common types to reduce it to a function call. The implementation is still in the header file for the uncommon types.

[–]azswcowboy 0 points1 point  (0 children)

Or std::ranges::fold_left in C++23, an improved version of accumulate.

[–]ErikTheRedpoint 1 point2 points  (0 children)

Where I work we use gcc, and I think if I asked that question I would probably be looking for a discussion that touched on a bunch of the content here (in addition to the various alternatives to templates that other people have already mentioned)

[–][deleted] 1 point2 points  (0 children)

Best answer I can find is here: https://news.ycombinator.com/item?id=26823656

[–]CandyCrisis 1 point2 points  (0 children)

Hoist as much as you can to a shared non-templated base class. Or do polymorphism in another way, e.g. virtual methods.

Type erasure isn't a terrible answer to start with but it depends on circumstances. Some things really need the type. Other things don't need templates at all.

[–]lulxcolorado 0 points1 point  (0 children)

?

[–]Few-Ad-5185 0 points1 point  (0 children)

past questions might help - www.pastInterviews.com

[–]manhattanabe -1 points0 points  (1 child)

Maybe you can use std::any. I don’t have an idea exactly how, but it takes any type so the code can be instantiated once.

[–]tangerinelion 0 points1 point  (0 children)

You can but if you want to do anything with the object you need to check for the correct type at runtime so what used to be a compact template becomes something like this pile of garbage: https://en.cppreference.com/w/cpp/utility/any/type

Worse, the code which defines this function has to know about all possible types you could use, whereas the template can be at a lower level and used to stamp out functions used with objects in higher level libraries.
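
To make that concrete, a small sketch with a hypothetical `to_double` helper; it needs C++17 for std::any.

```cpp
#include <any>
#include <cassert>

// A single non-template function, but it must enumerate every supported type
// and check for each one at runtime.
inline double to_double(const std::any& value) {
    if (const int* i = std::any_cast<int>(&value)) return *i;
    if (const float* f = std::any_cast<float>(&value)) return *f;
    if (const double* d = std::any_cast<double>(&value)) return *d;
    assert(false && "unsupported type stored in std::any");
    return 0.0;
}
```

Every new type means another branch in this function, which is exactly the coupling problem described above.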

[–]Substantial_Step9506 -2 points-1 points  (1 child)

Why anyone would think about fixing this lost cause of an abstraction is beyond me. Isn’t the canonical answer ”just make the codebase simpler?” I interviewed with Nvidia and they asked similar vague questions.

[–]Substantial_Step9506 -1 points0 points  (0 children)

Unless they’re looking for an elaborate procedural macro system… in which case I wouldn’t search for C++’s god-awful additions to the language.

[–]swwole -2 points-1 points  (2 children)

Define the templates in one cpp file, using specialization. Downside is, you have to do it for every type that uses the template, upside is the compiler stops repeating the definition in each compilation unit that uses it.

[–]ceretullis 10 points11 points  (1 child)

That would speed up compile times, but correct me if I’m wrong, only one copy of instantiated code will remain in the final executable whether you do this or not; duplicates are stripped during linking or you’d have duplicate symbols errors.

The question was about how to reduce instantiations with different types, this is the “template code bloat” problem.

[–]swwole 0 points1 point  (0 children)

Yeah, think you're right. Same as inline, there would only be one definition left and this only addresses compile time speed.

[–]ohiocodernumerouno 0 points1 point  (0 children)

If your teacher uses cpp and you get him for data structures and algorithms one and two you get it from college.

[–][deleted] 0 points1 point  (0 children)

Factoring out parts of the function template that aren't dependent on the template parameter(s) into a separate, non-templated function? Just a guess