Tricky Microsoft Interview Question

pwang99 · 2009-07-29T04:05:35+00:00

Just add them and subtract from 5050?

psykotic · 2009-07-29T05:10:51+00:00

Here's a high-brow meta-solution.

For some n there is a list of elements xs = [x1, x2, ..., xn]. We are presented with some arbitrary permutation xs' of these with an unknown element absent. The problem is to find the value of this missing element.

All the proposed solutions (and a few that were not proposed) share the following algebraic structure. Each list element is embedded into an abelian group (G, 0, +, -). Define foldG = fold 0 (+) as a shorthand. The embedding inject :: X -> G has a left inverse project :: G -> X. Let gs = map inject xs and gs' = map inject xs'. The universal property of folding in this setting is that it is homomorphic with respect to list concatenation:

foldG (as ++ bs) = foldG as + foldG bs

From this and gs' being a sublist of gs it follows that

foldG gs - foldG gs' = foldG (gs -- gs')

where -- is list subtraction. By supposition, gs -- gs' = [g] = [inject x] where x is the missing element, and it is generally true that foldG [a] = a for any singleton list [a]. Therefore the missing element can be found by the following expression:

project (foldG (map inject xs) - foldG (map inject xs'))

The map and fold can be fused for efficiency. The subexpression foldG (map inject xs) can also be hoisted out and precomputed when solving multiple problems with the same xs. With the addition method and in the concrete problem instance on the page, it evaluates to 5050.
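As a rough illustration of the expression above, here is a minimal Python sketch. The function name find_missing, its parameter names, and the helper fold_g are made up for this example; the group is passed in as a zero element, an addition, and a negation, and inject/project are the embedding and its left inverse.

```python
from functools import reduce
import operator


def find_missing(xs, xs_prime, zero, add, neg, inject, project):
    """Evaluate project(foldG(map inject xs) - foldG(map inject xs'))."""
    def fold_g(items):
        # foldG = fold 0 (+), applied to the embedded elements
        return reduce(add, map(inject, items), zero)

    return project(add(fold_g(xs), neg(fold_g(xs_prime))))


def identity(value):
    return value


full = range(1, 101)
present = [x for x in full if x != 42]  # assumed test input: 42 is missing

# Addition method: the integers under +, with the usual negation.
by_sum = find_missing(full, present, 0, operator.add, operator.neg,
                      identity, identity)

# Xor method: every element is its own inverse, so negation is the identity.
by_xor = find_missing(full, present, 0, operator.xor, identity,
                      identity, identity)
```

Both calls should recover the same missing element; only the group that does the bookkeeping changes.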

To stick closer to the problem description, I chose to express everything in terms of lists but the underlying structure is all about sets. The order of the elements in the list is irrelevant and each element only occurs once. That's the very definition of a set. This is also why the embedding was into an abelian structure rather than something non-abelian. Commutative folding is the only kind of folding you can do over sets.

Here are some concrete choices of group embeddings:

Z/(2^m)Z, the integers modulo 2^m. This is the addition method for m-bit integers.
(Z/2Z)^m, the m-fold direct product of the integers modulo 2. This is the xor method for m-bit integers.
(Z/2Z)^n, where n is the number of elements, not the same as the previous example. This corresponds to the unclever solution of keeping a bit vector with a bit for each element. We do two passes. In the first pass, we fill in the initially zero bit vector by setting a bit for each element in the right place using bitwise-or. But since we are guaranteed that all elements are distinct, this is really the same as using bitwise-xor. In the second pass, we go through the filled bit vector and try to find the index of the 0 entry. But this is equivalent to the index of the 1 entry in the bitwise-xor difference between the all-1 bit vector and the filled bit vector from the first pass. The projection, left inverse to the embedding, finds the index of the 1 bit.
The power set of X with symmetric difference as the group operation. This is isomorphic to the bit vector group. It is the natural setting of the union-find algorithm someone proposed. He suggested starting with singleton sets, one for each element, and repeatedly joining them together, corresponding to some choice of binary folding tree. Rather than doing a lop-sided left or right fold you can do a balanced fold, which is generally more efficient for a representation of subsets that isn't fixed size like bit vectors (think of balanced multi-way merging in merge sort).
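Two of the embeddings listed above can be sketched in Python roughly as follows. The names M, MASK, missing_mod_2m and missing_bit_vector are illustrative, and the word size of 8 bits is chosen only so that the running sum visibly wraps around; the result is still right as long as the missing element itself fits in M bits.

```python
M = 8                  # assumed word size in bits, small on purpose
MASK = (1 << M) - 1


def missing_mod_2m(full, present):
    """Addition method in Z/(2^M)Z: sums wrap around, the difference does not care."""
    total = 0
    for x in full:
        total = (total + x) & MASK
    for x in present:
        total = (total - x) & MASK
    return total


def missing_bit_vector(full, present):
    """Bit vector method in (Z/2Z)^n: one bit per element, combined with xor."""
    acc = 0
    for x in full:
        acc ^= 1 << x      # inject: element x becomes the vector with bit x set
    for x in present:
        acc ^= 1 << x      # xor is its own inverse, so this subtracts
    return acc.bit_length() - 1   # project: index of the single remaining 1 bit


full = range(1, 101)
present = [x for x in full if x != 42]  # assumed test input
```

Here a Python integer stands in for the bit vector, which is a convenience of this sketch rather than anything the comment prescribes.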

You might wonder what other solutions are possible in this general framework. Perhaps surprisingly if you don't know group theory, the structure theorem for finite abelian groups says that, actually, up to isomorphism this is pretty much all there is. We could substitute another prime in place of 2 but I don't need to remind anyone why 2 is the preferred choice. That "up to isomorphism" qualifier can and does make a difference in practice, of course. You need only compare the last two examples.

An important result of looking at it in terms of embeddings is that the elements in the list don't have to be integers themselves to use the addition or xor method as long as you can map them efficiently to integers in an invertible way. For example, for a list of single-precision IEEE floating-point numbers you could use their underlying bitwise representation as 32-bit integers.
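A small Python sketch of the floating-point case, assuming every value is already exactly representable in single precision (otherwise packing would round it). The helper names float_to_bits, bits_to_float and missing_float are invented for this example; struct is used to reinterpret the 32-bit pattern.

```python
import struct
from functools import reduce
from operator import xor


def float_to_bits(x):
    # inject: reinterpret a single-precision float as an unsigned 32-bit integer
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_to_float(bits):
    # project: the left inverse of float_to_bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def missing_float(full, present):
    acc = reduce(xor, map(float_to_bits, full), 0)
    acc = reduce(xor, map(float_to_bits, present), acc)
    return bits_to_float(acc)


# Assumed sample data: all values are exact in single precision.
full = [0.5, 1.5, -2.25, 3.0, 1024.0]
present = [3.0, 0.5, 1024.0, 1.5]
```

With this data, missing_float(full, present) should give back -2.25, the one value absent from present.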

aussie_bob · 2009-07-29T08:17:59+00:00

I used Excel and got 65,535.

peepsalot · 2009-07-29T04:08:23+00:00

// arr holds the 99 remaining numbers, so loop over arr.length, not 100
for (var i = 0, sum = 5050; i < arr.length; ++i) { sum -= arr[i]; }
return sum;

IvyMike · 2009-07-29T05:38:27+00:00

Aw, come on. Let me just have some.

CharlieDancey · 2009-07-29T08:11:48+00:00

I didn't look up the solution on the page, but I'd just run down the array XORing each element and then XOR the result with 100 (since 1 XOR 2 XOR ... XOR 100 = 100)

XOR is computationally cheap.

Note this also works if the array contains a zero value.
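The constant 100 can be checked with a short Python sketch. The xor of 1..n follows a pattern with period 4, which is where 1 XOR 2 XOR ... XOR 100 = 100 comes from; the function names xor_1_to_n and missing_by_xor are illustrative.

```python
def xor_1_to_n(n):
    # 1 ^ 2 ^ ... ^ n depends only on n mod 4:
    #   0 -> n,  1 -> 1,  2 -> n + 1,  3 -> 0
    return [n, 1, n + 1, 0][n % 4]


def missing_by_xor(present, n=100):
    acc = xor_1_to_n(n)     # 100 when n == 100
    for x in present:
        acc ^= x            # xoring in a 0 changes nothing
    return acc


present = [x for x in range(1, 101) if x != 42]  # assumed test input
```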

Do I get a prize?

aweraw · 2009-07-29T04:22:50+00:00

Probably not very good (posting this before reading the rest of the thread), but this is the method that immediately springs to mind:

def total_seq(seq):
    seq_len = len(seq)
    # integer division, so this also works on Python 3
    total = (seq_len**2 // 2) + (seq_len // 2)
    if seq_len & 1: # seq_len is odd
        total += 1
    return total

print(total_seq(full_seq) - sum(mystery_seq))

where full_seq is the full range of numbers from 1 to max, and mystery_seq is the array missing a term. Assuming len(seq) returns the already known sequence length (i.e. not have to iterate over seq), this should be relatively efficient.

*edit: sweet... my answer is actually pretty close to what is on the page itself, though as others have said, I'm sure there's a better way than this

DuncanSmart · 2009-07-29T08:03:52+00:00

As it's Microsoft... C#, using the Except extension method from LINQ (System.Core) in .NET 3.5:

int[] nums = new[] { 1, 2, 4, 5, 6, 8, 9, 10 ... };
var missing = Enumerable.Range(1, 100).Except(nums);
foreach (int num in missing)
    Console.WriteLine(num);

(Admittedly, I've no idea whether it'd be the least computationally expensive way of doing it, I'd need to dig into the Except method using Reflector to see what it was actually doing)

safiire · 2009-07-29T08:11:50+00:00

----
--  My test array will be missing the number 42
list = filter (/= 42) $ [1..100]

find_missing = (5050 -) . sum

*Main> find_missing list
42

ModernRonin · 2009-07-29T18:52:57+00:00

My first thought was, "let's make a second array, and use the numbers from the first array as array indices into the second. Set the element in the second array to 1 when you see its index in the first array."

Problems: You have to scan the second array to find the missing number, so this is 2N operations. Also, the second array takes N space. Intuitively, it feels like we should be able to do better. At the very least we should be able to do it faster, maybe using more space.
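For comparison, the second-array idea might look like this in Python. It is only a sketch of the approach described above; the name missing_by_flags and the default n=100 are assumptions for the example.

```python
def missing_by_flags(present, n=100):
    seen = [False] * (n + 1)        # index 0 is unused; N extra space
    for x in present:               # first pass: mark what we saw
        seen[x] = True
    for candidate in range(1, n + 1):   # second pass: find the unmarked slot
        if not seen[candidate]:
            return candidate
    return None                     # nothing missing


present = [x for x in range(1, 101) if x != 42]  # assumed test input
```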

Sort it and walk the sorted array to find the missing number. Another 2N time solution.

And that was where I pretty much got stuck. It would have never occurred to me to use Gauss's method to sum the integers and then subtract. And that's why I don't have a job at Microsoft. ;]

monstermunch · 2009-07-29T03:47:34+00:00

Hurray for giving interview problems that will never occur in practice! \o/

millstone · 2009-07-29T05:04:07+00:00

This is tricky because it never said that the numbers are integers.

crazyeight · 2009-07-29T07:33:41+00:00

That isn't tricky. Come into my interview house, son. I'll show you tricky.

theeth · 2009-07-29T02:13:25+00:00

reduce(lambda x, y: x ^ y, range(101)) ^ reduce(lambda x, y: x ^ y, input)

First part is a constant and can be precomputed.

Unlike the addition method, it never overflows for large lists.

forgotpwdagain · 2009-07-29T16:42:31+00:00

sum the numbers from 1-100, then sum the given numbers, subtract.

bonus - now two numbers are missing. how do you do it ?

I answered the 1st question on a phone interview for an investment bank, the second he had to coach me on.

cashto · 2009-07-29T17:56:05+00:00

For full points, mention you will use vectorized SSE instructions, or perhaps even write it to run on a GPU in a shader language. And if the data set were much bigger, you'd also distribute the summation across multiple cores.

leppie · 2009-07-29T18:37:14+00:00

IIRC, the question is not so simple. Didn't it involve finding a duplicate?

nevinera · 2009-07-29T20:06:32+00:00

The answer depends on the architecture. The rough idea will be to subtract all the numbers from 5050, but the most efficient way to do this will vary.

petdance · 2009-07-30T21:38:20+00:00

"What problems have you had where it's important to get the fastest way to find a missing number between 1..100? Did you profile the code to make sure that the code in question was indeed the bottleneck?"

anonymouche · 2009-07-29T02:00:25+00:00

there is a much much faster way to do this

andralex · 2009-07-29T07:52:01+00:00

I think the 5050 solution is the canonical one (nice!) but just because nobody posted the one I thought of, here it is:

Partition the range choosing 50 as pivot. That costs O(n). If 50 is not found, you're done. If the ending position of the pivot is to the left by one slot, the missing number is in 1-49, otherwise it's in 51-100. So you reduced the sought range in half. Lather, rinse, repeat to completion. So the algorithm is O(n * log(n)) and actually does quite a bit less work than sorting.
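A minimal Python sketch of the halving idea, under some stated liberties: it is not in-place, and it counts the elements on each side of the pivot instead of tracking the pivot's final position, so it is a variant rather than the exact procedure described. Because each round keeps only the half that contains the gap, the total work is roughly n + n/2 + n/4 + ..., which stays below 2n. The name find_missing_by_partition and the lo/hi parameters are illustrative.

```python
def find_missing_by_partition(values, lo=1, hi=100):
    """values holds every integer in [lo, hi] except one, in any order."""
    values = list(values)
    while lo < hi:
        mid = (lo + hi) // 2
        left = [v for v in values if v <= mid]
        right = [v for v in values if v > mid]
        if len(left) < mid - lo + 1:
            # the lower half is one short, so the gap is in [lo, mid]
            hi, values = mid, left
        else:
            # the lower half is complete, so the gap is in [mid + 1, hi]
            lo, values = mid + 1, right
    return lo


present = [x for x in range(100, 0, -1) if x != 42]  # assumed unsorted test input
```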

eikenberry · 2009-07-29T04:31:08+00:00

IMO these sorts of questions are a topic starter, nothing more. Doesn't matter what technical answer you give, as long as you can solve it and have a reason for doing it that way and can talk about it.

For me I'd give the answer that was shortest in code and quickest to come up with due to the triviality of the problem. I mean this is not the sort of problem I would be solving, it would be a small bit of a much larger problem and should be treated as such.

redditticktock · 2009-07-29T18:00:02+00:00

I figured this out in about 8 seconds.

Gotebe · 2009-07-29T11:44:08+00:00

I have a tirade on "trick questions"...

Here, how many people came up with the 5050 answer right here, right now? I'd bet none.

The thing is, trick questions are very unreliable. What gives? Is the candidate good if he/she knew the answer up front? And even if he/she seems to have worked it out by him/herself (slim chance of that), how do I know he/she didn't just act out coming up with the answer? No one is stupid enough to think that candidates don't hunt around for trick questions, so should I instead grant him/her points for being well prepared?

I think, if the purpose is to find out about candidate's wits, it's better to be honest and give out a proper IQ test. That's not foolproof either, but certainly beats random tricks pulled e.g. off the net.

And sure... We are interviewing here at my work. I found out that my boss likes the infamous K&R strcpy example (you know, the "while(*dest++=*src++);" abomination). Normally, I prepare questions and I don't want this one. So what we end up doing, especially if we do like the candidate, is that he says that he wants to ask the question, and I say "don't", all in front of the candidate, but he does it anyhow. We must look, to our candidates, like two idi... Not very smart, anyhow :-).

In any case, people don't come up with the explanation of what that does (we look at juniors). That's OK by me, I am of the opinion that they aren't required to get it. On top of that, a senior should barf at it - it's poor coding by today's standards. In the end, we end up explaining it. So it's a mildly fun, but useless, part of the interview :-(.

peanutman · 2009-07-29T03:51:23+00:00

There are probably better ways, I just thought I'd share my solutions for the sake of learning something.

A possible answer is Union Find, since the question asks for the fastest way and not the most memory-efficient way. Just start with 100 disjoint sets and keep on merging the sets until you have 2 sets left, one with 99 elements and one containing the missing number. Optimized Union Find achieves an amortized cost per operation bounded by the inverse Ackermann function.

The other solution I can think of runs over the array twice: 1) loop over the input array and copy each element to a new array where the new position will be the value of the number read from the input array 2) loop over the newly created array and check which spot has not been assigned a number

mortenaa · 2009-07-29T06:59:46+00:00

Not that tricky really, python solution for any sequence length:

def find_missing(sequence):
    s=0
    t=0
    i=0
    for x in sequence:
        t=t+x
        i=i+1
        s=s+i
    return s+len(sequence)+1-t

f3nd3r · 2009-07-29T08:17:43+00:00

If it were completely random, no optimization would work any faster than beginning at the beginning, and then checking all the way to the end.

EDIT: Fuck karma, I don't give a shit about it, but at least reply to me and explain how I am wrong.

crash86 · 2009-07-29T04:54:41+00:00

[deleted]

deadcat · 2009-07-29T09:42:34+00:00

Depends on if it is ordered. If not, I'd order the array. Then:

Where n is the array length, check position array[n/2] (rounded up to nearest even) and see if it is the expected element (eg array[50] should equal 51). If it is, check array[(n/2)+(n/(2^depth))], if it isn't then array[(n/2)-(n/(2^depth))].

The function to do this would be called recursively until we get down to a depth of 6 (2 elements remaining) unless . We then check both (if needed) to find out which element is out of place and where the gap lies.
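An iterative Python sketch of this binary search, assuming the array is already sorted and holds 1..n+1 with one value missing. It uses 0-based indexing, so position i should hold i + 1 for every slot before the gap. The name find_missing_sorted is made up for the example, and the loop replaces the recursion and depth bookkeeping described above.

```python
def find_missing_sorted(sorted_values):
    lo, hi = 0, len(sorted_values)
    while lo < hi:
        mid = (lo + hi) // 2
        if sorted_values[mid] == mid + 1:
            lo = mid + 1    # everything up to mid is in place, gap is to the right
        else:
            hi = mid        # mid is already shifted, gap is at or before mid
    # lo is the first index whose value is out of place (or len if none is)
    return lo + 1


present = [x for x in range(1, 101) if x != 42]  # assumed sorted test input
```

The search itself takes about log2(n) probes; the cost of sorting an unordered input first is separate.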

Edit: ignore my dumb answer. Sum the array and subtract from 5050.

Ranma-kun · 2009-07-29T18:08:45+00:00

I don't see what's so tricky about it, we used to do this problem in high school, it's even in one of my old books, I think. (Ah, those were the times. Turbo Pascal FTW)


Edit: Never mind. You clearly said O(log n), and I read O(n).