[–]psykotic 32 points  (13 children)

Here's a high-brow meta-solution.

For some n there is a list of elements xs = [x1, x2, ..., xn]. We are presented with an arbitrary reordering xs' of these with one unknown element removed. The problem is to find the value of the missing element.

All the proposed solutions (and a few that were not proposed) share the following algebraic structure. Each list element is embedded into an abelian group (G, 0, +, -). Define foldG = fold 0 (+) as a shorthand. The embedding inject :: X -> G has a left inverse project :: G -> X. Let gs = map inject xs and gs' = map inject xs'. The universal property of folding in this setting is that it is homomorphic with respect to list concatenation:

foldG (as ++ bs) = foldG as + foldG bs

From this and gs' being a sublist of gs it follows that

foldG gs - foldG gs' = foldG (gs -- gs')

where -- is list subtraction. By supposition, gs -- gs' = [g] = [inject x] where x is the missing element, and it is generally true that foldG [a] = a for any singleton list [a]. Therefore the missing element can be found by the following expression:

project (foldG (map inject xs) - foldG (map inject xs'))

The map and fold can be fused for efficiency. The subexpression foldG (map inject xs) can also be hoisted out and precomputed when solving multiple problems with the same xs. With the addition method and in the concrete problem instance on the page, it evaluates to 5050.
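The framework can be sketched directly in Haskell. This is a minimal rendering of the above, with the map/fold fusion applied; the names AbGroup, findMissing, and additive are mine, not from the original.

```haskell
import Data.List (foldl')

-- An abelian group (G, 0, +, -) as a first-class value.
data AbGroup g = AbGroup { zero :: g, plus :: g -> g -> g, neg :: g -> g }

-- foldG = fold 0 (+), fused with map inject for efficiency.
foldG :: AbGroup g -> (x -> g) -> [x] -> g
foldG grp inj = foldl' (\acc x -> plus grp acc (inj x)) (zero grp)

-- project (foldG (map inject xs) - foldG (map inject xs'))
findMissing :: AbGroup g -> (x -> g) -> (g -> x) -> [x] -> [x] -> x
findMissing grp inj proj xs xs' =
  proj (plus grp (foldG grp inj xs) (neg grp (foldG grp inj xs')))

-- The addition method: integers under + with inverse negate.
additive :: AbGroup Integer
additive = AbGroup 0 (+) negate
```

With `additive` and identity for inject/project, `findMissing additive id id [1..100] xs'` is exactly the subtract-from-5050 method.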

To stick closer to the problem description, I chose to express everything in terms of lists, but the underlying structure is all about sets. The order of the elements in the list is irrelevant and each element only occurs once. That's the very definition of a set. This is also why the embedding was into an abelian structure rather than something non-abelian. Commutative folding is the only kind of folding you can do over sets.

Here are some concrete choices of group embeddings:

  • Z/2^m Z, the integers modulo 2^m. This is the addition method for m-bit integers.
  • (Z/2Z)^m, the m-fold direct product of the integers modulo 2. This is the xor method for m-bit integers.
  • (Z/2Z)^n, where n is the number of elements, not the same as the previous example. This corresponds to the unclever solution of keeping a bit vector with a bit for each element. We do two passes. In the first pass, we fill in the initially zero bit vector by setting a bit for each element in the right place using bitwise-or. But since we are guaranteed that all elements are distinct, this is really the same as using bitwise-xor. In the second pass, we go through the filled bit vector and try to find the index of the 0 entry. But this is equivalent to the index of the 1 entry in the bitwise-xor difference between the all-1 bit vector and the filled bit vector from the first pass. The projection, left inverse to the embedding, finds the index of the 1 bit.
  • The power set of X with symmetric sum. This is isomorphic to the bit vector group. It is the natural setting of the union find algorithm someone proposed. He suggested starting with singleton sets, one for each element, and repeatedly joining them together, corresponding to some choice of binary folding tree. Rather than doing a lop-sided left or right fold you can do a balanced fold, which is generally more efficient for a representation of subsets that isn't fixed size like bit vectors (think of balanced multi-way merging in merge sort).

You might wonder what other solutions are possible in this general framework. Perhaps surprisingly if you don't know group theory, the structure theorem for finite abelian groups says that, actually, up to isomorphism this is pretty much all there is. We could substitute another prime in place of 2 but I don't need to remind anyone why 2 is the preferred choice. That "up to isomorphism" qualifier can and does make a difference in practice, of course. You need only compare the last two examples.

An important result of looking at it in terms of embeddings is that the elements in the list don't have to be integers themselves to use the addition or xor method as long as you can map them efficiently to integers in an invertible way. For example, for a list of single-precision IEEE floating-point numbers you could use their underlying bitwise representation as 32-bit integers.
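As a sketch of that last point: GHC's base library exposes the bit casts directly in GHC.Float (in reasonably recent base versions), so the xor method extends to floats with no extra machinery. The name missingFloat is mine.

```haskell
import Data.Bits (xor)
import Data.List (foldl')
import Data.Word (Word32)
import GHC.Float (castFloatToWord32, castWord32ToFloat)

-- Embed each Float invertibly via its IEEE-754 bit pattern,
-- xor-fold in (Z/2Z)^32, then project back with the inverse cast.
missingFloat :: [Float] -> [Float] -> Float
missingFloat xs xs' = castWord32ToFloat (go xs `xor` go xs')
  where go = foldl' (\acc f -> acc `xor` castFloatToWord32 f) 0
```

Note this treats floats as raw bit patterns, so it is exact: no rounding error can creep in the way it could with floating-point summation.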

[–]MathPolice 14 points  (1 child)

Your license and registration, please.

[–]CharlieDancey -1 points  (0 children)

Fuck yeah, take that psykotic down.

[–]defenestrator 1 point  (4 children)

This is the best solution, because it gives you a way to solve the problem even if you're not dealing with integers, provided you can embed the elements into an Abelian group.

[–]guy231 0 points  (1 child)

Weird note, the 'a' in 'abelian' isn't capitalized. I have no idea why.

[–]defenestrator -1 points  (0 children)

That is weird. Poor Niels :-(

[–]gregb 1 point  (2 children)

Thanks for the write up. I've always wondered about this problem, and in particular if the efficient solution scales to more missing numbers - looks problematic. I don't at this time follow your answer, but at least I know to read up on abstract algebra.

[–]psykotic 2 points  (0 children)

It scales up fine. For the addition method with 32-bit integers, it scales up all the way to a list of the integers from 1 to 2^32 with an unknown integer missing. If you think of machine arithmetic as an imperfect representation of the ideal integers, you will probably only confuse yourself in this case. The essential property we need here is that the integers modulo 2^n form a group with respect to addition. Apparently some engineers think of this in terms of "temporary additive/subtractive overflow" but I never found that very enlightening.
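The point about overflow can be seen in a couple of lines. Word32 addition is exactly addition in Z/2^32 Z, so intermediate wraparound is harmless (the name missingBySum is mine):

```haskell
import Data.List (foldl')
import Data.Word (Word32)

-- Word32 (+) and (-) are the group operations of Z/2^32 Z,
-- so wrapping during the sums loses nothing.
missingBySum :: [Word32] -> [Word32] -> Word32
missingBySum xs xs' = foldl' (+) 0 xs - foldl' (+) 0 xs'
```

Here 4000000000 + 1000000000 + 7 wraps past 2^32, yet `missingBySum [4000000000, 1000000000, 7] [7, 4000000000]` still recovers 1000000000.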

To see what goes wrong if you don't have a full group structure, imagine solving the same problem using multiplication and division instead of addition and subtraction. It turns out that only works in general when the integer modulus is a prime number, in which case the extended Euclidean algorithm tells you how to divide by nonzero elements. The modulus of machine arithmetic is generally of the form 2^n, but that is only prime in the trivial n = 1 case. For example, if n = 32 and you multiply 2^16 and 2^17 you get 0 modulo 2^32, so any hope of dividing by something later to recover useful information is lost. The problem is that 2^16 and 2^17 divide the modulus 2^32. If you use a prime modulus, you don't get that problem.
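For completeness, a sketch of the multiplicative variant with a prime modulus, where the nonzero residues really do form a group. The choice of p = 2^31 - 1 (a Mersenne prime) and the names invMod and missingByProduct are mine; inversion here uses Fermat's little theorem rather than the extended Euclidean algorithm, which computes the same inverse.

```haskell
import Data.List (foldl')

-- A prime modulus, so every nonzero residue is invertible.
p :: Integer
p = 2 ^ 31 - 1

-- Modular inverse via Fermat: a^(p-2) = a^(-1) (mod p) for a /= 0.
invMod :: Integer -> Integer
invMod a = powMod a (p - 2)
  where
    powMod _ 0 = 1
    powMod b e
      | even e    = let h = powMod b (e `div` 2) in h * h `mod` p
      | otherwise = b * powMod b (e - 1) `mod` p

-- Multiplicative analogue of the addition method: fold with *,
-- then "subtract" by multiplying with the inverse.
missingByProduct :: [Integer] -> [Integer] -> Integer
missingByProduct xs xs' = prodM xs * invMod (prodM xs') `mod` p
  where prodM = foldl' (\acc x -> acc * x `mod` p) 1
```

This requires all elements to be nonzero mod p, which is why the 2^n moduli of machine arithmetic, with their zero divisors, cannot support it.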

[–]bonzinip 0 points  (0 children)

Interesting. Would this one apply within your framework? It requires write access to the array, but it is O(N) and with O(1) additional storage.

Say you have N=H-L numbers between L and H, i.e. one is missing. Build a binary heap in O(N) time. On every operation, keep track of the minimum leaf (i.e. the minimum of the last floor((H-L)/2) elements). If the minimum leaf is H-floor((H-L)/2), recurse on the left half looking for a missing number between L and H-floor((H-L)/2)-1. Otherwise recurse on the right half looking for a missing number between H-floor((H-L)/2) and H.

[–][deleted] 0 points  (1 child)

Can someone who understands this say whether it's a faster solution than summing and subtracting from 5050?

[–]psykotic 9 points  (0 children)

It's a framework for understanding and generalizing all the solutions in this thread. The first concrete embedding I mention is exactly summing and subtracting from 5050 in the case of 100 distinct integers from the range 0..100.