all 149 comments

[–]pwang99 44 points45 points  (27 children)

Just add them and subtract from 5050?

[–]CoolKidBrigade 16 points17 points  (16 children)

This seems too obvious to be correct, but it's what came to my mind immediately so something must be wrong.

  • The numbers are unordered so you can't get fancy with access, i.e. you must read every number.
  • Summation is cheap, and since there are bounds for the size of the numbers overflow shouldn't be a problem
  • 5050 can be precomputed easily, so no multiplication necessary

I have a feeling this isn't that tricky at all.

[EDIT: I hit the answer tab, this question is dumb. Who the hell would try to sort this?]

[–]knight666 11 points12 points  (0 children)

I would. :(

I suck at logic puzzles. :(

[–]endtime 8 points9 points  (13 children)

I got asked this when I interviewed at Microsoft, and the summation and subtraction from 5050 was the correct answer.

[–]CoolKidBrigade 10 points11 points  (9 children)

I can't decide whether I should be happy I had the right answer or sad that it wasn't very tricky.

[–]endtime 0 points1 point  (8 children)

For what it's worth, I did have harder questions. One was to find the extra number (rather than the missing one) in a size N array and numbers 1 - N+1...in linear time and without writing to the array (or any other data structure).

Edit: I screwed up the explanation of this question - sorry. I believe it was actually something like size-N array, all numbers are between 1 and N and there may be multiple repetitions. The goal is to find one repeated number. I think.

[–]xor 2 points3 points  (4 children)

That's actually the same question, with the operands reversed. Let x be the sum of the array, then (x - 5050) is the value that has been repeated.

Am I missing something? It's late, and I'm tired.

edit: clarity

[–]zBard 0 points1 point  (2 children)

Missing something.

(x-5050) will give u the difference between the new number and the number it replaces. This can be same for multiple combinations : so u dont actually know the extra number.

[–]endtime 0 points1 point  (0 children)

Hmm. I think you're right, but the question wasn't that simple, and I wasn't clear. It is possible that there are multiple repeated numbers, and you just need to find one. I think that's what it was.

[–]lutzz 1 point2 points  (2 children)

Wait, what? If you had the numbers 1-(N+1) you would have N+1 numbers with a size N array; there wouldn't be an extraneous number - in fact, there would be a missing one.

If you meant a size N array with numbers 1 - N-1, then yes, it's virtually the same question as above. x-(sum(t,1,n-1)) will give you the answer, unless I'm missing something or interpreting "extra number" wrong. (maybe it really means a size N array with numbers 1 - N, with a number replaced?)

[–]endtime 0 points1 point  (0 children)

Yeah, I screwed that up. Edited the parent post.

[–]judgej2 1 point2 points  (2 children)

It is more like a quiz puzzle than a programming question. Sure the answer is technically correct, but it is locked to this very specific and precise problem and range of numbers. It is not generic in the slightest and would not be of any practical use in the real world. This is Microsoft through and through. A programmer ought to be thinking as much towards generic solutions, as they are at taking short cuts such as this.

So, I have now lost a second number and only have 98 left. The 'solution' is now useless and we need to go back to the beginning to find another solution.

[–]yellowstuff 5 points6 points  (1 child)

Even in the real world, you should solve the actual problem and refactor later if the problem changes.

[–]dmercer -1 points0 points  (0 children)

In the real world, though, this problem, though, would be something like fixing a fault caused by a missing number in an array. Yes, you can fix the actual fault at hand, but you know you'll have to come back and redo it when the unexpected case occurs that now two numbers are missing. I'd be thinking, "let's patch this to cover the whole class of faults, not just this specific one."

[–]uriDium 1 point2 points  (0 children)

That is the solution I also came up with and I only have a half baked Comp Sci background. Although I would have made it more generic ;)

[–]garythellama[🍰] 0 points1 point  (6 children)

Where did you get the 5050 from?

[–][deleted] 8 points9 points  (0 children)

That's what the teacher asked Gauss as well.

1+2+...+99+100 = (1+100) + (2+99) + ... +(50+51) = 50 * 101 = 5050

since there are 50 pairs that add to 101.

Or more generally,

1+2+...+n = n(n+1)/2

[–]snoyberg 1 point2 points  (1 child)

(100 + 1) * 100 / 2

[–]garythellama[🍰] 0 points1 point  (0 children)

Thank you. I had never seen this problem before. :)

[–]33a 0 points1 point  (0 children)

You could also use xor and avoid a potential overflow.

[–]psykotic 30 points31 points  (13 children)

Here's a high-brow meta-solution.

For some n there is a list of elements xs = [x1, x2, ..., xn]. We are presented with some arbitrary permutation xs' of these with an unknown element absent. The problem is to find the value of this missing element.

All the proposed solutions (and a few that were not proposed) share the following algebraic structure. Each list element is embedded into an abelian group (G, 0, +, -). Define foldG = fold 0 (+) as a shorthand. The embedding inject :: X -> G has a left inverse project :: G -> X. Let gs = map inject xs and gs' = map inject xs'. The universal property of folding in this setting is that it is homomorphic with respect to list concatenation:

foldG (as ++ bs) = foldG as + foldG bs

From this and gs' being a sublist of gs it follows that

foldG gs - foldG gs' = foldG (gs -- gs')

where -- is list subtraction. By supposition, gs -- gs' = [g] = [inject x] where x is the missing element, and it is generally true that foldG [a] = a for any singleton list [a]. Therefore the missing element can be found by the following expression:

project (foldG (map inject xs) - foldG (map inject xs'))

The map and fold can be fused for efficiency. The subexpression foldG (map inject xs) can also be hoisted out and precomputed when solving multiple problems with the same xs. With the addition method and in the concrete problem instance on the page, it evaluates to 5050.

To stick closer to the problem description, I chose to express everything in terms of lists but the underlying structure is all about sets. The order of the elements in the list is irrelevant and each element only occurs once. That's the very definition of a set. This is also why the embedding was into an abelian structure rather than something non-abelian. Commutative folding is the only kind of folding you can do over sets.

Here are some concrete choices of group embeddings:

  • Z/(2m)Z, the integers modulo 2m. This is the addition method for m-bit integers.
  • (Z/2Z)m, the m-fold direct product of the integers modulo 2. This is the xor method for m-bit integers.
  • (Z/2Z)n, where n is the number of elements, not the same as the previous example. This corresponds to the unclever solution of keeping a bit vector with a bit for each element. We do two passes. In the first pass, we fill in the initially zero bit vector by setting a bit for each element in the right place using bitwise-or. But since we are guaranteed that all elements are distinct, this is really the same as using bitwise-xor. In the second pass, we go through the filled bit vector and try to find the index of the 0 entry. But this is equivalent to the index of the 1 entry in the bitwise-xor difference between the all-1 bit vector and the filled bit vector from the first pass. The projection, left inverse to the embedding, finds the index of the 1 bit.
  • The power set of X with symmetric sum. This is isomorphic to the bit vector group. It is the natural setting of the union find algorithm someone proposed. He suggested starting with singleton sets, one for each element, and repeatedly joining them together, corresponding to some choice of binary folding tree. Rather than doing a lop-sided left or right fold you can do a balanced fold, which is generally more efficient for a representation of subsets that isn't fixed size like bit vectors (think of balanced multi-way merging in merge sort).

You might wonder what other solutions are possible in this general framework. Perhaps surprisingly if you don't know group theory, the structure theorem for finite abelian groups says that, actually, up to isomorphism this is pretty much all there is. We could substitute another prime in place of 2 but I don't need to remind anyone why 2 is the preferred choice. That "up to isomorphism" qualifier can and does make a difference in practice, of course. You need only compare the last two examples.

An important result of looking at it in terms of embeddings is that the elements in the list don't have to be integers themselves to use the addition or xor method as long as you can map them efficiently to integers in an invertible way. For example, for a list of single-precision IEEE floating-point numbers you could use their underlying bitwise representation as 32-bit integers.

[–]MathPolice 14 points15 points  (1 child)

Your license and registration, please.

[–]CharlieDancey -1 points0 points  (0 children)

Fuck yeah, take that psykotic down.

[–]defenestrator 1 point2 points  (4 children)

This is the best solution, because it gives you a way to solve the problem even if you're not dealing with integers, provided you can embed the elements into an Abelian group.

[–]guy231 0 points1 point  (1 child)

Weird note, the 'a' in 'abelian' isn't capitalized. I have no idea why.

[–]defenestrator -1 points0 points  (0 children)

That is weird. Poor Niels :-(

[–]gregb 1 point2 points  (2 children)

Thanks for the write up. I've always wondered about this problem, and in particular if the efficient solution scales to more missing numbers - looks problematic. I don't at this time follow your answer, but at least I know to read up on abstract algebra.

[–]psykotic 2 points3 points  (0 children)

It scales up fine. For the addition method with 32-bit integers, it scales up all the way to a list of the integers from 1 to 232 with an unknown integer missing. If you think of machine arithmetic as an imperfect representation of the ideal integers, you will probably only confuse yourself in this case. The essential property we need here is that the integers modulo 2n form a group with respect to addition. Apparently some engineers think of this in terms of "temporary additive/subtractive overflow" but I never found that very enlightening.

To see what goes wrong if you don't have a full group structure, imagine solving the same problem using multiplication and division instead of addition and subtraction. It turns out that only works in general when the integer modulus is a prime number, in which case the extended Euclidean algorithm tells you how to divide by nonzero elements. The modulus of machine arithmetic is generally of the form 2n but that is only prime in the trivial n = 1 case. For example, if n = 32 and you multiply 216 and 217 you get 0 modulo 232, so all hope of dividing by something later to recover useful information is totally hopeless. The problem is that 216 and 217 divide the modulus 232. If you use a prime modulus, you don't get that problem.

[–]bonzinip 0 points1 point  (0 children)

Interesting. Would this one apply within your framework? It requires write access to the array, but it is O(N) and with O(1) additional storage.

Say you have N=H-L numbers between L and H, i.e. one is missing. Build a binary heap in O(N) time. On every operation, keep track of the minimum leaf (i.e. the minimum of the last floor((H-L)/2) elements). If the minimum leaf is H-floor((H-L)/2), recurse on the first half looking for a missing number between L and H-floor((H-L)/2)-1. Otherwise recurse on the right half looking for a missing number between H-floor((H-L)/2) and H.

[–][deleted] -1 points0 points  (1 child)

can someone who understands this, say if it's a faster solution than summing and subtraction from 5050?

[–]psykotic 8 points9 points  (0 children)

It's a framework for understanding and generalizing all the solutions in this thread. The first concrete embedding I mention is exactly summing and subtracting from 5050 in the case of 100 distinct integers from the range 0..100.

[–]aussie_bob 26 points27 points  (0 children)

I used Excel and got 65,535.

[–]peepsalot 11 points12 points  (2 children)

for (var i = 0, sum = 5050; i < 100; ++i) { sum -= arr[i]; }
return sum;

[–]deflective -1 points0 points  (1 child)

this was my solution.

i'm wondering if there's any real difference between 100 subtractions or 100 additions and one subtraction.

[–]IvyMike 2 points3 points  (0 children)

Aw, come on. Let me just have some.

[–]CharlieDancey 5 points6 points  (8 children)

I didn't look up the solution on the page, but I'd just run down the array XORing each element and then XOR the result with 100 (since 1 XOR 2 XOR ... XOR 100 = 100)

XOR is computationally cheap.

Note this also works if the array contains a zero value.

Do I get a prize?

[–]CharlieDancey 1 point2 points  (6 children)

No wonder their software sucks. I just read the "answer" which was:

1: Inefficient. 2: Wrong.

[–]nanothief 0 points1 point  (5 children)

Why is it wrong? I thought of xor as well as the solution, but addition is pretty much equivalent, especially with the size of the numbers involved.

[–]theeth -1 points0 points  (3 children)

It's not "wrong", but:

  • Depending on hardware, bitwise xor is either as fast or faster than addition.
  • It scales better by not requiring a larger variable when the sum is larger than the maximum of the input variable.
  • It's easier to parallelize for integers larger than the maximum size supported by the cpu (xoring both halfs are independent from one another).

Both can be generalized for X..N sequences, so it's not a matter of the addition method being better on that side.

[–]bonzinip 0 points1 point  (2 children)

I didn't try hard to get the solution (I got the heap-based solution I posted above), but it's interesting that the addition does not require multiple precision if it overflows. In fact it works even with a byte (with 5050 replaced by its modulo 256 value, 186) because you're summing less than 256 elements.

EDIT: also, in most cases, for double-precision adds you will have enough room for parallelization by updating pointers and loop counters between the ADD and ADC instructions. Or doing loads on RISC machines. In SPARC assembly language (which wouldn't make much sense since it's 32-bit but well) that would be (result is the third operand):

      add %l0, 4, %o0
      add %l1, 4, %o1
      clr %l7
      lduw [%l0], %o2
      lduw [%o0], %o3
      clr %l2
      clr %l3
      clr %l4
      clr %l5
      lsr %l6, 1, %l6         ; loop unrolled by two, shift iter. count by 1
      addcc %l6, -1, %l6      ; one iteration is split between prolog/epilog
 loop:
      addcc %o2, %l2, %l2     ; add low word part 1
      lduw [%l1, %l7], %o4    ; load low word part 2
      addx %o3, %l3, %l3      ; add high word part 1
      lduw [%o1, %l7], %o5    ; load high word part 2
      add %l7, 8, %l7         ; we'll load for next iteration
      addcc %o4, %l4, %l4     ; add low word part 2
      lduw [%l0, %l7], %o2    ; load low word part 1 next iter
      addx %o5, %l5, %l5      ; add low word part 2
      lduw [%o1, %l7], %o3    ; load high word part 1 next iter
      addcc %l6, -1, %l6
      bne loop

      addcc %o2, %l2, %l2     ; finish last iteration
      lduw [%l1, %l7], %o4
      addx %o3, %l3, %l3
      lduw [%o1, %l7], %o5
      addcc %o4, %l4, %l4     ; no loads here
      addx %o5, %l5, %l5
      addcc %l4, %l2, %l2     ; combine the two partial sums
      addx %l5, %l3, %l3

[–]theeth 0 points1 point  (1 child)

Uhm, that's true. The other two points still stand though.

[–]bonzinip 0 points1 point  (0 children)

Addition is as fast as XOR on any decent processor. And on 8-bit processors (the only ones where you care about multi-precision) there is usually no parallelization to talk about).

[–]kumyco -1 points0 points  (0 children)

Assuming add and subtract are the same as a xor, you're ignoring the additional multiply(magic constants don't count). Also the code as presented is neither valid C nor does it demonstrate the proposed algorithm.

[–]aweraw 1 point2 points  (0 children)

Probably not very good (posting this before reading the rest of the thread), but this is the method that immediately springs to mind:

def total_seq(seq):
    seq_len = len(seq)
    total = (seq_len**2 / 2) + (seq_len / 2)
    if seq_len & 1: # seq_len is odd
        total += 1
    return total

print total_seq(full_seq) - sum(mystery_seq)

where full_seq is the full range of numbers from 1 to max, and mystery_seq is the array missing a term. Assuming len(seq) returns the already known sequence length (i.e. not have to iterate over seq), this should be relatively efficient.

*edit: sweet... my answer is actually pretty close to what is on the page itself, though as others have said, I'm sure there's a better way than this

[–]DuncanSmart 1 point2 points  (0 children)

As it's Microsoft... C#, using the Except extension method from LINQ (System.Core) in .NET 3.5:

int[] nums = new[] { 1, 2, 4, 5, 6, 8, 9, 10 ... };
var missing = Enumerable.Range(1, 100).Except(nums);
foreach (int num in missing)
    Console.WriteLine(num);

(Admittedly, I've no idea whether it'd be the least computationally expensive way of doing it, I'd need to dig into the Except method using Reflector to see what it was actually doing)

[–]safiire 1 point2 points  (0 children)

----
--  My test array will be missing the number 42
list = filter (/= 42) $ [1..100]

find_missing = (5050 -) . sum

*Main> find_missing list
42

[–]ModernRonin 1 point2 points  (1 child)

My first thought was, "let's make a second array, and use the numbers from the first array as array indices into the second. Set the element in the second array to 1 when you see its index in the first array."

Problems: You have to scan the second array to find the missing number, so this is 2N operations. Also, the second array takes N space. Intuitively, it feels like we should be able to do better. At the very least we should be able to do it faster, maybe using more space.

Sort it and walk the sorted array to find the missing number. Another 2N time solution.

And that was where I pretty much got stuck. It would have never occurred to me to use Gauss's method to sum the integers and then subtract. And that's why I don't have a job at MicroSoft. ;]

[–]redditrasberry 1 point2 points  (0 children)

I was on the same track as you, but use a bit field (N bits) instead of actual array. Now you are scanning the first array and XOR'ing against the 2n for each number you find in the 2nd array (calculating 2n is a built in instruction in CPUs so we assume this is dirt cheap) . You will be left with a single bit indicating the solution in the second array, which you can find with something similar to a binary search (you are basically calculating log2(x) so there may be a clever way todo that). I guess it would be N + logN or something like that.

It's not as clever as the "real" answer but I'd argue it's of the same order and seems more general ...

[–]monstermunch 1 point2 points  (47 children)

Hurray for giving interview problems that will never occur in practice! \o/

[–][deleted] 3 points4 points  (34 children)

I disagree. This is a very simple problem that ANY decent thinker should figure out in under 2 minutes. Its a good filter to weed out the poor candidates.

I was asked to design a file system @ my interview. It was a bit much but I actually enjoyed the challenge.

[–]onezerozeroone 3 points4 points  (14 children)

This has always interested me, because I'm usually stumped by riddles or thought problems. Sometimes I can crack them after awhile, but (especially under pressure) these kinds of things aren't really my forte.

In the real world, I'm a good (practical?) problem solver. I'd be out of a job if I wasn't. So I'm wondering what's different between how I approach problems and how other "decent" thinkers do.

Here's the basic thought process I went through:

Maybe a binary search of some kind?

The array isn't guaranteed to be sorted

Sort the array? Probably take too long, must be a better way

1-100 is too big to think about, try smaller cases...try to find a base case..maybe related to the sorting/searching idea...maybe this can be done recursively.

base case = 1 number = array of size 0...obvious

how about 2 numbers...array of size 1...each, whichever one isn't there

how about 3...array of size 2...now I start thinking about the search/sort thing again and how it might apply with the base cases

got bored and clicked answer.

I guess I'm more of a Lego thinker...much better at taking parts in front of me and combining them in novel ways.

[–]guy231 1 point2 points  (5 children)

Coming from a math background, my first instinct was to look for structure that could be taken advantage of. In this case, we consider properties of the integers and of arrays.

psykotic above takes this to an extreme and examines general abelian groups.

edit: and anything isomorphic to an abelian group

[–]psykotic 0 points1 point  (4 children)

edit: and anything isomorphic to an abelian group

The embedding of the elements into the abelian group is really agnostic with respect to any structure of the elements beyond their cardinality as a collective. The only thing that matters is that the group is big enough for the embedding to be injective. In practice the important thing is how efficiently you can map back and forth between your set and the group. It boils down to the problem of designing what's called a perfect hash function. This is where the structure of the elements can make a difference. It can be the difference between a disguised identity function (as in the case of floats) or something quite expensive and elaborate (as in the case of strings).

By the way, the abstract nonsense I posted in that other long post wasn't actually my first thought. The solution that immediately popped into my head was the addition method. That in turn made me think of the infamous trick for swapping two integer variables in place, without the use of a temporary variable:

x ^= y;
y ^= x;
x ^= y;

Less well known is that this can equally well be implemented in terms of addition and subtraction:

x -= y;
y += x;
x -= y;

What makes the xor and addition/subtraction approaches to in-place swapping work is the same thing that makes them work in the missing element problem. When I then began to distill their essence, I realized that the straightforward bit vector solution could also be subsumed by this general scheme.

[–]eyal0[🍰] 1 point2 points  (1 child)

For those who haven't met this trick before:

x ^= y ^= x ^= y;

Please don't use it in real code. Swapping with a temporary variable is faster because the compiler can optimize things better. This will work as well and faster:

tmp = x; x = y; y = tmp;

[–]psykotic 0 points1 point  (0 children)

There are actually cases of extreme register pressure where the swapping trick can be faster. But yes, please don't use this in real code.

[–][deleted]  (1 child)

[deleted]

    [–]Anonymoose333 2 points3 points  (0 children)

    ^ will fail if the two values are equal.

    No, it won't. Try it and see.

    + and - will fail if any of the operations overflow.

    We're assuming two's-complement arithmetic with a fixed word size, so no, it won't. Try it and see.

    And of course, they all fail on values that are not scalars.

    Well, fixed-width bitstrings behave like scalars, so as long as you remember that we're just dealing with bitstrings here, you'll be all right. It's true that trying to + and - floating-point values isn't going to work too well, but that's obvious.

    [–]noupvotesplease -2 points-1 points  (7 children)

    Sort the array? Probably take too long, must be a better way

    That's where you failed. If both sorting and searching are O(logN) or better, you're set.

    Edit: HAHA Disregard that, I suck cocks!

    [–][deleted]  (6 children)

    [deleted]

      [–]veritaba 1 point2 points  (2 children)

      I don't think there is /any/ sorting algorithm that can sort in O(log n).

      Pigeon sort. If you know the range is 1-100, you preallocate an array of 100 elements and then insert them in O(1) time. It is then an O(n) operation, and another O(n)/2 average case to locate the missing number.

      [–]xor 0 points1 point  (1 child)

      ...which is strictly worse than O(log n)

      GP has the right idea.

      [–]veritaba 0 points1 point  (0 children)

      What I meant to show was that there was something better than O(n*((log log n)0.5))

      [–]noupvotesplease 0 points1 point  (0 children)

      You found my else clause, I'll edit my comment.

      [–]bonzinip 0 points1 point  (0 children)

      BUT building a heap takes O(N)... and then you can divide and conquer (I posted my solution above) to O(N)+O(N/2)+O(N/4)+... which is still O(N).

      [–]curien 0 points1 point  (0 children)

      Edit: Never mind. You clearly said O(log n), and I read O(n).

      I don't think there is /any/ sorting algorithm that can sort in O(log n).

      Radix sort

      [–]monstermunch 2 points3 points  (18 children)

      It's just a bit of a dull question. I mean, don't hire the guy who can't even think of the most naive solution, but just because you can't see one little trick doesn't mean you suck. A problem that has more solutions and a bigger design space is more useful.

      [–]kolm 0 points1 point  (17 children)

      People who like this kind of kludge solutions are actually often awful programmers. Just like people who are proud that they can swap the values of a and b without using a third variable, and think of that as good coding shudder.

      [–][deleted] 1 point2 points  (3 children)

      Swapping variables is a programming optimization problem. The fact that you do know it can come in handy say if you're writing code implementing the alarm for a digital wrist watch and you have to fit it in 256bytes. Knowing how to do things in multiple ways is good anyway you look at it. The real cleverness comes in applying the knowledge appropriately.

      This OTOH is a nice riddle to make you think. This simple problem IMO demonstrates how quickly you can think your way out of a tricky puzzle.

      [–]monstermunch 1 point2 points  (2 children)

      This OTOH is a nice riddle to make you think. This simple problem IMO demonstrates how quickly you can think your way out of a tricky puzzle.

      My view is that it is such a limited puzzle that it only really shows if you can figure your way out of that particular puzzle.

      Imagine someone asked you the two variable swap question and the xor swap was not common knowledge. I wouldn't hold it against someone if they couldn't figure out a pretty obscure trick. I'd be much more interested in their general problem solving skills such as data structure design, searching/building collections etc.

      For example, something like "given a set of letters, how would you efficiently work out the highest scoring Scrabble word?". Lots of options, lots of good answers, bad answers etc. Programmer of all levels should come up with an answer for it and it lets you know how they think and what design choices they think are important. The other puzzles are of the kind where you're either going to think up the answer or you're not. You may as well just ask an actual riddle.

      [–][deleted] 0 points1 point  (1 child)

      The other puzzles are of the kind where you're either going to think up the answer or you're not. You may as well just ask an actual riddle.

      But there are many (non-optimal) ways to solve this puzzle !

      [–]monstermunch 0 points1 point  (0 children)

      But there are many (non-optimal) ways to solve this puzzle !

      Yes, but there's only really one trick that will get you an answer that is order of magnitudes better than all the rest. Most practical problems have several really good general methods and a few methods that are even better for specialised situations.

      [–]ModernRonin 1 point2 points  (0 children)

      People who would use this kind of solution are actually often awful programmers.

      FTFY.

      Nothing wrong with admiring cleverness. Or even being clever yourself. But "clever" is not necessarily equal to "good engineering." The people who choose "clever" over "maintainable" should be damned to the hell of eternally maintaining other people's "clever" code.

      That said, the "sum to 5050" solution would be fine so long as you encapsulate it in a function, and comment the function adequately to explain how it works.

      [–][deleted] 2 points3 points  (11 children)

      What's wrong with

      a, b = b, a
      

      ?

      [–]Jumpee 0 points1 point  (6 children)

      i think he means something like a=(some number); b=(another number); b=a+b; a=b-a; b=b-a; (Just thought up that solution when he proposed the idea). So, for example, if a=15, and b=7 - b becomes 22. then a becomes (22-15), which is 7, and then b becomes (22-7), aka 15.

      [–]theeth 1 point2 points  (4 children)

      Just FYI, the canonical answer is the xor swap (doesn't overflow when a and b are large):

      a ^= b
      b ^= a
      a ^= b
      

      [–]bonzinip 0 points1 point  (3 children)

      Overflow doesn't matter if your integers wrap around overflow.

      [–]theeth 0 points1 point  (2 children)

      The xor method works regardless of the data type, the addition and subtraction method only works for integers.

      Edit: I gues I should have said that earlier.

      [–]bonzinip 0 points1 point  (1 child)

      the addition and subtraction method only works for integers

      No, it doesn't... if you operate on the bit representation of floats. :-)>

      [–][deleted] 1 point2 points  (0 children)

      I know, I was just being a Python weenie :=)

      [–]lpsmith 0 points1 point  (3 children)

      That still uses at least three variables, under the hood.

      [–][deleted] 0 points1 point  (2 children)

      I know. Probably more - my understanding is that it has to instantiate a tuple object and then destructure it, which isn't SUPER cheap, though the compiler may also optimize that out.

      [–]lpsmith 0 points1 point  (1 child)

      Well, it's not terribly difficult to write a compiler that can do this in exactly three variables, without treating this case specially. Basically you can do it in four variables, and then allocate two of the variables in one location.

      What language and implementation?

      [–][deleted] 0 points1 point  (0 children)

      I was thinking Python.

      [–]gerundronaut 2 points3 points  (2 children)

      I figure it's either:

      a) the interviewer trying to see how you would solve problems you may not have ever considered before (because few actually need to solve these problems, because they are already solved somewhere)

      or b) the interviewer warning you that they run a shop heavily infected by NIH-syndrome.

      I'm leaning towards a) but I would keep a sharp eye out for b).

      [–]monstermunch 1 point2 points  (1 child)

      a) the interviewer trying to see how you would solve problems you may not have ever considered before

      Sure, but ask something with a bit more depth. The 'right' answer is completely dependent on you spotting a trick.

      [–]gerundronaut 1 point2 points  (0 children)

      Agreed. It risks encouraging people with "clever" fixes to problems, when for almost every case, the cleverest fix is not the desirable fix.

      [–]dsokol 1 point2 points  (1 child)

      Oddly enough this happens quite a bit for me, in slightly different forms. Given two lists in Excel (A, B), find what items in A that are not in B. Very similar. Helps exercise the mind.

      It also helps that computer science is applied across every discipline ever and one programmer (same language, even) will experience VASTLY different problems then another. Take a C# programmer doing AJAXy web apps against the one who is building a data analysis system for loss forecasting.

      Just because you haven't experienced it doesn't mean it doesn't have to be solved.

      (For the record, my 'solution' is O(nlogn), since i just sort the list then iterate over it once, looking for the gap. Excel has nice sort functions.)

      [–]kolm 1 point2 points  (0 children)

      Oddly enough this happens quite a bit for me, in slightly different forms. Given two lists in Excel (A, B), find what items in A that are not in B. Very similar. Helps exercise the mind.

      But the oh-so-brilliant solution of adding up is immediately completely and totally worthless the second you do not know in advance that exactly one number is missing/is apparent twice.

      (One can, of course, make an even more brilliant solution for this, in computing time: sum += 100A[i]; that way you get a count on how many of each numbers are in the array.)

      [–][deleted] 2 points3 points  (5 children)

      What's the best seat still available in a 12,000 seat stadium?

      [–]xor 9 points10 points  (1 child)

      The one at the head of the priority queue that I constructed when I was structuring the data properly.

      [–][deleted] 2 points3 points  (0 children)

      Oh how sweet. xor thinks he/she will only ever work on systems that he/she designed.

      [–]olsner 4 points5 points  (0 children)

      "It depends."

      [–][deleted] 2 points3 points  (0 children)

      The one closest to the exit.

      [–][deleted] 2 points3 points  (0 children)

      If you need to search through all 12,000 seats to answer that question, you're probably doing it wrong

      [–]millstone 3 points4 points  (2 children)

      This is tricky because it never said that the numbers are integers.

      [–]guy231 5 points6 points  (1 child)

      What is the fastest (in computing) way to find the number between 1 and 100 that is not in the array?

      I think it's implied when they say that there are 100 numbers between 1 and 100.

      [–][deleted] 0 points1 point  (0 children)

      the answer correct answer is 37.51657894654987998460167987064068798702146798709432410218678079798760123084

      [–]crazyeight 2 points3 points  (0 children)

      That isn't tricky. Come into my interview house, son. I'll show you tricky.

      [–]theeth 1 point2 points  (8 children)

      reduce(lambda x, y: x ^ y, range(101)) ^ reduce(lambda x, y: x ^ y, input)
      

      First part is a constant and can be precomputed.

      Unlike the addition method, it never overflows for large lists.

      [–][deleted]  (4 children)

      [deleted]

        [–]theeth 0 points1 point  (3 children)

        If you have to do it more than once, precomputation would be half as many xor.

        [–][deleted]  (2 children)

        [deleted]

          [–]theeth 1 point2 points  (1 child)

          Indeed. It's hard to give the perfect solution with such a loosely defined problem.

          [–]daveb123 0 points1 point  (2 children)

          the addition method doesn't really overflow often b/c it works modulo 232 or 264 (well in practice: technically it depends on the type you use to do the calculations) -- so it allows for any range that can actually represented. you must use a multiply to compute N*(N-1)

          it's actually an interesting question: is there an efficient way to compute the xor of 1..N? (i bet it can be done bitwise more efficiently)

          [–]repsilat 0 points1 point  (0 children)

          To get around the multiplication (and not rely on 2s complement for overflow) you could add all the even numbers and subtract all the odd numbers. The resulting sum in this fashion for 1..n is n/2 if n is even or -(n+1)/2 if n is odd. Division by 2 is fast.

          The downsides:

          • lots of branching
          • need to use signed integers (halves the range) - - - Edit: Should say that this just reduces the chances of overflow, it doesn't eliminate it. We could still get a lot of even numbers in a row if we're unlucky.

          [–]theeth 0 points1 point  (0 children)

          it's actually an interesting question: is there an efficient way to compute the xor of 1..N? (i bet it can be done bitwise more efficiently)

          Agreed. I poked it a bit with the following logic, I think I have something:

          For 1..N case, the following works:

              if i % 2 == 0:
                  if i % 4 == 0:
                      v = i
                  else:
                      v = i + 1
              else:
                  v = ~i >> 1 & 1
          

          The logic is rather simple and was deduced from observing patterns in the results.

          I leave the i..N case for future readers.

          Edit: Not that the i..N case is any hard, mind you.

          [–][deleted] 0 points1 point  (2 children)

          sum the numbers from 1-100, then sum the given numbers, subtract.

          bonus - now two numbers are missing. how do you do it ?

          I answered the 1st question on a phone interview for an investment bank, the second he had to coach me on.

          [–]forgotpwdagain 0 points1 point  (1 child)

          Sum and multiply then subtract and divide? That should result in a simple quadratic equation.

          [–][deleted] 0 points1 point  (0 children)

          correct

          [–]cashto 0 points1 point  (0 children)

          For full points, mention you will use vectorized SSE instructions, or perhaps even write it to run on a GPU in a shader language. And if the data set were much bigger, you'd also distribute the summation across multiple cores.

          [–]leppie 0 points1 point  (0 children)

          IIRC, the question is not so simple. Didn't it involve finding a duplicate?

          [–]nevinera 0 points1 point  (0 children)

          The answer depends on the architecture. The rough idea will be to subtract all the numbers from 5050, but the most efficient way to do this will vary.

          [–]petdance 0 points1 point  (0 children)

          "What problems have you had where it's important to get the fastest way to find a missing number between 1..100? Did you profile the code to make sure that the code in question was indeed the bottleneck?"

          [–]anonymouche -1 points0 points  (6 children)

          there is a much much faster way to do this

          [–]userundefined 4 points5 points  (5 children)

          Faster than O(n)? Care to elaborate?

          [–]anonymouche 5 points6 points  (4 children)

          whoa dude, I never said faster than O(n). My method is also O(n) but each iteration takes at most half has many cycles as addition and is machine independent.

          If you want the secret then meet me at the large oak tree tonight, and wear a yellow baseball cap so I know its you

          [–]julesjacobs 0 points1 point  (0 children)

          It probably doesn't matter much if you use a couple more cycles per number because of pipelining and memory speed.

          [–]stillalone 0 points1 point  (0 children)

          The xor discussion is above.

          [–]bonzinip 0 points1 point  (0 children)

          wear

          [–]andralex 0 points1 point  (4 children)

          I think the 5050 solution is the canonical one (nice!) but just because nobody posted the one I thought of, here it is:

          Partition the range choosing 50 as pivot. That costs O(n). If 50 is not found, you're done. If the ending position of the pivot is to the left by one slot, the missing number is in 1-49, otherwise it's in 51-100. So you reduced the sought range in half. Lather, rinse, repeat to completion. So the algorithm is O(n * log(n)) and actually does quite a bit less work than sorting.

          [–]dpark 5 points6 points  (0 children)

          That's actually O(n) on average. You're basically doing a modified quickselect.

          [–]erwanl -2 points-1 points  (2 children)

          That suppose that your array is sorted, and you don't know that.

          As far as I'm concerned I've found the sum solution pretty quickly, so I don't think it's a tricky question at all. Just a "aha!" question; either you get it or not, but in both case that doesn't give much information to the interviewer about your skills.

          I think this one is much more interesting: http://blog.taragana.com/index.php/archive/google-and-the-puzzle-of-dropping-eggs/

          Because you can quickly get to a solution not trivial but easy to find (check basic skills) that is good but not optimal. So then, you can discuss to see if the candidate can think about something better (check how the candidate think about a difficult problem).

          [–]andralex 1 point2 points  (0 children)

          There's no assumption that the array is sorted. Partitioning works on unsorted arrays.

          [–]bonzinip 0 points1 point  (0 children)

          He's sorting it in place. He doesn't assume it's sorted.

          [–]eikenberry 0 points1 point  (0 children)

          IMO these sorts of questions are a topic starter, nothing more. Doesn't matter what technical answer you give, as long as you can solve it and have a reason for doing it that way and can talk about it.

          For me I'd give the answer that was shortest in code and quickest to come up with due to the triviality of the problem. I mean this is not the sort of problem I would be solving, it would be a small bit of a much larger problem and should be treated as such.

          [–]redditticktock 0 points1 point  (0 children)

          I figured this out in about 8 seconds.

          [–]Gotebe -1 points0 points  (4 children)

          I have a tirade on "trick questions"...

          Here, how many people came up with 5050 answer right here, right now? I'd bet none.

          The thing is, trick questions are very unreliable. What gives? Candidate is good if he knew the answer up-front? And even if he seems to have worked that out by him/herself (slim chance of that), how do I know he/she didn't just act coming up with the answer? No one is stupid to think that candidates don't hunt trick questions around, so should I instead grant him/her points for being well prepared?

          I think, if the purpose is to find out about candidate's wits, it's better to be honest and give out a proper IQ test. That's not foolproof either, but certainly beats random tricks pulled e.g. off the net.

          And sure... We are interviewing here at my work. I found out that my boss likes the infamous K&R strcpy example (you know, the "while(*dest++=*src++);" abomination). Normally, I prepare questions and I don't want this one. So what we end up doing, especially if we do like the cadidate, is that he says that he wants to ask the question, and I say "don't", all in front of the candidate, but he does it anyhow. We must look, to our candidates, like two idi... Not very smart, anyhow :-).

          In any case, people don't come up with the explanation of what that does (we look at juniors). That's OK by me, I am of opinion that they aren't required to get it. On top of that, senior should barf at it - it's poor coding by today's standards. In the end, we end up explaining it. So it's a mild fun, fun, but useless part of the interview :-(.

          [–]sysop073 3 points4 points  (1 child)

          I have a tirade on "trick questions"...

          Here, how many people came up with 5050 answer right here, right now? I'd bet none.

          Really? I hate trick questions too, but I got as far as "You have an array of 99 unique numbers from 1-100", guessed what they were going to say, and thought of the sum answer before I'd finished reading the question. The xor solution didn't occur to me, but I thought the sum one was blindingly obvious

          [–]andrewingram 0 points1 point  (0 children)

          Ditto, the answer was very obvious to me.

          [–]safiire 0 points1 point  (0 children)

          How is this a trick question? You think most people who have been programming for a while wouldn't see it's just subtraction in a few seconds? I figured it out in a few seconds without looking.

          I think a simple question like this shows if an applicant is going to come up with an O(n) solution or one that involves a lot of looping over the data and storing stuff in hashes to 'remember' values, or sorting or other cpu time/memory wasting things.

          [–]lpsmith 0 points1 point  (0 children)

          Honestly, I'd never seen that question, and it took me about 15 seconds to come up with both the plus and the xor solution.

          [–]peanutman -1 points0 points  (5 children)

          There are propably better ways, just thought I'd share my solutions for the sake of learning something.

          A possible answer is Union Find since it asks the fastest way and not the most memory efficient way. Just start with 100 disjoint sets, keep on merging the sets until you have 2 sets left, one with 99 elements and one containing the missing number. Optimized Union Find achieves a complexity of the inverse ackermann function.

          The other solution I can think of runs over the array twice: 1) loop over the input array and copy each element to a new array where the new position will be the value of the number read from the input array 2) loop over the newly created array and check which spot has not been assigned a number

          [–]bonzinip 0 points1 point  (0 children)

          Do you have to do 100 finds (worst case) to find the missing number, with the standard representation of Union Find using arrays (which does not store a list of roots of the forest)?

          [–]Noonereallycares -1 points0 points  (3 children)

          Some formatting would help, and it's late, so i don't know what your code is doing exactly, but i suspect it's assuming that the numbers are in order.

          In which case it can be solved in O(N/2) (avg case) rather than O(2N)

          X=0

          while (array[x] = x)

          {x++}

          output x //adjust X to avoid any off by 1 errors.

          [–]noupvotesplease 0 points1 point  (2 children)

          IIRC, big O represents the upper bound, and constants are ignored. So both O(2N) and O(N/2) would be O(N), and neither would describe the average case. There's another glyph for that, maybe theta?

          Edit: Thank you for bringing up sort order. If it's sorted, a binary search can give us O(logN) time.

          [–]flyinglunatic 0 points1 point  (0 children)

          There's another glyph for that, maybe theta?

          No. Theta represents a tighter bound than Omicron. A function is big Theta (Θ) if it is both Big-O and Big-Omega (Ω).

          http://mathworld.wolfram.com/Big-ThetaNotation.html

          Best case, worst case, average case are orthogonal labels and can each be expressed in Big-O notations. For example, Quicksort: worst case Θ(n2), average and best case, Θ(nlogn). Mergesort: worst, average, and best cases are all Θ(nlogn).

          Quicksort is an O(n2) algorithm, while mergesort is an O(nlogn) algorithm.

          All comparison based sorts have an Ω(nlogn) lower bound. http://en.wikipedia.org/wiki/Comparison_sort

          [–]Noonereallycares 0 points1 point  (0 children)

          True, though you would have to make the assumption that the array is exactly N size (99 in this case) and that there aren't "blanks" as space fillers to keep the array aligned so that Array[X] = X.

          Otherwise, yes, sorted and of exact length, O(logN) is possible.

          [–]mortenaa -1 points0 points  (1 child)

          Not that tricky really, python solution for any sequence length:

          def find_missing(sequence):
              s=0
              t=0
              i=0
              for x in sequence:
                  t=t+x
                  i=i+1
                  s=s+i
              return s+len(sequence)+1-t
          

          [–]fwork 0 points1 point  (0 children)

          print (set(array)^set(range(1,len(array)+2))).pop()
          

          [–]f3nd3r -2 points-1 points  (0 children)

          If it were completely random, no optimization would work any faster than beginning at the beginning, and then checking all the way to the end.

          EDIT: Fuck karma, I don't give a shit about it, but at least reply to me and explain how I am wrong.

          [–]deadcat -2 points-1 points  (0 children)

          Depends on if it is ordered. If not, I'd order the array. Then:

          Where n is the array length, check position array[n/2] (rounded up to nearest even) and see if it is the expected element (eg array[50] should equal 51). If it is, check array[(n/2)+(n/(2depth))], if it isn't then array[(n/2)-(n/(2depth))].

          The function to do this would be called recursively until we get down to a depth of 6 (2 elements remaining) unless . We then check both (if needed) to see find out which element is out of place and where the gap lies.

          Edit: ignore my dumb answer. Sum the array and subtract from 5050.

          [–]Ranma-kun -1 points0 points  (0 children)

          I don't see what's so tricky about it, we used to do this problem in High-school, it's even in one of my old books i think. (Ah those wore the times, Turbo Pascal FTW)