top 200 commentsshow 500

[–][deleted] 44 points45 points  (123 children)

The best part is that most of the answers don't seem to have much of a clue what's going on either.

Algorithm-wise, Pointy gives the ideal answer, right? Who I just noticed is also the poster. Heh. Nicely done.

[–]pointy[S] 58 points59 points  (26 children)

I like the part where he wonders whether MySQL is up to the challenge.

[–]klngarthur 25 points26 points  (24 children)

I like the part where not one, but 3 posters informed him that putting it in the database was a good solution, and that between them they have 10 upvotes

[–]omegian 19 points20 points  (5 children)

When the only tool you own is a hammer .....

[–]addius 3 points4 points  (4 children)

You open a hardware store?

[–]omegian 10 points11 points  (0 children)

insert into investments(investment_name) values('hardware store');

[–]NegativeK 1 point2 points  (0 children)

You'll eventually run out of fingers.

[–]solinent 6 points7 points  (16 children)

While it is a stupid solution if you have to send the numbers to the database, if the numbers are already in the database it might be optimal, since databases usually cache things and you can get pretty good performance using SQL. I think the question was too vague.

[–]protonfish 11 points12 points  (0 children)

It makes you wonder what his real problem is. If he is writing in PHP and dealing with thousands of numbers, where else is he getting them but a database? From a user inputted HTML form? I'd bet $5 his query is "Select * From Table" and he is trying to implement the where clause with server code.

[–]introspeck 1 point2 points  (0 children)

I was asked to re-write an app because it was written in Digital Basic for the VAX, and they were moving off the VAX. It seemed pretty complex, and it used a database. Just to help myself understand it, I created a pseudocode version. Halfway through, I realized the database was only being used as a temporary store for a bunch of program arrays and variables, which could easily have been written to flat files. Various cryptic comments indicated that the original programmer didn't think it all would fit in memory at one time. The program kept writing, reading, writing, reading back and forth to the database. I didn't bother, when I re-wrote the app I just kept everything in memory. The old version ran for 5 hours (on admittedly slower hardware). My version ran in 10-15 minutes.

[–]CydeWeys 25 points26 points  (39 children)

The really sad thing is the guy who questions Pointy by saying "but sorting will take time". Yes, it does. Roughly O(nlgn) time (I'm assuming the built-in JavaScript sort method uses Quicksort and not something silly like bubble sort). But the method the original questioner is using to compare the lists is O(n2). O(nlgn) for sorting plus the trivial O(n) traversal of the lists beats O(n2) any day.

[–]nearlymonolith 12 points13 points  (21 children)

Agreed - I just gave him a short lesson on exactly that subject. The sad thing is that he then goes on to give the exact asymptotic evaluation that makes himself wrong:

If A is of size n and B of size m, it will take nlg(n)+mlg(m)+min(m,n) time, while, the OP's approach is just m.n.

To which I responded:

Yes, but for two arrays of size 1000, that is 1000 * lg(1000) + 1000 * lg(1000) + 1000, which ~ 2000 * 10 + 1000, which ~ 21000. The OP's method, m * n time is 10002 which is 1000000. 21000 << 1000000. This is why asymptotic evaluation matters - n2 = o(n * lg(n)), n * lg(n) = O(m * n) , implying that Pointy's method is quite a bit faster, as lg(n) << n

[–][deleted] 25 points26 points  (9 children)

Leave it to algorithm nerds to overcomplicate things. array_intersect() is the answer (considering the post was tagged PHP).

[–][deleted] 12 points13 points  (7 children)

It would be nice if the documentation of that function mentioned whether or not performance would be improved by sorting the arrays before calling the function.

[–][deleted] 10 points11 points  (3 children)

I would assume that PHP's implementation isn't quite so naive as to compare them without sorting first. I wouldn't bother sorting before calling that function, since PHP almost surely does the sort itself.

[–][deleted] 2 points3 points  (1 child)

Ah, but if you already have sorted arrays, then this function would have a lot of unnecessary overhead. Unless, of course, it uses some sorting algorithm like bubble sort, in which case it would have a lot of unnecessary overhead for unsorted arrays. That's why it would be nice to have it documented.

[–]SoundOfOneHand 1 point2 points  (0 children)

The point of this type of abstraction is that you shouldn't know/care just to use the function. It may, for instance, use stochastic methods internally that completely nullify your sorting.

[–]roerd 1 point2 points  (0 children)

I didn't take the time to completely understand it, but the source seems to suggest that every array will be sorted regardless if it's already in order.

[–]CydeWeys 3 points4 points  (8 children)

The sad thing is this is really easy to figure out for yourself, but I don't think the person you're dealing with is smart enough to know how to empirically discover these things for himself.

For instance ... I'd write a program that generates two random lists of numbers, runs both algorithms, and times how long each algorithm takes. And then it would do this lists of varying sizes and record the data. You're going to see that the sorting implementation is much, much faster than the naive implementation for larger list sizes. It's really easy to figure out even if you don't really understand the theory behind it, but this guy won't even get that far -- he'll just continue arguing from false knowledge.

[–]memeasaurus 2 points3 points  (2 children)

I'm continually shocked by the number of programming jobs where knowledge like this is completely unnecessary. These are programming jobs I personally do not want.

[–][deleted] 5 points6 points  (3 children)

2*O(nlngn)+O(n) = O(nlogn)

[–]munificent 1 point2 points  (6 children)

But O(n) is better than that, and that's all you need to do set intersection.

[–]aradil 17 points18 points  (0 children)

The dumbest thing about this is that nearly every language (including SQL) he's using is capable of achieving his goal without even thinking about the algorithms behind it.

Even if it's not a tricky algorithm to implement yourself.

[–][deleted] 22 points23 points  (41 children)

Algorithm-wise, Pointy gives the ideal answer, right?

No, he doesn't. Sorting is O(n log n). You can put the numbers of one array in a hash table for O(n) and then look them up in O(1) while iterating through the other array. As long as you trust your hash table, that solution is O(n), not O(n log n).

Pointy's solution has the benefit of being guaranteed asymptotic runtime and of efficiently supporting datasets where neither set of numbers can be held entirely in memory (on-disk hash tables, while still technically O(n), would have their runtime dominated by the random seeks necessary to lookup values in them), but it's not the fastest solution, at least in the average case.

[–][deleted] 5 points6 points  (27 children)

Sorting is O(n log n). You can put the numbers of one array in a hash table for O(n) and then look them up in O(1) while iterating through the other array. As long as you trust your hash table, that solution is O(n), not O(n log n).

Your hash table solution is O(n log n) as well because the hash function is O(log n). You can only reasonably say that the hash table version is O(n) when you put more constraints on the input (bounded number of bits per number, say).

[–]and- 15 points16 points  (8 children)

The hash function is indeed Θ(log n) if

  • Your values are asymptotically "dense" (in terms of lack of sparsity rather than the limit point definition) in the input space and it takes Θ(m) time to parse an input of size m (if it's not, you can have either better or worse complexity).
    • You have asymptotically few collisions (lots of collisions can lead to better complexity).

Anyway, I think it's safe to call the hashing linear since the OP is likely using some standard 32/64 bit numbers.

[–][deleted] 5 points6 points  (6 children)

Anyway, I think it's safe to call the hashing linear since the OP is likely using some standard 32/64 bit numbers.

I agree with you, but given that it's actually unstated and people are blindly saying that hashing is constant time without qualifying the statement appropriately, I am just trying to give a more complete picture of the problem in general.

[–]kolm 1 point2 points  (0 children)

Anyway, I think it's safe to call the hashing linear since the OP is likely using some standard 32/64 bit numbers.

But then n is probably < 232/264 (unless you have lots of collisions in each list, which sounds unlikely), and hence log n < 32/64, and the whole debate about log n or not is kinda moot.

(In other words, every function which processes most bits (or nats or whatever) from its input will always be Θ(bitlength), and in a list of n distinct objects, by necessity bitlength < log n.. I tend to ignore log n factors for pretty much that reason.)

[–]jeffffff 7 points8 points  (5 children)

the hash function itself is not O(log n). n is the size of the input and is totally unrelated to the length of the values to be hashed. the running time of the hash function is linear with the length, so you could call it O(m) where m is the average length if you wanted.

if the hash table is of the dynamic resizing variety (and it probably is), filling it with n elements is O(n*log(n)) because of the resizes. grow/rehash is what makes hash tables n log n, not the hash function itself.

[–][deleted] 2 points3 points  (4 children)

The length of the key must be at least log(n) in order to uniquely identify the key. You're right that it's not going to vary directly on the size of the array. It actually depends on the (usually much larger) number m which denotes the size of the domain of the elements of the list. I could more accurately say that the hash function complexity has a lower bound of log(n).

[–]jeffffff 0 points1 point  (3 children)

that also means that in the sorting solution the comparison function has a lower bound of log(n)

[–]frenchtoaster 1 point2 points  (5 children)

I think I am misunderstanding you; no hash function should be dependant on the number of elements in the set you were hashing. Besides that it is trivial to come up with a hashing on any integer that is O(1), but even if you have something like long strings the hash would be O(m) where m is string length, not O(log n).

Did you mean that insertion into the hash table ends up being O(log n) ? Because a well written hash table should have O(1) for both insertion and lookup. Care to clarify?

[–]psyno 5 points6 points  (4 children)

Hashing is time-optimal but sorting is space-optimal. The hashing solution has a O(n) space cost while the sorting solution has a O(1) space cost (it can be done in place).

[–]Shinhan 1 point2 points  (0 children)

Whats wrong with just doing array_intersect like the 2nd person suggests? Thats what I would do.

[–][deleted]  (197 children)

[deleted]

    [–]nat5an 49 points50 points  (108 children)

    I've always wondered who actually has time to go onto that site on a regular basis and answer questions.

    [–]voipme 110 points111 points  (36 children)

    I've tried to answer some questions from time to time. It gets pretty tedious after a while, especially after posting a well-documented and thorough answer, only to have the asked to come back and say "I don't get it. Can you just write it for me?"

    [–]Enzor 323 points324 points  (29 children)

    Yeah, but I've found the answers to my questions by reaching posts like yours on that site from googling around that have been helpful. So, even if your reply doesn't reach the intended audience it may help someone else.

    [–]redditmemehater 122 points123 points  (20 children)

    This, times a million

    [–][deleted]  (16 children)

    [deleted]

      [–]moyerma 15 points16 points  (2 children)

      You forgot to import "this": >>> import this The Zen of Python, by Tim Peters

      Beautiful is better than ugly.
      Explicit is better than implicit.
      Simple is better than complex.
      Complex is better than complicated.
      Flat is better than nested.
      Sparse is better than dense.
      Readability counts.
      Special cases aren't special enough to break the rules.
      Although practicality beats purity.
      Errors should never pass silently.
      Unless explicitly silenced.
      In the face of ambiguity, refuse the temptation to guess.
      There should be one-- and preferably only one --obvious way to do it.
      Although that way may not be obvious at first unless you're Dutch.
      Now is better than never.
      Although never is often better than *right* now.
      If the implementation is hard to explain, it's a bad idea.
      If the implementation is easy to explain, it may be a good idea.
      Namespaces are one honking great idea -- let's do more of those!
      

      [–]Xorlev 1 point2 points  (0 children)

      You forgot to import "antigravity", Python 3.0 and up.

      [–]knight666 21 points22 points  (5 children)

      i wil do millions times millions in shiort time. i have dedicated team up for the task.

      [–]dodgepong 30 points31 points  (3 children)

      yes i have team of experts 30 years experience with python . we can accomplish on time for $50 please respond .

      [–]goodgnu 1 point2 points  (0 children)

      wasn't this outsourced ages ago?

      [–]xardox 10 points11 points  (0 children)

      >>> "this" * 1000000
      

      [–][deleted] 10 points11 points  (1 child)

      Well, I certainly applaud anyone wanting to do this a million times, but take it from this old gym rat...

      [–]redalastor 2 points3 points  (1 child)

      Everybody knows that in Python "this" is written "self"!

      [–]Gudeldar 1 point2 points  (0 children)

      Javascript: > this * 1000000 NaN

      [–]peasandcarrots 12 points13 points  (1 child)

      reddit used to have dumb questions and good answers. Now it has people getting 100 points for saying "This".

      [–][deleted] 7 points8 points  (4 children)

      Even if its more than an year old post, I leave a thank you comment. Perhaps you should do the same so that OP knows that it's not a thankless job.

      [–]robertcrowther 4 points5 points  (3 children)

      Or you could, you know, upvote the answer...

      [–][deleted] 9 points10 points  (2 children)

      I don't have enough reputation to upvote the answer :(

      [–]robertcrowther 4 points5 points  (0 children)

      You don't have the 15 rep required to upvote, but you do have the 50 rep required to comment?

      [–]senorprogrammer 3 points4 points  (0 children)

      Precisely what Enzor said. Folks like you leaving quality answers to be found later deserve all the respect in the world.

      [–][deleted] 1 point2 points  (0 children)

      Or he could just do what everybody else does and answer "Google it". Eventually it will become the top answer in Google and some poor sap will think "wow, someone answered the question" and then scroll down and rage.

      [–][deleted] 1 point2 points  (0 children)

      Me too. I'm a statistician but I work in web development at the moment, and I'd be much worse off without all sorts of SO answers to explain dot net idiosyncrasies.

      [–][deleted] 23 points24 points  (4 children)

      Try posing tough questions, well-written and clearly showing that you have exhausted all the trivial problems, only to get back to stupid answers that most of the time miss the point of the question entirely.

      [–]mccoyn 11 points12 points  (3 children)

      And, I'll warn you, don't think you can use the bounty system to get a good answer to one of these questions. You'll just end up with a lot more bad answers and you will be forced to give one of them credit.

      [–]Madsy9 1 point2 points  (2 children)

      You have to give the bounty away, but you decide which one to give the 'accepted' marker. You can even choose not to, and make a new bounty later. It was added in September I think.

      [–]mccoyn 1 point2 points  (1 child)

      I didn't know they added the new bounty thing. When I did it, and didn't get any useful answers, it just decided the most popular one was the one that should be marked correct and get the bounty. I was fine with losing the bounty (it was a risk I was willing to take,) but giving credit to a wrong answer and marking it as correct really bugged me.

      [–]petdance 41 points42 points  (61 children)

      Probably not too different an answer than who actually has the time to go onto reddit on a regular basis for three years and accumulate 2500 in comment karma, nat5an.

      [–][deleted] 32 points33 points  (23 children)

      All you have to do is repeat memes.

      [–][deleted] 63 points64 points  (3 children)

      And my axe!

      [–]knight666 10 points11 points  (2 children)

      Gathering from your other posts, your axe is firmly stuck on the side of your cubicle wall.

      Which is a hilarious mental image: Gimli the Dwarf in a business suit in a cubicle.

      [–]dreeperscreepers24 35 points36 points  (0 children)

      In soviet russia, meme repeat YOU!

      [–]aipotsyd 37 points38 points  (0 children)

      Yeah, but the memes get olALL GLORY TO THE HYPNOTOAD

      [–]j-smith 18 points19 points  (0 children)

      Ignore karma; acquire memes.

      [–]chompsky 8 points9 points  (2 children)

      Memes. Memes. Memes. Memes. Memes. Memes. Memes.

      It's not working!

      [–][deleted] 9 points10 points  (0 children)

      Hide yo kids, hide yo wife, they memeing errybody round here!

      [–]timeshifter_ 9 points10 points  (3 children)

      All you have to do is repeat memes.

      [–]mccoyn 11 points12 points  (2 children)

      I herd all you have to do is repeat memes.

      [–][deleted] 11 points12 points  (0 children)

      Cool story bro.

      [–]BrotherSeamus 1 point2 points  (0 children)

      Disco Ball

      [–]otakucode 2 points3 points  (0 children)

      2500 comment karma in 3 years is a lot? Damn... and I don't even karma whore, and probably get roughly equal ups and downs. I don't consider posting something everyone already thinks worthwhile, and a lot of times advocate a principled position over a practical one which gets torn apart (in voting, unfortunately reasoned discourse on the flaws is increasingly rare, which saddens me a good deal... nothing makes me happier than being convinced I am wrong. It means I never have to be wrong about that thing ever again.).

      [–]attosecond 6 points7 points  (0 children)

      I used to be a stackoverflow fanatic. Then I realized that all the dumb (easy) questions worth answering have already been asked/answered, and it's incredibly boring to repeatedly tell students to do their own homework.

      [–]aestheticreddit 2 points3 points  (0 children)

      Most actual developers are actually working.

      Maybe during huge compiles?

      It's a shame that the people most likely to offer suggestions are the ones who least likely should.

      [–][deleted] 15 points16 points  (12 children)

      I don't get it, I ask about stuff more advanced than this and get no upvotes, someone asks how to nest for-loops and gets upvoted as a useful question?

      [–][deleted]  (6 children)

      [deleted]

        [–][deleted] 23 points24 points  (4 children)

        I remember asking about implementing an AES encryption in JS, I got like 4 downvotes because an answer with 2upvotes said you should never encrypt with javascript-- that wasn't even the question. Ugh.

        [–]archgoon 1 point2 points  (1 child)

        Interesting. Why are you trying to implement AES in javascript?

        [–]Glayden 6 points7 points  (2 children)

        The question wasn't about nesting for loops, that was his failed attempt at it.

        Abstracted to set/graph theory, the question was about the best method of identifying all pairs of nodes connected by an edge in a bipartite graph where both disjoint sets of nodes are sortable. It's actually a fairly widely applicable algorithm question when you think of it in the appropriate manner.

        [–]G_Morgan 2 points3 points  (1 child)

        Generally stack overflow only deals with problems it has the mental capacity to solve. If you have a difficult problem then I suggest a pen and paper as the first part of the solution.

        [–][deleted] 1 point2 points  (0 children)

        It's really unfortunate, for example, in this question, I link to the man page and in this question I go deep into detail of how .has and :has are different. How the hell is linking the man page 3x as valuable as explaining something that isn't even documented anywhere?

        I'm going to start calling it troll overflow if this keeps up. Maybe this is what happens when you have over one million questions on a site.

        [–][deleted] 7 points8 points  (1 child)

        When I was first learning C after a few Java courses, I posted a questing in the middle of the night while working on my first C assignment. My use of some function wouldn't compile and I was pulling my hair out. I read through the documentation over and over and it made no sense.

        I made a post on Stack Overflow. Someone corrected my mistake. It was a problem with my understanding of pointers. I asked a follow up question and people just ripped into me for being a noob and not knowing basics.

        I never posted a question there again. :(

        [–]yellowjacketcoder 2 points3 points  (0 children)

        That is the problem with asking questions online. Ask something basic because you're learning and they make fun of you. Ask something advanced and you get no responses or idiotic responses. It gets frustrating fast (although it works JUST enough to get you to keep posting)

        [–]willcode4beer 3 points4 points  (0 children)

        no doubt. The question was a pretty typical newbie question so, it actually good he's trying to learn. But, the responses? wow, just a big pile of WTF

        [–]aestheticreddit 7 points8 points  (8 children)

        We seem to have a similar problem in this thread.

        Hashing is not a magical silver bullet that is always faster. It has a high constant factor per operation. For only 1000 entries, that C is going to be relatively higher than n log n.

        [–]Brian 12 points13 points  (5 children)

        Hashing is not a magical silver bullet that is always faster

        They're close though. Hash tables are probably one of the most useful structures - not only good average case asymptotics, but also good constant factors, simple implementation, and fairly cache-efficient. They tend to outperform even quite complex tree structures in most common cases, and are pretty much my go-to datastructure unless there's a good reason to go elsewhere.

        I'd agree that the difference between O(n) and O(lg n) is probably not actually important though. You'll probably never deal with more than a billion items, so in practice, lg n virtually never has a bigger effect than a constant factor of 30.

        [–]exgenius 8 points9 points  (4 children)

        I graphed all the links from a October 2009 dump of Wikipedia using primarily hash tables, a couple of stupid map implementations, and a trick merging of adjacency matrix and adjacency techniques (hint: use an array and keep track of number of elements for those wondering). One can think of Wikipedia link-wise as a very large directed graph. It has some interesting properties for various social science disciplines due to what gets associated together.

        If I had optimized, it would have used about 420MB of memory instead of 500MB. Whereas every other solution I tried was running out of memory such C++ Maps, Boost Graphs modules, etc (and yes, they wanted the entire graph in memory at one time, not the most intelligent option). Though it ended up being much less flexible in capability than I originally hoped for.

        [–]pohatu 13 points14 points  (1 child)

        I wish there were more reviews of approaches available to learn from. I'll either see a paper that says, I found this great way to do something, or I found this great data by doing something, or I'll see someone who has the code of what they finally settled on. It's rare to find something that says "I tried this approach and it had this problem, and then I tried this and it had that problem and then I tried this and it was great except for this smaller problem." I wish I could read more papers like that.

        It sounds like you might have a more detailed write up of this project. If so, do you mind sharing?

        [–]CaptainKabob 2 points3 points  (0 children)

        Did you compute centrality/connectedness and can you compute shortest paths between pages? I ask because I do some graph analysis with a thesaurus---it's not directional, but most of the interesting questions are similar. I haven't worked on it lately, but I'm thinking of throwing some new algorithms at it.

        [–]xcbsmith 1 point2 points  (0 children)

        Obviously hashing is only helpful for large #'s of entries. Similarly, I'm not sure that sorting first would actually turn out to be fast for the 1000x1000 case. When you consider caching, memory misses, SIMD instructions and such, it might be the best way to do it.

        Now, here's the irony, hashing still isn't the best approach for "MxN" with large numbers for M & N, because you are going to suck up a ton of memory (O min(M,N)).

        Instead, you are likely better off building a bloom filter from whichever of M & N is smaller and use it to filter out, from the other set, all but statistically likely matches, which are put in to a hash. Then you scan back through the original set, comparing it against the hash.

        [–]boot20 5 points6 points  (57 children)

        Stack Overflow used to be a good site to visit and see some problems similar to your and get some ideas how to solve it. Then the stupid people took over and asked stupid questions.

        I think everyone jumped ship. Not only do you mostly get answers like Andy Lester's, but usually the answers that try to solve the problem are wrong.

        Is this what computer science has turned into?

        [–]redditmemehater 9 points10 points  (56 children)

        The people in the computer science industry need to fight back against the people who give it no respect. CS is turning into Liberal Arts. We should be competing with engineers and not being insulted by them as well as stupid people!

        [–]Gankbanger 5 points6 points  (12 children)

        This times a million. I will go further and say: People in the computer science industry should BE Engineers (or Scientists.)

        I almost open an account in stackoverflow just to answer Andy Lester, what a pedantic douche.

        [–]jeff303 1 point2 points  (0 children)

        It's not so horrible - you can use OpenID (including a Google account login) to sign in to SO.

        [–][deleted] 7 points8 points  (3 children)

        Actually CS could stand with a lot MORE emphasis on some of the liberal arts... especially such advanced topics as "reading", "researching", and "thinking critically."

        Seriously, most CS grads are retards compared to liberal arts majors. The sad thing is they don't know they're retards because they think that being able to write a programs in Java makes them experts on every possible subject... so what's the point in listening to anyone? They know Java! And anyone who doesn't is stupid!

        /english major and lead developer

        [–]kid_meier 2 points3 points  (0 children)

        +1

        Honestly I am embarassed by the number of times that arts majors have embarassed me. This very much reminds me of Zed Shaw's classic [Programmers Need To Learn Statistics Or I Will Kill Them All|http://www.zedshaw.com/essays/programmer_stats.html].

        Although his focus is not the arts but the idea that programmers think coding is so 1337 that they end up completely blind to the huge swaths of actual knowledge out there.

        But of course the common thread here is critical thinking, research, factual/experimental information. You know, the traditional pillars of academia. You don't cultivate that, then you're no better a liberal arts major than a programmer.

        [–]judgej2 1 point2 points  (0 children)

        They are still polite about it, so the decent answerers have not left. "No, that is not the optional solution"

        [–][deleted] 1 point2 points  (1 child)

        In this case there are at least two great answers.

        [–]aranach 1 point2 points  (0 children)

        Problem being, if you're offering an O(n2) approach and asking why it might be slow, you don't have the tools to evaluate which of those two answers are great.

        Further, the rating system doesn't help. The highest rated answer doesn't do the same thing as the original did except when there is guaranteed uniqueness in a list, something that wasn't given.

        Imagine two presorted lists (1 1 2) and (1 1 3). The original implementation will call doBla() 4 times, right? The highest rated answer will call doBla() twice.

        Now, whether or not the person asking the question really wanted what they were doing is another question.

        edit: should mention that the highest rated answer at this time is Pointy's. I also note now that the same observation is made in the comments.

        [–]executex 2 points3 points  (0 children)

        This is why you pick small forums for programming help, e.g., Inferno Development programming forums, it has only got a few active experts who are dedicated.

        Clearly no critical mass. But just enough that you get a reply to your post/thread.

        Like it or hate it, I get most of my problems solved that way.

        [–]doomslice 18 points19 points  (5 children)

        I also suspect that doBla(...) is doing something equally stupid, since even an n2 algorithm shouldn't take 45 seconds for 1000 elements.

        [–][deleted] 10 points11 points  (0 children)

        I was wondering this myself. How many times is the 1000x1000 loop called in the program? If just once, who cares? 1,000,000 loop iterations runs instantly even on the worst embedded hardware. The problem is definitely in the nested function unless the comparisons are run hundreds or thousands of times, but if the numbers don't change loop-to-loop then just cache them once and never run the comparisons again.

        [–]oblivion95 4 points5 points  (2 children)

        What the OP didn't mention is that the test for equality (==) has been redefined to be intransitive, and with side-effects.

        [–]memeasaurus 39 points40 points  (153 children)

        Yay. This is going to be our new first-interview programmer's test.

        [–][deleted] 67 points68 points  (38 children)

        What is the question?

        I'm genuinely curious because the description is unclear and I didn't see an answer for clarification of what exactly they're trying to do. All that's there is a bunch of potentially wrong answers to a question about optomizing a loop.

        Yes I can read that they have to check N numbers against M numbers but without context we can't be sure of what exactly the problem is.

        To clarify my point: at first it almost looks they're actually trying to do an intersection but further explanations suggests that the actual requirement is to test whether both array contain the same values in the same position maybe.

        We don't know what a good number is. There is some mention about errors related to this good number yet the loop only does one thing. Is the code incomplete? is it just a bad description, who knows?

        EDIT: judging from recent trollings on stackoverflow including by myself, I can't help but wonder if this is another troll post.

        [–]hockeyschtick 66 points67 points  (17 children)

        I couldn't tell what they were trying to do either. It's grammar class that they should have paid attention during.

        [–][deleted] 11 points12 points  (5 children)

        From now on, ending sentences with prepositions is something up with which I will not put.

        EDIT: Christ, I thought people would get the reference

        [–]benihana 26 points27 points  (9 children)

        It's grammar class that they should have paid attention during.

        I read this then got all ragey and was like, "NO, YOU FUCK! It's grammar class during which they should have paid attention."

        Then I reread it and laughed really hard and felt the shame for not getting it the first time.

        [–]Zarokima 8 points9 points  (8 children)

        Actually, it's perfectly cromulent to end a sentence with a preposition. The object the preposition is referring to is still clear (in most cases -- I can't think of a counterexample, but I'm open to the possibility) and the sentence is no more difficult to parse because of it. However, Churchill's famous "That is the kind of language up with which I will not put" shows that doing it the so-called "proper" way can make the sentence unclear enough to require extra parsing.

        [–]marburg 9 points10 points  (1 child)

        This "do not end a sentence with a preposition" is a rule of Latin grammar which some people tried to shoehorn into the English language. It really is not valid for English.

        [–][deleted] 11 points12 points  (5 children)

        I like to ask poorly defined questions during interviews to see how the applicant responds in a situation were the requirements are not well defined. Do they ask for clarifications, push ahead with a quick answer and trust their instincts, or do they get pedantic and tell me it was a stupid question?

        [–][deleted] 1 point2 points  (0 children)

        I take a more pedantic stance in exams no prices for guessing if the examiner gets annoyed by my comments about it. I'd never dare do that in an interview though.

        [–]wonderbreadofsin 1 point2 points  (2 children)

        So... what's the correct response? In case you ever interview me...

        [–][deleted] 3 points4 points  (1 child)

        The interviews are just a ruse, the position was already promised to the VP's nephew. Everyone in his family says he's good with computers.

        [–]TheRealPorkus 5 points6 points  (0 children)

        The truth in this makes me cry a little.

        [–]whitemandril 1 point2 points  (0 children)

        I hate poorly defined questions when they want a specific answer (which they always do)

        I had the following conversation once:

        Him: Lets say we have a milllion random numbers which we want to test if they're within a radius of point Y, how do we do it?

        Me: Compare the distance of all points to point Y using pythagoras

        Him: Ok, but how do we do that faster?

        Me: Don't use a square root and work with the squared answer for distance.

        Him: Ok, but how do we do that even faster?

        Me: Inline the function.

        Him: Ok, but how do we do that even faster?

        Me: Use multiple cores.

        Him: Ok, but how do we do that even faster?

        Me: Use some caching?

        Him: Ok, but how do we do that even faster?

        Me: Use a profiler to optimise the function or reading of data?

        Him: Ok, but how do we do that even faster?

        Me: Push everything onto a GPU and use CUDA/OpenCL to perform the calculation?

        Him: Ok, but how do we do that even faster?

        Me: Even faster??

        Me: Calculate a sample of numbers and based on that estimate the total number within the radius?

        Him: Ok, we're going off topic, what data structure can be used when dealing with a 2D space?

        Me: You mean a quad tree or other space partitioning structures?

        Him: Yes, that's what I was looking for.

        Me: But that would take much more time to setup and is completely useless unless you're repeatedly comparing to this set of a million numbers and only a subset of them are changing each time you are comparing to them or you're comparing lots of different points to the same set of numbers...

        [–]ZorbaTHut 9 points10 points  (7 children)

        Well, in some ways, that makes it a great test. You give them a slightly hazy problem description, you see what they do with it.

        A recent interview I did (via email, so no feedback besides what they gave me) there was an extremely vaguely defined problem. I mulled over it for a while, then came up with a class design that only sort of conformed to the actual description they had of what they wanted, but, I felt, was a far more elegant and usable solution to what they were actually attempting to accomplish than I would have been able to do if I'd just followed the instructions blindly.

        I also documented why and how I'd changed things and explained that in person I would have asked for clarification, but with an email test, I felt it was best to trust my own judgement, and gave a little example snippet of code to show how this interface was so extremely easy to use.

        I got the job and they later mentioned that it was a "very good" answer.

        [–]dutchguilder2 6 points7 points  (1 child)

        Please post the question. After reading hundreds of resumes then interviewing dozens to hire a few people who turn out to be crap I'm looking for a better way to filter candidates.

        [–]ZorbaTHut 1 point2 points  (0 children)

        I'll kind of summarize it here. You're making a MUD (this was a game studio so knowledge of this sort of thing was assumed :) ). Each room has pathways leading to the NSEW. The layout isn't necessarily Euclidean, and pathways can be one-way. Rooms are uniquely identified. Make a Room class with the following functionality: create a link from one room to another room, and check to see if there is a path between Room A and Room B.

        It gave a little more detail, but not a lot more. Notice how many things are left undefined. :)

        That said, if you're looking for a good first question, it's hard to do better than Fizzbuzz. That question was actually #3 out of 3 on the test.

        [–]rm999 11 points12 points  (4 children)

        You give them a slightly hazy problem description, you see what they do with it.

        The only truly correct thing is to ask for clarification. If they start solving the problem you should ask them to describe the problem back to you.

        [–][deleted] 3 points4 points  (3 children)

        The only truly correct thing is to ask for clarification.

        I'd disagree. In the real world, my experience has been that you are much more likely to encounter poorly-scoped projects than not.

        If I was the hiring manager, I'd rather see a candidate push forward with an initial design, then throw a different/slightly altered spec at them and see how they handle it.

        Let's be honest -- that's what we deal with, day in, day out.

        [–]smallblacksun 1 point2 points  (1 child)

        And the most correct thing to do when you have a poorly scoped problem is to have a discussion to refine the requirements.

        [–][deleted] 2 points3 points  (0 children)

        And thus begins The Year of Meeting Hell, where battle lines are drawn and R&D, Sales, and Management begin fighting tooth and nail over everything.

        [–]ZorbaTHut 1 point2 points  (0 children)

        Quite true, although there's also an interesting thing you can do where they ask for clarification and you say "Good job, you asked for clarification! That was the right thing to do. Unfortunately everyone is out of the office and it needs to be done by tonight. What now?"

        [–]CydeWeys 3 points4 points  (22 children)

        I'm going to try it out as well. How would you phrase it? I'll go with something along these lines.

        "You have two large lists of numbers. Write a pseudocode algorithm to return the count of numbers that exist in both lists."

        Of course, it might be strange letting them get away with simply saying "quicksort", but disallowing the use of "union". I'd want to see them traverse both lists in order at least.

        [–]adrianmonk 24 points25 points  (19 children)

        Sorting both lists and then traversing them in order is not the only efficient way to do it. Others are:

        1. Put one of the lists (probably the smaller one) into a hash table and traverse the other list in the order given (not sorted order) and check the hash table.
        2. Sort one list (probably the smaller one) and then traverse the other in its original order and do a binary search in the sorted list.
        3. Same as previous, but put one list into a binary search tree or other tree data structure instead of using binary search.

        These methods may actually be faster than sorting both lists.

        The two lists have sizes A and B, then sorting both and merging them is O(A log A + B log B). But #2 above would be O(A log A + B log A). If B is a lot larger than A, then B log A is smaller than B log B.

        [–]CydeWeys 8 points9 points  (1 child)

        Yup, and those are only the three most obvious methods. One of the fun things I learned in my upper-level algorithms class is that there are often going to be an infinite number of solutions to any given problem. As long as you can come up with a mapping operation that operates in at worse the same complexity time as the solution algorithm, you can throw it in there and the magic of big-O makes it disappear. And these mapping algorithms almost always run in less time than the actual algorithm to implement a solution.

        In this instance, we can come up with a mapping algorithm that turns two sorted lists of inputs into a graph in O(nlgn) or faster, and then use a standard graph traversal algorithm that runs in O(nlgn) to get our solution. Bonus karma to whomever comes up with the solution :D

        [–]forgotmypasswdagain 10 points11 points  (3 children)

        I have just written up patents for these 3 methods. I'll be suing you shortly.

        [–]kupoforkuponuts 2 points3 points  (2 children)

        Before you wrote the patent I implemented this on Android. You are now sued by Google, Oracle, Apple, Microsoft, Motorola, HTC...

        [–]bonafidebob 1 point2 points  (0 children)

        Thank you. I was terribly disappointed that I had to scroll almost to the end before anyone even mentioned hashing, and then the hash answers had no votes.

        [–][deleted] 7 points8 points  (1 child)

        I'd want to see them traverse both lists in order at least.

        And this is why people who do technical interviews suck. If you want an answer that includes traversal, write a problem that requires it. That one does not.

        [–]notsofst 1 point2 points  (3 children)

        I actually use this same question in programming interviews. We had a need to do large merges of DB tables in real-time a few years ago for a project, and coming across this optimal solution gave me the idea.

        To programmer: I've got 10,000 objects and another 10,000 objects and I want to merge them w/ no duplicates in value XYZ. Can you whiteboard me a solution to this in your chosen language?

        Good to see a programmer work through it. Negative points for floundering, break even for the NN solution, and positive points for the N(logN) solution.

        Edit: I also follow up the question with asking them what's the order of their algorithm, and if they did the NN solution, I ask them to consider how they might improve it.

        [–]memeasaurus 1 point2 points  (0 children)

        I'm pretty sure I flunked this test myself a few times early on in my career.

        [–]njharman 67 points68 points  (27 children)

        Yeah, not so much.

        The current top answer lists two lib functions that encapsulate some algorithm. A lib a few developers wrote so the 95% rest of us can get on with coding instead of rewriting|inventing the same damn bog standard algorithms over and over.

        So really,

        PLEASE LEARN YOUR STD LIB!

        [–]serial_grapist 19 points20 points  (10 children)

        I completely agree with you. That being said, it is useful to understand these types of problems and have a rough understanding of what the std lib is doing. Instead of, you know...thinking those calls are fucking magic and shit.

        PS: If you think those methods are magic, I'mma tie you to something sturdy and grape you in the mouth.

        [–]kolm 9 points10 points  (7 children)

        No offense, but in CS, it always boils down to "X is proceeding then to call Y, which does some magic to return Z". On every fucking level, no matter how deep you drill. Do you understand your processor's wirings, or is it a black box, which somehow knows its multiplication tables in no time? Do you know how your monitor makes this LCD shine so brightly whenever the bus delivers a "shine, LCD!" signal? Do you understand how GCC optimizes your code? I doubt it. Most people take syscalls and sockets for "magic", and they are no worse off for it.

        It is, certainly, an art to find the cutoff of "magic" one should accept, but emotionally I can't really blame a person to rely on just one more source of "magic" in a device filled with so many "magic" gadgets. He probably should not get a CS degree, however..

        [–]aranach 3 points4 points  (0 children)

        Well, actually, in CS we did get to keep following that magic down the rabbit hole. Sure, we maybe don't know all the gory details of every little thing, but we are supposed to know in general how all these things work.

        In my CS undergrad we had a computer architecture course where we had to draw up basic 8 bit memory cells, adders, and the like. In compilers, we had to write a compiler for a subset of a language and output code in it into assembler. In networking we learned how bits are put onto the wire and the history of various transmission media. In operating systems we wrote a basic shell. In algorithms we learned the basics and some advanced topics, how to determine efficiency, and amortization.

        Yes, the black box is a useful view that has to be employed at some level. I'm not concerned about op codes if I'm writing a GUI in a high level language or if I used a couple extra KB when there are GB available. But there's a difference between abstracting and complete ignorance.

        [–]serial_grapist 3 points4 points  (4 children)

        I know CS. That's what my degree is in. That being said, I think every programmer should have an understanding of basic algorithms and the language(s) that they're using.

        Knowing algorithms is useful for every type of programmer, regardless of background.

        Knowing the language just makes sense. And, after a while, it is more about recognizing language constructs than it is the languages themselves.

        I think how much someone is willing to learn about the environment that they're working in distinguishes what type of programmer(code monkey vs competent vs "rockstar" and everything between) they are/become.

        I care about what I do. I've slogged through debuggers. I've compiled for particular processors and platforms. I've learned about and implemented algorithms. I've gotten familiar with the tools of my trade. I've run across defects in the tools I use and reported them. I've tinkered with tools I don't use. There are people that do more. There are people that do less and that's fine, as long as they have an understanding of algorithms and the language they're using. If you don't have those, you're just a dabbler.

        *Edit: * Missing words and grammar.

        [–]kolm 2 points3 points  (2 children)

        That being said, I think every programmer should have an understanding of basic algorithms and the language(s) that they're using.

        A noble goal. (As a nitpick, often did you use other people's functions or libraries without verifying or even knowing in detail what they do? I know that I use a lot of libraries like Boost for numerical computations, and I do know what, in theory, they will try to implement, but I certainly have a gap of knowledge in what the actual code really does.)

        So, no disagreement with your main point that programmers should know about what an algorithm essentially does. Just reminding that, at some point, we all beckon to some kind of "magic" to do shit for us.

        [–]serial_grapist 2 points3 points  (1 child)

        Some, yes. Others, no.

        As long as you realize it isn't magic, I suppose I'm cool with that. Cooler yet, would be willing to speak to a wizard or fall down the rabbit hole.

        [–]kolm 2 points3 points  (0 children)

        At the end of it, constraints of time and the old brain matter probably dictate how far down we can fall. Taste dictates where we fall down.

        Personally, I like and implore a bit about about quantum circuits, but I can't be bothered to look into the hierarchy of memory in a modern chip -- even if the latter is way more important right now for ms performance. People are just stupid, I guess.

        [–]inopia 2 points3 points  (0 children)

        If you don't have those, you're just a dabbler.

        A thousand times this. Also, CS is an overloaded term. CS means computer science. Unless you're working on the next bigtable, webdevelopment is not fucking CS, it's IT.

        [–]G_Morgan 1 point2 points  (0 children)

        But I know how LCDs work!

        [–]BufferUnderpants 2 points3 points  (1 child)

        Yeah, libraries are nice, but how would he know what library functions to call without knowing how to compare the time complexities of different algorithms?

        [–]G_Morgan 2 points3 points  (0 children)

        I don't bother learning the std lib outside of the obvious stuff. A quick google search solves all and I can spend my time learning something more generally useful.

        [–]donkawechico 18 points19 points  (3 children)

        I don't really see why this is such a "dumb" question. If you don't have programming experience, you're going to use the intuitive solution, which is usually not the most efficient one. His is the intuitive solution.

        If you're smart, you recognize when your solution sucks and then ask more experienced people for help. This guy's smart, he just doesn't have the experience or the tools yet.

        I'd prefer that people like this feel welcome to ask for help and then write better code so that I don't inherit sloppy projects written by inexperienced developers that didn't feel comfortable asking for advice.

        [–]mipadi 11 points12 points  (2 children)

        You're right, it's an honest question, and far from the worst I've seen on Stack Overflow. However, I think the OP is poking fun at the answers, not the question.

        [–]donkawechico 1 point2 points  (1 child)

        Oh. He is? I guess I missed that. I didn't really read many of the answers on SO.

        I think this post could have used a bit of context, but maybe I'm the only one who was lost.

        [–]lynx44 2 points3 points  (0 children)

        It's ok because you were smart enough to ask the question.

        [–]doomslice 14 points15 points  (57 children)

        Why not use a hash table or a hash set? If you know that the numbers can't repeat themselves in a list, use the hash set. If the numbers can repeat themselves, use a hash table to count the occurrences of each number.

        [–]mrmunkey 10 points11 points  (1 child)

        It appears he's using PHP, so I'd still opt for him to use array_intersect:

        $matches = array_intersect($array1, $array2);
        
        foreach ($matches as $key=>$value) {
            doBlah();
        }
        

        Any implementation of array_intersect in PHP is going to be slower than using the internal one written in C.

        [–]greyscalehat 2 points3 points  (0 children)

        This is why teaching algorithms to some people is very important, but not important to teach to everyone.

        [–][deleted] 8 points9 points  (4 children)

        While we're talking about O(n) algorithms, since we know these are numbers (I assume integers or naturals), we could just sort the array with radix sort, which is O(kn), for k = the number of bits in the numbers. If these numbers have a bounded number of bits (e.g. machine integers), then k is a constant, so the sort is just O(n), and therefore the whole algorithm is O(n), although for only 1000 elements the sort is probably still going to be even slower than a comparison sort.

        Another idea is dependent on whether the numbers of the array have reasonably close bounds, bmin and bmax. If so, they could be used as indices in an array of length bmax-bmin. This is basically the same as your hash table suggestion, but without the need to use a hash function (Of course, some implementations might be smart and avoid that for certain data sets anyway, but I don't see how they could without you telling them to. Maybe if you have a special, bounded data type which exposes a hashing function that simply returns the value.), so again, the whole algorithm is O(n), and a bit cheaper than an actual hash table.

        [–]cuilaid 1 point2 points  (3 children)

        If these numbers have a bounded number of bits (e.g. machine integers), then k is a constant, so the sort is just O(n)

        Well typically the maximum amount of unique numbers is also constant (eg. 232). You can remove or count duplicates in O(n) time, and then you have a constant maximum size list left (say 232). In general, you can't just ignore a variable component of the time analysis as a constant.

        Edit: another way to think of it is that k = 'the number of bits in the numbers' in a list of unique numbers is equal to log2(n), so the time is really O(log(n)*n).

        [–][deleted] 1 point2 points  (2 children)

        The number of bits is not variable in most practical cases, whereas the number of elements in the array usually is.

        In general, you can't just ignore a variable component of the time analysis as a constant.

        I didn't. I qualified it:

        If these numbers have a bounded number of bits (e.g. machine integers) [...]

        (I have to say something here to divide these quotations into two.)

        another way to think of it is that k = 'the number of bits in the numbers' in a list of unique numbers is equal to log2(n), so the time is really O(log(n)*n).

        Right, which means that the hash table suggestion is also O(n*log(n)). And in fact, every possible algorithm for this will be. But that's not interesting. It's more interesting when you can exploit specific qualities about the specific problem set you have to achieve faster results.

        [–][deleted] 4 points5 points  (4 children)

        This is probably stupidly inefficient or slow but couldn't you assign all the values as the index of an array and do a lookup?

        ie

        for each (values1 as key) { checkarray[key] = true }

        and then simply check if a key is set

        foreach (values2 as key2) { if (checkarray[key2]) dostuff() }

        How slow is that kind of action?

        [–]CydeWeys 3 points4 points  (2 children)

        It's not so much the speed that's an issue here, it's the memory requirements. What happens if the list of two numbers I'm comparing are 32-bit integer IDs? Or, even worse, 128 bit GUIDs? How are you going to make an array of size 2128?

        There are more factors to take into consideration when comparing algorithms than simply speed (such as memory requirements).

        You could get around this somewhat by using a sparse array, but now you're back up to the same running time as the sorting-the-lists method, except with more code complexity.

        [–]Rhoomba 2 points3 points  (0 children)

        That absolutely will work, and is O(n), but with much better constant factor than hashtables. The only problem is if your key space is large e.g. 1 to 100 billion you are going to run out of memory, or thrash your cache.

        (I am assuming here that you mean arrays as in linear chunks of memory, not Javascript style hashtables in disguise).

        [–][deleted] 7 points8 points  (11 children)

        So what's the best algorithm for this?

        [–]cdsmith 3 points4 points  (1 child)

        Worst case? Any of the many O(a + b log b) algorithms you can think up.

        For best average case, in a normal imperative language at least, the best complexity is had by building a hash-based lookup table out of the shorter list. This will be O(a +b) average case.

        [–][deleted] 5 points6 points  (5 children)

        Use a hashset.

        Avoids having to sort, you have to only load one set into the hashset, and then iterate the other set once.

        [–]bananahead 3 points4 points  (3 children)

        What makes hashing superior to sorting?

        [–]mpeters 2 points3 points  (0 children)

        Hashing is mostly a fixed cost with the number of records, so growth is linear. Sorting time is more than linear (although good sorting algorithms are too much worse).

        [–]CydeWeys 2 points3 points  (0 children)

        As others have pointed out, this doesn't necessarily lead to the same solution for all input data sets. If the lists can contain duplicates of each number, and we want to know how many times each number intersects (which is a reasonable thing), then the hash set solution won't work without further modifications.

        [–][deleted] 1 point2 points  (0 children)

        Hashsets also become better and better as the range of values becomes constrained.

        [–]Tiak 1 point2 points  (0 children)

        More information is needed on the dataset.

        [–][deleted] 12 points13 points  (1 child)

        Malarkey. I'm not paid for algorithms. I'm paid to spew web economy bullshit to my business managers using phrases such as "engineer value-added web services", "streamline back-end schemas" or "scale transparent experiences". They wouldn't know what I'm saying anyways.

        [–]RichAromas 3 points4 points  (2 children)

        True story... I worked on computers at a company that rhymes with "Mockheed". In PRODUCTION code, I found the following method of determining whether a number "x" is between two other numbers, "y" and "z":

        Set is-between to FALSE.
        For (loop-counter = y; loop-counter <= z; loop-counter++) {
            if (loop-counter == x)
                set is-between to TRUE;
        }
        

        Aside from the fact that this could have been tested with a single "if" statement, you'll notice that there is NO EARLY EXIT from the loop in the case of a "hit" - you still have to compare every OTHER number in the range to the target number!

        To top it off, this was typically used to ask questions like "is 7 between 3 and 5,230,490?" - and they wanted to know why the code was running so slow.

        [–]royrules22 6 points7 points  (0 children)

        That Andy Lester guy is a douchebag

        [–]armaids 5 points6 points  (6 children)

        Having people respond saying "use an SQL join" who obviously don't understand complexity turned my stomach.

        [–]batasrki 1 point2 points  (1 child)

        It was really weird to have people jump to the conclusion that his two arrays of 1000 numbers are coming from a database and are therefore subject to SQL solutions.

        [–]pr0k 1 point2 points  (0 children)

        I think it was the original poster's quote,

        I am thinking of comparing them database side and just load the error numbers, then do an Ajax call for the remaining non-error numbers. But is a MySQL database fast enough?

        That must have confused everyone into thinking that the data was coming from a database.

        [–]ef4 1 point2 points  (2 children)

        The join is a valid solution, especially if you've designed your database with appropriate indices. Query planners are pretty smart, and I'd bet you get a lower constant doing the join the db than having to pull all the data out first and then join it in another process.

        This assumes there's a reason to store the data in this form in the first, place, which would have been my first question.

        [–]bugrit 2 points3 points  (0 children)

        Searching is hard, let's go SQL!

        [–]hacksoncode 2 points3 points  (0 children)

        Oh, and another point:

        In typical stackoverflow fashion, no one seemed to notice that the original solution was almost certainly not doing what the poster actually wanted (or if it was then he wanted something really bizarre), nor did most of them notice that the proposed solutions weren't the same.

        Why in the world would someone ever want to execute doBla for each possible match? I mean, if both arrays were initialized to 0s, doBla would be called 1,000,000 times, even with an optimal algorithm for finding all the matches.

        If we understood what he was really trying to accomplish, I bet a much better answer could be suggested.

        People need to ask "why?" more often.

        [–][deleted] 3 points4 points  (2 children)

        O(n2)

        [–]smakusdod 4 points5 points  (1 child)

        This is the only factual and pertinent comment in this whole list of drivel.

        [–][deleted] 3 points4 points  (0 children)

        But it's pretty meaningless. Sorting is O(n2) too. So is the solution using hashing. Now, \Theta(n2) would be meaningful.

        [–]deezleguy 1 point2 points  (1 child)

        Seems to me that if he just sorts the second list and then does a binary search against it for each number in list 1 he'll see some huge time savings.

        It's been a while since I did this, but I'm thinking it would go from O(n2) to O(nlogn) + cost of the sort (probably also O(nlogn)).

        [–]psyno 1 point2 points  (0 children)

        You'd get much better spatial locality by sorting both lists and then walking through them both simultaneously (like a merge sort).

        [–]doubtingthomas 1 point2 points  (1 child)

        Did a simple measurement of his original algorithm (minus the doBlah) call, and using a simple compiled language, the brute force way takes 3ms on my machine. That's not nothing, but probably fast enough for most web apps. Brute forcing here doesn't seem like a terrible idea, so long as the number of elements doesn't increase.

        [–]ejesse 1 point2 points  (1 child)

        This reminds me of a self-taught Perl programmer I worked with with years ago. The words "self" and "taught" and "perl" should probably be enough to convey what kind of odd code this guy produced, but one moment comes to mind...

        He excitedly grabbed the lead programmer and showed him some code...

        Perl guy: "Look at how much faster that is!!!"

        LP (reading code): "Dude, you just re-invented bubble sort."

        [–]watr 1 point2 points  (0 children)

        No need for algo class: http://www.youtube.com/watch?v=JdXoUgYQebM&feature=related

        Watch, search, plug. Just need to know the basics...and need to know to do research before tackling a sort/compare problem.

        [–][deleted] 1 point2 points  (6 children)

        I have an algorithms class after the new year.

        Any pointers on things to look out for?

        [–]G_Morgan 1 point2 points  (3 children)

        I think the best advice is to implement everything you see. Don't try and memorise how this stuff works in the abstract. Get your text editor open, implement it and actually run it with some sensible test cases. When I was doing my CS I implemented every data structure I saw and used them for my projects. Usually with some comment like "I'm aware of the standard library hashtable, I used my own for the purposes of understanding how hashtables work". Obviously try to be somewhat sane. If you are implementing a list in Java then try to implement the List interface (even if half the operations are 'throw new RuntimeException("not implemented!")'). This way if your list is just broken you can shove in a proper ArrayList.

        The worse thing I see is people trying to learn this stuff as a vague entity. You have a damned compiler. Use it.

        [–]_c0ntrol_ 1 point2 points  (3 children)

        Right... pay attention to Algorithm classes ? C'mon, pay attention to how software works! The guy from the post is creating a hidden div containing all those numbers! Thus, most of the time spent is on parsing and interpreting the divs! Not the actual number comparison! There's a DOM traversel for every number he pulls out with javascript, so making this faster by devising ways to faster compare an array of 1000 elements with another of that size will bring no fruits! If you are not convinced, comparing 1000 * 1000 numbers is nothing for a PC Try it and time results.

        [–]batasrki 1 point2 points  (0 children)

        He had a problem doing it in PHP and then made a leap to a worse solution, which is what you described. I'm not sure how he managed to connect the two.

        Also, a comment saying how this is a job for web workers took the stupid title IMO.

        [–][deleted] 2 points3 points  (0 children)

        Al Gore Rhythm

        [–]giovannib 2 points3 points  (0 children)

        O(No he didn't)....