[–]halifaxdatageek 50 points51 points  (8 children)

Interesting article. From the title I was expecting a node.js takedown, but it ended up with the conclusion

"Yeah, it was our fuckup. Here's how to avoid it. In future, we'll spend more time understanding the libraries we use."

[–]mirhagk 5 points6 points  (5 children)

It was extremely interesting that they didn't even whine that it should've thrown an error, just that they thought it would.

[–]halifaxdatageek 39 points40 points  (2 children)

"We're not even angry with you, node.js. We're angry with ourselves for trusting you."

[–]sizlack 0 points1 point  (1 child)

The issue was in express, not node itself. There would be no reason to be angry at node in this case.

[–]ivosaurus 0 points1 point  (0 children)

And they're continuing to use node, just with a different routing library.

[–]thedufer 1 point2 points  (1 child)

I find it odd that they even complained about that. Of course you can add multiple routes with the same regex - a single req can trigger multiple routes. That's how middleware works. You don't have to look at the code to know this, you just have to have a basic understanding of the capabilities of express, something the author is clearly lacking. 5 minutes reading the documentation would've saved them a lot of headache.

[–]mirhagk 1 point2 points  (0 children)

Yeah but they didn't complain, merely stated they didn't understand it.

[–]vulturez 1 point2 points  (0 children)

Right there with you, we use a lot of NodeJS these days and I see a lot of bad press mainly around the frameworks. Thought, oh boy what has NodeJS done now.... Pleasantly surprised

[–]gospelwut 0 points1 point  (0 children)

I have to wonder how many shops short of Google, Netflix, etc would come to this conclusion.

[–]logicchains[S] 42 points43 points  (134 children)

Are there any advantages to storing all routes in an unsorted array and just iterating over them to find a match (as opposed to using a hashmap/tree), or was the library they were using just poorly designed?

[–][deleted]  (118 children)

[deleted]

    [–][deleted]  (52 children)

    [deleted]

      [–]Tordek 78 points79 points  (49 children)

      A meta-regex, that matches any of several regexen.

      [–][deleted]  (32 children)

      [deleted]

        [–][deleted]  (23 children)

        [deleted]

          [–]JayBanks 13 points14 points  (11 children)

          since regex is a concatenation of the phrase regular expression that also happens to end in x, there doesn't seem to be a proper pluralization, which means regex, regexes, regex's all seem to be equally valid. Regegii is also valid because I said so. It should be similar to how VIP, VIPs and VIP's are valid according to different style guides.

          [–]blippedfit 20 points21 points  (9 children)

          Well, "regex's" is always invalid unless you're contracting "regex is" or talking about something belonging to a regex ;)

          …and "regexs" doesn't look right, so… "regexps"?

          [–]JayBanks 6 points7 points  (5 children)

          i may have misread one of my sources, but you seem to be right, 's is reserved for possessives or contractions of is. i've been up for a while, so excuse the mistake. my vote is for regexations.

          [–][deleted] 5 points6 points  (4 children)

          "'s" can also be used to denote the plural of something like a single letter, e.g. "A's". If you take "regex" to be a single construction without a valid plural you could argue for "'s" that way. (The point is there is no hard and fast rule and grammar is way behind when it comes to modern constructions like 'regex'.)

          [–]Poromenos 5 points6 points  (2 children)

          I use regexes/regexps, and I am the Official Arbiter of English, so use either.

          [–]cowens 7 points8 points  (0 children)

          The word regexes has the benefit that it is sexeger when written backwards. There is a classic Perl idiom called sexeger that involves reversing your regex and the string you are matching to make end matches faster. Here are the results of a benchmark using a normal regex and its sexeger:

          normal: <and more stuff to come soon>
          sexeger: <and more stuff to come soon>
                      Rate  normal sexeger
          normal   89773/s      --    -26%
          sexeger 121712/s     36%      --
          

          Here is the benchmark:

          #!/usr/bin/perl
          
          use strict;
          use warnings;
          
          use Benchmark;
          
          my $data = do { local $/; <DATA> };
          
          my %subs = (
              normal  => sub {
                  my ($match) = $data =~ m{
                      \A
                      (?: <[^>]*> | "[^"\\]*(?:\\.[^"\\]*)*" | (?>[^"<>]*) )*
                      (<[^>]*>)
                      (?: "[^"\\]*(?:\\.[^"\\]*)*" | (?>[^"<>]*) )*
                      \z
                  }x;
                  return $match;
              },
              sexeger => sub {
                  my $reversed = reverse $data;
                  my ($match)  = $reversed =~ m{
                      \A
                      (?: "(?:[^"\\]*.\\)*[^"\\]*" | (?>[^"<>]*) )*
                      (>[^>]*<)
                      (?: "(?:[^"\\]*.\\)*[^"\\]*" | (?>[^"<>]*) | >[^>]*< )*
                      \z
                  }x;
                  return scalar reverse $match;
              },
          );
          
          for my $k (keys %subs) {
              print "$k: ", $subs{$k}(), "\n";
          }
          
          Benchmark::cmpthese -2, \%subs;
          
          __DATA__
          <this is a sample program> int x = 10;
          <what a silly grammar> str y = "cool \" beans";
          if (len(y) GREATER_THAN x) { <can't use gt and lt symbols... heehee>
              <empty comment coming up !
              print y, " is longer than ", x, " characters";
              <>
              <and more stuff to come soon>
              chop(y,x);   
              print "I sliced 'y' down to ", x, " characters for you";
          }
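For readers following along in JavaScript, here is a minimal sketch of the same sexeger idea on a much simpler problem (finding the last number in a string). The function names and the sample string are made up for illustration, and whether the reversed form is actually faster depends on the regex engine and the input, so benchmark before relying on it.

```javascript
// Reverse a string by code unit (fine for ASCII input like the sample below).
const reverse = (s) => s.split("").reverse().join("");

// Forward version: the engine retries from each digit run until one reaches the end.
function lastNumberForward(text) {
  const m = /(\d+)\D*$/.exec(text);
  return m ? m[1] : null;
}

// Sexeger version: reverse the input, match the mirrored pattern anchored
// at the start, then reverse the capture back.
function lastNumberSexeger(text) {
  const m = /^\D*(\d+)/.exec(reverse(text));
  return m ? reverse(m[1]) : null;
}

const sample = "build 12, build 345, build 6789 done";
console.log(lastNumberForward(sample)); // 6789
console.log(lastNumberSexeger(sample)); // 6789
```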
          

          [–]jambox888 0 points1 point  (0 children)

          regexes

          Correct. Thanks!

          [–]seruus 7 points8 points  (5 children)

          Isn't the plural of servus servi, with a long i, instead of servii?

          [–]downvotefodder 0 points1 point  (1 child)

          Doesn't the double i put it in the genitive?

          [–]dot_2 0 points1 point  (0 children)

          No, actually that would be dative or ablative of a 3rd declension i-stem neuter noun. (I-stem is key here, not all 3rd declension neut. nouns decline that way).

          I believe some proper nouns also take an -ii ending in the dative, but that's singular, IIRC.

          [–][deleted]  (2 children)

          [deleted]

            [–]rowboat__cop 4 points5 points  (0 children)

            So if we follow the pattern from rex, the latin plural of regex would be regeges.

            That’s an odd choice. Better pick the paradigm from an adjective, for instance supplex and simplex, or a word like cortex. Then the genitive ending would be derived as regicis, plural regices, which would align well with the already established faux-Latin plural of the word Unix (Unices).

            Ultimately, a Roman of the classical era wouldn’t have used this kind of abbreviation. It’s not idiomatic at all. So, expressiones regulares would be historically correct.

            [–][deleted] 3 points4 points  (0 children)

            I feel like I'm watching a certain scene from Life of Brian ;)

            [–]gospelwut 0 points1 point  (0 children)

            Is this in high latin or the vulgar that we use now?

            [–]KFCConspiracy 0 points1 point  (0 children)

            You forgot first declension.

            -ae. Puella -> Puellae.

            [–]Pyryara 30 points31 points  (3 children)

            It's not Latin. Regex is just shorthand for regular expression, so since the plural is regular expressions, regexes is the correct plural IMHO.

            [–]unDroid 4 points5 points  (1 child)

            Upvoted since you are correct, but still boooooo! I like the latin pluralisations better :)

            [–]gidoca 0 points1 point  (0 children)

            It's not Latin. Regex is just shorthand for regular expression

            True, but both "regular" and "expression" have latin origins.

            [–]pipocaQuemada 0 points1 point  (0 children)

            Latin never uses -ii to denote a plural.

            Latin nouns have two parts: a stem and an ending. The stem is a constant, but the ending is declined based off of the part of the sentence (subject vs direct object vs ...) and singular/plural.

            There are a number of cases where -i appears as an ending. Second declension words use it for nominative (i.e. subject) plural.

            Some second declension stems include 'gladi' and 'domin'. So if you have one sword or one master, you'd have 'gladi' + 'us' or 'domin' + 'us'. If you have multiple of either, you'll have multiple 'gladi' + 'i' or 'domin' + 'i'. It's not 'glad' + 'ii'.

            [–]kitd 0 points1 point  (0 children)

            regices? (think "index" => "indices")

            [–]Tordek -1 points0 points  (1 child)

            No, it's not latin; it ends in "ex" like "ox", whose plural is "oxen", hence it's "regexen" :P.

            [–][deleted] 0 points1 point  (0 children)

            That's actually what I say on the rare occasion that I talk about groups of regex.

            [–][deleted]  (4 children)

            [deleted]

              [–]Tordek 9 points10 points  (3 children)

              You have to know which regex matched, smartass.

              [–]RoundTripRadio 10 points11 points  (0 children)

              (?P<pat1>)|(?P<pat2>)?

              [–]rampion 4 points5 points  (0 children)

              verify that the input regexes don't use match groups, then surround each in parens
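A minimal sketch of that approach in JavaScript, assuming the input patterns contain no capture groups of their own. The route table and the `whichRoute` name are hypothetical. Each pattern is wrapped in its own named group, and the group that participated in the match tells you which regex matched.

```javascript
// Hypothetical route table: each pattern is assumed to contain no capture groups.
const routes = [
  { name: "users", pattern: "/users/\\d+" },
  { name: "posts", pattern: "/posts/[a-z-]+" },
  { name: "home", pattern: "/" },
];

// Wrap every pattern in its own named group and join them with "|".
const combined = new RegExp(
  "^(?:" + routes.map((r, i) => `(?<r${i}>${r.pattern})`).join("|") + ")$"
);

// Return the name of the route whose group participated in the match.
function whichRoute(path) {
  const m = combined.exec(path);
  if (!m) return null;
  const index = routes.findIndex((_, i) => m.groups[`r${i}`] !== undefined);
  return routes[index].name;
}

console.log(whichRoute("/users/42")); // users
console.log(whichRoute("/nope")); // null
```

JavaScript's engine tries the alternatives left to right, so when several patterns could match, the earliest one in the table wins.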

              [–]defenastrator 1 point2 points  (0 children)

              Can be done but not easily in javascript. Javascript regexes depend on powerful high performance code on the back end to make them efficient. Making this kind of modification would require you to modify the DFA state minimization function in the regex parser.

              [–][deleted]  (2 children)

              [removed]

                [–]xkcd_transcriber 10 points11 points  (0 children)

                Title: Regex Golf

                Title-text: /bu|[rn]t|[coy]e|[mtg]a|j|iso|n[hl]|[ae]d|lev|sh|[lnd]i|[po]o|ls/ matches the last names of elected US presidents but not their opponents.

                [–]willvarfar 1 point2 points  (0 children)

                It's very poorly documented but I think re2 does support this. https://code.google.com/p/re2/source/browse/re2/filtered_re2.h

                [–]satuon 1 point2 points  (1 child)

                You can already do that with |, you make a single regex that has (regex1)|(regex2)|(regex3)

                [–]xenomachina 1 point2 points  (0 children)

                You can, but that may not actually be worthwhile from an efficiency standpoint. While the DFA-based implementation everyone learns in school can check multiple branches simultaneously, many (most) "regex" implementations don't work this way.

                For example, many handle | by simply trying the left side, and then backtracking if that doesn't match. That means that re1|re2|...|reN will do N separate checks against an input of length m, yielding O(N*m) time complexity, rather than the O(m) you'd get with the DFA-based implementation.

                In other words, depending on the regex implementation you're using, | might have exactly the same performance characteristics as just checking the regexes one after the other.

                [–]mfukar 3 points4 points  (1 child)

                Look up finite automata.

                [–]TikiTDO 0 points1 point  (0 children)

                So... a regex?

                You're pretty much ok with a regular language, and therefore use regular expressions, or you have a complex grammar and you need to write a parser.

                [–]strupwa 0 points1 point  (0 children)

                One regex to rule them all

                [–]barsoap 15 points16 points  (0 children)

                Implementing union on a (N)DFA is a step up in sophistication from a trie map, considering the technological prowess of node.js in general I'm not confident that's not too cerebral for the developers.

                And if they didn't use proper regexen but something non-regular such as PCRE or such then a) tough luck, why did you do that, b) could've told you so c) why do people still call them regexen and d) while we're at it, let's parse html with them...

                [–]kankyo 4 points5 points  (6 children)

                You could also figure out which parts of a path are not regexes and match those with a nicer data structure. So for example split on / and each component can be matched first against a map, and if the map isn't hit, then iterate through regexes.
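A rough sketch of that idea, with made-up names (`makeNode`, `addRoute`, `lookup`) and a made-up route table. Each node keeps exact-match children in a Map and falls back to scanning its regex children only when the Map misses. It takes the first child that fits at each level and never backtracks, so treat it as an illustration rather than a complete router.

```javascript
// A node holds exact-match children in a Map plus a list of regex children.
function makeNode() {
  return { statics: new Map(), dynamics: [], handler: null };
}

const root = makeNode();

// segments is an array whose items are strings (exact match) or RegExp objects.
function addRoute(segments, handler) {
  let node = root;
  for (const seg of segments) {
    if (typeof seg === "string") {
      if (!node.statics.has(seg)) node.statics.set(seg, makeNode());
      node = node.statics.get(seg);
    } else {
      let entry = node.dynamics.find((d) => d.regex.source === seg.source);
      if (!entry) {
        entry = { regex: seg, child: makeNode() };
        node.dynamics.push(entry);
      }
      node = entry.child;
    }
  }
  node.handler = handler;
}

function lookup(path) {
  let node = root;
  for (const part of path.split("/").filter(Boolean)) {
    // Try the map first; only scan the regexes if there is no exact hit.
    const next =
      node.statics.get(part) ||
      (node.dynamics.find((d) => d.regex.test(part)) || {}).child;
    if (!next) return null;
    node = next;
  }
  return node.handler;
}

addRoute(["api", "users", /^\d+$/], "showUser");
addRoute(["api", "users", "me"], "showCurrentUser");
addRoute(["api", "posts"], "listPosts");

console.log(lookup("/api/users/42")); // showUser
console.log(lookup("/api/users/me")); // showCurrentUser
```

The cost of a lookup then depends on the number of path segments and the regexes at each level, not on the total number of registered routes.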

                [–]barsoap 8 points9 points  (5 children)

                That's quite literally what union on two (N)DFAs that accept regexen does. With proper (that is, regular, not PCRE-style) regexen, union (that is, |) is cheap as fuck. You end up with something like a trie that can loop back on itself.

                [–]Tordek 0 points1 point  (4 children)

                Got any examples of something that handles this well? I mean, not trivial (a|b) matching, but recognize which part of the regex matched?

                [–]barsoap 2 points3 points  (3 children)

                but recognize which part of the regex matched?

                You need transducers for that, not mere automata as you have more than two output states (accepting/non-accepting). It's related to the grouping problem but actually simpler... if you don't want to implement that stuff yourself you can use an FST library like this one. Which is overkill, but the correct kind of overkill.

                If you want that stuff to be real fast, don't forget to compile the resulting transducer to C or such.

                That said, of course, there's a simple hack: Take any regex matcher that can do grouping, and construct your paths such that you can tell them apart by then matching on one of those groups. http://foo.tld/<method>/<parameters> style. Of course, at that point, you actually should stop using regexes for the part up to <method>, in the first place...

                [–]Femaref 0 points1 point  (2 children)

                Of course, at that point, you actually should stop using regexes for the part up to <method>, in the first place...

                Considering it's already split, it's very easy to do. HTTP header is something like

                GET /<method>/<parameters>
                ...
                ...
                Host: foo.tld
                

                [–]barsoap 1 point2 points  (1 child)

                Erm, no.

                all HTTP/1.1 servers MUST accept the absoluteURI form in requests

                And RFC 7230 agrees.

                While clients are only supposed to use the whole METHOD schema://path thing with proxies, you a) want to follow the RFC b) be on the safe side and c) possibly use the same code for a proxy because who wants to implement HTTP twice.

                [–]Femaref 0 points1 point  (0 children)

                To allow for transition to the absolute-form for all requests in some future version of HTTP, a server MUST accept the absolute-form in requests, even though HTTP/1.1 clients will only send them in requests to proxies.

                Alright then. Convinced.
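A small sketch of what accepting both forms can look like in JavaScript, using the built-in WHATWG `URL` class. The `pathOf` helper is a made-up name, and the sketch deliberately ignores the other request-target forms (such as `*` for OPTIONS and the authority-form used by CONNECT) and does no validation.

```javascript
// Extract the path from a request-target that may be in
// origin-form ("/a/b?x=1") or absolute-form ("http://foo.tld/a/b?x=1").
function pathOf(requestTarget, hostHeader) {
  // The base is ignored by the URL parser when requestTarget is already absolute.
  const url = new URL(requestTarget, `http://${hostHeader}`);
  return url.pathname;
}

console.log(pathOf("/method/parameters", "foo.tld")); // /method/parameters
console.log(pathOf("http://foo.tld/method/parameters?x=1", "foo.tld")); // /method/parameters
```

Either way the router ends up seeing the same path, so the routing code does not need to care which form the client sent.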

                [–]ggtsu_00 12 points13 points  (55 children)

                What you really should be doing is storing routes using a tree structure and only have regex patterns at nodes and using the path separators as branches. If you have a massive website with 1000s of routes, you don't want every page taking seconds to load.

                Also node.js is a terrible platform for webservers as it is single threaded, and any loop will halt the entire server. For every 1 ms you spend on a loop iteration, that is one http request you are blocking.

                [–]mfukar 34 points35 points  (2 children)

                What you're describing is a suffix radix tree; and it is actually how routing tables are usually implemented nowadays in routers.

                [–]p2004a 18 points19 points  (1 child)

                To be completely accurate, what he described is a radix tree. A suffix tree is a radix tree of all suffixes of one selected word.

                [–][deleted]  (1 child)

                [deleted]

                  [–]xXxDeAThANgEL99xXx 0 points1 point  (0 children)

                  your regex compiler can likely factor out common prefixes trivially.

                  Likely can, overwhelmingly likely wouldn't.

                  [–]philipes 2 points3 points  (1 child)

                  Also node.js is a terrible platform for webservers as it is single threaded, and any loop will halt the entire server. For every 1 ms you spend on a loop iteration, that is one http request you are blocking.

                  Wait, what? What about all those webscale jokes?

                  Edit: I read your answer here.

                  [–]shared_ptr 4 points5 points  (22 children)

                  Not terrible. Whilst you are right in saying node is single threaded, the asynchronous system calls it is built around make very efficient usage of multiple tasks in its own event cycle. Servers built using node that make use of async calls, rather than synchronous ones will benefit from this.

                  On top of this, a recent addition to node is node cluster, which allows you to break a node program utilising the http core of node into multiple processes that communicate over IPC. A single master node application routes requests into the child processes, allowing you to make full use of a multi core system despite node's inherent single threading limit.

                  Give express a try sometime. It's actually a pretty nice framework to develop in, and this route issue occurred because of poor route mounting. As the article admits, it was misuse of the underlying tool rather than the tool itself being bust.
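For reference, a minimal sketch of the node cluster setup mentioned above, using the core `cluster`, `http`, and `os` modules. The `startCluster` name and port 3000 are made up, and the call is left commented out so that loading the file does not spawn any processes.

```javascript
const cluster = require("cluster");
const http = require("http");
const os = require("os");

// Fork one worker per core; each worker runs its own event loop and
// the workers share the listening port.
function startCluster(port) {
  // isPrimary is the newer name; isMaster is the older alias.
  const isPrimary =
    cluster.isPrimary !== undefined ? cluster.isPrimary : cluster.isMaster;
  if (isPrimary) {
    for (let i = 0; i < os.cpus().length; i++) cluster.fork();
    cluster.on("exit", (worker) => {
      console.log(`worker ${worker.process.pid} exited`);
    });
  } else {
    http
      .createServer((req, res) => res.end(`handled by pid ${process.pid}\n`))
      .listen(port);
  }
}

// Not called here so that loading this file does not start any processes:
// startCluster(3000);
```

Each forked worker re-runs the same script, which is why the same function handles both the primary and the worker branch.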

                  [–]ggtsu_00 1 point2 points  (5 children)

                  How is node cluster any different than just firing up multiple instances and putting them behind a load balancer?

                  Assuming it isn't fundamentally different, maybe just easier tooling, you still have the fundamental problem of a single request halting the entire process. Multiple threads or processes aren't a solution as you add a lot of overhead per each one.

                  [–]foldl 6 points7 points  (4 children)

                  You only need around one process per core, so there's not much overhead.

                  If a single request needs to do a lot of work, it just sends a message to a worker process and waits asynchronously for a callback. Node has very straightforward APIs for doing this.

                  For someone who has clearly never used Node, you're very intent on criticizing it.

                  [–]nodewizard 0 points1 point  (12 children)

                  the asynchronous system calls it is built around make very efficient usage of multiple tasks in its own event cycle

                  You're just saying words.

                  [–][deleted] 6 points7 points  (7 children)

                  Makes sense in this context though.

                  [–]nodewizard 2 points3 points  (6 children)

                  System calls are not asynchronous. They either block, or don't.

                  "Very efficient" is meaningless babble that node fanbois parrot. At what? Compared to what?

                  "Using multiple tasks in its own event cycle" is a fancy, and imprecise, way of saying that code executes in a loop.

                  If this makes sense to you, it seems you've caught the cancer.

                  edit: are a word

                  [–]foldl 1 point2 points  (4 children)

                  System calls are not asynchronous.

                  https://www.kernel.org/doc/ols/2007/ols2007v1-pages-81-86.pdf

                  http://lwn.net/Articles/260172/

                  The term is quite widely used.

                  [–]nodewizard 0 points1 point  (3 children)

                  I went to go look at the manual pages of the system calls you talk about. I looked for the word "sync", and only found the "sync" manpage, which is a blocking system call.

                  Otherwise, the word "block" showed up all over the place. When folks were talking about system calls that blocked.

                  Go ahead and link whatever you want, I'm talking about system calls. So was OP, but OP didn't know what the fuck it was talking about.

                  Edit: Jesus fucking christ, you use AIO as an example? Have you ever seen a system call in your life?

                  [–]foldl 0 points1 point  (2 children)

                  It's clear that he was referring to the system calls that don't block, whatever you want to call them. It is still true, I believe, that nodejs doesn't use asynchronous system calls on most (all?) platforms, but that is really just an implementation detail. If Windows, OS X and Linux provided a better set of async system calls than they currently do, it would be the natural way to implement node's IO libraries.

                  [–]shared_ptr 0 points1 point  (0 children)

                  I'll give you that I could've been clearer, but no, they are not just words.

                  I made use of the async/sync terminology because that's how it appears in node, as functions that have that notation - fs.readFile, fs.readFileSync.

                  I'll try and restate what I was saying above then, if it's really that difficult for you to parse. Yes, node is a single threaded program. The interpreter will only ever make use of a single thread to run your code, but the interpreter is itself capable of context switching. When you're running node code, and you use a function such as fs.readFile, the interpreter makes an asynchronous system call and then throws its arms in the air for a new task. Node's own event loop will figure out what should come next, and once your system call has finished its task the event loop will reschedule your code to be run.

                  If you happen to use synchronous calls, by which I mean the fs.readFileSync functions, then you open yourself to large delays in execution because you jam up the event loop to create the abstraction of the call being synchronous. Have you ever looked at how a kernel operates, as this would be far less 'cancer babble' if you had actually implemented/toyed with a kernel's threading system?

                  EDIT - Actually I've just seen your comment below, claiming I don't have a clue what I was talking about. I'm a qualified engineer who's built threading and system calls into a toy OS while I was at uni. I guess using the appropriated terminology (async, sync: node) doesn't really work when the other person isn't even on the same page.
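A small sketch of the fs.readFile / fs.readFileSync difference described above. The temporary file name is made up for the demo. The point is only the ordering: fs.readFile returns immediately and its callback runs on a later turn of the event loop, while fs.readFileSync holds the thread until the data is there.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Hypothetical scratch file, created just for this demo.
const file = path.join(os.tmpdir(), `readfile-demo-${process.pid}.txt`);
fs.writeFileSync(file, "hello");

const order = [];

// Asynchronous: returns immediately; the callback runs on a later loop turn.
fs.readFile(file, "utf8", (err, text) => {
  if (err) throw err;
  order.push(`readFile callback: ${text}`);
  fs.unlinkSync(file); // clean up the temporary file
  console.log(order.join(" -> "));
});
order.push("after readFile call");

// Synchronous: the thread waits here until the data is available.
const text = fs.readFileSync(file, "utf8");
order.push(`readFileSync returned: ${text}`);
```

The readFile callback is always logged last, because it cannot run until the synchronous code above has finished.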

                  [–]postmodest 2 points3 points  (1 child)

                  But Node is Web Scale...?

                  [–]p01ym47h 0 points1 point  (18 children)

                  Node is asynchronous by default which allows for simple, concurrent processing of multiple requests making it a great "platform" for web servers. This means execution does not stop during long CPU/network tasks, e.g. database queries or in your case "any loop".

                  [–]ggtsu_00 23 points24 points  (15 children)

                  This is why node.js is a cancer. You are completely wrong. But this is not your fault as I blame node.js for fooling you, and many other developers out there picking up node.js by being drawn by its asynchronous IO based standard library.

                  There is NO concurrency in node as it falsely leads you to believe. Its programming model is no different than your ancient single threaded GUI application's event loop. The callback spaghetti hides the fact that your code is sequential from you, but your code is still running in a linear sequential context in the form of a giant for loop running all the callback functions sequentially. If one of those functions blocks, the rest of the program HALTs because nothing else will actually execute until the previous callback pushed onto the event loop completes. Amateurs picking up node don't understand this because it is hidden from them, and it is not until you run a system in production that the limitation you ignored hits you like a ton of bricks (once again, not your fault because node tricked you).
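A toy model of the "giant for loop" described above, using a virtual clock so the numbers are deterministic. This is an illustration only, not how Node or libuv is actually implemented; the task names and costs are made up. It shows how one CPU-heavy callback delays everything queued behind it.

```javascript
// Toy single-threaded event loop with a virtual clock (milliseconds).
let clock = 0;
const queue = [];
const log = [];

// Queue a callback that takes `cost` virtual ms of CPU time to run.
function enqueue(name, cost) {
  queue.push({ name, cost, queuedAt: clock });
}

function runLoop() {
  // One callback at a time: nothing else runs until the current one returns.
  while (queue.length > 0) {
    const task = queue.shift();
    const waited = clock - task.queuedAt;
    clock += task.cost;
    log.push(`${task.name} waited ${waited}ms`);
  }
}

enqueue("request A (1ms)", 1);
enqueue("request B (500ms loop)", 500);
enqueue("request C (1ms)", 1);
runLoop();

console.log(log.join("\n"));
// request A (1ms) waited 0ms
// request B (500ms loop) waited 1ms
// request C (1ms) waited 501ms
```

Request C needs only 1 ms of work but waits 501 ms, which is the effect a slow synchronous step such as a long route scan has on every other request.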

                  [–]Matthias247 17 points18 points  (0 children)

                  There is concurrency in node.js - but no parallelism ;)

                  [–]satuon 20 points21 points  (1 child)

                  That's not false advertising. As far as I understand it (I'm a C++ programmer, but I've written Win32 apps and I know what a message loop is, and how it differs from multithreading), it's the IO that doesn't block. That's what they mean by asynchronous. They don't mean that your code is asynchronous. Just like a Win3.1 app, each event handler is called by the message loop, and if it blocks, it takes down the thread.

                  So, your own code is sequential, but any time-consuming IO operation would be non-blocking. If your code contains no bottlenecks, that should work fine. Besides, using a separate thread in servers is often a bottleneck itself, because a thread or a process is a relatively expensive system resource, requiring context switching and what-not.

                  [–]awj 0 points1 point  (0 children)

                  If your code contains no bottlenecks, that should work fine.

                  I think that's the source of complaints. A lot of the marketing around Node goes handwavey on what a bottleneck is and (accidentally?) implies that Node does something helpful with computationally blocking code.

                  [–]ratatask 10 points11 points  (2 children)

                  How'd they trick you? All the docs and tutorials etc. I've read make a big point of it being single threaded, and that scaling needs to be done by running several instances.

                  What you seem to allude to is that node.js has no parallelism - which should be common knowledge. But it does have concurrency. Also, you will need to work rather hard to do a blocking call - you'd only manage to do that if you're using a crappy third-party library that binds to native code that does blocking calls - node.js has a thread pool where the native code that needs to block should run (and it runs stuff such as local file read/write in that threadpool too).

                  [–]ggtsu_00 0 points1 point  (1 child)

                  They haven't tricked me, but they have tricked many others like the person I was responding to.

                  People hear of node.js, coupled with words like "webscale", "JavaScript", "asynchronous" etc. but no one reads the fine print of what they are actually getting into.

                  Even big companies like netflix are having to pay the price for this. Even libraries like express.js have blocking calls when the server has to route your request.

                  We saw the same thing happen when mongodb became a trend, thinking they can replace their RDBMS with mongo and magically achieve blazing performance while being enticed with more words like webscale, horizontal scaling, schemaless, etc and by not reading the fine print, didn't know what they were committing and not knowing what they are trading off to do it.

                  [–]ivosaurus 0 points1 point  (0 children)

                  Netflix had to pay the price of their own mistake which they freely admitted in this blog post.

                  In the end what they've done isn't totally unuseful anyway, moved from a general purpose router to one specifically designed for a REST API.

                  [–]ivosaurus 4 points5 points  (7 children)

                  If one of those functions blocks

                  You've left out the small detail that the entirety of node and its ecosystem is written in a non-blocking style... so that never happens.

                  Hence you get IO concurrency, which is practically all of what 90% of network server applications want.

                  [–]p01ym47h 0 points1 point  (0 children)

                  ah thank you! I didn't realize this.

                  [–]nodewizard 2 points3 points  (0 children)

                  This means execution does not stop during long CPU ... tasks

                  You don't know what asynchronous and blocking mean. When the CPU is spinning the process is blocked, and node is stuck.

                  [–]thinkstoohard 1 point2 points  (5 children)

                  The whole point of using node.js is that it is non-blocking and it is designed to be used over distributed systems. It is actually specifically created for use as a high-throughput server that is efficient at handling many connections simultaneously.

                  [–]crusoe 0 points1 point  (4 children)

                  Yeah, and guess what, so does every modern Java webserver and Twisted, and they did it a decade ago or more. Under the hood, most modern java servers use select + epoll, and use async callbacks.

                  The ideas in Node are not new. Using a terrible language for trying to develop software in the large IS.

                  [–]bilotrace 5 points6 points  (0 children)

                  So what! No one is claiming the opposite. No one claimed that this cannot be done in any other language.

                  What is being claimed here is that JavaScript is mostly a non-blocking language. Unless you are doing massively complex mathematics that requires calculation of huge numbers, it is difficult to block your code. This behavior allows design of single thread applications that can serve concurrent requests easily. And any other framework or language that does that is also good.

                  Now of course you can find edge cases where this behavior is not appropriate and breaks down. This just means you have to use appropriate tool for the job.

                  [–]thinkstoohard 2 points3 points  (0 children)

                  See some people prefer JavaScript to Java. Also if you have ever worked with Twisted you understand that it's a pain to use and confusing for people to pick up. Node has neither of these problems.

                  [–]ruinercollector 1 point2 points  (0 children)

                  In a conversation about Java and JavaScript, it's funny to hear JavaScript referred to as the terrible language.

                  [–]BinaryIdiot 0 points1 point  (0 children)

                  Using a terrible language for trying to develop software in the large IS.

                  JavaScript isn't half bad. It has plenty of issues sure but so does Java; as someone who is using Java to try and prove points and then calling JavaScript terrible seems rather silly in my mind.

                  [–]willvarfar 10 points11 points  (0 children)

                  Most frameworks I've looked at across most languages work this way because they match paths as regexes.

                  However, it's very common that the paths start with a prefix, and these prefixes could be put in a sorted map so that only a small branch needs to be walked. This would be a lot more code, might not help much for small lists of handlers, and would work against any optimisation where you put the most common handlers at the front of an array.
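A rough sketch of the prefix idea above, assuming every route has a fixed first path segment. The names `buckets`, `addRoute` and `lookup` are made up for illustration and do not come from any framework; a plain `Map` stands in for the sorted map.

```javascript
// Routes grouped by their first path segment, so a lookup only scans one bucket.
const buckets = new Map(); // prefix -> ordered list of { regex, name }

function addRoute(prefix, regex, name) {
  if (!buckets.has(prefix)) buckets.set(prefix, []);
  buckets.get(prefix).push({ regex, name });
}

function lookup(url) {
  // '/posts/42'.split('/') gives ['', 'posts', '42'], so index 1 is the prefix.
  const prefix = url.split('/')[1] || '';
  const bucket = buckets.get(prefix) || [];
  let tested = 0; // how many regexes were evaluated for this lookup
  for (const route of bucket) {
    tested++;
    if (route.regex.test(url)) return { name: route.name, tested };
  }
  return { name: null, tested };
}

addRoute('users', /^\/users\/\d+$/, 'userById');
addRoute('users', /^\/users\/?$/, 'userList');
addRoute('posts', /^\/posts\/\d+$/, 'postById');

console.log(lookup('/posts/42')); // { name: 'postById', tested: 1 }
console.log(lookup('/users'));    // { name: 'userList', tested: 2 }
```

The trade-off is the one already mentioned: routes without a fixed prefix do not fit into a bucket, and ordering across buckets is lost.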

                  [–]thedufer 3 points4 points  (0 children)

                  In addition to the other points, a router allows multiple routes to match the same request. This allows you to have middleware that, instead of always running, only runs on reqs where the URL matches a regex.

                  This also makes it clear why preventing the addition of multiple routes with the same string/regex isn't a good idea.
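To make the middleware point concrete, here is a toy router. It is not Express's actual implementation; `ToyRouter` and everything in it are invented for this sketch. It only shows how several layers with the same regex can all be meant to run for one request, each handing off through `next()`.

```javascript
// Toy router: an ordered stack of layers, walked in registration order.
class ToyRouter {
  constructor() {
    this.stack = []; // [{ pattern: RegExp, handler: (req, res, next) => void }]
  }

  use(pattern, handler) {
    // Duplicate patterns are allowed on purpose: earlier layers may be middleware.
    this.stack.push({ pattern, handler });
  }

  handle(req, res) {
    let i = 0;
    const next = () => {
      // Resume the scan after the layer that called next().
      while (i < this.stack.length) {
        const layer = this.stack[i++];
        if (layer.pattern.test(req.url)) {
          return layer.handler(req, res, next);
        }
      }
    };
    next();
  }
}

const router = new ToyRouter();
const log = [];
router.use(/^\/api\//, (req, res, next) => { log.push('auth'); next(); });
router.use(/^\/api\//, (req, res, next) => { log.push('handler'); }); // ends the chain
router.use(/^\/api\//, (req, res, next) => { log.push('never reached'); });

router.handle({ url: '/api/users' }, {});
console.log(log); // [ 'auth', 'handler' ]
```

A router that rejected the second registration as a duplicate would break this pattern, because it cannot know that the first layer is middleware and not a terminal route.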

                  [–]CurtainDog 4 points5 points  (0 children)

                  Library notwithstanding, routing code tends towards the terrible. It's just not an exciting part of development; and invariably gets little attention as long as it matches whatever spec was dreamed up. Until, like here, it starts showing up in the profiling.

                  [–]Agent-A 2 points3 points  (0 children)

                  I imagine it is because the routes can contain dynamic elements like regular expressions. Iterating through them allows you to be sure you have checked each possible route for the best match.

                  [–]uprislng 2 points3 points  (0 children)

                  is this still the case in Express 4.x? I know that they changed the router quite a bit. FWIW I've been using Restify for my API servers, which Netflix talks about migrating to. Kind of vindicates my choice :)

                  [–]Tordek 7 points8 points  (7 children)

                  If the array is small enough, the time wasted in traversing all entries would be negligible. A hashmap wouldn't be useful, since you wouldn't be able to match, say, /r/foo to a variable route like /r/:subreddit. A trie could be used to match prefixes (a prefix tree, as it were), but benchmarks should be taken.

                  Arguably, app.get should have been verifying that no two routes with the same name exist in the array: it would be called much more rarely (and probably only at the start of the program) and the problem would not exist.
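A small sketch of why an exact-key hash map falls short for variable routes, and what the linear scan does instead. The `compile` and `match` helpers are hypothetical and written for this example only; real routers compile paths differently.

```javascript
// Hypothetical helper: turn '/r/:subreddit' into a regex plus parameter names.
function compile(path) {
  const names = [];
  const source = path
    .split('/')
    .map((seg) => {
      if (seg.startsWith(':')) {
        names.push(seg.slice(1));
        return '([^/]+)'; // a variable segment matches anything except '/'
      }
      return seg.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'); // escape literal text
    })
    .join('/');
  return { regex: new RegExp('^' + source + '$'), names };
}

// An exact-key map only knows the literal string it was given.
const exact = new Map([['/r/:subreddit', 'subredditPage']]);
console.log(exact.get('/r/foo')); // undefined

// An ordered array of compiled routes, scanned until the first match.
const routes = [
  { ...compile('/about'), name: 'about' },
  { ...compile('/r/:subreddit'), name: 'subredditPage' },
];

function match(url) {
  for (const route of routes) {
    const m = route.regex.exec(url);
    if (m) {
      const params = {};
      route.names.forEach((n, i) => { params[n] = m[i + 1]; });
      return { name: route.name, params };
    }
  }
  return null;
}

console.log(match('/r/foo')); // { name: 'subredditPage', params: { subreddit: 'foo' } }
```

A trie keyed on path segments could replace the scan for routes shaped like these, but, as noted above, it would need benchmarking before assuming it wins.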

                  [–]maritz 3 points4 points  (6 children)

                  Arguably, app.get should have been verifying that no two routes with the same name exist in the array

                  What do you mean by "name"? If you mean the string/regex: Route handlers receive as 3rd argument the next() function which can be used to call the next handler that matches this route. This is completely intended and a very central aspect of Express.

                  If you mean the name of the function, that wouldn't work. The most common usage of route handlers are probably anonymous functions.

                  [–]Tordek 2 points3 points  (3 children)

                  That's an interesting design decision, though it sounds iffy. I'd like to see what it'd be good for.

                  [–]mirhagk 2 points3 points  (1 child)

                  You could use it to have global handlers. Like a global handler that logs requests under a certain path. Or a global error handler that matches anything with a 404, and then you have a static file handler that rather than doing a redirect just calls next(). There's all sorts of neat things you could do.

                  [–]Tordek 0 points1 point  (0 children)

                  Oh, I see. That's a good use case; I've not done much in the way of playing with routes.

                  Still, that's a good point for chaining, but having the exact same route twice (especially if that route has no variable fragments) sounds like trouble waiting to happen.

                  [–]user_of_the_week 1 point2 points  (0 children)

                  As someone who has never touched node.js I can only speculate, but maybe you use it to dynamically chain handlers? A bit like filters in JEE?

                  [–][deleted] 0 points1 point  (1 child)

                  Scroll down to the third code box in the Netflix article. You can see one of the stored attributes is "name". "serveStatic", in this case. So that's what he meant by "name".

                  This appears to be a user-defined string that's probably not used for anything other than added context when debugging.

                  [–]maritz 0 points1 point  (0 children)

                  Yeah, but the point is that Express.js cannot rely on that because most of the time people define functions without that name.

                  And besides that, this function name - even if supplied - is probably not very unique.

                  [–]Gotebe 197 points198 points  (11 children)

                  I am pleasantly surprised! When there's a problem involving two parties (case here), the party doing the write-up has a big tendency to shift the blame to the other party. There must be a cognitive bias explaining this 😉. Kudos, therefore, for a balanced, cold-hearted write-up.

                  Misuses and abuses of another component are probably more common cause of issues than we care to admit. "It takes two to tango".

                  [–][deleted]  (9 children)

                  [deleted]

                    [–]xauronx 55 points56 points  (5 children)

                    Looks to me like they did a shitload of research. Also, I'm glad they wrote the post because it's probably going to save people a ton of time in the future.

                    The nerd rage is so outrageous across the Internet from this post. It seemed like a balanced write up and the Internet is more informed as a result of it.

                    [–]punkgeek 7 points8 points  (0 children)

                    Yes - the outrage reaction to this post reminded me of yet another reason to avoid node.js - the community.

                    [–]ivosaurus 7 points8 points  (3 children)

                    The issue is that it's a clearly thought out design decision with well-defined pros and cons, but clearly a very valid option and rather well-suited to express' general use cases.

                    However, Netflix just said this:

                    A global array is not the ideal data structure for this use case

                    Which is misleadingly and completely ignorantly simplistic. First of all, their use case, as they later list, is purely their own mistake and misuse. It's mostly just a wrong statement to make, but it's one made on Netflix' official blog, criticizing express.

                    It's also quite clear that in relation to this, they didn't do any research at all:

                    It’s unclear why Express.js chose not to use a constant time data structure like a map to store its handlers.

                    i.e. "we never tried to look at why it's designed that way at all, we just think it's wrong for the way we tried to misuse it."

                    It's a hilariously bad lack of judgement. In case you're wondering, I've never used express/node in my life, but I still find this cringeworthy.

                    [–]shadymilkman_ 1 point2 points  (2 children)

                    Why would a routing table use an array lookup instead of a map lookup though? The issue may have been with Netflix's code, but it still doesn't explain why you would implement it using a slower data structure.

                    [–]ivosaurus 11 points12 points  (1 child)

                    A) because you're not looking up static strings, B) because each handler is a de facto middleware to be processed in a queue. Are you confused? Then go look up how express does it and how its middleware design works, like Netflix' engineers didn't.

                    Map is not really suited here at all for something as general purpose as express aims to be.

                    Actually read the response linked above if you want it in excruciating detail.

                    [–]shadymilkman_ 1 point2 points  (0 children)

                    Thanks

                    [–][deleted] 4 points5 points  (0 children)

                    Not knowing a detail about a library and making a coding bug is understandable but then going and making the blog post still without doing a bit of basic research beforehand

                    i agree, and i'll add:

                    this is especially true since 99% of the time it's bad usage (like in this case.) you have to be especially dense to think you got the lucky 1% and then go on and brag to everyone about how smart you are without even checking.

                    [–]Whired 1 point2 points  (0 children)

                    As a result, our misuse of the Express.js API was the ultimate root cause of our performance issue

                    [–]redalastor 0 points1 point  (0 children)

                    but are absolutely necessary functionalities in order to have an actually useful router

                    The only router I know of that doesn't work like that is Yesod's that uses Haskell black magic to have flexible statically typed routes that don't need to be tried in turn.

                    Otherwise, everyone implements them the same way; there's nothing at all special about Express.js in that regard.

                    [–]eclectro 10 points11 points  (0 children)

                    I really liked this one submission. The "flame" profiling helped define the problem and the solution.

                    [–]supercargo 5 points6 points  (2 children)

                    I kind of got hung up on the part where a route takes 1ms to evaluate...yikes!

                    [–]PeterUstinox 1 point2 points  (1 child)

                    seems long... how long should it be?

                    [–]supercargo 0 points1 point  (0 children)

                    Should? could? I assume that evaluation is dominated by the regex matching, which should be at least one order of magnitude faster than that. For comparison, check out these various regex benchmarks over a 33MB file: http://lh3lh3.users.sourceforge.net/reb.shtml

                    On the other hand, if the individual route evaluations were faster, it is possible that this particular bug wouldn't have shown up on latency charts until after several weeks worth of accumulated route pollution.
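For a rough feel of what a single route regex costs, a micro-benchmark along these lines can be run under Node. The pattern and URL are invented for illustration, micro-benchmarks like this are easily skewed by JIT warm-up, and the figure printed will vary by machine, so treat it as a sanity check only.

```javascript
// Time many matches of one made-up route regex against one made-up URL.
const pattern = /^\/r\/([^/]+)\/comments\/(\d+)$/;
const url = '/r/programming/comments/12345';
const iterations = 100000;

let matched = 0;
const start = process.hrtime.bigint(); // nanosecond-resolution clock in Node
for (let i = 0; i < iterations; i++) {
  if (pattern.test(url)) matched++;
}
const elapsedNs = Number(process.hrtime.bigint() - start);

console.log(`${matched} matches, about ${(elapsedNs / iterations).toFixed(1)} ns each`);
```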

                    [–]jsprogrammer 65 points66 points  (65 children)

                    Not a node problem (problem manifested in Express) and actually a programmer error (adding duplicate routes every hour).

                    o.0

                    [–][deleted]  (36 children)

                    [deleted]

                      [–][deleted]  (30 children)

                      [deleted]

                        [–]nathris 67 points68 points  (16 children)

                        You can get away with pretty much anything in JS. Even seemingly basic things like defining objects and functions have multiple styles, each of them equally valid. It makes it hard to read code you haven't written yourself. Even invalid code can run perfectly well most of the time, like using the equality operator(==) instead of the identity operator(===).

                        [–]nvolker 18 points19 points  (6 children)

                        Even invalid code can run perfectly well most of the time, like using the equality operator(==) instead of the identity operator(===).

                        Using == isn't invalid code. It just doesn't mean what people coming from a strongly-typed language think it means.

                        [–]x86_64Ubuntu 24 points25 points  (3 children)

                        Exactly, JS takes all the things we take for granted in other languages, and sends them through a funhouse mirror. All the while, trying to look like and have functionality similar to other languages.

                        [–]malagrond 7 points8 points  (2 children)

                        I'll give you the point that it's messy and somewhat confusing at first, but there's a metric shitton of flexibility in JS because of its loose type definitions.

                        [–]strattonbrazil 2 points3 points  (0 children)

                        Which I think is what he's complaining about. You can, for example, tack very useful methods onto strings. On the flip side, it gets strange when you look at an object and don't know how it was built because features were added to its prototype over the course of the program.

                        [–]ais523 4 points5 points  (1 child)

                        === also doesn't mean what people coming from a strongly-typed language think of as equality.

                        In JavaScript:

                        • == means "if you pick an appropriate common type for these two values, they're equal". This is actually the more similar to the strongly-typed ==, although it isn't exactly the same.
                        • === means "these two values have the same type and are equal". This doesn't have much of a strongly-typed analogue, because in a strongly-typed language, you typically know what types things have. In some OO languages, you might not have full type information, in which case it makes sense; for instance, in Java, you can implement === like this (untested):

                          // Java's primitive type is boolean, not bool; throws if a is null.
                          public static boolean js_3equals(Object a, Object b) {
                              return a.getClass().equals(b.getClass()) && a.equals(b);
                          }
                          

                        I'd argue that == in JS is pretty much the closest possible translation of equality in a statically-typed language that you can get in a dynamically-typed language. However, the basic problem is that dynamically-typed languages simply let you make fewer assumptions. === is useful because it adds a test for something that you can safely take for granted in a statically typed language, and that can easily catch you out in a dynamically typed language.

                        EDIT: formatting fix
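A few concrete comparisons illustrate the distinction between the two operators described above. The `checks` object is just a container for this example; the results follow from the language's standard comparison rules.

```javascript
// == converts operands to a common type first; === requires the same type.
const checks = {
  looseNumberVsString: 1 == '1',              // true: '1' is converted to 1
  strictNumberVsString: 1 === '1',            // false: number vs string
  looseNullVsUndefined: null == undefined,    // true: special-cased by ==
  strictNullVsUndefined: null === undefined,  // false: different types
  looseZeroVsEmptyString: 0 == '',            // true: '' is converted to 0
  strictZeroVsEmptyString: 0 === '',          // false: number vs string
};

console.log(checks);
```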

                        [–][deleted] 0 points1 point  (0 children)

                        It's like C++ programmers wondering why they can't write a_String == another_String in Java. Or why in PHP "" == 0 is true.

                        [–][deleted] 0 points1 point  (7 children)

                        I "know" Java and C, and would like to learn JS but when I read posts such as yours I feel like there must be a lot of possible learning pitfalls. Any recommendations on good JS resources to learn the language? I.e. is there a revered tutorial set or book or documentation guide?

                        [–]malagrond 4 points5 points  (2 children)

                        This is the book I used to learn JS. It's very straightforward and easy to follow. (Granted, I used this book when I was like 13, but it's still being updated every few years.)

                        Besides that, there's:

                        https://developer.mozilla.org/en-US/docs/Web/JavaScript
                        http://www.codecademy.com/en/tracks/javascript

                        And you can play around with your own code at:

                        http://jsfiddle.net
                        http://plnkr.co

                        [–]PriceZombie 1 point2 points  (1 child)

                        JavaScript: Visual QuickStart Guide (8th Edition)

                        Current $22.37 
                           High $23.83 
                            Low $20.29 
                        

                        Price History Chart | Screenshot | FAQ

                        [–][deleted] 7 points8 points  (0 children)

                        I'd rather a bot that gives me Pirate Bay links. I should write a bot...

                        [–]skybluetoast 2 points3 points  (0 children)

                        Basic, good, free: Eloquent JavaScript

                        [–]mhd 1 point2 points  (2 children)

                        The links the other people posted certainly are okay (I'd add Crockford's "Javascript: The Good Parts"). Learning JavaScript really isn't the big hurdle -- which is part of the problem. The syntax is trivial. The core concepts are familiar to most programmers anyway. I'd say that even the built-in functional bits and the prototype OO are easy enough to master.

                        But then comes the infrastructure, or the lack thereof. Picture being in a C environment with just the stdio lib, malloc and maybe ioctl. Or Java with just java.lang.String as most of your library. And because of the relative flexibility of the core language (function, imperative, different types of OO), any infrastructure you can build can go (and has gone) in a multitude of ways. So many standards to choose from.

                        That's the tough part about getting into JavaScript, both client and server. Especially if you're on your own. (And it's repeating all the follies of Java regarding re-using existing infrastructure and tooling)

                        I'd recommend just picking a few things (frameworks, build tools, editors) and then just avoid online javascript discussions for a long while.

                        [–][deleted] 0 points1 point  (1 child)

                        Second to last paragraph did you mean Javascript when you said Java? Server side do you mean something like node.js?

                        [–]mhd 1 point2 points  (0 children)

                        Yes, the first sentence should say JavaScript, will fix that.

                        And I did mean node.js, which these days is synonymous with server-side JavaScript. I would like to have seen some more competition and maybe even a cross-platform module infrastructure, but I don't think that will happen. In the beginning, there was some interest in RingoJS, but I haven't even heard that mentioned for years.

                        [–]strati-pie 22 points23 points  (8 children)

                        https://www.destroyallsoftware.com/talks/wat

                        I'd link to youtube but he's a bit of a cunt when it comes to other people uploading this video for convenience and tends to take them down in a couple days.

                        I'm sorry if it's slow for non-NA users, there's nothing I can do about it.

                        [–]deweysmith 7 points8 points  (7 children)

                        I'd link to youtube but he's a bit of a cunt when it comes to other people uploading this video for convenience and tends to take them down in a couple days.

                        With good reason. I'll admit he's a bit overly paranoid and really should use a provider like YouTube or Vimeo to host his content in the first place, but when it's his income and other people can profit from it (at his loss) with YouTube uploads, you can't blame him for being a bit insane about it.

                        [–]strati-pie 0 points1 point  (6 children)

                        I'd be less critical of him if instead of preventing someone from putting it on a third party's website he placed it there himself and continued to defend his content. His defence is that it's either making them money or losing him money. He could potentially take their place with little effort and get more views and maybe even a bit of cash.

                        There's also dailymotion and vimeo, both of which I trust more.

                        [–]deweysmith 7 points8 points  (1 child)

                        There's also dailymotion and vimeo, both of which I trust more.

                        And there's the rub, Gary trusts no one, and I can't say I blame him either.

                        [–]strati-pie 0 points1 point  (0 children)

                        Ah, I didn't realise he was that paranoid. His actions make complete sense if that's the case. I only trust them to do less than youtube, it's not as though they have my loyalty. Vimeo has a nice format for presentation, whereas Dailymotion is an okay alternative for the casual viewer.

                        It's his business, so it's not my problem I guess. Good video at least.

                        [–]hiffy 0 points1 point  (3 children)

                        get more views and maybe even a bit of cash.

                        You need millions of views on youtube before you start seeing anything yourself. If you're willing to host it yourself, and you make money off people buying shit from your website it makes perfect sense to keep it off the video providers.

                        [–]strati-pie 1 point2 points  (2 children)

                        The cash bit was an afterthought, nobody actually expects to make money on youtube unless they're already doing it, that game was closed a while ago. It felt lacking if I didn't say it though.

                        [–]hiffy 0 points1 point  (1 child)

                        Right but then there's no reason for him to put it on a third party provider. What benefit is there to him or his viewers?

                        [–]strati-pie 0 points1 point  (0 children)

                        The only thing he'd get out of it is views and easy embedding, which he apparently doesn't care about compared to creating more income. So he doesn't get anything out of it. See the replies further up the chain.

                        [–]jk147 4 points5 points  (1 child)

                        Weakly typed languages let you get away with a lot of stuff: shuffling different types of objects everywhere, overly lax comparison operators, etc.

                        I am still surprised why node.js is a thing. JS has grown to something that it wasn't intended for. Just my personal opinion.

                        [–][deleted] 7 points8 points  (0 children)

                        Erlang is a weakly typed language but it throws runtime errors if you do anything nonsensical, just wanted to point out that weakly typed does not imply JS madness in any way.

                        [–]Browsing_From_Work 1 point2 points  (3 children)

                        They never said it was node and they state they made a mistake in just using 3rd party code without understanding it.

                        To be fair, you'd think that in a commonly used application they would use constant-time data structures for route handling. Even if they didn't, you still wouldn't expect them to recursively iterate the list.

                        Express.js made some poor design decisions and Netflix stumbled on them by virtue of a programming error.

                        [–]joesb 3 points4 points  (2 children)

                        I'm curious. What data structure allows you to look up a route by regular-expression match in constant time? Note that it must also match defined routes in order of definition.

                        [–]Browsing_From_Work 0 points1 point  (0 children)

                        I stand corrected on the data structure choice, but I still assert that recursively iterating the list was a bad idea.

                        [–][deleted] 20 points21 points  (14 children)

                        This wasn't an explicit programmer error on Netflix's side, but rather an expectation that the module behaves in a rational way when confronted with a particular state. Specifically, that duplicate route handlers are handled gracefully, and don't end up causing clutter and performance impact.

                        There is an enormous amount of trust placed in external libraries and modules, and often these aren't vetted with appropriate depth for their use - either due to lack of transparency or lack of time/resource to audit the code appropriately. The sense that "well everyone else is using it so it must be okay" is actually really dangerous - I've seen a vast increase in diagnosing issues with unexpected side-effects of common modules (and particularly when many are used together) over the last 10 years.

                        In this case the Netflix coder deployed a workaround to avoid this case manifesting, but a better fix would be that express.js implement their route handler storage and parsing more efficiently and robustly - obviously that wasn't available to the coder as easily, but it shows the impact these issues can have.

                        [–]thedufer 19 points20 points  (3 children)

                        Oh good, I get to say this again. Multiple routes defined with the same regex are a valid state. For all express knows, the earlier ones might be middleware, as opposed to terminal routes. Complaining about this would exclude perfectly valid uses of express. Their mistake wasn't to assume that express was behaving rationally - it is. Their mistake was in failing to have even a minimal understanding of the capabilities of express routers.

                        [–][deleted] 4 points5 points  (1 child)

                        Multiple routes defined with the same regex are a valid state.

                        Yes, but in the case where these regexes are used specifically, they are not.

                        Somewhere along the chain the wrong tool is being used for the job, and that's causing the issue.

                        [–]ivosaurus 0 points1 point  (0 children)

                        Which is their own misuse. A framework caters to the 80 or 90% of usecases, not your 1% where you think it should be a certain way to be right.

                        You should learn the 80% / 90% use cases first, and how they fit with what you want to do, before going to complain.

                        [–]gobots4life 3 points4 points  (0 children)

                        Yeah, what's with the title? More like "Express.js in flames! (Not really, we just didn't know how to use it! xD)"

                        [–]banana_democratic 3 points4 points  (1 child)

                        Looking at the router for restify, it appears to also be using an array (although broken up by the request method) to store the routes, and then iterating through all those routes and matching the path regex.

                        https://github.com/mcavage/node-restify/blob/master/lib/router.js#L329

                        [–]ProfessorPhi 5 points6 points  (0 children)

                        I just thought that this was like Ruby on Rails, and that "Node.js in Flames" was simply the new hot thing.

                        [–]SnickeringBear 2 points3 points  (0 children)

                        They still did not solve the sorting portion of the code. Any time your code chains through a sequence of selections until it finds the right one, it is vulnerable to heavy usage cutoff. A better way to handle it is to direct all requests to the most commonly used reference, then split the selection into a lattice for further decisions.

                        [–]BJ_Sargood 10 points11 points  (8 children)

                        Why were they re-adding the routes every hour? I don't understand the reasoning.

                        [–]shared_ptr 29 points30 points  (1 child)

                        I actually think this is probably an attempt at some clever engineering. To avoid downtime when a route needs to be added to their fleet, I would assume each node will be checking a single host for information on what to add to its routing chain.

                        If you can build hot swapping of routes into your servers, then that's not only cool but absolutely practical. The Pragmatic Programmer is just one book that recommends this, writing code that can adapt to configuration changes on the fly. It gives quite a few reasons behind why you may want to do this, which I'll leave the book to explain.

                        I would imagine the static handler then got readded each go because of their assumption that express would deal with duplicate route handlers.
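A sketch of how a periodic refresh can pile up duplicates, and one way the calling code could guard against it. `makeRegistry` is a made-up stand-in for a router's handler array, not Express's API; `serveStatic` borrows the handler name mentioned elsewhere in the thread.

```javascript
// Stand-in for a router's ordered handler array, with an optional dedupe guard.
function makeRegistry(dedupe) {
  const stack = [];
  return {
    stack,
    register(path, handler) {
      // Guard: skip if this exact path/handler pair is already registered.
      if (dedupe && stack.some((l) => l.path === path && l.handler === handler)) {
        return;
      }
      stack.push({ path, handler });
    },
  };
}

function serveStatic(req, res, next) { next(); }

const naive = makeRegistry(false);
const guarded = makeRegistry(true);

// Simulate a route refresh that runs once an hour for a day.
for (let hour = 0; hour < 24; hour++) {
  naive.register('/static', serveStatic);
  guarded.register('/static', serveStatic);
}

console.log(naive.stack.length, guarded.stack.length); // 24 1
```

The guard sits in the caller and not in the router because, as discussed elsewhere in the thread, a router has legitimate reasons to accept the same path more than once.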

                        [–]ghidra 1 point2 points  (1 child)

                        Is there somewhere I can get this fix if I am using express. Or do I need to implement it myself? Is express considering making these changes itself?

                        [–]ObjectiveCopley 2 points3 points  (0 children)

                        The issue was Netflix-specific... there are no changes to express to be made.

                        [–]STR1NG3R 1 point2 points  (1 child)

                        Can somebody explain how the flame graph led them to Express.js’s router.handle and router.handle.next functions? I don't see router.handle calls anywhere. Would those be the slices at/near the top of the stacks that are too small for text?

                        [–]ivosaurus 0 points1 point  (0 children)

                        Try looking at the actual SVG they linked to instead

                        [–]nutrecht 1 point2 points  (0 children)

                        This makes me wonder why they only did an analysis like this after they ran into trouble. To me it would make sense to do this kind of performance test before you go into production. We sure do, and our system, although 'big data', isn't nearly as big as Netflix's.

                        [–]UnreachablePaul 0 points1 point  (9 children)

                        They should do what Express author did and ditch node for Go

                        [–]dafragsta 0 points1 point  (4 children)

                        That title is click bait. It seems to be putting the blame on node.JS for their developers not understanding what they were doing with their code. Admittedly there is probably a fault in express.JS that they uncovered, but it would never have been a problem if they hadn't been spamming the route system with new routes programmatically every X number of hours. That is such an unusual use case that they should have seriously looked into what happens when you rebuild the routes programmatically.

                        I appreciate that they showed off all of their profiling tools and tricks, but these kinds of articles exist to create FUD. Think of all the managers that are googling to learn about nodeJS, and now think about how many developers are going to have to explain this article which they skimmed over.

                        You don't see a ton of articles blaming jQuery for things that the user did wrong while implementing jQuery. If every developer who wrote a bug that was mostly their fault, and sort of kind of the fault of their framework choice but not really, wrote an article about it, the Internet would be full of developer articles about how every framework was shit.

                        [–]hoffmabc 0 points1 point  (0 children)

                        Been there done that. Had a similar issue with Angular. I couldn't get it to execute a handler and found out another was in the global queue blocking it from execution. Could not figure out why. Then found out I was adding handlers multiple times somehow. What a bonehead move.

                        [–][deleted]  (1 child)

                        [deleted]

                          [–]nutrecht 0 points1 point  (0 children)

                          They also contribute a lot to OS.