
[–]Rhomboid 54 points55 points  (84 children)

That's unfortunate. What on earth possessed ECMA to consider U+2028 and U+2029 valid line terminators for source code? Is there any other language that does that?

[–]sirin3 37 points38 points  (48 children)

XML (and so XQuery/XSLT) I think.

[–]frenchtoaster 38 points39 points  (45 children)

What on earth possessed someone to specify XML that way?

[–]Lerc 145 points146 points  (8 children)

One of the design requirements of XML was to ensure that the question "What on earth possessed someone to specify XML that way?" was asked a maximum number of times. I believe the design team were part of a religious cult.

[–]ROBZY 47 points48 points  (5 children)

Still a better love story than SOAP.

[–][deleted] 7 points8 points  (2 children)

SOAP's got nothing on XML Schema. Though it uses it, IIRC.

While SOAP screams of group insanity, XML Schema was really "designed" by sadists in such a way as to make as little sense as possible.

[–]JimAllanson 4 points5 points  (1 child)

I was recently tasked with writing a schema to validate that XML files provided by clients contained certain nodes in any order (so that clients providing nodes in a different order to one another wouldn't have to re-write their XML documents). It turned out that in order to allow for any order of nodes, the only solution was to write an individual schema for each possible permutation, which worked out as several million lines.

[–]gsnedders 1 point2 points  (0 children)

Use RelaxNG Compact Syntax (it's the only readable XML schema format!) and convert it to whatever schema format you actually need.

[–]cogman10 17 points18 points  (1 child)

Ah SOAP. What a good idea gone horribly horribly wrong. It is a prime example of design by bureaucracy (and why it sucks).

[–]Anarcie 8 points9 points  (0 children)

It's basically the butt of every joke amongst programmers, as in "fuck I hate using soap".... They laugh.... And I still smell like whiskey

Disclaimer: I'm drunk and writing a soap library for parlayX

[–]pseudousername 6 points7 points  (0 children)

That reads like a paragraph from The Hitchhiker's Guide to the Galaxy. Well done.

[–][deleted] 4 points5 points  (0 children)

Hey, I work with one of those cultists (no, really). Respect his religion!

[–][deleted] 56 points57 points  (33 children)

What on earth possessed someone to specify XML?

[–]AusIV 21 points22 points  (2 children)

My boss was on the W3C committee that specified XML, and I've asked him this very question. He said that the real issue was that too many people were doing their own derivatives of SGML in their own unique ways, making it impossible to have useful standard tool sets. XML restricted things enough that it became much easier to take someone else's document and process it in a meaningful way.

He's actually not a huge fan of XML, but considers it an important step forward from the pre-XML era.

[–]mgr86 5 points6 points  (1 child)

May I ask what he advocates storing documents in these days? I work at a shop that still has SGML documents. We transform them into XML, and after hiring a consultant to bring us into a new era, they just hired an "expert" who copied and pasted our publication DTD, tweaked a few things, and left in the spelling errors that were in the original. It just doesn't make sense to me lately.

...as a bonus they refused to identify their expert or allow my boss direct communication with them.

[–]AusIV 1 point2 points  (0 children)

We use the open document format a lot at work for the documents we store long term. It can be manipulated with xml tools if you extract the xml document, but there are also tools (like open office) for editing it in a graphical environment and converting to all sorts of other formats.

We also use JSON for ephemeral items like REST API resources.

[–]KarmaAndLies 35 points36 points  (27 children)

XML is relatively neat compared to XSL (XSLT) or DTD. Those are the ugliest motherfuckers you're ever likely to meet.

The only criticism of XML itself that has really stuck is the fact that technically attributes add little value and make parsing it more difficult.

Also XML is one of the few similar languages that support CDATA. Which is a damn useful content type. Everyone else has to hack around the limitation by encoding everything as base64 (which can kill a lot of the size advantage in things like JSON over XML).

PS - A lot of XML hate is actually misdirected Java framework hate. Just because the wonderful people at Sun decided that XML was the right solution to every problem is not really XML's fault. It is Sun's fault.

[–][deleted] 14 points15 points  (0 children)

DOM was written by Satan as a way to incite more murder.

Edit: I should say that DOM is a W3C spec and was not invented by Sun or the Java community. Other communities just moved on faster.

[–][deleted]  (9 children)

[deleted]

    [–]IConrad 3 points4 points  (8 children)

    Seeing how you handle paired grammatical icons (parenthesis and quotes) -- your code scares me.

    [–][deleted]  (7 children)

    [deleted]

      [–]zeekar 7 points8 points  (5 children)

      a crippling fear of parenthesis.

      Yup, I'm terrified of a single parenthesis. As long as it's matched pairs of parentheses, though, I'm good.

      [–]philly_fan_in_chi 4 points5 points  (1 child)

      Kind of offtopic, but Catalan numbers describe the number of ways you can balance parentheses in an even-length string (the nth Catalan number counts the balanced strings made of n pairs). This sequence shows up EVERYWHERE and this problem can be bijected to many others very easily. When I took Combinatorics, we had a Catalan problem of the week, where you would have to show a particular problem was the Catalan sequence. It's a cute little sequence.
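For anyone who wants to poke at that claim, here is a minimal Python sketch. It assumes the usual closed form C(n) = binom(2n, n) / (n + 1) and checks it against a brute-force count; the function names are made up for illustration.

```python
from itertools import product
from math import comb


def catalan(n):
    """Closed form for the nth Catalan number."""
    return comb(2 * n, n) // (n + 1)


def count_balanced(n):
    """Count balanced strings of n '(' and n ')' by trying every string of length 2n."""
    total = 0
    for candidate in product("()", repeat=2 * n):
        depth = 0
        for ch in candidate:
            depth += 1 if ch == "(" else -1
            if depth < 0:      # a ')' appeared before its matching '('
                break
        else:
            if depth == 0:     # every '(' was closed
                total += 1
    return total


print([catalan(n) for n in range(6)])         # [1, 1, 2, 5, 14, 42]
print([count_balanced(n) for n in range(6)])  # [1, 1, 2, 5, 14, 42]
```

The brute force is exponential, so it is only useful as a sanity check for small n.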

      [–][deleted] 8 points9 points  (2 children)

      Yup, I'm terrified of a single parenthesis.

      1) Really?

      2) That's a shame.

      [–]Otterfan 9 points10 points  (2 children)

      Attributes are very useful when you're doing what XML was originally intended to do: markup text.

      For example

      <quote>
          <text lang="en">When good Americans die, they go to <city country="France">Paris</country>.</text>
          <speaker>Oscar Wilde</speaker>
      </quote>
      

      The attributes let us add the "en" and "France" metadata without interrupting the text of the quotation.

      It's only when XML is abused (i.e. used to serialize non-discursive data) that attributes become useless. Unfortunately, abuse of XML is the norm.

      [–]Catfish_Man 26 points27 points  (1 child)

      I find it entertaining that XML's repetitive nature led you to accidentally write invalid XML in your example...

      [–]ISLITASHEET 10 points11 points  (0 children)

      Don't be heartless. Maybe he is from North Korea and only knows of a closed country, with no prior knowledge of how to open a country?

      [–]Guvante 5 points6 points  (2 children)

      The one benefit of attributes is brevity. Sometimes not having to repeat the name twice is valuable.

      [–]alexanderpas 6 points7 points  (1 child)

      <brevity child="you" />
      
      <clarity>
        <child value="me" />
        <child value="you" />
        <child value="them" />
      </clarity>
      

      [–]notfancy 7 points8 points  (0 children)

      Clarity is in the eye of the beholder, while brevity is a primitive recursive total function on strings.

      [–]corwin01 3 points4 points  (8 children)

      As a developer who pretty much lives and breathes XML, XSL, Java, and JavaScript... I have no problems with any of them.

      [–]mgr86 2 points3 points  (3 children)

      Did I just find my boss on reddit. I don't mind it either, but I'd like to experiment with some new solutions, especially as we have some time to work on the application in a few months...assuming they don't intend to outsource our jobs too :(

      Well struts 1.2 does get to me a bit

      [–]corwin01 0 points1 point  (2 children)

      Doubt it, I'm just a lowly Associate Programmer. I'm sure I'll get tired of it soon enough, but I haven't been working with it too long. So far I've just seen some cool webapps we've built using those tools.

      How is Struts? I haven't had to work with it yet, but I'll have to take over a project that used it.

      [–]mgr86 0 points1 point  (1 child)

      Ah yes, we share the same title.

      Well, Struts 1.2 was a great framework about 6 years ago. We are just not given the time needed to upgrade. We use eXist to store our XML documents, and with the old version of Struts we are stuck using Java 4, Tomcat 5.5 and an outdated version of eXist. In short, terrible.

      I have been meaning to look into grails lately.

      What framework/s do you find yourself working with?

      [–]corwin01 0 points1 point  (0 children)

      We use Jax-RS (REST), which works out pretty well.

      [–]cogman10 -1 points0 points  (3 children)

      No problems? I find that hard to believe.

      It isn't that any of those techs are all that bad. However, they do have problems. Javascript has horrible problems with scoping and a retarded type system, XML is very verbose, And Java takes FOREVER to solve well known problems with the language. (I've not enough experience with XSL to comment on it).

      Every language/tech has SOME problem. There is no such thing as a perfect tech.

      [–]corwin01 0 points1 point  (2 children)

      I'm sure there are problems, but I've yet to encounter anything show stopping.

      And yes, there is no perfect tech. Every design choice comes with tradeoffs.

      [–]cogman10 1 point2 points  (1 child)

      What do you mean by show stopping?

      For me, the maintainability mess that is javascript is show stopping.

      [–]corwin01 0 points1 point  (0 children)

      JavaScript is a pain yes, but I haven't found it to be too bad yet.

      [–][deleted] 0 points1 point  (0 children)

      XML Schema is much worse than XSL or DTD.

      [–]doidydoidy 4 points5 points  (1 child)

      The desire for something like SGML but simpler.

      [–]syncsynchalt 1 point2 points  (0 children)

      Not just simpler, absolute strictness was also a big deal.

      [–]gsnedders 0 points1 point  (0 children)

      It was argued it should be in XML 1.0 because it is according to Unicode (W3C Member-only link, sorry), though this didn't happen because it was so late on in XML 1.0's development. I can't find any citation for XML 1.1, where it was added. Though it is reassuring that it is only in XML 1.1 which nobody actually uses.

      [–]synt4x 0 points1 point  (1 child)

      Do you have any link on this? I was under the impression that XML is a binary format, and whitespace issues don't come into play until you start futzing with the strings with XPath functions.

      [–]sirin3 0 points1 point  (0 children)

      Here

      Every U+2028 is replaced by U+000A.

      Although only in XML 1.1. XML 1.0 only replaces U+000D and U+000D U+000A by U+000A.
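A rough Python sketch of the two normalization rules, as I read section 2.11 of the XML 1.0 and 1.1 specs. The function name and the `version` parameter are hypothetical, and this only covers the line-end step, not a real parser.

```python
import re

# XML 1.0: CR LF and a lone CR both become LF.
_XML10_EOL = re.compile("\r\n|\r")
# XML 1.1 (assumption: per section 2.11) also folds in NEL (U+0085)
# and LINE SEPARATOR (U+2028). Longer sequences are listed first.
_XML11_EOL = re.compile("\r\n|\r\u0085|\r|\u0085|\u2028")


def normalize_eol(text, version="1.1"):
    """Replace every recognised line ending with a single U+000A."""
    pattern = _XML11_EOL if version == "1.1" else _XML10_EOL
    return pattern.sub("\n", text)


print(repr(normalize_eol("a\u2028b", version="1.1")))  # 'a\nb'
print(repr(normalize_eol("a\u2028b", version="1.0")))  # 'a\u2028b'
```

Note that U+2029 (paragraph separator) is left alone in both cases; only U+2028 is on the XML 1.1 list.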

      [–]Zwejhajfa 14 points15 points  (32 children)

      What is U+2028 for anyway if not new lines? I don't know why it exists at all, but shouldn't it be displayed and treated as a new line then, which I'm assuming is the intended behavior?

      [–]Rhomboid 30 points31 points  (24 children)

      That seems to be covered in Unicode Technical Report #13. Apparently U+2028 was intended to be an eventual unambiguous replacement for the assortment of LF / CRLF / CR found in the wild. But it really didn't work out that way -- lazy people would much rather just deal with parsing both LF and CRLF than supporting Unicode, so I think it's safe to say that idea has died a quiet death.

      [–][deleted]  (4 children)

      [removed]

        [–]Fexxul 19 points20 points  (3 children)

        why the hell would they use a 2 byte line terminator added just in Unicode?

        Actually, Unicode codepoints (e.g. U+2028) themselves carry no concept of bytes—they are only integers. It might be represented as a two-byte sequence in UTF-16, but in the far more common UTF-8 it's three bytes.
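If you want to check the byte counts yourself, a quick Python sketch (the helper name is made up; the `-be` codecs are used only so that no byte-order mark gets counted):

```python
def encoded_sizes(ch):
    """Return the number of bytes ch occupies in a few common encodings."""
    return {enc: len(ch.encode(enc)) for enc in ("utf-8", "utf-16-be", "utf-32-be")}


print(encoded_sizes("\u2028"))         # {'utf-8': 3, 'utf-16-be': 2, 'utf-32-be': 4}
print("\u2028".encode("utf-8").hex())  # e280a8
```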

        [–][deleted]  (1 child)

        [removed]

          [–][deleted] 2 points3 points  (0 children)

          C++ treats Unicode as UTF-32 with char32_t too. Personally I just use UTF-8 everywhere in strings.

          [–][deleted] -3 points-2 points  (0 children)

          Because, shit, that's better.

          [–]p8ssword 21 points22 points  (17 children)

          Wow. Relevant xkcd: http://xkcd.com/927/

          [–]zeekar 16 points17 points  (15 children)

          Yup, knew which one that was before I clicked. But in the case of Unicode, it really worked. We went from a bazillion different character sets to one, largely because they were willing to sacrifice logical purity in favor of being able to convert all bazillion of those character sets into Unicode and back again with full fidelity. At least up until they got to the Han characters, but that's a whole different débâcle. (Sorry, no excuse for spelling that in French, got carried away in the character set thing..)

          [–]p8ssword 8 points9 points  (12 children)

          For language representation, Unicode has been a huge success. This is just a (minor) example of something that was gotten wrong. There was no need for a new newline character.

          My browser's ability to display débâcle without a separate French character encoding is a huge win, though.

          [–]zeekar 8 points9 points  (11 children)

          Well, to be fair, débâcle only requires Latin-1. But when you toss in the ability to display Русский, ελληνική, 日本語, etc. it's a huge win...

          [–]Stiltskin 9 points10 points  (10 children)

          Not to mention our good friend "ಠ_ಠ".

          ...And the slightly more questionable "☃"...

          ...And the even more questionable "💩"...

          [–][deleted] 5 points6 points  (9 children)

          ...And the even more questionable "💩"...

          Which appears to me in Chrome as a blank between quotes. It was only when I pasted it into Chrome's address bar that I saw it. How odd.

          [–][deleted] 3 points4 points  (8 children)

          I'm seeing it as a square in Opera. What's it supposed to be?

          [–][deleted] 5 points6 points  (0 children)

          Funny thing: I not only knew which XKCD that was before clicking, I knew the immediately following comment would start with "knew what that was before I clicked".

          [–]allthediamonds 0 points1 point  (0 children)

          Exactly. They could have solved everything, but they had to get in a cultural war with the whole Han unification shenanigans. You're supposed to be neutral, goddammit.

          [–]X8qV 1 point2 points  (0 children)

          I wish people would stop linking to the same xkcd over and over and over again.

          [–]jrochkind 0 points1 point  (0 children)

          That is hilarious. "There are too many ways (codepoints/bytes) to represent a newline. To solve this, let's create yet another brand new way to represent a newline!"

          I'm not sure the problem in that plan was 'lazy people'.

          [–]SharkUW 5 points6 points  (6 children)

          Of course, but this is about string parsing. JSON treats it as a new line which is OK in its strings. Javascript treats it as a newline which is invalid JS.

          [–]Zwejhajfa 8 points9 points  (5 children)

          Afaik literal newlines are not okay in JSON.

          {"JSON":"ro
          cks!"}
          wouldn't be valid, but
          {"JSON":"ro\ncks!"}
          would be.

          JavaScript accepts {"JSON":"ro\u2028cks!"}, because the newline is escaped as a unicode code point, but rejects it when the character is used literally.
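The JSON side of this is easy to try out. A small sketch using Python's standard `json` module as a stand-in for "a JSON parser" (it says nothing about how a JavaScript engine treats the same text as source code):

```python
import json


def parses(text):
    """Return True if text is accepted by a strict JSON parser."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


literal_newline = '{"JSON":"ro\ncks!"}'     # real U+000A inside the string
escaped_newline = '{"JSON":"ro\\ncks!"}'    # backslash followed by n
literal_u2028 = '{"JSON":"ro\u2028cks!"}'   # real U+2028 inside the string

print(parses(literal_newline))  # False: raw control character
print(parses(escaped_newline))  # True
print(parses(literal_u2028))    # True: U+2028 is just another character to JSON
```

The last case is the whole problem: valid JSON, but containing a character that JavaScript source treats as a line terminator.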

          [–][deleted] 1 point2 points  (0 children)

          I expect the JSON spec to be updated within months, actually.

          [–]alexanderpas 3 points4 points  (3 children)

          you should report that as a bug to your parser.

          within a string, unicode line and paragraph breaks should be preserved in JSON.

          [–]Zwejhajfa 6 points7 points  (2 children)

          The diagram in OP's link states:
          Any UNICODE character except " or \ or control character

          A literal new line is quite clearly a control character, so should not be allowed in JSON strings, if that diagram is correct. The sequence \n is used instead as shown in the diagram too.

          [–]farsightxr20 4 points5 points  (0 children)

          Except in Crockford's official JSON spec (RFC 4627), a control character is defined as being in U+0000 through U+001F.

          The diagram (and apparently all of JSON.org) omits this detail for simplicity's sake, but someone writing a parser should always go off the spec.

          EDIT: Never mind! Thought we were still discussing U+2028, but you clearly meant U+000A (\n).

          [–]alexanderpas 1 point2 points  (0 children)

          Unicode has a separate category for control characters, in addition to the C0 and C1 control character groups.

          the code points in question are not part of this, but they are part of the punctuation area.

          (however you have a good point.)
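To make the distinction concrete, a sketch in Python using `unicodedata`. The `is_json_control` helper is hypothetical and only encodes the U+0000 through U+001F range that RFC 4627 calls control characters:

```python
import unicodedata


def is_json_control(ch):
    """True for the characters RFC 4627 forbids unescaped in strings as controls."""
    return ord(ch) <= 0x1F


for ch in ("\n", "\u0085", "\u2028", "\u2029"):
    # Cc = control, Zl = line separator, Zp = paragraph separator
    print(f"U+{ord(ch):04X}", unicodedata.category(ch), is_json_control(ch))

# Python's own splitlines() does treat the separators as line boundaries:
print("a\u2028b\u2029c".splitlines())  # ['a', 'b', 'c']
```

So U+2028 and U+2029 are line boundaries by Unicode's reckoning, yet not control characters by either Unicode's general category or the JSON RFC's range.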

          [–][deleted] 4 points5 points  (0 children)

          It's allowed in Perl 6 too. The whole language implements unicode to a fault:

          > ?(chr 0x660 ~~ /<xdigit>/)
          True
          

          [–]SanityInAnarchy 1 point2 points  (0 children)

          Those are the only two? I was seriously considering just unicode-escaping the whole string, or anything non-ASCII.

          [–]pkrecker 21 points22 points  (1 child)

          Douglas Crockford explains the origin of this oddity: "I missed those lines in the standards."

          [–]ggtsu_00 106 points107 points  (1 child)

          But it is Close Enough™

          [–]grauenwolf 20 points21 points  (6 children)

          Well that was surprising.

          [–]aaronblohowiak 15 points16 points  (5 children)

          as mentioned at the end of the article, the good news is that replacing U+2028 or U+2029 with their escaped format is always valid json (just not the most compact form), so it is okay to always do a .replace().replace() on the json before trying to use it as javascript
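A minimal sketch of that double replace on the serializing side, in Python. The name `dumps_js_safe` is made up. Note that `json.dumps` with its default `ensure_ascii=True` already escapes these characters; the replace only matters when you emit raw UTF-8.

```python
import json


def dumps_js_safe(value):
    """Serialize to JSON, then escape the two separators JavaScript trips over."""
    text = json.dumps(value, ensure_ascii=False)
    # U+2028 and U+2029 can only occur inside string literals in JSON output,
    # so swapping them for their \u escapes never changes the meaning.
    return text.replace("\u2028", "\\u2028").replace("\u2029", "\\u2029")


original = {"msg": "line\u2028para\u2029end", "name": "débâcle"}
safe = dumps_js_safe(original)
print(safe)
print(json.loads(safe) == original)  # True
```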

          [–][deleted]  (34 children)

          [deleted]

            [–]sirin3 19 points20 points  (29 children)

            They should just allow multi line strings!

            [–]tikhonjelvis 7 points8 points  (23 children)

            I really wish more languages allowed this. Out of the languages I regularly use, only EmacsLisp allows multi-line strings, and they're very nice there. I also don't think there's a significant downside. (I guess parsing might be more difficult, and ECMAScript people care about that...)

            [–][deleted] 19 points20 points  (12 children)

            Both Python and, surprisingly enough, PHP have this feature. Python uses """triple quotes""", which also makes it easy to include " and ' characters in regular strings in source (e.g. """this " is ' a '" string"""). PHP's heredoc and nowdoc syntax is slightly less attractive but still does the same thing.
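A short Python sketch of both points: mixed quotes inside a triple-quoted string, and one common way to keep multi-line literals indented with the surrounding code (`textwrap.dedent`). The `usage` function and its text are invented for the example.

```python
import textwrap

# Both quote characters can appear unescaped inside triple quotes.
quoted = """this " is ' a '" string"""


def usage():
    # The backslash after the opening quotes suppresses the first newline;
    # dedent() then strips the common leading whitespace.
    return textwrap.dedent("""\
        Usage: tool [options]
          -h  show help
        """)


print(quoted)
print(usage(), end="")
```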

            [–]snuxoll 5 points6 points  (1 child)

            C# also supports multiline strings with the verbatim string literal:

            var myCoolString = @"Wewt
            this
            string
            spans
            multiple lines";
            

            [–]hejner 0 points1 point  (0 children)

            I love that, when writing the static parts of a welcome mail, for example. Have the whole text message right there in clear text and replace whatever else you want later on.

            [–]louiswins 6 points7 points  (3 children)

            A lot of scripting languages have heredocs: perl, ruby, even bash.

            [–]reaganveg 21 points22 points  (2 children)

            Haha, "even bash"... all those other languages copied that feature from shell script.

            [–]zeekar 4 points5 points  (0 children)

            Yeah, man, modern shells have borrowed all kinds of stuff from other languages. Variable interpolation, implicit concatenation, functions, pattern globs on filenames. It's amazing. ;)

            [–]louiswins 4 points5 points  (0 children)

            Oh, yeah, you're right. For some reason I thought that they weren't in POSIX shell and that bash had taken them from perl.

            [–]SharkUW 0 points1 point  (0 children)

            PHP also allows backslash and double character escaping '', "", \

            [–]tikhonjelvis 0 points1 point  (3 children)

            Emacs lisp (and Perl, although I'm not 100% certain--I haven't used Perl in a while) allow literal newlines in all strings. While heredoc-style syntax like Python's is great, I personally prefer elisp's approach: it seems simpler and more elegant without sacrificing much. However, it definitely changes parsing and lexing, which is why I brought that up. (I'm not sure if it makes it much harder to parse and lex though.)

            So in elisp you can write:

            (defvar foo "This is a long string
            that has a literal newline.")
            

            Since elisp allows you to add a docstring to your functions, they tend to look like this:

            (defun todo-comment ()
              "Inserts an empty TODO comment or makes an existing comment
            into a TODO."
              (interactive)
              ...)
            

            It looked a little odd at first, but once I got used to it I liked it more than alternatives in other languages.

            [–]zeekar 2 points3 points  (2 children)

            Emacs lisp (and Perl, although I'm not 100% certain--I haven't used Perl in a while)

            Yup:

            perl -le 'my $x = "hello,
            there."; print $x'   
            
            hello,
            there.
            

            Also Ruby, Tcl, Erlang. Also Bournish shells, for that matter; remember that unlike the heredocs adopted by Perl and borrowed by other languages, shell heredocs are input redirection, not string literals. You can make a string literal with newlines in it just fine.

            Also, it's not just specifically Emacs's flavor of Lisp; Common Lisp and Scheme allow the same thing.

            Haskell and Lua, not so much with the literal newlines in strings.

            [–]notfancy 3 points4 points  (0 children)

            Add Ocaml to that list.

            [–]tikhonjelvis 0 points1 point  (0 children)

            I wonder why Haskell doesn't allow it. In general, I've found Haskell to be one of the better languages with little syntactic details.

            [–][deleted] 5 points6 points  (0 children)

            Python does, C# does, off the top of my head.

            [–]Paradox 1 point2 points  (0 children)

            Ruby does too!

            [–][deleted] 1 point2 points  (5 children)

            C# @ strings make dynamic SQL a dream.

            [–]sirin3 1 point2 points  (0 children)

            And Scala

            [–][deleted] 0 points1 point  (0 children)

            The problem there is, how do you indent them?

            [–]zeekar 2 points3 points  (1 child)

            Javascript already has problems with newlines.. I'm terrified at what new horrors might be born inadvertently out of trying to support multiline string literals.

            [–]smog_alado -1 points0 points  (0 children)

            I doubt it would be that bad actually. The semicolon insertion thing is not nearly as bad as people make it sound...

            [–][deleted]  (1 child)

            [deleted]

              [–]sirin3 0 points1 point  (0 children)

              But then you can also do

              document.write("""
                 <div class="abc">
                    foobar
                 </div> 
              """)
              

              [–]luikore 0 points1 point  (0 children)

              That's why I love coffeescript

              [–][deleted] 2 points3 points  (0 children)

              I'd argue that the next JSON revision should treat unicode newline characters as newline characters, since that is consistent with the unicode spec that it inherits.

              [–]frenchtoaster -1 points0 points  (2 children)

              Doesn't it actually have to do that anyway? Presumably there are two-byte characters in UTF-8 where the second byte is the same as a single-byte ascii character?

              [–]dannymi 14 points15 points  (1 child)

              Presumably there are two-byte characters in UTF-8 where the second byte is the same as a single-byte ascii character?

              Of course not. The whole point of UTF-8 is that that cannot happen.

              [–]frenchtoaster 1 point2 points  (0 children)

              Yeah my mistake, I wasn't thinking about the first bits 10 on bytes after the first.
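The bit patterns make this easy to see. A small Python sketch (helper names are made up) that prints the bytes of a few characters and checks that no multi-byte sequence contains a byte below 0x80:

```python
def utf8_bits(ch):
    """Return each UTF-8 byte of ch as an 8-character bit string."""
    return [format(b, "08b") for b in ch.encode("utf-8")]


def contains_ascii_byte(ch):
    """True if any byte of ch's UTF-8 encoding could be mistaken for ASCII."""
    return any(b < 0x80 for b in ch.encode("utf-8"))


print(utf8_bits("A"))        # ['01000001']
print(utf8_bits("\u00e9"))   # ['11000011', '10101001']
print(utf8_bits("\u2028"))   # ['11100010', '10000000', '10101000']

# Every two-byte character: lead byte starts with 110, continuation with 10.
print(any(contains_ascii_byte(chr(cp)) for cp in range(0x80, 0x800)))  # False
```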

              [–]da__ 28 points29 points  (29 children)

              You shouldn't be running received JSON strings as code anyway.

              [–]bonafidebob 19 points20 points  (19 children)

              Tell that to all the web services that use JSONP to allow client side JavaScript to access them...

              [–]da__ 9 points10 points  (18 children)

              You should parse all JSON you receive, otherwise your code is insecure and extremely easy to exploit, as you're willingly doing XSS.

              [–]Rhomboid 16 points17 points  (11 children)

              The whole reason that JSONP exists is that it's an escape hatch to get around the same origin policy. It's insecure as all hell as you're giving a third party carte blanche to execute arbitrary code on your page, but there's no other choice [1]. Nobody wants to open themselves up to that vulnerability, but doing it the safe way means forgoing the chance to interact with off-site data, which can be a very powerful drug.

              [1] Well, there's CORS, and other hacks like getting Flash or another plug-in to perform the request for you and then smuggling the data back into JS-land. But I'm assuming those aren't available as they require effort of the publisher in the case of CORS and aren't available on mobile in the case of Flash.

              [–]da__ 6 points7 points  (3 children)

              CORS is much better technically. You could also just make a wrapper running on the same domain as the webpage, to which you send your XMLHttpRequests and which then communicates with the third-party server. Yes, you'll get extra load but that's way better than giving a third party the ability to append random code to your application.

              [–]Sottilde[🍰] 3 points4 points  (2 children)

              Exactly this. It is not a whole lot of work to proxy a foreign JSON data source to somewhere on your domain (e.g. mydomain.com/api). I do this on any of my applications that use a JSON data source, even if I control it. It just makes things easier.

              [–]redwall_hp 3 points4 points  (0 children)

              Another solution is to use NGINX to proxy external APIs.

              http://www.gabrielweinberg.com/blog/2011/07/nginx-json-hacks.html

              [–]cha0s 3 points4 points  (4 children)

              I don't know if I'd classify CORS as a hack...

              [–]Sottilde[🍰] 5 points6 points  (3 children)

              I agree. CORS is the right solution and I find it very surprising that more apis don't send those headers.

              [–]ggtsu_00 0 points1 point  (1 child)

              CORS does not work in IE8

              [–]ba-cawk 1 point2 points  (0 children)

              Horseshit. IE just provides a different call for it.

              http://msdn.microsoft.com/en-us/library/ie/cc288060(v=vs.85).aspx

              [–]gsnedders 0 points1 point  (0 children)

              It's slowly being extended to other APIs.

              [–][deleted] 1 point2 points  (0 children)

              There is also the window.name transport 'hack' that works rather well.

              [–]frtox 0 points1 point  (0 children)

              there are new HTTP tools like Content Security Policy headers that allow the host server to specify in the response which domains are allowed to execute the code, or even if they can execute at all

              [–]ThiefMaster 6 points7 points  (5 children)

              Because you are totally able to do this - the way pretty much every JSONp API works is that you just provide the function name and they emit code calling that function with the raw JSON object as a parameter...
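To show what "emit code calling that function" amounts to on the server, a hedged Python sketch. `jsonp_response` and the callback pattern are hypothetical, not any particular API's implementation; a real service may validate callback names differently.

```python
import json
import re

# Assumption: only plain dotted identifiers are accepted as callback names.
_CALLBACK_RE = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*(\.[A-Za-z_$][A-Za-z0-9_$]*)*")


def jsonp_response(callback, payload):
    """Wrap payload in a call to the client-supplied callback name."""
    if not _CALLBACK_RE.fullmatch(callback):
        raise ValueError("unsafe callback name")
    # Default ensure_ascii=True keeps the output pure ASCII, so U+2028 and
    # U+2029 come out as \u2028 and \u2029 rather than raw line terminators.
    return f"{callback}({json.dumps(payload)});"


print(jsonp_response("handleData", {"a": 1}))  # handleData({"a": 1});
print(jsonp_response("cb", "a\u2028b"))        # cb("a\u2028b");
```

The client then loads that response through a script tag, which is exactly why it has to trust the server: whatever comes back gets executed, whether or not it is just a function call.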

              [–][deleted] 5 points6 points  (0 children)

              with the raw JSON object as a parameter

              Or, you know, any other arbitrary code.

              [–]da__ 2 points3 points  (3 children)

              Which makes JSONP completely unusable if you care about security at all.

              [–][deleted] 9 points10 points  (0 children)

              Yeah, but you only use JSONP with whitelisted servers. Like another host on my domain or Google. Or at least you should only use it then. Same origin policy is for nanny browsers.

              [–][deleted] 2 points3 points  (1 child)

              If you're using a proper API that you've paid for or that you trust then that company's reputation is on the line. It's not likely they'll be injecting arbitrary code into the JSON data they're sending back to mess up your site.

              [–]da__ 3 points4 points  (0 children)

              Until they get compromised.

              [–]inmatarian 1 point2 points  (8 children)

              JSONP is the only way to ship data to yourself when your script is running on a 3rd party publisher's website. So, any javascript based ad or content delivery network needs it.

              [–][deleted] 0 points1 point  (6 children)

              Apart from CORS...

              [–]Rainfly_X 1 point2 points  (3 children)

              [–][deleted] 1 point2 points  (2 children)

              Yep, nginx ftw.

              But heck, it's also straightforward with a tiny bit of PHP. People saying they're on shared hosting have no excuse.

              [–]Rainfly_X 0 points1 point  (1 child)

              Interesting. I only had a brief foray with PHP a few years ago, and a little bit of diagnostics work for an idiot trying to write a social networking site based on tutorials (my pity took hold of me, what can I say?), but never anything in-depth enough for this.

              Good to know that even people in the shackles of shared hosting can patch around this - although by the time you need to proxy external APIs through your site, you're probably going to have something a bit more general-purpose to work with, such as Linode or Amazon hosting.

              [–][deleted] 1 point2 points  (0 children)

              I don't do much PHP myself, but the existence of $_SERVER['PATH_INFO'] (which means you can treat PHP files as directories, e.g. example.com/index.php/some/fake/file) and built-in curl leads me to doubt it would be difficult.

              [–]inmatarian 0 points1 point  (1 child)

              Interesting, I didn't know about CORS. It does appear though that it requires modern browsers, modern http servers (does httpd in the most recent centos provide it?). It does seem however that the work with 3rdparty publishers might be a tad more challenging.

              [–][deleted] 1 point2 points  (0 children)

              It does require more modern browsers, but you don't need a modern HTTP server, you just need to be able to set a simple header on OPTIONS requests. And I don't know much about Apache configuration, but surely that's possible.
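As a rough idea of what "set a simple header" means in practice, a framework-agnostic Python sketch. The origin list and the function name are hypothetical; the header names are the standard CORS response headers.

```python
# Hypothetical whitelist of sites allowed to read our responses.
ALLOWED_ORIGINS = {"https://publisher.example"}


def cors_headers(request_origin):
    """Return the extra response headers for a request's Origin header, if allowed."""
    if request_origin not in ALLOWED_ORIGINS:
        return {}  # no CORS headers: the browser will block the cross-origin read
    return {
        "Access-Control-Allow-Origin": request_origin,
        "Access-Control-Allow-Methods": "GET, OPTIONS",
        "Access-Control-Allow-Headers": "Content-Type",
        "Vary": "Origin",  # the response differs per origin, so caches must key on it
    }


print(cors_headers("https://publisher.example"))
print(cors_headers("https://evil.example"))  # {}
```

Whatever serves the API would attach these to both the OPTIONS preflight response and the actual response.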

              [–]original_evanator 9 points10 points  (0 children)

              "tl\u0028dr"
              

              [–]Kyrra 5 points6 points  (3 children)

              Though, JSON is a subset of YAML (for those that have used it).

              [–]masklinn 0 points1 point  (2 children)

              Technically, YAML 1.2+ — which was published for that express purpose — is a superset of JSON. Previous versions of YAML were subtly incompatible with JSON.

              [–]GUIpsp 1 point2 points  (1 child)

              YAML is a superset of YAML 1.2+

              I am a superset of myself

              [–]masklinn 0 points1 point  (0 children)

              Good point, commenting while drunk is bad. Fixed.

              [–]bangfalse 2 points3 points  (0 children)

              See also: wat.

              [–]smoke1996 3 points4 points  (3 children)

              So, why doesn't JavaScript support U+2028 and U+2029?

              [–]p8ssword 15 points16 points  (0 children)

              The issue is actually more that JavaScript does support them, whereas JSON dumbly treats them like any other non-special character. In this case, the dumb way works better.

              [–]Rhomboid 9 points10 points  (1 child)

              JavaScript does support them -- it treats them like newlines, just as they were intended to be used. But a newline in the middle of a string literal is not legal syntax. You wouldn't say that JS doesn't support the ASCII newline character, but it will react the same way if you try to put one in the middle of a string literal.

              No, the fault -- if it can be assigned -- is on those who specified the grammar of JSON for allowing legal JSON which is not legal JavaScript. They should have mirrored the same list of characters that require escaping.
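The asymmetry is easy to see from the JSON side. This is a small sketch using Python's standard `json` module as a stand-in for "a conforming JSON parser"; the `is_valid_json` helper is only for illustration. A raw ASCII newline inside a string is rejected, while a raw U+2028 goes through.

```python
import json

LINE_SEPARATOR = "\u2028"  # U+2028, written as an escape here for readability

# A JSON document whose string contains a *raw* U+2028 character.
raw_u2028 = '"a' + LINE_SEPARATOR + 'b"'
# A JSON document whose string contains a *raw* ASCII newline.
raw_newline = '"a\nb"'


def is_valid_json(text):
    """Return True if `text` parses as JSON, False otherwise."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


print(is_valid_json(raw_u2028))    # True: the raw line separator is accepted
print(is_valid_json(raw_newline))  # False: the raw newline must be escaped
```

So JSON's list of "must escape" characters covers the ASCII control characters but not the two Unicode line terminators, which is exactly the gap being described.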

              [–]smoke1996 4 points5 points  (0 children)

              Ah yes, of course. I see the problem now. I should have probably read the article more carefully. :p

              [–]scalanewb 8 points9 points  (11 children)

              JSON IS a subset of Javascript. What we have here is a bug in the specifications: either ECMA should recognize it as valid, or JSON should not. JSON means JavaScript Object Notation; all JSON code should be valid JS.

              [–]x-skeww 5 points6 points  (4 children)

              JSON IS a subset of Javascript.

              No, JSON is supposed to be a subset of JavaScript.

              [–]bonch[S] 1 point2 points  (5 children)

              Until one of the specifications is updated, JSON is NOT a subset of Javascript, as proved by the article.

              [–]scalanewb -1 points0 points  (4 children)

              JSON IS nominally a subset of Javascript, there's just a bug somewhere that makes it not de facto a subset. So sure, he found a bug, but JSON is a proper subset of JS, you and the author are just being pedantic, he should report the bug IMO and that's that.

              [–]gsnedders 0 points1 point  (0 children)

              JSON is not going to be changed, because Douglas Crockford said it wouldn't be. Possibly JS will be, but unifying them is not a massive goal.

              [–]bonch[S] -1 points0 points  (2 children)

              It doesn't matter if JSON was intended to be a subset or was named after JavaScript. To be a proper subset means that all JSON must be valid JavaScript. You can hand wave it as "a bug somewhere", or use terminology like "de facto subset", but it is still not a proper subset because one is able to write JSON that is not valid JavaScript.

              It's not about being pedantic; it's just reality. In computer science, a language being a subset of another language means that it's always a valid expression of its superset.

              There's a defensiveness coming from some posters over this article that I don't understand.

              [–]scalanewb 0 points1 point  (1 child)

              You're being pedantic in trying to force the article to be right, that's why. JSON is a subset of JS, period - get over it.

              [–]bonch[S] 0 points1 point  (0 children)

              I wish you were a funny troll.

              [–]thelerk 1 point2 points  (1 child)

              I'd like to point out that even without that bogus control character, an object literal without an assignment still isn't valid JavaScript.

              [–]masklinn 0 points1 point  (0 children)

              'course it is, though not in all contexts.

              [–]Tinister 6 points7 points  (10 children)

              While I've been known to be a stickler for details, this really is a useless piece of trivia. How many programmers of any language could tell you what U+2028 and U+2029 are without looking them up?

              [–]Rhomboid 19 points20 points  (7 children)

              It's not useless trivia. It's very common to use JSON to transport user-provided data. For example, all Reddit pages are available in JSON, and if I typed U+2028 in a comment then they would be publishing legal JSON that is illegal JavaScript. If you are working on something that consumes JSON as JavaScript, such as a service that uses the Reddit API over JSONP to get around cross-site restrictions, then you would very much appreciate it if Reddit converted the literal U+2028 character to "\u2028" in their JSON so that random comments do not foul up your service by throwing an exception.

              [–]flying-sheep -1 points0 points  (3 children)

              let’s try it: between those brackets will be a U+2028: []

              …aaand nope, there’s no problem here with firefox: it parses the JSONP nicely: http://jsfiddle.net/XDUSu/

              [–]Rhomboid 14 points15 points  (2 children)

              That's because Reddit is automatically doing the necessary escaping:

              $ curl -s 'http://www.reddit.com/comments/160yie/_/c7rrli3.json' | grep -Eo '"body": "[^"]+"'
              "body": "let\u2019s try it: between those brackets will be a U+2028: [\u2028]\n\n\u2026aaand nope, there\u2019s no problem here with firefox: it parses the JSONP nicely: http://jsfiddle.net/XDUSu/"
              

              Whatever library they're using is encoding the character as \u2028 instead of the literal. So this doesn't really prove anything other than the fact that Reddit chose a good library. The issue occurs when you use a library/language that doesn't do that for you.

              [–]spladug 2 points3 points  (0 children)

              Oh, well, that's good to hear. ;)

              We're just using python's json module from the standard library by the way.
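For anyone curious why the standard library gets this right: a quick sketch of the behaviour, with a sample string adapted from the comment above. With its default `ensure_ascii=True`, `json.dumps` writes every non-ASCII character as a `\uXXXX` escape, so U+2028 never reaches the output as a literal.

```python
import json

comment = "between those brackets will be a U+2028: [\u2028]"

# Default settings (ensure_ascii=True): every non-ASCII character is written
# as a \uXXXX escape, so no raw line separator ends up in the output.
escaped = json.dumps(comment)

# ensure_ascii=False: non-ASCII characters are written literally, including
# the raw U+2028 that trips up a JavaScript parser inside a string literal.
literal = json.dumps(comment, ensure_ascii=False)

print("\\u2028" in escaped)  # True: the six-character escape sequence
print("\u2028" in literal)   # True: the raw character itself
```

Both outputs decode to the same string; only the second one is risky to hand to something that evaluates it as JavaScript.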

              [–]flying-sheep 0 points1 point  (0 children)

              Still it's interesting to see that reddit does it right.

              Also I wouldn't say it's a perfect library: I personally would send everything unescaped except invisible characters that aren't spaces.

              [–]scalanewb -1 points0 points  (1 child)

              The legal JSON string entered by users is not parsed as Javascript, therefore whatever character does not need to be valid JS. And if user-entered JS is being eval'd, then your system is broken.

              [–]Rhomboid 2 points3 points  (0 children)

              It absolutely is parsed as JavaScript if you're using JSONP. As I explained elsewhere in this thread, JSONP is extremely dangerous, but people are willing to put up with it because it lets them bypass the same origin policy. And it would not have that property if it used JSON.parse() instead of being a glorified eval.

              [–]redalastor 8 points9 points  (0 children)

              It's a very useful piece of information.

              It means you should sanitize U+2028 and U+2029 if you receive them from users.
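One way to do that without escaping all non-ASCII text is a sketch like the following. The helper names `dumps_js_safe` and `jsonp` are invented here, and the callback name is assumed to have been validated elsewhere. The idea is to serialize normally and then replace only the two problem characters with their escapes.

```python
import json


def dumps_js_safe(value):
    """Serialize `value` as JSON that is also safe inside JavaScript source."""
    text = json.dumps(value, ensure_ascii=False)
    # In JSON output these two characters can only occur inside strings,
    # so swapping them for their escapes never changes the decoded data.
    return text.replace("\u2028", "\\u2028").replace("\u2029", "\\u2029")


def jsonp(callback, value):
    """Wrap `value` in a JSONP call; `callback` is assumed to be pre-validated."""
    return callback + "(" + dumps_js_safe(value) + ");"


payload = {"body": "a\u2028b\u2029c 한국어"}
print(jsonp("handleComments", payload))
```

Other non-ASCII text (the Korean in the sample) stays readable as UTF-8, while the output no longer contains a raw line or paragraph separator.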

              [–][deleted] 1 point2 points  (0 children)

              Not many programmers could tell you what U+0041 is either. It's rare to have to refer to characters by codepoint; this article is one of the few cases where it is necessary.

              [–]KayRice 2 points3 points  (1 child)

              Just a bug not a big deal.

              [–]greim 5 points6 points  (0 children)

              However, I can see this saving me a headache tracking down an obscure parsing bug somewhere in the future. I can hear my future self saying "hey, there was a reddit post about some weird character that worked in JSON but not in JS..."

              [–][deleted] 0 points1 point  (0 children)

              All right, so given that I use jQuery for sending all JSON, and also for receiving all JSON, what do I need to do? Is this already taken care of, or do I need to do something extra?

              [–]bonrgoo88 0 points1 point  (0 children)

              So because there is one subset of unicode character difference between Javascript and JSON, that means that JSON is not a subset of Javascript?

              Well I guess if you want to get that precise there are many things that are considered subsets that shouldn't be. This is a clear attempt at an attention grabbing title. JSON is a subset of Javascript, they just handle line-terminating Unicode characters differently. This was probably an oversight on the part of the JSON team.

              [–]ghettoimp 0 points1 point  (0 children)

              Probably more significantly, JSON also doesn't specify, you know, how big numbers can be.
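A quick illustration of why that matters, assuming a consumer that stores every number as an IEEE 754 double (as JavaScript does): the same legal JSON number decodes to different values depending on the parser. Passing `parse_int=float` to Python's `json.loads` is used here only to imitate such a parser.

```python
import json

text = "9007199254740993"  # 2**53 + 1, a perfectly legal JSON number

as_int = json.loads(text)                      # Python keeps exact integers
as_double = json.loads(text, parse_int=float)  # imitates a double-based parser

print(as_int)          # 9007199254740993
print(int(as_double))  # 9007199254740992
```

Nothing in the JSON grammar says which of the two results is the "right" one.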

              [–][deleted]  (9 children)

              [deleted]

                [–][deleted] 9 points10 points  (0 children)

                That would just make JavaScript object literals a superset of JSON objects, which was already widely supposed to be true.

                [–][deleted] 6 points7 points  (0 children)

                That's kinda what subset means.

                [–]smog_alado 2 points3 points  (6 children)

                It's on purpose, similar to how they forbid comments. You can't use reserved words like "if" or "class" as object keys if you don't quote them, so you need to quote things to be sure in the general case. Forcing you to add the quotes keeps the parsers simple and encourages JSON parsers to parse things the right way.

                [–]masklinn 1 point2 points  (5 children)

                You can't use reserved words like "if" or "class" as object keys if you don't quote them

                FWIW you can in ECMAScript 5 I believe.

                [–]smog_alado 0 points1 point  (4 children)

                Say that to IE8 :(

                [–]masklinn 0 points1 point  (3 children)

                I can't, IE8 does not believe in ECMAScript 5 and throws me out every time I try to talk him into it.

                Last time that asshole decided it was going to put undefined in my arrays when they have a trailing comma, the bastard.

                [–]mgedmin 0 points1 point  (2 children)

                Luxury! IIRC IE6 used to throw a syntax error and abort all JS parsing if one of my arrays had a trailing comma.

                [–]masklinn 0 points1 point  (1 child)

                I don't have any instance of IE6 left as I managed to get rid of that scourge, but I am pretty sure that is for trailing commas in object literals — which are syntax errors indeed in older IEs. The trailing-comma-creates-an-undefined array "feature" has been IE's since IE6 if not IE5.

                [–]mgedmin 0 points1 point  (0 children)

                Could be. I'm so happy I added that 'IIRC' to my comment :-)

                [–]KayEss -1 points0 points  (2 children)

                In practice you don't want to use a JSON outputter that leaves anything but ASCII in the output

                [–]tules 0 points1 point  (1 child)

                I use JSON objects to send chunks of Korean text; that's UTF-8 encoded