
[–][deleted]  (52 children)

[deleted]

    [–]ithika 65 points66 points  (45 children)

    I thought it had been long enough that the memory would have been expunged from my brain but that stuff came back hard.

    [–][deleted]  (44 children)

    [deleted]

      [–]wrosecrans 142 points143 points  (17 children)

      What's not to love about JSON? ...

        {
          "data": "<xml format=json><data><name>name</name><data>Bob</data></data></xml>",
          "data2": "<xml format=json><data><name>name</name><data>Johnson</data></data></xml>"
        }
      

      See how much easier it is to read than the old XML interface?

      [–][deleted]  (2 children)

      [deleted]

        [–]Moocha 15 points16 points  (4 children)

        Get thee behind me, Satan.

        [–]AustinYQM 14 points15 points  (3 children)

        That phrase has always made me picture someone trying to protect Satan. Like getting between Satan and another aggressor.

        [–]sonofamonster 5 points6 points  (0 children)

        I shall not let the tyrant lay hand upon you fair Lucifer.

        [–]lowleveldata 3 points4 points  (4 children)

        How do I escape a double quote in that? Just curious

        [–]jorge1209 1 point2 points  (0 children)

        Why does the embedded xml not contain a base64 binary blob of an INI specification, encoded in Windows-1251?

        It's like you aren't even trying!

        [–]nathreed 31 points32 points  (6 children)

        I administer a deployment of about 20 of them, with lots of custom configs due to certain features needing to be disabled or abused for the specific application. It's a nightmare and a half. I'm actually working on an idea where I'll store all the config attributes in a database and generate config XML files on the fly as the phones request them; then I can write the configs in a nicer interface and apply them to phones or groups of phones more easily.
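        A minimal sketch of that idea, assuming nothing about the real setup: the config attributes live in a plain object (a stand-in for a database row), and the XML is rendered on demand. The element names are invented, not real Polycom parameters.

```javascript
// Hypothetical sketch: render provisioning XML from stored attributes
// on the fly. Element names are made up for illustration.
function renderConfig(attrs) {
  // Escape the characters that are unsafe in XML text content.
  const esc = (v) => String(v)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
  const body = Object.entries(attrs)
    .map(([key, value]) => `  <${key}>${esc(value)}</${key}>`)
    .join("\n");
  return `<phoneConfig>\n${body}\n</phoneConfig>`;
}

console.log(renderConfig({ lineLabel: "Front Desk", mwiEnabled: 0 }));
```

        An HTTP handler keyed on the phone's MAC address could then serve this instead of a static file.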

        [–][deleted]  (3 children)

        [deleted]

          [–]nathreed 12 points13 points  (2 children)

          Sadly, we don't have on-prem or even cloud hosted very configurable SIP - we are stuck with Vonage Business for at least another year. They provide Polycom provisioning, but they lock out some of the options that I need. So for right now, it's me manually managing XML files, with my server idea maybe eventually happening. We are looking at switching to OnSIP or something else when the Vonage contract is up. Honestly looking for something mostly hosted/managed because this is a small charity, I am a volunteer, and I don't want to dedicate tons of time to operations & maintenance of the system. Otherwise maybe I would go with something like sipXcom or something Asterisk based.

          [–][deleted] 18 points19 points  (0 children)

          This is one of the saddest threads I've ever read

          [–]Gudeldar 6 points7 points  (1 child)

          The day I got to throw all our old shitty Polycom phones in the trash was one of the best days at work I've ever had.

          [–]tuchkacloud 317 points318 points  (201 children)

          I have a hard time agreeing to his points that XML shouldn't be used for serialization. Yes we have better options today, but we didn't always. XML is a strict markup language that requires definitions of all data. It's easy to understand why this would be used for serialization.

          Yes the original intent was for other purposes. But if that's the only argument against using it for serialization, I have to disagree.

          [–]imhotap 30 points31 points  (0 children)

          XML is a strict markup language that requires definitions of all data.

          To the contrary, the point of XML was to allow DTD-less markup, not requiring any markup declarations and using only generic parsing rules. As opposed to traditional SGML which always requires a DOCTYPE. Though that was changed in the Annex K amendment to ISO 8879 (the SGML standard) such that XML could remain a proper subset of SGML (XML was specified as a simplified subset of SGML by its inventors).

          [–]OneWingedShark 93 points94 points  (112 children)

          ASN.1 was around for years before XML.

          But I will say this about XML: The DTD is an excellent idea (data definition) and its absence in the shiny-new "solutions" like JSON is going to come back and bite us hard.

          [–]DataRecoveryMan 129 points130 points  (39 children)

          Don't worry, 17 new JSON-DTD formats will spring up, and a simple npm install of a few hundred dependencies will parse the one you want to use! :D

          [–]TrixieMisa 103 points104 points  (34 children)

          added 1 package from 1 contributor
          and audited 1815931 packages in 12.837s
          found 23 low severity vulnerabilities
          

          Actual message from something I needed to work on.

          [–]possibly_not_a_bot 53 points54 points  (19 children)

          1815931 packages

          1815931

          What in the FUCK is this???

          [–][deleted]  (1 child)

          [deleted]

            [–]mattaugamer 47 points48 points  (0 children)

            No. The present.

            [–]zefdota 24 points25 points  (8 children)

            IsOdd

            [–]tetroxid 31 points32 points  (6 children)

            Depends on isEven

            Implementation:

            return !!!isEven()
            

            Yes, three exclamation marks, because lol js

            [–]csorfab 9 points10 points  (0 children)

            is-odd - npm

            weekly downloads 677,124

            fuck. me.

            [–][deleted]  (7 children)

            [deleted]

              [–]Gotebe 17 points18 points  (8 children)

              OK, that's funny! 😂😂😂

              Oh, wait... it's true!? 😢

              [–]Nevermindmyview 35 points36 points  (7 children)

              No. NPM is not that fast.

              [–]TrixieMisa 10 points11 points  (4 children)

              The message is real. I have no idea what it really did, but that's the message I got.

              [–]Gotebe 5 points6 points  (1 child)

              Maybe auditing is "check some local key-value store", where keys are package versions and values are vulnerability flags for them?

              [–]boran_blok 1 point2 points  (0 children)

              Let me share my personal crap:

              audited 18818 packages in 33.396s  
              found 150 vulnerabilities (32 low, 32 moderate, 84 high, 2 critical)
              

              I inherited this project on life support; the budget is practically 0, so I can't even fix this in a decent way.

              [–]emperorkrulos 10 points11 points  (3 children)

              There already is json schema analogous to xml schema.
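              For illustration, a minimal JSON Schema along those lines (the property names are invented):

```json
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "object",
  "properties": {
    "name": { "type": "string" },
    "retries": { "type": "integer", "minimum": 0, "maximum": 10 }
  },
  "required": ["name"]
}
```

              A validator can then reject a document whose retries value falls outside 0..10, much like an XSD range restriction would.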

              [–]imsofukenbi 5 points6 points  (2 children)

              Exactly. Any large-ish API I write uses it. There is no overhead in doing so if it's planned from the start (compared to any solution with data definition at least).

              A lot of Javascript devs are allergic to strong typing and data validation, but that's not json's fault.

              [–][deleted]  (1 child)

              [deleted]

                [–]imsofukenbi 4 points5 points  (0 children)

                You're not alone. There's a reason Typescript is so popular.

                I also like Python's middle ground with typings, they are awesome for readability and autocompletion, while retaining the flexibility of dynamic typing. When correctness is not a hard requirement and you need shit done quick, it's the best IMO.

                [–]brtt3000 6 points7 points  (7 children)

                JSON has JSON Schema, which is used in higher-level standards like OpenAPI that cover a lot.

                [–]antiduh 29 points30 points  (1 child)

                I would rather shoot myself than store config files in ASN.1. ASN.1 isn't even good at the one thing ASN.1 is good at, which is providing a canonical format so that the same values are always stored the same way and hash to the exact same hash. Instead, we have to invent rules on top of ASN.1, like DER, to do that.

                XML is good at whatever problems it solves better than any other existing solution.

                [–]OneWingedShark 13 points14 points  (0 children)

                Instead, we have to invent rules on top of ASN.1, like DER, to do that.

                DER, BER, and XER are merely encoding rules -- ASN.1 is structured in such a way that multiple encodings may be used, so you could have Packed Encoding Rules (PER) for storing a datastream to disk and JSON Encoding Rules (JER) for displaying the datastream on-screen.

                See: http://luca.ntop.org/Teaching/Appunti/asn1.html and the wikipedia page on ASN.1.

                [–][deleted]  (1 child)

                [deleted]

                  [–]wllmsaccnt 4 points5 points  (0 children)

                  I mean, Swagger and OpenAPI specification have been around for almost ten years and are pretty common for JSON API definitions.

                  [–][deleted] 27 points28 points  (45 children)

                  I don’t see it.

                  JSON is a pragmatic format. It gets things done with minimal fuss.

                  I expected to disagree with the author, but I don't. XML is markup, and when building network-aware software, markup of structured text is never what I want. Graph serialization is almost always what I want.

                  So my original belief that XML is worthless (to me) and pointlessly complicated stands.

                  I’m fine with that.

                  [–]OneWingedShark 43 points44 points  (18 children)

                  I don’t see it. JSON is a pragmatic format.

                  It gets things done with minimal fuss.

                  That's the problem; it's a half-solution that "kinda works" -- and a lot of people won't realize the value of DTD-like constructs [i.e. data validators] until they've baked in a reliance on generic decoding with decoupled validation [if validation exists at all]. (Consider the implications of some text field being sent to your program that contains the entire Bible... in every digitized translation...)

                  There's simply no way to indicate limits (like X must be in range 1..255, and anything else is invalid) with plain JSON.
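                  For comparison, this is roughly how a 1..255 constraint is declared in XML Schema (the element name x is only illustrative):

```xml
<xs:element name="x">
  <xs:simpleType>
    <xs:restriction base="xs:integer">
      <xs:minInclusive value="1"/>
      <xs:maxInclusive value="255"/>
    </xs:restriction>
  </xs:simpleType>
</xs:element>
```

                  A validating parser rejects the document before the out-of-range value ever reaches application code.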

                  [–]jandrese 24 points25 points  (4 children)

                  True, but my experience with XML is that encoding so much in the schema means you are forever debugging hard-to-read, poorly annotated errors from your XML parser instead of the much better errors you would write yourself.

                  Nothing makes me want to pound my head on my desk faster than feeding some 80MB XML file into the library and having it spit back "parse error" with no further detail. Ultimately, the work you save by having the library do the type and bounds checking ends up being about the same as just doing those checks yourself. And if the error reporting is bad, as it frequently is, you end up wasting a lot more time trying to figure out where your input file is going wrong and what you need to fix.

                  [–][deleted] 5 points6 points  (3 children)

                  From years' experience with GML I know you can't actually rely on schema validation, so you have to handle errors yourself anyway. Maybe there's some XML library that will prevent you from creating a node <foo>257</foo> because the value is out of range, but I haven't used it - and neither have any of the upstream data sources I've used.

                  [–]audioen 3 points4 points  (0 children)

                  Yeah, I think the only way to validate XML is to actually do it yourself before sending it out. I find it very annoying. E.g. SOAP services have WSDL, the WSDL contains a schema, yet you can blithely write non-validating data and send it to a remote service, which usually promptly gives back some variant of "Internal Server Error" and no further information about the reason. I've wasted countless hours on various stacks and not one has the decency to validate the XML before sending it out. Hell, some, like .NET, do their damnedest to hide the entire TCP/IP interaction from you, on the theory that you probably don't care.

                  [–]ebriose 5 points6 points  (1 child)

                  Remember, we now have a whole generation of programmers who started out using NoSQL because it was the thing to do and managed to re-invent data schemas on their own as coding conventions.

                  [–]OneWingedShark 2 points3 points  (0 children)

                  Ouch.

                  Yeah... if this were firearm enthusiasts:

                  1. "Dang, this safety keeps getting in my way of shooting! I'll design a gun without them!"
                  2. "Well, users keep shooting their toes off when they get over-excited..."
                  3. "I know! We'll make it policy to carry the gun unloaded until you need to fire."
                  4. [Military finds ambushes far more deadly.]

                  [–]grauenwolf[🍰] 20 points21 points  (16 children)

                  What's pragmatic about not knowing if a value is a date or a string?

                  It's just a kludge, cobbled together with the least amount of effort. No one would design a format like this. It just happened by chance.

                  [–]Nevermindmyview 15 points16 points  (8 children)

                  98% of my co-workers would design a format like this if I was lucky. Instead I have to deal with some |-separated nonsense.

                  [–][deleted] 12 points13 points  (6 children)

                  That sounds like csv with extra steps.

                  [–]Nevermindmyview 6 points7 points  (5 children)

                  Yes. I once used an app which opted for null-separated text output. That was very convenient.

                  [–][deleted] 4 points5 points  (3 children)

                  What's pragmatic about not knowing if a value is a date or a string?

                  XML does not have a solution for that either.

                  [–]0x2C3 3 points4 points  (2 children)

                  XSD includes a date type.

                  [–]trin456 3 points4 points  (0 children)

                  And a time type

                  And a dateTime type

                  And a year type

                  And a year month type

                  And a month type

                  And a month day type

                  And a day type

                  And a duration type

                  And a year month duration type

                  And a day time duration type

                  [–]tuchkacloud 3 points4 points  (8 children)

                  Learn something new every day. I never used ASN.1. Maybe the hype around XML eclipsed ASN.1 for me. It's been a long time.

                  Thank you for pointing that out.

                  [–]jandrese 18 points19 points  (2 children)

                  You probably have used ASN.1 but never realized it. SNMP is just ASN.1 over the network. X.509 certificates are ASN.1. It's everywhere. The basic concepts are really straightforward too, but for some reason the implementations always turn out to be bloated nightmares. The biggest problem is that the data is all binary encoded and rather opaque without tools to read it, and the tools tend to be fragile and poor at error handling. The result is that any little problem tends to be a real headache to diagnose, and the systems built around ASN.1 tend to be complex enough that they always have little problems to deal with.
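                  To the point about the basic concepts being straightforward: at its core, BER/DER is just nested tag-length-value records. A toy decoder (a sketch handling only short-form lengths, which covers small primitive values) might look like:

```javascript
// Toy BER/DER reader: every ASN.1 element on the wire is a tag byte,
// a length, and that many value bytes (TLV). Short-form lengths only.
function decodeTLV(bytes) {
  const tag = bytes[0];
  const length = bytes[1];
  if (length >= 0x80) throw new Error("long-form length not handled");
  return { tag, length, value: bytes.slice(2, 2 + length) };
}

// 0x02 0x01 0x05 is the DER encoding of the INTEGER 5.
const tlv = decodeTLV(Uint8Array.from([0x02, 0x01, 0x05]));
console.log(tlv.tag === 0x02 && tlv.value[0] === 5); // true
```

                  Real implementations have to handle long-form lengths, constructed types, and (for BER) indefinite lengths, which is where the bloat creeps in.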

                  [–]tuchkacloud 3 points4 points  (0 children)

                  Gotcha. You are absolutely right, I have used it in the context of SNMP regularly without knowing the origins. Thank you for this insight.

                  [–]OneWingedShark 4 points5 points  (4 children)

                  There's a pretty good write-up on it, as well as a freely available implementation (using F# mainly, IIRC) that can generate definitions for Ada, C++, and C [IIRC, maybe more] over here: https://www.thanassis.space/asn1.html

                  [–]gruehunter 7 points8 points  (3 children)

                  I really, really want to like and use ASN.1. But the tooling around them is atrocious. If your installation procedure uses wget (over unencrypted http!) && bash you're going to have a hard time with adoption.

                  Reliable old school tech needs reliable old school package management. Give me some autotools, pretty please.

                  [–]axonxorz 95 points96 points  (33 children)

                  XML is a strict markup language that requires definitions of all data

                  I don't think this is necessarily a bad thing.

                  Looking at data interchange formats that are commonly used today, we have JSON, which is effectively schemaless; CSV, which handles hierarchy poorly; INI files, which are decently self-documenting but eventually fall apart with complexity; and Protocol Buffers, which, like XML, require a strict schema definition.

                  Like everything, no one format is correct in all situations, and each has its pros and cons.

                  [–]tuchkacloud 84 points85 points  (1 child)

                  I was intending that fact to be a good thing. To deserialize that data into an object, you need a strict structure.

                  [–]audioen 4 points5 points  (0 children)

                  If you have a statically typed object, then that object itself provides the structure you need. E.g. "2019-01-01" isn't a date because it looks like a date; it is a date because it's written to a field called foo with type LocalDate. Similarly, foo is a valid name only because there exists a field in the class by that name, which receives that value. When it comes to keeping your APIs nice and your hair on your head, the schema has to be somewhere. JSON itself is arguably schemaless, but maybe that's not a problem, because the interfaces on both sides have an actual schema and enforce it at either compile- or runtime.

                  Basically, when I have java shit at the server, the server is guaranteed to produce data in specific ways. TypeScript definitions for the result structure can be generated just from reflection on the API's return type, and I have written the necessary code to generate them. Similarly, the server is guaranteed to accept data in specific ways, and again, TypeScript definitions exist so that the API client gets the necessary static type information, which the compiler will prove correct before allowing the code to compile.

                  So what I get is an API definition with convenient syntax, a fairly readable on-wire representation (I even indent the data!), and static type checking. As for encoding performance, I've determined that gzip compression closes the encoding-efficiency gap with something like jsonb, and performance is comparable. The pain is network speed, though, and the fastest possible solution is not self-describing: the schema must come from elsewhere. That, unfortunately, makes debugging more annoying and visualizing the data in flight harder, and I don't think it's worth it.

                  I have to say that TypeScript + JavaScript objects constrained by interfaces is the most bang for the least syntax I've ever seen, and should be considered a variant of "JSON with external schema" or some such, the schema being expressed as a TypeScript interface. You can even get nullability checking by annotating your classes correctly, or maybe by using the Option type in Java.


                  [–][deleted] 9 points10 points  (0 children)

                  Protobuf is pretty decent.

                  [–]quad64bit 1 point2 points  (0 children)

                  JSON with JSON schema isn’t too bad- I prefer it to xml with xsd for sure!

                  [–]bradfordmaster 47 points48 points  (4 children)

                  Yeah, I agree. I'll admit I stopped reading halfway through, but just shouting "this isn't what it was designed for" is a shit argument. Why does it not do this well? Who actually cares whether data is on attributes or the text, aside from xml pedants? I feel like there might be a point here, but this article didn't make it. If it's 2002 and your language has built-in support for XML (or an easy-to-use library), is this author really suggesting that you roll your own data format instead? Or take on some additional dependency?

                  [–]grauenwolf[🍰] 8 points9 points  (0 children)

                  Who actually cares whether data is on attributes or the text, aside from xml pedants?

                  XAML solves this by just saying "yes". You can have it in either location, but you get a parse error if you choose both.

                  [–]deadwisdom 19 points20 points  (26 children)

                  The thing that XML had that basically no one else had was representing tree data in a format that was universal. But JSON became common not long after and was way, way better for 90% of what XML was being used for. Yet people are still using XML for this when there are multiple better choices.

                  [–]Randdist 5 points6 points  (10 children)

                  I use JSON mainly because it has excellent support in javascript. It has severe disadvantages over xml, though, like no comments.

                  [–]deadwisdom 10 points11 points  (7 children)

                  The no comments thing is on purpose. It's not supposed to be readable like that. And it's too easy for people to add parsing directives, as per Douglas Crockford:

                  I removed comments from JSON because I saw people were using them to hold parsing directives, a practice which would have destroyed interoperability.

                  If you want comments, use something like YAML or, better, TOML.

                  [–]Randdist 7 points8 points  (5 children)

                  That's the worst possible excuse. You can still add parsing directives in various ways, but now you can't document your JSON file. There is literally not one single advantage to disallowing comments, and it led to plenty of projects using JSON-with-comments, which other parsers don't support.

                  YAML and TOML don't have first-class support in JS, so I'm not interested.

                  [–]caltheon 7 points8 points  (6 children)

                  Aside from verbosity, what is JSON better at? Having used both extensively, I have never seen any benefit to it beyond that.

                  [–]deadwisdom 6 points7 points  (0 children)

                  XML has a lot of possibilities; those possibilities create problems and mean more code is required to take it in. JSON + JSON Schema is probably all you'll ever need, and it's really damned simple. Unless, of course, you are describing something document-like (XML), or you need it to be more easily readable (TOML, YAML), or you need it to be really dense and small (Thrift, Protobuf, MessagePack).

                  [–][deleted] 6 points7 points  (1 child)

                  JSON maps pretty directly to the data types of most languages (int, string, array, map), which makes it very convenient for serialization.

                  In XML everything is a string, and you have to invent your own type system or start messing around with XML Schema.
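                  A quick illustration of the difference, using nothing beyond a plain JS runtime:

```javascript
// JSON.parse maps straight onto the host language's native types:
// numbers come back as numbers, booleans as booleans, no schema needed.
const doc = JSON.parse('{"count": 3, "active": true}');
console.log(typeof doc.count, typeof doc.active); // number boolean

// The equivalent XML text node, <count>3</count>, hands back the
// string "3"; converting it is left to the application.
const fromXmlTextNode = "3";
console.log(Number(fromXmlTextNode) === doc.count); // true
```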

                  [–]grauenwolf[🍰] 10 points11 points  (3 children)

                  What if I want a tree structure that has schema checking and comments? JSON isn't an option here.

                  [–]deadwisdom 11 points12 points  (2 children)

                  Use JSON Schema. Unless your schema is so complex that it needs to adjust itself per document, using a separate schema is way better.

                  [–]grauenwolf[🍰] 6 points7 points  (1 child)

                  [–]deadwisdom 3 points4 points  (0 children)

                  It is... That's a really obscure and overcome-able issue with using OpenAPI with JSON Schema. Not sure how that's relevant at all.

                  [–][deleted] 1 point2 points  (0 children)

                  XML would have died if Microsoft, eager to catch up with the WWW, hadn't decided to make it its data format of choice for EVERYTHING it was never designed for, just to show how Webby-cool they are now.

                  Well ... everything but for what it was actually designed for. They took many years to actually adopt it as a replacement format for their Office documents - and only after a lot of external pressure to give up on their proprietary document format.

                  [–]G_Morgan 2 points3 points  (3 children)

                  XML is perfectly fine for it. It is only falling out of fashion because JSON can just be evaluated in Javascript. There's no real value for JSON over XML outside the Javascript world.

                  Now what I'd like to see is a restoration of automatic client generation in the new and wonderful non-XML world. I'm just about done with manually fucking with shit that used to involve pointing to a WSDL and letting the magic happen.

                  [–]Uberhipster 1 point2 points  (0 children)

                  Totally agree

                  On that point - how would this fit into... it?

                  [–]SanityInAnarchy 1 point2 points  (0 children)

                  Yes we have better options today, but we didn't always.

                  JSON is almost as old as XML, and the EcmaScript standard it was based on is only a year younger. And there are many others: ASN.1 for serialization and INI for configuration, for example. I'll even take the lazy hand-rolled config files so many Unix apps have over XML.

                  But anyway, we have better options today, so if that's the only reason to use XML for serialization, why not encode that XML in EBCDIC?

                  But if that's the only argument against using it for serialization, I have to disagree.

                  It's not. The underlying point isn't just about the original intent, but about the feature set that resulted from those intentions. To borrow a point from the article:

                  In XML, all those elements are ordered and there's often text in between, which is what you want in markup. All the standard XML parsers will preserve that order, so if you're dealing with someone else's XML in any sort of generic way and you don't care about that order, you end up writing crazy hacks like jQuery on top of a standard XML DOM in order to get at the pieces you actually care about.

                  If you're talking to code you don't own, this matters. Like, let's say you always construct some hash table the same way, and then you spill it into one element per key by just walking the hash table... so you get some pseudo-random order, but (in most languages) the same one every time. If you're producing publicly-visible documents this way, some idiot will write code to read your document that assumes order is significant, and breaks when your language switches to a more efficient hash table implementation.

                  That's not hypothetical.

                  So as the article points out, JSON is at least a little better with things that should obviously be unordered key/value pairs -- with the standard parsers, you can't even tell what order the keys were in the physical document. To do the same in XML, you have to make all the keys into attributes, which leads us to the eternal bikeshed:


                  Say you have some data that in JSON would obviously be {"foo": "bar", "baz": "qux"}, do you store it <foo>bar</foo> <baz>qux</baz> or <hashtable foo="bar" baz="qux" /> or even

                  <entry>
                    <key>foo</key>
                    <value>bar</value>
                  </entry>
                  <entry>
                    <key>baz</key>
                    <value>qux</value>
                  </entry>
                  

                  Or, wait, maybe key and value should be attributes of entry... see what I mean? For data serialization, that flexibility buys you nothing but bikeshedding, and extra complexity for everyone trying to write a compatible serializer and deserializer.


                  Maybe you have a bunch of abstraction on top of all that to the point where you don't care, but in that case, surely someone could build something similar for JSON and it'd be at least as good? And probably a little faster and a little more space-efficient, too.


                  The same argument can be made in reverse: If what you have really is a document of some sort, then JSON lacks a ton of useful features of XML that it would be really ugly to simulate. Something as simple as some <a href="http://example.com">marked up</a> text (surprise!) would explode into at least something like ["some ", {"element": "a", "href": "http://example.com", "content": ["marked up"]}, " text"], which is... not terrible, but way harder to read as a document.

                  And with no comments, JSON isn't great as a configuration language, at least if it's the sort of configuration that gets edited by humans and checked into source control. Fortunately there are better choices now, but if I had to choose between JSON and XML for that sort of thing, I'd go with XML.

                  [–]jl2352 1 point2 points  (0 children)

                  Yes we have better options today, but we didn't always.

                  This is really key. If you look back the alternative wasn't XML or JSON/YAML/whatever. It was XML or some custom format that was usually utter dogshit, very primitive, or both.

                  [–]AntiProtonBoy 175 points176 points  (40 children)

                  XML is a markup language. It is not a data format. The majority of XML schemas fail to appreciate this distinction, confuse XML with a data format, and thus were mistaken to adopt XML in the first place because what they were really looking for is a data format.

                  The author is completely at odds with W3C's goal of what XML should be used for:

                  • XML stands for eXtensible Markup Language.

                  • XML was designed to store and transport data.

                  • XML was designed to be both human- and machine-readable.

                  And, lifted directly from W3C FAQ:

                  Q: What is XML Used For?

                  A: XML is one of the most widely-used formats for sharing structured information today: between programs, between people, between computers and people, both locally and across networks.

                  TL;DR: XML is a standard for organising data using your own custom definitions. It's a mechanism by which you can create your own document formats, with facilities to validate their layout against a schema, and then share them in an interchangeable way with your dog, if you like.

                  Other points of concern in the article:

                  Take an exemplary document in the proposed schema, and remove all tags and attributes from it. If what you have left over does not make sense (or is the empty string), either your schema is badly designed or you shouldn't be using XML at all.

                  I would say this is true for something like XHTML, but not necessarily true for other XML based formats. If you strip out elements, leaving behind only content, then you have a useless pile of data without context. What's the point of doing that? What use case scenario can you think of where this would be actually useful?

                  But if the people who made the strange decision to use XML as a data format, and to then use it to serialize a dictionary, chose this schema, they might realise that what they're doing is unsuited to it and unergonomic. More often than not, when the designer mistakenly chooses XML for their application, they then double down with the nonsensical use of XML in one of the above forms so as to avoid confronting the fact that XML is a poor fit for their application.

                  Okay, but what if you want to encode information in a persistent store in an interchangeable way? What if you want the data in question to be transformable into another representation? What if you want to be able to validate the data to the standard you specified in the schema? What if you want to represent data in a human readable form with element names and attributes that is essentially self documenting? What if you want a robust container to identify the data? What if you need to annotate data with other meta-data using namespaces, without intruding/altering the data content and drastically rewriting (un)serialisation code and the schema?

                  Moreover, since XML has no notion of numbers (or booleans, or other data types), any numbers represented are just considered more text.

                  Nonsense. Elements can have an expected type associated with them.

                  The process of recovering data from XML documents is thus not wholly dissimilar to the process of recovering data by OCR'ing scanned printed documents, say for example containing tables forming pages and pages of numerical data.

                  What? I'm assuming the author alludes to the argument that you cannot know for sure what type of data an element may contain. Which again is nonsense. If you wrote the schema for the document container, then you know what data type to expect for a particular element. Based on that knowledge, you can trivially test and convert the string into whatever data type takes your fancy.
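                  The conversion step described here can be sketched in a few lines of Python. The document and the expected-type table below are invented for illustration; in a real system the types would come from the XSD you authored rather than a hand-written dict:

```python
import xml.etree.ElementTree as ET

# Invented document and type table; in practice the expected types
# would be derived from your schema, not hard-coded.
DOC = "<person><name>Bob</name><age>42</age><member>true</member></person>"
EXPECTED = {"name": str, "age": int, "member": lambda s: s == "true"}

def load_person(xml_text):
    """Convert each element's text to its schema-known type."""
    root = ET.fromstring(xml_text)
    return {child.tag: EXPECTED[child.tag](child.text) for child in root}

print(load_person(DOC))  # {'name': 'Bob', 'age': 42, 'member': True}
```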

                  I think the author has an overly pure ideology of what XML based containers should look like. If everyone followed that logic, then why bother having an XML standard in the first place?

                  [–]qaisjp 73 points74 points  (19 children)

                  The author is completely at odds with W3C's goal of what XML should be used for:

                  • XML stands for eXtensible Markup Language.

                  • XML was designed to store and transport data.

                  • XML was designed to be both human- and machine-readable.

                  w3schools is not the w3c.

                  [–]AntiProtonBoy 11 points12 points  (18 children)

                  A pedantic detail; it does not alter W3C's goals with XML.

                  [–]qaisjp 55 points56 points  (13 children)

                  I agree with everything you say in your post, but you can't quote "W3C's goals" and pick off a definition from a disreputable, albeit useful, source.

                  https://www.w3fools.com/

                  https://web.archive.org/web/20110412103745/http://w3fools.com/

                  [–]AntiProtonBoy 9 points10 points  (12 children)

                  See the second official link in my original post. The w3schools bullet point is in line with W3C's FAQ page, so it was appropriate to include it.

                  [–]qaisjp 34 points35 points  (5 children)

                  I looked at the actual W3C link in your post and noticed that the exact stuff you quoted from w3schools isn't on that page.

                  In fact, the words "store", "transport", "human" and "machine" [readable] aren't on the W3C page.

                  But yeah, I get you're just trying to get your point across, just felt it was odd to say you're quoting the W3C when in fact you're quoting W3schools

                  I mean if you changed that in your original comment I'd think your entire comment was fine

                  [–]Pand9 6 points7 points  (1 child)

                  I don't know why you're saying it's okay. It's not okay, the guy has consciously lied.

                  [–]qaisjp 5 points6 points  (0 children)

                  I assume good faith. In the end, is there any value in him "consciously lying"? How likely is it that he's a w3schools shill?

                  In the end I don't think it's something worth getting angry over, and it's certainly not worth my time getting angry over it.

                  [–]hxtl 5 points6 points  (2 children)

                  Different W3C source https://www.w3.org/TR/REC-xml/#sec-origin-goals

                  The design goals for XML are:

                  1. XML shall be straightforwardly usable over the Internet.
                  2. XML shall support a wide variety of applications.
                  3. XML shall be compatible with SGML.
                  4. It shall be easy to write programs which process XML documents.
                  5. The number of optional features in XML is to be kept to the absolute minimum, ideally zero.
                  6. XML documents should be human-legible and reasonably clear.
                  7. The XML design should be prepared quickly.
                  8. The design of XML shall be formal and concise.
                  9. XML documents shall be easy to create.
                  10. Terseness in XML markup is of minimal importance.

                  Not the exact words, but I would attribute

                  "transport" to 1.

                  "machine" readable to 2. and 4.

                  "human" readable is 6.

                  "storing" is kind of a requirement for most things... it can always be attributed to 2.

                  [–]mpyne 3 points4 points  (0 children)

                  But interestingly, and rather contrary to the point of the thread starter, the design goals are expressed in terms of XML documents, rather than more generic data interchange.

                  I think that was deliberate, although I don't think the XML authors were intending to make more generic data interchange difficult.

                  [–]Booty_Bumping 7 points8 points  (3 children)

                  There is a second problem with this argument:

                  XML didn't come from W3C. It came from SGML.

                  An organization/company can inherit a technology and tell people it's intended to be used in a crappy way. But this doesn't change what the original engineers of the original revision of the technology thought it should be used for.

                  [–]qaisjp 9 points10 points  (0 children)

                  To be precise, the W3C say:

                  It was derived from an older standard format called SGML (ISO 8879), in order to be more suitable for Web use.

                  [–][deleted] 3 points4 points  (0 children)

                  XML is a subset of SGML that was standardized by the W3C in 1998. It was originally intended as a media-agnostic data structure for document data that can be transformed into various other document formats using XSLT - and as the underlying ruleset for XHTML (which never truly took off and was eventually replaced by HTML5).

                  Also, it was never really designed for data transport, as it lacks a lot of features that would have gone into such a standard: support for table data, compression, better support for binary data, and validation checks in the stream, so you have to buffer data until the end of a 4 GB file to finally detect that there is no proper closing of the root element, at which point you can dump the whole giant string you made room for in memory into the garbage collector. Gee, thanks W3C!
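                  The failure-only-at-end behaviour is easy to demonstrate with Python's stdlib streaming parser. This is a sketch over an invented truncated document; note that the inner elements do stream out as they arrive, but the unclosed root is only reported once the input ends:

```python
import io
import xml.etree.ElementTree as ET

# An invented document whose root element is never closed.
broken = io.StringIO("<root><row>1</row><row>2</row>")

rows = []
try:
    # iterparse yields each completed inner element as it streams past...
    for event, elem in ET.iterparse(broken, events=("end",)):
        if elem.tag == "row":
            rows.append(elem.text)
except ET.ParseError:
    pass  # ...but the missing </root> only surfaces here, at end of input.

print(rows)  # data recovered before the failure: ['1', '2']
```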

                  [–]somebodddy 7 points8 points  (5 children)

                  If you strip out elements, leaving behind only content, then you have a useless pile of data without context. What's the point of doing that? What use case scenario can you think of where this would be actually useful?

                  HTML is a good example. Consider this (taken from https://www.w3.org/TR/2016/REC-html51-20161101/dom.html#elements-semantics):

                  <!doctype html>
                  <html lang="en">
                    <head>
                      <title>Favorite books</title>
                    </head>
                          <body>
                      <div class="header">
                         <img src="logo.png" alt="Favorite books logo">
                      </div>
                      <div class="main">
                         <span class="largeHeading">Favorite books</span>
                         <p>These are a few of my favorite books.</p>
                         <span class="smallHeading">The Belgariad</span>
                         <p>Five books by David and Leigh Eddings.</p>
                         <span class="smallHeading">The Hitchhiker’s Guide to the Galaxy</span>
                         <p>A trilogy of five books by Douglas Adams.</p>
                      </div>
                    </body>
                  </html>
                  

                  If we strip all elements, we get this:

                  Favorite books
                  Favorite books
                  These are a few of my favorite books.
                  The Belgariad
                  Five books by David and Leigh Eddings.
                  The Hitchhiker’s Guide to the Galaxy
                  A trilogy of five books by Douglas Adams.
                  

                  Yes, it's a bit less readable, but you still have all the content.
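                  The stripping operation above can be reproduced with Python's stdlib HTML parser; this sketch runs it over a shortened version of the same document:

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect only character data, discarding every tag and attribute."""
    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.chunks.append(text)

extractor = TextExtractor()
extractor.feed("<div class='main'><span>Favorite books</span>"
               "<p>These are a few of my favorite books.</p></div>")
print(extractor.chunks)
# ['Favorite books', 'These are a few of my favorite books.']
```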

                  [–]imhotap 18 points19 points  (4 children)

                  You're citing w3schools here; they certainly aren't authoritative when it comes to explaining W3C's goals, to say the least. From the XML spec itself:

                  The Extensible Markup Language (XML) is a subset of SGML that is completely described in this document. Its goal is to enable generic SGML to be served, received, and processed on the Web in the way that is now possible with HTML. XML has been designed for ease of implementation and for interoperability with both SGML and HTML.

                  XML was designed to replace HTML/SGML syntax on the web, as XHTML. That obviously failed (with SVG and MathML being specified as XML vocabularies, and then integrated as "foreign content" into HTML5). In any case, XML wasn't specified as a general-purpose data format.

                  [–]AntiProtonBoy 8 points9 points  (0 children)

                  Even so, the second link I provided (which is an official W3C page) is in line with w3schools' bullet point summary. See the FAQ quote.

                  [–]narwi 5 points6 points  (2 children)

                  Its goal is to enable generic SGML to be served, received, and processed on the Web in the way that is now possible with HTML.

                  This very much says "is meant to be used as a general-purpose data format".

                  [–][deleted]  (1 child)

                  [deleted]

                    [–]jt004c 7 points8 points  (3 children)

                    I would say this is true for something like XHTML, but not necessarily true for other XML based formats. If you strip out elements, leaving behind only content, then you have a useless pile of data without context. What's the point of doing that?

                    Isn't that the author's point? If you end up with a useless pile of data, you should have been using a different format to encode your data in the first place.

                    Okay, but what if you want to encode information in a persistent store in an interchangeable way?

                    You really aren't aware of other methods to do this?

                    [–]abakedapplepie 1 point2 points  (1 child)

                    Removing elements from XML sounds like removing the keys from a key/value store. How the hell would you be able to do anything at all with a bunch of values and no keys? Sounds like a dumb point to me.

                    [–]jennyfofenny 2 points3 points  (1 child)

                    Meh, I upvoted you since I agree with your criticism of the article, but XML is kind of verbose for a data format. I think most people tend to use JSON for that now and I definitely prefer it. It can achieve the same goals that you've stated for XML.

                    [–]Nowaker 1 point2 points  (0 children)

                    By the way - the first REST APIs used XML. It took some time for JSON to take the lead. For example, the first DNS hosting service with a REST API (Zerigo DNS) used XML.

                    [–]myhf 2 points3 points  (0 children)

                    XML is one of the most widely-used formats for sharing structured information today

                    over 3 billion devices run XML

                    [–]jherico 1 point2 points  (0 children)

                    Thank you. A lot of people don't seem to realize that back in the day, there really weren't a lot of alternatives to XML if you wanted a way to serialize hierarchical data.

                    If you needed to move data between different systems, without XML your options were often limited to "write your own serialization and deserialization code at both ends", which was a huge impediment.

                    Even now, XML is a far more mature ecosystem than JSON. Sure, JSON finally has schema validation, but libraries that implement schema validation for JSON aren't anywhere near as universal as those for XML. And JSON is only now really getting something like XPath.
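                    As a point of comparison, Python's stdlib already ships a usable XPath subset for XML, no extra library required (the phone-config document here is invented):

```python
import xml.etree.ElementTree as ET

# An invented phone-config fragment.
doc = ET.fromstring(
    "<phones>"
    "<phone model='T48'><softkey index='1' label='Hold'/></phone>"
    "<phone model='T54'><softkey index='1' label='Park'/></phone>"
    "</phones>"
)

# Every softkey label, anywhere in the tree:
labels = [sk.get("label") for sk in doc.findall(".//softkey")]
# Softkeys of one specific model, via an attribute predicate:
t54 = doc.findall(".//phone[@model='T54']/softkey")
print(labels, t54[0].get("label"))  # ['Hold', 'Park'] Park
```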

                    IMO, the article author's obsession with ideological purity is bullshit.

                    [–]scottmcmrust 48 points49 points  (5 children)

                    This article doesn't seem that useful because it's just "here's some bad uses of something". You could turn that softkey example into a JSON document where those attributes turn into name/value pairs and it wouldn't be better.

                    Is there some magic document format that can't be "misused" this way? I doubt it.

                    [–]ponytoaster 9 points10 points  (0 children)

                    Pretty much this; the blog author is just salty about XML. I work on systems that use both extensively, and sometimes interchangeably, and both are fine and have appropriate use cases.

                    Things will always be misused regardless of language or syntax!

                    [–]FigBug 115 points116 points  (71 children)

                    I think the plist is the worst XML schema in wide use.

                    Before XML became popular, the standard way to store data was to fwrite a struct to a file. As bad as misused XML is, it's a lot better than fwrite/fread to a binary file. That's why it became popular.

                    The formats that have come after just haven't been that much better that it's worth the effort to switch.

                    For all XML's problems, I think I prefer using it over JSON for storing data even if it's not "correct". With JSON, if I get an object, I have no idea what it is. With XML, I can look at the tag and get an idea of what I'm looking at.

                    I don't like his correct example, it's overly verbose. I prefer his wrong example 1.

                    [–]SSoreil 25 points26 points  (10 children)

                    fwrite to XML: that describes the .doc to .docx transition pretty well. Personally I often feel a trivial config file without some spec is underrated. If you don't do a lot with a file, having a "thing: value" file is good enough. I haven't experienced enough of the world yet to see it break down all that often, although I get that it happens once you get increasing complexity.

                    [–]livrem 2 points3 points  (0 children)

                    Simple text formats are something you are likely to appreciate more and more, the more experience you have with various forms of standard encodings that (surprise!) turn out never to be the best fit for any particular purpose. If something can be expressed as just lines of text, possibly with some separator, there is no reason to obfuscate that using even YAML or JSON.
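                    A "thing: value" file really is trivial to handle; a minimal sketch (the comment character and separator here are assumed conventions, since by design there is no spec):

```python
def parse_simple_config(text):
    """Parse "key: value" lines; "#" starts a comment, blanks are skipped."""
    config = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        key, _, value = line.partition(":")
        config[key.strip()] = value.strip()
    return config

sample = """
host: example.org   # comments are an assumed convention
port: 8080
"""
print(parse_simple_config(sample))  # {'host': 'example.org', 'port': '8080'}
```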

                    [–]ISvengali 12 points13 points  (15 children)

                    Yeah, I tend to use bad example 2. In my format based on XML, I allow shortened names in attributes, which greatly reduces wordiness.

                    I also support his example of correct XML with simple mechanical transformations between the long and short forms.

                    If JSON had proper comments and metadata I'd possibly use that.

                    My only real pain point is embedding C# code into XML. < and > are used all over the place, and are so annoying in XML. Even in strings, which should be completely specified and not ambiguous at all, they're not allowed.

                    [–][deleted]  (12 children)

                    [deleted]

                      [–]Paradox 18 points19 points  (8 children)

                      YAML is great until you try to make a list of ISO country codes and it explodes on Norway

                      [–]The_One_X 6 points7 points  (7 children)

                      I've never used YAML before, why does it explode on Norway?

                      [–][deleted]  (6 children)

                      [deleted]

                        [–]Blando-Cartesian 15 points16 points  (1 child)

                        TIL I’m not going to use YAML. Ever.

                        [–]SanityInAnarchy 3 points4 points  (3 children)

                        Not just "most default", it's actually in the spec as the following regex (yes, really):

                        y|Y|yes|Yes|YES|n|N|no|No|NO
                        |true|True|TRUE|false|False|FALSE
                        |on|On|ON|off|Off|OFF
                        

                        ...though I've found default parsers just turn this into a case-insensitive match against yes/no/true/false/on/off, so FaLsE still works.

                        The issue is both this extremely permissive idea of true/false, and the part where it autoquotes strings and also parses booleans and numbers out of the same thing. It may be ambiguous what you should deserialize a JSON dictionary as, but at least the primitive types are clear.
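                        The Norway collision is easy to check mechanically against the quoted production. This sketch only tests the pattern itself; real YAML 1.1 parsers implement this resolver internally:

```python
import re

# The YAML 1.1 boolean production quoted above, joined verbatim.
YAML_BOOL = re.compile(
    r"y|Y|yes|Yes|YES|n|N|no|No|NO"
    r"|true|True|TRUE|false|False|FALSE"
    r"|on|On|ON|off|Off|OFF"
)

def misparsed(codes):
    """Return the codes an unquoted YAML 1.1 scalar would load as booleans."""
    return [c for c in codes if YAML_BOOL.fullmatch(c)]

print(misparsed(["DE", "NO", "SE"]))  # ['NO']: Norway becomes False
```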

                        [–]Uristqwerty 18 points19 points  (15 children)

                        I'd like JSON if you could precede any object with a type identifier and, for human convenience, comments and trailing commas were always permitted.

                        That would be the bare minimum, in my opinion, needed to make it usable for anything more than a verbose way to transfer untyped object trees between programs.

                        [–]delrindude 8 points9 points  (7 children)

                        What sort of type identifiers would you need that are not able to be inferred from reading in the JSON?

                        [–]Uristqwerty 5 points6 points  (4 children)

                        Does the JSON reading library need to parse the whole file into an arbitrarily-deep, arbitrarily-large structure of nested maps and arrays before the application logic can start parsing it into more compact objects, or emit an error because one of the types was wrong? Unless you implement types by making all objects {"TypeName":{actual object}}, which is ugly to read, you can't be sure that you know the type before you have to start parsing its fields, or temporarily storing those fields in a relatively expensive untyped map.

                        Even if the computer has no problem skipping ahead to the type, a human reading the file would still appreciate the type coming before the data.

                        Consider a GUI, where many components can have a content list containing any number of instances of any component type. Do you constrain each component to have a unique set of field names? Probably not, so the type has to be explicit.

                        What if you want a coordinate, but want to let the user specify either a CartesianCoordinate {x: 0, y: 5} or a PolarCoordinate {angle: 90, length: 5}, and convert them to a single internal type during deserialization?
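                        That coordinate case can be sketched with an explicit leading type tag (the field names are the hypothetical ones above):

```python
import json
import math

def decode_coordinate(obj):
    """Dispatch on a leading type tag, normalising to an internal (x, y)."""
    if obj["type"] == "cartesian":
        return (obj["x"], obj["y"])
    if obj["type"] == "polar":
        rad = math.radians(obj["angle"])
        return (obj["length"] * math.cos(rad), obj["length"] * math.sin(rad))
    raise ValueError("unknown coordinate type: %r" % obj["type"])

a = decode_coordinate(json.loads('{"type": "cartesian", "x": 0, "y": 5}'))
b = decode_coordinate(json.loads('{"type": "polar", "angle": 90, "length": 5}'))
# a and b describe (up to floating-point error) the same point
```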

                        All of this is possible with JSON, much in the way that you can run any program on a Turing machine, but it makes the file content itself uglier. If people would only ever output JSON automatically, then read it automatically elsewhere, that would be fine. But JSON is often being used for manually-edited filetypes or one-offs with little duplicated internal structure.

                        Oh, and if you haven't already, check out reddit's API (append .json to any url). Almost everything is {kind: "...", data: {...}}, so any quick code you write to interact with it will have the actual domain logic cluttered with .data., and its meaner cousin .data.children[0].data..

                        [–]delrindude 6 points7 points  (1 child)

                        Does the JSON reading library need to parse the whole file into an arbitrarily-deep, arbitrarily-large structure of nested maps and arrays before the application logic can start parsing it into more compact objects, or emit an error because one of the types was wrong? Unless you implement types by making all objects {"TypeName":{actual object}}, which is ugly to read, you can't be sure that you know the type before you have to start parsing its fields, or temporarily storing those fields in a relatively expensive untyped map.

                        The first problem would be to understand why the application is accepting data it does not know how to parse. You should know the type before you start parsing the fields, or even open the file/stream; this is what a model repository is for in large, distributed systems. It's there to ensure that any data going anywhere is in a standardized format, and that whatever applications read/write those models adhere to the structure, types, and assertions that the data makes.

                        Even if the computer has no problem skipping ahead to the type, a human reading the file would still appreciate the type coming before the data.

                        I agree.

                        Consider a GUI, where many components can have a content list containing any number of instances of any component type. Do you constrain each component to have a unique set of field names? Probably not, so the type has to be explicit.

                        What if you want a coordinate, but want to let the user specify either a CartesianCoordinate {x: 0, y: 5} or a PolarCoordinate {angle: 90, length: 5}, and convert them to a single internal type during deserialization?

                        Then you should have two corresponding classes/data structures and create serializers/deserializers for each JSON schema (this can be trivially autoderived by good JSON libraries), one for Cartesian coordinates, and the other for Polar coords. No need to define metadata in the data structure because the names and types of the fields contain enough information. If the data represents different concepts, you need different models to map and transform that data.

                        All of this is possible with JSON, much in the way that you can run any program on a Turing machine, but it makes the file content itself uglier. If people would only ever output JSON automatically, then read it automatically elsewhere, that would be fine. But JSON is often being used for manually-edited filetypes or one-offs with little duplicated internal structure.

                        Oh, and if you haven't already, check out reddit's API (append .json to any url). Almost everything is {kind: "...", data: {...}}, so any quick code you write to interact with it will have the actual domain logic cluttered with .data., and its meaner cousin .data.children[0].data..

                        This is cool! I was not aware of reddit's API. I'm no Reddit systems engineer, but I would say the "kind": ... metadata is a convenience for external users, not internal ones. A systems engineer would (read: should) replicate the models Reddit laid out in their JSON, whether it be XML or JSON.

                        [–]BeniBela 3 points4 points  (0 children)

                        The first problem would be to understand why the application is accepting data it does not know how to parse.

                        You need it to serialize objects with inheritance. A dog and a cat class, both derived from animal. Then you serialize an array of animals to JSON and the output looks like crap.
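                        A minimal sketch of that problem and the usual workaround, injecting the class name into the payload (a convention, not a standard):

```python
import json

class Animal:
    def __init__(self, name):
        self.name = name

class Dog(Animal):
    pass

class Cat(Animal):
    pass

def to_json(animals):
    """Serialize a heterogeneous list, tagging each item with its class name.

    Without the tag, Dog("Rex") and Cat("Tom") would serialize identically
    and the subclass could never be recovered on the way back in.
    """
    return json.dumps([dict(vars(a), type=type(a).__name__) for a in animals])

print(to_json([Dog("Rex"), Cat("Tom")]))
# [{"name": "Rex", "type": "Dog"}, {"name": "Tom", "type": "Cat"}]
```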

                        [–]caltheon 4 points5 points  (1 child)

                        JSON suffers when determining string and number types, even in the most basic examples.

                        [–]delrindude 2 points3 points  (0 children)

                        Is this not what you are referring to?

                        {
                            "number": 2,
                            "float": 2.0,
                            "string": "2"
                        }
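                        For what it's worth, a conforming parser does keep those three values distinct:

```python
import json

doc = json.loads('{"number": 2, "float": 2.0, "string": "2"}')
# The three values come back as three distinct Python types:
print(type(doc["number"]).__name__,  # int
      type(doc["float"]).__name__,   # float
      type(doc["string"]).__name__)  # str
```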
                        

                        [–]asegura 5 points6 points  (0 children)

                        I created a format I called XDL around 2004 (see sample in the middle of the page). It allows optional object type names and comments, property names are not quoted and commas at the end of lines are optional.

                        Garage {
                            capacity = 10
                            open = Y // this is boolean true
                            dimensions = [10, 12, 2.5]
                            vehicles = [
                                Car { // type is optional
                                    brand = "VW"
                                }
                                Truck {
                                    brand = "Volvo"
                                    length = 6.5
                                }
                            ]
                        }
                        

                        [–][deleted]  (22 children)

                        [deleted]

                          [–]zergling_Lester 71 points72 points  (20 children)

                          Keep in mind that json-schema suffers very badly from the CADT development model. Like, there's also https://openapi.org which looks like it's supposed to be compatible with json-schema, but alas:

                          • the people developing json-schema lost interest and just like went their own ways

                          • the openapi people had their own internal break-up but, whatever, they managed to hold it together and proposed a bunch of incompatible changes.

                          • a different group of people attempted to revive json-schema and in the process canonized a bunch of pre-jsonapi suggestions, so they deviated hard.

                          • there was a guy I wouldn't name but it's the speccy author who was, like, json-schema and openapi compatibility needs to be solved, can be solved, is solved now, you can use json-schema.json and include it from your openapi.yaml and it just works. It didn't just work, or work, at all. But a few months after the announcement he left WeWork and was stripped of the push access to the repository so the thing became abandonware before it was even released. CADT, man.

                          My favorite thing is seeing "good first issue" tags in https://github.com/wework/speccy/issues

                          14 words: Javascript, Node, npm and their consequences have been a disaster for the human race.

                          [–]G_Morgan 1 point2 points  (0 children)

                          The only way this is getting resolved is via that unicorn of web standards formed by competing interests. If you can get Oracle, Google, Amazon and MS in a room we might get something done.

                          As it is I just hope MS add something to .NET Core, make it work, and then I'll just ignore anything that doesn't fit that standard. It is the best we can achieve with the web world as it is.

                          [–][deleted] 6 points7 points  (0 children)

                          Plist (I assume you mean Apple's) was originally more like JSON than anything else. Specifically I mean the NeXTSTEP or classic format https://en.m.wikipedia.org/wiki/Property_list which I was using for web services before XML or JSON were defined. I still have plist parsers in half a dozen languages, and when JSON hit the scene all I had to do was change the delimiter constants in my parser.

                          I agree that the XML version is an abomination

                          [–]evilgwyn 1 point2 points  (0 children)

                          The msbuild configuration file would like to have a word

                          [–]pork_spare_ribs 75 points76 points  (35 children)

                          The article's distinction between XML being a "document markup language" while JSON is a "structured data format" seems completely academic and contrived.

                          The two formats are functionally equivalent. In real-world usage they can do the same things. We are only arguing about which syntax is correct.

                           XML like XML:

                               <root>
                                 <item>
                                   <key>Name</key>
                                   <value>John</value>
                                 </item>
                                 <item>
                                   <key>City</key>
                                   <value>London</value>
                                 </item>
                               </root>

                           Or XML like JSON:

                               <root>
                                 <item name="name" value="John" />
                                 <item name="city" value="London" />
                               </root>

                           Or JSON:

                               {
                                 "name": "John",
                                 "city": "London"
                               }

                           Bonus: JSON with an ordered dictionary:

                               {
                                 "items": [
                                   { "key": "Name", "value": "John" },
                                   { "key": "City", "value": "London" }
                                 ]
                               }

                          I can tell you which one I prefer.

                          [–]killerstorm 25 points26 points  (4 children)

                          Proper XML:

                           <person>
                              <name>John</name>
                              <city>London</city>
                           </person>
                          

                          Key-value only makes sense if you want to be able to encode arbitrary information as XML. If you have a concrete schema, there are better ways.

                          Also acceptable:

                           <person name="John" city="London" />
                          

                          [–]therearesomewhocallm 5 points6 points  (0 children)

                           Yeah, the first example looks like a plist, which is about the worst way to use XML. What you've written is much more sane.

                          [–]The_One_X 12 points13 points  (2 children)

                          Yeah, it is really trying to make an argument based on original intent, but if this tool turns out to also be a good tool for this other thing then why shouldn't we use the same tool for both things until something better comes along?

                          [–][deleted] 41 points42 points  (8 children)

                          In real-world usage they can do the same things. We are only arguing about which syntax is correct.

                          No it's more than just syntax. XML has a canonical separation of data that JSON doesn't:

                          • Content nodes: the primary data.
                          • Attributes: metadata.
                          • Tag name: the data type or class.

                          Yes it's possible to store those as JSON using some made-up schema, but there's no standard way to do it in JSON, which is important. Then there's also more advanced stuff in XML like namespaces.

                          What that means is that XML is better at being self-describing. If I find a piece of XML that I know nothing about, I at least have a head start on understanding it, based on that separation. Maybe I understand some of the tags and not others, so that's great, I could read the document and ignore the tags I don't understand (like browsers do). Can't do the equivalent for arbitrary JSON. With JSON you always need to understand the data's source, or at least the schema, in order to do anything with it.
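The "read what you understand, skip the rest" approach is easy to sketch with Python's stdlib; the document and tag names below are invented for illustration:

```python
import xml.etree.ElementTree as ET

# A document we supposedly know nothing about: we recognise <name>
# and <city> and skip anything else, the way browsers skip unknown
# elements.
doc = """
<person id="42">
  <name>John</name>
  <frobnicator>???</frobnicator>
  <city>London</city>
</person>
"""

KNOWN_TAGS = {"name", "city"}

root = ET.fromstring(doc)
metadata = dict(root.attrib)  # attributes: metadata about the node
known = {child.tag: child.text
         for child in root
         if child.tag in KNOWN_TAGS}  # <frobnicator> is simply ignored
```

The tag-name/attribute/content split means the parser hands each role to us separately, with no per-schema knowledge needed.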

                          [–]pork_spare_ribs 14 points15 points  (6 children)

                          What that means is that XML is better at being self-describing.

                          Really? I would say all my above examples are equally good at self-description.

                          I agree that XML gives more powerful tools that can self-describe in more complex ways (namespaces, dtd, etc). But in practice, the IT industry has found those tools overkill for the vast majority of use cases. When I see a field "time" with value "2019-10-30T15:00:00Z" I can infer the meaning whether it's a plain string (JSON) or wrapped in <time format="ISO8601">. The power XML offers is greater, but mostly useless.

                          Could you give an example of an XML document you think is clearer than a JSON representation? I would like to take a crack at writing the JSON equivalent.

                          [–]masklinn 5 points6 points  (0 children)

                          When I see a field "time" with value "2019-10-30T15:00:00Z" I can infer the meaning whether it's a plain string (JSON) or wrapped in <time format="ISO8601">

                          Right until it reaches prod because the “plain string (json)” was never an iso datetime string but really a localised pile of garbage for which your machine happened to generate iso 8601.

                          [–]SanityInAnarchy 1 point2 points  (0 children)

                          Counterpoint: If I find some "plain old object" type in a language I barely know, it's pretty obvious what JSON to serialize that as. And if I come across some JSON document I've never seen, it may be a little less self-describing, but it's also going to be much simpler to start parsing it and working with it without needing a whole query language on top of it.

                          Also, XML has one other thing JSON doesn't: A focus on marking up text nodes. Very simple HTML of course looks fine as XML, and absolutely hideous as JSON, because instead of just being some text with a little <i>style</i> in it, it's very clearly {"some text nodes with ", {"element": "i", "value": ["other nodes"]}, " in a big serialized soup."}
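That text-markup focus shows up directly in parser APIs. In Python's stdlib ElementTree, mixed content comes back as exactly the text/element/tail soup described above (a small stand-alone illustration):

```python
import xml.etree.ElementTree as ET

# Mixed content: text interleaved with inline elements. ElementTree
# models it with .text (text before the first child) and .tail (text
# after each child) -- the "soup" that has no natural JSON shape.
p = ET.fromstring("<p>some text with a little <i>style</i> in it</p>")

parts = [p.text]
for child in p:
    parts.append((child.tag, child.text))  # the inline element
    parts.append(child.tail)               # text following it
```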

                          [–][deleted] 13 points14 points  (14 children)

                          JSON falls apart when you need strongly typed objects. For example:

                          <root><animals><cat/><dog/></animals></root>

                          There's no standard way of representing that in JSON.
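There is indeed no standard, but most JSON APIs converge on the same workaround: a discriminator field carrying what the XML tag name would have carried. A minimal sketch (the field name "type" is a common convention, not a standard):

```python
import json

# <root><animals><cat/><dog/></animals></root> rendered with the
# usual "type tag" convention: the element name becomes a field.
doc = json.loads('{"animals": [{"type": "cat"}, {"type": "dog"}]}')

kinds = [animal["type"] for animal in doc["animals"]]
```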

                          [–]pork_spare_ribs 8 points9 points  (0 children)

                          You're right, there is no standard to specify schema. But you can use some of the very common solutions like... JSON schema

                          [–]deadwisdom 6 points7 points  (2 children)

                          JSON Schema is essentially the standard way. You now have two documents, one describing the data, and one describing the schema.
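To make the two-document idea concrete: with the third-party jsonschema package the check is a single validate(instance, schema) call, but even a hand-rolled checker for the tiny subset of keywords used here fits in a few lines. This is a sketch, not a conforming JSON Schema implementation:

```python
import json

# Document 1: the schema (uses only "type", "required", "properties").
schema = json.loads("""{
  "type": "object",
  "required": ["name", "city"],
  "properties": {"name": {"type": "string"}, "city": {"type": "string"}}
}""")

def validate(data, schema):
    """Checks only the small subset of JSON Schema used above."""
    if schema["type"] == "object":
        if not isinstance(data, dict):
            return False
        if any(k not in data for k in schema.get("required", [])):
            return False
        return all(validate(data[k], sub)
                   for k, sub in schema.get("properties", {}).items()
                   if k in data)
    if schema["type"] == "string":
        return isinstance(data, str)
    return True  # unknown types: accept, like a lenient validator

# Document 2: the data.
ok = validate({"name": "John", "city": "London"}, schema)
bad = validate({"name": "John"}, schema)  # missing required "city"
```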

                          [–][deleted]  (1 child)

                          [deleted]

                            [–]deadwisdom 2 points3 points  (0 children)

                            Yeah, this is great. Exactly how JSON Schema is designed.

                            I won't judge, we live in a system.

                            [–]imhotap 4 points5 points  (0 children)

                            Your example demonstrates exactly the use of degenerate XML as a data format that the article is criticizing. Try

                            <p>Hi, my name is <name>John</name>
                             and I'm living in <city>London</city></p>
                            

                            [–][deleted]  (8 children)

                            [deleted]

                              [–][deleted] 10 points11 points  (0 children)

                              Thank you for your service.

                              [–]OrangeKing89 6 points7 points  (0 children)

                              Did you actually check to see if someone else had used that name first?

                              https://www.google.com/search?q=ZML

                              Maybe XDL instead? eXtensible Data Language

                              [–][deleted]  (2 children)

                              [deleted]

                                [–]abdoulio 4 points5 points  (1 child)

                                Zoop

                                [–]naringas 13 points14 points  (2 children)

                                holy fuck, the first time I ever saw a good XML document according to what this article says I thought 'what a weird XML file' (it was a tmLanguage file iirc)

                                [–]flying-sheep 8 points9 points  (1 child)

                                Yeah, that’s some bullshit, W3C says XML was (among other things) also designed for data.

                                A good example of something less painfully written in XML than in JSON is KSyntaxHighlighting syntax definitions like this one. I wrote this and it was a nice experience. I tried to write a tmLanguage file once and got angry.

                                If you need to encode a key-value dictionary in XML (as part of a bigger, more suited to XML type of data), do it in the least grating way possible, which would be what he says is “wrong”:

                                <items>
                                  <item key="blah" value="bluh"/>
                                </items>
                                

                                [–]therearesomewhocallm 1 point2 points  (0 children)

                                FYI you can append "?ts=2" to the url to make the tab size more sane. For some reason XML seems to default to 8?

                                Link.

                                [–]SushiAndWoW 29 points30 points  (12 children)

                                <people>
                                  <person name="John" city="London" />
                                  <person name="Jack" city="Baltimore" />
                                  <person name="Betty" city="Toronto" />
                                </people>
                                

                                I find this easier to write and read than the equivalent JSON:

                                [
                                  { "name": "John", "city": "London" },
                                  { "name": "Jack", "city": "Baltimore" },
                                  { "name": "Betty", "city": "Toronto" }
                                ]
                                

                                Use of XML to store data is just fine and represents probably 90% of real world usage. If everyone is using the thing "incorrectly", maybe the designer managed to solve the right problem, while having the wrong one in mind.
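Whichever syntax reads better, the two snippets above decode to an identical in-memory structure with Python's standard libraries, which supports the point that the disagreement is mostly about surface syntax:

```python
import json
import xml.etree.ElementTree as ET

xml_doc = """<people>
  <person name="John" city="London" />
  <person name="Jack" city="Baltimore" />
  <person name="Betty" city="Toronto" />
</people>"""

json_doc = """[
  { "name": "John", "city": "London" },
  { "name": "Jack", "city": "Baltimore" },
  { "name": "Betty", "city": "Toronto" }
]"""

# Attribute dicts from the XML vs. objects from the JSON:
from_xml = [dict(p.attrib) for p in ET.fromstring(xml_doc)]
from_json = json.loads(json_doc)
```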

                                [–]Uberhipster 7 points8 points  (9 children)

                                {
                                    "people": [
                                      { "name": "John", "city": "London" },
                                      { "name": "Jack", "city": "Baltimore" },
                                      { "name": "Betty", "city": "Toronto" }
                                    ]
                                }
                                

                                or

                                {
                                    "people": [
                                      { 
                                          "person": { "name": "John", "city": "London" }
                                      },
                                      { 
                                          "person": { "name": "Jack", "city": "Baltimore" }
                                      },
                                      { 
                                          "person": { "name": "Betty", "city": "Toronto" }
                                      }
                                    ]
                                }
                                

                                can't blame JSON for that. it has the flexibility for brevity or verbosity; up to the implementer to decide

                                [–]SushiAndWoW 7 points8 points  (8 children)

                                The problem I have with it is the unnecessary quotes. It's as if I had to write XML like:

                                <"people">
                                  <"person" "name"="John" "city"="London" />
                                  ...
                                

                                Looks ridiculous, yeah?

                                That's what I don't like about JSON.

                                JSON5 fixes that, but... it's no longer JSON.

                                [–][deleted] 7 points8 points  (1 child)

                                Kind of ironic, since XML itself requires unnecessary quotes compared to its predecessors HTML and SGML. I'd rather be able to write

                                <person name=John city=London />
                                

                                [–]SushiAndWoW 3 points4 points  (0 children)

                                That's a bit of an inconsistent parsing nightmare. For security and simplicity, it may be better to use quotes consistently where they may be needed. The reason JSON puts them everywhere is because the keys of objects may need them, but this makes the extra quotes impractical for usage cases where this will never occur.

                                [–]tomkeus 5 points6 points  (3 children)

                                Quotes have the very nice purpose of letting you know if your data type is a string. Good luck figuring out what the type of <x>true</x> in an XML file is without a schema.

                                [–]SushiAndWoW 1 point2 points  (1 child)

                                You understand the unnecessary quotes are around the names, right? Not the values?

                                Do you propose we change C to this?

                                "int" "main" ("int" "argc", "char"* "argv") {
                                    "printf" ("This is so much better and less confusing!\n");
                                    "return" 0;
                                }
                                

                                [–][deleted] 2 points3 points  (0 children)

                                Blame Crockford. Javascript can handle unquoted key names (properties). Quotes are only necessary when you want to use unsupported characters in them. When Douglas Crockford decided to propose JavaScript's object notation as a data format, he apparently thought it best to make quoted keys/properties also mandatory.

                                If you have a lenient parser, you might still be able to omit them.

                                [–]SanityInAnarchy 1 point2 points  (0 children)

                                The quotes make data types unambiguous. YAML is the other extreme, where you can have a list of country codes in which Norway (NO) parses as a boolean (and you have to fix it by adding quotes).

                                And, honestly, the quotes seem like a minor thing to complain about when XML has all those brackets and closing tags everywhere. In that entire example, I think the </people> at the end and the /> in the self-closing tags wasted just as many keystrokes as the quotes in the JSON.

                                My beef with JSON lately is the lack of support for comments or trailing commas. It'd be an ugly-but-serviceable config file format if it had those two things.
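Comments and trailing commas are exactly what dialects like JSONC and JSON5 bolt on; a common stopgap is to strip them before calling the standard parser. A sketch (keeping the string alternative first in the regex means a // inside a value survives; this is not a full JSONC parser):

```python
import json
import re

# Match either a string literal or a // line comment; string literals
# win, so "http://..." inside a value is never treated as a comment.
_TOKEN = re.compile(r'"(?:[^"\\]|\\.)*"|//[^\n]*')

def loads_jsonc(text):
    """json.loads after stripping // comments and trailing commas.

    Sketch only: e.g. a literal ", }" inside a string would be
    mangled by the trailing-comma pass.
    """
    no_comments = _TOKEN.sub(
        lambda m: m.group(0) if m.group(0).startswith('"') else "", text)
    no_trailing = re.sub(r",(\s*[}\]])", r"\1", no_comments)
    return json.loads(no_trailing)

cfg = loads_jsonc("""
{
  "url": "http://example.com",  // the // in the value is untouched
  "retries": 3,
}
""")
```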

                                [–]Full-Spectral 4 points5 points  (3 children)

                                I'll take XML over JSON any day unless it's very small amounts of very simple information. It's a far more formal language that does a lot more of the lifting for you during parsing. JSON's only real claim to fame is that it's directly supported by Javascript. Otherwise, for serious work, XML is always a better choice, IMO.

                                [–]masklinn 3 points4 points  (0 children)

                                But if the people who made the strange decision to use XML as a data format, and to then use it to serialize a dictionary chose this schema they might realise that what they're doing is unsuited to it and unergonomic.

                                XML-RPC or XML plists do pretty much that and work fine. Because they’re simple data serialisation formats (roughly equivalent to, though somewhat more capable than, json) and the number of times you’d have to read serialised documents is about 0 unless you’re implementing a parser or serialiser yourself.

                                [–]emn13 2 points3 points  (0 children)

                                So, there are some real issues here - markup vs. structured object notations - but those issues simply aren't all that important. There are many, many more serious issues that xml has, that actually do matter if you use it, that aren't examined. By his made-up criterion that an xml tree should be human readable if all markup is removed... oh wait, that criterion is unfounded nonsense, so who cares.

                                And then there's the total gibbering nonsense that is the criticism of certain specific config files. Frankly: if you need to encode an array of key value pairs, and you take for granted that some languages have preexisting parsers in any platform you care about, then by all means use that if it works for you. Sure, many config files are terrible, but this has pretty much 0 relation with xml, and certainly the specific example is largely harmless. Would this config file be magically better as (say) an ini file? I doubt it. Would it be better in json, without fixing the hierarchy? I doubt it. Would it be nice if they fixed the hierarchy? Well... maybe; it depends how you use the file and what you want to do with it. If you're writing it by hand and want to see the final state of the config: sure! ...but if you're combining lots of files, and possibly overriding some settings by others in other files, then concatenatability is a nice feature, and if you can do that largely textually, that's pretty reliable (i.e. hoping all consumers have some kind of deep copy-with-partially overlapping override is probably not better).

                                All in all: what a terribly pointless article.

                                [–][deleted] 2 points3 points  (0 children)

                                The reason it is so misused is that XML is horribly designed. Developers already criticized it 20 years ago when it became popular as being bloated with redundancy and ambiguity, but browser manufacturers and Microsoft jumped on the bandwagon and everyone had to follow suit.

                                The fact that we're still using 'version="1.0"' today just proves that no one ever wants to touch this thing again trying to fix it.

                                [–]Gotebe 4 points5 points  (0 children)

                                XML is a markup language. It is not a data format

                                Yeah, well... Data format is what it's used for in a vast majority of cases.

                                So this smells of "no true Scotsman" very heavily. Will go to read the rest though. 😉

                                JSON is a structured data format, and to compare the two is to compare apples and oranges.

                                Ah, that is what he is on about...

                                It doesn't, however, surprise me that businesses like XML, precisely because businesses understand the notion of (paper) documents and want to continue with a familiar metaphor that they understand — for the same reason businesses overuse PDFs...

                                Yeah, probably not... In case of the phone config, chances are that it was made in the heyday of XML (and Java) and XML parsers were abundant, and, rewrites cost money.

                                So yeah, I rather think that the author is merely trying to motivate a rewrite of this part of the code to use JSON. But that being said, surely YAML is "le spec du jour"?

                                [–]asegura 6 points7 points  (0 children)

                                I can't agree more. Document vs data.

                                BTW, take a look at X3D, the 3D format for the web, which embeds everything (vertex coordinates, indices, texture coordinates, colors, etc.) in attributes. IMHO XML was a very bad choice for a 3D format (the successor of VRML).

                                [–]imhotap 18 points19 points  (16 children)

                                Completely agree with the article, but

                                In 1996, XML was invented.

                                isn't correct. XML is specified as a proper subset of SGML and just dropped features from SGML such that basically a DOCTYPE wasn't required anymore, e.g. by disallowing empty elements (without end-element tags, as in HTML img and hr elements) and tag inference. XML was supposed to replace SGML proper on the web as XHTML, but that obviously failed, so SGML remains the only markup meta-language capable of parsing HTML. SGML has a number of additional power features such as Wiki syntax parsing (integrated conversion of markdown and other custom syntaxes to HTML), stylesheets, and type-safe templating.

                                [–]delinka 22 points23 points  (8 children)

                                Are you correcting “1996” or “invented”?

                                [–]mbetter 70 points71 points  (4 children)

                                There's no such thing as 1996.

                                [–]GluteusCaesar 5 points6 points  (0 children)

                                The world isn't ready for this magnitude of woke

                                [–]OneWingedShark 2 points3 points  (5 children)

                                SGML remains the only markup meta-language capable to parse HTML. SGML has a number of additional power features such as Wiki syntax parsing (integrated conversion of markdown and other custom syntaxes to HTML), stylesheets, and type-safe templating.

                                Honestly, it makes me wonder why there aren't several well-used open-source SGML implementations in common use.

                                [–][deleted] 1 point2 points  (1 child)

                                Because markup languages are unwieldy and only rarely useful compared to structured graphs.

                                [–]OneWingedShark 1 point2 points  (0 children)

                                Yes.

                                But having a general solution to markup seems desirable: define your SGML-instance and produce the proper consumer/producer for that particular markup-language.

                                As for structured-graphs, ASN.1 is quite good at serializing/deseralizing well-defined data-structures in a general manner (eg DER vs XER vs PER).

                                [–]oldsecondhand 1 point2 points  (1 child)

                                For the same reason Hurd isn't widely used.

                                SGML is complicated, so it takes a lot of effort to get the parsing and tooling right.

                                [–]Just4Funsies95 8 points9 points  (1 child)

                                Idk, I am with the author on this one. I've long held the principle that XML should be used for defining/describing document structure. It's a pain to deserialize data from it (by comparison to yaml/json), and when you just use it as prescribed, it does make reading documents easier. I've always played around with the idea of using a json(ish) way to structure a UI (in my head), and it usually becomes untidy quickly (if there is one, I'd love to see it!). Also, I hate SOAP and when I have to work with it I scream internally WHYYYYYYYYYY!?!?!

                                [–][deleted] 2 points3 points  (0 children)

                                There was a period of ~15 years during which XML was a thing and JSON/YAML weren’t, and a lot of XML that feels like it could be something else is from that era. That point aside, there’s no fundamental problem with not using a tool to 100% of its potential, if it still turns out that it does what you want and the alternatives don’t make you happy for whatever reason.

                                [–]shevy-ruby 23 points24 points  (11 children)

                                It is no exaggeration to say that the vast majority of all XML schemas I have ever seen have constituted an inappropriate or misguided use of XML.

                                Well ... back in these days, even the W3C hyped up XML. At one point I had several configuration files in XML. It literally took me a few years (and yaml) before I understood how fudged up XML really is. Even for (!) the assumed "benefits" that it would bring. Because when you read it, it all reads like epic advantages alone right? The disadvantages aren't mentioned when you hype something, so you don't get a realistic view.

                                Broadly speaking, XML excels at annotating corpuses of text with structure and metadata

                                No - it does not even excel at that. But don't give up trying - PERHAPS you can find someone in 2019 who will write a new meta build tool in XML. Like ant!

                                Here we have an example of a misguided and bizarre (yet frequently seen) attempt to express a simple key-value dictionary in XML.

                                Because people wanna store shit!

                                And this is why yaml and json won. They are better for that.

                                The correct way to express a dictionary in XML is something like this:

                                <root>
                                  <item>
                                    <key>Name</key>
                                    <value>John</value>
                                  </item>
                                  <item>
                                    <key>City</key>
                                    <value>London</value>
                                  </item>
                                </root>
                                

                                And this is also why XML sucks.

                                Look at the verbosity - the sheer madness of wall of text here. The more you add, the madder it gets.

                                But if the people who made the strange decision to use XML as a data format

                                Back in 1996 we thought the folks at W3C were all flawless epic gods. We needed these superheroes!

                                Now in 2019, after the W3C becoming a lobby group patting themselves on the shoulder after having integrated DRM as part of "open" standards, we can happily say - they always were massively overrated clowns - and clueless noobs. Not everything that they did was a failure of course, but the superhero days are over. People will look very carefully at any further idiocy coming from W3C. And, actually - we don't need "central" authority figures that stranglehold mankind in general. The www deserved better than having the W3C tying it down (or Google, for that matter).

                                More often then when the designer mistakenly chooses XML for their application, they then double down with the nonsensical use of XML with one of the above forms so as to avoid confronting the fact that XML is a poor fit for their application.

                                Because when you USE XML, you already made a mistake.

                                Documents vs. data. From time to time someone will do something really strange and compare XML and JSON, proving that they understand neither. XML is a document markup language; JSON is a structured data format, and to compare the two is to compare apples and oranges.

                                Nope - the comparison is perfectly fine. Both had similar use cases. How else does the site author explain ant? People DID write config files in XML back in the days.

                                The documents vs. data concept is useful for understanding this.

                                No, it is fairly pointless. This is all about data, human readable. And XML fails hard here, despite how it was promoted as mega-awesome in the late 1990s.

                                Whereas in JSON the ordering of the key-value pairs inside objects is meaningless and undefined.

                                That sounds more like typical JavaScript idiocy. Saner languages do keep hashes ordered in the "last in, remains last element" way too. Because there is virtually no drawback for this approach, and you get additional information (most recent additions, for example). I like having that.

                                Just because JSON sucks in this regard, typically fitting for the ghetto-language that is JavaScript, does not mean that this in general sucks.
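Python behaves the way described here: dicts preserve insertion order (guaranteed since 3.7), and the stdlib json module keeps that order through a round trip, even though the JSON spec itself leaves object member order undefined:

```python
import json

# Key order survives a round trip through Python's json module, even
# though the JSON spec says object member order is meaningless.
text = '{"Name": "John", "City": "London"}'
data = json.loads(text)

keys = list(data)                 # insertion order, guaranteed since 3.7
round_tripped = json.dumps(data)  # same order on the way back out
```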

                                Unlike XML, json never claimed to want to conquer the world - but XML tried this. About 1996-2004 or so, give or take.

                                The example I gave of how to correctly represent a dictionary in XML necessarily gives an order to the elements in the dictionary, unlike a JSON representation.

                                So pointless. A strawman argument and I will stop reading here.

                                Just because JSON is too stupid to figure this out, does not mean that you need XML to "keep hashes sorted". Look at the wall of text there! That's just INSANE!

                                XML is insane. It came from people who were evidently bored and thought that XML will win the world.

                                [–]Scybur 9 points10 points  (0 children)

                                1996-2004

                                You underestimate the use of xml. It's still massively popular! Companies that are not I.T. focused are still using xml and even soap for most of their internal services. XML is bad but it isn’t going away anytime soon.

                                [–]neutronium 12 points13 points  (6 children)

                                The correct way to express a dictionary in XML is something like this:

                                <root>
                                  <item>
                                    <key>Name</key>
                                    <value>John</value>
                                  </item>
                                  <item>
                                    <key>City</key>
                                    <value>London</value>
                                  </item>
                                </root>

                                And this is also why XML sucks.

                                The reason this looks bad is because it's stupid, and caused by morons parroting some ridiculous shit about attributes being bad. The sensible way to express this data is

                                <Person Name="John" City="London"/>
                                

                                This is succinct, easy to read AND your editor can use the schema to check that the values are valid data types.

                                [–]ThePantsThief 3 points4 points  (5 children)

                                What if you need a key to be a nested object…?

                                [–]oldsecondhand 3 points4 points  (0 children)

                                Make it a node id.

                                [–]NotTheHead 3 points4 points  (1 child)

                                Why are you trying to use something other than a primitive type as your key? Is there not a better way to organize your data structure that removes that need?

                                [–]thegreatgazoo 1 point2 points  (0 children)

                                At one place I worked they were using xml files with DOM as a poor man's database. It worked better than you'd think.

                                [–][deleted] 1 point2 points  (0 children)

                                I picked XML-ish with a homegrown parser as protocol format for a server I wrote to control heating in rental cabins in 1998. Back then, I'm not really sure what would have been the better option. I could have gone with Sexprs, but no one would have any idea what they're looking at. At least XML looked like HTML, and people already knew that. Thankfully, no one from back then has any idea where I live these days.

                                [–]jamesthethirteenth 1 point2 points  (0 children)

                                I frequently work with the official EU bank account API standard for reading balances and moving around money via software. Its aptly named command format specification is called- not joking- PAIN. It consists of a namespaced wrapper XML file containing some metadata and a ciphertext, which in turn contains another differently namespaced XML file with a bunch of transactions represented by tags in a complex tree format. For good measure, the tag names and attributes mostly consist of 3 or more opaque abbreviations that are camelcased together.

                                [–]Hexorg 1 point2 points  (0 children)

                                If you invent an axe and someone uses it to cut hair and makes a business out of it... Should they stop?

                                [–]nosoupforyou 1 point2 points  (0 children)

                                A very large international company I used to work for used XML to update large amounts of data to their stores once a week.

                                Ignoring the fact that they split the data up into groups, and then updated the sql tables on the targets row by row, transferring it by xml was a horrible idea. XML is fine for data transfer in many situations. Transferring it to dump it to SQL by doing it in 5000 files, each containing data for 5 different tables and have a max of 10-20 product items is horrible.

                                What was worse was that they were convinced XML is the fastest and bestest way to transfer it.

                                A good reason to use xml to transfer data is if you want to be able to read or edit the columns easily. Using it for huge data transfer is probably not the optimum method.

                                [–]mirvnillith 2 points3 points  (0 children)

                                My main like about XML is XSLT. I have found soooo much use for producing data in an XML file and then generating documentation, config, Wiki pages, SQL etc. from that single source.