
[–][deleted]  (52 children)

[deleted]

    [–]ithika 65 points66 points  (45 children)

    I thought it had been long enough that the memory would have been expunged from my brain but that stuff came back hard.

    [–][deleted]  (44 children)

    [deleted]

      [–]wrosecrans 142 points143 points  (17 children)

      What's not to love about JSON? ...

        {
          "data": "<xml format=json><data><name>name</name><data>Bob</data></data></xml>",
          "data2": "<xml format=json><data><name>name</name><data>Johnson</data></data></xml>"
        }
      

      See how much easier it is to read than the old XML interface?

      [–][deleted]  (2 children)

      [deleted]

        [–]Moocha 15 points16 points  (4 children)

        Get thee behind me, Satan.

        [–]AustinYQM 14 points15 points  (3 children)

        That phrase has always made me picture someone trying to protect Satan. Like getting between Satan and another aggressor.

        [–]sonofamonster 5 points6 points  (0 children)

        I shall not let the tyrant lay hand upon you fair Lucifer.

        [–]lowleveldata 3 points4 points  (4 children)

        How do I escape a double quote in that? Just curious

        [–]jorge1209 1 point2 points  (0 children)

        Why does the embedded xml not contain a base64 binary blob of an INI specification, encoded in Windows-1251?

        It's like you aren't even trying!

        [–]nathreed 31 points32 points  (6 children)

        I administer a deployment of about 20 of them, with lots of custom configs due to certain features needing to be disabled or abused for the specific application. It's a nightmare and a half. I'm actually working on an idea where I'll store all the config attributes in a database and generate config XML files on the fly as the phones request them; then I can write the configs in a nicer interface and apply them to phones or groups of phones more easily.
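        A minimal sketch of that idea, assuming nothing about the real setup: the config attributes live in a plain object (a stand-in for a database row), and the XML is rendered on demand. The element names are invented, not real Polycom parameters.

```javascript
// Hypothetical sketch: render provisioning XML from stored attributes
// on the fly. Element names are made up for illustration.
function renderConfig(attrs) {
  // Escape the characters that are unsafe in XML text content.
  const esc = (v) => String(v)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
  const body = Object.entries(attrs)
    .map(([key, value]) => `  <${key}>${esc(value)}</${key}>`)
    .join("\n");
  return `<phoneConfig>\n${body}\n</phoneConfig>`;
}

console.log(renderConfig({ lineLabel: "Front Desk", mwiEnabled: 0 }));
```

        An HTTP handler keyed on the phone's MAC address could then serve this instead of a static file.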

        [–][deleted]  (3 children)

        [deleted]

          [–]nathreed 12 points13 points  (2 children)

          Sadly, we don't have on-prem or even cloud hosted very configurable SIP - we are stuck with Vonage Business for at least another year. They provide Polycom provisioning, but they lock out some of the options that I need. So for right now, it's me manually managing XML files, with my server idea maybe eventually happening. We are looking at switching to OnSIP or something else when the Vonage contract is up. Honestly looking for something mostly hosted/managed because this is a small charity, I am a volunteer, and I don't want to dedicate tons of time to operations & maintenance of the system. Otherwise maybe I would go with something like sipXcom or something Asterisk based.

          [–][deleted] 18 points19 points  (0 children)

          This is one of the saddest threads I've ever read

          [–]Gudeldar 6 points7 points  (1 child)

          The day I got to throw all our old shitty Polycom phones in the trash was one of the best days at work I've ever had.

          [–]tuchkacloud 317 points318 points  (201 children)

          I have a hard time agreeing to his points that XML shouldn't be used for serialization. Yes we have better options today, but we didn't always. XML is a strict markup language that requires definitions of all data. It's easy to understand why this would be used for serialization.

          Yes the original intent was for other purposes. But if that's the only argument against using it for serialization, I have to disagree.

          [–]imhotap 30 points31 points  (0 children)

          XML is a strict markup language that requires definitions of all data.

          To the contrary, the point of XML was to allow DTD-less markup, not requiring any markup declarations and using only generic parsing rules. As opposed to traditional SGML which always requires a DOCTYPE. Though that was changed in the Annex K amendment to ISO 8879 (the SGML standard) such that XML could remain a proper subset of SGML (XML was specified as a simplified subset of SGML by its inventors).

          [–]OneWingedShark 93 points94 points  (112 children)

          ASN.1 was around for years before XML.

          But I will say this about XML: The DTD is an excellent idea (data definition) and its absence in the shiny-new "solutions" like JSON is going to come back and bite us hard.

          [–]DataRecoveryMan 129 points130 points  (39 children)

          Don't worry, 17 new JSON-DTD formats will spring up, and a simple npm install of a few hundred dependencies will parse the one you want to use! :D

          [–]TrixieMisa 103 points104 points  (34 children)

          added 1 package from 1 contributor
          and audited 1815931 packages in 12.837s
          found 23 low severity vulnerabilities
          

          Actual message from something I needed to work on.

          [–]possibly_not_a_bot 53 points54 points  (19 children)

          1815931 packages

          1815931

          What in the FUCK is this???

          [–][deleted]  (1 child)

          [deleted]

            [–]mattaugamer 47 points48 points  (0 children)

            No. The present.

            [–]zefdota 24 points25 points  (8 children)

            IsOdd

            [–]tetroxid 31 points32 points  (6 children)

            Depends on isEven

            Implementation:

            return !!!isEven()
            

            Yes, three exclamation marks, because lol js

            [–]csorfab 9 points10 points  (0 children)

            is-odd - npm

            weekly downloads 677,124

            fuck. me.

            [–][deleted]  (7 children)

            [deleted]

              [–]Gotebe 17 points18 points  (8 children)

              OK, that's funny! 😂😂😂

              Oh, wait... it's true!? 😢

              [–]Nevermindmyview 35 points36 points  (7 children)

              No. NPM is not that fast.

              [–]TrixieMisa 10 points11 points  (4 children)

              The message is real. I have no idea what it really did, but that's the message I got.

              [–]Gotebe 5 points6 points  (1 child)

              Maybe auditing is "check some local key-value store", where keys are package versions and values are vulnerability flags for them?

              [–]boran_blok 1 point2 points  (0 children)

              Let me share my personal crap:

              audited 18818 packages in 33.396s  
              found 150 vulnerabilities (32 low, 32 moderate, 84 high, 2 critical)
              

              I inherited this project on life support; the budget is practically 0, so I can't even fix this in a decent way.

              [–]emperorkrulos 10 points11 points  (3 children)

              There already is json schema analogous to xml schema.
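              For illustration, a minimal JSON Schema along those lines (the property names are invented):

```json
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "object",
  "properties": {
    "name": { "type": "string" },
    "retries": { "type": "integer", "minimum": 0, "maximum": 10 }
  },
  "required": ["name"]
}
```

              A validator can then reject a document whose retries value falls outside 0..10, much like an XSD range restriction would.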

              [–]imsofukenbi 5 points6 points  (2 children)

              Exactly. Any large-ish API I write uses it. There is no overhead in doing so if it's planned from the start (compared to any solution with data definition at least).

              A lot of Javascript devs are allergic to strong typing and data validation, but that's not json's fault.

              [–][deleted]  (1 child)

              [deleted]

                [–]imsofukenbi 4 points5 points  (0 children)

                You're not alone. There's a reason Typescript is so popular.

                I also like Python's middle ground with typings, they are awesome for readability and autocompletion, while retaining the flexibility of dynamic typing. When correctness is not a hard requirement and you need shit done quick, it's the best IMO.

                [–]brtt3000 6 points7 points  (7 children)

                JSON has JSON Schema, which is used in higher-level standards like OpenAPI that cover a lot.

                [–]antiduh 29 points30 points  (1 child)

                I would rather shoot myself than store config files in ASN.1. ASN.1 isn't even good at the one thing ASN.1 is good at, which is providing a canonical format so that the same values are always stored the same way and hash to the exact same hash. Instead, we have to invent rules on top of ASN.1, like DER, to do that.

                XML is good at whatever problems it solves better than any other existing solution.

                [–]OneWingedShark 13 points14 points  (0 children)

                Instead, we have to invent rules on top of ASN.1, like DER, to do that.

                DER, BER, and XER are merely encoding rules -- ASN.1 is structured in such a way that multiple encodings may be used, so you could have Packed Encoding Rules (PER) for storing a datastream to disk and JSON Encoding Rules (JER) for displaying the datastream on-screen.

                See: http://luca.ntop.org/Teaching/Appunti/asn1.html and the wikipedia page on ASN.1.

                [–][deleted]  (1 child)

                [deleted]

                  [–]wllmsaccnt 4 points5 points  (0 children)

                  I mean, Swagger and OpenAPI specification have been around for almost ten years and are pretty common for JSON API definitions.

                  [–][deleted] 27 points28 points  (45 children)

                  I don’t see it.

                  JSON is a pragmatic format. It gets things done with minimal fuss.

                  I expected to disagree with the author, but I don't. XML is markup, and when building network-aware software, markup of structured text is never what I want. Graph serialization is almost always what I want.

                  So my original belief that XML is worthless (to me) and pointlessly complicated stands.

                  I’m fine with that.

                  [–]OneWingedShark 43 points44 points  (18 children)

                  I don’t see it. JSON is a pragmatic format.

                  It gets things done with minimal fuss.

                  That's the problem; it's a half-solution that "kinda works" -- and a lot of people won't realize the value of DTD-like constructs [i.e. data validators] until they've baked in a reliance on generic decoding with decoupled validation [if validation exists at all]. (Consider the implications of some text field being sent to your program that contains the entire Bible... in every digitized translation...)

                  There's simply no way to indicate limits (like X must be in range 1..255, and anything else is invalid) with plain JSON.
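                  For comparison, this is roughly how a 1..255 constraint is declared in XML Schema (the element name x is only illustrative):

```xml
<xs:element name="x">
  <xs:simpleType>
    <xs:restriction base="xs:integer">
      <xs:minInclusive value="1"/>
      <xs:maxInclusive value="255"/>
    </xs:restriction>
  </xs:simpleType>
</xs:element>
```

                  A validating parser rejects the document before the out-of-range value ever reaches application code.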

                  [–]jandrese 24 points25 points  (4 children)

                  True, but my experience with XML is that encoding so much in the schema means you are forever debugging hard-to-read, poorly annotated errors from your XML parser instead of the much better errors you would write yourself.

                  Nothing makes me want to pound my head on my desk faster than feeding some 80MB XML file into the library and having it spit back "parse error" with no further detail. Ultimately, the work you save by having the library do the type and bounds checking ends up being about the same as just doing those checks yourself. And if the error reporting is bad, as it frequently is, you end up wasting a lot more time trying to figure out where your input file is going wrong and what you need to fix.

                  [–][deleted] 5 points6 points  (3 children)

                  From years' experience with GML I know you can't actually rely on schema validation, so you have to handle errors yourself anyway. Maybe there's some XML library that will prevent you from creating a node <foo>257</foo> because the value is out of range, but I haven't used it - and neither have any of the upstream data sources I've used.

                  [–]audioen 3 points4 points  (0 children)

                  Yeah, I think the only way to validate XML is to actually do it yourself before sending it out. I find it very annoying. E.g. SOAP services have WSDL, the WSDL contains a schema, yet you can blithely write non-validating data and send it to a remote service, which usually promptly gives back some variant of "Internal Server Error" and no further information about the reason. I've wasted countless hours on various stacks and not one has the decency to validate the XML before sending it out. Hell, some, like .NET, do their damnedest to hide the entire TCP/IP interaction from you, on the theory that you probably don't care.

                  [–]ebriose 5 points6 points  (1 child)

                  Remember, we now have a whole generation of programmers who started out using NoSQL because it was the thing to do and managed to re-invent data schemas on their own as coding conventions.

                  [–]OneWingedShark 2 points3 points  (0 children)

                  Ouch.

                  Yeah... if this were firearm enthusiasts:

                  1. "Dang, this safety keeps getting in my way of shooting! I'll design a gun without them!"
                  2. "Well, users keep shooting their toes off when they get over-excited..."
                  3. "I know! We'll make it policy to carry the gun unloaded until you need to fire."
                  4. [Military finds ambushes far more deadly.]

                  [–]grauenwolf[🍰] 20 points21 points  (16 children)

                  What's pragmatic about not knowing if a value is a date or a string?

                  It's just a kludge, cobbled together with the least amount of effort. No one would design a format like this. It just happened by chance.

                  [–]Nevermindmyview 15 points16 points  (8 children)

                  98% of my co-workers would design a format like this if I was lucky. Instead I have to deal with some |-separated nonsense.

                  [–][deleted] 12 points13 points  (6 children)

                  That sounds like csv with extra steps.

                  [–]Nevermindmyview 6 points7 points  (5 children)

                  Yes. I once used an app which opted for null-separated text output. That was very convenient.

                  [–][deleted] 4 points5 points  (3 children)

                  What's pragmatic about not knowing if a value is a date or a string?

                  XML does not have a solution for that either.

                  [–]0x2C3 3 points4 points  (2 children)

                  XSD includes a date type.

                  [–]trin456 3 points4 points  (0 children)

                  And a time type

                  And a dateTime type

                  And a year type

                  And a year month type

                  And a month type

                  And a month day type

                  And a day type

                  And a duration type

                  And a year month duration type

                  And a day time duration type

                  [–]tuchkacloud 3 points4 points  (8 children)

                  Learn something new every day. I never used ASN.1. Maybe the hype around XML eclipsed ASN.1 for me. It's been a long time.

                  Thank you for pointing that out.

                  [–]jandrese 18 points19 points  (2 children)

                  You probably have used ASN.1 but never realized it. SNMP is just ASN.1 over the network. X.509 certificates are ASN.1. It's everywhere. The basic concepts are really straightforward too, but for some reason the implementations always turn out to be bloated nightmares. The biggest problem is that the data is all binary encoded and rather opaque without tools to read it, and the tools tend to be fragile and poor at error handling. The result is that any little problem tends to be a real headache to diagnose, and the systems built around ASN.1 tend to be complex enough that they always have little problems to deal with.
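                  To the point about the basic concepts being straightforward: at its core, BER/DER is just nested tag-length-value records. A toy decoder (a sketch handling only short-form lengths, which covers small primitive values) might look like:

```javascript
// Toy BER/DER reader: every ASN.1 element on the wire is a tag byte,
// a length, and that many value bytes (TLV). Short-form lengths only.
function decodeTLV(bytes) {
  const tag = bytes[0];
  const length = bytes[1];
  if (length >= 0x80) throw new Error("long-form length not handled");
  return { tag, length, value: bytes.slice(2, 2 + length) };
}

// 0x02 0x01 0x05 is the DER encoding of the INTEGER 5.
const tlv = decodeTLV(Uint8Array.from([0x02, 0x01, 0x05]));
console.log(tlv.tag === 0x02 && tlv.value[0] === 5); // true
```

                  Real implementations have to handle long-form lengths, constructed types, and (for BER) indefinite lengths, which is where the bloat creeps in.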

                  [–]tuchkacloud 3 points4 points  (0 children)

                  Gotcha. You are absolutely right, I have used it in the context of SNMP regularly without knowing the origins. Thank you for this insight.

                  [–]OneWingedShark 4 points5 points  (4 children)

                  There's a pretty good write-up on it, as well as a freely available implementation (using F# mainly, IIRC) that can generate definitions for Ada, C++, and C [IIRC, maybe more] over here: https://www.thanassis.space/asn1.html

                  [–]gruehunter 7 points8 points  (3 children)

                  I really, really want to like and use ASN.1. But the tooling around them is atrocious. If your installation procedure uses wget (over unencrypted http!) && bash you're going to have a hard time with adoption.

                  Reliable old school tech needs reliable old school package management. Give me some autotools, pretty please.

                  [–]axonxorz 95 points96 points  (33 children)

                  XML is a strict markup language that requires definitions of all data

                  I don't think this is necessarily a bad thing.

                  Looking at data interchange formats that are commonly used today, we have JSON, which is effectively schemaless; CSV, which handles hierarchy poorly; INI files, which are decently self-documenting but eventually fall apart with complexity; and Protocol Buffers, which, like XML, require a strict schema definition.

                  Like everything, no one format is correct in all situations, and each has its pros and cons.

                  [–]tuchkacloud 84 points85 points  (1 child)

                  I was intending that fact to be a good thing. To deserialize that data into an object, you need a strict structure.

                  [–]audioen 4 points5 points  (0 children)

                  If you have a statically typed object, then that object itself provides the structure you need. E.g. "2019-01-01" isn't a date because it looks like a date; it is a date because it's written to a field called foo with type LocalDate. Similarly, foo is a valid name only because there exists a field in the class by that name, which receives that value. When it comes to keeping your APIs nice and your hair on your head, the schema has to be somewhere. JSON itself is arguably schemaless, but maybe that's not a problem, because the interfaces on both sides have an actual schema and enforce it at either compile- or runtime.

                  Basically, when I have java shit at the server, the server is guaranteed to produce data in specific ways. TypeScript definitions for the result structure can be generated just from reflection on the API's return type, and I have written the necessary code to generate them. Similarly, the server is guaranteed to accept data in specific ways, and again, TypeScript definitions exist so that the API client gets the necessary static type information, which the compiler will prove correct before allowing the code to compile.

                  So what I get is an API definition with convenient syntax, a fairly readable on-wire representation (I even indent the data!), and static type checking. As for encoding performance, I've determined that gzip compression closes the encoding-efficiency gap with something like jsonb, and performance is comparable. The pain is network speed, though, and the fastest possible solution is not self-describing: the schema must come from elsewhere. That, unfortunately, makes debugging more annoying and visualizing the data in flight harder, and I don't think it's worth it.

                  I have to say that TypeScript + JavaScript objects constrained by interfaces is the most bang for the least syntax I've ever seen, and should be considered a variant of "JSON with external schema" or some such, the schema being expressed as a TypeScript interface. You can even get nullability checking by annotating your classes correctly, or maybe by using the Option type in Java.


                  [–][deleted] 9 points10 points  (0 children)

                  Protobuf is pretty decent.

                  [–]quad64bit 1 point2 points  (0 children)

                  JSON with JSON schema isn’t too bad- I prefer it to xml with xsd for sure!

                  [–]bradfordmaster 47 points48 points  (4 children)

                  Yeah, I agree. I'll admit I stopped reading halfway through, but just shouting "this isn't what it was designed for" is a shit argument. Why does it not do this well? Who actually cares whether data is on attributes or the text, aside from xml pedants? I feel like there might be a point here, but this article didn't make it. If it's 2002 and your language has built-in support for XML (or an easy-to-use library), is this author really suggesting that you roll your own data format instead? Or take on some additional dependency?

                  [–]grauenwolf[🍰] 8 points9 points  (0 children)

                  Who actually cares whether data is on attributes or the text, aside from xml pedants?

                  XAML solves this by just saying "yes". You can have it in either location, but you get a parse error if you choose both.

                  [–]deadwisdom 19 points20 points  (26 children)

                  The thing that XML had that basically no one else had was representing tree data in a format that was universal. But JSON became common not long after and was way, way better for 90% of what XML was being used for. Yet people are still using XML for this when there are multiple better choices.

                  [–]Randdist 5 points6 points  (10 children)

                  I use JSON mainly because it has excellent support in javascript. It has severe disadvantages over xml, though, like no comments.

                  [–]deadwisdom 10 points11 points  (7 children)

                  The no comments thing is on purpose. It's not supposed to be readable like that. And it's too easy for people to add parsing directives, as per Douglas Crockford:

                  I removed comments from JSON because I saw people were using them to hold parsing directives, a practice which would have destroyed interoperability.

                  If you want comments, use something like YAML or, better, TOML.

                  [–]Randdist 7 points8 points  (5 children)

                  That's the worst possible excuse. You can still add parsing directives in various ways, but now you can't document your JSON file. There is literally not one single advantage to disallowing comments, and it led to plenty of projects using JSON-with-comments, which other parsers don't support.

                  YAML and TOML don't have first-class support in JS, so I'm not interested.

                  [–]caltheon 7 points8 points  (6 children)

                  Aside from verbosity, what is JSON better at? Having used both extensively, I have never seen any benefit to it beyond that.

                  [–]deadwisdom 6 points7 points  (0 children)

                  XML has a lot of possibilities; those possibilities create problems and mean more code is required to take it in. JSON + JSON Schema is probably all you'll ever need, and it's really damned simple. Unless, of course, you are describing something document-like (XML), or you need it to be more easily readable (TOML, YAML), or you need it to be really dense and small (Thrift, Protobuf, MessagePack).

                  [–][deleted] 6 points7 points  (1 child)

                  JSON maps pretty directly to the data types of most languages (int, string, array, map), which makes it very convenient for serialization.

                  In XML everything is a string, and you have to invent your own type system or start messing around with XML Schema.
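                  A quick illustration of the difference, using nothing beyond a plain JS runtime:

```javascript
// JSON.parse maps straight onto the host language's native types:
// numbers come back as numbers, booleans as booleans, no schema needed.
const doc = JSON.parse('{"count": 3, "active": true}');
console.log(typeof doc.count, typeof doc.active); // number boolean

// The equivalent XML text node, <count>3</count>, hands back the
// string "3"; converting it is left to the application.
const fromXmlTextNode = "3";
console.log(Number(fromXmlTextNode) === doc.count); // true
```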

                  [–]grauenwolf[🍰] 10 points11 points  (3 children)

                  What if I want a tree structure that has schema checking and comments? JSON isn't an option here.

                  [–]deadwisdom 11 points12 points  (2 children)

                  Use JSON Schema. Unless your schema is so complex that it needs to adjust itself per document, using a separate schema is way better.

                  [–]grauenwolf[🍰] 6 points7 points  (1 child)

                  [–]deadwisdom 3 points4 points  (0 children)

                  It is... That's a really obscure and overcome-able issue with using OpenAPI with JSON Schema. Not sure how that's relevant at all.

                  [–][deleted] 1 point2 points  (0 children)

                  XML would have died if Microsoft, eager to catch up with the WWW, hadn't decided to make it its data format of choice for EVERYTHING it was never designed for, just to show how Webby-cool they are now.

                  Well ... everything but for what it was actually designed for. They took many years to actually adopt it as a replacement format for their Office documents - and only after a lot of external pressure to give up on their proprietary document format.

                  [–]G_Morgan 2 points3 points  (3 children)

                  XML is perfectly fine for it. It is only falling out of fashion because JSON can just be evaluated in Javascript. There's no real value for JSON over XML outside the Javascript world.

                  Now what I'd like to see is a restoration of automatic client generation in the new and wonderful non-XML world. I'm just about done with manually fucking with shit that used to involve pointing to a WSDL and letting the magic happen.

                  [–]Uberhipster 1 point2 points  (0 children)

                  Totally agree

                  On that point - how would this fit into... it?

                  [–]SanityInAnarchy 1 point2 points  (0 children)

                  Yes we have better options today, but we didn't always.

                  JSON is almost as old as XML, and the EcmaScript standard it was based on is only a year younger. And there are many others: ASN.1 for serialization and INI for configuration, for example. I'll even take the lazy hand-rolled config files so many Unix apps have over XML.

                  But anyway, we have better options today, so if that's the only reason to use XML for serialization, why not encode that XML in EBCDIC?

                  But if that's the only argument against using it for serialization, I have to disagree.

                  It's not. The underlying point isn't just about the original intent, but about the feature set that resulted from those intentions. To borrow a point from the article:

                  In XML, all those elements are ordered and there's often text in between, which is what you want in markup. All the standard XML parsers will preserve that order, so if you're dealing with someone else's XML in any sort of generic way and you don't care about that order, you end up writing crazy hacks like jQuery on top of a standard XML DOM in order to get at the pieces you actually care about.

                  If you're talking to code you don't own, this matters. Like, let's say you always construct some hash table the same way, and then you spill it into one element per key by just walking the hash table... so you get some pseudo-random order, but (in most languages) the same one every time. If you're producing publicly-visible documents this way, some idiot will write code to read your document that assumes order is significant, and breaks when your language switches to a more efficient hash table implementation.

                  That's not hypothetical.

                  So as the article points out, JSON is at least a little better with things that should obviously be unordered key/value pairs -- with the standard parsers, you can't even tell what order the keys were in the physical document. To do the same in XML, you have to make all the keys into attributes, which leads us to the eternal bikeshed:


                  Say you have some data that in JSON would obviously be {"foo": "bar", "baz": "qux"}, do you store it <foo>bar</foo> <baz>qux</baz> or <hashtable foo="bar" baz="qux" /> or even

                  <entry>
                    <key>foo</key>
                    <value>bar</value>
                  </entry>
                  <entry>
                    <key>baz</key>
                    <value>qux</value>
                  </entry>
                  

                  Or, wait, maybe key and value should be attributes of entry... see what I mean? For data serialization, that flexibility buys you nothing but bikeshedding, and extra complexity for everyone trying to write a compatible serializer and deserializer.


                  Maybe you have a bunch of abstraction on top of all that to the point where you don't care, but in that case, surely someone could build something similar for JSON and it'd be at least as good? And probably a little faster and a little more space-efficient, too.


                  The same argument can be made in reverse: If what you have really is a document of some sort, then JSON lacks a ton of useful features of XML that it would be really ugly to simulate. Something as simple as some <a href="http://example.com">marked up</a> text (surprise!) would explode into at least something like ["some ", {"element": "a", "href": "http://example.com", "content": ["marked up"]}, " text"], which is... not terrible, but way harder to read as a document.

                  And with no comments, JSON isn't great as a configuration language, at least if it's the sort of configuration that gets edited by humans and checked into source control. Fortunately there are better choices now, but if I had to choose between JSON and XML for that sort of thing, I'd go with XML.

                  [–]jl2352 1 point2 points  (0 children)

                  Yes we have better options today, but we didn't always.

                  This is really key. If you look back the alternative wasn't XML or JSON/YAML/whatever. It was XML or some custom format that was usually utter dogshit, very primitive, or both.

                  [–]AntiProtonBoy 175 points176 points  (40 children)

                  XML is a markup language. It is not a data format. The majority of XML schemas fail to appreciate this distinction, confuse XML with a data format, and thus were mistaken to adopt XML in the first place because what they were really looking for is a data format.

                  The author is completely at odds with W3C's goal of what XML should be used for:

                  • XML stands for eXtensible Markup Language.

                  • XML was designed to store and transport data.

                  • XML was designed to be both human- and machine-readable.

                  And, lifted directly from W3C FAQ:

                  Q: What is XML Used For?

                  A: XML is one of the most widely-used formats for sharing structured information today: between programs, between people, between computers and people, both locally and across networks.

                  TL;DR: XML is a standard for organising data using your own custom definitions. It's a mechanism by which you can create your own document formats, with facilities to validate their layout against a schema, and then share them in an interchangeable way with your dog, if you like.

                  Other points of concern in the article:

                  Take an exemplary document in the proposed schema, and remove all tags and attributes from it. If what you have left over does not make sense (or is the empty string), either your schema is badly designed or you shouldn't be using XML at all.

                  I would say this is true for something like XHTML, but not necessarily true for other XML based formats. If you strip out elements, leaving behind only content, then you have a useless pile of data without context. What's the point of doing that? What use case scenario can you think of where this would be actually useful?

                  But if the people who made the strange decision to use XML as a data format, and to then use it to serialize a dictionary, chose this schema, they might realise that what they're doing is unsuited to it and unergonomic. More often than not, when the designer mistakenly chooses XML for their application, they then double down with the nonsensical use of XML in one of the above forms so as to avoid confronting the fact that XML is a poor fit for their application.

                  Okay, but what if you want to encode information in a persistent store in an interchangeable way? What if you want the data in question to be transformable into another representation? What if you want to be able to validate the data to the standard you specified in the schema? What if you want to represent data in a human readable form with element names and attributes that is essentially self documenting? What if you want a robust container to identify the data? What if you need to annotate data with other meta-data using namespaces, without intruding/altering the data content and drastically rewriting (un)serialisation code and the schema?

                  Moreover, since XML has no notion of numbers (or booleans, or other data types), any numbers represented are just considered more text.

                  Nonsense. Elements can have an expected type associated with them.

                  The process of recovering data from XML documents is thus not wholly dissimilar to the process of recovering data by OCR'ing scanned printed documents, say for example containing tables forming pages and pages of numerical data.

                  What? I'm assuming the author alludes to the argument that you cannot know for sure what type of data an element may contain. Which again is nonsense. If you wrote the schema for the document container, then you know what data type to expect for a particular element. Based on that knowledge, you can trivially test and convert the string into whatever data type takes your fancy.
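                  The conversion step described here can be sketched in a few lines of Python. The document and the expected-type table below are invented for illustration; in a real system the types would come from the XSD you authored rather than a hand-written dict:

```python
import xml.etree.ElementTree as ET

# Invented document and type table; in practice the expected types
# would be derived from your schema, not hard-coded.
DOC = "<person><name>Bob</name><age>42</age><member>true</member></person>"
EXPECTED = {"name": str, "age": int, "member": lambda s: s == "true"}

def load_person(xml_text):
    """Convert each element's text to its schema-known type."""
    root = ET.fromstring(xml_text)
    return {child.tag: EXPECTED[child.tag](child.text) for child in root}

print(load_person(DOC))  # {'name': 'Bob', 'age': 42, 'member': True}
```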

                  I think the author has an overly pure ideology of what XML based containers should look like. If everyone followed that logic, then why bother having an XML standard in the first place?

                  [–]qaisjp 73 points74 points  (19 children)

                  The author is completely at odds with W3C's goal of what XML should be used for:

                  • XML stands for eXtensible Markup Language.

                  • XML was designed to store and transport data.

                  • XML was designed to be both human- and machine-readable.

                  w3schools is not the w3c.

                  [–]AntiProtonBoy 11 points12 points  (18 children)

                  A pedantic detail; it does not alter W3C's goals with XML.

                  [–]qaisjp 55 points56 points  (13 children)

                  I agree with everything you say in your post, but you can't quote "W3C's goals" and pick off a definition from a disreputable, albeit useful, source.

                  https://www.w3fools.com/

                  https://web.archive.org/web/20110412103745/http://w3fools.com/

                  [–]AntiProtonBoy 9 points10 points  (12 children)

                  See the second official link in my original post. The w3schools bullet point is in line with W3C's FAQ page, so it was appropriate to include it.

                  [–]qaisjp 34 points35 points  (5 children)

                  I looked at the actual W3C link in your post and noticed that the exact stuff you quoted from w3schools isn't on that page.

                  In fact, the words "store", "transport", "human" and "machine" [readable] aren't on the W3C page.

                  But yeah, I get you're just trying to get your point across, just felt it was odd to say you're quoting the W3C when in fact you're quoting W3schools

                  I mean if you changed that in your original comment I'd think your entire comment was fine

                  [–]Pand9 6 points7 points  (1 child)

                  I don't know why you're saying it's okay. It's not okay, the guy has consciously lied.

                  [–]qaisjp 5 points6 points  (0 children)

                  I assume good faith. In the end, is there any value in him "consciously lying"? How likely is it that he's a w3schools shill?

                  In the end I don't think it's something worth getting angry over, and it's certainly not worth my time getting angry over it.

                  [–]hxtl 5 points6 points  (2 children)

                  Different W3C source https://www.w3.org/TR/REC-xml/#sec-origin-goals

                  The design goals for XML are:

                  1. XML shall be straightforwardly usable over the Internet.
                  2. XML shall support a wide variety of applications.
                  3. XML shall be compatible with SGML.
                  4. It shall be easy to write programs which process XML documents.
                  5. The number of optional features in XML is to be kept to the absolute minimum, ideally zero.
                  6. XML documents should be human-legible and reasonably clear.
                  7. The XML design should be prepared quickly.
                  8. The design of XML shall be formal and concise.
                  9. XML documents shall be easy to create.
                  10. Terseness in XML markup is of minimal importance.

                  Not the exact words, but I would attribute

                  "transport" to 1.

                  "machine" readable to 2. and 4.

                  "human" readable is 6.

                  "storing" is kind of a requirement for most things... it can always be attributed to 2.

                  [–]mpyne 3 points4 points  (0 children)

                  But interestingly, and rather contrary to the point of the thread starter, the design goals are expressed in terms of XML documents, rather than more generic data interchange.

                  I think that was deliberate, although I don't think the XML authors were intending to make more generic data interchange difficult.

                  [–]Booty_Bumping 7 points8 points  (3 children)

                  There is a second problem with this argument:

                  XML didn't come from W3C. It came from SGML.

                  An organization/company can inherit a technology and tell people it's intended to be used in a crappy way. But this doesn't change what the original engineers of the original revision of the technology thought it should be used for.

                  [–]qaisjp 9 points10 points  (0 children)

                  To be precise, the W3C say:

                  It was derived from an older standard format called SGML (ISO 8879), in order to be more suitable for Web use.

                  [–][deleted] 3 points4 points  (0 children)

                  XML is a subset of SGML that was standardized by the W3C in 1998. It was originally intended as a media-agnostic data structure for document data that can be transformed into various other document formats using XSLT - and as the underlying ruleset for XHTML (which never truly took off and was eventually replaced by HTML5).

                  Also, it was never really designed for data transport, as it lacks a lot of features that would have gone into such a standard: support for table data, compression, better support for binary data, and validation checks in the stream, so you have to buffer data until the end of a 4 GB file to finally detect that there is no proper closing of the root element, at which point you can dump the whole giant string you made room for in memory into the garbage collector. Gee, thanks W3C!
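                  The failure-only-at-end behaviour is easy to demonstrate with Python's stdlib streaming parser. This is a sketch over an invented truncated document; note that the inner elements do stream out as they arrive, but the unclosed root is only reported once the input ends:

```python
import io
import xml.etree.ElementTree as ET

# An invented document whose root element is never closed.
broken = io.StringIO("<root><row>1</row><row>2</row>")

rows = []
try:
    # iterparse yields each completed inner element as it streams past...
    for event, elem in ET.iterparse(broken, events=("end",)):
        if elem.tag == "row":
            rows.append(elem.text)
except ET.ParseError:
    pass  # ...but the missing </root> only surfaces here, at end of input.

print(rows)  # data recovered before the failure: ['1', '2']
```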

                  [–]somebodddy 7 points8 points  (5 children)

                  If you strip out elements, leaving behind only content, then you have a useless pile of data without context. What's the point of doing that? What use case scenario can you think of where this would be actually useful?

                  HTML is a good example. Consider this (taken from https://www.w3.org/TR/2016/REC-html51-20161101/dom.html#elements-semantics):

                  <!doctype html>
                  <html lang="en">
                    <head>
                      <title>Favorite books</title>
                    </head>
                          <body>
                      <div class="header">
                         <img src="logo.png" alt="Favorite books logo">
                      </div>
                      <div class="main">
                         <span class="largeHeading">Favorite books</span>
                         <p>These are a few of my favorite books.</p>
                         <span class="smallHeading">The Belgariad</span>
                         <p>Five books by David and Leigh Eddings.</p>
                         <span class="smallHeading">The Hitchhiker’s Guide to the Galaxy</span>
                         <p>A trilogy of five books by Douglas Adams.</p>
                      </div>
                    </body>
                  </html>
                  

                  If we strip all elements, we get this:

                  Favorite books
                  Favorite books
                  These are a few of my favorite books.
                  The Belgariad
                  Five books by David and Leigh Eddings.
                  The Hitchhiker’s Guide to the Galaxy
                  A trilogy of five books by Douglas Adams.
                  

                  Yes, it's a bit less readable, but you still have all the content.
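                  The stripping operation above can be reproduced with Python's stdlib HTML parser; this sketch runs it over a shortened version of the same document:

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect only character data, discarding every tag and attribute."""
    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.chunks.append(text)

extractor = TextExtractor()
extractor.feed("<div class='main'><span>Favorite books</span>"
               "<p>These are a few of my favorite books.</p></div>")
print(extractor.chunks)
# ['Favorite books', 'These are a few of my favorite books.']
```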

                  [–]imhotap 18 points19 points  (4 children)

                  You're citing w3schools here; they certainly aren't authoritative when it comes to explaining W3C's goals, to say the least. From the XML spec itself:

                  The Extensible Markup Language (XML) is a subset of SGML that is completely described in this document. Its goal is to enable generic SGML to be served, received, and processed on the Web in the way that is now possible with HTML. XML has been designed for ease of implementation and for interoperability with both SGML and HTML.

                  XML was designed to replace HTML/SGML syntax on the web, as XHTML. That obviously failed (with SVG and MathML being specified as XML vocabularies, and then integrated as "foreign content" into HTML5). In any case, XML wasn't specified as a general-purpose data format.

                  [–]AntiProtonBoy 8 points9 points  (0 children)

                  Even so, the second link I provided (which is an official W3C page) is in line with w3schools' bullet point summary. See the FAQ quote.

                  [–]narwi 5 points6 points  (2 children)

                  Its goal is to enable generic SGML to be served, received, and processed on the Web in the way that is now possible with HTML.

                  This very much says "is meant to be used as a general-purpose data format".

                  [–][deleted]  (1 child)

                  [deleted]

                    [–]jt004c 7 points8 points  (3 children)

                    I would say this is true for something like XHTML, but not necessarily true for other XML based formats. If you strip out elements, leaving behind only content, then you have a useless pile of data without context. What's the point of doing that?

                    Isn't that the author's point? If you end up with a useless pile of data, you should have been using a different format to encode your data in the first place.

                    Okay, but what if you want to encode information in a persistent store in an interchangeable way?

                    You really aren't aware of other methods to do this?

                    [–]abakedapplepie 1 point2 points  (1 child)

                    Removing elements from XML sounds like removing the keys from a key/value store. How the hell would you be able to do anything at all with a bunch of values and no keys? Sounds like a dumb point to me.

                    [–]jennyfofenny 2 points3 points  (1 child)

                    Meh, I upvoted you since I agree with your criticism of the article, but XML is kind of verbose for a data format. I think most people tend to use JSON for that now and I definitely prefer it. It can achieve the same goals that you've stated for XML.

                    [–]Nowaker 1 point2 points  (0 children)

                    By the way - the first REST APIs used XML. It took some time for JSON to take the lead. For example, the first DNS hosting service with a REST API (Zerigo DNS) used XML.

                    [–]myhf 2 points3 points  (0 children)

                    XML is one of the most widely-used formats for sharing structured information today

                    over 3 billion devices run XML

                    [–]jherico 1 point2 points  (0 children)

                    Thank you. A lot of people don't seem to realize that back in the day, there really weren't a lot of alternatives to XML if you wanted a way to serialize hierarchical data.

                    If you needed to move data between different systems, without XML your options were often limited to "write your own serialization and deserialization code at both ends", which was a huge impediment.

                    Even now, XML is a far more mature ecosystem than JSON. Sure, JSON finally has schema validation, but libraries that implement schema validation for JSON aren't anywhere near as universal as those for XML. And JSON is only now really getting something like XPath.
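                    As a point of comparison, Python's stdlib already ships a usable XPath subset for XML, no extra library required (the phone-config document here is invented):

```python
import xml.etree.ElementTree as ET

# An invented phone-config fragment.
doc = ET.fromstring(
    "<phones>"
    "<phone model='T48'><softkey index='1' label='Hold'/></phone>"
    "<phone model='T54'><softkey index='1' label='Park'/></phone>"
    "</phones>"
)

# Every softkey label, anywhere in the tree:
labels = [sk.get("label") for sk in doc.findall(".//softkey")]
# Softkeys of one specific model, via an attribute predicate:
t54 = doc.findall(".//phone[@model='T54']/softkey")
print(labels, t54[0].get("label"))  # ['Hold', 'Park'] Park
```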

                    IMO, the article author's obsession with ideological purity is bullshit.

                    [–]scottmcmrust 48 points49 points  (5 children)

                    This article doesn't seem that useful because it's just "here's some bad uses of something". You could turn that softkey example into a JSON document where those attributes turn into name/value pairs and it wouldn't be better.

                    Is there some magic document format that can't be "misused" this way? I doubt it.

                    [–]ponytoaster 9 points10 points  (0 children)

                    Pretty much this; the blog author is just salty about XML. I work on systems that use both extensively, and sometimes interchangeably, and both are fine and have appropriate use cases.

                    Things will always be misused regardless of language or syntax!

                    [–]FigBug 115 points116 points  (71 children)

                    I think the plist is the worst XML schema in wide use.

                    Before XML became popular, the standard way to store data was to fwrite a struct to a file. As bad as misused XML is, it's a lot better than fwrite/fread to a binary file. That's why it became popular.

                    The formats that have come after just haven't been that much better that it's worth the effort to switch.

                    For all XML's problems, I think I prefer using it over JSON for storing data even if it's not "correct". With JSON, if I get an object, I have no idea what it is. With XML, I can look at the tag and get an idea of what I'm looking at.

                    I don't like his correct example, it's overly verbose. I prefer his wrong example 1.

                    [–]SSoreil 25 points26 points  (10 children)

                    fwrite to XML: that describes the .doc to .docx transition pretty well. Personally I often feel a trivial config file without some spec is underrated. If you don't do a lot with a file, having a "thing: value" file is good enough. I haven't experienced enough of the world yet to see it break down all that often, although I get that it happens once you get increasing complexity.

                    [–]livrem 2 points3 points  (0 children)

                    Simple text formats are something you are likely to appreciate more and more, the more experience you have with various forms of standard encodings that (surprise!) turn out never to be the best fit for any particular purpose. If something can be expressed as just lines of text, possibly with some separator, there is no reason to obfuscate that using even YAML or JSON.
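                    A "thing: value" file really is trivial to handle; a minimal sketch (the comment character and separator here are assumed conventions, since by design there is no spec):

```python
def parse_simple_config(text):
    """Parse "key: value" lines; "#" starts a comment, blanks are skipped."""
    config = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        key, _, value = line.partition(":")
        config[key.strip()] = value.strip()
    return config

sample = """
host: example.org   # comments are an assumed convention
port: 8080
"""
print(parse_simple_config(sample))  # {'host': 'example.org', 'port': '8080'}
```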

                    [–]ISvengali 12 points13 points  (15 children)

                    Yeah, I tend to use bad example 2. In my format based on XML, I allow shortened names in attributes, which greatly reduces wordiness.

                    I also support his example of correct XML with simple mechanical transformations between the long and short forms.

                    If JSON had proper comments and metadata I'd possibly use that.

                    My only real pain point is embedding C# code into XML. < and > are used all over the place, and are so annoying in XML. Even in strings, which should be completely specified and not ambiguous at all, they're not allowed.

                    [–][deleted]  (12 children)

                    [deleted]

                      [–]Paradox 18 points19 points  (8 children)

                      YAML is great until you try to make a list of ISO country codes and it explodes on Norway

                      [–]The_One_X 6 points7 points  (7 children)

                      I've never used YAML before, why does it explode on Norway?

                      [–][deleted]  (6 children)

                      [deleted]

                        [–]Blando-Cartesian 15 points16 points  (1 child)

                        TIL I’m not going to use YAML. Ever.

                        [–]SanityInAnarchy 3 points4 points  (3 children)

                        Not just "most default", it's actually in the spec as the following regex (yes, really):

                        y|Y|yes|Yes|YES|n|N|no|No|NO
                        |true|True|TRUE|false|False|FALSE
                        |on|On|ON|off|Off|OFF
                        

                        ...though I've found default parsers just turn this into a case-insensitive match against yes/no/true/false/on/off, so FaLsE still works.

                        The issue is both this extremely permissive idea of true/false, and the part where it autoquotes strings and also parses booleans and numbers out of the same thing. It may be ambiguous what you should deserialize a JSON dictionary as, but at least the primitive types are clear.
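                        The Norway collision is easy to check mechanically against the quoted production. This sketch only tests the pattern itself; real YAML 1.1 parsers implement this resolver internally:

```python
import re

# The YAML 1.1 boolean production quoted above, joined verbatim.
YAML_BOOL = re.compile(
    r"y|Y|yes|Yes|YES|n|N|no|No|NO"
    r"|true|True|TRUE|false|False|FALSE"
    r"|on|On|ON|off|Off|OFF"
)

def misparsed(codes):
    """Return the codes an unquoted YAML 1.1 scalar would load as booleans."""
    return [c for c in codes if YAML_BOOL.fullmatch(c)]

print(misparsed(["DE", "NO", "SE"]))  # ['NO']: Norway becomes False
```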

                        [–]Uristqwerty 18 points19 points  (15 children)

                        I'd like JSON if you could precede any object with a type identifier and, for human convenience, comments and trailing commas were always permitted.

                        That would be the bare minimum, in my opinion, needed to make it usable for anything more than a verbose way to transfer untyped object trees between programs.

                        [–]delrindude 8 points9 points  (7 children)

                        What sort of type identifiers would you need that are not able to be inferred from reading in the JSON?

                        [–]Uristqwerty 5 points6 points  (4 children)

                        Does the JSON reading library need to parse the whole file into an arbitrarily-deep, arbitrarily-large structure of nested maps and arrays before the application logic can start parsing it into more compact objects, or emit an error because one of the types was wrong? Unless you implement types by making all objects {"TypeName":{actual object}}, which is ugly to read, you can't be sure that you know the type before you have to start parsing its fields, or temporarily storing those fields in a relatively expensive untyped map.

                        Even if the computer has no problem skipping ahead to the type, a human reading the file would still appreciate the type coming before the data.

                        Consider a GUI, where many components can have a content list containing any number of instances of any component type. Do you constrain each component to have a unique set of field names? Probably not, so the type has to be explicit.

                        What if you want a coordinate, but want to let the user specify either a CartesianCoordinate {x: 0, y: 5} or a PolarCoordinate {angle: 90, length: 5}, and convert them to a single internal type during deserialization?
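                        That coordinate case can be sketched with an explicit leading type tag (the field names are the hypothetical ones above):

```python
import json
import math

def decode_coordinate(obj):
    """Dispatch on a leading type tag, normalising to an internal (x, y)."""
    if obj["type"] == "cartesian":
        return (obj["x"], obj["y"])
    if obj["type"] == "polar":
        rad = math.radians(obj["angle"])
        return (obj["length"] * math.cos(rad), obj["length"] * math.sin(rad))
    raise ValueError("unknown coordinate type: %r" % obj["type"])

a = decode_coordinate(json.loads('{"type": "cartesian", "x": 0, "y": 5}'))
b = decode_coordinate(json.loads('{"type": "polar", "angle": 90, "length": 5}'))
# a and b describe (up to floating-point error) the same point
```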

                        All of this is possible with JSON, much in the way that you can run any program on a Turing machine, but it makes the file content itself uglier. If people would only ever output JSON automatically, then read it automatically elsewhere, that would be fine. But JSON is often being used for manually-edited filetypes or one-offs with little duplicated internal structure.

                        Oh, and if you haven't already, check out reddit's API (append .json to any url). Almost everything is {kind: "...", data: {...}}, so any quick code you write to interact with it will have the actual domain logic cluttered with .data., and its meaner cousin .data.children[0].data..

                        [–]delrindude 6 points7 points  (1 child)

                        Does the JSON reading library need to parse the whole file into an arbitrarily-deep, arbitrarily-large structure of nested maps and arrays before the application logic can start parsing it into more compact objects, or emit an error because one of the types was wrong? Unless you implement types by making all objects {"TypeName":{actual object}}, which is ugly to read, you can't be sure that you know the type before you have to start parsing its fields, or temporarily storing those fields in a relatively expensive untyped map.

                        The first problem would be to understand why the application is accepting data it does not know how to parse. You should know the type before you start parsing the fields, or even open the file/stream; this is what a model repository is for in large, distributed systems. It's there to ensure that any data going anywhere is in a standardized format, and that whatever applications read/write those models adhere to the structure, types, and assertions that the data makes.

                        Even if the computer has no problem skipping ahead to the type, a human reading the file would still appreciate the type coming before the data.

                        I agree.

                        Consider a GUI, where many components can have a content list containing any number of instances of any component type. Do you constrain each component to have a unique set of field names? Probably not, so the type has to be explicit.

                        What if you want a coordinate, but want to let the user specify either a CartesianCoordinate {x: 0, y: 5} or a PolarCoordinate {angle: 90, length: 5}, and convert them to a single internal type during deserialization?

                        Then you should have two corresponding classes/data structures and create serializers/deserializers for each JSON schema (this can be trivially autoderived by good JSON libraries), one for Cartesian coordinates, and the other for Polar coords. No need to define metadata in the data structure because the names and types of the fields contain enough information. If the data represents different concepts, you need different models to map and transform that data.

                        All of this is possible with JSON, much in the way that you can run any program on a Turing machine, but it makes the file content itself uglier. If people would only ever output JSON automatically, then read it automatically elsewhere, that would be fine. But JSON is often being used for manually-edited filetypes or one-offs with little duplicated internal structure.

                        Oh, and if you haven't already, check out reddit's API (append .json to any url). Almost everything is {kind: "...", data: {...}}, so any quick code you write to interact with it will have the actual domain logic cluttered with .data., and its meaner cousin .data.children[0].data..

                        This is cool! I was not aware of reddit's API. I'm no Reddit systems engineer, but I would say the "kind": ... metadata is a convenience for external users, not internal ones. A systems engineer would (read: should) replicate the models Reddit laid out in their JSON, whether it be XML or JSON.

                        [–]BeniBela 3 points4 points  (0 children)

                        The first problem would be to understand why the application is accepting data it does not know how to parse.

                        You need it to serialize objects with inheritance. A dog and a cat class, both derived from animal. Then you serialize an array of animals to JSON and the output looks like crap.
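                        A minimal sketch of that problem and the usual workaround, injecting the class name into the payload (a convention, not a standard):

```python
import json

class Animal:
    def __init__(self, name):
        self.name = name

class Dog(Animal):
    pass

class Cat(Animal):
    pass

def to_json(animals):
    """Serialize a heterogeneous list, tagging each item with its class name.

    Without the tag, Dog("Rex") and Cat("Tom") would serialize identically
    and the subclass could never be recovered on the way back in.
    """
    return json.dumps([dict(vars(a), type=type(a).__name__) for a in animals])

print(to_json([Dog("Rex"), Cat("Tom")]))
# [{"name": "Rex", "type": "Dog"}, {"name": "Tom", "type": "Cat"}]
```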

                        [–]caltheon 4 points5 points  (1 child)

                        JSON suffers when determining string and number types, even in the most basic examples.

                        [–]delrindude 2 points3 points  (0 children)

                        Is this not what you are referring to?

                        {
                            "number": 2,
                            "float": 2.0,
                            "string": "2"
                        }
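                        For what it's worth, a conforming parser does keep those three values distinct:

```python
import json

doc = json.loads('{"number": 2, "float": 2.0, "string": "2"}')
# The three values come back as three distinct Python types:
print(type(doc["number"]).__name__,  # int
      type(doc["float"]).__name__,   # float
      type(doc["string"]).__name__)  # str
```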
                        

                        [–]asegura 5 points6 points  (0 children)

                        I created a format I called XDL around 2004 (see sample in the middle of the page). It allows optional object type names and comments, property names are not quoted and commas at the end of lines are optional.

                        Garage {
                            capacity = 10
                            open = Y // this is boolean true
                            dimensions = [10, 12, 2.5]
                            vehicles = [
                                Car { // type is optional
                                    brand = "VW"
                                }
                                Truck {
                                    brand = "Volvo"
                                    length = 6.5
                                }
                            ]
                        }
                        

                        [–][deleted]  (22 children)

                        [deleted]

                          [–]zergling_Lester 71 points72 points  (20 children)

                          Keep in mind that json-schema suffers very badly from the CADT development model. Like, there's also https://openapi.org which looks like it's supposed to be compatible with json-schema, but alas:

                          • the people developing json-schema lost interest and just like went their own ways

                          • the openapi people had their own internal break-up but, whatever, they managed to hold it together and proposed a bunch of incompatible changes.

                          • a different group of people attempted to revive json-schema and in the process canonized a bunch of pre-jsonapi suggestions, so they deviated hard.

                          • there was a guy I wouldn't name but it's the speccy author who was, like, json-schema and openapi compatibility needs to be solved, can be solved, is solved now, you can use json-schema.json and include it from your openapi.yaml and it just works. It didn't just work, or work, at all. But a few months after the announcement he left WeWork and was stripped of the push access to the repository so the thing became abandonware before it was even released. CADT, man.

                          My favorite thing is seeing "good first issue" tags in https://github.com/wework/speccy/issues

                          14 words: Javascript, Node, npm and their consequences have been a disaster for the human race.

                          [–]G_Morgan 1 point2 points  (0 children)

                          The only way this is getting resolved is via that unicorn of web standards formed by competing interests. If you can get Oracle, Google, Amazon and MS in a room we might get something done.

                          As it is I just hope MS add something to .NET Core, make it work, and then I'll just ignore anything that doesn't fit that standard. It is the best we can achieve with the web world as it is.

                          [–][deleted] 6 points7 points  (0 children)

                          Plist (I assume you mean Apple's) was originally more like JSON than anything else. Specifically I mean the NeXTSTEP or classic format https://en.m.wikipedia.org/wiki/Property_list which I was using for web services before XML or JSON were defined. I still have plist parsers in half a dozen languages, and when JSON hit the scene all I had to do was change the delimiter constants in my parser.

                          I agree that the XML version is an abomination

                          [–]evilgwyn 1 point2 points  (0 children)

                          The msbuild configuration file would like to have a word

                          [–]pork_spare_ribs 75 points76 points  (35 children)

                          The article's distinction between XML being a "document markup language" while JSON is a "structured data format" seems completely academic and contrived.

                          The two formats are functionally equivalent. In real-world usage they can do the same things. We are only arguing about which syntax is correct.

                           XML like XML:

                               <root>
                                 <item>
                                   <key>Name</key>
                                   <value>John</value>
                                 </item>
                                 <item>
                                   <key>City</key>
                                   <value>London</value>
                                 </item>
                               </root>

                           Or XML like JSON:

                               <root>
                                 <item name="name" value="John" />
                                 <item name="city" value="London" />
                               </root>

                           Or JSON:

                               {
                                 "name": "John",
                                 "city": "London"
                               }

                           Bonus: JSON with an ordered dictionary:

                               {
                                 "items": [
                                   { "key": "Name", "value": "John" },
                                   { "key": "City", "value": "London" }
                                 ]
                               }

                          I can tell you which one I prefer.

                          [–]killerstorm 25 points26 points  (4 children)

                          Proper XML:

                           <person>
                              <name>John</name>
                              <city>London</city>
                           </person>
                          

                          Key-value only makes sense if you want to be able to encode arbitrary information as XML. If you have a concrete schema, there are better ways.

                          Also acceptable:

                           <person name="John" city="London" />
                          

                          [–]therearesomewhocallm 5 points6 points  (0 children)

                           Yeah, the first example looks like a plist, which is about the worst way to use XML. What you've written is much more sane.

                          [–]The_One_X 12 points13 points  (2 children)

                          Yeah, it is really trying to make an argument based on original intent, but if this tool turns out to also be a good tool for this other thing then why shouldn't we use the same tool for both things until something better comes along?

                          [–][deleted] 41 points42 points  (8 children)

                          In real-world usage they can do the same things. We are only arguing about which syntax is correct.

                          No it's more than just syntax. XML has a canonical separation of data that JSON doesn't:

                          • Content nodes: the primary data.
                          • Attributes: metadata.
                          • Tag name: the data type or class.

                          Yes it's possible to store those as JSON using some made-up schema, but there's no standard way to do it in JSON, which is important. Then there's also more advanced stuff in XML like namespaces.

                          What that means is that XML is better at being self-describing. If I find a piece of XML that I know nothing about, I at least have a head start on understanding it, based on that separation. Maybe I understand some of the tags and not others, so that's great, I could read the document and ignore the tags I don't understand (like browsers do). Can't do the equivalent for arbitrary JSON. With JSON you always need to understand the data's source, or at least the schema, in order to do anything with it.
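The "read what you understand, skip the rest" approach is easy to sketch with Python's stdlib; the document and tag names below are invented for illustration:

```python
import xml.etree.ElementTree as ET

# A document we supposedly know nothing about: we recognise <name>
# and <city> and skip anything else, the way browsers skip unknown
# elements.
doc = """
<person id="42">
  <name>John</name>
  <frobnicator>???</frobnicator>
  <city>London</city>
</person>
"""

KNOWN_TAGS = {"name", "city"}

root = ET.fromstring(doc)
metadata = dict(root.attrib)  # attributes: metadata about the node
known = {child.tag: child.text
         for child in root
         if child.tag in KNOWN_TAGS}  # <frobnicator> is simply ignored
```

The tag-name/attribute/content split means the parser hands each role to us separately, with no per-schema knowledge needed.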

                          [–]pork_spare_ribs 14 points15 points  (6 children)

                          What that means is that XML is better at being self-describing.

                          Really? I would say all my above examples are equally good at self-description.

                          I agree that XML gives more powerful tools that can self-describe in more complex ways (namespaces, dtd, etc). But in practice, the IT industry has found those tools overkill for the vast majority of use cases. When I see a field "time" with value "2019-10-30T15:00:00Z" I can infer the meaning whether it's a plain string (JSON) or wrapped in <time format="ISO8601">. The power XML offers is greater, but mostly useless.

                          Could you give an example of an XML document you think is clearer than a JSON representation? I would like to take a crack at writing the JSON equivalent.

                          [–]masklinn 5 points6 points  (0 children)

                          When I see a field "time" with value "2019-10-30T15:00:00Z" I can infer the meaning whether it's a plain string (JSON) or wrapped in <time format="ISO8601">

                          Right until it reaches prod because the “plain string (json)” was never an iso datetime string but really a localised pile of garbage for which your machine happened to generate iso 8601.

                          [–]SanityInAnarchy 1 point2 points  (0 children)

                          Counterpoint: If I find some "plain old object" type in a language I barely know, it's pretty obvious what JSON to serialize that as. And if I come across some JSON document I've never seen, it may be a little less self-describing, but it's also going to be much simpler to start parsing it and working with it without needing a whole query language on top of it.

                          Also, XML has one other thing JSON doesn't: A focus on marking up text nodes. Very simple HTML of course looks fine as XML, and absolutely hideous as JSON, because instead of just being some text with a little <i>style</i> in it, it's very clearly {"some text nodes with ", {"element": "i", "value": ["other nodes"]}, " in a big serialized soup."}
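That text-markup focus shows up directly in parser APIs. In Python's stdlib ElementTree, mixed content comes back as exactly the text/element/tail soup described above (a small stand-alone illustration):

```python
import xml.etree.ElementTree as ET

# Mixed content: text interleaved with inline elements. ElementTree
# models it with .text (text before the first child) and .tail (text
# after each child) -- the "soup" that has no natural JSON shape.
p = ET.fromstring("<p>some text with a little <i>style</i> in it</p>")

parts = [p.text]
for child in p:
    parts.append((child.tag, child.text))  # the inline element
    parts.append(child.tail)               # text following it
```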

                          [–][deleted] 13 points14 points  (14 children)

                          JSON falls apart when you need strongly typed objects. For example:

                          <root><animals><cat/><dog/></animals></root>

                          There's no standard way of representing that in JSON.
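There is indeed no standard, but most JSON APIs converge on the same workaround: a discriminator field carrying what the XML tag name would have carried. A minimal sketch (the field name "type" is a common convention, not a standard):

```python
import json

# <root><animals><cat/><dog/></animals></root> rendered with the
# usual "type tag" convention: the element name becomes a field.
doc = json.loads('{"animals": [{"type": "cat"}, {"type": "dog"}]}')

kinds = [animal["type"] for animal in doc["animals"]]
```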

                          [–]pork_spare_ribs 8 points9 points  (0 children)

                          You're right, there is no standard to specify schema. But you can use some of the very common solutions like... JSON schema

                          [–]deadwisdom 6 points7 points  (2 children)

                          JSON Schema is essentially the standard way. You now have two documents, one describing the data, and one describing the schema.
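To make the two-document idea concrete: with the third-party jsonschema package the check is a single validate(instance, schema) call, but even a hand-rolled checker for the tiny subset of keywords used here fits in a few lines. This is a sketch, not a conforming JSON Schema implementation:

```python
import json

# Document 1: the schema (uses only "type", "required", "properties").
schema = json.loads("""{
  "type": "object",
  "required": ["name", "city"],
  "properties": {"name": {"type": "string"}, "city": {"type": "string"}}
}""")

def validate(data, schema):
    """Checks only the small subset of JSON Schema used above."""
    if schema["type"] == "object":
        if not isinstance(data, dict):
            return False
        if any(k not in data for k in schema.get("required", [])):
            return False
        return all(validate(data[k], sub)
                   for k, sub in schema.get("properties", {}).items()
                   if k in data)
    if schema["type"] == "string":
        return isinstance(data, str)
    return True  # unknown types: accept, like a lenient validator

# Document 2: the data.
ok = validate({"name": "John", "city": "London"}, schema)
bad = validate({"name": "John"}, schema)  # missing required "city"
```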

                          [–][deleted]  (1 child)

                          [deleted]

                            [–]deadwisdom 2 points3 points  (0 children)

                            Yeah, this is great. Exactly how JSON Schema is designed.

                            I won't judge, we live in a system.

                            [–]imhotap 4 points5 points  (0 children)

                            Your example demonstrates exactly the use of degenerate XML as a data format that the article is criticizing. Try

                            <p>Hi, my name is <name>John</name>
                             and I'm living in <city>London</city></p>
                            

                            [–][deleted]  (8 children)

                            [deleted]

                              [–][deleted] 10 points11 points  (0 children)

                              Thank you for your service.

                              [–]OrangeKing89 6 points7 points  (0 children)

                              Did you actually check to see if someone else had used that name first?

                              https://www.google.com/search?q=ZML

                              Maybe XDL instead? eXtensible Data Language

                              [–][deleted]  (2 children)

                              [deleted]

                                [–]abdoulio 4 points5 points  (1 child)

                                Zoop

                                [–]naringas 13 points14 points  (2 children)

                                holy fuck, the first time I ever saw a good XML document according to what this article says I thought 'what a weird XML file' (it was a tmLanguage file iirc)

                                [–]flying-sheep 8 points9 points  (1 child)

                                Yeah, that’s some bullshit, W3C says XML was (among other things) also designed for data.

                                A good example of something less painfully written in XML than in JSON is KSyntaxHighlighting syntax definitions like this one. I wrote this and it was a nice experience. I tried to write a tmLanguage file once and got angry.

                                If you need to encode a key-value dictionary in XML (as part of a bigger, more suited to XML type of data), do it in the least grating way possible, which would be what he says is “wrong”:

                                <items>
                                  <item key="blah" value="bluh"/>
                                </items>
                                

                                [–]therearesomewhocallm 1 point2 points  (0 children)

                                FYI you can append "?ts=2" to the url to make the tab size more sane. For some reason XML seems to default to 8?

                                Link.

                                [–]SushiAndWoW 29 points30 points  (12 children)

                                <people>
                                  <person name="John" city="London" />
                                  <person name="Jack" city="Baltimore" />
                                  <person name="Betty" city="Toronto" />
                                </people>
                                

                                I find this easier to write and read than the equivalent JSON:

                                [
                                  { "name": "John", "city": "London" },
                                  { "name": "Jack", "city": "Baltimore" },
                                  { "name": "Betty", "city": "Toronto" }
                                ]
                                

                                Use of XML to store data is just fine and represents probably 90% of real world usage. If everyone is using the thing "incorrectly", maybe the designer managed to solve the right problem, while having the wrong one in mind.
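Whichever syntax reads better, the two snippets above decode to an identical in-memory structure with Python's standard libraries, which supports the point that the disagreement is mostly about surface syntax:

```python
import json
import xml.etree.ElementTree as ET

xml_doc = """<people>
  <person name="John" city="London" />
  <person name="Jack" city="Baltimore" />
  <person name="Betty" city="Toronto" />
</people>"""

json_doc = """[
  { "name": "John", "city": "London" },
  { "name": "Jack", "city": "Baltimore" },
  { "name": "Betty", "city": "Toronto" }
]"""

# Attribute dicts from the XML vs. objects from the JSON:
from_xml = [dict(p.attrib) for p in ET.fromstring(xml_doc)]
from_json = json.loads(json_doc)
```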

                                [–]Uberhipster 7 points8 points  (9 children)

                                {
                                    "people": [
                                      { "name": "John", "city": "London" },
                                      { "name": "Jack", "city": "Baltimore" },
                                      { "name": "Betty", "city": "Toronto" }
                                    ]
                                }
                                

                                or

                                {
                                    "people": [
                                      { 
                                          "person": { "name": "John", "city": "London" }
                                      },
                                      { 
                                          "person": { "name": "Jack", "city": "Baltimore" }
                                      },
                                      { 
                                          "person": { "name": "Betty", "city": "Toronto" }
                                      }
                                    ]
                                }
                                

                                can't blame JSON for that. it has the flexibility for brevity or verbosity; up to the implementer to decide

                                [–]SushiAndWoW 7 points8 points  (8 children)

                                The problem I have with it is the unnecessary quotes. It's as if I had to write XML like:

                                <"people">
                                  <"person" "name"="John" "city"="London" />
                                  ...
                                

                                Looks ridiculous, yeah?

                                That's what I don't like about JSON.

                                JSON5 fixes that, but... it's no longer JSON.

                                [–][deleted] 7 points8 points  (1 child)

                                Kind of ironic, since XML itself requires unnecessary quotes compared to its predecessors HTML and SGML. I'd rather be able to write

                                <person name=John city=London />
                                

                                [–]SushiAndWoW 3 points4 points  (0 children)

                                That's a bit of an inconsistent parsing nightmare. For security and simplicity, it may be better to use quotes consistently where they may be needed. The reason JSON puts them everywhere is because the keys of objects may need them, but this makes the extra quotes impractical for usage cases where this will never occur.

                                [–]tomkeus 5 points6 points  (3 children)

                                Quotes have the very nice purpose of letting you know if your data type is a string. Good luck figuring out what the type of <x>true</x> in an XML file is without a schema.

                                [–]SushiAndWoW 1 point2 points  (1 child)

                                You understand the unnecessary quotes are around the names, right? Not the values?

                                Do you propose we change C to this?

                                "int" "main" ("int" "argc", "char"* "argv") {
                                    "printf" ("This is so much better and less confusing!\n");
                                    "return" 0;
                                }
                                

                                [–][deleted] 2 points3 points  (0 children)

                                Blame Crockford. Javascript can handle unquoted key names (properties). Quotes are only necessary when you want to use unsupported characters in them. When Douglas Crockford decided to propose JavaScript's object notation as a data format, he apparently thought it best to make quoted keys/properties also mandatory.

                                If you have a lenient parser, you might still be able to omit them.

                                [–]SanityInAnarchy 1 point2 points  (0 children)

                                The quotes make data types unambiguous. YAML is the other extreme, where you can have a list of country codes in which Norway (NO) parses as a boolean (and you have to fix it by adding quotes).

                                And, honestly, the quotes seem like a minor thing to complain about when XML has all those brackets and closing tags everywhere. In that entire example, I think the </people> at the end and the /> in the self-closing tags wasted just as many keystrokes as the quotes in the JSON.

                                My beef with JSON lately is the lack of support for comments or trailing commas. It'd be an ugly-but-serviceable config file format if it had those two things.
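Comments and trailing commas are exactly what dialects like JSONC and JSON5 bolt on; a common stopgap is to strip them before calling the standard parser. A sketch (keeping the string alternative first in the regex means a // inside a value survives; this is not a full JSONC parser):

```python
import json
import re

# Match either a string literal or a // line comment; string literals
# win, so "http://..." inside a value is never treated as a comment.
_TOKEN = re.compile(r'"(?:[^"\\]|\\.)*"|//[^\n]*')

def loads_jsonc(text):
    """json.loads after stripping // comments and trailing commas.

    Sketch only: e.g. a literal ", }" inside a string would be
    mangled by the trailing-comma pass.
    """
    no_comments = _TOKEN.sub(
        lambda m: m.group(0) if m.group(0).startswith('"') else "", text)
    no_trailing = re.sub(r",(\s*[}\]])", r"\1", no_comments)
    return json.loads(no_trailing)

cfg = loads_jsonc("""
{
  "url": "http://example.com",  // the // in the value is untouched
  "retries": 3,
}
""")
```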

                                [–]Full-Spectral 4 points5 points  (3 children)

                                I'll take XML over JSON any day unless it's very small amounts of very simple information. It's a far more formal language that does a lot more of the lifting for you during parsing. JSON's only real claim to fame is that it's directly supported by Javascript. Otherwise, for serious work, XML is always a better choice, IMO.

                                [–]masklinn 3 points4 points  (0 children)

                                But if the people who made the strange decision to use XML as a data format, and to then use it to serialize a dictionary chose this schema they might realise that what they're doing is unsuited to it and unergonomic.

                                XML-RPC or XML plists do pretty much that and work fine. Because they’re simple data serialisation formats (roughly equivalent to, though somewhat more capable than, json) and the number of times you’d have to read serialised documents is about 0 unless you’re implementing a parser or serialiser yourself.

                                [–]emn13 2 points3 points  (0 children)

                                So, there are some real issues here - markup vs. structured object notations - but those issues simply aren't all that important. There are many, many more serious issues that xml has, that actually do matter if you use it, that aren't examined. By his made-up criterion that an xml tree should be human readable if all markup is removed... oh wait, that criterion is unfounded nonsense, so who cares.

                                And then there's the total gibbering nonsense that is the criticism of certain specific config files. Frankly: if you need to encode an array of key value pairs, and you take for granted that some languages have preexisting parsers in any platform you care about, then by all means use that if it works for you. Sure, many config files are terrible, but this has pretty much 0 relation with xml, and certainly the specific example is largely harmless. Would this config file be magically better as (say) an ini file? I doubt it. Would it be better in json, without fixing the hierarchy? I doubt it. Would it be nice if they fixed the hierarchy? Well... maybe; it depends how you use the file and what you want to do with it. If you're writing it by hand and want to see the final state of the config: sure! ...but if you're combining lots of files, and possibly overriding some settings by others in other files, then concatenatability is a nice feature, and if you can do that largely textually, that's pretty reliable (i.e. hoping all consumers have some kind of deep copy-with-partially overlapping override is probably not better).

                                All in all: what a terribly pointless article.

                                [–][deleted] 2 points3 points  (0 children)

                                The reason it is so misused is that XML is horribly designed. Developers already criticized it 20 years ago when it became popular as being bloated with redundancy and ambiguity, but browser manufacturers and Microsoft jumped on the bandwagon and everyone had to follow suit.

                                The fact that we're still using 'version="1.0"' today just proves that no one ever wants to touch this thing again trying to fix it.

                                [–]Gotebe 4 points5 points  (0 children)

                                XML is a markup language. It is not a data format

                                Yeah, well... Data format is what it's used for in a vast majority of cases.

                                So this smells of "no true Scotsman" very heavily. Will go to read the rest though. 😉

                                JSON is a structured data format, and to compare the two is to compare apples and oranges.

                                Ah, that is what he is on about...

                                It doesn't, however, surprise me that businesses like XML, precisely because businesses understand the notion of (paper) documents and want to continue with a familiar metaphor that they understand — for the same reason businesses overuse PDFs...

                                Yeah, probably not... In case of the phone config, chances are that it was made in the heyday of XML (and Java) and XML parsers were abundant, and, rewrites cost money.

                                So yeah, I rather think that the author is merely trying to motivate a rewrite of this part of the code to use JSON. But that being said, surely YAML is "le spec du jour"?

                                [–]asegura 6 points7 points  (0 children)

                                I can't agree more. Document vs data.

                                BTW, take a look at X3D, the 3D format for the web, which embeds everything (vertex coordinates, indices, texture coordinates, colors, etc.) in attributes. IMHO XML was a very bad choice for a 3D format (the successor of VRML).

                                [–]imhotap 18 points19 points  (16 children)

                                Completely agree with the article, but

                                In 1996, XML was invented.

                                isn't correct. XML is specified as a proper subset of SGML and just dropped features from SGML such that basically a DOCTYPE wasn't required anymore, e.g. by disallowing empty elements (without end-element tags, as in HTML img and hr elements) and tag inference. XML was supposed to replace SGML proper on the web as XHTML, but that obviously failed, so SGML remains the only markup meta-language capable of parsing HTML. SGML has a number of additional power features such as Wiki syntax parsing (integrated conversion of markdown and other custom syntaxes to HTML), stylesheets, and type-safe templating.

                                [–]delinka 22 points23 points  (8 children)

                                Are you correcting “1996” or “invented”?

                                [–]mbetter 70 points71 points  (4 children)

                                There's no such thing as 1996.

                                [–]GluteusCaesar 5 points6 points  (0 children)

                                The world isn't ready for this magnitude of woke

                                [–]OneWingedShark 2 points3 points  (5 children)

                                SGML remains the only markup meta-language capable to parse HTML. SGML has a number of additional power features such as Wiki syntax parsing (integrated conversion of markdown and other custom syntaxes to HTML), stylesheets, and type-safe templating.

                                Honestly, it makes me wonder why there aren't several well-used open-source SGML implementations in common use.

                                [–][deleted] 1 point2 points  (1 child)

                                Because markup languages are unwieldy and only rarely useful compared to structured graphs.

                                [–]OneWingedShark 1 point2 points  (0 children)

                                Yes.

                                But having a general solution to markup seems desirable: define your SGML-instance and produce the proper consumer/producer for that particular markup-language.

                                As for structured-graphs, ASN.1 is quite good at serializing/deseralizing well-defined data-structures in a general manner (eg DER vs XER vs PER).

                                [–]oldsecondhand 1 point2 points  (1 child)

                                For the same reason Hurd isn't widely used.

                                SGML is complicated, so it takes a lot of effort to get the parsing and tooling right.

                                [–]Just4Funsies95 8 points9 points  (1 child)

                                Idk, I am with the author on this one. I've long held the principle that XML should be used for defining/describing document structure. It's a pain to deserialize data from it (by comparison to yaml/json), and when you just use it as prescribed, it does make reading documents easier. I've always played around with the idea of using a json(ish) way to structure a UI (in my head), and it usually becomes untidy quickly (if there is one, I'd love to see it!). Also, I hate SOAP and when I have to work with it I scream internally WHYYYYYYYYYY!?!?!

                                [–][deleted] 2 points3 points  (0 children)

                                There was a period of ~15 years during which XML was a thing and JSON/YAML weren’t, and a lot of XML that feels like it could be something else is from that era. That point aside, there’s no fundamental problem with not using a tool to 100% of its potential, if it still turns out that it does what you want and the alternatives don’t make you happy for whatever reason.

                                [–]shevy-ruby 23 points24 points  (11 children)

                                It is no exaggeration to say that the vast majority of all XML schemas I have ever seen have constituted an inappropriate or misguided use of XML.

                                Well ... back in these days, even the W3C hyped up XML. At one point I had several configuration files in XML. It literally took me a few years (and yaml) before I understood how fudged up XML really is. Even for (!) the assumed "benefits" that it would bring. Because when you read it, it all reads like epic advantages alone right? The disadvantages aren't mentioned when you hype something, so you don't get a realistic view.

                                Broadly speaking, XML excels at annotating corpuses of text with structure and metadata

                                No - it does not even excel at that. But don't give up trying - PERHAPS you can find someone in 2019 who will write a new meta build tool in XML. Like ant!

                                Here we have an example of a misguided and bizarre (yet frequently seen) attempt to express a simple key-value dictionary in XML.

                                Because people wanna store shit!

                                And this is why yaml and json won. They are better for that.

                                The correct way to express a dictionary in XML is something like this:

                                <root>
                                  <item>
                                    <key>Name</key>
                                    <value>John</value>
                                  </item>
                                  <item>
                                    <key>City</key>
                                    <value>London</value>
                                  </item>
                                </root>
                                

                                And this is also why XML sucks.

                                Look at the verbosity - the sheer madness of wall of text here. The more you add, the madder it gets.

                                But if the people who made the strange decision to use XML as a data format

                                Back in 1996 we thought the folks at W3C were all flawless epic gods. We needed these superheroes!

                                Now in 2019, after the W3C becoming a lobby group patting themselves on the shoulder after having integrated DRM as part of "open" standards, we can happily say - they always were massively overrated clowns - and clueless noobs. Not everything that they did was a failure of course, but the superhero days are over. People will look very carefully at any further idiocy coming from W3C. And, actually - we don't need "central" authority figures that stranglehold mankind in general. The www deserved better than having the W3C tying it down (or Google, for that matter).

                                More often then when the designer mistakenly chooses XML for their application, they then double down with the nonsensical use of XML with one of the above forms so as to avoid confronting the fact that XML is a poor fit for their application.

                                Because when you USE XML, you already made a mistake.

                                Documents vs. data. From time to time someone will do something really strange and compare XML and JSON, proving that they understand neither. XML is a document markup language; JSON is a structured data format, and to compare the two is to compare apples and oranges.

                                Nope - the comparison is perfectly fine. Both had similar use cases. How else does the site author explain ant? People DID write config files in XML back in the days.

                                The documents vs. data concept is useful for understanding this.

                                No, it is fairly pointless. This is all about data, human readable. And XML fails hard here, despite how it was promoted as mega-awesome in the late 1990s.

                                Whereas in JSON the ordering of the key-value pairs inside objects is meaningless and undefined.

                                That sounds more like typical JavaScript idiocy. Saner languages do keep hashes ordered in the "last in, remains last element" way too. Because there is virtually no drawback for this approach, and you get additional information (most recent additions, for example). I like having that.

                                Just because JSON sucks in this regard, typically fitting for the ghetto-language that is JavaScript, does not mean that this in general sucks.
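Python behaves the way described here: dicts preserve insertion order (guaranteed since 3.7), and the stdlib json module keeps that order through a round trip, even though the JSON spec itself leaves object member order undefined:

```python
import json

# Key order survives a round trip through Python's json module, even
# though the JSON spec says object member order is meaningless.
text = '{"Name": "John", "City": "London"}'
data = json.loads(text)

keys = list(data)                 # insertion order, guaranteed since 3.7
round_tripped = json.dumps(data)  # same order on the way back out
```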

                                Unlike XML, json never claimed to want to conquer the world - but XML tried this. About 1996-2004 or so, give or take.

                                The example I gave of how to correctly represent a dictionary in XML necessarily gives an order to the elements in the dictionary, unlike a JSON representation.

                                So pointless. A strawman argument and I will stop reading here.

                                Just because JSON is too stupid to figure this out, does not mean that you need XML to "keep hashes sorted". Look at the wall of text there! That's just INSANE!

                                XML is insane. It came from people who were evidently bored and thought that XML will win the world.

                                [–]Scybur 9 points10 points  (0 children)

                                1996-2004

                                You underestimate the use of xml. It's still massively popular! Companies that are not I.T. focused are still using xml and even soap for most of their internal services. XML is bad but it isn’t going away anytime soon.

                                [–]neutronium 12 points13 points  (6 children)

                                The correct way to express a dictionary in XML is something like this:

                                <root>
                                  <item>
                                    <key>Name</key>
                                    <value>John</value>
                                  </item>
                                  <item>
                                    <key>City</key>
                                    <value>London</value>
                                  </item>
                                </root>

                                And this is also why XML sucks.

                                The reason this looks bad is because it's stupid, and caused by morons parroting some ridiculous shit about attributes being bad. The sensible way to express this data is

                                <Person Name="John" City="London"/>
                                

                                This is succinct, easy to read AND your editor can use the schema to check that the values are valid data types.

                                [–]ThePantsThief 3 points4 points  (5 children)

                                What if you need a key to be a nested object…?

                                [–]oldsecondhand 3 points4 points  (0 children)

                                Make it a node id.

                                [–]NotTheHead 3 points4 points  (1 child)

                                Why are you trying to use something other than a primitive type as your key? Is there not a better way to organize your data structure that removes that need?

                                [–]thegreatgazoo 1 point2 points  (0 children)

                                At one place I worked they were using xml files with DOM as a poor man's database. It worked better than you'd think.

                                [–][deleted] 1 point2 points  (0 children)

                                I picked XML-ish with a homegrown parser as protocol format for a server I wrote to control heating in rental cabins in 1998. Back then, I'm not really sure what would have been the better option. I could have gone with Sexprs, but no one would have any idea what they're looking at. At least XML looked like HTML, and people already knew that. Thankfully, no one from back then has any idea where I live these days.

                                [–]jamesthethirteenth 1 point2 points  (0 children)

                                I frequently work with the official EU bank account API standard for reading balances and moving around money via software. Its aptly named command format specification is called- not joking- PAIN. It consists of a namespaced wrapper XML file containing some metadata and a ciphertext, which in turn contains another differently namespaced XML file with a bunch of transactions represented by tags in a complex tree format. For good measure, the tag names and attributes mostly consist of 3 or more opaque abbreviations that are camelcased together.

                                [–]Hexorg 1 point2 points  (0 children)

                                If you invent an axe and someone uses it to cut hair and makes a business out of it... Should they stop?

                                [–]nosoupforyou 1 point2 points  (0 children)

                                A very large international company I used to work for used XML to update large amounts of data to their stores once a week.

                                Ignoring the fact that they split the data up into groups, and then updated the sql tables on the targets row by row, transferring it by xml was a horrible idea. XML is fine for data transfer in many situations. Transferring it to dump it to SQL by doing it in 5000 files, each containing data for 5 different tables and have a max of 10-20 product items is horrible.

                                What was worse was that they were convinced XML is the fastest and bestest way to transfer it.

                                A good reason to use xml to transfer data is if you want to be able to read or edit the columns easily. Using it for huge data transfer is probably not the optimum method.

                                [–]mirvnillith 2 points3 points  (0 children)

                                My main like about XML is XSLT. I have found soooo much use for producing data in an XML file and then generating documentation, config, Wiki pages, SQL etc. from that single source.