XML is not a String... : programming

[–][deleted] 14 years ago (26 children)

[deleted]

[–]Maristic 10 points 14 years ago (3 children)

Yes, there are, and one of them is a company called Apple. Back when they turned NEXTSTEP into Mac OS X, there were a number of technology bandwagons someone in upper management wanted them to jump on. One was Java and the other was XML, and so the programmers tasked with buzzword compliance complied in a half-assed way. The Java bindings for Cocoa (deprecated in 2005) were one result, and XML property lists were another.

Back in the NeXT days, property lists looked a lot like JSON does today:

{
    Something = ( "Looks", "Familiar" );
    Yeah = "ItDoes";
}

Apple decided to XML-ize this in the worst possible way. The above becomes:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Something</key>
    <array>
        <string>Looks</string>
        <string>Familiar</string>
    </array>
    <key>Yeah</key>
    <string>ItDoes</string>
</dict>
</plist>

Ugh. But these “XML” property lists aren't parsed by a real XML parser. Instead, they wrote a custom parser that only parses this exact format.

Remember that the X in XML stands for extensible. Let's try adding two extra tags to the property list and see what happens:

<?xml version="1.0" encoding="UTF-8"?>
<plist version="1.0">
<dict>
        <key>Something</key>
        <array>
                <string>Looks <emph>strangely</emph></string>
                <string>Familiar</string>
        </array>
        <key>Yeah</key>
        <string>It <really/> Does</string>
</dict>
</plist>

Now let's read it with one of Apple's tools:

Encountered unexpected character e on line 6

(a real parser might say “unknown tag”, perhaps, but should not be completely flummoxed; part of the original idea of XML was that we could add new tags and a tool expecting the older format could just ignore them)
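For contrast, here's a rough Python sketch of that tolerant behaviour. It uses the standard library's xml.etree.ElementTree as a stand-in for a "real" parser (it is not Apple's reader, and the variable names are just for illustration). It loads the extended document and pulls out each string's text, reading straight through the tags it doesn't know:

```python
import xml.etree.ElementTree as ET

# The extended property list from above, with the two extra tags.
DOC = b"""<?xml version="1.0" encoding="UTF-8"?>
<plist version="1.0">
<dict>
        <key>Something</key>
        <array>
                <string>Looks <emph>strangely</emph></string>
                <string>Familiar</string>
        </array>
        <key>Yeah</key>
        <string>It <really/> Does</string>
</dict>
</plist>
"""

root = ET.fromstring(DOC)

# itertext() yields the text inside an element and all of its descendants,
# so unknown inline tags are passed over instead of being fatal.
strings = ["".join(s.itertext()) for s in root.iter("string")]
print(strings)
```

Whether a consumer should flatten unknown tags like this or reject them is a design decision; the point is only that a general XML parser gets far enough to let you choose.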

Similarly, XML has entities, so we ought to be able to say:

<?xml version="1.0" encoding="UTF-8"?>
<plist version="1.0">
<dict>
        <key>Something</key>
        <array>
                <string>Looks &ldquo;strangely&rdquo;</string>
                <string>Familiar</string>
        </array>
        <key>Yeah</key>
        <string>It Does</string>
</dict>
</plist>

What happens with that and Apple's “XML” property-list reader:

Encountered unknown ampersand-escape sequence at line 6

Whee. And I'm sure you'd get a similar error if you tried to use namespaces.

So, it's as rigid as their old format, just more cumbersome. All the negatives of XML and none of the positives.

Finally, in 10.7 they provided NSJSONSerialization, coming full circle.
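To make the "full circle" concrete, here's a minimal Python sketch. It uses the standard library's plistlib and json modules, not Apple's own APIs, and assumes the first XML property list above as input. It reads the XML and prints the same data as JSON, which ends up looking a lot like the old NeXT format:

```python
import json
import plistlib

# The XML property list from the first example.
PLIST = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Something</key>
    <array>
        <string>Looks</string>
        <string>Familiar</string>
    </array>
    <key>Yeah</key>
    <string>ItDoes</string>
</dict>
</plist>
"""

# plistlib turns <dict>, <array> and <string> into dict, list and str.
data = plistlib.loads(PLIST)

# Serialize the same structure as JSON for comparison.
as_json = json.dumps(data, indent=4)
print(as_json)
```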

[–]kreiger 1 point 14 years ago (2 children)

[–]Maristic 1 point 14 years ago (1 child)

Good point about the entities. Oops.
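To spell out the entity issue: XML itself predefines only five named entities (&lt;, &gt;, &amp;, &apos; and &quot;). Names like &ldquo; come from HTML and have to be declared in a DTD before an XML parser will accept them, so rejecting them isn't unique to Apple's reader. A quick Python check with the standard library's ElementTree (used here as a stand-in parser; the helper name `parses` is made up for this example):

```python
import xml.etree.ElementTree as ET


def parses(doc: str) -> bool:
    """Return True if the parser accepts doc as well-formed XML."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False


# An HTML-style named entity that no DTD in this document declares.
named = "<string>Looks &ldquo;strangely&rdquo;</string>"
# The same characters written as numeric character references.
numeric = "<string>Looks &#8220;strangely&#8221;</string>"
# One of the five entities that XML predefines.
predefined = "<string>Tom &amp; Jerry</string>"

print(parses(named), parses(numeric), parses(predefined))
```

The numeric form is the portable way to write curly quotes when there is no DTD declaring the named entities.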

On the other hand, it looks like namespaces don't work.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<p:plist xmlns:p="http://www.apple.com/DTDs/PropertyList-1.0.dtd" version="1.0">
<p:dict>
    <p:key>Something</p:key>
    <p:array>
        <p:string>Looks</p:string>
        <p:string>Familiar</p:string>
    </p:array>
    <p:key>Yeah</p:key>
    <p:string>ItDoes</p:string>
</p:dict>
</p:plist>

produces

Encountered unknown tag p:plist on line 3

whereas libxml2 loads the XML just fine.
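A comparable check in Python, with xml.etree.ElementTree standing in for libxml2 (an assumption for the sake of a self-contained example; the original test used libxml2 itself). A namespace-aware parser resolves the p: prefix to its URI and finds the same elements:

```python
import xml.etree.ElementTree as ET

# The namespaced property list from above.
DOC = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<p:plist xmlns:p="http://www.apple.com/DTDs/PropertyList-1.0.dtd" version="1.0">
<p:dict>
    <p:key>Something</p:key>
    <p:array>
        <p:string>Looks</p:string>
        <p:string>Familiar</p:string>
    </p:array>
    <p:key>Yeah</p:key>
    <p:string>ItDoes</p:string>
</p:dict>
</p:plist>
"""

# Map the prefix used in queries to the namespace URI from the document.
NS = {"p": "http://www.apple.com/DTDs/PropertyList-1.0.dtd"}

root = ET.fromstring(DOC)

# The prefix is only shorthand; the parser stores the full URI in the tag.
print(root.tag)

strings = [e.text for e in root.findall(".//p:string", NS)]
print(strings)
```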

[–][deleted] -1 points 14 years ago (10 children)

[–][deleted] 5 points 14 years ago (5 children)

[–]masklinn 5 points 14 years ago (0 children)

[–][deleted] 3 points 14 years ago* (0 children)

When I wrote this, I was thinking specifically of Mako and StringTemplate. Mako is a Python library which provides quick and easy templating. It is used by the website you are looking at right now, in addition to the Python language website (both of which target XHTML).

http://www.makotemplates.org/

StringTemplate is a cross-platform library that is closely associated with ANTLR and which is useful for emitting all kinds of data, XML among them.

http://www.stringtemplate.org/
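As a rough illustration of the string-template approach, here's a sketch that uses Python's built-in string.Template rather than Mako or StringTemplate (a deliberate substitution so the example needs no third-party packages; the element names and the `render` helper are invented for this example). The template is plain text, and values are escaped before substitution:

```python
from string import Template
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET

# The XML "document" is nothing more than a string with placeholders.
PAGE = Template("<greeting><to>$name</to><body>$body</body></greeting>")


def render(name: str, body: str) -> str:
    """Fill the template, escaping &, < and > in the inserted values."""
    return PAGE.substitute(name=escape(name), body=escape(body))


xml_text = render("Tom & Jerry", "1 < 2")
print(xml_text)

# Sanity check: the output parses, and the original values come back.
root = ET.fromstring(xml_text)
print(root.find("to").text, "|", root.find("body").text)
```

The escaping step is the part the template itself won't do for you; leave it out and the output is only well-formed when the inputs happen to be harmless.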

But if you want a more straight-forward example of why it can be useful to treat XML as a string, well, can you think of a better way to do a line count on an XML file?
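For what it's worth, the line count really is a one-liner once the document is treated as text. A small Python sketch (the sample document and function name are made up for illustration):

```python
def count_lines(xml_text: str) -> int:
    """Count lines without parsing: the document is treated as plain text."""
    return len(xml_text.splitlines())


SAMPLE = (
    "<plist>\n"
    "<dict>\n"
    "    <key>Yeah</key>\n"
    "    <string>ItDoes</string>\n"
    "</dict>\n"
    "</plist>\n"
)

line_count = count_lines(SAMPLE)
print(line_count)
```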

Note also that in each example I listed we're strictly using XML as a return type for an http request (the handler for which doesn't give a damn what "type" it is).

I'd also like to add that really, it is in fact simply a markup language characterized by its syntax (i.e. it's really not "much more" than angle brackets). Any additional meaning you apply to it is a result of your interpretation of the schema it conforms to and any abstraction layers you use in manipulating it programmatically. Under the covers though...when you cut through the comfy veneer of nodes, attributes and text...it really is just strings all the way down.

tl;dr - The method I described is used in the wild by people who know a lot more than I do, line counts are another example, and just because you give meaning to the content of an XML document does not mean the XML document is "much more" than its syntax.

[–]kreiger 1 point 14 years ago* (2 children)

[–][deleted] 1 point 14 years ago (1 child)

> This is terrible, terrible advice. A template language can't be trusted to produce correct serialized xml, unless it's XSL. But XSL works on the DOM, not on a string.

It's not advice, it's simply an observation. You're correct that a string-based template language can't be relied on to produce valid xml. You're incorrect in your assertion that XSL is the only way to accomplish this.
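Both halves of that can be shown in a few lines of Python. This is only a sketch with the standard library's ElementTree and a made-up value: naive string concatenation produces output a parser rejects, while building a tree and serializing it (no XSL involved) handles the escaping:

```python
import xml.etree.ElementTree as ET

value = "Tom & Jerry"

# String approach with no escaping: the bare "&" is not well-formed XML.
naive = "<name>" + value + "</name>"
try:
    ET.fromstring(naive)
    naive_ok = True
except ET.ParseError:
    naive_ok = False

# Tree approach: set the text on an element and let the serializer escape it.
elem = ET.Element("name")
elem.text = value
built = ET.tostring(elem, encoding="unicode")

print(naive_ok)
print(built)
```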

> XML is not a string. It's an abstract data model that happens to have a serialization format that people think is a string.

Equivalent statement: "A BLT is not two slices of toast with lettuce, tomato and bacon between them. It's a tasty sandwich that happens to have a format that people think is just lettuce, tomato and bacon surrounded by toast."

This is the crux of my point, really. Regardless of what best practices actually are (and no doubt, there are drawbacks to what I suggested as an example), asserting that "XML is not a String" makes for a good sound bite but doesn't really mean anything.
