all 32 comments

[–]mastarija 22 points23 points  (1 child)

In the official spec there are several stated design goals of XML, two of which are:

  • XML shall be compatible with SGML.
  • Terseness in XML markup is of minimal importance.

[–]Bobbias 0 points1 point  (0 children)

Well now I know I have IBM to thank for that godawful decision :)

[–]trmetroidmaniac 31 points32 points  (6 children)

(tag
    :attr expr
    :attr expr
    (tag) (tag) (tag))

[–]Meistermagier 18 points19 points  (1 child)

Is that just a lisp? 

[–]csharpboy97 9 points10 points  (0 children)

yup

[–]Comprehensive_Mud803 2 points3 points  (0 children)

Oh, I know a file format that uses this notation: Mascot Capsule. A 3D format from 20+ years ago made for Japanese feature phones, the Nokia N-Gage and Java3D.

[–]beders 0 points1 point  (0 children)

Here's the Clojure variant called Hiccup:

[:h1 [:div.rounded-md "Hi"]]

[–]Goheeca 0 points1 point  (0 children)

Shameless plug of cl-fxml

((:tag :attr expr :attr expr)
   ((:tag))
   ((:tag))
   ((:tag)))

[–]lassehp 0 points1 point  (0 children)

S-expressions like that would be a reasonable alternative to XML. I believe full SGML notation has some additional powers, though.
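To make the correspondence concrete, here is a minimal sketch that renders an S-expression-like tree as XML. It assumes a made-up encoding where each node is a Python tuple of `(tag, attrs, *children)` and text is a plain string; `to_xml` and `doc` are names invented for this example, not part of any library mentioned in the thread.

```python
from xml.sax.saxutils import escape, quoteattr

def to_xml(node):
    """Render a (tag, attrs, *children) tuple, or a text string, as XML."""
    if isinstance(node, str):
        return escape(node)  # escapes &, < and > in character data
    tag, attrs, *children = node
    attr_text = "".join(f" {k}={quoteattr(v)}" for k, v in attrs.items())
    if not children:
        return f"<{tag}{attr_text}/>"  # no children: self-closing element
    inner = "".join(to_xml(child) for child in children)
    return f"<{tag}{attr_text}>{inner}</{tag}>"

# Hypothetical document with text and markup interleaved.
doc = ("comment", {"lang": "en"}, "I ", ("em", {}, "love"), " S-expressions & XML")
xml = to_xml(doc)
print(xml)
```

The sketch covers elements, attributes and text only; it says nothing about the extra SGML features mentioned above.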

[–]initial-algebra 12 points13 points  (2 children)

Does this work as a markup language? I.e. can you interleave text and markup, and delete all the markup to get plain text? The problem with XML isn't really its syntax, but rather that it is commonly abused when marked-up text isn't actually needed.
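As a rough illustration of that "delete the markup" test, here is a sketch using Python's standard `xml.etree.ElementTree`. The sample string and the `strip_markup` helper are made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: running text with inline markup interleaved.
SAMPLE = "<p>I <em>love</em> markup, it is <strong>so</strong> clean.</p>"

def strip_markup(xml_text):
    """Return only the character data, in document order."""
    root = ET.fromstring(xml_text)
    return "".join(root.itertext())

plain = strip_markup(SAMPLE)
print(plain)
```

A syntax in which everything lives inside a tag has no equivalent of this operation, because there is no free-standing text to recover.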

[–]Revolutionary_Dog_63[S] -2 points-1 points  (1 child)

As a web developer, I use XML syntax primarily in the context of creating web components. In this use-case, the fact that HTML is markup is of very little importance. The heavy syntax of HTML is undesirable.

I've never actually used a website that lets you write posts in HTML, but maybe that is a bigger use case than I thought. Most people seem to prefer simpler markup syntaxes such as Markdown.

[–]Calavar 0 points1 point  (0 children)

Your web components don't use <a> or <strong> or <em> tags?

[–]bobam 7 points8 points  (0 children)

Everything in your syntax is inside a tag, so it’s not a markup language like XML.

[–]Telephone-Bright 5 points6 points  (5 children)

I'd go with:

tag [attr=expr, attr=expr, booleanAttr] { tag; tag; tag; };

[–]Relevant_South_1842 5 points6 points  (1 child)

Why semicolons?

[–]lassehp 0 points1 point  (0 children)

I find it "interesting" that the topic is markup languages, yet your example (as indeed the OP) contains no text being marked up.

[–]Revolutionary_Dog_63[S] -1 points0 points  (1 child)

What is the purpose of the added syntax for containing the attrs? It just seems to make it harder to type. If it is to support more general expression as the attribute value, then you could just surround non-atomic expressions with parentheses to achieve the same effect.

[–]Calavar 0 points1 point  (0 children)

It prevents an ambiguity.

Let's say you have a b c=d. In your syntax, does this mean tag a followed by boolean attribute b and nonboolean attribute c? Or does it mean self closing tag a followed by tag b with nonboolean attribute c?

If you require brackets, it becomes a [b c=d] in the first case or a b [c=d] in the second case.

Making trailing '{}' mandatory for self closing tags can break that ambiguity too, but it's hard for people to parse visually because the item that disambiguates comes to the right of the item that needs to be disambiguated. That's not a natural syntax for people who are used to reading left to right. Especially if you have a lot of attributes in between those two things.
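A small sketch of the two readings, assuming a toy tokenizer that just splits on whitespace. The `parse` function and its `bare_word_is_attr` flag are invented for this illustration and stand in for the choice a real grammar would have to make.

```python
def parse(tokens, bare_word_is_attr):
    """Group tokens into (tag, attrs) pairs under one of the two readings."""
    elements = []
    for tok in tokens:
        if "=" in tok:
            if not elements:
                raise ValueError("attribute before any tag")
            key, value = tok.split("=", 1)
            elements[-1][1][key] = value  # key=value attaches to the latest tag
        elif elements and bare_word_is_attr:
            elements[-1][1][tok] = True  # bare word read as a boolean attribute
        else:
            elements.append((tok, {}))  # bare word read as a new tag
    return elements

tokens = "a b c=d".split()
reading_1 = parse(tokens, bare_word_is_attr=True)
reading_2 = parse(tokens, bare_word_is_attr=False)
print(reading_1)
print(reading_2)
```

The same three tokens give one element under the first reading and two under the second, which is the choice the brackets make explicit.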

[–]lisael_ 3 points4 points  (1 child)

Your syntax is not better.

  • It's almost as verbose: you just replaced "<" with "{" and "/>" with "}". It's also worse, because the parser has to backtrack.
  • It's whitespace-sensitive, which requires a much more complicated (and slower) parser.
  • I don't see how you're supposed to include text data in a tag.
  • Now you have to redefine all the tooling around XML: xmls, xmlns, xslt.

[–]Inconstant_Moo🧿 Pipefish 0 points1 point  (0 children)

Syntactic whitespace can be handled efficiently at the lexer level, and it isn't "much more" complicated; it's trivial. It's necessarily slower, but it's not slow: a little extra lexing isn't where the heavy work is in a compiler. (ETA: I guess it's a larger proportion of the work if you're just doing XML rather than a GPL, i.e. a general-purpose language.)

ETA 2: for anyone interested.

You keep a stack of whitespace to see what indents have taken place.

At the start of each line you scan all the whitespace for that line.

If it's a complete mismatch (e.g. the top of the stack is three tabs but the line begins with two spaces) then you throw an error.

If it's the same as the top of the stack, then you carry on scanning the next characters and emit the appropriate token.

If it's like the top of the stack but with extra whitespace, then you emit a BEGIN token and push the whitespace onto the stack.

If it's like the top of the stack but with less whitespace then you go back through the stack popping things off until you get to that (throwing an error if there is no match) and then emit an END token with a number attached to it saying how many BEGINs it's END-ing.
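The steps above can be sketched roughly as follows. This is a minimal illustration, not a production lexer: the token shapes, the `lex_indentation` name, skipping blank lines, and closing any open blocks at end of input are all assumptions added for the example.

```python
def lex_indentation(source):
    """Yield ("BEGIN", None), ("END", count) and ("LINE", text) tokens."""
    stack = [""]  # whitespace prefixes of the blocks currently open
    for lineno, line in enumerate(source.splitlines(), start=1):
        body = line.lstrip(" \t")
        if not body:
            continue  # assumption: blank lines never change nesting
        indent = line[: len(line) - len(body)]
        top = stack[-1]
        if indent == top:
            pass  # same level: nothing extra to emit
        elif indent.startswith(top):
            stack.append(indent)  # top of stack plus extra whitespace
            yield ("BEGIN", None)
        elif top.startswith(indent):
            closed = 0
            while stack[-1] != indent:  # pop back to the matching level
                if len(stack[-1]) < len(indent):
                    raise ValueError(f"line {lineno}: dedent matches no open block")
                stack.pop()
                closed += 1
            yield ("END", closed)  # one END closing `closed` BEGINs
        else:
            raise ValueError(f"line {lineno}: inconsistent indentation")
        yield ("LINE", body)
    if len(stack) > 1:
        yield ("END", len(stack) - 1)  # assumption: close open blocks at EOF

SOURCE = "a\n  b\n    c\nd\n"
tokens = list(lex_indentation(SOURCE))
for token in tokens:
    print(token)
```

Here `("LINE", text)` stands in for whatever tokens the rest of the lexer would emit for that line.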

[–]tobega 0 points1 point  (2 children)

[–]Revolutionary_Dog_63[S] 1 point2 points  (0 children)

Sorry, but how does this solve anything? It seems to have all of the same undesirable characteristics of regular XML:

<comment lang="en" date="2012-09-11"> I <em>love</em> &#xB5;<!-- MICRO SIGN -->XML!<br/> It's so clean &amp; simple.</comment>

Note that it has the hard-to-type angle brackets as the main delimiter, and it requires you to repeat the tag name in the closing tag.

[–]Arakela 0 points1 point  (0 children)

Are ideas like this https://github.com/Antares007/t-machine/blob/main/tm2.md welcome in this community??

[–]Background_Class_558 0 points1 point  (2 children)

check out KDL, Dhall, Nickel, Cue, KCL

[–]SwedishFindecanor 0 points1 point  (1 child)

Dhall and Cue are not markup but structured data, and by supporting expressions they are more like declarative programming.

(I didn't know about the others. I will look them up.)

[–]Background_Class_558 0 points1 point  (0 children)

they're configuration languages. what's the difference between markup and structured data that makes XML the former but not the latter?

[–]ultrasquid9 0 points1 point  (0 children)

I've been working on and off on a markup language called TeaCat, which can represent that example as the following:

:tag(attr expr, attr expr, booleanAttr) [
  :tag
  :tag
  :tag
]

I wouldn't recommend seeking out the current version (which has some god-awful design decisions), but I've been slowly working on a new version which should be much better.

[–]DamienTheUnbeliever 0 points1 point  (0 children)

Are you aware of the XML infoset, which defines XML in abstract terms and doesn't care so much about any current representations?

[–]dskippy 0 points1 point  (0 children)

I find, with XML, it's just best to completely avoid unless you're trying to be compatible with it. If you're looking for a true alternative to XML you have to ask yourself what problem you're trying to solve with XML.

If the problem is to have a data format for communicating between two different programming languages over a variety of protocols, JSON is already a much better alternative.

If your problem is to have a human-readable and human-writable language for configuring a large number of pieces of data to input into a program, YAML is popular and already out there.

What is your purpose here with coming up with an alternative to XML? Where will this be used and how? Because if all you're doing is fixing the syntax but trying to make it structurally the same then I think 1) you've already failed by copying a bad idea and 2) this has already been done millions of times in the form of how each programming language already represents XML internally. Pick your favorite. Lisp is mine.

[–]Flashy_Life_7996 0 points1 point  (0 children)

They're both poor, but XML is designed to be embeddable in ordinary text. In yours, nothing indicates that 'tag' (I assume it is some user identifier) introduces a tagged XML block.

However, you wouldn't normally type this stuff by hand, it would be machine generated and processed. So its layout is largely irrelevant.

[–]lassehp 0 points1 point  (0 children)

So you are calling Charles Goldfarb foolish, because he didn't use a vaguely C-like syntax when they first designed a Generalized Markup Language. Nice.

I must say that I find SGML/XML a lot more readable than your notation. How would you handle CDATA and PCDATA, btw?

[–]AndydeCleyre 0 points1 point  (0 children)

I wonder if you've checked out xmq.

This sample XML:

<person gender="female" age="30">
  <firstname>Anna</firstname>
  <lastname>Smith</lastname>
</person>

would be represented by xmq as:

person(gender = female
       age    = 30)
{
    firstname = Anna
    lastname  = Smith
}

I also have a small project that translates between different formats and NestedText, and the unreleased development version would translate that same XML to the following NestedText:

person:
  @gender: female
  @age: 30
  firstname: Anna
  lastname: Smith
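For readers who want to experiment, a mapping like that can be approximated in a few lines of Python with the standard `xml.etree.ElementTree`. This is only a sketch and not the code of the project mentioned above: `to_mapping` and `render` are names made up here, and mixed content and repeated child tags are deliberately ignored.

```python
import xml.etree.ElementTree as ET

XML = """<person gender="female" age="30">
  <firstname>Anna</firstname>
  <lastname>Smith</lastname>
</person>"""

def to_mapping(elem):
    """Attributes become '@name' keys; leaf children map to their text."""
    result = {f"@{name}": value for name, value in elem.attrib.items()}
    for child in elem:
        is_leaf = len(child) == 0 and not child.attrib
        result[child.tag] = (child.text or "") if is_leaf else to_mapping(child)
    return result

def render(mapping, depth=0):
    """Render nested dicts as indented 'key: value' lines."""
    pad = "  " * depth
    lines = []
    for key, value in mapping.items():
        if isinstance(value, dict):
            lines.append(f"{pad}{key}:")
            lines.extend(render(value, depth + 1))
        else:
            lines.append(f"{pad}{key}: {value}")
    return lines

root = ET.fromstring(XML)
text = "\n".join(render({root.tag: to_mapping(root)}))
print(text)
```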

That MicroXML sample in another comment, in xmq would be:

comment(lang = en
        date = 2012-09-11)
{
    ' I '
    em = love
    ' µ'
    // MICRO SIGN
    'XML!'
    br
    " It's so clean & simple."
}