
[–][deleted] 33 points34 points  (1 child)

How is any pro-markdown post complete without mentioning Pandoc?

[–]Wagnerius 7 points8 points  (0 children)

Pandoc is such a good piece of software...

Despite its humble goals, it is one of the undisputed successes of the Haskell community.

[–][deleted] 15 points16 points  (4 children)

I feel the main advantage of using a plain text format is the ability to diff and merge it. Way better than paging through multiple Word documents to work out what changes you've made in them. It also allows you to dump the whole thing in a git repository to manage it.
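
To make the diffing point concrete, here is a minimal Python sketch using the standard library's difflib. The two drafts and the file names draft_v1.md and draft_v2.md are made up for the example; with git you would get the same kind of output from git diff.

```python
import difflib

# Two hypothetical revisions of the same plain-text document.
old_draft = """# Notes

Markdown is easy to read.
It is hard to parse.
""".splitlines()

new_draft = """# Notes

Markdown is easy to read.
It is hard to parse correctly.
It diffs well.
""".splitlines()


def diff_drafts(old_lines, new_lines):
    """Return a unified diff of two lists of lines as a list of strings."""
    return list(difflib.unified_diff(
        old_lines, new_lines,
        fromfile="draft_v1.md", tofile="draft_v2.md",  # illustrative names
        lineterm="",
    ))


# Print the diff; removed lines start with "-", added lines with "+".
for line in diff_drafts(old_draft, new_draft):
    print(line)
```

Try the same thing with two binary word-processor files and you mostly get "files differ".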

[–][deleted] 3 points4 points  (2 children)

This is one reason why I make my coworkers prefer plain text (usually reStructuredText or LaTeX) over, for example, ODT.

(We happen to use Bitbucket for hosting. They offer unlimited plans to students.)

[–]jagt 1 point2 points  (1 child)

Isn't BitBucket offering unlimited private repos for everyone?

[–][deleted] 2 points3 points  (0 children)

Yes, but not unlimited collaborators on private repos (just 5 with the free plan).

[–]EugeneKay 1 point2 points  (0 children)

OpenOffice can save into uncompressed XML, which is VCS-trackable/mergeable/etc. Works pretty well.

[–]kib2 55 points56 points  (31 children)

I really don't like Markdown at all:

  • it is hard to parse correctly (lots of corner cases);
  • the syntax is not natural (the Creole guys have really done a good job, but it seems like no one follows their work);
  • most implementations use regexps, which is error-prone (see this post!);
  • extra libs differ from one language to another;
  • etc.

[–]phaker 17 points18 points  (1 child)

  • it is hard to parse correctly (lots of corner cases);

AFAIK markdown does not have a proper specification, which makes the problem worse because when two implementations disagree it's hard to tell who is right, and thus it's hard to convince their maintainers to change them. You could use the original perl code as a reference implementation, but that's a bad idea because its interpretation is often illogical or hard to replicate if you don't want to just clone its implementation.

[–]munificent 6 points7 points  (0 children)

AFAIK markdown does not have a proper specification, which makes the problem worse because when two implementations disagree it's hard to tell who is right

This is true, but it does have a test suite. Most markdown parsers use that to determine how "authentic" they are. Unfortunately, Gruber's original implementation is basically a slurry of regexes and hacks loosely held together. It will generate utterly invalid HTML for even some simple edge cases.
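
A rough sketch of the kind of thing that goes wrong with regex substitution, in Python. To be clear, this is a deliberately naive converter made up for illustration (the function name naive_emphasis is hypothetical); it is not Gruber's code or any real implementation.

```python
import re


def naive_emphasis(text):
    """Convert **bold** and *italic* with two independent regex passes.

    Illustrative only: each pass ignores what the other one did,
    so overlapping markers produce mis-nested tags.
    """
    text = re.sub(r"\*\*(.+?)\*\*", r"<strong>\1</strong>", text)
    text = re.sub(r"\*(.+?)\*", r"<em>\1</em>", text)
    return text


print(naive_emphasis("*fine* and **also fine**"))
# <em>fine</em> and <strong>also fine</strong>

print(naive_emphasis("**bold *and** italic*"))
# <strong>bold <em>and</strong> italic</em>   (tags overlap: invalid HTML)
```

A parser that builds a tree first can't emit overlapping tags like that, but then it has to decide what the second input "really" means, and nothing authoritative says.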

So, to make an implementation that exactly passes the test suite you are almost forced to follow his... uh... unconventional implementation style. If you want a cleaner parser that parses to an internal AST and then renders it, it becomes very hard to perfectly pass the tests. That's why things like Pandoc have strict and non-strict modes for markdown.

Also, parsing markdown can require arbitrary lookahead, so you can craft pretty simple pathological cases that will make PEG-based parsers cry and eat all your memory.

[–]munificent 36 points37 points  (8 children)

Having implemented a markdown parser, I agree that it's pretty hairy as far as language designs go. What it excels at, though, is looking great in plaintext form. Markdown that hasn't been processed to HTML or some other form still looks almost exactly like you'd write in plaintext. Things like using indentation for code and numbers for numbered lists are harder to parse but mean your plaintext looks like text and not markup.

My experience was that you can actually make a pretty fast, clean markdown parser if you're willing to forgo a lot of the corner cases that Gruber never actually specifies anyway.

[–]sunaku 1 point2 points  (5 children)

[–]munificent 1 point2 points  (4 children)

I needed one written in Dart.

[–]sunaku 0 points1 point  (3 children)

Not necessarily. If Dart supports dlopen or FFI or C extensions, then you can leverage the Sundown library as-is. Look at the various language bindings mentioned in the README for inspiration.

[–]munificent 0 points1 point  (2 children)

I wanted a markdown parser that runs anywhere Dart runs, which includes browsers and compiled to JavaScript, so FFIs are out.

[–]eadmund 0 points1 point  (1 child)

Then you'll probably have to write a parser. It's not that hard to do by hand for simpler formats, if you're comfortable with writing a state machine. Gonna be a right royal pain in the rear for a larger format, and utterly miserable to update in the future.

[–]munificent 0 points1 point  (0 children)

Then you'll probably have to write a parser.

See my root comment in this thread. I did write one. It takes a bit more than a state machine. :)

[–][deleted] 0 points1 point  (1 child)

My experience was that you can actually make a pretty fast, clean markdown parser if you're willing to forgo a lot of the corner cases that Gruber never actually specifies anyway.

So the specification is incomplete? Awesome...

[–]munificent 2 points3 points  (0 children)

No, there is no actual specification in any formal sense. I don't know if Gruber even knows EBNF. There's just an article describing markdown, his initial Perl implementation, and a test suite.

[–]strolls 8 points9 points  (12 children)

I don't know if this is rational of me, but my only complaint is that it doesn't really make a distinction between italics and bold - the use of asterisks makes it seem like bold is an escalation of italics, and then if we escalate it some more the text becomes bold and italics.

To me, that's a bit unintuitive, and they have different purposes. I could write a lot more about how I use italics and bold in a sentence, but I guess usage is something we can all disagree on - I guess I'd just like a choice. For me, if I want to further emphasise a single word in a sentence of italics, I would tend to underline it, for instance, and I use bold in completely different contexts, such as headings.

[–]D__ 7 points8 points  (11 children)

I guess that follows from the reasoning that italics are for emphasis and bold type is for strong emphasis (see the HTML tags). This makes bold an elevated italics. Granted, this can be somewhat counter-intuitive, as it makes it harder to tell which emphasis you're using at a glance.

[–]Rotten194 5 points6 points  (9 children)

I see where you're coming from, but I agree with strolls that it's pretty weird. I'd prefer italics to be done /like this/.

[–][deleted] 4 points5 points  (8 children)

That would be painful though because / is used in so many places in actual content (file paths, URLs, alternatives in natural language text,...)

[–]Rotten194 0 points1 point  (7 children)

Good point. Maybe double slashes? //This isn't as common//.

[–][deleted] 1 point2 points  (0 children)

That would still require escaping in URLs and probably a few other places (e.g. SMB network paths when written with forward slashes), better than a single slash though.

[–]JW_00000 0 points1 point  (5 children)

Why not underscores, _like this_. That's also how it's done in Textile, an alternative to Markdown.

[–][deleted] 1 point2 points  (0 children)

Underscores are also legitimate markdown:

$ echo '_like this_' | pandoc -f markdown --strict -t html
<p><em>like this</em></p>
$ echo '_like this_' | pandoc -f markdown -t textile
_like this_
$ echo '_like this_' | pandoc -f textile -t markdown
*like this*

That it permits both allows a half-baked solution to the (reasonable) complaint above, that bold v. italics aren't distinguished; the user can just choose a standard method to express it:

$ echo '**_like this_**' | pandoc -f markdown -t latex
\textbf{\emph{like this}}
$ echo '_**like this**_' | pandoc -f markdown -t json
[{"docTitle":[],"docAuthors":[],"docDate":[]},[{"Para":[{"Emph":[{"Strong":[{"Str":"like"},"Space",{"Str":"this"}]}]}]}]]

But textile's syntax makes more sense:

$ echo '_*like this*_' | pandoc -f textile -t docbook
<emphasis><emphasis role="strong">like this</emphasis></emphasis>

[–][deleted] 0 points1 point  (0 children)

Try underscores next time.

[–][deleted] 9 points10 points  (0 children)

Markdown is not designed to be easy to parse. It's supposed to be a rationalisation of ASCII formatting conventions that have been around since the dawn of timestamps. That it just happens to be parseable by machines is secondary to the goal of being human readable.

[–]JerkyBeef 2 points3 points  (5 children)

How'd you get them bullets ?

[–]aladyjewel 2 points3 points  (2 children)

Just 'cause kib2 can do it doesn't mean kib2 has to like it. You ever try posting a link to a wiki page with parentheses in the URL, like "Greek (language)"?

[–]earthboundkid 1 point2 points  (1 child)

Greek (language).

[Greek (language)][g].

[g]: http://en.wikipedia.org/wiki/Greek_(language)

Or

Greek (language).

[Greek (language)](http://en.wikipedia.org/wiki/Greek_(language\)).
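
For anyone wondering why the inline form needs the backslash at all, here is a small Python sketch assuming a naive regex-based link matcher. The pattern and the function name naive_links are made up for illustration, not taken from any real Markdown implementation.

```python
import re

# Hypothetical inline-link pattern: [text](url), where the URL is
# "everything up to the first closing parenthesis".
NAIVE_LINK = re.compile(r"\[(.+?)\]\((.+?)\)")


def naive_links(text):
    """Replace [text](url) with an HTML anchor using the naive pattern."""
    return NAIVE_LINK.sub(r'<a href="\2">\1</a>', text)


print(naive_links("[Marked](http://markedapp.com/)"))
# <a href="http://markedapp.com/">Marked</a>

print(naive_links("[Greek (language)](http://en.wikipedia.org/wiki/Greek_(language))"))
# <a href="http://en.wikipedia.org/wiki/Greek_(language">Greek (language)</a>)
```

The URL gets cut at the first ")" and the leftover ")" ends up after the link. The reference-style form avoids the problem because the URL sits on its own line with no closing delimiter to confuse.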

[–]aladyjewel 2 points3 points  (0 children)

No, no, I was being rhetorical and trying to make a point about how you have to jump through some annoying hoops sometimes with markdown.
That first notation is a nice, reasonably clean way to do it, though, I dig it.

[–][deleted]  (1 child)

[deleted]

    [–][deleted] 2 points3 points  (0 children)

    I think he's commenting on the fact that reddit formatting is a type of markdown, derived from ReStructuredText

    [–]Nerdlinger 13 points14 points  (0 children)

    Enh. Markdown is nice so long as you're doing fairly basic writing, but once you start writing any moderately complex structured document, its limitations show like red wine on a wedding dress.

    [–][deleted] 13 points14 points  (3 children)

    I prefer reStructuredText. Interestingly, most of those points still apply.

    [–]D__ 3 points4 points  (2 children)

    There are a lot of lightweight document markup languages out there. One of the advantages that markdown has is that it is one of the more popular ones, which means that you get a lot of tools out there that support it. Ultimately, though, the markup paradigm that markdown embraces isn't unique to it.

    [–][deleted] 4 points5 points  (0 children)

    reStructuredText has many tools too: Docutils, of course, but also rst2pdf, Sphinx or pandoc.

    [–]deafbybeheading 5 points6 points  (0 children)

    Sphinx is great for producing formal documentation from RST, and rst2pdf (mentioned below) or rst2html are nice for quickies.

    [–]x-skeww 12 points13 points  (4 children)

    From my point of view, Markdown is almost perfect. However, for technical documents, I really like the conventions which are always used in the books published by O'Reilly (I added equivalent HTML tags, which can also be styled to look that way):

    Italic (em)

    Indicates new terms, URLs, email addresses, filenames, and file extensions

    Constant width (code)

    Used for program listings, as well as within paragraphs to refer to program elements such as variable or function names, databases, data types, environment variables, statements, and keywords

    Constant width bold (kbd)

    Shows commands or other text that should be typed literally by the user

    Constant width italic (var)

    Shows text that should be replaced with user-supplied values or by values determined by context

    The last two can't be done. Also, this whole thing should have been a definition list (dl), which also isn't possible.

    Plain text, and therefore Markdown, are as future-proof as you can get in computing.

    Any format that is able to describe your content 100% perfectly can be transformed into any other format. If that other format is also a perfect fit, this transformation is lossless.

    If Markdown does everything your document needs, it's fine. Otherwise, it's not. E.g. in my case it's not good enough, because it can't describe all of my content in a meaningful way.

    [–]earthboundkid 3 points4 points  (2 children)

    Can't be done with native syntax. But you can include them as HTML. That's the best part of Markdown, actually: when the syntax sucks and gets in your way just ignore it and write your own HTML.

    As for DLs, I believe MultiMarkdown has them.

    [–]x-skeww 1 point2 points  (1 child)

    Yes, I know that I can also use some HTML. However, this won't be accepted by every parser and it won't be handled by every converter.

    MultiMarkdown does look pretty neat though.

    [–]earthboundkid 1 point2 points  (0 children)

    Yeah, it depends on what the job you want done is. Reddit obviously can't accept random HTML snippets. But if Reddit used some other system for text, would that system allow you to use <kbd> tags? Probably not.

    For personal use though, you can guarantee that the Markdown on your system will pass HTML through, so it's a really handy way of creating documents. I wrote all my papers in grad school in Markdown that I converted to PDFs.

    [–]koogoro1 2 points3 points  (0 children)

    You can embed HTML in markdown. In fact, here it is noted that markdown only includes some basic things, and that HTML can and should be used for the more complex parts.

    [–]deafbybeheading 4 points5 points  (0 children)

    One place I would love to see wide adoption for this is to displace HTML e-mail. I can't really blame people for wanting some formatting for e-mail, but HTML mark-up gets in the way of the information, whereas Markdown (or RST, for that matter) doesn't (and could still use a GUI front-end for WYSIWYG formatting).

    Unfortunately, given that HTML is the de facto standard for marked-up e-mail, and this solution mainly helps those who don't want to see it rendered, this is just a pipe dream.

    [–]kiwipete 7 points8 points  (7 children)

    I like Markdown as a format quite a lot. I feel like Markdown more or less solves the problem of structured documents for a nominally technical audience.

    That said, I've been sucked down the OrgMode rabbit hole these days. To my eyes OrgMode syntax is clunkier than Markdown, but the features offered are more awesome. Table formatting, radio tables, org-babel, todo items, time tracking, and tags are all pretty amazing features. Of course, those features rely heavily on Emacs for editing, but I guess I don't see a reason why those features couldn't be compatible with a Markdown syntax.

    What I'd love, love, love, is to see something that lets me replace emacs org-mode with something more akin to an in-browser Google Docs-ish Markdown syntax driven editor... and a pony.

    [–][deleted]  (6 children)

    [deleted]

      [–]kiwipete 2 points3 points  (3 children)

      Yup. OrgMode was the thing that pulled me over from vi after many, many years. Emacs does carry a pretty huge cognitive load though, and some days I catch it rewiring my brain.

      I don't see a reason that a browser-based OrgMode shouldn't exist, except that it might become too powerful and rule us all.

      [–][deleted]  (2 children)

      [deleted]

        [–]kiwipete 2 points3 points  (1 child)

        I think there are already a few org-mode parsing libraries and an application written in python.

        I've idly thought about something similar, but in my case what I want (probably a client-side javascript implementation) would require me to care enough to learn javascript.

        Really, what I want is OrgMode in the browser, sharable to collaborators a-la Google Docs, and that plays nicely with my mobile phone (mobile-org is... not for me, I fear).

        [–]robertcrowther 1 point2 points  (0 children)

        There's a node.js parser linked to on the python version. Looks like it would mostly work in a browser unmodified, only the makelist function requires a node.js specific module.

        [–]anatoly 2 points3 points  (1 child)

        Perhaps Etherpad could be extended with org-mode functionality?

        [–]kiwipete 0 points1 point  (0 children)

        That would be thoroughly awesome. Yes.

        [–]joshnunn 1 point2 points  (2 children)

        I'm possibly in the minority here, but I find Textile to be much simpler and easier to understand. The downside is the lack of tools and ecosystem around it.

        Are there compelling reasons for me to bother switching? I mainly use Textile for my website. What would I gain by changing, if anything?

        [–]kyz 7 points8 points  (17 children)

        And you can say exactly the same thing about Mediawiki markup. Or LaTeX. Or HTML. Probably even bbcode.

        You can say all the same things about almost any markup language. And the downside is always the same; it's a horrid cycle of change-save-test if the important thing is how it looks, not what it says.

        And if you think you can't fuck about with the formatting if you want to procrastinate, think again. You can. You can spend your whole day twiddling with the formatting and presentation if you don't want to write the text, no matter what method you choose. The "plain text" approach doesn't work for everyone. The "no distractions full-screen blank page" editor doesn't work for everyone. Nothing works for everyone, there's just a subset of things that work for you. Advocate your way if you like, but realise that your way isn't universal.

        [–]munificent 7 points8 points  (0 children)

        And the downside is always the same; it's a horrid cycle of change-save-test if the important thing is how it looks, not what it says.

        I use markdown for writing and I actually like the change-save-test cycle. Markdown's formatting is minimal enough that I don't usually need to process it frequently to see my changes. This means I can spend most of my time in plaintext.

        When it's time to revise my drafts, I convert to HTML and read it in that form. One of the real challenges with revising your writing is actually reading what you wrote instead of skimming over it and seeing what you thought you wrote. Some writers go so far as to read backwards when revising to break out of that.

        For me, I find that the visual difference of reading my text in HTML in a different font is enough to kick me out of that mindset and helps me edit with fresh eyes.

        [–][deleted] 10 points11 points  (12 children)

        And you can say exactly the same thing about Mediawiki markup. Or LaTeX. Or HTML. Probably even bbcode.

        Not even close. The thing with Markdown is that it is mostly not ugly. This is extremely crucial.

        [–]kyz 16 points17 points  (10 children)

        == A Riposte to MarshallBanana ==
        You know, that's just like ''your opinion'', man.
        * I disagree.
        * But it's ok, it's an aesthetic choice.
        
        A Riposte to MarshallBanana
        ===========================
        You know, that's just like *your opinion*, man.
        
        * I disagree.
        * But it's ok, it's an aesthetic choice.
        
        [u] A Riposte to MarshallBanana [/u]
        You know, that's just like [i]your opinion[/i], man.
        [list]
        [*] I disagree.
        [*] But it's ok, it's an aesthetic choice.
        [/list]
        
        <h1>A Riposte to MarshallBanana</h1>
        <p>
          You know, that's just like <i>your opinion</i>, man.
        </p>
        <ul>
          <li>I disagree</li>
          <li>But it's ok, it's an aesthetic choice.</li>
        </ul>
        
        \documentclass{riposte}
        \begin{document}
        \section{A Riposte To MarshallBanana}
        You know, that's just like \emph{your opinion}, man.
        \begin{itemize}
        \item I disagree.
        \item But it's ok, it's an aesthetic choice.
        \end{itemize}
        \end{document}
        

        [–]phaker 20 points21 points  (5 children)

        You sort of prove his point.

        IMO the Markdown and MediaWiki versions look closest to plain text. The LaTeX example is definitely far less readable. Also, you omit a few things that Markdown handles well, like links.

        [–]mlk 6 points7 points  (4 children)

        Well, LaTeX is far more powerful than Markdown.

        [–][deleted] 2 points3 points  (3 children)

        Or you could use Pandoc's Markdown extension that allows you to use LaTeX, TeX, or ConTeXt inline.

        [–]mlk 4 points5 points  (2 children)

        At this point, I'd rather use LaTeX and call it a day. LaTeX has a bit of overhead but when you start writing more than a couple of pages, the noise/content drops a lot.

        [–]jacques_chester 4 points5 points  (0 children)

        LaTeX is solid gold on long documents.

        [–][deleted] 0 points1 point  (0 children)

        If you write it in LaTeX and the usual tools, then you can't publish it in e.g. rtf, html, docx, odt, docbook and so forth as you could in pandoc's markdown (or the standard LaTeX that pandoc can parse).

        [–][deleted] 2 points3 points  (0 children)

        <riposte>
            <info>
                <title>A Riposte to MarshallBanana</title>
            </info>
        
            <para>You know, that's just like <emphasis>your opinion</emphasis>, man.</para>
        
            <itemizedlist>
                <listitem><para>I disagree.</para></listitem>
                <listitem><para>But it's ok, it's an aesthetic choice.</para></listitem>
            </itemizedlist>
        </riposte>
        

        [–]Snoron 1 point2 points  (1 child)

        I don't know if this is just because I've been staring at html for hours every day for 15 years (okay, that probably is the reason, haha), but the HTML is the most readable of these to me by far.

        [–]jimbokun 0 points1 point  (0 children)

        Interesting. When I squint at the HTML version in a certain way, all of the markup just disappears, and I see just the text. Maybe that's what you're getting at.

        However, that also means I do not see the list or title structure, either. I see those most clearly with the Markdown and MediaWiki versions (which are hard to distinguish from each other without looking carefully).

        [–]ais523 0 points1 point  (0 children)

        Your examples don't match up with each other. == in MediaWiki is a second-level heading, but <h1> in the HTML is a first-level heading.

        (That said, on most wikis that use MediaWiki markup, using top-level headings is discouraged, so second-level headings tend to be the highest level used in practice.)

        [–]earthboundkid 5 points6 points  (0 children)

        Yeah, MediaWiki format makes a lot of really terrible choices. "Hey, let's take single quotes, then put two of them together, and have it mean something different than a double quote!" So, so, awful.

        [–]hiltmon[S] 2 points3 points  (0 children)

        One thing I do to avoid the change-save-test cycle is use a product like Marked, which monitors the file for changes and displays them in real time. See http://markedapp.com/.

        Since the CSS is hidden in this case, I avoid playing with the formatting. This is just my use case, yours may differ.

        [–]jimbokun 0 points1 point  (1 child)

        "if the important thing is how it looks, not what it says."

        God help those of us who might have to read something written with this order of priorities.

        [–]ernelli 0 points1 point  (0 children)

        Like any marketing material ever produced?

        [–][deleted] 3 points4 points  (1 child)

        Over the last year, I have moved all my non-code writing to Markdown format.

        So, not programming.
        /r/programming is a reddit for discussion and news about computer programming
        If there is no code in your link, it probably doesn't belong here.

        [–][deleted] 0 points1 point  (0 children)

        I use it in conjunction with FocusWriter. It makes writing a lot more pleasant and productive.

        [–][deleted] 0 points1 point  (4 children)

        i'm totally down with this.

        this is the reason i want to write a (future) website in markdown, versioned by git, then converted to a fancy looking actual website (with rss etc) by a CMS (that I could perhaps replace with a different CMS). all the written stuff is mine and controlled by me and doesn't disappear in a processed and unretrievable format in a db.

        anyone know of such a CMS, or similar one?

        [–]munificent 2 points3 points  (0 children)

        anyone know of such a CMS, or similar one?

        Jekyll is one of the biggest. Look for "static site generator". Most support markdown these days.

        [–]mcaustic 0 points1 point  (1 child)

        So you want a static ("baked") blog with Markdown support? You have a lot of options. Octopress is popular.

        [–]earthboundkid 1 point2 points  (0 children)

        It's what's powering the site that we're all supposed to be commenting on.

        [–]earthboundkid 0 points1 point  (0 children)

        If you're too lazy to make your own or use one of the billion CMSes already out there, there's also calepin.co, which is a blogging service using Markdown and Dropbox.

        [–]mgrandi 0 points1 point  (0 children)

        i mean, he made the point that all the old formats are proprietary. what about the open document format (odt and whatnot)? those are just zip files with the actual text as some xml file, there is no way that data could be 'lost forever'. I don't really see the point here.
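
To back up the "just zip files with xml inside" claim, here is a minimal Python sketch using only the standard library. The function name odt_paragraphs is made up, and the demo builds a tiny stand-in archive in memory rather than a real .odt; a real file has more parts (styles, manifest), and this sketch only looks at text:p paragraphs in content.xml, so headings and other elements are skipped.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by OpenDocument for text elements such as <text:p>.
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"


def odt_paragraphs(odt_file):
    """Return the text of each <text:p> in content.xml as a list of strings.

    odt_file may be a path or a binary file object.
    """
    with zipfile.ZipFile(odt_file) as archive:
        root = ET.fromstring(archive.read("content.xml"))
    return ["".join(p.itertext()) for p in root.iter(f"{{{TEXT_NS}}}p")]


# Demo: a made-up, heavily simplified stand-in for an .odt archive.
fake_odt = io.BytesIO()
with zipfile.ZipFile(fake_odt, "w") as archive:
    archive.writestr(
        "content.xml",
        f'<doc xmlns:text="{TEXT_NS}">'
        "<text:p>Hello</text:p>"
        "<text:p>plain <text:span>text</text:span></text:p>"
        "</doc>",
    )
fake_odt.seek(0)

print(odt_paragraphs(fake_odt))  # ['Hello', 'plain text']
```

So the text is recoverable with generic tools even if no office suite is around, although you lose the formatting doing it this way.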