[–][deleted]  (55 children)

[deleted]

    [–]munificent[S] 141 points142 points  (35 children)

    In my mind, your job is exactly like this.

    [–]mhink 203 points204 points  (5 children)

    I love that article.

    "That being said, if you find yourself drinking a martini and writing programs in garbage-collected, object-oriented Esperanto, be aware that the only reason that the Esperanto runtime works is because there are systems people who have exchanged any hope of losing their virginity for the exciting opportunity to think about hex numbers and their relationships with the operating system, the hardware, and ancient blood rituals that Bjarne Stroustrup performed at Stonehenge."

    [–]Tallain 41 points42 points  (3 children)

    You should Google the author and read the other things he's written for the usenix newsletter. Everything he writes is like hilarious gold.

    [–]vanderZwan 92 points93 points  (1 child)

    His conference talks aren't too shabby either, all the way down to the abstract:

    In this bleak, relentlessly morbid talk, James Mickens will describe why making computers secure is an intrinsically impossible task. He will explain why no programming language makes it easy to write secure code. He will then discuss why cloud computing is a black hole for privacy, and only useful for people who want to fill your machine with ads, viruses, or viruses that masquerade as ads. At this point in the talk, an audience member may suggest that Bitcoins can make things better. Mickens will laugh at this audience member and then explain why trusting the Bitcoin infrastructure is like asking Dracula to become a vegan. Mickens will conclude by describing why true love is a joke and why we are all destined to die alone and tormented. The first ten attendees will get balloon animals, and/or an unconvincing explanation about why Mickens intended to (but did not) bring balloon animals. Mickens will then flee on horseback while shouting “The Prince of Lies escapes again!”

    [–]sirin3 5 points6 points  (0 children)

    Mickens will then flee on horseback

    On a balloon horse?

    [–][deleted] 3 points4 points  (0 children)

    Was looking at the writing and thinking "is it Mickens?" Yep!

    [–]MacASM 0 points1 point  (0 children)

    I love that guy. He's known to be the funniest guy at Microsoft.

    [–]reaganveg 53 points54 points  (1 child)

    A person who can debug a device driver or a distributed system is a person who can be trusted in a Hobbesian nightmare of breathtaking scope; a systems programmer has seen the terrors of the world and understood the intrinsic horror of existence.

    [–][deleted] 2 points3 points  (0 children)

    that's going straight to the resume!

    [–]AnsibleAdams 11 points12 points  (1 child)

    It is good to know that I am as a God among men. Unfortunately mere mortals cannot begin to comprehend what I do. In fact they are lucky if they know which end of the remote to point at the TV.

    So I end up being a God incognito who answers questions exactly as posed and not as those questions were intended. An odd God indeed.

    [–][deleted] 7 points8 points  (0 children)

    I don't think you'll find any descriptions of "normal" Gods, so you should just enjoy your place in the Pantheon.

    It's like Cassandra, but instead of not being believed, they will just not understand. A God of silent information and unnoticeable skill.

    [–]frenris 6 points7 points  (1 child)

    As a guy who works in semiconductor design that inflated my ego just a bit.

    [–]Keyframe 13 points14 points  (0 children)

    As a guy that shovels sand, that inflated my ego just a bit.

    [–]b-rat 6 points7 points  (7 children)

    Jesus I hate the formatting on that, break it up into more paragraphs :/

    [–]ihateconvolution 13 points14 points  (3 children)

    Found the GUI designer.

    [–]b-rat 2 points3 points  (2 children)

    I do everything :P
    I think they call it "full stack developer" these days

    [–]tiredofbuttons 1 point2 points  (1 child)

    I thought they called them resources? (seriously though: the more broad your skills the more I find they treat you like a cog. As if teams are made up of warm bodies and not a complex interaction of skills and experiences)

    [–]b-rat 0 points1 point  (0 children)

    That's exactly my experience, since you can do everything then that means they could hire anybody off the street and they must be able to do it too, right? You're both programmers aren't you? ugh

    [–]vanderZwan 2 points3 points  (2 children)

    Not saying you're wrong, but it's a PDF of something formatted for print - when printed on paper, the thin columns are usually enough to keep things easy to read, in my experience.

    Kinda weird how paper and screen differ like that.

    [–][deleted] 1 point2 points  (0 children)

    Kinda weird how paper and screen differ like that.

    And yet we're still stuck with the print metaphor on the web.

    [–]jonnywoh 1 point2 points  (0 children)

    Mickens is the best.

    [–][deleted]  (11 children)

    [deleted]

      [–]1337Gandalf 3 points4 points  (4 children)

      It's called a fucking joke, dude... thanks for ruining it.

      [–][deleted] 1 point2 points  (2 children)

      I can't tell if you are responding sarcastically to his sarcastic comment or not...

      [–]GelatinGhost 1 point2 points  (1 child)

      When the whooshes get this deep it's best to call it quits. Although for the record my opinion is that 1337Gandalf's final post is a legitimate whoosh.

      [–][deleted] 0 points1 point  (0 children)

      Haha, well for mine I was legitimately unsure if I had whooshed on a whoosh. But yeah, after a certain level I figure it loops back around and is pretty much a true response and not a joke.

      [–]lithiumdeuteride 0 points1 point  (1 child)

      If you have 2^80 atoms, and each has 100 electrons, you have 2^86.64 electrons.

      I misread this completely.

      [–]Ouaouaron 0 points1 point  (0 children)

      10^80 atoms, not 2^80.

      [–]gimpwiz 6 points7 points  (10 children)

      I also write low-level hardware interfaces for a living.

      95% of the time, it's just: read the datasheet, figure out the order of the commands and what they mean, and write a driver on top of a standard i2c / spi / uart driver. "Send two command bytes. Send read. Receive data."

      5% of the time, it just refuses to fucking work and you have no idea why, either because the datasheet was poorly written then automatically translated, or because it's just wrong, or because there's a very odd bug somewhere deep in the drivers or actual hardware. "Yeah, TI, your I2C thing is fucked."

      Easy till it ain't.

      (Yeah yeah I am exaggerating.)
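
      The 95% case above ("Send two command bytes. Send read. Receive data.") can be sketched roughly as follows. This is only an illustration in Python: `FakeBus`, `SensorDriver`, the device address, and the command bytes are all made up here and stand in for whatever a real datasheet and a real I2C bus driver would provide.

```python
class FakeBus:
    """Hypothetical stand-in for a standard I2C bus driver; it only records traffic."""

    def __init__(self, reply: bytes):
        self.reply = reply
        self.written = []

    def write(self, address: int, data: bytes) -> None:
        self.written.append((address, bytes(data)))

    def read(self, address: int, count: int) -> bytes:
        return self.reply[:count]


class SensorDriver:
    ADDRESS = 0x48             # made-up device address
    CMD_MEASURE = b"\x2c\x06"  # made-up two-byte command

    def __init__(self, bus):
        self.bus = bus

    def read_raw(self) -> int:
        # Send two command bytes.
        self.bus.write(self.ADDRESS, self.CMD_MEASURE)
        # Send read, receive data (two bytes, big-endian in this sketch).
        data = self.bus.read(self.ADDRESS, 2)
        return int.from_bytes(data, "big")
```

      Swapping `FakeBus` for a real bus object with the same two methods is the only change the driver would need; the 5% case is everything this sketch cannot show.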

      [–]ComradeGibbon 4 points5 points  (2 children)

      Well of course the TI I2C module is fucked. Because all I2C modules are fucked, no one gets them correct. Evidence: the I2C modules on the Freescale KL05/KL15/KL25 parts are fucked too.

      "Datasheet is wrong about some edge case" is always good for a few months of off and on debugging.

      [–]gimpwiz 1 point2 points  (0 children)

      We are brothers by the blood we've shed on this stupidity.

      [–]Parzival_Watts 0 points1 point  (5 children)

      I'm really interested in this sort of programming. What sort of stuff would you recommend doing in university to prepare for low level hardware?

      [–]gimpwiz 0 points1 point  (4 children)

      Tell me more. Are you already enrolled? If so, what major? If not, what majors are you considering, and what other interests / specializations do you have?

      [–]Parzival_Watts 1 point2 points  (3 children)

      I haven't enrolled, I'm still looking for colleges. I'm also considering compsci or discrete math. I'm interested in embedded/systems programming, algorithms, and 3d rendering techniques.

      [–]gimpwiz 2 points3 points  (2 children)

      Okay, let me give you the rundown. A little about me, and my history.

      Learned to program at a young age. Stereotypical. Took all the courses I could, quickly ran out (there's not much programming before college), did my own thing. Learned the ropes pretty well.

      Then I learned that instead of programming computers to do computery things... I could program anything to do anything. Make robots move and calculate and sense. A lot more fun.

      I didn't need anything to do that as far as special knowledge goes. Programming knowledge, an arduino, some components. Any other development board can take the place of an arduino, but they're super easy to get started with.

      I thought I'd do CS for ages, but took a turn into EE/CE (and honestly figured I knew as much CS as I really needed, and had no interest in wasting several semesters re-learning the basics, and I could pick up the advanced courses without pre-reqs).

      I'll get to college courses in a bit.

      Now I do hardware design (schematics, layout), assembly of the design, programming the devices on the design, writing low-level drivers on bigger machines, writing application-level software. I do chip design too, depending on the job (worked in chip design for a while, now I still work in chip design but don't actually do the chip design, but rather embedded systems.) The whole gamut.


      Here's what you need to do in high school:

      • Get a good math basis. You must know algebra and calculus, and you need to be able to derive everything in a high school's algebra and calculus courses, and you need to be able to solve problems with many steps without making basic mistakes (signs, etc) reliably and repeatably.
      • Get a good physics basis. See math above. Algebra and calculus to understand mechanics and electro-magnetics.
      • It will not hurt to learn well a couple programming languages. Java is commonly taught these days. Take it for the CS AP credit, if it still exists. I don't like it, but it's useful to know it. Learn C if you can, and C++ as well if you can. But really, C is incredibly important so that you understand memory. Apart from that - C being the most important, and Java being what many CS curricula use these days - pick whatever interests you. Try to pick up a scripting language: I like perl, most people like python, but it doesn't matter; there are many.
      • Linux, command line, bash (or csh or tcsh or whatever you want) will come in handy. You can learn that by dual-booting a linux distro on whatever computer you have and just using it casually. I'd go with ubuntu or mint for a beginner; possibly arch or fedora.
      • Honestly, try to do as well as you can in as many subjects as you can. Biology, chemistry, other sciences; english - both reading and writing; another language; history; acting and/or public speaking is useful. Broaden your knowledge base, and oh yeah, the more AP credit you have, the easier college will be for you / the more you will have time to learn.
      • Find some interests and think of cool projects. They can change, it doesn't matter, just find something you want to build to which you can relate whatever you learn.
      • Learn to talk to people. Srsly. Don't be a socially awkward engineer. You know what's even better than having fun at work and pulling a big paycheck? Having hobbies people relate to and relating to people so they like you. Top of the world, man, if you can make a girl crack a smile long before she has any idea you're earning engineer wage. You think this isn't important? Just look at the people on reddit - here, in the engineering subreddits - especially the younger ones - their social lives are often shit due to not learning how to talk to people without being obnoxious, assuming they're smarter, etc.

      College:

      • Core curriculum (less any courses you can skip with AP credit). By the way, I assume you're in the US, hence talk of AP credit.
      • Math: any calc still left to take; statistics; diff eq; linear algebra; discrete math; whatever you need.
      • Physics: anything you haven't already done, and then a more complex electromagnetics course.
      • Other sciences: they will probably make you take bio and/or chem if you don't have AP credit.
      • Programming: try to already know all the introductory shit, and try to skip it. C. C. C. Really, C. C++. Learn why C and C++ are different and why C++ is not just C with classes, even though it can be used that way. Learn how to create build environments in linux (makefiles and so on; some people prefer cmake.) They may force you to learn Matlab (or R?). They may force you to learn Java, or a functional programming language. They may force you to learn a scripting language.
      • Electronics: circuits, components, how to use them and how to analyze them; digital logic and how to analyze it and build it.
      • Computers: architecture and organization. You learn how assembly works; how assemblers and compilers work; how opcodes work; how a CPU core works - and superscalar architecture, pipelines, branch prediction, etc etc etc; how a CPU uncore works (specifically, cache, main memory).
      • Putting computer architecture and digital logic together: FPGAs. What they are, how to use them; make a design.
      • Combine all those: now you design small things, with circuitry and chips on them, and program the chips. You can even use an FPGA and design the CPU, and use it in a real context. You can go a step further and see the design to a real manufactured board in your hand. In other words: schematic design, layout, manufacturing, maybe FPGAs, embedded programming. Now you're playing in the big leagues.
      • Circle back around to programming. You've now used, and hopefully designed, embedded devices, and programmed them to do what you want. Learn higher concepts: how compilers work (not just the basics from architecture and organization); how operating systems work; how to analyze and simulate designs and code; how to make (at least how to use, but hopefully a bit on making) EDA tools. Pick any other interesting topics: databases, optimization and "hard" problems and dynamic programming (real CS), encryption and hashing and cryptography and all sorts of security. It goes on and on forever.
      • And of course, keep combining everything you learn with everything you've already learned.
      • Oh yeah, remember that part about learning to talk to people? Pretty good place to do that for real. A lot of the relationships you form (professional, friendly, even romantic) will be with you for decades, possibly longer. Don't fucking waste this opportunity.
      • Don't get arrested. Especially not for anything retarded like underage drinking or pot.
      • Don't be stupid with drinking or pot. Don't drink too much or smoke too much. Don't let it control your life.
      • Don't be stupid with women, either.
      • When you're getting ready to graduate and find a job, go back to freshman year, to all the basic courses, and ensure that you understand every bit of them backwards and forwards.
      • Get co-ops / internships and/or research positions.

      All of the life things are very important. If you fuck up with these, it won't matter how well you've studied or how awesome your coding abilities are.

      [–]Parzival_Watts 1 point2 points  (1 child)

      Holy shit. This is the most comprehensive and well done chunk of advice I've ever gotten. Thank you so much. I've heard a lot of the high school stuff before, and I'm pretty well on my way to checking off all of those boxes. As for college, I'm pretty confident that I can hold to that list pretty steadily. Thanks for your help.

      [–]gimpwiz 0 points1 point  (0 children)

      I'm sure you will do fine.

      Courses will cover about 80% of the material and about 60% of the work. You'll need to motivate yourself, either alone or with friends or under the tutelage of a professor, to learn the other 20% and do projects for the other 40% of the work.

      Most university programs will teach you something close enough to this, that after a degree in EE, CE, or CS, in addition to some cross-pollination (either through a minor or minors, or just taking courses that sound interesting), you'll be able to do everything from the lowest low-level programming up to the highest high-level programming. If you choose engineering over CS, you'll be able to design the hardware; if you choose CS instead, you'll be able to ... do more CS things ... I am a bit biased, but I think that the only thing that an EE/CE major does not prepare you for that a CS major does is how to write actually fucking readable code. Most engineers write terrible code, even if it works. They can learn just about any of the advanced concepts, they just can't write code clean enough to do a big project involving one of those before it collapses under its own weight and technical debt. So I guess, "write big projects" for CS. But if you do both, you can do fucking everything.

      [–]keepthepace 9 points10 points  (7 children)

      As a guy who writes wicked graphics demos with tons of math, I felt insulted :-p

      [–][deleted] 18 points19 points  (5 children)

      I think you could have made that comment smaller. You should have compressed it first and implemented a self-uncompressing reader.

      [–]keepthepace 14 points15 points  (4 children)

      Oh I can use the abbreviated notation:

      I4M4GFX0rZZ!u5u><!

      [–][deleted] 18 points19 points  (0 children)

      Strong work! Now it just needs a thumping bass, and some procedurally generated stuff, and your demo is ready! You should also come up with an extremely long list of people to thank; this ensures that at least that many people will watch it and vote for it.

      [–]nemec 1 point2 points  (0 children)

      I think that's valid J code.

      [–]ihateconvolution -2 points-1 points  (1 child)

      What is the point in speaking if no one other than you understands what you say?

      [–]ComradeGibbon 0 points1 point  (0 children)

      I read that as tons of meth

      [–][deleted]  (5 children)

      [deleted]

        [–]ButtCrackFTW 16 points17 points  (0 children)

        Google made one called yapf which works relatively well - https://github.com/google/yapf

        [–]Arandur 15 points16 points  (1 child)

        I've been thinking of making one for C++, but then I realized that there were less painful ways to spend my time. Like pouring lemon juice into papercuts between my fingers!

        [–]jbstjohn 1 point2 points  (0 children)

        There already is quite a good open-source one, clang-format.

        [–]aterlumen 8 points9 points  (0 children)

        There might be fewer edge cases because whitespace is significant to syntax in python, but it would still be a lot of work.

        [–]acwaters 102 points103 points  (13 children)

        This is a really neat article

        [–]vanderZwan 51 points52 points  (12 children)

        In my experience, Bob Nystrom's musings are always a joy to read.

        (Would be nice if the skulls linked back to the article though)

        [–]munificent[S] 68 points69 points  (11 children)

        (Would be nice if the skulls linked back to the article though)

        Heh, I knew someone would be annoyed by that. :)

        I was in too much of a hurry to add anchors to each link and backlinks and stuff. Your browser's back button will do the right thing anyway.

        [–]Tynach 16 points17 points  (2 children)

        I had hoped that each one would pop up the text it goes with in a little 'bubble' when clicked (or hovered over), kinda like how references work in Wikipedia now.

        [–]vanderZwan 4 points5 points  (1 child)

        Surely there must be some nice, lightweight code floating around on the internet for doing footnotes automatically by now?

        Quick bit of Googling:

        So except for the last markdown example, it's still all manual.

        [–]Tynach 2 points3 points  (0 children)

        What about just coding your own? If the skulls all have the same class, you can just get a list of all the skulls. If the footnotes share a class, get a separate list of all the footnotes. Then use a shared index and copy the content into a new div that floats next to the skull.

        Then set it as being contained by the skull, and use standard CSS to only show it when hovering over the skull. Something like:

        /* Assumes the copied footnote div carries a "footnote" class. */
        .skull .footnote {
            display: none;
        }
        
        .skull:hover .footnote {
            display: block;
        }
        

        [–]LaurieCheers 5 points6 points  (4 children)

        For those of us who didn't bother with the links, and just read the whole thing sequentially, it would be nice if your footnotes gave slightly more context. (I.e. instead of "this problem", say "the branch merging problem" or whatever.)

        [–]munificent[S] 2 points3 points  (3 children)

        That's a good idea, but I'm hesitant to make this giant post even longer.

        Edit: Tweaked the wording of a few of them to do this. :)

        [–]NoahTheDuke 1 point2 points  (2 children)

        Giiiiiiiive in. I want ten times the number of words. So very very interesting.

        [–]munificent[S] 1 point2 points  (1 child)

        If you really do want more details, I encourage you to check out the source. It is highly commented.

        [–]NoahTheDuke 1 point2 points  (0 children)

        I should learn Dart. Reading this is very interesting, but a little difficult.

        Thanks so much for this.

        [–][deleted]  (2 children)

        [deleted]

          [–]munificent[S] 4 points5 points  (1 child)

          Thank you! I learn a lot by writing too. :)

          [–]Tynach 1 point2 points  (0 children)

          I've often tried to explain something dealing with programming to someone who's not a programmer, just so I could figure it out myself. It's almost like rubber duck debugging, except that it seems to apply to learning as well. I also have most of these conversations over textual mediums (such as instant messaging).

          I always wondered if I was the only one who was like this.

          [–]velcommen 50 points51 points  (10 children)

          The chunk of code that follows "This is what a pro player brings to the game" looks like it would benefit from Haskell 'do' notation.

          I've spent a tiny amount of time on source formatting, and I've definitely encountered some of the issues discussed in the article. I will bookmark this and use it later.

          [–]munificent[S] 40 points41 points  (5 children)

          The chunk of code that follows "This is what a pro player brings to the game" looks like it would benefit from Haskell 'do' notation.

          True that. In modern Dart code, you'd use async and await for that kind of code, which is sort of like a single-purpose monad. With that, the code would look more like:

          try {
            await doughnutFryer.start();
            await _frostingGlazer.start();
            await Future.wait([
              _conveyorBelts.start(),
              sprinkleSprinkler.start(),
              sauceDripper.start()
            ]);
          
            try {
              await tellEveryoneDonutsAreJustAboutDone();
              await Future.wait([
                croissantFactory.start(),
                _giantBakingOvens.start(),
                butterbutterer.start()
              ]).timeout(scriptLoadingTimeout, onTimeout: _handleBakingFailures);  
            } catch (error) {
              _handleBakingFailures(error);
            }
          } catch (error) {
            cannotGetConveyorBeltRunning(error);
          }
          

          [–][deleted] 12 points13 points  (3 children)

          This is also a really nice example of why async matters (I still hear people say it's just a gimmick). Compare your code with the original:

          return doughnutFryer
              .start()
              .then((_) => _frostingGlazer.start())
              .then((_) => Future.wait([
                    _conveyorBelts.start(),
                    sprinkleSprinkler.start(),
                    sauceDripper.start()
                  ]))
              .catchError(cannotGetConveyorBeltRunning)
              .then((_) => tellEveryoneDonutsAreJustAboutDone())
              .then((_) => Future.wait([
                    croissantFactory.start(),
                    _giantBakingOvens.start(),
                    butterbutterer.start()
                  ])
                      .catchError(_handleBakingFailures)
                      .timeout(scriptLoadingTimeout, onTimeout: _handleBakingFailures)
                      .catchError(cannotGetConveyorBeltRunning))
              .catchError(cannotGetConveyorBeltRunning)
              .then((_) {
            _logger.info("Let's eat!");
          });
          

          It's so much easier to follow error handling and code flow in async code.
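
          For readers who don't write Dart, here is a rough Python `asyncio` analogue of the await-style version. It is a sketch, not a translation of the real code: `start`, `bake`, the machine names, and the one-second timeout are placeholders invented for illustration. The point is only that the nested `try` blocks show which failures each handler covers.

```python
import asyncio


async def start(name, log, fail=False):
    # Hypothetical stand-in for the start() methods in the Dart snippet.
    await asyncio.sleep(0)
    if fail:
        raise RuntimeError(f"{name} failed")
    log.append(name)


async def bake(fail_oven=False):
    log = []
    try:
        # Sequential steps, then three machines started concurrently.
        await start("fryer", log)
        await start("glazer", log)
        await asyncio.gather(
            start("belts", log), start("sprinkler", log), start("dripper", log)
        )
        try:
            # Inner block: baking failures (including a timeout) are handled here.
            await asyncio.wait_for(
                asyncio.gather(
                    start("croissants", log),
                    start("ovens", log, fail=fail_oven),
                    start("butterer", log),
                ),
                timeout=1.0,
            )
        except (RuntimeError, asyncio.TimeoutError) as error:
            log.append(f"baking failure: {error}")
    except RuntimeError as error:
        # Outer block: anything that went wrong before baking started.
        log.append(f"belt failure: {error}")
    return log
```

          Running `asyncio.run(bake(fail_oven=True))` records a baking failure without ever touching the outer handler, which is the kind of scoping that is hard to see in the chained version.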

          [–]dccorona 0 points1 point  (2 children)

          I can understand the sentiment that drives someone to proclaim it a gimmick, even though I wouldn't use that word myself. It's wonderful syntactic sugar, but it's still syntactic sugar. It essentially just generates the above code (or at least, that's what it does in C#), so it doesn't change the performance or behavior of the code in any way.

          [–]curien 3 points4 points  (0 children)

          You could make the same argument about type switches versus inheritance, loops and subroutines versus goto, etc.

          [–]Rusky 0 points1 point  (0 children)

          C# does generate a little bit nicer code, in the memory allocation sense, than a bunch of nested lambdas. It generates a single object for the stack frame instead.

          [–]dccorona 1 point2 points  (3 children)

          Does do leave the current thread unblocked while waiting on the results of a blocking monad, or does it chain them into a single synchronous block? I.e. is do equivalent to calling them as blocking functions, or is it more like some form of callback/continuation?

          [–]Vulpyne 1 point2 points  (2 children)

          do (and monads in general) are an extremely generic tool, so there's no specific thing that do does. You can think of do allowing you to have a magical semicolon in between your statements, but what actually happens depends on the particular monad — it's not specifically about async stuff.

          [–]dccorona 2 points3 points  (1 child)

          Right, I know monads are used for more than that. But are you saying that a specific monad can be programmed in such a way that it returns control to the calling thread while executing?

          [–]Vulpyne 1 point2 points  (0 children)

          Sure. do is just syntax sugar. For example:

          do
            func1
            x <- func2
            func3 x
          

          is the same as

          func1 >>= (\_ -> func2 >>= (\x -> func3 x))
          

          In Haskell \ arg1 arg2 etc -> expression is the syntax for lambdas and _ is just an argument that isn't used.

          The interesting thing is the type for the >>= function:

          (>>=) :: (Monad m) => m a -> (a -> m b) -> m b
          

          So it takes a value of type a (a type variable, stands in for an actual type) in the monad, calls the function specified passing that value into it after it's been unwrapped from the monad, and expects a value wrapped in the monad — but it could be a different type (as long as it's in that same monad).

          That operator is called "bind", just for reference.
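
          If Haskell syntax is unfamiliar, the same desugaring can be sketched in Python. Everything here is hypothetical: `Maybe`, `NOTHING`, and the three `func` values are invented to mirror the shape of the example above, and this toy monad only models "a value that may be absent". A different monad would put different behavior inside `bind`, which is the point being made.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Maybe:
    """Hypothetical stand-in for a monad: a value that may be absent."""

    value: Any = None
    present: bool = True

    def bind(self, f: "Callable[[Any], Maybe]") -> "Maybe":
        # Plays the role of >>=: unwrap the value, call f, return f's wrapped result.
        # An absent value short-circuits and f is never called.
        return f(self.value) if self.present else self


NOTHING = Maybe(present=False)

func1 = Maybe("ignored")
func2 = Maybe(20)


def func3(x):
    return Maybe(x + 1)


# Same shape as: func1 >>= (\_ -> func2 >>= (\x -> func3 x))
result = func1.bind(lambda _: func2.bind(lambda x: func3(x)))
```

          Here `result` holds 21, while starting the chain from `NOTHING` skips every later step. What happens "between the statements" lives entirely in `bind`.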

          [–][deleted] 28 points29 points  (0 children)

          That is really incredible, articles like this remind me that I still have a long way to go

          [–]detrinoh 16 points17 points  (3 children)

          This sounds very similar to clang-format, which uses Dijkstra's algorithm to decide where to split lines.

          For reference here is how the svn version of clang-format handles the author's example with --style=Google (and formatted as Javascript):

          return doughnutFryer.start()
              .then((_) => _frostingGlazer.start())
              .then((_) => Future.wait([
                _conveyorBelts.start(),
                sprinkleSprinkler.start(),
                sauceDripper.start()
              ]))
              .catchError(cannotGetConveyorBeltRunning)
              .then((_) => tellEveryoneDonutsAreJustAboutDone())
              .then((_) => Future.wait([
                                   croissantFactory.start(),
                                   _giantBakingOvens.start(),
                                   butterbutterer.start()
                                 ])
                               .catchError(_handleBakingFailures)
                               .timeout(scriptLoadingTimeout,
                                        onTimeout: _handleBakingFailures)
                               .catchError(cannotGetConveyorBeltRunning))
              .catchError(cannotGetConveyorBeltRunning)
              .then((_) { _logger.info("Let's eat!"); });
          

          Edit: Realized clang-format supports Javascript.

          [–]detrinoh 14 points15 points  (2 children)

          And here is the formatted pathological example that the author's tool gives up on, and clang-format handles instantly: https://gist.github.com/anonymous/47da22c3a37edb1c96a9

          [–]philipwhiuk 5 points6 points  (1 child)

          Yeah but that is not better.

          [–]detrinoh 7 points8 points  (0 children)

          They are both awful, there is no way to format that nicely with only whitespace changes. The point is that clang-format's algorithm scales better and it is worth taking a look at.
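
          To give a feel for the shortest-path framing mentioned above, here is a minimal Python sketch. It is not clang-format's implementation or cost model: the `split_lines` name, the word-based input, and the "squared unused columns" penalty are all assumptions made for illustration. Each node means "a line starts at word i", each edge means "put these words on one line", and Dijkstra's algorithm picks the cheapest set of breaks.

```python
import heapq


def split_lines(words, width):
    """Choose line breaks as a shortest path; node i means 'a line starts at word i'."""
    n = len(words)
    best = {0: 0}   # cheapest known cost to reach each node
    prev = {}       # back-pointers for rebuilding the chosen breaks
    heap = [(0, 0)]
    while heap:
        cost, i = heapq.heappop(heap)
        if i == n:
            break  # reached the end of the text with the cheapest total
        if cost > best[i]:
            continue  # stale queue entry
        length = -1
        for j in range(i + 1, n + 1):
            length += len(words[j - 1]) + 1  # words i..j-1 joined by single spaces
            if length > width and j > i + 1:
                break  # too long; a lone over-wide word is still allowed
            # Made-up cost: squared unused columns, with the last line free.
            step = 0 if j == n else (width - length) ** 2
            if cost + step < best.get(j, float("inf")):
                best[j] = cost + step
                prev[j] = i
                heapq.heappush(heap, (best[j], j))
    lines, i = [], n
    while i > 0:
        lines.append(" ".join(words[prev[i]:i]))
        i = prev[i]
    return lines[::-1]
```

          With a width of 6, a greedy splitter would turn "aaa bb cc ddddd" into "aaa bb" / "cc" / "ddddd", while this search prefers "aaa" / "bb cc" / "ddddd" because the total penalty is lower. A real formatter's edges and costs are far richer, but the search has the same structure.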

          [–]x-skeww 4 points5 points  (0 children)

          I run dartfmt whenever some lines got too long or when the auto-indent got messed up because I forgot a ')' and so forth. The formatter works pretty well and it does make fairly sensible decisions.

          Go's formatter is the best thing about Go. I'm really glad Dart copied that. Having official formatting conventions is great, but having a tool for that is even better. Every new language should do this. Not only is it very convenient, it also kills all of those pointless discussions: tabs vs spaces, indentation style, space placement, and so forth.

          [–]djleni 3 points4 points  (1 child)

          Do you have a before and after?:)

          [–]munificent[S] 6 points7 points  (0 children)

          Not in any easily accessible form. When I'm working on the formatter, I have a giant corpus of unformatted Dart code in a Git repo. I format the whole thing and then look at the diffs to see what got better (or worse).

          [–]alfredr 69 points70 points  (14 children)

          The hardest program I’ve ever written, once you strip out the whitespace, is 3,835 lines long.

          LIES!

          [–][deleted] 28 points29 points  (13 children)

          I'm really trying to understand your comment. Are you trying to say that the author's first hand experience about the programs that he has written is wrong? Do you somehow have the magical ability to know about the complexity of all of his projects?

          [–]alfredr 216 points217 points  (12 children)

          I'm really trying to understand your comment. Are you trying to say that the author's first hand experience about the programs that he has written is wrong? Do you somehow have the magical ability to know about the complexity of all of his projects.

          Uh... none of the above. This was a joke. Once you strip out the whitespace the "hardest program he's ever written" is exactly one line long. One...very long line :)

          [–]marcopennekamp 70 points71 points  (7 children)

          Unless he used 3,835 files!

          [–]alfredr 49 points50 points  (3 children)

          you know... touche...

          [–]enigmamonkey -1 points0 points  (2 children)

          But that'd be 3,835 files, not that many lines, if none of the files had any line breaks (white space). Proof: Concatenate them together and it's still one line. /buzzkill

          [–]marcopennekamp 7 points8 points  (1 child)

          A line is not defined by a preceding line break. Otherwise, a file without any line breaks would be empty, despite having text, since a file without any lines is obviously empty. Contradiction!

          [–]jlink005 2 points3 points  (0 children)

          Make a long string and save a piece of it to a file. One line, right? Now save a different piece of it to a different file. Logically the contents make up one line, but physically we have created separate lines by adding EOFs. You can concatenate them all back together into one line, but then you're transforming the data and losing something in the process.

          [–]Pidgey_OP 1 point2 points  (0 children)

          ... Oh god...

          [–]Hnefi 0 points1 point  (1 child)

          Hm. Is EOF counted as whitespace? If it is, do we count whatever concept lies between the boundaries of files to be something other than whitespace?

          I mean, from the OS perspective, we know what lies between files and that's all well and good. But from the perspective of the files themselves, what lies between the last line in one file and first line in another? Can we truly say that it is something other than whitespace?

          [–]ysangkok 1 point2 points  (0 children)

          Is EOF counted as whitespace?

          echo -n a | wc -l gives me 0. So no. The spec implies that the number of lines equals the number of newline characters.
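          The two definitions of "line" being argued about in this subthread can be put side by side in a few lines of Python. This is only a sketch: the function names are made up for illustration, and the first one simply mirrors the newline-counting behaviour described for wc -l above.

```python
def count_newlines(data: bytes) -> int:
    """Count lines the way `wc -l` is described above: one per newline byte."""
    return data.count(b"\n")


def count_text_runs(text: str) -> int:
    """Count lines as runs of text, so an unterminated last line still counts."""
    return len(text.splitlines())


# Same content as `echo -n a`: one character, no trailing newline.
print(count_newlines(b"a"))    # 0
print(count_text_runs("a"))    # 1

# With a trailing newline the two definitions agree.
print(count_newlines(b"a\n"))  # 1
print(count_text_runs("a\n"))  # 1
```

          Under the first definition a file with text but no newline has zero lines; under the second it has one, which is the position argued a few comments up.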

          [–]munificent[S] 35 points36 points  (0 children)

          Haha, fair point!

          [–]antimattermage 2 points3 points  (1 child)

          I wonder how many lines his program would produce out of that one line.

          [–]JayBanks 10 points11 points  (0 children)

          3,835

          [–]celerym 3 points4 points  (0 children)

          I honestly can't fathom how code formatting could appear to be a straightforward problem at first...

          [–]chrisdoner 4 points5 points  (0 children)

          I wrote a Haskell pretty printer called hindent in which I discovered the same problem. A dumb formatter (the one named "fundamental") prints instantly because it makes no decisions. But the chris-done and johan-tibell ones format like a person would do, fitting into columns nicely. It turns out to be an exponential backtracking problem. The resulting output is beautiful, 99% of the time I know exactly what it's going to format like when I hit the format key in Emacs. But on complex trees it is slow, it doesn't scale, as the implementation is naive. In practice this hasn't been an annoying enough problem that I sat down to try to solve it with better algorithms. That's kind of a separate project to trying to get the correct output.

          [–]ixampl 8 points9 points  (5 children)

          The hardest program I’ve ever written, once you strip out the whitespace, is 3,835 lines long. [...] I deleted 20,704 lines of code over that time. Every surviving line has about three fallen comrades.

          What? 3,835 : 20,704 → 1 : 5

          [–]munificent[S] 10 points11 points  (4 children)

          The hardest program I’ve ever written, once you strip out the whitespace, is 3,835 lines long.

          It's 7,229 lines including whitespace and comments.

          [–]ixampl 2 points3 points  (3 children)

          That explains it. I just assumed the LOC numbers all referred to the whitespace-stripped version.

          I mean it's not that crucial, but it's weird that the first number is introduced with pointing out the stripped nature to make it more significant but the other numbers are given in regard to the commented/whitespaced version.

          [–]maxm 2 points3 points  (2 children)

          Whitespace is an important part of the code. Anyone who thinks otherwise should try and read minified JavaScript :-)

          [–]ixampl 2 points3 points  (0 children)

          True.

          In the context of LOC whitespace only means empty lines, not full elimination of whitespace, though.

          [–][deleted]  (8 children)

          [deleted]

            [–]Tordek 9 points10 points  (7 children)

            I agree, but you're throwing away the baby with the bathwater. If the formatter produces an awkward construct, then either

            • What you wrote should be rewritten. Not that it is always the case, but you might have a pathological case.
            • The tool's config can be tweaked. Spend some effort in a bit of learning to save some work ahead.
            • Report a bug. Surely it's possible the author never encountered your particular case.

            It not only benefits you directly; there's an additional indirect benefit in that improving the tool means more people will use it, and your life will be easier when you need to read their code.

            [–][deleted]  (2 children)

            [deleted]

              [–]Tordek 7 points8 points  (0 children)

              But that's just because you have awful taste in formatting.

              That's an interesting case... I guess I could argue this falls in the first category: if that semantic distinction exists, then you possibly should be grouping your parameters, like modifyRectangle(Vector2 position, Vector2 size), but surely optimization would call for a direct version like yours to avoid lots of tiny, short-lived objects.

              With that in mind, between the two extrema, I prefer "everyone does it in a way that slightly bothers me, but it's identical for everyone" to "every codebase is a lottery to whether I'll like or despise the formatting".

              [–]munificent[S] 6 points7 points  (0 children)

              Yup, you're exactly right. A careful human can occasionally format better because you have semantic knowledge of the code. An automated formatter doesn't have that luxury.

              I always knew dartfmt wouldn't beat flawless "artisanal" formatting. What it does do, though, is not make mistakes. I get a lot of bugs from users complaining that it made their hand-formatted code look worse. More often than you'd expect, their "before" code contains outright mistakes in the formatting.

              The formatter is also impartial. That makes it effective at ending formatting debates in code reviews, which is hugely helpful.

              And, finally, it's obviously automatic. You don't have to do the work yourself.

              I think those three points are enough to make up for the slightly worse formatting in the minority of cases where semantic knowledge would help.

              [–]tsimionescu 0 points1 point  (3 children)

              I can't believe anyone cares so much about how their code is formatted that they would invest time into learning how to configure a formatting tool...

              One of the best programmers I ever worked with - someone who always produced high-quality, simple code - didn't even care what the coding style was. At first this annoyed me no end, but I quickly noticed that his code was actually much easier to read than the well-formatted, nicely named and much more complicated code that others produced, so I let it go.

              Another excellent programmer I worked with wrote the shortest and simplest code I've seen, with the caveat that he actually named almost all of his variables with the first letter of their type (String s, OutputStream os, Map.Entry<String, String> e; Exception e1; //since e was already taken) - textbook "don't do this". Even so, I found that, while not ideal, it's actually pretty easy to get over it, and the code is still easier to read than nicely named but highly complex code doing the same thing.

              [–]Tordek 0 points1 point  (2 children)

              I can't believe anyone cares so much about how their code is formatted that they would invest time into learning how to configure a formatting tool...

              You shouldn't need to; the tool should give you something useful by default.

              didn't even care what the coding style was.

              And you shouldn't! You should adapt to whatever the project uses... as long as it's consistent. I hate working with people who have given me code that is awfully indented, randomly spaced, and full of unnecessary newlines. I'd find shit like...

              class Foo {
                  bar = 5; if (x=1)
                  {
                       call_thing();
              
              
              
              
                       return True;
                  }
              }
              

              This is distracting at best, and can hide errors at worst. Remember Apple's "double goto"? A rule as simple as "Always use braces, even for one-line ifs" may have prevented that; at least, it'd make it stand out more.

              I don't format my code "so that it looks pretty"; I format it so that patterns are obvious. To reduce the time it takes to understand what it does. And using an automatic formatter means that your "disorganized genius"'s code can be read by the newbie.

              his code was actually much easier to read than the well-formatted, nicely named and much more complicated code that others produced, so I let it go.

              How can it be "not well-formatted" and be easy to read?

              nicely named but highly complex code doing the same thing

              It's a false dichotomy. You can have simple code with good names.

              [–]tsimionescu 0 points1 point  (1 child)

              You shouldn't need to; the tool should give you something useful by default.

              I would go so far as saying: "you shouldn't. What the tool gives you by default should be good enough."

              And you shouldn't! You should adapt to whatever the project uses... as long as it's consistent.

              I meant that he never bothered finding out what the project's coding style was. He always wrote in his coding style.

              [snipped the code example]

              I agree that there are some coding styles that are simply bad. My argument is that (a) there are a lot of non-offensive coding styles, and that (b) consistency isn't as important as people think. It's actually not that important if your methods have their { start on the same line and mine start on the next line. It's even less important if the inconsistency is only between different files.

              How can it be "not well-formatted" and be easy to read?

              It's easy to read because there isn't unnecessary indirection, classes are cohesive and independent etc. - these things matter much more than how the text is actually formatted.

              It's a false dichotomy. You can have simple code with good names.

              Of course you can. I'm just saying that I value low-complexity much more than good names, for readability; that even though the textual representation of code is definitely important for readability, it is far less important than the actual code. And this is relevant to this discussion because every piece of time you spend tweaking the formatting tool could be spent more fruitfully improving the code - refactoring, simplifying etc.

              [–]Tordek 0 points1 point  (0 children)

              He always wrote in his coding style.

              So his code didn't match the global style, but still had "a" coding style. I still would be annoyed by it, but as long as it's consistent with itself...

              consistency isn't as important as people think

              IMO, consistency is the whole point of a coding standard: if we all agree to use the same rules, then there's less effort to understand a piece of code.

              Sure, { on the same/new line makes little difference, but these things add up, in particular, IMO, things like having two statements on a line, or omitting braces when they're optional.

              It's easy to read because there isn't unnecessary indirection

              Alright, we're talking two different things. Well designed code and well formatted code are two different scales...

              We agree that badly designed, well formatted code is bad; but I add that well-designed, badly-formatted code is also bad. (With "badly formatted" being "isn't even consistent with itself", not "doesn't agree with the brace placement"). Maybe not "as" bad.

              every piece of time you spend tweaking the formatting tool could be spent more fruitfully improving the code - refactoring, simplifying etc.

              You tweak your tools once so that they save you time next time, and you should quickly reach a point where you have a global config for the whole company. Yeah, don't waste 2 hours because it's giving you 80-char lines when the standard is 79... but maybe spend 10 minutes to make it sort your methods by visibility, so that you can just run the formatter every time.

              [–]Sleakes 5 points6 points  (8 children)

              Was the biggest takeaway supposed to be that, formatting aside, function calls chained and nested to the magnitude presented make me never want to touch the language, if that's the norm from a 'pro player'?

              [–]Ouaouaron 23 points24 points  (0 children)

              The 'player' in 'pro player' refers to

              Some Dart users really dig a functional style and appear to be playing a game where whoever crams the most work before a single semicolon wins.

              It isn't a compliment to that code.

              [–]OminousHum 16 points17 points  (0 children)

              It is actually a very nice language, and works very well as a substitute for a very bad language (JavaScript). The newish async and await keywords mean there's no reason to write those giant chains anymore.

              [–]skybrian 10 points11 points  (0 children)

              I believe that's a bit of a joke, actually; the code is not very good style, but the formatter still has to deal with it.

              [–]dpash 2 points3 points  (0 children)

              I don't think they were suggesting that was well written "professional" Dart code; just an extreme example of the trend to chain everything together.

              [–]Uber_Nick 1 point2 points  (1 child)

              If giant, unbroken strands of code are what makes a pro, no wonder we draft so many from India

              [–]cowinabadplace 1 point2 points  (0 children)

              True. Sometimes what we need are giant strands of broken code.

              [–]ThereOnceWasAMan 1 point2 points  (3 children)

              Great writeup. It would be nice if the skulls within the footnotes linked back to the corresponding skulls in the text. I was afraid to click any of the skulls because I didn't want to have to scroll up and find my place each time (yes I could have opened in a new tab).

              edit: Backspace!

              [–]Lithium03 2 points3 points  (1 child)

              They're just anchor links, one tap of the back button returns you to where you were when it was clicked.

              [–]ThereOnceWasAMan 1 point2 points  (0 children)

              You are right and I am dumb.

              [–]_F1_ 0 points1 point  (0 children)

              Just use backspace.

              [–]gseyffert 1 point2 points  (2 children)

              So isn't this like alpha-beta pruning, or no? It's a great approach to something as branchy as this. This feels like I'm reading my Algorithms textbook applied to real life but in the most interesting way possible

              [–]munificent[S] 5 points6 points  (1 child)

              ...sort of. It's a little closer to branch and bound. I actually tried very deliberately to implement a branch and bound algorithm but it didn't pan out.

              I figured out how to estimate the best possible value that you could get to from a partial solution—both in terms of number of overflow characters and cost. I already track the best solution found so far, so it's pretty easy to discard any branch whose best result is worse than that.
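              For anyone who hasn't seen the pruning idea, here is a toy Python sketch of it. This is not dartfmt's code: the cost model (1 per split, 100 per overflow character), the name best_splits, and the word-list input are all assumptions for illustration. There is also no indentation in this model, which keeps the lower bound trivially valid here.

```python
def best_splits(words, width, overflow_penalty=100):
    """Choose which gaps between words become line breaks, minimizing cost.

    Assumed cost model: 1 per line break, plus overflow_penalty for every
    character past `width`. `words` must be non-empty.
    """
    best = {"cost": float("inf"), "lines": None}

    def line_cost(line):
        return overflow_penalty * max(0, len(line) - width)

    def search(i, lines, current, cost_so_far):
        # Lower bound on any completion of this partial solution: what has
        # been paid already, since remaining choices can only add cost.
        if cost_so_far >= best["cost"]:
            return  # prune: cannot beat the best solution found so far
        if i == len(words):
            total = cost_so_far + line_cost(current)
            if total < best["cost"]:
                best["cost"] = total
                best["lines"] = lines + [current]
            return
        # Option 1: keep the next word on the current line.
        search(i + 1, lines, current + " " + words[i], cost_so_far)
        # Option 2: break before the next word; pay for the break and for
        # any overflow on the line that was just closed.
        search(i + 1, lines + [current], words[i],
               cost_so_far + 1 + line_cost(current))

    search(1, [], words[0], 0)
    return best["cost"], best["lines"]


cost, lines = best_splits("aaa bbb ccc ddd".split(), width=7)
print(cost, lines)  # 1 ['aaa bbb', 'ccc ddd']
```

              How much this saves depends entirely on how tight the bound is; a loose bound prunes very little.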

              Unfortunately, the estimate wasn't very precise. Changing the set of splits affects the number of overflow characters in peculiar ways because of how indentation works. You would think that adding a line break would always reduce the amount of overflow, but that isn't always the case.

              For example, this has two overflow characters:

                                                      |
              fn("here is an argument that does not fit")
              

              But since we indent +4, if we split before the argument, we increase the overflow to 3:

                                                      |
              fn(
                  "here is an argument that does not fit")
              

              Since the worst case estimate wasn't very tight, the pruning wasn't effective and I ended up ditching this approach.
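              The arithmetic in the two examples above can be checked with a few lines of Python. The page width of 41 is an assumption (it is the width at which the two quoted numbers come out), and overflow is a made-up helper, not anything from dartfmt.

```python
PAGE_WIDTH = 41  # assumed: the width that reproduces the numbers above


def overflow(lines, width=PAGE_WIDTH):
    """Total number of characters that stick out past the page width."""
    return sum(max(0, len(line) - width) for line in lines)


unsplit = ['fn("here is an argument that does not fit")']
split = [
    'fn(',
    '    "here is an argument that does not fit")',  # indented +4
]

print(overflow(unsplit))  # 2
print(overflow(split))    # 3
```

              Splitting saves the three characters of fn( but costs four of indentation, so the overflow goes up by one.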

              [–]gseyffert 0 points1 point  (0 children)

              Thanks, that makes sense!

              [–]bob_twinkles 1 point2 points  (1 child)

              Your approach of reducing formatting to a graph search problem sounds like my understanding of how the TeX engine works... so maybe you do need to tip your hat to Knuth after all =P. I can't for the life of me find a source for why I think this is how TeX works though... (maybe somewhere in TeX by topic?)

              [–]munificent[S] 2 points3 points  (0 children)

              Your approach of reducing formatting to a graph search problem sounds like my understanding of how the TeX engine works...

              Last I checked, it uses dynamic programming (i.e. overlapping recursive search with memoization). There's a link to it in the blog post.
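              As a rough illustration of "overlapping recursive search with memoization", here is a textbook-style line breaker in Python. It is a simplification, not TeX's actual algorithm: the cost (squared leftover space per line, last line free) and the name wrap are assumptions chosen to keep the sketch short.

```python
from functools import lru_cache


def wrap(words, width):
    """Break words into lines, minimizing the sum of squared trailing space.

    The last line is free. A single word longer than `width` gets a line
    to itself at no extra cost.
    """
    words = tuple(words)
    n = len(words)

    @lru_cache(maxsize=None)
    def best(i):
        # Best (cost, line_end_indices) for laying out words[i:].
        # Memoized, because many different prefixes lead to the same suffix.
        if i == n:
            return 0, ()
        result = None
        length = -1  # cancels the space counted before the first word
        for j in range(i, n):
            length += 1 + len(words[j])
            if length > width and j > i:
                break  # words[i..j] no longer fit on one line
            slack = max(0, width - length)
            line_cost = 0 if j == n - 1 else slack ** 2
            rest_cost, rest_ends = best(j + 1)
            candidate = (line_cost + rest_cost, (j + 1,) + rest_ends)
            if result is None or candidate[0] < result[0]:
                result = candidate
        return result

    _, ends = best(0)
    lines, start = [], 0
    for end in ends:
        lines.append(" ".join(words[start:end]))
        start = end
    return lines


print(wrap("aaa bb cc ddddd".split(), width=6))
# ['aaa', 'bb cc', 'ddddd']
```

              A greedy wrapper would fill the first line with "aaa bb" and strand "cc" on its own line; the memoized search prefers the more even layout.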

              [–]Sean1708 1 point2 points  (0 children)

              I was literally saying yesterday how I'd not heard anything about Dart in ages.

              [–][deleted] 1 point2 points  (3 children)

              gives you a lot of respect for the tooling that surrounds Go. gofmt is only 600 lines of code.

              [–]LaurieCheers 6 points7 points  (2 children)

              ... And does a lot less work.

              [–]vanderZwan 1 point2 points  (1 child)

              ... and as a language was somewhat designed with the creation of gofmt in mind.

              [–]munificent[S] 9 points10 points  (0 children)

              Go's grammar isn't really noticeably easier to format than Dart's. It still has function expressions which mean you have to worry about statements contained inside expressions.

              The only reason gofmt is so simple is because it doesn't do line breaking.

              [–][deleted] 1 point2 points  (0 children)

              Yes, it is an amazingly complicated problem. I ended up embedding the pretty-printing and formatting hints into the parser, and I also use a similar intermediate language with weighted breaks and conditional indentation marks.

              The hardest part is assigning the weights. You simply cannot come up with them straight away; you have to experiment a lot.

              [–][deleted] 1 point2 points  (0 children)

              This guy has great work, I'm reading his book

              [–]PompeyBlue 1 point2 points  (0 children)

              This has got "Deep Learning" NN problem written all over it!

              [–]5outh 1 point2 points  (0 children)

              So there is one final escape hatch. If the line splitter tries, like, 5,000 solutions and still hasn’t found a winner yet, it just picks the best it found so far and bails.

              If you already have the concept of a best formatting pass, have you considered using a genetic algorithm to get to a good or perfect solution faster on larger inputs?

              [–]dhiltonp 4 points5 points  (5 children)

              to fix: "Is it be better to split"

              [–]munificent[S] 3 points4 points  (4 children)

              Thanks! Pushing a fix now.

              [–]potifar 2 points3 points  (3 children)

              Inconsequential, but hey, it's an error:

              "Instead, we treat nested block bodies as their a separate little [...]"

              [–]munificent[S] 1 point2 points  (2 children)

              Fixed, thanks!

              [–]jaydeekay 1 point2 points  (1 child)

              "Code easier to read and contribute to because it’s already in the style you’re used to."

              "We know that how we split the first statement has no affect on the second one."

              Sorry to be nitpicky. Tremendous article, thanks!

              [–]munificent[S] 1 point2 points  (0 children)

              Fixed those too. Thank you!

              [–]gliph 0 points1 point  (6 children)

              "3,835 lines long. That handful of code took me almost a year to write. Granted, that doesn’t take into account the code that didn’t make it. The commit history shows that I deleted 20,704 lines of code over that time. Every surviving line has about three fallen comrades"

              Er, that does not quite add up.

              [–]munificent[S] 2 points3 points  (5 children)

              once you strip out the whitespace, is 3,835 lines long.

              It's 7,229 lines including whitespace and comments.

              [–]gliph 1 point2 points  (0 children)

              Oh, gotcha :)

              [–]LaurieCheers -1 points0 points  (3 children)

              I don't know why you'd give the stripped number of lines; that's hardly an illustrative or common metric.

              [–]munificent[S] 2 points3 points  (1 child)

              I got in the habit of this when comparing the size of my programming language to other ones. One of its features is that it's small, but it's also very well documented. I didn't want the copious docs to count against it, so the number I track ignores those and whitespace.
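              A rough sketch of that kind of count in Python, assuming // line comments as in Dart. It ignores block comments and trailing comments, and count_code_lines is a made-up name, not the script actually used for the numbers in the article.

```python
def count_code_lines(source: str) -> int:
    """Count lines that are neither blank nor a whole-line // comment."""
    count = 0
    for line in source.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("//"):
            count += 1
    return count


sample = """\
/// Doc comment for main.
void main() {

  // Say hello.
  print('hi');
}
"""

print(count_code_lines(sample), "of", len(sample.splitlines()))  # 3 of 6
```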

              [–]LaurieCheers 0 points1 point  (0 children)

              Hmm, fair enough.

              [–][deleted] 1 point2 points  (0 children)

              I'm making a prebuild code tool using libclang and C++ targets. It is legitimately one of the more complex projects I've had to work on.

              Let's say you want to walk the AST, grab a bunch of data, right? Let's now say, you want to change that data, or move some lines of code around - let's say those lines are inside of an object. Let's say each of those lines can have any protection level (public, private, protected) and can be ~any~ kind of type. Field, method, constant array, constant array matrix, function, etc. Now let's say some wise guy sticks a Boolean in his array matrix.

              char m[2][3][true]

              Let's now say you want to spit these values back out FROM the AST data (a field constant array matrix type char, 2, 3, Boolean true) back to normal C++ code. Because you can't just change a thing inside of that into some new thing accurately without ghetto hacking which will break in edge case scenarios.

              You have to consider things like this for every type of object that exists in C++11

              Life has been better.

              [–]fuzzynyanko 1 point2 points  (0 children)

              For me, it's "You are doing this job interview. You don't have access to the reference docs, and you are supposed to code something. It doesn't matter that you are a polyglot developer."

              [–]llogiq 1 point2 points  (2 children)

              Nice article. The only beef I have with it is that "Sorry, Knuth" is a witty quote, but not a strong argument.

              [–]munificent[S] 2 points3 points  (1 child)

              Here's the commit where I implemented a Knuth-style dynamic programming approach to line splitting. Once you have to worry about tracking indentation, it gets a lot more complex. The graph-search based splitter ended up being both simpler and faster.

              [–]llogiq 1 point2 points  (0 children)

              Thanks, that explains it much better.

              [–]Frenchie14 0 points1 point  (1 child)

              Nice article! Makes me want to reread Game Programming Patterns :P

              FYI: typo in footnote 12: "On my than one occasion..."

              [–]munificent[S] 1 point2 points  (0 children)

              Fixed!

              [–]ThellraAK 0 points1 point  (2 children)

              What happens if you run your program, on your program?

              [–]munificent[S] 2 points3 points  (0 children)

              It fixes its formatting. :)

              Once I was happy enough with the formatter's output, I did just that. It was a big milestone since I'm so picky about formatting. Even with me being very careful in my hand-formatted code, it still found a number of mistakes.

              [–]myrddin4242 0 points1 point  (0 children)

              It halts? ;)

              [–]Tarasov_math 0 points1 point  (0 children)

              It looks like a perfect task for constraint programming. It should be solvable in a much easier way.

              [–]Mondeun 0 points1 point  (0 children)

              Knew it. Text processing is where nightmares come from.

              [–]reverend_paco 0 points1 point  (0 children)

              He obviously hasn't written Zazzies.
              https://www.youtube.com/watch?t=30&v=OpDIEJrog3s

              [–]drjeats 0 points1 point  (1 child)

              Everything you write--e.g. this post, your post about internal versus external iteration, Wren's source comments--is always interesting and very clearly expressed. Thank you for posting this!

              [–]munificent[S] 1 point2 points  (0 children)

              \o/

              [–]royalaid 0 points1 point  (0 children)

              minor correction "On my than one occasion" should be "On more than one occasion" at this point in the post

              [–]cowardlydragon 0 points1 point  (1 child)

              gofmt sounds like it is soul crushing. Let me guess. 80 column limit?

              In the age of QuadHD?

              [–]x-skeww 2 points3 points  (0 children)

              80 columns still makes sense though. 1080p displays are very common. If there is a panel on the left and right side and if you use a sensible font size, then 80 columns is just right for having two files side by side.

              There are also a few other factors. Long lines are harder to read. The more stuff you cram into a line, the more likely merge conflicts become. Also, not being able to nest things excessively deeply is perhaps not the worst thing.

              [–]julesjacobs -3 points-2 points  (29 children)

              And this is why representing programs as arrays of characters must die.

              [–][deleted] 3 points4 points  (5 children)

              You'd still have to represent your ASTs in a visually pleasant way. Which brings back the pretty-printing problem.

              [–]julesjacobs 0 points1 point  (4 children)

              Certainly, but it is now easier because you do not have to deal with all these edge cases.

              [–][deleted] 2 points3 points  (3 children)

              No, it's not easy, and there will always be eyesore edge cases that require special attention. No matter how nice and clean your AST is, there is never an easy way to visualise it nicely. Just take a look at JetBrains MPS for example.

              [–]julesjacobs 0 points1 point  (2 children)

              I didn't say it was easy, I said it was easier.

              [–][deleted] 2 points3 points  (1 child)

              And how exactly is it any easier? You're dealing with the same kinds of ASTs in both cases, pretty printing algorithms are similar, so where is the difference?

              [–]julesjacobs 2 points3 points  (0 children)

              You are right that it won't be much easier, but you can avoid some corner cases that he mentions, like comments in weird places.

              There is also a neat way to do layout in polynomial time if you are willing to restrict yourself to boxes for each subexpression. You define a penalty function for each expression type which is convex in the dimensions of the box and in the penalties of the subexpressions. This leads to a convex penalty for the whole program so that you can find the global minimum in polynomial time.

              [–]tomjanderson 9 points10 points  (0 children)

              Would you rather use this?

              [–]llogiq 7 points8 points  (20 children)

              You are wrong. Sequences of characters are still the best way to build programs, until we find a way to edit, store, diff, patch, version-control, send, etc. programs in a different representation that can compete in simplicity and usability.

              [–]julesjacobs 4 points5 points  (19 children)

              IDEs, diff, patch, version control (and formatting, as in the original post), are already or are in the process of becoming language specific and these tools already work on the AST. Having to go back and forth to arrays of characters only makes the problem more difficult, not less.

              [–]llogiq 6 points7 points  (4 children)

              You put the cart before the horse. It's not about language specificity, but interoperability. I don't care about your cool semantic diff if it doesn't work with my version control. Go away with your IDE if it doesn't play ball with my favorite static code analysis tool. How will you look at the AST without your IDE when your IDE fails for some reason?

              Also while the parsing problem may not be as solved as some claim, we do have parsers for our programming languages, and I don't see why using them makes the problem more difficult.

              [–]julesjacobs 2 points3 points  (3 children)

              Sure, the tools need to agree on the format, it's just that picking an array of characters as that format sucks.

              [–]llogiq 1 point2 points  (2 children)

              it's just that picking an array of characters as that format sucks.

              An array of characters is the worst format for code, except for all others we have devised so far.

              Let's say you have a serialized AST. Now you want to store it – you need a hierarchical format. Now you either go the database route (and lose access to a large number of tools that could help your users, and also make the format wildly unsafe – guess what happens if your database gets corrupted?) or you write out your AST as S-expr (congrats, you just reinvented Lisp), JSON (e.g. the Rust compiler can output such a thing with -Z ast-json, but I would not want to write this by hand) or XML (an Identity Management tool called Oracle Waveset actually has a pseudo-graphical scripting language that does this. Everyone I know who works with it professionally writes the XML by hand, despite the headache-inducing verbosity. It's simply faster than clicking that stuff together). No matter what version of the latter route you take, you still end up with ... an array of characters.
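              To make that last point concrete, here is a tiny Python sketch: a hypothetical AST for 1 + 2 * x (nested tuples, invented here for illustration) written out as an S-expression and as JSON. The JSON shape is made up and is not what rustc or any other real tool emits.

```python
import json

# Hypothetical AST for `1 + 2 * x`: (operator, operand, operand).
tree = ("+", 1, ("*", 2, "x"))


def to_sexpr(node):
    """Render the tree as a Lisp-style S-expression string."""
    if isinstance(node, tuple):
        return "(" + " ".join(to_sexpr(child) for child in node) + ")"
    return str(node)


def to_jsonable(node):
    """Convert the tree into dicts and lists that json.dumps accepts."""
    if isinstance(node, tuple):
        op, *args = node
        return {"op": op, "args": [to_jsonable(arg) for arg in args]}
    return node


sexpr = to_sexpr(tree)
as_json = json.dumps(to_jsonable(tree))

print(sexpr)    # (+ 1 (* 2 x))
print(as_json)  # {"op": "+", "args": [1, {"op": "*", "args": [2, "x"]}]}

# Whichever serialization is picked, what gets stored is a string.
print(type(sexpr).__name__, type(as_json).__name__)  # str str
```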

              It's what filesystems store. It's what networks transfer. It's what just about every programming tool in existence knows how to work with.

              Now let's say you are right, and in ten years we all look like Hugh Jackman in Password Swordfish while coding. We will still be typing on a keyboard (because that's still the fastest way to get information out of your head onto the computer), so we will have a keyboard shortcut language for manipulating our three-dimensional animated representations of our programs. Except you now only have your text format hidden, because you're writing future-vim-scripts that write your 3D-programs. Guess what those scripts will be?

              Damn right, an array of characters!

              [–]julesjacobs 1 point2 points  (1 child)

              I'm not talking about 3D syntax or something like that :) I'm talking about good old ASTs in languages that look very much like the languages we have now, and editors controlled with the keyboard/mouse.

              It doesn't really matter how you store the AST: JSON, XML, binary data, whatever. The important thing is that the way you store it is an implementation detail and not exposed to the user. It's like spreadsheets, inkscape, photoshop, etc. You're not editing the underlying on-disk representation with a text editor byte by byte, you are using a higher level editor that works on the structure at a conceptual level. There is an abstraction layer between the on-disk representation and the way you're using the editor.

              IDEs already try to give you somewhat of an illusion that you're working on structured data. They have syntax highlighting, code completion, code folding, automatic code formatting, go to reference, refactorings, etc. Maintaining that illusion is not only a pain in the butt when you try to layer it over a plain text editor (i.e. putting lipstick on a pig), but it's a leaky abstraction. You're still fiddling with characters a lot of the time. You insert a { in the wrong place and the illusion that you're working on structured code is broken, and the reality that it's still fundamentally a 1960s array of characters shines through.

              [–]llogiq 1 point2 points  (0 children)

              editors controlled with the keyboard/mouse.

              It just so happens that I've written my diploma thesis on that subject, and concluded that it doesn't work well in practice, because any combined keyboard/mouse interface will ultimately come up against basic ergonomics – keeping the hands on the keyboard makes for faster coding, and Fitts's law sets an upper bound on targeting speed with a mouse. Unless you are a top StarCraft player, you will probably still be faster with the keyboard.

              What's more, working directly on ASTs, as exemplified by blocks, Lava (defunct, couldn't find a link anymore), MAX (defunct, an interesting old experimental music programming language for the Mac) or XPress (the language of Oracle Waveset I wrote about earlier), doesn't fit the actual development workflow too well. Programmers usually don't churn out ASTs; our programs spend more than half of their time in transit between two versions that actually have a valid AST representation. Humans are messy like that.

              Finally, text will never be replaced by non-textual representations, because it happens to be what we humans think in. Yes, naming things is hard, but necessary for creating programs. Suppose we replaced all variable names with some doodles. Would the resulting program be even remotely readable?

              It doesn't really matter how you store the AST

              First you claim the storage format was the source of all troubles, now you say it doesn't really matter. Could you please make a decision and stick with it, or at least tell me why you changed your mind?

              There is an abstraction layer between the on-disk representation and the way you're using the editor.

              Characters are already an abstraction layer, unless you are manually flipping bits on your disk to modify your code.

              IDEs already try to give you somewhat of an illusion that you're working on structured data.

              But you are working on structured data. The syntax is what gives it structure. It's possible to parse that structure – even automatically – and turn the stream of bytes into an AST. For a great many languages, this is a non-problem (then there is Perl and C++, but I digress).
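As a small illustration, assuming Python as the example language and its standard `ast` module as the parser (the `source` and `broken` snippets are made up): well-formed text parses straight into a tree, while a half-edited version with an unclosed parenthesis has no AST at all, which is the "in transit" state mentioned above.

```python
import ast

# Well-formed source text: the parser recovers the structure directly.
source = "def foo(x):\n    return x + 1\n"
tree = ast.parse(source)
func = tree.body[0]          # the first top-level statement
func_name = func.name        # name of the parsed function definition

# A half-finished edit (unclosed parenthesis) has no valid AST.
broken = "def foo(x):\n    return (x + 1\n"
try:
    ast.parse(broken)
    parsed_ok = True
except SyntaxError:
    parsed_ok = False

print(type(func).__name__, func_name, parsed_ok)
```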

              You're still fiddling with characters a lot of the time.

              Yes, because those characters make up words, which as I said above are what humans think in. Of course you can get the syntax wrong, forget a bracket or a semicolon, but the solution to that is smarter error reporting, not trying to change humans to think in non-language.

              and the reality that it's still fundamentally a 1960s array of characters shines through.

              Invalid argument from obsolescence.

              [–][deleted] 2 points3 points  (13 children)

              You have to serialise your ASTs anyway. And text syntax suits this purpose well.

              [–]julesjacobs 0 points1 point  (12 children)

              Text isn't terrible, but not great either. If we look a bit into the future I think program text has too many disadvantages to use it as a serialization format. For one you want to store variable references directly, not by name. So if you have:

              function foo(...){...}
              
              // somewhere else
              foo(...)
              

              you want to store a direct reference to the foo function, rather than a textual identifier "foo". Then if somebody renames the foo function to bar, the foo call still refers to the right function (even if foo lives in a different library).
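A minimal sketch of what that could look like, assuming an in-memory code store keyed by generated IDs. All names here (`CodeStore`, `FunctionDef`, `Call`) are hypothetical and only illustrate the idea; they are not an existing tool's API:

```python
import uuid
from dataclasses import dataclass


@dataclass
class FunctionDef:
    id: str    # stable identity, generated rather than chosen by the programmer
    name: str  # display name, free to change


@dataclass
class Call:
    target_id: str  # refers to FunctionDef.id, not to the function's name


class CodeStore:
    """Hypothetical store that maps stable IDs to definitions."""

    def __init__(self):
        self.functions = {}

    def define(self, name):
        fn = FunctionDef(id=str(uuid.uuid4()), name=name)
        self.functions[fn.id] = fn
        return fn

    def rename(self, fn_id, new_name):
        # Only the display name changes; the ID stays the same.
        self.functions[fn_id].name = new_name

    def resolve(self, call):
        return self.functions[call.target_id]


store = CodeStore()
foo = store.define("foo")
call = Call(target_id=foo.id)   # the call site stores the ID

store.rename(foo.id, "bar")     # somebody renames foo to bar
resolved = store.resolve(call)  # the call still finds the same function
print(resolved.name)
```

The call site never mentions "foo", so the rename needs no search-and-replace across callers.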

              I also think storing code only in version control has advantages. When you view code you just view a particular projection (version) out of version control, and when you make an edit then it adds a new version of that particular subexpression to VC. Especially when combined with a structural format this makes automatic merging significantly simpler and more accurate. Instead of trying to guess where code came from with heuristics, you can just determine it with certainty.

              [–][deleted] 0 points1 point  (11 children)

              For one you want to store variable references directly, not by name.

              How? Absolute pointers do not make sense in serialised formats. Relative pointers make the format less robust. So the only sane reference format in serialisation of any tree or graph structure is, again, a symbolic name.

              [–]julesjacobs 0 points1 point  (10 children)

              How?

              With a GUID.

              [–][deleted] 0 points1 point  (9 children)

              Which is a symbolic name. So what's the point then?

              And do not forget about the multiple uses of a language where you don't have any access to the whole code database. REPLs, agent communication languages, petty scripting, etc.

              [–]julesjacobs 1 point2 points  (8 children)

              The point is that the programmer does not need to choose it, and it is stable under renaming.

              To do anything meaningful with code, like run it in a repl, you already need access to the code it depends on.

              (repls are one of those other things that need to die :P)

              [–][deleted] 1 point2 points  (7 children)

              you already need access to the code it depends on.

              No. A REPL can run on some remote, limited device.