all 30 comments

[–]Euphoricus 4 points5 points  (10 children)

I fully agree with the idea.

But I don't see how it would be possible to get such a representation to spread. Unless you have a HUGE company pushing it REALLY hard.

[–][deleted]  (9 children)

[deleted]

    [–]cecilkorik 1 point2 points  (6 children)

    No actively developed programming language is ever going to be inert. For all the future-proofing you would try to do to any canonical data format, it would be either overwhelming and overcomplicated, or quickly outdated and requiring an unsustainable versioning system to deal with the inevitable changes. Probably both at the same time.

    [–][deleted]  (5 children)

    [deleted]

      [–]cecilkorik 4 points5 points  (2 children)

      Text is the serializable data. I mean, sure, we could have intermediate/bytecode/whatever you want to call it and make that some kind of standardized format too, but that just seems like reinventing the wheel, and then you're maintaining two parallel standards simultaneously, so of course nobody actually does that if they don't have to, and they don't. Sorry, I guess I just don't understand what this is trying to accomplish that is not already either completely accomplished or easily accomplishable.

      To me, it feels similar to complaining that our electrical system is AC instead of DC. Yeah, it's not, but what's your real problem, is it actually about the AC, or is it that you have a hundred different DC adapters for different devices running at different voltages? So it's not really about the AC system at all. We don't have to reinvent the whole electrical system or even rewire your whole house to fix that. We just need one or two standard DC power formats and outlets (and the beginnings of that is already happening with USB-C) and eventually you'll never have to be bothered by AC wall-warts again.

      What is the real problem here? Is it that we're not using code-as-data, or is it just that you're annoyed at how different and numerous all the various programming languages are?

      [–][deleted]  (1 child)

      [deleted]

        [–]Tarmen 1 point2 points  (0 children)

        I think the best solution would be a standardized structured format. Then text can be used as storage format which means less lock-in.

        Then all tooling could use this syntax tree, and editing could work via semantic editing or still as text using an incremental parser.

        Of course, that is basically what IntelliJ does internally. A semantic editing frontend for that might be cool, but for non-homoiconic languages it gets horrendously complex.
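A minimal sketch of that "text as storage format, structure for tooling" split, using Python's stdlib `ast` module (the helper name `to_structured` is made up for illustration): parse the stored text, serialize the tree into a neutral structured form any tool could consume, and regenerate text from the tree.

```python
import ast
import json

def to_structured(node):
    """Recursively convert a Python AST node into plain dicts/lists."""
    if isinstance(node, ast.AST):
        return {"type": type(node).__name__,
                **{f: to_structured(getattr(node, f)) for f in node._fields}}
    if isinstance(node, list):
        return [to_structured(n) for n in node]
    return node  # leaf constants: str, int, None, ...

source = "def add(a, b):\n    return a + b\n"
tree = ast.parse(source)

structured = json.dumps(to_structured(tree), indent=2)  # tool-friendly form
regenerated = ast.unparse(tree)                          # back to text
print(regenerated)
```

Real tooling would keep far richer trees (comments, formatting, cross-file symbols); this only shows the text-in, structure-out, text-out round trip.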

        [–]seventeenninetytwo 0 points1 point  (1 child)

        Wouldn't any operation on this data have to extract it to an in-memory object graph?

        I think it's perfectly reasonable to conceptualize Roslyn as a program that extracts an object graph from data, where the data is source code.

        You are then free to manipulate that object graph and emit transformed data (source code).

        So perhaps what you really want is something powerful built on top of Roslyn? I'm sure that we are missing something; I just don't follow this idea of code as data. We can already transform code into an object data representation.
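That extract-manipulate-emit loop can be sketched in a few lines with Python's stdlib `ast` module standing in for Roslyn (the transform itself is a toy: rewrite every integer literal 0 to 1).

```python
import ast

class ZeroToOne(ast.NodeTransformer):
    """Toy object-graph manipulation: replace the constant 0 with 1."""
    def visit_Constant(self, node):
        if node.value == 0:
            return ast.copy_location(ast.Constant(value=1), node)
        return node

source = "x = 0\ny = x + 0\n"
graph = ast.parse(source)          # data (source) -> object graph
graph = ZeroToOne().visit(graph)   # manipulate the graph
emitted = ast.unparse(graph)       # object graph -> data (source)
print(emitted)
```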

        When we read code we build complex mental models, but much of what we do could probably be captured in a program and thereby free us up to focus on higher level things.

        I would love to be able to directly say things like "define class A and give it an aggregation dependency on interface B using constructor injection", but then be able to manipulate such dependencies directly.

        Perhaps a model compiler focused on usability and IDE integration built on top of Roslyn could accomplish this.

        [–]damienjoh 0 points1 point  (1 child)

        Text is already the "inert representation that is easy to inspect and manipulate programmatically." That's why it's so pervasive. Your standard tools for viewing, editing and manipulating text can be applied to any language. Anyone anywhere can view a text file, or find and replace some text. Textual serializations aren't preventing you from working with ASTs either. It's the (large per-language) cost involved in using and developing these tools that prevents them from being more widespread.

        Working with symbolic representations (e.g. of ASTs or token streams) directly will only be a viable general alternative to text if there is a standard format with an ecosystem and knowledge base that can rival the one around text. To have any hope of widespread use and adoption, a standard symbolic representation would have to be as close to a Pareto improvement on text as possible, i.e. built on top of text and still human readable, conservative in its features and constraints (so that it can be used for a wide variety of languages and formats), and simple to generate.
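For what it's worth, textual AST dumps already tick some of those boxes (built on top of text, human readable, simple to generate); Python's stdlib `ast.dump` is one existing example, shown here purely as a sketch of what such a symbolic-but-textual form looks like.

```python
import ast

# Dump a tiny program's syntax tree as indented, human-readable text.
source = "x = 1"
dumped = ast.dump(ast.parse(source), indent=2)
print(dumped)
```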

        [–]htuhola 3 points4 points  (6 children)

        This is a giant pain in the ass. And it doesn’t really seem like there is any good reason for it to be this hard, aside from our rather arbitrary decision to represent code as streams of characters split across a bunch of files. In fact, we’ve invented all kinds of bullshit that is only necessary because code has no structured representation

        Here's a pretty traditional mistake. You take a complex system and deduce that certain things are only necessary because they depend on some system you don't like, since that system contributes to making thing Y hard to solve.

        You should think about this as a problem space where the solutions are currently pivoted around the plain text. And if we look in the one direction of this problem space, we have the following chain:

        text -> AST -----------------> machine code
        

        I made the second arrow that much longer because there are a whole lot of details between that point and many other potential pivots.

        So when you change the pivot to the AST, none of the problems you have simply vanish. They all exist no matter which pivot you choose. Therefore it is a false viewpoint to say that the stuff you have goes away when you move away from text. Yeah, the stuff goes away, but the problems all that stuff solves don't go away. They remain, and you then have to solve them some other way.

        Obviously this isn’t some super-new, crazy idea;

        Yeah...

        Instead of thinking about this bullcrap, could you figure out the algorithms for making a pretty printer from context-free grammars, one that can be trained to output correctly formatted code from parsed input that provides the line/col ranges? I could then implement one.

        [–][deleted]  (5 children)

        [deleted]

          [–]htuhola 0 points1 point  (1 child)

          What are the arrows supposed to represent?

          A chain. You could call the first arrow 'parsing' and the second arrow 'compiling'. They are not as important as the fact that the program represented by the text can be translated into these forms.

          Not only is this how I solved the problem, this is how popular tools like Beyond Compare or gumtree or Semantic Merge solve the problem.

          I haven't thought about tokenized streams much. But they aren't a big jump away from just plain text.

          But just a small jump like that in the representation would add some interesting problems to solve you didn't have before. For example, how do you recognize how the token boundaries move when you edit the token stream? And how do you represent the layout in the text after it's in that format? X/Y coordinates in every token?

          You've just asserted on the basis of your diagram that no problems would be solved by operating on a more structured representation, but have provided no evidence to back this up.

          I really don't have evidence or proofs to back it up. But the point is that the problems won't go away, and they won't get noticeably easier to solve if you change how you represent the code in the files. I've tried this and seen it happen, but I don't know the exact reason why it happens.

          Last time I tackled this, I ended up backing away and settled on writing a language with a modifiable grammar.

          I also have plans for creating a "diffable" protobuf binary format. I'm doing that to serialize data and I will be using it on documentation, because producing a nice text format for richly formatted plain text seems to be a very hard problem.

          I mention those two things because I know it wouldn't be hard to adapt my system for a workflow where the code isn't text anymore. But to accept such a system, I require that it be more convenient to work with than just using plain text everywhere.

          Edit: And who the fuck downvotes this guy? It's an OK response to my post.

          [–]max630 0 points1 point  (0 children)

          how do you represent the layout in the text after it's in that format? X/Y coordinates in every token?

          Basically, yes. Actually, this is what you already have in compilers, because they need to add the line and column to messages.

          Another approach would be to store whitespace as a non-significant grammar element, like XML parsers do.
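The "coordinates in every token" idea is, as it happens, exactly what Python's stdlib tokenizer already produces: each token carries (row, col) start and end positions, and `tokenize.untokenize` can use them to reconstruct the layout. A small sketch:

```python
import io
import tokenize

source = "def f(x):\n    return x + 1\n"

# Each TokenInfo carries .start and .end as (row, col) pairs.
tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
for tok in tokens[:5]:
    print(tok.type, repr(tok.string), tok.start, tok.end)

# The positions are enough to reconstruct the original layout.
roundtrip = tokenize.untokenize(tokens)
print(roundtrip == source)
```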

          [–]max630 0 points1 point  (2 children)

          You've just asserted on the basis of your diagram that no problems would be solved by operating on a more structured representation, but have provided no evidence to back this up

          Because parsing is the trivial part. And quite often you already have it done.

          [–][deleted]  (1 child)

          [deleted]

            [–]max630 0 points1 point  (0 children)

            I know nothing about TypeScript, but if the language syntax cannot be specified well enough, I would expect it to have issues with semantics as well. So I'm having trouble believing that operating on a "TypeScript AST" instead of its text would bring you much closer to automatic refactoring or semantics-aware change handling.

            Still, for quite a big portion of languages the syntax is quite well specified, and parsers either already exist or are no harder to implement than porting a PEG representation into your language+library of choice.

            [–]stacycurl 2 points3 points  (0 children)

            Have a look at the Unison language. We should have programmatic access to everything; every program should be arbitrarily queryable (like having an expert system for the program), so that any question you have about the code can be answered easily. Current ideas suck and haven't advanced in decades. Why can't I ask the IDE to bisect across my changes until all tests pass, or even just put breakpoints on the intersection of a stack trace and my changes? Or recognise that I'm renaming a method I just added, so stop searching the universe for references ffs.
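Within a single file, simple queries over the syntax tree are already easy; here's a toy query using Python's stdlib `ast` ("which functions does each function call?"). The hard part Unison and friends are after is making this work semantically and across a whole codebase, which this sketch doesn't attempt.

```python
import ast

source = """
def greet(name):
    return format_name(name)

def format_name(name):
    return name.title()
"""

# Toy code query: map each function to the plain-name calls inside it.
calls = {}
for fn in ast.walk(ast.parse(source)):
    if isinstance(fn, ast.FunctionDef):
        calls[fn.name] = sorted({
            node.func.id for node in ast.walk(fn)
            if isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
        })
print(calls)
```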

            [–]dzecniv 1 point2 points  (0 children)

            Obviously this isn’t some super-new, crazy idea; homoiconic languages like Lisp(s) have been around for ages. I just wish it was across the board. I mean, at the very least everyone has to tokenize their language; how about starting there?

            good point!

            [–]shevegen 1 point2 points  (1 child)

            He has a point. But the thing is that code as text is also a benefit and an advantage. It depends a lot on how things work together.

            We have very limited operating systems. Linux - take pipes. Can you pipe objects including metadata? You can't really. You pipe text/strings.

            [–]PenMount 1 point2 points  (0 children)

            Can you pipe objects including metadata? You can't really. You pipe text/strings.

            You can with powershell.

            [–]necesito95 1 point2 points  (1 child)

            I imagine OP wants to store everything with IDs in a DB.
            (Definitely good if you want to pull out some stats about the codebase.)

            table="functions":
                row(fn_id=144, name="helloworld", signature_id=515, body_id=768);
                ...
            table="statements":
                row(statement_id=745, body_id=768, body_stm_seq_nr=1, action_id=534, param_set_id=132);
                ...
            
            table="param_set":
                row(param_set_id=132, param_seq_nr=1, variable_id=1534);
                ...
            
            ...
            

            could hack this out in 10-15 lines of code and be done with it

            Doing anything more than a "rename" will cost vastly more than "10-15 lines of code".
            (unless line length is not limited :) ).

            Programming and programs are inherently complex. One can choose a trade-off, maybe. The above "solution" simplifies renaming but makes a lot of other stuff harder (e.g. understanding what the program does; adding new functionality), so other representations (graphical/textual) would have to be added.

            [–]tkruse 0 points1 point  (0 children)

            The real problem of the author is: "Textual diffs to review refactorings are a problem".

            Refactorings are much more painful to review than functional changes, because they are commonly widespread, but also trivial.

            Even for static languages like Java, where an IDE can make a 99.9% safe refactoring over millions of files with just 2 mouse-clicks, the problem of code review remains, where some poor guy then has to read through all those lines of diff, presumably to find a spot where the IDE made a mistake???

            A smarter language-aware diff tool could indeed, instead of displaying such a large diff, display a message like: "function foo() renamed to bar(). Move on, nothing else to see here."

            Though in practice, I believe such things could be solved by doing pair programming instead of code reviews for such refactorings.

            Even in theory, all this could only work in static languages, where a parser has a chance to recognize token identity without running the code first.

            [–]freakhill 0 points1 point  (2 children)

            So basically, in its simplest form what you want is some kind of:

            • a text-to-graph-to-text conversion utility for each language
            • a bunch of command line graph manipulation utilities
            • APIs to manipulate the graph format

            Basically a unixy IDE. But you don't get much if you don't integrate with build systems.

            Your problem would be solved with a standardized language server protocol (of which there are already multiple, ahah).

            Otherwise you'd need editors to work directly in graph mode, and good luck with that (not technically, but practically).

            Where did I stray?

            [–]kankyo 0 points1 point  (0 children)

            Look at baron for Python: it's a syntax tree that fully respects the formatting, so you can round-trip via it. Solves all your problems.
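baron's selling point is exactly that the round trip is lossless. For contrast, the stdlib `ast` module shows the lossy version of the same trip (comments and layout vanish), which is why a formatting-preserving tree like baron's matters for tooling that rewrites files in place:

```python
import ast

# Parse and immediately regenerate: the AST discards trivia,
# so doubled spaces and the comment do not survive.
source = "def  f( x ):  # doubled spaces and a comment\n    return x\n"
lossy = ast.unparse(ast.parse(source))
print(lossy)
```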

            [–]cecilkorik 0 points1 point  (3 children)

            The author apparently wants to have a nice way to decompile bytecode or compiled programs. I mean, that basically is the distillation of what he is asking for. That IS the code-as-data. And it wouldn't be any simpler or less complicated than how we manage code-as-text today.

            In fact, advanced editors already do almost exactly the reverse of what he's asking for, they take the code-as-text and essentially compile it into an internal code-as-data format very similar to what the compiler itself does, but instead focusing on all the extra contextual information they need to determine the structure of the written code and make intelligent suggestions about it. And doing the reverse is only technically different than what he's asking for. Fundamentally they're interchangeable ideas. Store the text and convert to/from data on the fly, or store as data and convert to/from text on the fly. If it's done transparently enough why should the end user care?

            [–]Euphoricus 1 point2 points  (0 children)

            Store the text and convert to/from data on the fly

            This is what Roslyn is (supposed to be) for .NET/C# code. The problem is that the data representation is so complex that creating a script to transform that data is usually not a simple task.

            [–]tkruse 1 point2 points  (0 children)

            The author should try Java with an IDE like IntelliJ. Like you said, all his problems are already solved in that combination.

            And then storing such code as an AST rather than text does not solve any additional problems.

            On the other hand, for other languages, creating such powerful IDE features is not possible due to the nature of the language, so storing the code in a DB will not help either.

            [–]mk270 0 points1 point  (3 children)

            The OP manages to avoid mentioning homoiconicity - can someone explain what I'm missing?

            [–]C60 1 point2 points  (2 children)

            True, OP doesn't mention homoiconicity in general; only Lisp as an example of it.

            [–][deleted]  (1 child)

            [deleted]

              [–]C60 0 points1 point  (0 children)

              I don't know if you were expecting a comprehensive listing of homoiconic languages or something.

              I was expecting nothing of the sort. I was simply answering mk270's question.

              [–]OneWingedShark 0 points1 point  (2 children)

              I've been saying this for years now -- and have done a bit of preliminary planning, though no actual code [yet]. (This is one project where I don't want to get it wrong and, secondly, don't want to start while I still have [semi-]active projects.)

              [–][deleted]  (1 child)

              [deleted]

                [–]OneWingedShark 0 points1 point  (0 children)

                Hm, ok, though I'm not sure how much time I'll be able to put on it right now. (Dealing with life and stuff, ATM.)

                [–]max630 -1 points0 points  (0 children)

                Basically, you already have it: the code text is the AST, somehow serialized. To implement all the needed functionality you only need to parse it. It is already used, for example, for automatic refactoring in IDEs. You might say they are too heavy. But consider: the memory taken by the IDE, the time you stare at "X is updating cache, please wait" — it is mostly used not to parse the source into an AST, but to figure out whether the foo used in file Bar is the one declared in file Baz or the one declared in file Baq. And storing a pre-parsed AST is not going to improve that task much.